We would also need to consider cross-language pipelines, which (currently) assume interaction with an expansion service at construction time.
On Thu, Aug 8, 2019, 4:38 PM Kyle Weaver <kcwea...@google.com> wrote:
>
> > It might also be useful to have the option to just output the proto and artifacts, as an alternative to the jar file.
>
> Sure, that wouldn't be too big a change if we were to decide to go the SDK route.
>
> > For the Flink entry point we would need to allow for the job server to be used as a library.
>
> We don't need the whole job server; we only need to add a main method to FlinkPipelineRunner [1] as the entry point, which would basically just do the setup described in the doc and then call FlinkPipelineRunner::run.
>
> [1] https://github.com/apache/beam/blob/master/runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkPipelineRunner.java#L53
>
> Kyle Weaver | Software Engineer | github.com/ibzib | kcwea...@google.com
>
> On Thu, Aug 8, 2019 at 4:21 PM Thomas Weise <t...@apache.org> wrote:
>
>> Hi Kyle,
>>
>> It might also be useful to have the option to just output the proto and artifacts, as an alternative to the jar file.
>>
>> For the Flink entry point we would need to allow for the job server to be used as a library. It would probably not be too hard to have the Flink job constructed via the context execution environment, which would require no changes on the Flink side.
>>
>> Thanks,
>> Thomas
>>
>> On Thu, Aug 8, 2019 at 9:52 AM Kyle Weaver <kcwea...@google.com> wrote:
>>
>>> Re Javaless/serverless solution:
>>> I take it this would probably mean that we would construct the jar directly from the SDK. There are advantages to this: full separation of Python and Java environments, no need for a job server, and likely a simpler implementation, since we'd no longer have to work within the constraints of the existing job server infrastructure. The only downside I can think of is the additional cost of implementing/maintaining jar creation code in each SDK, but that cost may be acceptable if it's simple enough.
>>>
>>> On Thu, Aug 8, 2019 at 9:31 AM Thomas Weise <t...@apache.org> wrote:
>>>
>>>> On Thu, Aug 8, 2019 at 8:29 AM Robert Bradshaw <rober...@google.com> wrote:
>>>>
>>>>> > Before assembling the jar, the job server runs to create the ingredients. That requires the (matching) Java environment on the Python developer's machine.
>>>>>
>>>>> We can run the job server and have it create the jar (and if we keep the job server running we can use it to interact with the running job). However, if the jar layout is simple enough, there's no need to even build it from Java.
>>>>>
>>>>> Taken to the extreme, this is a one-shot, jar-based JobService API. We choose a standard layout of where to put the pipeline description and artifacts, and can "augment" an existing jar (that has a runner-specific main class whose entry point knows how to read this data to kick off a pipeline as if it were a user's driver code) into one that has a portable pipeline packaged into it for submission to a cluster.
>>>>
>>>> It would be nice if the Python developer doesn't have to run anything Java at all.
>>>>
>>>> As we just discussed offline, this could be accomplished by including the proto that is produced by the SDK into the pre-existing jar.
>>>>
>>>> And if the jar has an entry point that creates the Flink job in the prescribed manner [1], it can be directly submitted to the Flink REST API. That would allow for a Java-free client.
>>>>
>>>> [1] https://lists.apache.org/thread.html/6db869c53816f4e2917949a7c6992c2b90856d7d639d7f2e1cd13768@%3Cdev.flink.apache.org%3E
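The "augment an existing jar" idea needs nothing beyond a zip library on the client side, which is what makes a Java-free client plausible. Below is a minimal Python sketch under stated assumptions: the in-jar layout (`BEAM-PIPELINE/pipeline.pb`, `BEAM-ARTIFACTS/`) and the `augment_jar` helper are hypothetical and not part of Beam, and the base jar is assumed to already carry a runner-specific main class that knows how to read that layout. The pipeline bytes would come from the SDK, for example `pipeline.to_runner_api().SerializeToString()` in the Python SDK.

```python
import os
import shutil
import tempfile
import zipfile

# Hypothetical in-jar layout. The thread only calls for "a standard layout";
# these entry names are placeholders, not an agreed Beam convention.
PIPELINE_ENTRY = "BEAM-PIPELINE/pipeline.pb"
ARTIFACT_PREFIX = "BEAM-ARTIFACTS/"


def augment_jar(base_jar, output_jar, pipeline_bytes, artifacts):
    """Copy base_jar to output_jar and add a serialized pipeline plus artifacts.

    base_jar is assumed to contain a runner-specific main class that reads
    PIPELINE_ENTRY and ARTIFACT_PREFIX at startup (hypothetical).
    pipeline_bytes is the already-serialized pipeline proto from the SDK.
    artifacts maps an artifact file name to its contents as bytes.
    """
    shutil.copyfile(base_jar, output_jar)
    # Mode "a" appends new entries and leaves the existing ones (manifest,
    # main class, runner classes) untouched.
    with zipfile.ZipFile(output_jar, "a", compression=zipfile.ZIP_DEFLATED) as jar:
        existing = set(jar.namelist())
        new_entries = [PIPELINE_ENTRY] + [ARTIFACT_PREFIX + n for n in artifacts]
        clashes = existing.intersection(new_entries)
        if clashes:
            # zipfile would only warn about duplicate names; refuse instead.
            raise ValueError(f"jar already contains: {sorted(clashes)}")
        jar.writestr(PIPELINE_ENTRY, pipeline_bytes)
        for name, data in artifacts.items():
            jar.writestr(ARTIFACT_PREFIX + name, data)
    return output_jar


if __name__ == "__main__":
    workdir = tempfile.mkdtemp()
    base = os.path.join(workdir, "runner.jar")
    # Stand-in for a real runner jar: only a manifest naming a hypothetical main class.
    with zipfile.ZipFile(base, "w") as jar:
        jar.writestr("META-INF/MANIFEST.MF",
                     "Manifest-Version: 1.0\nMain-Class: example.PortableMain\n")
    # Placeholder bytes stand in for a real serialized pipeline proto.
    out = augment_jar(base, os.path.join(workdir, "pipeline.jar"),
                      b"\x0a\x03abc", {"requirements.txt": b"numpy\n"})
    with zipfile.ZipFile(out) as jar:
        print(sorted(jar.namelist()))
        # ['BEAM-ARTIFACTS/requirements.txt', 'BEAM-PIPELINE/pipeline.pb', 'META-INF/MANIFEST.MF']
```

Appending rather than rebuilding keeps the base jar's manifest and classes exactly as built, so the only Java step is producing the base jar once; the augmented file is what would then be submitted, for instance through the Flink REST API as suggested in the thread.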