Luke, thanks for explaining the idea, that’s what I expected, I was a
bit confused after seeing that the proto transformation of the
Pipeline happens in runners/core-contruction-java. I thought this was
intended (maybe to keep the SDK smaller). So what is the future ? move
all this machinery into sdks/java/core or a different module and make
sdks/java/core depend on that ?

In any case this looks like a not public API matter. What I would like
is to be vigilant about the exposure of Protobuf stuff in the public
API to avoid some of the issues we have had in the past (and still
have on GCPIO). Do we have this kind of validation in place after the
move to gradle (like not expose in public APIs guava/protobuf) ?
On Wed, Jul 18, 2018 at 6:29 PM Lukasz Cwik <lc...@google.com> wrote:
>
> Ismael, the SDK should perform the pipeline translation to proto because I 
> expect the flow to be:
> User Code -> SDK -> Proto Translation -> Job API -> Runner
> I don't expect "runners" to live within the users process anymore (excluding 
> the direct runner). There will be one portable "runner" and it will be 
> responsible for communicating with the job management APIs. It shouldn't be 
> called a runner but for backwards compatibility it will behave like a runner 
> does today. Flink/Spark/... will all live on the other side of the job 
> management API.
>
> Thomas, I can run RemoteExectuionTest from commit 
> ae2bebaf8b277e99840fa63f1b95d828f2093d16 without needing to modify the 
> project/module structure in Intellij. Adding the jars manually only helps 
> with code completion.
>
> https://github.com/apache/beam/pull/5977 works around the duplicate content 
> root issue in Intellij. I have also run into the -Werror issue occasionally 
> and don't know any fix or why it gets triggered as it doesn't happen to me 
> all the time.
>
> On Tue, Jul 17, 2018 at 7:01 PM Thomas Weise <t...@apache.org> wrote:
>>
>> Thanks, the classpath order matters indeed.
>>
>> Still not able to run RemoteExecutionTest, but I was able to get the Flink 
>> portable test to work by adding the following to the top of the dependency 
>> list of beam-runners-flink_2.11_test
>>
>> vendor/sdks-java-extensions-protobuf/build/libs/beam-vendor-sdks-java-extensions-protobuf-2.6.0-SNAPSHOT.jar
>> model/fn-execution/build/libs/beam-model-fn-execution-2.6.0-SNAPSHOT.jar
>>
>>
>> On Tue, Jul 17, 2018 at 6:00 PM Ankur Goenka <goe...@google.com> wrote:
>>>
>>> Yes, I am able to run it.
>>>
>>> For tests, you also need to add dependencies to 
>>> ":beam-runners-java-fn-execution/beam-runners-java-fn-execution_test" 
>>> module.
>>>
>>> Also, I only added
>>> :beam-model-job-management-2.6.0-SNAPSHOT.jar
>>> :beam-model-fn-execution-2.6.0-SNAPSHOT.jar
>>> to the dependencies manually so not sure if you want to add
>>> io.grpc:grpc-core:1.12.0 and com.google.protobuf:protobuf-java:3.5.1 to the 
>>> dependencies.
>>>
>>> Note, you need to move them up in the dependencies list.
>>>
>>>
>>> On Tue, Jul 17, 2018 at 5:54 PM Thomas Weise <t...@apache.org> wrote:
>>>>
>>>> Are you able to run 
>>>> org.apache.beam.runners.fnexecution.control.RemoteExecutionTest from 
>>>> within Intellij ?
>>>>
>>>> I can get the compile errors to disappear by adding 
>>>> beam-model-job-management-2.6.0-SNAPSHOT.jar, io.grpc:grpc-core:1.12.0 and 
>>>> com.google.protobuf:protobuf-java:3.5.1
>>>>
>>>> Running the test still fails since other dependencies are missing.
>>>>
>>>>
>>>> On Tue, Jul 17, 2018 at 4:02 PM Ankur Goenka <goe...@google.com> wrote:
>>>>>
>>>>> For reference:
>>>>> I was able to make intellij work with the master by doing following steps
>>>>>
>>>>> Remove module :beam:vendor-sdks-java-extensions-protobuf from intellij.
>>>>> Adding 
>>>>> :beam-model-fn-execution/build/libs/beam-model-fn-execution-2.6.0-SNAPSHOT.jar
>>>>>  and 
>>>>> :beam-model-job-management/build/libs/beam-model-job-management-2.6.0-SNAPSHOT.jar
>>>>>  to the appropriate modules at the top of the dependency list.
>>>>>
>>>>>
>>>>> On Tue, Jul 17, 2018 at 2:29 PM Thomas Weise <t...@apache.org> wrote:
>>>>>>
>>>>>> Adding the external jar in Intellij (2018.1) currently fails due to a 
>>>>>> duplicate source directory (sdks/java/extensions/protobuf/src/main/java).
>>>>>>
>>>>>> The build as such also fails, with:  error: warnings found and -Werror 
>>>>>> specified
>>>>>>
>>>>>> Ismaël found removing 
>>>>>> https://github.com/apache/beam/blob/master/buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy#L538
>>>>>>  as workaround.
>>>>>>
>>>>>>
>>>>>> On Thu, Jul 12, 2018 at 1:55 PM Ismaël Mejía <ieme...@gmail.com> wrote:
>>>>>>>
>>>>>>> Seems reasonable, but why exactly may we need the model (or protobuf
>>>>>>> related things) in the future in the SDK ? wasn’t it supposed to be
>>>>>>> translated into the Pipeline proto representation via the runners (and
>>>>>>> in this case the dep reside in the runner side) ?
>>>>>>> On Thu, Jul 12, 2018 at 2:50 AM Lukasz Cwik <lc...@google.com> wrote:
>>>>>>> >
>>>>>>> > Got a fix[1] for Andrews issue which turned out to be a release 
>>>>>>> > blocker since it broke performing the release. Also fixed several 
>>>>>>> > minor things like javadoc that were wrong with the release. Solving 
>>>>>>> > it allowed me to do the publishing in parallel and cut the release 
>>>>>>> > time from 20+ mins to 8 mins on my machine.
>>>>>>> >
>>>>>>> > 1: https://github.com/apache/beam/pull/5936
>>>>>>> >
>>>>>>> > On Wed, Jul 11, 2018 at 3:51 PM Andrew Pilloud <apill...@google.com> 
>>>>>>> > wrote:
>>>>>>> >>
>>>>>>> >> We discussed this in person, sounds like my issue is known and will 
>>>>>>> >> be fixed shortly. I'm running builds with '-Ppublishing' because I 
>>>>>>> >> need to generate release artifacts for bundling the Beam SQL shell 
>>>>>>> >> with the Google Cloud SDK. Hope to eventually just use the Beam 
>>>>>>> >> release, but we are currently cutting a release off master every 
>>>>>>> >> week to quickly iterate on bug fixes.
>>>>>>> >>
>>>>>>> >> Andrew
>>>>>>> >>
>>>>>>> >> On Wed, Jul 11, 2018 at 1:39 PM Lukasz Cwik <lc...@google.com> wrote:
>>>>>>> >>>
>>>>>>> >>> Andrew, to my knowledge it seems as though your running into 
>>>>>>> >>> BEAM-4744, is there a reason you need to specify -Ppublishing?
>>>>>>> >>>
>>>>>>> >>> No particular reason to using ByteString within ByteKey and 
>>>>>>> >>> TextSource. Note that we currently do shade away protobuf in 
>>>>>>> >>> sdks/java/core so we could either migrate to using a vendored 
>>>>>>> >>> version or re-implement the functionality to not use ByteString. 
>>>>>>> >>> Note that sdks/java/core can now dependend on the model/* classes 
>>>>>>> >>> and perform the Pipeline -> Proto translation as this will be 
>>>>>>> >>> needed to support portability efforts so I would prefer just 
>>>>>>> >>> migrating to use the vendored versions of the code. Filed BEAM-4766.
>>>>>>> >>>
>>>>>>> >>> As for the IO module, I was referring to the upstream 
>>>>>>> >>> bigtable/bigquery/... libraries vended by Google. If they trimmed 
>>>>>>> >>> their API surface to not expose gRPC or protobuf, then we wouldn't 
>>>>>>> >>> have to worry about having the shading logic within 
>>>>>>> >>> sdks/java/io/google-cloud-platform. I know that this will be 
>>>>>>> >>> impossible for some connectors without backwards incompatible 
>>>>>>> >>> changes since they exposed protobuf on their API surface. I know 
>>>>>>> >>> that Chamikara was looking to shade this away in the 
>>>>>>> >>> sdks/java/io/google-cloud-platform but only had limited success in 
>>>>>>> >>> the past.
>>>>>>> >>>
>>>>>>> >>> On Wed, Jul 11, 2018 at 1:14 PM Ismaël Mejía <ieme...@gmail.com> 
>>>>>>> >>> wrote:
>>>>>>> >>>>
>>>>>>> >>>> This is great news in particular for runners (Spark) where the 
>>>>>>> >>>> leaking of some grpc subdependencies caused stability issues and 
>>>>>>> >>>> required extra shading. Great !
>>>>>>> >>>>
>>>>>>> >>>> About the other modules
>>>>>>> >>>>
>>>>>>> >>>> > Note, these are the following modules that still depend on 
>>>>>>> >>>> > protobuf that are shaded away and could move to use a vendored 
>>>>>>> >>>> > variant of protobuf:
>>>>>>> >>>> > * sdks/java/core
>>>>>>> >>>> > * sdks/java/extensions/sql
>>>>>>> >>>>
>>>>>>> >>>> For sdks/java/core the dependency in protobuf seems to be minor, 
>>>>>>> >>>> from a quick look it seems that it is only used to import 
>>>>>>> >>>> ByteString in two classes: ByteKey and TextSource so hopefully we 
>>>>>>> >>>> can rewrite both and get rid of the dependency altogether (making 
>>>>>>> >>>> core smaller which is always a win).
>>>>>>> >>>> Can we fill a JIRA for this or do I miss other reasons to depend 
>>>>>>> >>>> on protobuf in core?
>>>>>>> >>>>
>>>>>>> >>>> For sdks/java/extensions/sql I don’t know if I am missing 
>>>>>>> >>>> something, but I don’t see any code use of protobuf and I doubt 
>>>>>>> >>>> that calcite uses protobuf so maybe it is there just because it 
>>>>>>> >>>> was leaking from somewhere else in Beam, we should better check 
>>>>>>> >>>> this first.
>>>>>>> >>>>
>>>>>>> >>>> > These modules expose protobuf because it is part of the API 
>>>>>>> >>>> > surface:
>>>>>>> >>>> > * sdks/java/extensions/protobuf
>>>>>>> >>>> > * sdks/java/io/google-cloud-platform (I believe that gRPC could 
>>>>>>> >>>> > be shaded here but preferrably the IO module would do it so we 
>>>>>>> >>>> > wouldn't have this maintenance burden.)
>>>>>>> >>>>
>>>>>>> >>>> Can you please elaborate on ‘but preferrably the IO module would 
>>>>>>> >>>> do it so we wouldn't have this maintenance burden’. I remember 
>>>>>>> >>>> there was an issue when running the examples in the spark runner 
>>>>>>> >>>> examples because of sdks/java/io/google-cloud-platform leaking 
>>>>>>> >>>> netty via gRPC (BEAM-3519) [Note that this is hidden at this 
>>>>>>> >>>> moment because of pure luck Spark 2.3.x and Beam are aligned on 
>>>>>>> >>>> netty version but this can change in the future so hopefully this 
>>>>>>> >>>> can be shaded/controlled].
>>>>>>> >>>>
>>>>>>> >>>> On Wed, Jul 11, 2018 at 8:55 PM Andrew Pilloud 
>>>>>>> >>>> <apill...@google.com> wrote:
>>>>>>> >>>>>
>>>>>>> >>>>> This is really cool and should cut down our artifact size 
>>>>>>> >>>>> significantly! Thanks Luke!
>>>>>>> >>>>>
>>>>>>> >>>>> I am running into one issue after this: builds with the 
>>>>>>> >>>>> publishing flag no longer work. (We run './gradlew -Ppublishing 
>>>>>>> >>>>> shadowJar' to generate release artifacts for the Beam SQL shell.) 
>>>>>>> >>>>> I get a bunch of errors like this:
>>>>>>> >>>>>
>>>>>>> >>>>> model/job-management/build/generated/source/proto/main/java/org/apache/beam/model/jobmanagement/v1/JobApi.java:148:
>>>>>>> >>>>>  error: no suitable method found for 
>>>>>>> >>>>> readMessage(org.apache.beam.vendor.protobuf.v3.com.google.protobuf.Parser<Pipeline>,ExtensionRegistryLite)
>>>>>>> >>>>>
>>>>>>> >>>>> Is there something I need to change in my build?
>>>>>>> >>>>>
>>>>>>> >>>>> Andrew
>>>>>>> >>>>>
>>>>>>> >>>>> On Tue, Jul 10, 2018 at 2:10 PM Lukasz Cwik <lc...@google.com> 
>>>>>>> >>>>> wrote:
>>>>>>> >>>>>>
>>>>>>> >>>>>> With the merge of PR #5594[1], we started shading all gRPC / 
>>>>>>> >>>>>> Protobuf dependencies within all the modules that depended on 
>>>>>>> >>>>>> the model/* dependencies by vendoring them. The vendored 
>>>>>>> >>>>>> versions are built and packaged into the model jars (they should 
>>>>>>> >>>>>> be separated out once I figure out how to generate proto code 
>>>>>>> >>>>>> using a shaded import path). Note that this cleaned up several 
>>>>>>> >>>>>> issues where we were incorrectly built shaded jars without 
>>>>>>> >>>>>> repackaging in some locations or the shading process was 
>>>>>>> >>>>>> corrupting the contents of some of the jars.
>>>>>>> >>>>>>
>>>>>>> >>>>>> Note that the majority of the code base (especially related to 
>>>>>>> >>>>>> portability) should be using imports under the 
>>>>>>> >>>>>> org.apache.beam.vendor.protobuf.v3 or 
>>>>>>> >>>>>> org.apache.beam.vendor.grpc.v1 paths. I have yet to figure out a 
>>>>>>> >>>>>> clean way to get Intellij to recognize these vendored paths. My 
>>>>>>> >>>>>> only solution so far has been to manually add one of the built 
>>>>>>> >>>>>> model jars to the compile classpath of the module being worked 
>>>>>>> >>>>>> on in Intellij as described here[2]. I would greatly appreciate 
>>>>>>> >>>>>> some ideas on how to improve this integration because from a few 
>>>>>>> >>>>>> attempts configuring the intellij gradle pluglin scope sections 
>>>>>>> >>>>>> didn't produce the result that I was expecting.
>>>>>>> >>>>>>
>>>>>>> >>>>>> I also added a simple test task 
>>>>>>> >>>>>> validateShadedJarDoesntLeakNonOrgApacheBeamClasses that 
>>>>>>> >>>>>> validates the shaded jar doesn't contain classes without 
>>>>>>> >>>>>> repackaging which we should apply to any module that performs 
>>>>>>> >>>>>> shading to ensure that classes are relocated and we don't 
>>>>>>> >>>>>> accidentally expose stuff. I filed BEAM-4753[3] to this end.
>>>>>>> >>>>>>
>>>>>>> >>>>>> Note, these are the following modules that still depend on 
>>>>>>> >>>>>> protobuf that are shaded away and could move to use a vendored 
>>>>>>> >>>>>> variant of protobuf:
>>>>>>> >>>>>> * sdks/java/core
>>>>>>> >>>>>> * sdks/java/extensions/sql
>>>>>>> >>>>>>
>>>>>>> >>>>>> These modules expose protobuf because it is part of the API 
>>>>>>> >>>>>> surface:
>>>>>>> >>>>>> * sdks/java/extensions/protobuf
>>>>>>> >>>>>> * sdks/java/io/google-cloud-platform (I believe that gRPC could 
>>>>>>> >>>>>> be shaded here but preferrably the IO module would do it so we 
>>>>>>> >>>>>> wouldn't have this maintenance burden.)
>>>>>>> >>>>>>
>>>>>>> >>>>>> 1: https://github.com/apache/beam/pull/5594
>>>>>>> >>>>>> 2: 
>>>>>>> >>>>>> https://stackoverflow.com/questions/1051640/correct-way-to-add-external-jars-lib-jar-to-an-intellij-idea-project
>>>>>>> >>>>>> 3: https://issues.apache.org/jira/browse/BEAM-4753

Reply via email to