Re: Vendoring / Shading Protobuf and gRPC

2018-07-19 Thread Lukasz Cwik
A lot of the package structure is a mess because we have a bunch of runner
support code in various runners/* packages which is responsible for
translating pipelines. I don't really have a strong opinion about whether
the translation happens within sdks/java/core or another package under
sdks/java/... but I do believe that all of this logic should get moved out
of runners/... The issue right now is that we are in a mode where we are
supporting both the old way and the portable way, and we were placing
classes close to the old way because gRPC/Protobuf wasn't being shaded. Now
that they are shaded we have some more freedom, but moving them around
doesn't seem to be a high priority.

Unfortunately, I don't know what we can do about GCP IO in a backwards-
compatible manner, since I believe one of the GCP connectors exposes
protobuf-generated classes as part of its API surface.

We still have the ApiSurface testing like SdkCoreApiSurfaceTest to ensure
that we aren't exposing classes that we shouldn't. Since we are vendoring
gRPC/protobuf for the model/job-management/fn-execution packages, does it
still matter to not expose vendored classes?
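
To make the question concrete, this is the kind of check I mean (just an
illustrative sketch using plain reflection, not the actual ApiSurface /
SdkCoreApiSurfaceTest machinery; the class and names below are made up for
illustration):

    import java.lang.reflect.Method;
    import java.util.ArrayList;
    import java.util.List;

    public class VendoredExposureCheck {
      // Packages we would not want to see on a public, non-portability API surface.
      private static final String[] FORBIDDEN_PREFIXES = {
        "org.apache.beam.vendor.", "com.google.protobuf.", "io.grpc."
      };

      /** Throws if any public method of {@code clazz} exposes a forbidden type. */
      public static void checkApiSurface(Class<?> clazz) {
        for (Method method : clazz.getMethods()) {
          for (Class<?> type : signatureTypes(method)) {
            for (String prefix : FORBIDDEN_PREFIXES) {
              if (type.getName().startsWith(prefix)) {
                throw new AssertionError(
                    clazz.getName() + "#" + method.getName() + " exposes " + type.getName());
              }
            }
          }
        }
      }

      private static List<Class<?>> signatureTypes(Method method) {
        List<Class<?>> types = new ArrayList<>(java.util.Arrays.asList(method.getParameterTypes()));
        types.add(method.getReturnType());
        return types;
      }
    }

A check along those lines would still flag accidental exposure of the vendored
packages, if we decide that still matters.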


On Thu, Jul 19, 2018 at 1:44 AM Ismaël Mejía  wrote:

> Luke, thanks for explaining the idea, that's what I expected. I was a
> bit confused after seeing that the proto transformation of the
> Pipeline happens in runners/core-construction-java; I thought this was
> intended (maybe to keep the SDK smaller). So what is the future? Move
> all this machinery into sdks/java/core or a different module and make
> sdks/java/core depend on that?
>
> In any case this looks like a non-public-API matter. What I would like
> is to be vigilant about the exposure of Protobuf stuff in the public
> API, to avoid some of the issues we have had in the past (and still
> have on GCPIO). Do we have this kind of validation in place after the
> move to Gradle (like not exposing guava/protobuf in public APIs)?
> On Wed, Jul 18, 2018 at 6:29 PM Lukasz Cwik  wrote:
> >
> > Ismael, the SDK should perform the pipeline translation to proto because
> I expect the flow to be:
> > User Code -> SDK -> Proto Translation -> Job API -> Runner
> > I don't expect "runners" to live within the users process anymore
> (excluding the direct runner). There will be one portable "runner" and it
> will be responsible for communicating with the job management APIs. It
> shouldn't be called a runner but for backwards compatibility it will behave
> like a runner does today. Flink/Spark/... will all live on the other side
> of the job management API.
> >
> > Thomas, I can run RemoteExecutionTest from commit
> ae2bebaf8b277e99840fa63f1b95d828f2093d16 without needing to modify the
> project/module structure in Intellij. Adding the jars manually only helps
> with code completion.
> >
> > https://github.com/apache/beam/pull/5977 works around the duplicate
> content root issue in Intellij. I have also run into the -Werror issue
> occasionally and don't know any fix or why it gets triggered as it doesn't
> happen to me all the time.
> >
> > On Tue, Jul 17, 2018 at 7:01 PM Thomas Weise  wrote:
> >>
> >> Thanks, the classpath order matters indeed.
> >>
> >> Still not able to run RemoteExecutionTest, but I was able to get the
> Flink portable test to work by adding the following to the top of the
> dependency list of beam-runners-flink_2.11_test
> >>
> >>
> vendor/sdks-java-extensions-protobuf/build/libs/beam-vendor-sdks-java-extensions-protobuf-2.6.0-SNAPSHOT.jar
> >> model/fn-execution/build/libs/beam-model-fn-execution-2.6.0-SNAPSHOT.jar
> >>
> >>
> >> On Tue, Jul 17, 2018 at 6:00 PM Ankur Goenka  wrote:
> >>>
> >>> Yes, I am able to run it.
> >>>
> >>> For tests, you also need to add dependencies to
> ":beam-runners-java-fn-execution/beam-runners-java-fn-execution_test"
> module.
> >>>
> >>> Also, I only added
> >>> :beam-model-job-management-2.6.0-SNAPSHOT.jar
> >>> :beam-model-fn-execution-2.6.0-SNAPSHOT.jar
> >>> to the dependencies manually so not sure if you want to add
> >>> io.grpc:grpc-core:1.12.0 and com.google.protobuf:protobuf-java:3.5.1
> to the dependencies.
> >>>
> >>> Note, you need to move them up in the dependencies list.
> >>>
> >>>
> >>> On Tue, Jul 17, 2018 at 5:54 PM Thomas Weise  wrote:
> 
>  Are you able to run
> org.apache.beam.runners.fnexecution.control.RemoteExecutionTest from within
> Intellij ?
> 
>  I can get the compile errors to disappear by adding
> beam-model-job-management-2.6.0-SNAPSHOT.jar, io.grpc:grpc-core:1.12.0 and
> com.google.protobuf:protobuf-java:3.5.1
> 
>  Running the test still fails since other dependencies are missing.
> 
> 
>  On Tue, Jul 17, 2018 at 4:02 PM Ankur Goenka 
> wrote:
> >
> > For reference:
> > I was able to make intellij work with the master by doing following
> steps
> >
> > Remove module :beam:vendor-sdks-java-extensions-protobuf from
> intellij.
> > Adding
> 

Re: Vendoring / Shading Protobuf and gRPC

2018-07-19 Thread Ismaël Mejía
Luke, thanks for explaining the idea, that's what I expected. I was a
bit confused after seeing that the proto transformation of the
Pipeline happens in runners/core-construction-java; I thought this was
intended (maybe to keep the SDK smaller). So what is the future? Move
all this machinery into sdks/java/core or a different module and make
sdks/java/core depend on that?

In any case this looks like a non-public-API matter. What I would like
is to be vigilant about the exposure of Protobuf stuff in the public
API, to avoid some of the issues we have had in the past (and still
have on GCPIO). Do we have this kind of validation in place after the
move to Gradle (like not exposing guava/protobuf in public APIs)?
On Wed, Jul 18, 2018 at 6:29 PM Lukasz Cwik  wrote:
>
> Ismael, the SDK should perform the pipeline translation to proto because I 
> expect the flow to be:
> User Code -> SDK -> Proto Translation -> Job API -> Runner
> I don't expect "runners" to live within the users process anymore (excluding 
> the direct runner). There will be one portable "runner" and it will be 
> responsible for communicating with the job management APIs. It shouldn't be 
> called a runner but for backwards compatibility it will behave like a runner 
> does today. Flink/Spark/... will all live on the other side of the job 
> management API.
>
> Thomas, I can run RemoteExecutionTest from commit
> ae2bebaf8b277e99840fa63f1b95d828f2093d16 without needing to modify the 
> project/module structure in Intellij. Adding the jars manually only helps 
> with code completion.
>
> https://github.com/apache/beam/pull/5977 works around the duplicate content 
> root issue in Intellij. I have also run into the -Werror issue occasionally 
> and don't know any fix or why it gets triggered as it doesn't happen to me 
> all the time.
>
> On Tue, Jul 17, 2018 at 7:01 PM Thomas Weise  wrote:
>>
>> Thanks, the classpath order matters indeed.
>>
>> Still not able to run RemoteExecutionTest, but I was able to get the Flink 
>> portable test to work by adding the following to the top of the dependency 
>> list of beam-runners-flink_2.11_test
>>
>> vendor/sdks-java-extensions-protobuf/build/libs/beam-vendor-sdks-java-extensions-protobuf-2.6.0-SNAPSHOT.jar
>> model/fn-execution/build/libs/beam-model-fn-execution-2.6.0-SNAPSHOT.jar
>>
>>
>> On Tue, Jul 17, 2018 at 6:00 PM Ankur Goenka  wrote:
>>>
>>> Yes, I am able to run it.
>>>
>>> For tests, you also need to add dependencies to 
>>> ":beam-runners-java-fn-execution/beam-runners-java-fn-execution_test" 
>>> module.
>>>
>>> Also, I only added
>>> :beam-model-job-management-2.6.0-SNAPSHOT.jar
>>> :beam-model-fn-execution-2.6.0-SNAPSHOT.jar
>>> to the dependencies manually so not sure if you want to add
>>> io.grpc:grpc-core:1.12.0 and com.google.protobuf:protobuf-java:3.5.1 to the 
>>> dependencies.
>>>
>>> Note, you need to move them up in the dependencies list.
>>>
>>>
>>> On Tue, Jul 17, 2018 at 5:54 PM Thomas Weise  wrote:

 Are you able to run 
 org.apache.beam.runners.fnexecution.control.RemoteExecutionTest from 
 within Intellij ?

 I can get the compile errors to disappear by adding 
 beam-model-job-management-2.6.0-SNAPSHOT.jar, io.grpc:grpc-core:1.12.0 and 
 com.google.protobuf:protobuf-java:3.5.1

 Running the test still fails since other dependencies are missing.


 On Tue, Jul 17, 2018 at 4:02 PM Ankur Goenka  wrote:
>
> For reference:
> I was able to make intellij work with the master by doing following steps
>
> Remove module :beam:vendor-sdks-java-extensions-protobuf from intellij.
> Adding 
> :beam-model-fn-execution/build/libs/beam-model-fn-execution-2.6.0-SNAPSHOT.jar
>  and 
> :beam-model-job-management/build/libs/beam-model-job-management-2.6.0-SNAPSHOT.jar
>  to the appropriate modules at the top of the dependency list.
>
>
> On Tue, Jul 17, 2018 at 2:29 PM Thomas Weise  wrote:
>>
>> Adding the external jar in Intellij (2018.1) currently fails due to a 
>> duplicate source directory (sdks/java/extensions/protobuf/src/main/java).
>>
>> The build as such also fails, with:  error: warnings found and -Werror 
>> specified
>>
>> Ismaël found removing 
>> https://github.com/apache/beam/blob/master/buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy#L538
>>  as workaround.
>>
>>
>> On Thu, Jul 12, 2018 at 1:55 PM Ismaël Mejía  wrote:
>>>
>>> Seems reasonable, but why exactly may we need the model (or protobuf
>>> related things) in the future in the SDK ? wasn’t it supposed to be
>>> translated into the Pipeline proto representation via the runners (and
>>> in this case the dep reside in the runner side) ?
>>> On Thu, Jul 12, 2018 at 2:50 AM Lukasz Cwik  wrote:
>>> >
>>> > Got a fix[1] for Andrews issue which turned out to be a release 
>>> > blocker since it broke performing the 

Re: Vendoring / Shading Protobuf and gRPC

2018-07-18 Thread Ankur Goenka
FYI: The Beam version just got updated today to 2.7-SNAPSHOT.
This requires re-adding the dependency jars for the 2.7-SNAPSHOT version to the
projects.

On Wed, Jul 18, 2018 at 1:22 PM Lukasz Cwik  wrote:

> Are you delegating your test runs to Gradle or using Intellij's test
> runner?
>
> I have been delegating my test runs to Gradle and that has been working
> well for me. The platform test runner fails due to the missing classes as
> you mentioned.
>
> I also spent two hours trying to muck around with the Gradle Idea plugin
> to get it to use the shaded jars instead of the output classes directly
> without much success. Reached out to the Gradle community forum[1] to see
> what they say.
>
> 1:
> https://discuss.gradle.org/t/how-to-get-intellij-to-use-module-output-jars-instead-of-output-classes/27794
>
> On Wed, Jul 18, 2018 at 11:06 AM Thomas Weise  wrote:
>
>> Thanks for fixing the duplicate root issue, PR is merged.
>>
>> I cannot run RemoteExecutionTest from the same commit (Intellij
>> v2018.1.6). Here are some of the class loading errors before I give up
>> adding dependencies manually :(
>>
>> java.lang.NoClassDefFoundError:
>> org/apache/beam/vendor/grpc/v1/io/grpc/BindableService
>> java.lang.NoClassDefFoundError: com/google/protobuf/ByteString
>> java.lang.NoClassDefFoundError:
>> org/apache/beam/vendor/sdk/v2/sdk/extensions/protobuf/ByteStringCoder
>> Caused by: java.lang.ClassNotFoundException: net.bytebuddy.NamingStrategy
>>
>> Our Intellij results are all over the board, possibly due to different
>> versions or other settings/plugins? It will be important to sort it out to
>> make the contributor experience smoother.
>>
>> Thanks,
>> Thomas
>>
>>
>> On Wed, Jul 18, 2018 at 9:29 AM Lukasz Cwik  wrote:
>>
>>> Ismael, the SDK should perform the pipeline translation to proto because
>>> I expect the flow to be:
>>> User Code -> SDK -> Proto Translation -> Job API -> Runner
>>> I don't expect "runners" to live within the users process anymore
>>> (excluding the direct runner). There will be one portable "runner" and it
>>> will be responsible for communicating with the job management APIs. It
>>> shouldn't be called a runner but for backwards compatibility it will behave
>>> like a runner does today. Flink/Spark/... will all live on the other side
>>> of the job management API.
>>>
>>> Thomas, I can run RemoteExecutionTest from commit
>>> ae2bebaf8b277e99840fa63f1b95d828f2093d16 without needing to modify the
>>> project/module structure in Intellij. Adding the jars manually only helps
>>> with code completion.
>>>
>>> https://github.com/apache/beam/pull/5977 works around the duplicate
>>> content root issue in Intellij. I have also run into the -Werror issue
>>> occasionally and don't know any fix or why it gets triggered as it doesn't
>>> happen to me all the time.
>>>
>>> On Tue, Jul 17, 2018 at 7:01 PM Thomas Weise  wrote:
>>>
 Thanks, the classpath order matters indeed.

 Still not able to run RemoteExecutionTest, but I was able to get the
 Flink portable test to work by adding the following to the *top* of
 the dependency list of *beam-runners-flink_2.11_test*


 vendor/sdks-java-extensions-protobuf/build/libs/beam-vendor-sdks-java-extensions-protobuf-2.6.0-SNAPSHOT.jar
 model/fn-execution/build/libs/beam-model-fn-execution-2.6.0-SNAPSHOT.jar


 On Tue, Jul 17, 2018 at 6:00 PM Ankur Goenka  wrote:

> Yes, I am able to run it.
>
> For tests, you also need to add dependencies to
> ":beam-runners-java-fn-execution/beam-runners-java-fn-execution_*test*"
> module.
>
> Also, I only added
> :beam-model-job-management-2.6.0-SNAPSHOT.jar
> :beam-model-fn-execution-2.6.0-SNAPSHOT.jar
> to the dependencies manually so not sure if you want to add
> io.grpc:grpc-core:1.12.0 and com.google.protobuf:protobuf-java:3.5.1
> to the dependencies.
>
> Note, you need to move them up in the dependencies list.
>
>
> On Tue, Jul 17, 2018 at 5:54 PM Thomas Weise  wrote:
>
>> Are you able to
>> run org.apache.beam.runners.fnexecution.control.RemoteExecutionTest from
>> within Intellij ?
>>
>> I can get the compile errors to disappear by adding
>> beam-model-job-management-2.6.0-SNAPSHOT.jar, io.grpc:grpc-core:1.12.0
>> and com.google.protobuf:protobuf-java:3.5.1
>>
>> Running the test still fails since other dependencies are missing.
>>
>>
>> On Tue, Jul 17, 2018 at 4:02 PM Ankur Goenka 
>> wrote:
>>
>>> For reference:
>>> I was able to make intellij work with the master by doing following
>>> steps
>>>
>>>1. Remove module :beam:vendor-sdks-java-extensions-protobuf from
>>>intellij.
>>>2. Adding
>>>
>>> :beam-model-fn-execution/build/libs/beam-model-fn-execution-2.6.0-SNAPSHOT.jar
>>>and 
>>> 

Re: Vendoring / Shading Protobuf and gRPC

2018-07-18 Thread Lukasz Cwik
Are you delegating your test runs to Gradle or using Intellij's test runner?

I have been delegating my test runs to Gradle and that has been working
well for me. The platform test runner fails due to the missing classes as
you mentioned.

I also spent two hours mucking around with the Gradle Idea plugin to get it
to use the shaded jars instead of the output classes directly, without much
success. I reached out to the Gradle community forum[1] to see what they say.

1:
https://discuss.gradle.org/t/how-to-get-intellij-to-use-module-output-jars-instead-of-output-classes/27794

On Wed, Jul 18, 2018 at 11:06 AM Thomas Weise  wrote:

> Thanks for fixing the duplicate root issue, PR is merged.
>
> I cannot run RemoteExecutionTest from the same commit (Intellij
> v2018.1.6). Here are some of the class loading errors before I give up
> adding dependencies manually :(
>
> java.lang.NoClassDefFoundError:
> org/apache/beam/vendor/grpc/v1/io/grpc/BindableService
> java.lang.NoClassDefFoundError: com/google/protobuf/ByteString
> java.lang.NoClassDefFoundError:
> org/apache/beam/vendor/sdk/v2/sdk/extensions/protobuf/ByteStringCoder
> Caused by: java.lang.ClassNotFoundException: net.bytebuddy.NamingStrategy
>
> Our Intellij results are all over the board, possibly due to different
> versions or other settings/plugins? It will be important to sort it out to
> make the contributor experience smoother.
>
> Thanks,
> Thomas
>
>
> On Wed, Jul 18, 2018 at 9:29 AM Lukasz Cwik  wrote:
>
>> Ismael, the SDK should perform the pipeline translation to proto because
>> I expect the flow to be:
>> User Code -> SDK -> Proto Translation -> Job API -> Runner
>> I don't expect "runners" to live within the users process anymore
>> (excluding the direct runner). There will be one portable "runner" and it
>> will be responsible for communicating with the job management APIs. It
>> shouldn't be called a runner but for backwards compatibility it will behave
>> like a runner does today. Flink/Spark/... will all live on the other side
>> of the job management API.
>>
>> Thomas, I can run RemoteExecutionTest from commit
>> ae2bebaf8b277e99840fa63f1b95d828f2093d16 without needing to modify the
>> project/module structure in Intellij. Adding the jars manually only helps
>> with code completion.
>>
>> https://github.com/apache/beam/pull/5977 works around the duplicate
>> content root issue in Intellij. I have also run into the -Werror issue
>> occasionally and don't know any fix or why it gets triggered as it doesn't
>> happen to me all the time.
>>
>> On Tue, Jul 17, 2018 at 7:01 PM Thomas Weise  wrote:
>>
>>> Thanks, the classpath order matters indeed.
>>>
>>> Still not able to run RemoteExecutionTest, but I was able to get the
>>> Flink portable test to work by adding the following to the *top* of the
>>> dependency list of *beam-runners-flink_2.11_test*
>>>
>>>
>>> vendor/sdks-java-extensions-protobuf/build/libs/beam-vendor-sdks-java-extensions-protobuf-2.6.0-SNAPSHOT.jar
>>> model/fn-execution/build/libs/beam-model-fn-execution-2.6.0-SNAPSHOT.jar
>>>
>>>
>>> On Tue, Jul 17, 2018 at 6:00 PM Ankur Goenka  wrote:
>>>
 Yes, I am able to run it.

 For tests, you also need to add dependencies to
 ":beam-runners-java-fn-execution/beam-runners-java-fn-execution_*test*"
 module.

 Also, I only added
 :beam-model-job-management-2.6.0-SNAPSHOT.jar
 :beam-model-fn-execution-2.6.0-SNAPSHOT.jar
 to the dependencies manually so not sure if you want to add
 io.grpc:grpc-core:1.12.0 and com.google.protobuf:protobuf-java:3.5.1 to
 the dependencies.

 Note, you need to move them up in the dependencies list.


 On Tue, Jul 17, 2018 at 5:54 PM Thomas Weise  wrote:

> Are you able to
> run org.apache.beam.runners.fnexecution.control.RemoteExecutionTest from
> within Intellij ?
>
> I can get the compile errors to disappear by adding
> beam-model-job-management-2.6.0-SNAPSHOT.jar, io.grpc:grpc-core:1.12.0
> and com.google.protobuf:protobuf-java:3.5.1
>
> Running the test still fails since other dependencies are missing.
>
>
> On Tue, Jul 17, 2018 at 4:02 PM Ankur Goenka 
> wrote:
>
>> For reference:
>> I was able to make intellij work with the master by doing following
>> steps
>>
>>1. Remove module :beam:vendor-sdks-java-extensions-protobuf from
>>intellij.
>>2. Adding
>>
>> :beam-model-fn-execution/build/libs/beam-model-fn-execution-2.6.0-SNAPSHOT.jar
>>and 
>> :beam-model-job-management/build/libs/beam-model-job-management-2.6.0-SNAPSHOT.jar
>>to the appropriate modules at the top of the dependency list.
>>
>>
>> On Tue, Jul 17, 2018 at 2:29 PM Thomas Weise  wrote:
>>
>>> Adding the external jar in Intellij (2018.1) currently fails due to
>>> a duplicate source directory 
>>> (sdks/java/extensions/protobuf/src/main/java).
>>>

Re: Vendoring / Shading Protobuf and gRPC

2018-07-18 Thread Thomas Weise
Thanks for fixing the duplicate root issue, PR is merged.

I cannot run RemoteExecutionTest from the same commit (Intellij v2018.1.6).
Here are some of the class loading errors before I give up adding
dependencies manually :(

java.lang.NoClassDefFoundError:
org/apache/beam/vendor/grpc/v1/io/grpc/BindableService
java.lang.NoClassDefFoundError: com/google/protobuf/ByteString
java.lang.NoClassDefFoundError:
org/apache/beam/vendor/sdk/v2/sdk/extensions/protobuf/ByteStringCoder
Caused by: java.lang.ClassNotFoundException: net.bytebuddy.NamingStrategy
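
(For anyone else debugging this, a throwaway probe like the following, run on
the same classpath Intellij uses for the test, shows which of these classes
actually resolve. It's just a diagnostic sketch, nothing Beam-specific; the
class names are the ones from the errors above.)

    public class ClasspathProbe {
      public static void main(String[] args) {
        String[] needed = {
          "org.apache.beam.vendor.grpc.v1.io.grpc.BindableService",
          "com.google.protobuf.ByteString",
          "org.apache.beam.vendor.sdk.v2.sdk.extensions.protobuf.ByteStringCoder",
          "net.bytebuddy.NamingStrategy"
        };
        for (String name : needed) {
          try {
            // Succeeds only if the (possibly relocated) class is on the classpath.
            Class.forName(name);
            System.out.println("OK      " + name);
          } catch (ClassNotFoundException e) {
            System.out.println("MISSING " + name);
          }
        }
      }
    }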

Our Intellij results are all over the board, possibly due to different
versions or other settings/plugins? It will be important to sort it out to
make the contributor experience smoother.

Thanks,
Thomas


On Wed, Jul 18, 2018 at 9:29 AM Lukasz Cwik  wrote:

> Ismael, the SDK should perform the pipeline translation to proto because I
> expect the flow to be:
> User Code -> SDK -> Proto Translation -> Job API -> Runner
> I don't expect "runners" to live within the users process anymore
> (excluding the direct runner). There will be one portable "runner" and it
> will be responsible for communicating with the job management APIs. It
> shouldn't be called a runner but for backwards compatibility it will behave
> like a runner does today. Flink/Spark/... will all live on the other side
> of the job management API.
>
> Thomas, I can run RemoteExecutionTest from commit
> ae2bebaf8b277e99840fa63f1b95d828f2093d16 without needing to modify the
> project/module structure in Intellij. Adding the jars manually only helps
> with code completion.
>
> https://github.com/apache/beam/pull/5977 works around the duplicate
> content root issue in Intellij. I have also run into the -Werror issue
> occasionally and don't know any fix or why it gets triggered as it doesn't
> happen to me all the time.
>
> On Tue, Jul 17, 2018 at 7:01 PM Thomas Weise  wrote:
>
>> Thanks, the classpath order matters indeed.
>>
>> Still not able to run RemoteExecutionTest, but I was able to get the
>> Flink portable test to work by adding the following to the *top* of the
>> dependency list of *beam-runners-flink_2.11_test*
>>
>>
>> vendor/sdks-java-extensions-protobuf/build/libs/beam-vendor-sdks-java-extensions-protobuf-2.6.0-SNAPSHOT.jar
>> model/fn-execution/build/libs/beam-model-fn-execution-2.6.0-SNAPSHOT.jar
>>
>>
>> On Tue, Jul 17, 2018 at 6:00 PM Ankur Goenka  wrote:
>>
>>> Yes, I am able to run it.
>>>
>>> For tests, you also need to add dependencies to
>>> ":beam-runners-java-fn-execution/beam-runners-java-fn-execution_*test*"
>>> module.
>>>
>>> Also, I only added
>>> :beam-model-job-management-2.6.0-SNAPSHOT.jar
>>> :beam-model-fn-execution-2.6.0-SNAPSHOT.jar
>>> to the dependencies manually so not sure if you want to add
>>> io.grpc:grpc-core:1.12.0 and com.google.protobuf:protobuf-java:3.5.1 to
>>> the dependencies.
>>>
>>> Note, you need to move them up in the dependencies list.
>>>
>>>
>>> On Tue, Jul 17, 2018 at 5:54 PM Thomas Weise  wrote:
>>>
 Are you able to
 run org.apache.beam.runners.fnexecution.control.RemoteExecutionTest from
 within Intellij ?

 I can get the compile errors to disappear by adding
 beam-model-job-management-2.6.0-SNAPSHOT.jar, io.grpc:grpc-core:1.12.0
 and com.google.protobuf:protobuf-java:3.5.1

 Running the test still fails since other dependencies are missing.


 On Tue, Jul 17, 2018 at 4:02 PM Ankur Goenka  wrote:

> For reference:
> I was able to make intellij work with the master by doing following
> steps
>
>1. Remove module :beam:vendor-sdks-java-extensions-protobuf from
>intellij.
>2. Adding
>
> :beam-model-fn-execution/build/libs/beam-model-fn-execution-2.6.0-SNAPSHOT.jar
>and 
> :beam-model-job-management/build/libs/beam-model-job-management-2.6.0-SNAPSHOT.jar
>to the appropriate modules at the top of the dependency list.
>
>
> On Tue, Jul 17, 2018 at 2:29 PM Thomas Weise  wrote:
>
>> Adding the external jar in Intellij (2018.1) currently fails due to a
>> duplicate source directory (sdks/java/extensions/protobuf/src/main/java).
>>
>> The build as such also fails, with:  error: warnings found and
>> -Werror specified
>>
>> Ismaël found removing
>> https://github.com/apache/beam/blob/master/buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy#L538
>> as workaround.
>>
>>
>> On Thu, Jul 12, 2018 at 1:55 PM Ismaël Mejía 
>> wrote:
>>
>>> Seems reasonable, but why exactly may we need the model (or protobuf
>>> related things) in the future in the SDK ? wasn’t it supposed to be
>>> translated into the Pipeline proto representation via the runners
>>> (and
>>> in this case the dep reside in the runner side) ?
>>> On Thu, Jul 12, 2018 at 2:50 AM Lukasz Cwik 
>>> wrote:
>>> >
>>> > Got a fix[1] for Andrews issue 

Re: Vendoring / Shading Protobuf and gRPC

2018-07-18 Thread Lukasz Cwik
Ismael, the SDK should perform the pipeline translation to proto because I
expect the flow to be:
User Code -> SDK -> Proto Translation -> Job API -> Runner
I don't expect "runners" to live within the user's process anymore
(excluding the direct runner). There will be one portable "runner" and it
will be responsible for communicating with the job management APIs. It
shouldn't be called a runner but for backwards compatibility it will behave
like a runner does today. Flink/Spark/... will all live on the other side
of the job management API.
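
Roughly, in code (a sketch only; the class names are from the model protos and
the translation code that currently sits in runners/core-construction-java, so
treat the exact signatures as approximate):

    import org.apache.beam.model.jobmanagement.v1.JobApi.PrepareJobRequest;
    import org.apache.beam.model.pipeline.v1.RunnerApi;
    import org.apache.beam.runners.core.construction.PipelineTranslation;
    import org.apache.beam.sdk.Pipeline;

    public class PortableSubmissionSketch {
      /** User code builds a Pipeline; the SDK translates it and hands the proto to the Job API. */
      public static PrepareJobRequest toPrepareRequest(Pipeline pipeline, String jobName) {
        // SDK-side proto translation.
        RunnerApi.Pipeline pipelineProto = PipelineTranslation.toProto(pipeline);
        // This request goes over the job management gRPC service; Flink/Spark/...
        // live entirely on the other side of that API.
        return PrepareJobRequest.newBuilder()
            .setJobName(jobName)
            .setPipeline(pipelineProto)
            .build();
      }
    }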

Thomas, I can run RemoteExecutionTest from commit
ae2bebaf8b277e99840fa63f1b95d828f2093d16 without needing to modify the
project/module structure in Intellij. Adding the jars manually only helps
with code completion.

https://github.com/apache/beam/pull/5977 works around the duplicate content
root issue in Intellij. I have also run into the -Werror issue occasionally
and don't know any fix or why it gets triggered as it doesn't happen to me
all the time.

On Tue, Jul 17, 2018 at 7:01 PM Thomas Weise  wrote:

> Thanks, the classpath order matters indeed.
>
> Still not able to run RemoteExecutionTest, but I was able to get the Flink
> portable test to work by adding the following to the *top* of the
> dependency list of *beam-runners-flink_2.11_test*
>
>
> vendor/sdks-java-extensions-protobuf/build/libs/beam-vendor-sdks-java-extensions-protobuf-2.6.0-SNAPSHOT.jar
> model/fn-execution/build/libs/beam-model-fn-execution-2.6.0-SNAPSHOT.jar
>
>
> On Tue, Jul 17, 2018 at 6:00 PM Ankur Goenka  wrote:
>
>> Yes, I am able to run it.
>>
>> For tests, you also need to add dependencies to
>> ":beam-runners-java-fn-execution/beam-runners-java-fn-execution_*test*"
>> module.
>>
>> Also, I only added
>> :beam-model-job-management-2.6.0-SNAPSHOT.jar
>> :beam-model-fn-execution-2.6.0-SNAPSHOT.jar
>> to the dependencies manually so not sure if you want to add
>> io.grpc:grpc-core:1.12.0 and com.google.protobuf:protobuf-java:3.5.1 to
>> the dependencies.
>>
>> Note, you need to move them up in the dependencies list.
>>
>>
>> On Tue, Jul 17, 2018 at 5:54 PM Thomas Weise  wrote:
>>
>>> Are you able to
>>> run org.apache.beam.runners.fnexecution.control.RemoteExecutionTest from
>>> within Intellij ?
>>>
>>> I can get the compile errors to disappear by adding
>>> beam-model-job-management-2.6.0-SNAPSHOT.jar, io.grpc:grpc-core:1.12.0
>>> and com.google.protobuf:protobuf-java:3.5.1
>>>
>>> Running the test still fails since other dependencies are missing.
>>>
>>>
>>> On Tue, Jul 17, 2018 at 4:02 PM Ankur Goenka  wrote:
>>>
 For reference:
 I was able to make intellij work with the master by doing following
 steps

1. Remove module :beam:vendor-sdks-java-extensions-protobuf from
intellij.
2. Adding

 :beam-model-fn-execution/build/libs/beam-model-fn-execution-2.6.0-SNAPSHOT.jar
and 
 :beam-model-job-management/build/libs/beam-model-job-management-2.6.0-SNAPSHOT.jar
to the appropriate modules at the top of the dependency list.


 On Tue, Jul 17, 2018 at 2:29 PM Thomas Weise  wrote:

> Adding the external jar in Intellij (2018.1) currently fails due to a
> duplicate source directory (sdks/java/extensions/protobuf/src/main/java).
>
> The build as such also fails, with:  error: warnings found and -Werror
> specified
>
> Ismaël found removing
> https://github.com/apache/beam/blob/master/buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy#L538
> as workaround.
>
>
> On Thu, Jul 12, 2018 at 1:55 PM Ismaël Mejía 
> wrote:
>
>> Seems reasonable, but why exactly may we need the model (or protobuf
>> related things) in the future in the SDK ? wasn’t it supposed to be
>> translated into the Pipeline proto representation via the runners (and
>> in this case the dep reside in the runner side) ?
>> On Thu, Jul 12, 2018 at 2:50 AM Lukasz Cwik  wrote:
>> >
>> > Got a fix[1] for Andrews issue which turned out to be a release
>> blocker since it broke performing the release. Also fixed several minor
>> things like javadoc that were wrong with the release. Solving it allowed 
>> me
>> to do the publishing in parallel and cut the release time from 20+ mins 
>> to
>> 8 mins on my machine.
>> >
>> > 1: https://github.com/apache/beam/pull/5936
>> >
>> > On Wed, Jul 11, 2018 at 3:51 PM Andrew Pilloud 
>> wrote:
>> >>
>> >> We discussed this in person, sounds like my issue is known and
>> will be fixed shortly. I'm running builds with '-Ppublishing' because I
>> need to generate release artifacts for bundling the Beam SQL shell with 
>> the
>> Google Cloud SDK. Hope to eventually just use the Beam release, but we 
>> are
>> currently cutting a release off master every week to quickly iterate on 
>> bug
>> fixes.
>> >>
>> >> Andrew
>> >>

Re: Vendoring / Shading Protobuf and gRPC

2018-07-17 Thread Thomas Weise
Thanks, the classpath order matters indeed.

Still not able to run RemoteExecutionTest, but I was able to get the Flink
portable test to work by adding the following to the *top* of the
dependency list of *beam-runners-flink_2.11_test*

vendor/sdks-java-extensions-protobuf/build/libs/beam-vendor-sdks-java-extensions-protobuf-2.6.0-SNAPSHOT.jar
model/fn-execution/build/libs/beam-model-fn-execution-2.6.0-SNAPSHOT.jar


On Tue, Jul 17, 2018 at 6:00 PM Ankur Goenka  wrote:

> Yes, I am able to run it.
>
> For tests, you also need to add dependencies to
> ":beam-runners-java-fn-execution/beam-runners-java-fn-execution_*test*"
> module.
>
> Also, I only added
> :beam-model-job-management-2.6.0-SNAPSHOT.jar
> :beam-model-fn-execution-2.6.0-SNAPSHOT.jar
> to the dependencies manually so not sure if you want to add
> io.grpc:grpc-core:1.12.0 and com.google.protobuf:protobuf-java:3.5.1 to
> the dependencies.
>
> Note, you need to move them up in the dependencies list.
>
>
> On Tue, Jul 17, 2018 at 5:54 PM Thomas Weise  wrote:
>
>> Are you able to
>> run org.apache.beam.runners.fnexecution.control.RemoteExecutionTest from
>> within Intellij ?
>>
>> I can get the compile errors to disappear by adding
>> beam-model-job-management-2.6.0-SNAPSHOT.jar, io.grpc:grpc-core:1.12.0
>> and com.google.protobuf:protobuf-java:3.5.1
>>
>> Running the test still fails since other dependencies are missing.
>>
>>
>> On Tue, Jul 17, 2018 at 4:02 PM Ankur Goenka  wrote:
>>
>>> For reference:
>>> I was able to make intellij work with the master by doing following steps
>>>
>>>1. Remove module :beam:vendor-sdks-java-extensions-protobuf from
>>>intellij.
>>>2. Adding
>>>
>>> :beam-model-fn-execution/build/libs/beam-model-fn-execution-2.6.0-SNAPSHOT.jar
>>>and 
>>> :beam-model-job-management/build/libs/beam-model-job-management-2.6.0-SNAPSHOT.jar
>>>to the appropriate modules at the top of the dependency list.
>>>
>>>
>>> On Tue, Jul 17, 2018 at 2:29 PM Thomas Weise  wrote:
>>>
 Adding the external jar in Intellij (2018.1) currently fails due to a
 duplicate source directory (sdks/java/extensions/protobuf/src/main/java).

 The build as such also fails, with:  error: warnings found and -Werror
 specified

 Ismaël found removing
 https://github.com/apache/beam/blob/master/buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy#L538
 as workaround.


 On Thu, Jul 12, 2018 at 1:55 PM Ismaël Mejía  wrote:

> Seems reasonable, but why exactly may we need the model (or protobuf
> related things) in the future in the SDK ? wasn’t it supposed to be
> translated into the Pipeline proto representation via the runners (and
> in this case the dep reside in the runner side) ?
> On Thu, Jul 12, 2018 at 2:50 AM Lukasz Cwik  wrote:
> >
> > Got a fix[1] for Andrews issue which turned out to be a release
> blocker since it broke performing the release. Also fixed several minor
> things like javadoc that were wrong with the release. Solving it allowed 
> me
> to do the publishing in parallel and cut the release time from 20+ mins to
> 8 mins on my machine.
> >
> > 1: https://github.com/apache/beam/pull/5936
> >
> > On Wed, Jul 11, 2018 at 3:51 PM Andrew Pilloud 
> wrote:
> >>
> >> We discussed this in person, sounds like my issue is known and will
> be fixed shortly. I'm running builds with '-Ppublishing' because I need to
> generate release artifacts for bundling the Beam SQL shell with the Google
> Cloud SDK. Hope to eventually just use the Beam release, but we are
> currently cutting a release off master every week to quickly iterate on 
> bug
> fixes.
> >>
> >> Andrew
> >>
> >> On Wed, Jul 11, 2018 at 1:39 PM Lukasz Cwik 
> wrote:
> >>>
> >>> Andrew, to my knowledge it seems as though your running into
> BEAM-4744, is there a reason you need to specify -Ppublishing?
> >>>
> >>> No particular reason to using ByteString within ByteKey and
> TextSource. Note that we currently do shade away protobuf in 
> sdks/java/core
> so we could either migrate to using a vendored version or re-implement the
> functionality to not use ByteString. Note that sdks/java/core can now
> dependend on the model/* classes and perform the Pipeline -> Proto
> translation as this will be needed to support portability efforts so I
> would prefer just migrating to use the vendored versions of the code. 
> Filed
> BEAM-4766.
> >>>
> >>> As for the IO module, I was referring to the upstream
> bigtable/bigquery/... libraries vended by Google. If they trimmed their 
> API
> surface to not expose gRPC or protobuf, then we wouldn't have to worry
> about having the shading logic within sdks/java/io/google-cloud-platform. 
> I
> know that this will be impossible for some connectors without backwards

Re: Vendoring / Shading Protobuf and gRPC

2018-07-17 Thread Ankur Goenka
Yes, I am able to run it.

For tests, you also need to add dependencies to
":beam-runners-java-fn-execution/beam-runners-java-fn-execution_*test*"
module.

Also, I only added
:beam-model-job-management-2.6.0-SNAPSHOT.jar
:beam-model-fn-execution-2.6.0-SNAPSHOT.jar
to the dependencies manually, so I'm not sure whether you also need to add
io.grpc:grpc-core:1.12.0 and com.google.protobuf:protobuf-java:3.5.1 to the
dependencies.

Note, you need to move them up in the dependencies list.


On Tue, Jul 17, 2018 at 5:54 PM Thomas Weise  wrote:

> Are you able to
> run org.apache.beam.runners.fnexecution.control.RemoteExecutionTest from
> within Intellij ?
>
> I can get the compile errors to disappear by adding
> beam-model-job-management-2.6.0-SNAPSHOT.jar, io.grpc:grpc-core:1.12.0
> and com.google.protobuf:protobuf-java:3.5.1
>
> Running the test still fails since other dependencies are missing.
>
>
> On Tue, Jul 17, 2018 at 4:02 PM Ankur Goenka  wrote:
>
>> For reference:
>> I was able to make intellij work with the master by doing following steps
>>
>>1. Remove module :beam:vendor-sdks-java-extensions-protobuf from
>>intellij.
>>2. Adding
>>
>> :beam-model-fn-execution/build/libs/beam-model-fn-execution-2.6.0-SNAPSHOT.jar
>>and 
>> :beam-model-job-management/build/libs/beam-model-job-management-2.6.0-SNAPSHOT.jar
>>to the appropriate modules at the top of the dependency list.
>>
>>
>> On Tue, Jul 17, 2018 at 2:29 PM Thomas Weise  wrote:
>>
>>> Adding the external jar in Intellij (2018.1) currently fails due to a
>>> duplicate source directory (sdks/java/extensions/protobuf/src/main/java).
>>>
>>> The build as such also fails, with:  error: warnings found and -Werror
>>> specified
>>>
>>> Ismaël found removing
>>> https://github.com/apache/beam/blob/master/buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy#L538
>>> as workaround.
>>>
>>>
>>> On Thu, Jul 12, 2018 at 1:55 PM Ismaël Mejía  wrote:
>>>
 Seems reasonable, but why exactly may we need the model (or protobuf
 related things) in the future in the SDK ? wasn’t it supposed to be
 translated into the Pipeline proto representation via the runners (and
 in this case the dep reside in the runner side) ?
 On Thu, Jul 12, 2018 at 2:50 AM Lukasz Cwik  wrote:
 >
 > Got a fix[1] for Andrews issue which turned out to be a release
 blocker since it broke performing the release. Also fixed several minor
 things like javadoc that were wrong with the release. Solving it allowed me
 to do the publishing in parallel and cut the release time from 20+ mins to
 8 mins on my machine.
 >
 > 1: https://github.com/apache/beam/pull/5936
 >
 > On Wed, Jul 11, 2018 at 3:51 PM Andrew Pilloud 
 wrote:
 >>
 >> We discussed this in person, sounds like my issue is known and will
 be fixed shortly. I'm running builds with '-Ppublishing' because I need to
 generate release artifacts for bundling the Beam SQL shell with the Google
 Cloud SDK. Hope to eventually just use the Beam release, but we are
 currently cutting a release off master every week to quickly iterate on bug
 fixes.
 >>
 >> Andrew
 >>
 >> On Wed, Jul 11, 2018 at 1:39 PM Lukasz Cwik 
 wrote:
 >>>
 >>> Andrew, to my knowledge it seems as though your running into
 BEAM-4744, is there a reason you need to specify -Ppublishing?
 >>>
 >>> No particular reason to using ByteString within ByteKey and
 TextSource. Note that we currently do shade away protobuf in sdks/java/core
 so we could either migrate to using a vendored version or re-implement the
 functionality to not use ByteString. Note that sdks/java/core can now
 dependend on the model/* classes and perform the Pipeline -> Proto
 translation as this will be needed to support portability efforts so I
 would prefer just migrating to use the vendored versions of the code. Filed
 BEAM-4766.
 >>>
 >>> As for the IO module, I was referring to the upstream
 bigtable/bigquery/... libraries vended by Google. If they trimmed their API
 surface to not expose gRPC or protobuf, then we wouldn't have to worry
 about having the shading logic within sdks/java/io/google-cloud-platform. I
 know that this will be impossible for some connectors without backwards
 incompatible changes since they exposed protobuf on their API surface. I
 know that Chamikara was looking to shade this away in the
 sdks/java/io/google-cloud-platform but only had limited success in the 
 past.
 >>>
 >>> On Wed, Jul 11, 2018 at 1:14 PM Ismaël Mejía 
 wrote:
 
  This is great news in particular for runners (Spark) where the
 leaking of some grpc subdependencies caused stability issues and required
 extra shading. Great !
 
  About the other modules
 
  > Note, these are the following modules that still depend on
 protobuf 

Re: Vendoring / Shading Protobuf and gRPC

2018-07-17 Thread Thomas Weise
Are you able to
run org.apache.beam.runners.fnexecution.control.RemoteExecutionTest from
within Intellij ?

I can get the compile errors to disappear by adding
beam-model-job-management-2.6.0-SNAPSHOT.jar, io.grpc:grpc-core:1.12.0
and com.google.protobuf:protobuf-java:3.5.1

Running the test still fails since other dependencies are missing.


On Tue, Jul 17, 2018 at 4:02 PM Ankur Goenka  wrote:

> For reference:
> I was able to make intellij work with the master by doing following steps
>
>1. Remove module :beam:vendor-sdks-java-extensions-protobuf from
>intellij.
>2. Adding
>
> :beam-model-fn-execution/build/libs/beam-model-fn-execution-2.6.0-SNAPSHOT.jar
>and 
> :beam-model-job-management/build/libs/beam-model-job-management-2.6.0-SNAPSHOT.jar
>to the appropriate modules at the top of the dependency list.
>
>
> On Tue, Jul 17, 2018 at 2:29 PM Thomas Weise  wrote:
>
>> Adding the external jar in Intellij (2018.1) currently fails due to a
>> duplicate source directory (sdks/java/extensions/protobuf/src/main/java).
>>
>> The build as such also fails, with:  error: warnings found and -Werror
>> specified
>>
>> Ismaël found removing
>> https://github.com/apache/beam/blob/master/buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy#L538
>> as workaround.
>>
>>
>> On Thu, Jul 12, 2018 at 1:55 PM Ismaël Mejía  wrote:
>>
>>> Seems reasonable, but why exactly may we need the model (or protobuf
>>> related things) in the future in the SDK ? wasn’t it supposed to be
>>> translated into the Pipeline proto representation via the runners (and
>>> in this case the dep reside in the runner side) ?
>>> On Thu, Jul 12, 2018 at 2:50 AM Lukasz Cwik  wrote:
>>> >
>>> > Got a fix[1] for Andrews issue which turned out to be a release
>>> blocker since it broke performing the release. Also fixed several minor
>>> things like javadoc that were wrong with the release. Solving it allowed me
>>> to do the publishing in parallel and cut the release time from 20+ mins to
>>> 8 mins on my machine.
>>> >
>>> > 1: https://github.com/apache/beam/pull/5936
>>> >
>>> > On Wed, Jul 11, 2018 at 3:51 PM Andrew Pilloud 
>>> wrote:
>>> >>
>>> >> We discussed this in person, sounds like my issue is known and will
>>> be fixed shortly. I'm running builds with '-Ppublishing' because I need to
>>> generate release artifacts for bundling the Beam SQL shell with the Google
>>> Cloud SDK. Hope to eventually just use the Beam release, but we are
>>> currently cutting a release off master every week to quickly iterate on bug
>>> fixes.
>>> >>
>>> >> Andrew
>>> >>
>>> >> On Wed, Jul 11, 2018 at 1:39 PM Lukasz Cwik  wrote:
>>> >>>
>>> >>> Andrew, to my knowledge it seems as though your running into
>>> BEAM-4744, is there a reason you need to specify -Ppublishing?
>>> >>>
>>> >>> No particular reason to using ByteString within ByteKey and
>>> TextSource. Note that we currently do shade away protobuf in sdks/java/core
>>> so we could either migrate to using a vendored version or re-implement the
>>> functionality to not use ByteString. Note that sdks/java/core can now
>>> dependend on the model/* classes and perform the Pipeline -> Proto
>>> translation as this will be needed to support portability efforts so I
>>> would prefer just migrating to use the vendored versions of the code. Filed
>>> BEAM-4766.
>>> >>>
>>> >>> As for the IO module, I was referring to the upstream
>>> bigtable/bigquery/... libraries vended by Google. If they trimmed their API
>>> surface to not expose gRPC or protobuf, then we wouldn't have to worry
>>> about having the shading logic within sdks/java/io/google-cloud-platform. I
>>> know that this will be impossible for some connectors without backwards
>>> incompatible changes since they exposed protobuf on their API surface. I
>>> know that Chamikara was looking to shade this away in the
>>> sdks/java/io/google-cloud-platform but only had limited success in the past.
>>> >>>
>>> >>> On Wed, Jul 11, 2018 at 1:14 PM Ismaël Mejía 
>>> wrote:
>>> 
>>>  This is great news in particular for runners (Spark) where the
>>> leaking of some grpc subdependencies caused stability issues and required
>>> extra shading. Great !
>>> 
>>>  About the other modules
>>> 
>>>  > Note, these are the following modules that still depend on
>>> protobuf that are shaded away and could move to use a vendored variant of
>>> protobuf:
>>>  > * sdks/java/core
>>>  > * sdks/java/extensions/sql
>>> 
>>>  For sdks/java/core the dependency in protobuf seems to be minor,
>>> from a quick look it seems that it is only used to import ByteString in two
>>> classes: ByteKey and TextSource so hopefully we can rewrite both and get
>>> rid of the dependency altogether (making core smaller which is always a
>>> win).
>>>  Can we fill a JIRA for this or do I miss other reasons to depend on
>>> protobuf in core?
>>> 
>>>  For sdks/java/extensions/sql I don’t know if I am missing

Re: Vendoring / Shading Protobuf and gRPC

2018-07-17 Thread Ankur Goenka
For reference:
I was able to make intellij work with the master by doing the following steps

   1. Remove module :beam:vendor-sdks-java-extensions-protobuf from
   intellij.
   2. Adding
   
:beam-model-fn-execution/build/libs/beam-model-fn-execution-2.6.0-SNAPSHOT.jar
   and 
:beam-model-job-management/build/libs/beam-model-job-management-2.6.0-SNAPSHOT.jar
   to the appropriate modules at the top of the dependency list.


On Tue, Jul 17, 2018 at 2:29 PM Thomas Weise  wrote:

> Adding the external jar in Intellij (2018.1) currently fails due to a
> duplicate source directory (sdks/java/extensions/protobuf/src/main/java).
>
> The build as such also fails, with:  error: warnings found and -Werror
> specified
>
> Ismaël found removing
> https://github.com/apache/beam/blob/master/buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy#L538
> as workaround.
>
>
> On Thu, Jul 12, 2018 at 1:55 PM Ismaël Mejía  wrote:
>
>> Seems reasonable, but why exactly may we need the model (or protobuf
>> related things) in the future in the SDK ? wasn’t it supposed to be
>> translated into the Pipeline proto representation via the runners (and
>> in this case the dep reside in the runner side) ?
>> On Thu, Jul 12, 2018 at 2:50 AM Lukasz Cwik  wrote:
>> >
>> > Got a fix[1] for Andrews issue which turned out to be a release blocker
>> since it broke performing the release. Also fixed several minor things like
>> javadoc that were wrong with the release. Solving it allowed me to do the
>> publishing in parallel and cut the release time from 20+ mins to 8 mins on
>> my machine.
>> >
>> > 1: https://github.com/apache/beam/pull/5936
>> >
>> > On Wed, Jul 11, 2018 at 3:51 PM Andrew Pilloud 
>> wrote:
>> >>
>> >> We discussed this in person, sounds like my issue is known and will be
>> fixed shortly. I'm running builds with '-Ppublishing' because I need to
>> generate release artifacts for bundling the Beam SQL shell with the Google
>> Cloud SDK. Hope to eventually just use the Beam release, but we are
>> currently cutting a release off master every week to quickly iterate on bug
>> fixes.
>> >>
>> >> Andrew
>> >>
>> >> On Wed, Jul 11, 2018 at 1:39 PM Lukasz Cwik  wrote:
>> >>>
>> >>> Andrew, to my knowledge it seems as though your running into
>> BEAM-4744, is there a reason you need to specify -Ppublishing?
>> >>>
>> >>> No particular reason to using ByteString within ByteKey and
>> TextSource. Note that we currently do shade away protobuf in sdks/java/core
>> so we could either migrate to using a vendored version or re-implement the
>> functionality to not use ByteString. Note that sdks/java/core can now
>> dependend on the model/* classes and perform the Pipeline -> Proto
>> translation as this will be needed to support portability efforts so I
>> would prefer just migrating to use the vendored versions of the code. Filed
>> BEAM-4766.
>> >>>
>> >>> As for the IO module, I was referring to the upstream
>> bigtable/bigquery/... libraries vended by Google. If they trimmed their API
>> surface to not expose gRPC or protobuf, then we wouldn't have to worry
>> about having the shading logic within sdks/java/io/google-cloud-platform. I
>> know that this will be impossible for some connectors without backwards
>> incompatible changes since they exposed protobuf on their API surface. I
>> know that Chamikara was looking to shade this away in the
>> sdks/java/io/google-cloud-platform but only had limited success in the past.
>> >>>
>> >>> On Wed, Jul 11, 2018 at 1:14 PM Ismaël Mejía 
>> wrote:
>> 
>>  This is great news in particular for runners (Spark) where the
>> leaking of some grpc subdependencies caused stability issues and required
>> extra shading. Great !
>> 
>>  About the other modules
>> 
>>  > Note, these are the following modules that still depend on
>> protobuf that are shaded away and could move to use a vendored variant of
>> protobuf:
>>  > * sdks/java/core
>>  > * sdks/java/extensions/sql
>> 
>>  For sdks/java/core the dependency in protobuf seems to be minor,
>> from a quick look it seems that it is only used to import ByteString in two
>> classes: ByteKey and TextSource so hopefully we can rewrite both and get
>> rid of the dependency altogether (making core smaller which is always a
>> win).
>>  Can we fill a JIRA for this or do I miss other reasons to depend on
>> protobuf in core?
>> 
>>  For sdks/java/extensions/sql I don’t know if I am missing something,
>> but I don’t see any code use of protobuf and I doubt that calcite uses
>> protobuf so maybe it is there just because it was leaking from somewhere
>> else in Beam, we should better check this first.
>> 
>>  > These modules expose protobuf because it is part of the API
>> surface:
>>  > * sdks/java/extensions/protobuf
>>  > * sdks/java/io/google-cloud-platform (I believe that gRPC could be
>> shaded here but preferrably the IO module would do it so we wouldn't have
>> this 

Re: Vendoring / Shading Protobuf and gRPC

2018-07-17 Thread Thomas Weise
Adding the external jar in Intellij (2018.1) currently fails due to a
duplicate source directory (sdks/java/extensions/protobuf/src/main/java).

The build as such also fails, with:  error: warnings found and -Werror
specified

Ismaël found that removing
https://github.com/apache/beam/blob/master/buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy#L538
works as a workaround.


On Thu, Jul 12, 2018 at 1:55 PM Ismaël Mejía  wrote:

> Seems reasonable, but why exactly may we need the model (or protobuf
> related things) in the future in the SDK ? wasn’t it supposed to be
> translated into the Pipeline proto representation via the runners (and
> in this case the dep reside in the runner side) ?
> On Thu, Jul 12, 2018 at 2:50 AM Lukasz Cwik  wrote:
> >
> > Got a fix[1] for Andrews issue which turned out to be a release blocker
> since it broke performing the release. Also fixed several minor things like
> javadoc that were wrong with the release. Solving it allowed me to do the
> publishing in parallel and cut the release time from 20+ mins to 8 mins on
> my machine.
> >
> > 1: https://github.com/apache/beam/pull/5936
> >
> > On Wed, Jul 11, 2018 at 3:51 PM Andrew Pilloud 
> wrote:
> >>
> >> We discussed this in person, sounds like my issue is known and will be
> fixed shortly. I'm running builds with '-Ppublishing' because I need to
> generate release artifacts for bundling the Beam SQL shell with the Google
> Cloud SDK. Hope to eventually just use the Beam release, but we are
> currently cutting a release off master every week to quickly iterate on bug
> fixes.
> >>
> >> Andrew
> >>
> >> On Wed, Jul 11, 2018 at 1:39 PM Lukasz Cwik  wrote:
> >>>
> >>> Andrew, to my knowledge it seems as though your running into
> BEAM-4744, is there a reason you need to specify -Ppublishing?
> >>>
> >>> No particular reason to using ByteString within ByteKey and
> TextSource. Note that we currently do shade away protobuf in sdks/java/core
> so we could either migrate to using a vendored version or re-implement the
> functionality to not use ByteString. Note that sdks/java/core can now
> dependend on the model/* classes and perform the Pipeline -> Proto
> translation as this will be needed to support portability efforts so I
> would prefer just migrating to use the vendored versions of the code. Filed
> BEAM-4766.
> >>>
> >>> As for the IO module, I was referring to the upstream
> bigtable/bigquery/... libraries vended by Google. If they trimmed their API
> surface to not expose gRPC or protobuf, then we wouldn't have to worry
> about having the shading logic within sdks/java/io/google-cloud-platform. I
> know that this will be impossible for some connectors without backwards
> incompatible changes since they exposed protobuf on their API surface. I
> know that Chamikara was looking to shade this away in the
> sdks/java/io/google-cloud-platform but only had limited success in the past.
> >>>
> >>> On Wed, Jul 11, 2018 at 1:14 PM Ismaël Mejía 
> wrote:
> 
>  This is great news in particular for runners (Spark) where the
> leaking of some grpc subdependencies caused stability issues and required
> extra shading. Great !
> 
>  About the other modules
> 
>  > Note, these are the following modules that still depend on protobuf
> that are shaded away and could move to use a vendored variant of protobuf:
>  > * sdks/java/core
>  > * sdks/java/extensions/sql
> 
>  For sdks/java/core the dependency in protobuf seems to be minor, from
> a quick look it seems that it is only used to import ByteString in two
> classes: ByteKey and TextSource so hopefully we can rewrite both and get
> rid of the dependency altogether (making core smaller which is always a
> win).
>  Can we fill a JIRA for this or do I miss other reasons to depend on
> protobuf in core?
> 
>  For sdks/java/extensions/sql I don’t know if I am missing something,
> but I don’t see any code use of protobuf and I doubt that calcite uses
> protobuf so maybe it is there just because it was leaking from somewhere
> else in Beam, we should better check this first.
> 
>  > These modules expose protobuf because it is part of the API surface:
>  > * sdks/java/extensions/protobuf
>  > * sdks/java/io/google-cloud-platform (I believe that gRPC could be
> shaded here but preferrably the IO module would do it so we wouldn't have
> this maintenance burden.)
> 
>  Can you please elaborate on ‘but preferrably the IO module would do
> it so we wouldn't have this maintenance burden’. I remember there was an
> issue when running the examples in the spark runner examples because of
> sdks/java/io/google-cloud-platform leaking netty via gRPC (BEAM-3519) [Note
> that this is hidden at this moment because of pure luck Spark 2.3.x and
> Beam are aligned on netty version but this can change in the future so
> hopefully this can be shaded/controlled].
> 
>  On Wed, Jul 11, 2018 at 8:55 PM Andrew Pilloud 
> wrote:
> 

Re: Vendoring / Shading Protobuf and gRPC

2018-07-12 Thread Ismaël Mejía
Seems reasonable, but why exactly would we need the model (or protobuf-related
things) in the SDK in the future? Wasn't it supposed to be translated into the
Pipeline proto representation by the runners (in which case the dependency
would reside on the runner side)?
On Thu, Jul 12, 2018 at 2:50 AM Lukasz Cwik  wrote:
>
> Got a fix[1] for Andrews issue which turned out to be a release blocker since 
> it broke performing the release. Also fixed several minor things like javadoc 
> that were wrong with the release. Solving it allowed me to do the publishing 
> in parallel and cut the release time from 20+ mins to 8 mins on my machine.
>
> 1: https://github.com/apache/beam/pull/5936
>
> On Wed, Jul 11, 2018 at 3:51 PM Andrew Pilloud  wrote:
>>
>> We discussed this in person, sounds like my issue is known and will be fixed 
>> shortly. I'm running builds with '-Ppublishing' because I need to generate 
>> release artifacts for bundling the Beam SQL shell with the Google Cloud SDK. 
>> Hope to eventually just use the Beam release, but we are currently cutting a 
>> release off master every week to quickly iterate on bug fixes.
>>
>> Andrew
>>
>> On Wed, Jul 11, 2018 at 1:39 PM Lukasz Cwik  wrote:
>>>
>>> Andrew, to my knowledge it seems as though your running into BEAM-4744, is 
>>> there a reason you need to specify -Ppublishing?
>>>
>>> No particular reason to using ByteString within ByteKey and TextSource. 
>>> Note that we currently do shade away protobuf in sdks/java/core so we could 
>>> either migrate to using a vendored version or re-implement the 
>>> functionality to not use ByteString. Note that sdks/java/core can now 
>>> dependend on the model/* classes and perform the Pipeline -> Proto 
>>> translation as this will be needed to support portability efforts so I 
>>> would prefer just migrating to use the vendored versions of the code. Filed 
>>> BEAM-4766.
>>>
>>> As for the IO module, I was referring to the upstream bigtable/bigquery/... 
>>> libraries vended by Google. If they trimmed their API surface to not expose 
>>> gRPC or protobuf, then we wouldn't have to worry about having the shading 
>>> logic within sdks/java/io/google-cloud-platform. I know that this will be 
>>> impossible for some connectors without backwards incompatible changes since 
>>> they exposed protobuf on their API surface. I know that Chamikara was 
>>> looking to shade this away in the sdks/java/io/google-cloud-platform but 
>>> only had limited success in the past.
>>>
>>> On Wed, Jul 11, 2018 at 1:14 PM Ismaël Mejía  wrote:

 This is great news in particular for runners (Spark) where the leaking of 
 some grpc subdependencies caused stability issues and required extra 
 shading. Great !

 About the other modules

 > Note, these are the following modules that still depend on protobuf that 
 > are shaded away and could move to use a vendored variant of protobuf:
 > * sdks/java/core
 > * sdks/java/extensions/sql

 For sdks/java/core the dependency in protobuf seems to be minor, from a 
 quick look it seems that it is only used to import ByteString in two 
 classes: ByteKey and TextSource so hopefully we can rewrite both and get 
 rid of the dependency altogether (making core smaller which is always a 
 win).
 Can we fill a JIRA for this or do I miss other reasons to depend on 
 protobuf in core?

 For sdks/java/extensions/sql I don’t know if I am missing something, but I 
 don’t see any code use of protobuf and I doubt that calcite uses protobuf 
 so maybe it is there just because it was leaking from somewhere else in 
 Beam, we should better check this first.

 > These modules expose protobuf because it is part of the API surface:
 > * sdks/java/extensions/protobuf
 > * sdks/java/io/google-cloud-platform (I believe that gRPC could be 
 > shaded here but preferrably the IO module would do it so we wouldn't 
 > have this maintenance burden.)

 Can you please elaborate on ‘but preferrably the IO module would do it so 
 we wouldn't have this maintenance burden’. I remember there was an issue 
 when running the examples in the spark runner examples because of 
 sdks/java/io/google-cloud-platform leaking netty via gRPC (BEAM-3519) 
 [Note that this is hidden at this moment because of pure luck Spark 2.3.x 
 and Beam are aligned on netty version but this can change in the future so 
 hopefully this can be shaded/controlled].

 On Wed, Jul 11, 2018 at 8:55 PM Andrew Pilloud  wrote:
>
> This is really cool and should cut down our artifact size significantly! 
> Thanks Luke!
>
> I am running into one issue after this: builds with the publishing flag 
> no longer work. (We run './gradlew -Ppublishing shadowJar' to generate 
> release artifacts for the Beam SQL shell.) I get a bunch of errors like 
> this:
>
> 

Re: Vendoring / Shading Protobuf and gRPC

2018-07-11 Thread Lukasz Cwik
Got a fix[1] for Andrew's issue, which turned out to be a release blocker
since it broke performing the release. I also fixed several minor problems
with the release, like incorrect javadoc. Solving it allowed me to do the
publishing in parallel, which cut the release time from 20+ minutes to 8
minutes on my machine.

1: https://github.com/apache/beam/pull/5936

On Wed, Jul 11, 2018 at 3:51 PM Andrew Pilloud  wrote:

> We discussed this in person, sounds like my issue is known and will be
> fixed shortly. I'm running builds with '-Ppublishing' because I need to
> generate release artifacts for bundling the Beam SQL shell with the Google
> Cloud SDK. Hope to eventually just use the Beam release, but we are
> currently cutting a release off master every week to quickly iterate on bug
> fixes.
>
> Andrew
>
> On Wed, Jul 11, 2018 at 1:39 PM Lukasz Cwik  wrote:
>
>> Andrew, to my knowledge it seems as though your running into BEAM-4744,
>> is there a reason you need to specify -Ppublishing?
>>
>> No particular reason to using ByteString within ByteKey and TextSource.
>> Note that we currently do shade away protobuf in sdks/java/core so we could
>> either migrate to using a vendored version or re-implement the
>> functionality to not use ByteString. Note that sdks/java/core can now
>> dependend on the model/* classes and perform the Pipeline -> Proto
>> translation as this will be needed to support portability efforts so I
>> would prefer just migrating to use the vendored versions of the code. Filed
>> BEAM-4766.
>>
>> As for the IO module, I was referring to the upstream
>> bigtable/bigquery/... libraries vended by Google. If they trimmed their API
>> surface to not expose gRPC or protobuf, then we wouldn't have to worry
>> about having the shading logic within sdks/java/io/google-cloud-platform. I
>> know that this will be impossible for some connectors without backwards
>> incompatible changes since they exposed protobuf on their API surface. I
>> know that Chamikara was looking to shade this away in the
>> sdks/java/io/google-cloud-platform but only had limited success in the past.
>>
>> On Wed, Jul 11, 2018 at 1:14 PM Ismaël Mejía  wrote:
>>
>>> This is great news in particular for runners (Spark) where the leaking
>>> of some grpc subdependencies caused stability issues and required extra
>>> shading. Great !
>>>
>>> About the other modules
>>>
>>> > Note, these are the following modules that still depend on protobuf
>>> that are shaded away and could move to use a vendored variant of protobuf:
>>> > * sdks/java/core
>>> > * sdks/java/extensions/sql
>>>
>>> For sdks/java/core the dependency in protobuf seems to be minor, from a
>>> quick look it seems that it is only used to import ByteString in two
>>> classes: ByteKey and TextSource so hopefully we can rewrite both and get
>>> rid of the dependency altogether (making core smaller which is always a
>>> win).
>>> Can we fill a JIRA for this or do I miss other reasons to depend on
>>> protobuf in core?
>>>
>>> For sdks/java/extensions/sql I don’t know if I am missing something, but
>>> I don’t see any code use of protobuf and I doubt that calcite uses protobuf
>>> so maybe it is there just because it was leaking from somewhere else in
>>> Beam, we should better check this first.
>>>
>>> > These modules expose protobuf because it is part of the API surface:
>>> > * sdks/java/extensions/protobuf
>>> > * sdks/java/io/google-cloud-platform (I believe that gRPC could be
>>> shaded here but preferrably the IO module would do it so we wouldn't have
>>> this maintenance burden.)
>>>
>>> Can you please elaborate on ‘but preferrably the IO module would do it
>>> so we wouldn't have this maintenance burden’. I remember there was an issue
>>> when running the examples in the spark runner examples because of
>>> sdks/java/io/google-cloud-platform leaking netty via gRPC (BEAM-3519) [Note
>>> that this is hidden at this moment because of pure luck Spark 2.3.x and
>>> Beam are aligned on netty version but this can change in the future so
>>> hopefully this can be shaded/controlled].
>>>
>>> On Wed, Jul 11, 2018 at 8:55 PM Andrew Pilloud 
>>> wrote:
>>>
 This is really cool and should cut down our artifact size
 significantly! Thanks Luke!

 I am running into one issue after this: builds with the publishing flag
 no longer work. (We run './gradlew -Ppublishing shadowJar' to generate
 release artifacts for the Beam SQL shell.) I get a bunch of errors like
 this:

 model/job-management/build/generated/source/proto/main/java/org/apache/beam/model/jobmanagement/v1/JobApi.java:148:
 error: no suitable method found for
 readMessage(org.apache.beam.vendor.protobuf.v3.com.google.protobuf.Parser,ExtensionRegistryLite)

 Is there something I need to change in my build?

 Andrew

 On Tue, Jul 10, 2018 at 2:10 PM Lukasz Cwik  wrote:

> With the merge of PR #5594[1], we started shading all gRPC / Protobuf
> 

Re: Vendoring / Shading Protobuf and gRPC

2018-07-11 Thread Andrew Pilloud
We discussed this in person; it sounds like my issue is known and will be
fixed shortly. I'm running builds with '-Ppublishing' because I need to
generate release artifacts for bundling the Beam SQL shell with the Google
Cloud SDK. We hope to eventually just use the Beam release, but we are
currently cutting a release off master every week to iterate quickly on bug
fixes.

Andrew

On Wed, Jul 11, 2018 at 1:39 PM Lukasz Cwik  wrote:

> Andrew, to my knowledge it seems as though your running into BEAM-4744, is
> there a reason you need to specify -Ppublishing?
>
> No particular reason to using ByteString within ByteKey and TextSource.
> Note that we currently do shade away protobuf in sdks/java/core so we could
> either migrate to using a vendored version or re-implement the
> functionality to not use ByteString. Note that sdks/java/core can now
> dependend on the model/* classes and perform the Pipeline -> Proto
> translation as this will be needed to support portability efforts so I
> would prefer just migrating to use the vendored versions of the code. Filed
> BEAM-4766.
>
> As for the IO module, I was referring to the upstream
> bigtable/bigquery/... libraries vended by Google. If they trimmed their API
> surface to not expose gRPC or protobuf, then we wouldn't have to worry
> about having the shading logic within sdks/java/io/google-cloud-platform. I
> know that this will be impossible for some connectors without backwards
> incompatible changes since they exposed protobuf on their API surface. I
> know that Chamikara was looking to shade this away in the
> sdks/java/io/google-cloud-platform but only had limited success in the past.
>
> On Wed, Jul 11, 2018 at 1:14 PM Ismaël Mejía  wrote:
>
>> This is great news in particular for runners (Spark) where the leaking of
>> some grpc subdependencies caused stability issues and required extra
>> shading. Great !
>>
>> About the other modules
>>
>> > Note, these are the following modules that still depend on protobuf
>> that are shaded away and could move to use a vendored variant of protobuf:
>> > * sdks/java/core
>> > * sdks/java/extensions/sql
>>
>> For sdks/java/core the dependency in protobuf seems to be minor, from a
>> quick look it seems that it is only used to import ByteString in two
>> classes: ByteKey and TextSource so hopefully we can rewrite both and get
>> rid of the dependency altogether (making core smaller which is always a
>> win).
>> Can we fill a JIRA for this or do I miss other reasons to depend on
>> protobuf in core?
>>
>> For sdks/java/extensions/sql I don’t know if I am missing something, but
>> I don’t see any code use of protobuf and I doubt that calcite uses protobuf
>> so maybe it is there just because it was leaking from somewhere else in
>> Beam, we should better check this first.
>>
>> > These modules expose protobuf because it is part of the API surface:
>> > * sdks/java/extensions/protobuf
>> > * sdks/java/io/google-cloud-platform (I believe that gRPC could be
>> shaded here but preferrably the IO module would do it so we wouldn't have
>> this maintenance burden.)
>>
>> Can you please elaborate on ‘but preferrably the IO module would do it so
>> we wouldn't have this maintenance burden’. I remember there was an issue
>> when running the examples in the spark runner examples because of
>> sdks/java/io/google-cloud-platform leaking netty via gRPC (BEAM-3519) [Note
>> that this is hidden at this moment because of pure luck Spark 2.3.x and
>> Beam are aligned on netty version but this can change in the future so
>> hopefully this can be shaded/controlled].
>>
>> On Wed, Jul 11, 2018 at 8:55 PM Andrew Pilloud 
>> wrote:
>>
>>> This is really cool and should cut down our artifact size significantly!
>>> Thanks Luke!
>>>
>>> I am running into one issue after this: builds with the publishing flag
>>> no longer work. (We run './gradlew -Ppublishing shadowJar' to generate
>>> release artifacts for the Beam SQL shell.) I get a bunch of errors like
>>> this:
>>>
>>> model/job-management/build/generated/source/proto/main/java/org/apache/beam/model/jobmanagement/v1/JobApi.java:148:
>>> error: no suitable method found for
>>> readMessage(org.apache.beam.vendor.protobuf.v3.com.google.protobuf.Parser,ExtensionRegistryLite)
>>>
>>> Is there something I need to change in my build?
>>>
>>> Andrew
>>>
>>> On Tue, Jul 10, 2018 at 2:10 PM Lukasz Cwik  wrote:
>>>
 With the merge of PR #5594[1], we started shading all gRPC / Protobuf
 dependencies within all the modules that depended on the model/*
 dependencies by vendoring them. The vendored versions are built and
 packaged into the model jars (they should be separated out once I figure
 out how to generate proto code using a shaded import path). Note that this
 cleaned up several issues where we were incorrectly built shaded jars
 without repackaging in some locations or the shading process was corrupting
 the contents of some of the jars.

 Note that the 

Re: Vendoring / Shading Protobuf and gRPC

2018-07-11 Thread Lukasz Cwik
Andrew, to my knowledge it seems as though you're running into BEAM-4744; is
there a reason you need to specify -Ppublishing?

There is no particular reason for using ByteString within ByteKey and
TextSource. Note that we currently do shade away protobuf in sdks/java/core,
so we could either migrate to using a vendored version or re-implement the
functionality to not use ByteString. Note that sdks/java/core can now depend
on the model/* classes and perform the Pipeline -> Proto translation, as this
will be needed to support the portability efforts, so I would prefer just
migrating to use the vendored versions of the code (see the sketch below).
Filed BEAM-4766.
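
(A rough sketch of what the re-implementation route could look like for the
ByteKey case; the class name is hypothetical and this is not the actual Beam
code, just an illustration that an immutable byte[] wrapper can cover what
ByteString provides here:)

    import java.util.Arrays;

    // Illustrative only: an immutable key wrapper that avoids depending on
    // protobuf's ByteString.
    public final class VendorFreeByteKey implements Comparable<VendorFreeByteKey> {
      private final byte[] bytes;

      private VendorFreeByteKey(byte[] bytes) {
        this.bytes = bytes;
      }

      public static VendorFreeByteKey copyFrom(byte[] source) {
        // Defensive copy so callers cannot mutate the key after construction.
        return new VendorFreeByteKey(Arrays.copyOf(source, source.length));
      }

      public byte[] getBytes() {
        return Arrays.copyOf(bytes, bytes.length);
      }

      @Override
      public int compareTo(VendorFreeByteKey other) {
        // Unsigned lexicographic comparison, the usual ordering for byte keys.
        int limit = Math.min(bytes.length, other.bytes.length);
        for (int i = 0; i < limit; i++) {
          int cmp = Integer.compare(bytes[i] & 0xff, other.bytes[i] & 0xff);
          if (cmp != 0) {
            return cmp;
          }
        }
        return Integer.compare(bytes.length, other.bytes.length);
      }

      @Override
      public boolean equals(Object o) {
        return o instanceof VendorFreeByteKey
            && Arrays.equals(bytes, ((VendorFreeByteKey) o).bytes);
      }

      @Override
      public int hashCode() {
        return Arrays.hashCode(bytes);
      }
    }
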

As for the IO module, I was referring to the upstream bigtable/bigquery/...
libraries vended by Google. If they trimmed their API surface to not expose
gRPC or protobuf, then we wouldn't have to worry about having the shading
logic within sdks/java/io/google-cloud-platform. I know that this will be
impossible for some connectors without backwards-incompatible changes, since
they expose protobuf on their API surface. I know that Chamikara was looking
to shade this away in sdks/java/io/google-cloud-platform but only had limited
success in the past.

On Wed, Jul 11, 2018 at 1:14 PM Ismaël Mejía  wrote:

> This is great news in particular for runners (Spark) where the leaking of
> some grpc subdependencies caused stability issues and required extra
> shading. Great !
>
> About the other modules
>
> > Note, these are the following modules that still depend on protobuf that
> are shaded away and could move to use a vendored variant of protobuf:
> > * sdks/java/core
> > * sdks/java/extensions/sql
>
> For sdks/java/core the dependency in protobuf seems to be minor, from a
> quick look it seems that it is only used to import ByteString in two
> classes: ByteKey and TextSource so hopefully we can rewrite both and get
> rid of the dependency altogether (making core smaller which is always a
> win).
> Can we fill a JIRA for this or do I miss other reasons to depend on
> protobuf in core?
>
> For sdks/java/extensions/sql I don’t know if I am missing something, but I
> don’t see any code use of protobuf and I doubt that calcite uses protobuf
> so maybe it is there just because it was leaking from somewhere else in
> Beam, we should better check this first.
>
> > These modules expose protobuf because it is part of the API surface:
> > * sdks/java/extensions/protobuf
> > * sdks/java/io/google-cloud-platform (I believe that gRPC could be
> shaded here but preferrably the IO module would do it so we wouldn't have
> this maintenance burden.)
>
> Can you please elaborate on ‘but preferrably the IO module would do it so
> we wouldn't have this maintenance burden’. I remember there was an issue
> when running the examples in the spark runner examples because of
> sdks/java/io/google-cloud-platform leaking netty via gRPC (BEAM-3519) [Note
> that this is hidden at this moment because of pure luck Spark 2.3.x and
> Beam are aligned on netty version but this can change in the future so
> hopefully this can be shaded/controlled].
>
> On Wed, Jul 11, 2018 at 8:55 PM Andrew Pilloud 
> wrote:
>
>> This is really cool and should cut down our artifact size significantly!
>> Thanks Luke!
>>
>> I am running into one issue after this: builds with the publishing flag
>> no longer work. (We run './gradlew -Ppublishing shadowJar' to generate
>> release artifacts for the Beam SQL shell.) I get a bunch of errors like
>> this:
>>
>> model/job-management/build/generated/source/proto/main/java/org/apache/beam/model/jobmanagement/v1/JobApi.java:148:
>> error: no suitable method found for
>> readMessage(org.apache.beam.vendor.protobuf.v3.com.google.protobuf.Parser,ExtensionRegistryLite)
>>
>> Is there something I need to change in my build?
>>
>> Andrew
>>
>> On Tue, Jul 10, 2018 at 2:10 PM Lukasz Cwik  wrote:
>>
>>> With the merge of PR #5594[1], we started shading all gRPC / Protobuf
>>> dependencies within all the modules that depended on the model/*
>>> dependencies by vendoring them. The vendored versions are built and
>>> packaged into the model jars (they should be separated out once I figure
>>> out how to generate proto code using a shaded import path). Note that this
>>> cleaned up several issues where we were incorrectly built shaded jars
>>> without repackaging in some locations or the shading process was corrupting
>>> the contents of some of the jars.
>>>
>>> Note that the majority of the code base (especially related to
>>> portability) should be using imports under the
>>> org.apache.beam.vendor.protobuf.v3 or org.apache.beam.vendor.grpc.v1 paths.
>>> I have yet to figure out a clean way to get Intellij to recognize these
>>> vendored paths. My only solution so far has been to manually add one of the
>>> built model jars to the compile classpath of the module being worked on in
>>> Intellij as described here[2]. I would greatly appreciate some ideas on how
>>> to improve this integration because from a few 

Vendoring / Shading Protobuf and gRPC

2018-07-10 Thread Lukasz Cwik
With the merge of PR #5594[1], we started shading all gRPC / Protobuf
dependencies within all the modules that depend on the model/* artifacts by
vendoring them. The vendored versions are built and packaged into the model
jars (they should be separated out once I figure out how to generate proto
code using a shaded import path). Note that this cleaned up several issues
where we were incorrectly building shaded jars without repackaging in some
locations, or where the shading process was corrupting the contents of some
of the jars.

Note that the majority of the code base (especially code related to
portability) should be using imports under the
org.apache.beam.vendor.protobuf.v3 or org.apache.beam.vendor.grpc.v1 paths.
I have yet to figure out a clean way to get IntelliJ to recognize these
vendored paths. My only solution so far has been to manually add one of the
built model jars to the compile classpath of the module being worked on in
IntelliJ, as described here[2]. I would greatly appreciate some ideas on how
to improve this integration, because a few attempts at configuring the
IntelliJ Gradle plugin scope sections didn't produce the result that I was
expecting.
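
(In practice the switch is mostly at the import level. A minimal sketch,
assuming the vendored classes from the model jars are on the compile
classpath; the class name is made up for illustration:)

    // Before vendoring this would have been: import com.google.protobuf.ByteString;
    import org.apache.beam.vendor.protobuf.v3.com.google.protobuf.ByteString;

    public class VendoredImportExample {
      public static void main(String[] args) {
        // Same protobuf API, just relocated under the org.apache.beam.vendor
        // prefix so it cannot clash with a user's own protobuf version.
        ByteString bytes = ByteString.copyFromUtf8("hello");
        System.out.println(bytes.size());
      }
    }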

I also added a simple test task
*validateShadedJarDoesntLeakNonOrgApacheBeamClasses* that validates that the
shaded jar doesn't contain classes without repackaging. We should apply it to
any module that performs shading to ensure that classes are relocated and we
don't accidentally expose anything. I filed BEAM-4753[3] to this end.
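
(The check itself is conceptually small; roughly the following, written here
as a standalone sketch rather than the actual Gradle task:)

    import java.io.IOException;
    import java.util.jar.JarEntry;
    import java.util.jar.JarFile;

    // Sketch only: fail if the shaded jar contains any class that was not
    // relocated under org/apache/beam/. The real task may need additional
    // allowances (e.g. for META-INF entries).
    public class ShadedJarLeakCheck {
      public static void main(String[] args) throws IOException {
        try (JarFile jar = new JarFile(args[0])) {
          jar.stream()
              .map(JarEntry::getName)
              .filter(name -> name.endsWith(".class"))
              .filter(name -> !name.startsWith("org/apache/beam/"))
              .forEach(name -> {
                throw new IllegalStateException(
                    "Class leaked into shaded jar without repackaging: " + name);
              });
        }
      }
    }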

Note: the following modules still depend on protobuf (currently shaded away)
and could move to use a vendored variant of protobuf:
* sdks/java/core
* sdks/java/extensions/sql

These modules expose protobuf because it is part of their API surface:
* sdks/java/extensions/protobuf
* sdks/java/io/google-cloud-platform (I believe that gRPC could be shaded
here, but preferably the IO module would do it so we wouldn't have this
maintenance burden.)

1: https://github.com/apache/beam/pull/5594
2:
https://stackoverflow.com/questions/1051640/correct-way-to-add-external-jars-lib-jar-to-an-intellij-idea-project
3: https://issues.apache.org/jira/browse/BEAM-4753