We discussed this in person, sounds like my issue is known and will be
fixed shortly. I'm running builds with '-Ppublishing' because I need to
generate release artifacts for bundling the Beam SQL shell with the Google
Cloud SDK. Hope to eventually just use the Beam release, but we are
currently cutting a release off master every week to quickly iterate on bug
fixes.

Andrew

On Wed, Jul 11, 2018 at 1:39 PM Lukasz Cwik <lc...@google.com> wrote:

> Andrew, to my knowledge it seems as though your running into BEAM-4744, is
> there a reason you need to specify -Ppublishing?
>
> No particular reason to using ByteString within ByteKey and TextSource.
> Note that we currently do shade away protobuf in sdks/java/core so we could
> either migrate to using a vendored version or re-implement the
> functionality to not use ByteString. Note that sdks/java/core can now
> dependend on the model/* classes and perform the Pipeline -> Proto
> translation as this will be needed to support portability efforts so I
> would prefer just migrating to use the vendored versions of the code. Filed
> BEAM-4766.
>
> As for the IO module, I was referring to the upstream
> bigtable/bigquery/... libraries vended by Google. If they trimmed their API
> surface to not expose gRPC or protobuf, then we wouldn't have to worry
> about having the shading logic within sdks/java/io/google-cloud-platform. I
> know that this will be impossible for some connectors without backwards
> incompatible changes since they exposed protobuf on their API surface. I
> know that Chamikara was looking to shade this away in the
> sdks/java/io/google-cloud-platform but only had limited success in the past.
>
> On Wed, Jul 11, 2018 at 1:14 PM Ismaël Mejía <ieme...@gmail.com> wrote:
>
>> This is great news in particular for runners (Spark) where the leaking of
>> some grpc subdependencies caused stability issues and required extra
>> shading. Great !
>>
>> About the other modules
>>
>> > Note, these are the following modules that still depend on protobuf
>> that are shaded away and could move to use a vendored variant of protobuf:
>> > * sdks/java/core
>> > * sdks/java/extensions/sql
>>
>> For sdks/java/core the dependency in protobuf seems to be minor, from a
>> quick look it seems that it is only used to import ByteString in two
>> classes: ByteKey and TextSource so hopefully we can rewrite both and get
>> rid of the dependency altogether (making core smaller which is always a
>> win).
>> Can we fill a JIRA for this or do I miss other reasons to depend on
>> protobuf in core?
>>
>> For sdks/java/extensions/sql I don’t know if I am missing something, but
>> I don’t see any code use of protobuf and I doubt that calcite uses protobuf
>> so maybe it is there just because it was leaking from somewhere else in
>> Beam, we should better check this first.
>>
>> > These modules expose protobuf because it is part of the API surface:
>> > * sdks/java/extensions/protobuf
>> > * sdks/java/io/google-cloud-platform (I believe that gRPC could be
>> shaded here but preferrably the IO module would do it so we wouldn't have
>> this maintenance burden.)
>>
>> Can you please elaborate on ‘but preferrably the IO module would do it so
>> we wouldn't have this maintenance burden’. I remember there was an issue
>> when running the examples in the spark runner examples because of
>> sdks/java/io/google-cloud-platform leaking netty via gRPC (BEAM-3519) [Note
>> that this is hidden at this moment because of pure luck Spark 2.3.x and
>> Beam are aligned on netty version but this can change in the future so
>> hopefully this can be shaded/controlled].
>>
>> On Wed, Jul 11, 2018 at 8:55 PM Andrew Pilloud <apill...@google.com>
>> wrote:
>>
>>> This is really cool and should cut down our artifact size significantly!
>>> Thanks Luke!
>>>
>>> I am running into one issue after this: builds with the publishing flag
>>> no longer work. (We run './gradlew -Ppublishing shadowJar' to generate
>>> release artifacts for the Beam SQL shell.) I get a bunch of errors like
>>> this:
>>>
>>> model/job-management/build/generated/source/proto/main/java/org/apache/beam/model/jobmanagement/v1/JobApi.java:148:
>>> error: no suitable method found for
>>> readMessage(org.apache.beam.vendor.protobuf.v3.com.google.protobuf.Parser<Pipeline>,ExtensionRegistryLite)
>>>
>>> Is there something I need to change in my build?
>>>
>>> Andrew
>>>
>>> On Tue, Jul 10, 2018 at 2:10 PM Lukasz Cwik <lc...@google.com> wrote:
>>>
>>>> With the merge of PR #5594[1], we started shading all gRPC / Protobuf
>>>> dependencies within all the modules that depended on the model/*
>>>> dependencies by vendoring them. The vendored versions are built and
>>>> packaged into the model jars (they should be separated out once I figure
>>>> out how to generate proto code using a shaded import path). Note that this
>>>> cleaned up several issues where we were incorrectly built shaded jars
>>>> without repackaging in some locations or the shading process was corrupting
>>>> the contents of some of the jars.
>>>>
>>>> Note that the majority of the code base (especially related to
>>>> portability) should be using imports under the
>>>> org.apache.beam.vendor.protobuf.v3 or org.apache.beam.vendor.grpc.v1 paths.
>>>> I have yet to figure out a clean way to get Intellij to recognize these
>>>> vendored paths. My only solution so far has been to manually add one of the
>>>> built model jars to the compile classpath of the module being worked on in
>>>> Intellij as described here[2]. I would greatly appreciate some ideas on how
>>>> to improve this integration because from a few attempts configuring the
>>>> intellij gradle pluglin scope sections didn't produce the result that I was
>>>> expecting.
>>>>
>>>> I also added a simple test task
>>>> *validateShadedJarDoesntLeakNonOrgApacheBeamClasses* that validates
>>>> the shaded jar doesn't contain classes without repackaging which we should
>>>> apply to any module that performs shading to ensure that classes are
>>>> relocated and we don't accidentally expose stuff. I filed BEAM-4753[3] to
>>>> this end.
>>>>
>>>> Note, these are the following modules that still depend on protobuf
>>>> that are shaded away and could move to use a vendored variant of protobuf:
>>>> * sdks/java/core
>>>> * sdks/java/extensions/sql
>>>>
>>>> These modules expose protobuf because it is part of the API surface:
>>>> * sdks/java/extensions/protobuf
>>>> * sdks/java/io/google-cloud-platform (I believe that gRPC could be
>>>> shaded here but preferrably the IO module would do it so we wouldn't have
>>>> this maintenance burden.)
>>>>
>>>> 1: https://github.com/apache/beam/pull/5594
>>>> 2:
>>>> https://stackoverflow.com/questions/1051640/correct-way-to-add-external-jars-lib-jar-to-an-intellij-idea-project
>>>> 3: https://issues.apache.org/jira/browse/BEAM-4753
>>>>
>>>

Reply via email to