We discussed this in person, sounds like my issue is known and will be fixed shortly. I'm running builds with '-Ppublishing' because I need to generate release artifacts for bundling the Beam SQL shell with the Google Cloud SDK. Hope to eventually just use the Beam release, but we are currently cutting a release off master every week to quickly iterate on bug fixes.
Andrew On Wed, Jul 11, 2018 at 1:39 PM Lukasz Cwik <[email protected]> wrote: > Andrew, to my knowledge it seems as though your running into BEAM-4744, is > there a reason you need to specify -Ppublishing? > > No particular reason to using ByteString within ByteKey and TextSource. > Note that we currently do shade away protobuf in sdks/java/core so we could > either migrate to using a vendored version or re-implement the > functionality to not use ByteString. Note that sdks/java/core can now > dependend on the model/* classes and perform the Pipeline -> Proto > translation as this will be needed to support portability efforts so I > would prefer just migrating to use the vendored versions of the code. Filed > BEAM-4766. > > As for the IO module, I was referring to the upstream > bigtable/bigquery/... libraries vended by Google. If they trimmed their API > surface to not expose gRPC or protobuf, then we wouldn't have to worry > about having the shading logic within sdks/java/io/google-cloud-platform. I > know that this will be impossible for some connectors without backwards > incompatible changes since they exposed protobuf on their API surface. I > know that Chamikara was looking to shade this away in the > sdks/java/io/google-cloud-platform but only had limited success in the past. > > On Wed, Jul 11, 2018 at 1:14 PM Ismaël Mejía <[email protected]> wrote: > >> This is great news in particular for runners (Spark) where the leaking of >> some grpc subdependencies caused stability issues and required extra >> shading. Great ! >> >> About the other modules >> >> > Note, these are the following modules that still depend on protobuf >> that are shaded away and could move to use a vendored variant of protobuf: >> > * sdks/java/core >> > * sdks/java/extensions/sql >> >> For sdks/java/core the dependency in protobuf seems to be minor, from a >> quick look it seems that it is only used to import ByteString in two >> classes: ByteKey and TextSource so hopefully we can rewrite both and get >> rid of the dependency altogether (making core smaller which is always a >> win). >> Can we fill a JIRA for this or do I miss other reasons to depend on >> protobuf in core? >> >> For sdks/java/extensions/sql I don’t know if I am missing something, but >> I don’t see any code use of protobuf and I doubt that calcite uses protobuf >> so maybe it is there just because it was leaking from somewhere else in >> Beam, we should better check this first. >> >> > These modules expose protobuf because it is part of the API surface: >> > * sdks/java/extensions/protobuf >> > * sdks/java/io/google-cloud-platform (I believe that gRPC could be >> shaded here but preferrably the IO module would do it so we wouldn't have >> this maintenance burden.) >> >> Can you please elaborate on ‘but preferrably the IO module would do it so >> we wouldn't have this maintenance burden’. I remember there was an issue >> when running the examples in the spark runner examples because of >> sdks/java/io/google-cloud-platform leaking netty via gRPC (BEAM-3519) [Note >> that this is hidden at this moment because of pure luck Spark 2.3.x and >> Beam are aligned on netty version but this can change in the future so >> hopefully this can be shaded/controlled]. >> >> On Wed, Jul 11, 2018 at 8:55 PM Andrew Pilloud <[email protected]> >> wrote: >> >>> This is really cool and should cut down our artifact size significantly! >>> Thanks Luke! >>> >>> I am running into one issue after this: builds with the publishing flag >>> no longer work. (We run './gradlew -Ppublishing shadowJar' to generate >>> release artifacts for the Beam SQL shell.) I get a bunch of errors like >>> this: >>> >>> model/job-management/build/generated/source/proto/main/java/org/apache/beam/model/jobmanagement/v1/JobApi.java:148: >>> error: no suitable method found for >>> readMessage(org.apache.beam.vendor.protobuf.v3.com.google.protobuf.Parser<Pipeline>,ExtensionRegistryLite) >>> >>> Is there something I need to change in my build? >>> >>> Andrew >>> >>> On Tue, Jul 10, 2018 at 2:10 PM Lukasz Cwik <[email protected]> wrote: >>> >>>> With the merge of PR #5594[1], we started shading all gRPC / Protobuf >>>> dependencies within all the modules that depended on the model/* >>>> dependencies by vendoring them. The vendored versions are built and >>>> packaged into the model jars (they should be separated out once I figure >>>> out how to generate proto code using a shaded import path). Note that this >>>> cleaned up several issues where we were incorrectly built shaded jars >>>> without repackaging in some locations or the shading process was corrupting >>>> the contents of some of the jars. >>>> >>>> Note that the majority of the code base (especially related to >>>> portability) should be using imports under the >>>> org.apache.beam.vendor.protobuf.v3 or org.apache.beam.vendor.grpc.v1 paths. >>>> I have yet to figure out a clean way to get Intellij to recognize these >>>> vendored paths. My only solution so far has been to manually add one of the >>>> built model jars to the compile classpath of the module being worked on in >>>> Intellij as described here[2]. I would greatly appreciate some ideas on how >>>> to improve this integration because from a few attempts configuring the >>>> intellij gradle pluglin scope sections didn't produce the result that I was >>>> expecting. >>>> >>>> I also added a simple test task >>>> *validateShadedJarDoesntLeakNonOrgApacheBeamClasses* that validates >>>> the shaded jar doesn't contain classes without repackaging which we should >>>> apply to any module that performs shading to ensure that classes are >>>> relocated and we don't accidentally expose stuff. I filed BEAM-4753[3] to >>>> this end. >>>> >>>> Note, these are the following modules that still depend on protobuf >>>> that are shaded away and could move to use a vendored variant of protobuf: >>>> * sdks/java/core >>>> * sdks/java/extensions/sql >>>> >>>> These modules expose protobuf because it is part of the API surface: >>>> * sdks/java/extensions/protobuf >>>> * sdks/java/io/google-cloud-platform (I believe that gRPC could be >>>> shaded here but preferrably the IO module would do it so we wouldn't have >>>> this maintenance burden.) >>>> >>>> 1: https://github.com/apache/beam/pull/5594 >>>> 2: >>>> https://stackoverflow.com/questions/1051640/correct-way-to-add-external-jars-lib-jar-to-an-intellij-idea-project >>>> 3: https://issues.apache.org/jira/browse/BEAM-4753 >>>> >>>
