Hi, I am late to this, but I am keen to understand more. Specifically, how can we make better use of the thirdparty repo? Taking HBase as an example, it looks like everything that is known to break a lot after an update gets shaded into the hbase-thirdparty artifact: guava, netty, etc. Is the purpose to isolate these naughty dependencies?
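
To check my own understanding of what the shading does, here is a minimal sketch in Java of the package relocation described in the quoted proposal. It only rewrites class names as strings; in a real build the rewriting is done on the jar by maven-shade-plugin. The class name `RelocationSketch` and the method `relocate` are hypothetical names of mine, and the only rule included is the protobuf prefix from the original proposal.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical illustration only: not part of Hadoop or HBase. */
public class RelocationSketch {
    // Relocation rules: original package prefix -> shaded package prefix.
    // The single rule below uses the prefixes from the quoted proposal.
    private static final Map<String, String> RULES = new LinkedHashMap<>();
    static {
        RULES.put("com.google.protobuf.",
                  "org.apache.hadoop.thirdparty.com.google.protobuf.");
    }

    /** Returns the relocated class name, or the input unchanged if no rule matches. */
    public static String relocate(String className) {
        for (Map.Entry<String, String> rule : RULES.entrySet()) {
            if (className.startsWith(rule.getKey())) {
                return rule.getValue() + className.substring(rule.getKey().length());
            }
        }
        return className;
    }

    public static void main(String[] args) {
        // A protobuf class is moved under the thirdparty prefix.
        System.out.println(relocate("com.google.protobuf.Message"));
        // Classes outside the relocated packages are left alone.
        System.out.println(relocate("java.util.List"));
    }
}
```

If I am reading the proposal right, Hadoop code would then import the relocated name, so whichever protobuf-java version a downstream project pulls in no longer clashes with the one Hadoop uses internally.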

On Wed, Oct 9, 2019 at 12:38 PM Vinayakumar B <vinayakum...@apache.org> wrote:

> Hi All,
>
> I have updated the PR as per @Owen O'Malley <owen.omal...@gmail.com>'s
> suggestions.
>
> i. Renamed the module to 'hadoop-shaded-protobuf37'
> ii. Kept the shaded package to 'o.a.h.thirdparty.protobuf37'
>
> Please review!!
>
> Thanks,
> -Vinay
>
> On Sat, Sep 28, 2019 at 10:29 AM 张铎(Duo Zhang) <palomino...@gmail.com>
> wrote:
>
> > For HBase we have a separated repo for hbase-thirdparty
> >
> > https://github.com/apache/hbase-thirdparty
> >
> > We will publish the artifacts to nexus so we do not need to include
> > binaries in our git repo, just add a dependency in the pom.
> >
> > https://mvnrepository.com/artifact/org.apache.hbase.thirdparty/hbase-shaded-protobuf
> >
> > And it has its own release cycles, only when there are special requirements
> > or we want to upgrade some of the dependencies. This is the vote thread for
> > the newest release, where we want to provide a shaded gson for jdk7.
> >
> > https://lists.apache.org/thread.html/f12c589baabbc79c7fb2843422d4590bea982cd102e2bd9d21e9884b@%3Cdev.hbase.apache.org%3E
> >
> > Thanks.
> >
> > On Sat, Sep 28, 2019 at 1:28 AM Vinayakumar B <vinayakum...@apache.org>
> > wrote:
> >
> > > Please find replies inline.
> > >
> > > -Vinay
> > >
> > > On Fri, Sep 27, 2019 at 10:21 PM Owen O'Malley <owen.omal...@gmail.com>
> > > wrote:
> > >
> > > > I'm very unhappy with this direction. In particular, I don't think git is
> > > > a good place for distribution of binary artifacts. Furthermore, the PMC
> > > > shouldn't be releasing anything without a release vote.
> > >
> > > Proposed solution doesnt release any binaries in git. Its actually a
> > > complete sub-project which follows entire release process, including VOTE
> > > in public. I have mentioned already that release process is similar to
> > > hadoop.
> > > To be specific, using the (almost) same script used in hadoop to generate
> > > artifacts, sign and deploy to staging repository. Please let me know If I
> > > am conveying anything wrong.
> > >
> > > > I'd propose that we make a third party module that contains the *source*
> > > > of the pom files to build the relocated jars. This should absolutely be
> > > > treated as a last resort for the mostly Google projects that regularly
> > > > break binary compatibility (eg. Protobuf & Guava).
> > >
> > > Same has been implemented in the PR
> > > https://github.com/apache/hadoop-thirdparty/pull/1. Please check and let me
> > > know If I misunderstood. Yes, this is the last option we have AFAIK.
> > >
> > > > In terms of naming, I'd propose something like:
> > > >
> > > > org.apache.hadoop.thirdparty.protobuf2_5
> > > > org.apache.hadoop.thirdparty.guava28
> > > >
> > > > In particular, I think we absolutely need to include the version of the
> > > > underlying project. On the other hand, since we should not be shading
> > > > *everything* we can drop the leading com.google.
> > >
> > > IMO, This naming convention is easy for identifying the underlying project,
> > > but it will be difficult to maintain going forward if underlying project
> > > versions changes. Since thirdparty module have its own releases, each of
> > > those release can be mapped to specific version of underlying project. Even
> > > the binary artifact can include a MANIFEST with underlying project details
> > > as per Steve's suggestion on HADOOP-13363.
> > > That said, if you still prefer to have project number in artifact id, it
> > > can be done.
> > >
> > > > The Hadoop project can make releases of the thirdparty module:
> > > >
> > > > <dependency>
> > > >   <groupId>org.apache.hadoop</groupId>
> > > >   <artifactId>hadoop-thirdparty-protobuf25</artifactId>
> > > >   <version>1.0</version>
> > > > </dependency>
> > > >
> > > > Note that the version has to be the hadoop thirdparty release number, which
> > > > is part of why you need to have the underlying version in the artifact
> > > > name. These we can push to maven central as new releases from Hadoop.
> > >
> > > Exactly, same has been implemented in the PR. hadoop-thirdparty module have
> > > its own releases. But in HADOOP Jira, thirdparty versions can be
> > > differentiated using prefix "thirdparty-".
> > >
> > > Same solution is being followed in HBase. May be people involved in HBase
> > > can add some points here.
> > >
> > > Thoughts?
> > >
> > > > .. Owen
> > > >
> > > > On Fri, Sep 27, 2019 at 8:38 AM Vinayakumar B <vinayakum...@apache.org>
> > > > wrote:
> > > >
> > > >> Hi All,
> > > >>
> > > >> I wanted to discuss about the separate repo for thirdparty dependencies
> > > >> which we need to shaded and include in Hadoop component's jars.
> > > >>
> > > >> Apologies for the big text ahead, but this needs clear explanation!!
> > > >>
> > > >> Right now most needed such dependency is protobuf. Protobuf dependency
> > > >> was not upgraded from 2.5.0 onwards with the fear that downstream builds,
> > > >> which depends on transitive dependency protobuf coming from hadoop's jars,
> > > >> may fail with the upgrade. Apparently protobuf does not guarantee source
> > > >> compatibility, though it guarantees wire compatibility between versions.
> > > >> Because of this behavior, version upgrade may cause breakage in known and
> > > >> unknown (private?) downstreams.
> > > >>
> > > >> So to tackle this, we came up the following proposal in HADOOP-13363.
> > > >>
> > > >> Luckily, As far as I know, no APIs, either public to user or between
> > > >> Hadoop processes, is not directly using protobuf classes in signatures.
> > > >> (If any exist, please let us know).
> > > >>
> > > >> Proposal:
> > > >> ------------
> > > >>
> > > >> 1. Create a artifact(s) which contains shaded dependencies. All such
> > > >>    shading/relocation will be with known prefix
> > > >>    **org.apache.hadoop.thirdparty.**.
> > > >> 2. Right now protobuf jar (ex: o.a.h.thirdparty:hadoop-shaded-protobuf)
> > > >>    to start with, all **com.google.protobuf** classes will be relocated as
> > > >>    **org.apache.hadoop.thirdparty.com.google.protobuf**.
> > > >> 3. Hadoop modules, which needs protobuf as dependency, will add this
> > > >>    shaded artifact as dependency (ex:
> > > >>    o.a.h.thirdparty:hadoop-shaded-protobuf).
> > > >> 4. All previous usages of "com.google.protobuf" will be relocated to
> > > >>    "org.apache.hadoop.thirdparty.com.google.protobuf" in the code and will
> > > >>    be committed. Please note, this replacement is One-Time directly in
> > > >>    source code, NOT during compile and package.
> > > >> 5. Once all usages of "com.google.protobuf" is relocated, then hadoop
> > > >>    dont care about which version of original "protobuf-java" is in
> > > >>    dependency.
> > > >> 6. Just keep "protobuf-java:2.5.0" in dependency tree not to break the
> > > >>    downstreams. But hadoop will be originally using the latest protobuf
> > > >>    present in "o.a.h.thirdparty:hadoop-shaded-protobuf".
> > > >>
> > > >> 7. Coming back to separate repo, Following are most appropriate reasons
> > > >>    of keeping shaded dependency artifact in separate repo instead of
> > > >>    submodule.
> > > >>
> > > >>    7a. These artifacts need not be built all the time. It needs to be
> > > >>        built only when there is a change in the dependency version or the
> > > >>        build process.
> > > >>    7b. If added as "submodule in Hadoop repo", maven-shade-plugin:shade
> > > >>        will execute only in package phase. That means, "mvn compile" or
> > > >>        "mvn test-compile" will not be failed as this artifact will not
> > > >>        have relocated classes, instead it will have original classes,
> > > >>        resulting in compilation failure. Workaround, build thirdparty
> > > >>        submodule first and exclude "thirdparty" submodule in other
> > > >>        executions. This will be a complex process compared to keeping in
> > > >>        a separate repo.
> > > >>
> > > >>    7c. Separate repo, will be a subproject of Hadoop, using the same
> > > >>        HADOOP jira project, with different versioning prefixed with
> > > >>        "thirdparty-" (ex: thirdparty-1.0.0).
> > > >>    7d. Separate will have same release process as Hadoop.
> > > >>
> > > >> HADOOP-13363 (https://issues.apache.org/jira/browse/HADOOP-13363) is an
> > > >> umbrella jira tracking the changes to protobuf upgrade.
> > > >>
> > > >> PR (https://github.com/apache/hadoop-thirdparty/pull/1) has been raised
> > > >> for separate repo creation in (HADOOP-16595 (
> > > >> https://issues.apache.org/jira/browse/HADOOP-16595)
> > > >>
> > > >> Please provide your inputs for the proposal and review the PR to
> > > >> proceed with the proposal.
> > > >>
> > > >> -Thanks,
> > > >> Vinay
> > > >>
> > > >> On Fri, Sep 27, 2019 at 11:54 AM Vinod Kumar Vavilapalli <
> > > >> vino...@apache.org> wrote:
> > > >>
> > > >> > Moving the thread to the dev lists.
> > > >> >
> > > >> > Thanks
> > > >> > +Vinod
> > > >> >
> > > >> > > On Sep 23, 2019, at 11:43 PM, Vinayakumar B <vinayakum...@apache.org>
> > > >> > > wrote:
> > > >> > >
> > > >> > > Thanks Marton,
> > > >> > >
> > > >> > > Current created 'hadoop-thirdparty' repo is empty right now.
> > > >> > > Whether to use that repo for shaded artifact or not will be monitored in
> > > >> > > HADOOP-13363 umbrella jira.
> > > >> > > Please feel free to join the discussion.
> > > >> > >
> > > >> > > There is no existing codebase is being moved out of hadoop repo. So I
> > > >> > > think right now we are good to go.
> > > >> > >
> > > >> > > -Vinay
> > > >> > >
> > > >> > > On Mon, Sep 23, 2019 at 11:38 PM Marton Elek <e...@apache.org> wrote:
> > > >> > >
> > > >> > >> I am not sure if it's defined when is a vote required.
> > > >> > >>
> > > >> > >> https://www.apache.org/foundation/voting.html
> > > >> > >>
> > > >> > >> Personally I think it's a big enough change to send a notification to
> > > >> > >> the dev lists with a 'lazy consensus' closure
> > > >> > >>
> > > >> > >> Marton
> > > >> > >>
> > > >> > >> On 2019/09/23 17:46:37, Vinayakumar B <vinayakum...@apache.org> wrote:
> > > >> > >>> Hi,
> > > >> > >>>
> > > >> > >>> As discussed in HADOOP-13363, protobuf 3.x jar (and may be more in
> > > >> > >>> future) will be kept as a shaded artifact in a separate repo, which
> > > >> > >>> will be referred as dependency in hadoop modules. This approach
> > > >> > >>> avoids shading of every submodule during build.
> > > >> > >>>
> > > >> > >>> So question is does any VOTE required before asking to create a git
> > > >> > >>> repo?
> > > >> > >>>
> > > >> > >>> On selfserve platform https://gitbox.apache.org/setup/newrepo.html
> > > >> > >>> I can access see that, requester should be PMC.
> > > >> > >>>
> > > >> > >>> Wanted to confirm here first.
> > > >> > >>>
> > > >> > >>> -Vinay
> > > >> > >>>
> > > >> > >>
> > > >> > >> ---------------------------------------------------------------------
> > > >> > >> To unsubscribe, e-mail: private-unsubscr...@hadoop.apache.org
> > > >> > >> For additional commands, e-mail: private-h...@hadoop.apache.org