The PR has been merged. Thanks, everyone, for the discussion.
-Vinay

On Thu, Jan 9, 2020 at 4:47 PM Ayush Saxena <ayush...@gmail.com> wrote:

> Hi All,
> FYI:
> We will be going ahead with the present approach and will merge by tomorrow
> EOD, considering no one has objections.
> Thanks, everyone!
>
> -Ayush
>
> On 07-Jan-2020, at 9:22 PM, Brahma Reddy Battula <bra...@apache.org> wrote:
> >
> > Hi Sree Vaddi, Owen, stack, Duo Zhang,
> >
> > We can move forward based on your comments; we are just waiting for your
> > reply. Hope all of your comments have been answered (unification we can
> > take up in a parallel thread, as Vinay mentioned).
> >
> > On Mon, 6 Jan 2020 at 6:21 PM, Vinayakumar B <vinayakum...@apache.org>
> > wrote:
> >
> >> Hi Sree,
> >>
> >>> apache/hadoop-thirdparty, How would it fit into ASF ? As an Incubating
> >>> Project ? Or as a TLP ?
> >>> Or as a new project definition ?
> >>
> >> As already mentioned by Ayush, this will be a subproject of Hadoop.
> >> Releases will be voted on by the Hadoop PMC as per the ASF process.
> >>
> >>> The effort to streamline and put in an accepted standard for the
> >>> dependencies that require shading,
> >>> seems beyond the siloed efforts of hadoop, hbase, etc....
> >>
> >>> I propose, we bring all the decision makers from all these artifacts in
> >>> one room and decide best course of action.
> >>> I am looking at, no projects should ever have to shade any artifacts
> >>> except as an absolutely necessary alternative.
> >>
> >> This is the ideal proposal for any project. But unfortunately some
> >> projects take their own course based on need.
> >>
> >> In the current case of protobuf in Hadoop, the protobuf upgrade from
> >> 2.5.0 (which is already EOL) was not taken up, to avoid downstream
> >> failures. Since Hadoop is a platform, its dependencies get added to
> >> downstream projects' classpath. So any change in Hadoop's dependencies
> >> will directly affect downstreams. Hadoop strictly follows backward
> >> compatibility as far as possible.
> >> Though protobuf provides wire compatibility between versions, it doesn't
> >> provide compatibility for generated sources.
> >> Now, to support ARM, a protobuf upgrade is mandatory. Using the shading
> >> technique, Hadoop can internally upgrade to a shaded protobuf 3.x and
> >> still keep protobuf 2.5.0 (deprecated) for downstreams.
> >>
> >> This shading is necessary to have both versions of protobuf supported
> >> (2.5.0 (non-shaded) for downstreams' classpath and 3.x (shaded) for
> >> Hadoop's internal usage).
> >> And this entire work is to be done before the 3.3.0 release.
> >>
> >> So, though it's ideal to have a common approach for all projects, I
> >> suggest that for Hadoop we go ahead as per the current approach.
> >> We can also start a parallel effort to address these problems in a
> >> separate discussion/proposal. Once a solution is available we can
> >> revisit and adopt the new solution accordingly in all such projects
> >> (ex: HBase, Hadoop, Ratis).
> >>
> >> -Vinay
> >>
> >>> On Mon, Jan 6, 2020 at 12:39 AM Ayush Saxena <ayush...@gmail.com> wrote:
> >>>
> >>> Hey Sree
> >>>
> >>>> apache/hadoop-thirdparty, How would it fit into ASF ? As an Incubating
> >>>> Project ? Or as a TLP ?
> >>>> Or as a new project definition ?
> >>>>
> >>> A sub project of Apache Hadoop, having its own independent release
> >>> cycles. Maybe you can put this into the same column as ozone or as
> >>> submarine (a couple of months ago).
> >>>
> >>> Unifying for all seems interesting, but each project is independent and
> >>> has its own limitations and way of thinking. I don't think it would be
> >>> an easy task to bring everyone to the same table and get them to agree
> >>> on a common approach.
> >>>
> >>> I guess this has been under discussion for quite a long time, and no
> >>> other alternative has been suggested. Still, we can hold off for a
> >>> week; if someone comes up with a better solution, fine, else we can
> >>> continue in the present direction.
> >>>
> >>> -Ayush
> >>>
> >>> On Sun, 5 Jan 2020 at 05:03, Sree Vaddi <sree_at_ch...@yahoo.com.invalid>
> >>> wrote:
> >>>
> >>>> apache/hadoop-thirdparty, How would it fit into ASF ? As an Incubating
> >>>> Project ? Or as a TLP ?
> >>>> Or as a new project definition ?
> >>>>
> >>>> The effort to streamline and put in an accepted standard for the
> >>>> dependencies that require shading, seems beyond the siloed efforts of
> >>>> hadoop, hbase, etc....
> >>>>
> >>>> I propose, we bring all the decision makers from all these artifacts in
> >>>> one room and decide best course of action. I am looking at, no projects
> >>>> should ever have to shade any artifacts except as an absolutely
> >>>> necessary alternative.
> >>>>
> >>>> Thank you.
> >>>> /Sree
> >>>>
> >>>> On Saturday, January 4, 2020, 7:49:18 AM PST, Vinayakumar B
> >>>> <vinayakum...@apache.org> wrote:
> >>>>
> >>>> Hi,
> >>>> Sorry for the late reply.
> >>>>>>> To be exact, how can we better use the thirdparty repo? Looking at
> >>>>>>> HBase as an example, it looks like everything that is known to break
> >>>>>>> a lot after an update gets shaded into the hbase-thirdparty artifact:
> >>>>>>> guava, netty, ... etc.
> >>>>>>> Is it the purpose to isolate these naughty dependencies?
> >>>> Yes, shading is to isolate these naughty dependencies from the
> >>>> downstream classpath and to have independent control over these
> >>>> upgrades without breaking downstreams.
> >>>>
> >>>> The first PR, https://github.com/apache/hadoop-thirdparty/pull/1, to
> >>>> create the protobuf shaded jar is ready to merge.
> >>>>
> >>>> Please take a look if anyone is interested; it will be merged in
> >>>> perhaps two days if there are no objections.
> >>>>
> >>>> -Vinay
> >>>>
> >>>> On Thu, Oct 10, 2019 at 3:30 AM Wei-Chiu Chuang <weic...@apache.org>
> >>>> wrote:
> >>>>
> >>>>> Hi, I am late to this but I am keen to understand more.
> >>>>>
> >>>>> To be exact, how can we better use the thirdparty repo? Looking at
> >>>>> HBase as an example, it looks like everything that is known to break a
> >>>>> lot after an update gets shaded into the hbase-thirdparty artifact:
> >>>>> guava, netty, ... etc.
> >>>>> Is it the purpose to isolate these naughty dependencies?
> >>>>>
> >>>>> On Wed, Oct 9, 2019 at 12:38 PM Vinayakumar B <vinayakum...@apache.org>
> >>>>> wrote:
> >>>>>
> >>>>>> Hi All,
> >>>>>>
> >>>>>> I have updated the PR as per @Owen O'Malley <owen.omal...@gmail.com>'s
> >>>>>> suggestions.
> >>>>>>
> >>>>>> i. Renamed the module to 'hadoop-shaded-protobuf37'
> >>>>>> ii. Kept the shaded package as 'o.a.h.thirdparty.protobuf37'
> >>>>>>
> >>>>>> Please review!!
> >>>>>>
> >>>>>> Thanks,
> >>>>>> -Vinay
> >>>>>>
> >>>>>> On Sat, Sep 28, 2019 at 10:29 AM 张铎(Duo Zhang) <palomino...@gmail.com>
> >>>>>> wrote:
> >>>>>>
> >>>>>>> For HBase we have a separate repo for hbase-thirdparty:
> >>>>>>>
> >>>>>>> https://github.com/apache/hbase-thirdparty
> >>>>>>>
> >>>>>>> We will publish the artifacts to nexus so we do not need to include
> >>>>>>> binaries in our git repo, just add a dependency in the pom.
> >>>>>>>
> >>>>>>> https://mvnrepository.com/artifact/org.apache.hbase.thirdparty/hbase-shaded-protobuf
> >>>>>>>
> >>>>>>> And it has its own release cycles, only when there are special
> >>>>>>> requirements or we want to upgrade some of the dependencies. This is
> >>>>>>> the vote thread for the newest release, where we want to provide a
> >>>>>>> shaded gson for jdk7.
> >>>>>>>
> >>>>>>> https://lists.apache.org/thread.html/f12c589baabbc79c7fb2843422d4590bea982cd102e2bd9d21e9884b@%3Cdev.hbase.apache.org%3E
> >>>>>>>
> >>>>>>> Thanks.
> >>>>>>>
> >>>>>>> Vinayakumar B <vinayakum...@apache.org> wrote on Sat, Sep 28, 2019 at
> >>>>>>> 1:28 AM:
> >>>>>>>
> >>>>>>>> Please find replies inline.
> >>>>>>>>
> >>>>>>>> -Vinay
> >>>>>>>>
> >>>>>>>> On Fri, Sep 27, 2019 at 10:21 PM Owen O'Malley
> >>>>>>>> <owen.omal...@gmail.com> wrote:
> >>>>>>>>
> >>>>>>>>> I'm very unhappy with this direction. In particular, I don't think
> >>>>>>>>> git is a good place for distribution of binary artifacts.
> >>>>>>>>> Furthermore, the PMC shouldn't be releasing anything without a
> >>>>>>>>> release vote.
> >>>>>>>>>
> >>>>>>>> The proposed solution doesn't release any binaries in git. It's
> >>>>>>>> actually a complete sub-project which follows the entire release
> >>>>>>>> process, including a VOTE in public. I have mentioned already that
> >>>>>>>> the release process is similar to Hadoop's.
> >>>>>>>> To be specific, it uses (almost) the same script used in Hadoop to
> >>>>>>>> generate artifacts, sign them, and deploy to the staging repository.
> >>>>>>>> Please let me know if I am conveying anything wrong.
> >>>>>>>>
> >>>>>>>>> I'd propose that we make a third party module that contains the
> >>>>>>>>> *source* of the pom files to build the relocated jars. This should
> >>>>>>>>> absolutely be treated as a last resort for the mostly Google
> >>>>>>>>> projects that regularly break binary compatibility (eg. Protobuf &
> >>>>>>>>> Guava).
> >>>>>>>>>
> >>>>>>>> The same has been implemented in the PR
> >>>>>>>> https://github.com/apache/hadoop-thirdparty/pull/1. Please check and
> >>>>>>>> let me know if I misunderstood. Yes, this is the last option we have,
> >>>>>>>> AFAIK.
> >>>>>>>>
> >>>>>>>>> In terms of naming, I'd propose something like:
> >>>>>>>>>
> >>>>>>>>> org.apache.hadoop.thirdparty.protobuf2_5
> >>>>>>>>> org.apache.hadoop.thirdparty.guava28
> >>>>>>>>>
> >>>>>>>>> In particular, I think we absolutely need to include the version of
> >>>>>>>>> the underlying project. On the other hand, since we should not be
> >>>>>>>>> shading *everything* we can drop the leading com.google.
> >>>>>>>>>
> >>>>>>>> IMO, this naming convention makes it easy to identify the underlying
> >>>>>>>> project, but it will be difficult to maintain going forward if the
> >>>>>>>> underlying project's version changes. Since the thirdparty module has
> >>>>>>>> its own releases, each of those releases can be mapped to a specific
> >>>>>>>> version of the underlying project. The binary artifact can even
> >>>>>>>> include a MANIFEST with the underlying project's details, as per
> >>>>>>>> Steve's suggestion on HADOOP-13363.
> >>>>>>>> That said, if you still prefer to have the project's version number
> >>>>>>>> in the artifact id, it can be done.
> >>>>>>>>
> >>>>>>>>> The Hadoop project can make releases of the thirdparty module:
> >>>>>>>>>
> >>>>>>>>> <dependency>
> >>>>>>>>>   <groupId>org.apache.hadoop</groupId>
> >>>>>>>>>   <artifactId>hadoop-thirdparty-protobuf25</artifactId>
> >>>>>>>>>   <version>1.0</version>
> >>>>>>>>> </dependency>
> >>>>>>>>>
> >>>>>>>>> Note that the version has to be the hadoop thirdparty release number,
> >>>>>>>>> which is part of why you need to have the underlying version in the
> >>>>>>>>> artifact name. These we can push to maven central as new releases
> >>>>>>>>> from Hadoop.
> >>>>>>>>>
> >>>>>>>> Exactly, the same has been implemented in the PR. The hadoop-thirdparty
> >>>>>>>> module has its own releases.
> >>>>>>>> But in the HADOOP Jira, thirdparty versions can be differentiated
> >>>>>>>> using the prefix "thirdparty-".
> >>>>>>>>
> >>>>>>>> The same solution is being followed in HBase. Maybe people involved
> >>>>>>>> in HBase can add some points here.
> >>>>>>>>
> >>>>>>>>> Thoughts?
> >>>>>>>>>
> >>>>>>>>> .. Owen
> >>>>>>>>>
> >>>>>>>>> On Fri, Sep 27, 2019 at 8:38 AM Vinayakumar B
> >>>>>>>>> <vinayakum...@apache.org> wrote:
> >>>>>>>>>
> >>>>>>>>>> Hi All,
> >>>>>>>>>>
> >>>>>>>>>> I wanted to discuss the separate repo for thirdparty dependencies
> >>>>>>>>>> which we need to shade and include in Hadoop components' jars.
> >>>>>>>>>>
> >>>>>>>>>> Apologies for the big text ahead, but this needs clear explanation!!
> >>>>>>>>>>
> >>>>>>>>>> Right now the most needed such dependency is protobuf. The protobuf
> >>>>>>>>>> dependency was not upgraded from 2.5.0 onwards for fear that
> >>>>>>>>>> downstream builds, which depend on the transitive protobuf
> >>>>>>>>>> dependency coming from Hadoop's jars, may fail with the upgrade.
> >>>>>>>>>> Apparently protobuf does not guarantee source compatibility, though
> >>>>>>>>>> it guarantees wire compatibility between versions. Because of this
> >>>>>>>>>> behavior, a version upgrade may cause breakage in known and unknown
> >>>>>>>>>> (private?) downstreams.
> >>>>>>>>>>
> >>>>>>>>>> So to tackle this, we came up with the following proposal in
> >>>>>>>>>> HADOOP-13363.
> >>>>>>>>>>
> >>>>>>>>>> Luckily, as far as I know, no APIs, either public to users or
> >>>>>>>>>> between Hadoop processes, directly use protobuf classes in their
> >>>>>>>>>> signatures. (If any exist, please let us know.)
> >>>>>>>>>>
> >>>>>>>>>> Proposal:
> >>>>>>>>>> ------------
> >>>>>>>>>>
> >>>>>>>>>> 1. Create artifact(s) which contain shaded dependencies.
> >>>>>>>>>>    All such shading/relocation will be with the known prefix
> >>>>>>>>>>    **org.apache.hadoop.thirdparty.**.
> >>>>>>>>>> 2. To start with, for the protobuf jar (ex:
> >>>>>>>>>>    o.a.h.thirdparty:hadoop-shaded-protobuf), all
> >>>>>>>>>>    **com.google.protobuf** classes will be relocated as
> >>>>>>>>>>    **org.apache.hadoop.thirdparty.com.google.protobuf**.
> >>>>>>>>>> 3. Hadoop modules which need protobuf as a dependency will add this
> >>>>>>>>>>    shaded artifact as a dependency (ex:
> >>>>>>>>>>    o.a.h.thirdparty:hadoop-shaded-protobuf).
> >>>>>>>>>> 4. All previous usages of "com.google.protobuf" will be relocated to
> >>>>>>>>>>    "org.apache.hadoop.thirdparty.com.google.protobuf" in the code and
> >>>>>>>>>>    will be committed. Please note, this replacement is a one-time
> >>>>>>>>>>    change directly in source code, NOT during compile and package.
> >>>>>>>>>> 5. Once all usages of "com.google.protobuf" are relocated, Hadoop
> >>>>>>>>>>    doesn't care which version of the original "protobuf-java" is in
> >>>>>>>>>>    the dependency tree.
> >>>>>>>>>> 6. Just keep "protobuf-java:2.5.0" in the dependency tree so as not
> >>>>>>>>>>    to break the downstreams. But Hadoop will actually be using the
> >>>>>>>>>>    latest protobuf present in
> >>>>>>>>>>    "o.a.h.thirdparty:hadoop-shaded-protobuf".
> >>>>>>>>>>
> >>>>>>>>>> 7. Coming back to the separate repo, the following are the main
> >>>>>>>>>>    reasons for keeping the shaded dependency artifact in a separate
> >>>>>>>>>>    repo instead of a submodule.
> >>>>>>>>>>
> >>>>>>>>>>    7a. These artifacts need not be built all the time. They need to
> >>>>>>>>>>        be built only when there is a change in the dependency version
> >>>>>>>>>>        or the build process.
> >>>>>>>>>>    7b.
> >>>>>>>>>>        If added as a "submodule in the Hadoop repo",
> >>>>>>>>>>        maven-shade-plugin:shade will execute only in the package
> >>>>>>>>>>        phase. That means "mvn compile" or "mvn test-compile" will
> >>>>>>>>>>        fail, as this artifact will not have relocated classes;
> >>>>>>>>>>        instead it will have the original classes, resulting in
> >>>>>>>>>>        compilation failure. The workaround is to build the thirdparty
> >>>>>>>>>>        submodule first and exclude the "thirdparty" submodule in
> >>>>>>>>>>        other executions. This will be a complex process compared to
> >>>>>>>>>>        keeping it in a separate repo.
> >>>>>>>>>>
> >>>>>>>>>>    7c. The separate repo will be a subproject of Hadoop, using the
> >>>>>>>>>>        same HADOOP jira project, with different versioning prefixed
> >>>>>>>>>>        with "thirdparty-" (ex: thirdparty-1.0.0).
> >>>>>>>>>>    7d. The separate repo will have the same release process as Hadoop.
> >>>>>>>>>>
> >>>>>>>>>> HADOOP-13363 (https://issues.apache.org/jira/browse/HADOOP-13363) is
> >>>>>>>>>> an umbrella jira tracking the changes for the protobuf upgrade.
> >>>>>>>>>>
> >>>>>>>>>> A PR (https://github.com/apache/hadoop-thirdparty/pull/1) has been
> >>>>>>>>>> raised for the separate repo creation in HADOOP-16595
> >>>>>>>>>> (https://issues.apache.org/jira/browse/HADOOP-16595).
> >>>>>>>>>>
> >>>>>>>>>> Please provide your inputs on the proposal and review the PR so we
> >>>>>>>>>> can proceed.
> >>>>>>>>>>
> >>>>>>>>>> -Thanks,
> >>>>>>>>>> Vinay
> >>>>>>>>>>
> >>>>>>>>>> On Fri, Sep 27, 2019 at 11:54 AM Vinod Kumar Vavilapalli
> >>>>>>>>>> <vino...@apache.org> wrote:
> >>>>>>>>>>
> >>>>>>>>>>> Moving the thread to the dev lists.
> >>>>>>>>>>>
> >>>>>>>>>>> Thanks
> >>>>>>>>>>> +Vinod
> >>>>>>>>>>>
> >>>>>>>>>>>> On Sep 23, 2019, at 11:43 PM, Vinayakumar B
> >>>>>>>>>>>> <vinayakum...@apache.org> wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>> Thanks Marton,
> >>>>>>>>>>>>
> >>>>>>>>>>>> The newly created 'hadoop-thirdparty' repo is empty right now.
> >>>>>>>>>>>> Whether to use that repo for the shaded artifact or not will be
> >>>>>>>>>>>> monitored in the HADOOP-13363 umbrella jira. Please feel free to
> >>>>>>>>>>>> join the discussion.
> >>>>>>>>>>>>
> >>>>>>>>>>>> No existing codebase is being moved out of the hadoop repo, so I
> >>>>>>>>>>>> think right now we are good to go.
> >>>>>>>>>>>>
> >>>>>>>>>>>> -Vinay
> >>>>>>>>>>>>
> >>>>>>>>>>>> On Mon, Sep 23, 2019 at 11:38 PM Marton Elek <e...@apache.org>
> >>>>>>>>>>>> wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>>> I am not sure if it's defined when a vote is required.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> https://www.apache.org/foundation/voting.html
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Personally I think it's a big enough change to send a notification
> >>>>>>>>>>>>> to the dev lists with a 'lazy consensus' closure.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Marton
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> On 2019/09/23 17:46:37, Vinayakumar B <vinayakum...@apache.org>
> >>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>> Hi,
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> As discussed in HADOOP-13363, the protobuf 3.x jar (and maybe more
> >>>>>>>>>>>>>> in future) will be kept as a shaded artifact in a separate repo,
> >>>>>>>>>>>>>> which will be referred to as a dependency in hadoop modules. This
> >>>>>>>>>>>>>> approach avoids shading of every submodule during build.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> So the question is: is any VOTE required before asking to create a
> >>>>>>>>>>>>>> git repo?
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> On the self-serve platform
> >>>>>>>>>>>>>> https://gitbox.apache.org/setup/newrepo.html I can see that the
> >>>>>>>>>>>>>> requester should be a PMC member.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Wanted to confirm here first.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> -Vinay
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>> ---------------------------------------------------------------------
> >>>>>>>>>>>>> To unsubscribe, e-mail: private-unsubscr...@hadoop.apache.org
> >>>>>>>>>>>>> For additional commands, e-mail: private-h...@hadoop.apache.org
> >
> > --
> > --Brahma Reddy Battula
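
For readers following steps 1, 2 and 7b of the proposal quoted above, the relocation could be sketched as a maven-shade-plugin configuration along the following lines. This is only an illustration under stated assumptions, not the contents of the merged PR: the plugin version is assumed to be managed by a parent pom, the artifactSet include is an assumption, and the shaded package is the one named in the original proposal (the Oct 9 update in this thread mentions 'o.a.h.thirdparty.protobuf37' instead).

```xml
<!-- Hypothetical pom.xml fragment for a shaded protobuf module; a sketch only. -->
<build>
  <plugins>
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-shade-plugin</artifactId>
      <!-- Version assumed to be managed by a parent pom. -->
      <executions>
        <execution>
          <!-- Shading runs in the package phase, which is the point made in 7b. -->
          <phase>package</phase>
          <goals>
            <goal>shade</goal>
          </goals>
          <configuration>
            <artifactSet>
              <includes>
                <!-- Assumption: only protobuf-java is bundled into this jar. -->
                <include>com.google.protobuf:protobuf-java</include>
              </includes>
            </artifactSet>
            <relocations>
              <relocation>
                <!-- Step 2: move the original package under the known prefix. -->
                <pattern>com.google.protobuf</pattern>
                <shadedPattern>org.apache.hadoop.thirdparty.com.google.protobuf</shadedPattern>
              </relocation>
            </relocations>
          </configuration>
        </execution>
      </executions>
    </plugin>
  </plugins>
</build>
```

With a jar built this way published from the separate repo, Hadoop sources would import the relocated package directly (step 4), while "protobuf-java:2.5.0" could stay in the dependency tree for downstreams (step 6).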