Hi All,

Thank you Daniel for the update! I was also writing one when your email
arrived, so I'm just adding a couple of comments to that.

New major version in JIRA:
Version 3.0.0 has been created in JIRA
<https://issues.apache.org/jira/projects/SQOOP/summary>, please feel free
to use it on the corresponding JIRAs from now on. As per my previous email,
I see no point in doing a 1.5.0 release currently, so I'm OK with moving
all the JIRAs having a fix/target version of 1.5.0 to 3.0.0. Any
objections?

Update on the dependencies of the release:
* The Gradle patch needs some finalization and can be committed soon:
  https://reviews.apache.org/r/66067/
* The Kite removal effort has been started: SQOOP-3313
  <https://issues.apache.org/jira/browse/SQOOP-3313>
* The Hive 3.0.0 release is still in an early phase based on this email
  thread
  <https://mail-archives.apache.org/mod_mbox/hive-dev/201804.mbox/%3c2ec60da6-0a2e-4f3a-92f2-e3ce9d497...@hortonworks.com%3E>
  and has no ETA yet

Thanks Daniel for looking into the Hadoop compatibility question, please
let us know your findings.

Cheers,
Bogi

On Thu, May 10, 2018 at 3:27 PM, Dániel Vörös <daniel.vo...@gmail.com> wrote:

> Dear All,
>
> After Bogi created the 3.0.0 version in Jira, I applied it to a couple
> of tickets that don't make sense on the 1.x line (without Hadoop3/Hive3).
>
> However, as Bogi mentioned in her previous email, it probably doesn't
> make sense to work on a 1.5 release in parallel with 3.0.0. How would you
> feel if we were to move all 1.5 issues [1] to 3.0.0?
>
> In the meantime I've experimented with running Sqoop 1.4.7 against Hadoop
> 3.1.0, and I'm planning to do the opposite, running Sqoop 3.0.0-SNAPSHOT
> against Hadoop 2.x. That way we'd be able to better assess Attila's
> question about backward compatibility. Please note that the hard part
> will be Hive integration, I'm afraid, and until there's a Hive 3.0
> release it's hard to test. If anyone's interested in this topic, check
> out [2].
>
> Regards,
> Daniel
>
> [1]
> https://issues.apache.org/jira/issues?jql=project%20%3D%20SQOOP%20and%20fixVersion%20%3D%201.5.0%20and%20resolutionDate%20is%20not%20%20empty%20order%20by%20resolutiondate%20desc
> [2] https://github.com/dvoros/docker-sqoop
>
> On Mon, Apr 16, 2018 at 2:20 PM Szabolcs Vasas <va...@apache.org> wrote:
>
> > Hi All,
> >
> > Sqoop NG/Sqoop 3:
> > As far as I remember, Sqoop NG was an alternative name suggested for
> > Sqoop 2, which has a totally different architecture than Sqoop 1. I
> > would not use it now, since in this release we do not include changes
> > affecting the architecture but only bump the versions of the
> > dependencies. However, since the dependencies are bumped to new major
> > releases, I think we should also change the major version number of
> > Sqoop.
> >
> > Hadoop 2 support:
> > I agree with Daniel that we should not introduce extra complexity to
> > support Hadoop 2 as well. However, even if we don't support Hadoop 2 in
> > our next major Sqoop release, some features which do not require
> > Hadoop 3 could be backported by the vendors to their earlier releases
> > as well. I think introducing a 1.x branch upstream would increase the
> > complexity of committing bug fixes, and I am not sure the community
> > wants to make a release on a Sqoop 1.x branch. Even if at some point
> > somebody wants to do this, they could cut the branch and cherry-pick
> > the necessary bug fixes right before the release.
> >
> > Kite removal:
> > I agree that this is quite a complex task on its own, but we can't bump
> > the Hadoop/Hive/HBase dependencies without deciding what to do with
> > Kite. One option is to bump these dependencies in Kite too, create a
> > new Kite release and bump Sqoop's Kite dependency to this new release.
> > Another option is to get rid of the Kite dependency before we bump the
> > Hadoop/Hive/HBase versions.
> > In my opinion the latter one makes more sense, since we wanted to
> > eliminate the Kite dependency anyway and the Kite project seems to be
> > dead, so bumping the dependencies, making the necessary code changes,
> > fixing tests and creating the release might be overkill.
> >
> > Szabolcs
> >
> > On Mon, Apr 16, 2018 at 11:50 AM, Dániel Vörös <daniel.vo...@gmail.com>
> > wrote:
> >
> > > Hi All,
> > >
> > > I believe we're all on the same page on removing Kite, so I've opened
> > > SQOOP-3313 to track that. @Attila I'm glad to see your interest in
> > > the ORC part. It would be highly appreciated if you could take a look
> > > at this review request [1].
> > >
> > > I'm not that familiar with Flume, but it seems they added NG after
> > > architectural changes and released FlumeNG 1.0 after Flume 0.9.4 [2].
> > > Even if we go with NG, I'd suggest calling it 3.0, to avoid confusion
> > > with earlier releases.
> > >
> > > I think the biggest part of keeping Hadoop 2 (and previous versions
> > > of downstream projects like Hive) supported would be testing against
> > > those. It would also require at least another build profile to build
> > > against them, and probably another layer of abstraction in the code
> > > (like Hadoop shims in Hive).
> > > Not sure about vendors, but I think they're usually not adding new
> > > features to older release lines. In my opinion we should branch off
> > > from current trunk to track the 1.x release line (where we keep
> > > supporting Hadoop 2) and keep adding bugfixes there, but add new
> > > features to trunk only and not worry about Hadoop 2 there.
> > >
> > > I agree with Attila on the dependencies. We shouldn't release based
> > > on non-final releases. We might bump the dependencies to some
> > > alpha/beta during development, but we shouldn't forget to move to the
> > > final version in the end.
> > >
> > > +1 for Bogi as release manager.
> > >
> > > Regards,
> > > Daniel
> > >
> > > [1] https://reviews.apache.org/r/66548/
> > > [2] https://blogs.apache.org/flume/entry/flume_ng_architecture
> > >
> > > On Fri, Apr 13, 2018 at 5:24 PM Szabó Attila <mau...@inf.elte.hu>
> > > wrote:
> > >
> > > > Hello everyone,
> > > >
> > > > I'd like to also attach my thoughts:
> > > >
> > > > New Sqoop version: The last time I had the chance to talk about
> > > > this with some of the PMC members (e.g. Jarcec, Kate), we were
> > > > leaning towards creating Sqoop-NG (NG == Next Generation), much
> > > > like what the Flume community did (and AFAIK from Mike Percy it
> > > > was quite a successful move from their POV). Don't get me wrong,
> > > > I'm totally NOT against 3.0, though IMHO Sqoop-NG 1.0 would be a
> > > > better choice.
> > > >
> > > > Kite: I would totally split this effort into two subtasks. First I
> > > > would get in contact with the Parquet team, and would create a
> > > > Kite-independent execution path in Sqoop for the Parquet backed
> > > > tables (Hive/Impala/etc.). As a part of this effort I would also
> > > > add direct support for the ORC format (in the past few years I've
> > > > found it very useful in several different situations, and usually
> > > > it's quite inconvenient that Sqoop does not support it "out of the
> > > > box").
> > > >
> > > > As the second subtask I would start to remove every Kite based
> > > > dependency (but according to my gut feeling it could break the
> > > > codebase in too many places, and might not be that EZ to succeed
> > > > on that front).
> > > >
> > > > Hadoop 2:
> > > >
> > > > Could anyone please highlight for me what the pros/cons would be
> > > > on this front? AFAIK several vendors (including Cloudera,
> > > > Hortonworks, MapR, EMR, etc.)
> > > > are still supporting Hadoop 2, and to the best of my knowledge most
> > > > of the userbase is tied to their releases, so I'd like to give
> > > > those users the chance to use the newest features of Sqoop. Thus I
> > > > would vote for keeping compatibility for a bit more time/versions.
> > > >
> > > > Dependencies:
> > > >
> > > > I'd like to cast my very direct and LOUD vote against any alpha
> > > > dependencies (including HBase or anything else!). IMHO Sqoop is
> > > > already a stable component of the Apache Foundation, and the users
> > > > can depend on it, thus I'd like to avoid any kind of "immature"
> > > > dependency related issues. Of course this is also just my solo
> > > > opinion, but as a community I think we must not undermine our
> > > > stability.
> > > >
> > > > On the other fronts I totally agree and +1 the planned efforts.
> > > >
> > > > Best regards,
> > > > Attila
> > > >
> > > > ________________________________
> > > > From: Szabolcs Vasas <va...@apache.org>
> > > > Sent: Friday, April 13, 2018 3:43 PM
> > > > To: dev@sqoop.apache.org
> > > > Subject: Re: Release to support Hadoop 3
> > > >
> > > > Hi all,
> > > >
> > > > I also think that completely eliminating the Kite dependency from
> > > > Sqoop would be the easiest way of going forward. I will try to
> > > > analyze this topic a bit more next week and come up with subtasks
> > > > so we could potentially work on it in parallel.
> > > >
> > > > I am happy with the Sqoop 3.0 scope proposal too and Bogi being the
> > > > release manager of it.
> > > >
> > > > Szabolcs
> > > >
> > > >
> > > > On Fri, Apr 13, 2018 at 2:37 PM, Boglarka Egyed <b...@apache.org>
> > > > wrote:
> > > >
> > > > > Hi Daniel et al,
> > > > >
> > > > > Thanks for bringing up this topic and the detailed status update.
> > > > >
> > > > > I am sharing my thoughts point by point, please find them below.
> > > > >
> > > > > > 1) How to get a new Kite release?
> > > > > > Maybe we should remove the Kite dependency altogether (as
> > > > > > Szabolcs hinted in comments of SQOOP-3171)?
> > > > >
> > > > > I think making a new Kite release would be a huge effort, as it
> > > > > would require upgrading the versions, making the necessary code
> > > > > modifications, testing it thoroughly, etc., and then making the
> > > > > release itself. Meanwhile, Kite is a very passively handled tool
> > > > > with minimal activity on it, so it would definitely mean a lot of
> > > > > effort to get it done. It would have a dependency on the Solr
> > > > > community too, as the Morphlines module of Kite is heavily used
> > > > > and somewhat actively developed by them. Also, there is indeed a
> > > > > shorter/longer term goal to get rid of the Kite dependency in
> > > > > Sqoop entirely, i.e. all release efforts would become throw-away
> > > > > very soon.
> > > > >
> > > > > Focusing on the Kite removal seems more reasonable to me.
> > > > > However, it would be great to see an estimate for this effort.
> > > > > @Szabolcs could you maybe share your thoughts on this?
> > > > >
> > > > > > 2) Should we drop support for Hadoop 2?
> > > > >
> > > > > I think we can drop support for Hadoop 2, especially if we use
> > > > > straightforward versioning with the new release.
> > > > >
> > > > > > 3) What version number should we use? To avoid confusion with
> > > > > > Sqoop2 I'd go with 3.0.
> > > > >
> > > > > I like this idea, +1 for making a 3.0 release containing these
> > > > > changes.
> > > > >
> > > > > > 4) Does (should?) this affect the 1.5 release?
> > > > >
> > > > > I think the answer is yes.
> > > > > Currently the following breaking changes are on the horizon which
> > > > > could be part of a next Sqoop release:
> > > > > * com.cloudera package removal (done)
> > > > > * Gradle introduction (in progress)
> > > > > * Hadoop/Hive/HBase version upgrade (in progress)
> > > > > * Kite deprecation/removal (planned)
> > > > > * Bump Java version to 8 (planned)
> > > > >
> > > > > Looking at this list, I would say that making a Sqoop 1.5 release
> > > > > containing only the com.cloudera package removal, the Gradle
> > > > > introduction and the Java version bump would mean a somewhat
> > > > > small and irrelevant scope from a user perspective, so maybe
> > > > > having two releases (1.5 and 3.0) would be a bit of an overkill.
> > > > > I would instead suggest going with a Sqoop 3.0 release containing
> > > > > all the changes listed above. What do you think?
> > > > >
> > > > > To summarize, I currently see the following dependencies for the
> > > > > next Sqoop release:
> > > > > * Finishing up the Gradle patch
> > > > > * Hive 3 release
> > > > > * Kite removal - this could be the next common effort in the
> > > > >   community
> > > > >
> > > > > Anyhow, I would be happy to take the Release Manager role for the
> > > > > next release, please let me know if everyone would be OK with
> > > > > that.
> > > > >
> > > > > I am looking forward to seeing others' thoughts on this too.
> > > > >
> > > > > Many thanks,
> > > > > Bogi
> > > > >
> > > > > On Thu, Apr 12, 2018 at 5:17 PM, Dániel Vörös
> > > > > <daniel.vo...@gmail.com> wrote:
> > > > >
> > > > > > Dear All,
> > > > > >
> > > > > > After some development towards supporting Hadoop 3 (and the
> > > > > > latest versions of downstream components) I'd like to summarize
> > > > > > the current state of the upgrade and start the conversation
> > > > > > about releasing a new version of Sqoop with Hadoop 3 support.
> > > > > >
> > > > > > Here's what happened so far:
> > > > > > - Upgraded Hadoop dependency to 3.0.0
> > > > > > - Hive had to be upgraded, since old Hive didn't work with
> > > > > >   Hadoop 3.
> > > > > > - HBase had to be upgraded since Hive 3 depends on HBase 2 (alpha)
> > > > > > - Dealt with a bunch of minor issues like changed Hadoop
> > > > > >   configuration names and different packaging of Maven artifacts.
> > > > > >
> > > > > > For details please refer to this ticket and the attached review
> > > > > > request:
> > > > > > https://issues.apache.org/jira/browse/SQOOP-3305
> > > > > >
> > > > > > Remaining work:
> > > > > > - Parquet importing doesn't work. It was broken by a
> > > > > >   standalone-metastore change in Hive and fixing would require
> > > > > >   a new Kite version to be built against Hive 3.
> > > > > > - Hive 3 is going to enable ACID tables by default. We should
> > > > > >   support importing into these. Details:
> > > > > >   https://issues.apache.org/jira/browse/SQOOP-3311
> > > > > >
> > > > > > Other blocking issues:
> > > > > > - There's no Hive 3 release (no alpha/beta) yet.
> > > > > >
> > > > > > I'd like to kindly ask you all to share any other tasks/issues
> > > > > > you know of that we should address to support the latest
> > > > > > versions. Also, there are a couple open questions:
> > > > > > 1) How to get a new Kite release? Maybe we should remove the
> > > > > >    Kite dependency altogether (as Szabolcs hinted in comments
> > > > > >    of SQOOP-3171)?
> > > > > > 2) Should we drop support for Hadoop 2?
> > > > > > 3) What version number should we use? To avoid confusion with
> > > > > >    Sqoop2 I'd go with 3.0.
> > > > > > 4) Does (should?) this affect the 1.5 release?
> > > > > >
> > > > > > Regards,
> > > > > > Daniel