Hi all,

I've just created the 3.x branch. It's based on branch-3.4.0 with these fixes cleanly applied: https://github.com/apache/impala/compare/branch-3.4.0...3.x

- e41fc61 IMPALA-9921: Change error messages in checking needsQuotes to TRACE level logs
- 4796d13 IMPALA-9809: Multi-aggregation query on particular dataset crashes impalad
- b71187d IMPALA-9725: incorrect spilling join results for wide keys
- f598819 IMPALA-9483 Add logs for debugging builtin functions throw unknown exception randomly
- 33aba11 IMPALA-10044: Fix cleanup for bootstrap_toolchain.py failure case
- 1ec0bdf IMPALA-9739: Fix data race during impala graceful shutdown
- fe4de65 IMPALA-9858: Fix wrong partition metrics in LocalCatalog profile
- d938d81 IMPALA-9787: fix spinning thread with memory-based table invalidation
- e638bc0 IMPALA-7833 Audit and fix string builtins for long string handling
- b68e610 IMPALA-9727: Fix HBaseScanNode explain formatting
- 812ad40 IMPALA-9721: Fix minor python2/3 syntax regression
- 3ba1ea1 IMPALA-9398: Fix shell history duplication when cmdloop breaks
- 0a1266f IMPALA-9643: fix runtime filter race for mt_dop
- c1af049 IMPALA-9650: Fix flakiness in RuntimeFilterTest
- 317c65b IMPALA-9612: Fix race condition in RuntimeFilter::WaitForArrival
- ed576c9 IMPALA-9618: fix some usability issues with dev env
- 6364f4e IMPALA-9602: Fix case-sensitivity for local catalog
- 40265c6 IMPALA-9815: Update URL for cdh-releases-rcs maven repo

Exhaustive tests passed here:
- DEBUG build: https://jenkins.impala.io/job/ubuntu-16.04-from-scratch/13081/
- RELEASE build: https://jenkins.impala.io/job/ubuntu-16.04-from-scratch/13085/

Note that the corresponding Impala-lzo branch for 3.x is asf-3.4: https://github.com/cloudera/impala-lzo/tree/asf-3.4. Remember to check it out when compiling impala-3.x.

Next steps:
- I'll submit a fix for IMPALA-9708 to the 3.x branch.
- Some fixes like IMPALA-9957 will be considered.
- *Please let me know if any other fixes are critical and should be picked.*
- After these I'll manage a 3.4.1 release.

Anyone can submit code reviews to the 3.x branch by "git push asf-gerrit HEAD:refs/for/3.x".
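
For reference, a minimal sketch of that flow in shell. It assumes a clone of apache/impala whose `origin` remote carries the 3.x branch and in which a remote named `asf-gerrit` is already configured as the Gerrit wiki page describes. The helper names `review_refspec` and `push_for_review`, the `GERRIT_REMOTE` variable, the `IMPALA_LZO_DIR` variable, and the local branch name `my-3.x-fix` are placeholders for illustration, not project conventions.

```shell
# Build the refspec used to send HEAD for review on the given target branch.
review_refspec() {
  printf 'HEAD:refs/for/%s\n' "$1"
}

# Push the current HEAD for review on the given branch.
# GERRIT_REMOTE is a hypothetical override; it defaults to "asf-gerrit".
push_for_review() {
  git push "${GERRIT_REMOTE:-asf-gerrit}" "$(review_refspec "$1")"
}

# Typical use (run inside the Impala clone; names are placeholders):
#   git fetch origin 3.x
#   git checkout -b my-3.x-fix origin/3.x
#   git -C "$IMPALA_LZO_DIR" checkout asf-3.4   # matching Impala-lzo branch
#   ... make and commit the change ...
#   push_for_review 3.x
```

Keeping the refspec in its own function makes it easy to check what will be pushed before anything is sent to the review server.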
More about gerrit: https://cwiki.apache.org/confluence/display/IMPALA/Using+Gerrit+to+submit+and+review+patches

Thanks,
Quanlong

On Mon, Dec 28, 2020 at 11:09 AM Quanlong Huang <huangquanl...@gmail.com> wrote:

> Thanks for everyone's input! I'll create a 3.x branch first, and then
> manage a 3.4.1 release after applying IMPALA-9815 and other fixes on it.
>
> There are two other topics in this thread. I created two JIRAs for further
> discussion:
>
> - IMPALA-10408: Build against Apache official versions
> - IMPALA-10409: Reduce total size of artifacts downloaded from S3 in
>   building
>
> Thanks,
> Quanlong
>
> On Wed, Dec 9, 2020 at 9:36 PM Laszlo Gaal <laszlo.g...@cloudera.com> wrote:
>
>> For naming convention we could follow the example set by the 2.x branch,
>> which was created when major incompatible changes, including major changes
>> in Impala's Hadoop dependencies, started landing on the master branch.
>>
>> We could create 3.x as a long-term branch for Sentry fixes, Hadoop dependency
>> changes and general maintenance.
>> At the same time 3.4.1 can also be released as a "maintenance" or bugfix release,
>> ensuring that Impala 3.4.[x1] is buildable again.
>>
>> Thanks,
>>
>> - Laszlo
>>
>> On Wed, Dec 9, 2020 at 6:39 AM Joe McDonnell <joemcdonn...@cloudera.com> wrote:
>>
>> > I think a 3.4.1 branch is a good idea. It is nice to have a branch that can
>> > accept changes to fix Sentry issues. Also, the Maven repo changes were a
>> > surprise for everyone, and the latest release should always be buildable.
>> >
>> > On a side note, if we want to scrutinize where all the jars are coming
>> > from, the maven logs from
>> > https://jenkins.impala.io/job/all-build-options-ub1604/ are easy to grep
>> > and that job prints some statistics about how many artifacts come from each
>> > repo.
>> > Here is the output from the latest run on the master branch:
>> >
>> > Number of artifacts downloaded from each repo:
>> >      16 cdh.rcs.releases.repo
>> >    2067 central
>> >     203 impala.cdp.repo
>> >       2 impala.toolchain.kudu.repo
>> >
>> > Thanks,
>> >
>> > Joe
>> >
>> >
>> > On Tue, Dec 8, 2020 at 7:12 PM Quanlong Huang <huangquanl...@gmail.com> wrote:
>> >
>> > > Another benefit of depending on Apache releases is we can avoid downloading
>> > > lots of binaries from Cloudera's S3 buckets. E.g. downloading Hadoop, Hive
>> > > binaries from the Apache mirrors is much faster.
>> > >
>> > > I think we can create another branch for this purpose. But not sure what
>> > > the branch name should be. I think 3.4.1 should only be used for
>> > > backporting bug fixes for 3.4.0.
>> > >
>> > > On Tue, Dec 8, 2020 at 7:26 AM Tim Armstrong <tarmstr...@cloudera.com> wrote:
>> > >
>> > > > We did manage to switch to ASF Kudu master via native-toolchain by default,
>> > > > but that was probably the easiest switch. I don't think we've tried pinning
>> > > > to Kudu release for our official release, but it's probably doable. I think
>> > > > the main concern would be if there wasn't a Kudu release available with
>> > > > a feature we depended on.
>> > > >
>> > > > Maybe ASF Hadoop would be the next easiest, since Impala doesn't directly
>> > > > depend on YARN and the HDFS APIs have tended to be quite stable. The main
>> > > > changes we've depended on from the HDFS codebase are client changes (like
>> > > > hdfsUnbuffer() support) - I can imagine we might have to reconcile some of
>> > > > those to get things working correctly against ASF hadoop, but that would be
>> > > > achievable (it would basically mean switching back to using older APIs in
>> > > > ASF mode).
>> > > >
>> > > > On Mon, Dec 7, 2020 at 9:59 AM Csaba Ringhofer <csringho...@cloudera.com> wrote:
>> > > >
>> > > > > > Another motivation is that we need a branch to maintain the Sentry support
>> > > > > > which is removed in the 4.0 branch
>> > > > >
>> > > > > +1, it would be great to have a support branch with Sentry
>> > > > >
>> > > > > > More ambitiously, I'd love it if releases were compatible with official
>> > > > > > released versions of our ASF dependencies like Hadoop and Ranger
>> > > > >
>> > > > > Switching completely to ASF released dependencies looks like a potentially
>> > > > > very hard task to me for two reasons:
>> > > > > 1. We have several dependencies - Hadoop, Hive, Sentry, Ranger, HBase,
>> > > > > Kudu, probably some others - if we can't find a proper release for even a
>> > > > > single one of these, then we would be stuck and would have to wait for
>> > > > > another community.
>> > > > > As an example SENTRY-2549
>> > > > > <https://issues.apache.org/jira/browse/SENTRY-2549> is not even merged yet,
>> > > > > while it breaks nearly all of our authorization tests.
>> > > > > 2. Some of our tests depend deeply on the exact behaviour of some
>> > > > > components, e.g. we may assume a given table to have a certain amount of
>> > > > > files or size, which can be easily broken by valid differences in Hive or
>> > > > > parquet-mr/ORC.
>> > > > > This can lead to the dilemma of (a) rewriting a lot of tests or (b) skipping
>> > > > > them, making the test coverage weaker.
>> > > > >
>> > > > > A step in this direction could be to add a flag to the build like
>> > > > > USE_ASF_DEPENDENCIES, which would replace some or all CDH
>> > > > > dependencies with ASF ones, and see how the build goes, e.g. is the
>> > > > > build/dataload successful, if yes, then what tests are red.
>> > > > > I think that releasing with CDH dependencies + adding some information about
>> > > > > the state with ASF ones could already be a big improvement for adoption, even
>> > > > > if not everything works. E.g. someone may simply want to try Impala in a
>> > > > > Hadoop + Hive cluster and not care about authorization or HBase/Kudu.
>> > > > >
>> > > > > On Mon, Dec 7, 2020 at 4:36 AM Jim Apple <apa...@jbapple.com> wrote:
>> > > > >
>> > > > > > Yeah, I suppose it depends on the version and the bug fixes. Sometimes it's
>> > > > > > also new features, which it would be good to feature gate anyway. IIRC, at
>> > > > > > some point Impala wouldn't build against any released Hive version because
>> > > > > > Hive 3.0 made changes to Hive 2.0 that Impala couldn't deal with and Hive
>> > > > > > 2.x didn't contain a feature Impala depended on just to compile
>> > > > > > the frontend. Or maybe it was Hadoop.
>> > > > > >
>> > > > > > If we could set an example with older releases, I think it would be lovely
>> > > > > > and perhaps help adoption, too!
>> > > > > >
>> > > > > > On Sun, Dec 6, 2020 at 5:49 PM Quanlong Huang <huangquanl...@gmail.com> wrote:
>> > > > > >
>> > > > > > > Yeah, it'd be good to depend on Apache official versions. In my
>> > > > > > > understanding, we depend on cdh/cdp snapshot versions since we need some
>> > > > > > > bug fixes that haven't been released in Apache official versions. So it's
>> > > > > > > more suitable to do this for an older release like impala-3.4, because all
>> > > > > > > the features/bug fixes it depends on may already exist in some Apache
>> > > > > > > official versions.
>> > > > > > >
>> > > > > > > On Sat, Dec 5, 2020 at 11:53 PM Jim Apple <jbap...@apache.org> wrote:
>> > > > > > >
>> > > > > > > > I think this is the right choice.
>> > > > > > > >
>> > > > > > > > More ambitiously, I'd love it if releases were compatible with official
>> > > > > > > > released versions of our ASF dependencies like Hadoop and Ranger. Perhaps
>> > > > > > > > this would limit the Cloudera Maven dependencies for devs.
>> > > > > > > >
>> > > > > > > > On Sat, Dec 5, 2020 at 12:32 AM Quanlong Huang <huangquanl...@gmail.com> wrote:
>> > > > > > > >
>> > > > > > > > > Hi all,
>> > > > > > > > >
>> > > > > > > > > Due to Cloudera's maven repo changes, the latest released version 3.4.0 is
>> > > > > > > > > not compilable now (need the patch of IMPALA-9815). I'm thinking about
>> > > > > > > > > doing a minor release for 3.4.1.
>> > > > > > > > >
>> > > > > > > > > Another motivation is that we need a branch to maintain the Sentry support
>> > > > > > > > > which is removed in the 4.0 branch (IMPALA-9708). One bug we recently found
>> > > > > > > > > is IMPALA-10326 (PrincipalPrivilegeTree doesn't handle empty string and
>> > > > > > > > > wildcards correctly). We have a fix in downstream but can't put it upstream
>> > > > > > > > > due to missing the Sentry support. IMPALA-10130 is another Sentry issue
>> > > > > > > > > that we may need to fix.
>> > > > > > > > >
>> > > > > > > > > We can also apply some critical fixes in this version.
>> > > > > > > > > Here are bugs that affect 3.4.0 and are fixed in 4.0:
>> > > > > > > > >
>> > > > > > > > > https://issues.apache.org/jira/browse/IMPALA-9725?jql=project%20%3D%20IMPALA%20AND%20issuetype%20%3D%20Bug%20AND%20status%20%3D%20Resolved%20AND%20affectedVersion%20%3D%20%22Impala%203.4.0%22%20AND%20fixVersion%20%3D%20%22Impala%204.0%22%20ORDER%20BY%20priority%20DESC%2C%20updated%20DESC
>> > > > > > > > >
>> > > > > > > > > Any objections or suggestions?
>> > > > > > > > >
>> > > > > > > > > Thanks,
>> > > > > > > > > Quanlong
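
The per-repository artifact counts that Joe quotes earlier in the thread could be reproduced roughly as follows. This is only a sketch and not the script the Jenkins job actually runs: it assumes the build log contains Maven's "Downloaded from <repo-id>: <url>" lines (the format recent Maven versions print), and `count_artifacts_by_repo` and `build.log` are made-up names.

```shell
# Count downloaded artifacts per Maven repository id from a build log on stdin.
# Assumption: each finished download is logged as "Downloaded from <repo-id>: <url> ...".
count_artifacts_by_repo() {
  awk '
    match($0, /Downloaded from [^:]+:/) {
      # Skip the 16-character "Downloaded from " prefix and drop the trailing colon.
      repo = substr($0, RSTART + 16, RLENGTH - 17)
      count[repo]++
    }
    END { for (repo in count) printf "%d %s\n", count[repo], repo }
  ' | sort -k2
}

# Usage (build.log is a placeholder for a saved Maven log):
#   count_artifacts_by_repo < build.log
```

Lines that only announce a download in progress ("Downloading from ...") are ignored, so each artifact is counted once.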