Thanks for everyone's input! I'll create a 3.x branch first, and then manage a 3.4.1 release after applying IMPALA-9815 and other fixes on it.
There are two other topics in this thread. I created two JIRAs for further discussion:
- IMPALA-10408: Build against Apache official versions
- IMPALA-10409: Reduce the total size of artifacts downloaded from S3 during the build

Thanks,
Quanlong

On Wed, Dec 9, 2020 at 9:36 PM Laszlo Gaal <laszlo.g...@cloudera.com> wrote:
> For the naming convention, we could follow the example set by the 2.x
> branch, which was created when major incompatible changes, including major
> changes in Impala's Hadoop dependencies, started landing on the master
> branch.
>
> We could create 3.x as a long-term branch for Sentry fixes, Hadoop
> dependency changes and general maintenance.
> At the same time, 3.4.1 can also be released as a "maintenance" or bugfix
> release, ensuring that Impala 3.4.[x1] is buildable again.
>
> Thanks,
>
> - Laszlo
>
> On Wed, Dec 9, 2020 at 6:39 AM Joe McDonnell <joemcdonn...@cloudera.com> wrote:
> > I think a 3.4.1 branch is a good idea. It is nice to have a branch that
> > can accept changes to fix Sentry issues. Also, the Maven repo changes were
> > a surprise for everyone, and the latest release should always be buildable.
> >
> > On a side note, if we want to scrutinize where all the jars are coming
> > from, the Maven logs from
> > https://jenkins.impala.io/job/all-build-options-ub1604/ are easy to grep,
> > and that job prints some statistics about how many artifacts come from
> > each repo. Here is the output from the latest run on the master branch:
> >
> > Number of artifacts downloaded from each repo:
> >   16 cdh.rcs.releases.repo
> > 2067 central
> >  203 impala.cdp.repo
> >    2 impala.toolchain.kudu.repo
> >
> > Thanks,
> >
> > Joe
> >
> > On Tue, Dec 8, 2020 at 7:12 PM Quanlong Huang <huangquanl...@gmail.com> wrote:
> > > Another benefit of depending on Apache releases is that we can avoid
> > > downloading lots of binaries from Cloudera's S3 buckets. E.g.
> > > downloading Hadoop and Hive binaries from the Apache mirrors is much
> > > faster.
> > >
> > > I think we can create another branch for this purpose, but I'm not sure
> > > what the branch name should be. I think 3.4.1 should only be used for
> > > backporting bug fixes to 3.4.0.
> > >
> > > On Tue, Dec 8, 2020 at 7:26 AM Tim Armstrong <tarmstr...@cloudera.com> wrote:
> > > > We did manage to switch to ASF Kudu master via native-toolchain by
> > > > default, but that was probably the easiest switch. I don't think we've
> > > > tried pinning to a Kudu release for our official release, but it's
> > > > probably doable. I think the main concern would be if there wasn't a
> > > > Kudu release available with a feature we depended on.
> > > >
> > > > Maybe ASF Hadoop would be the next easiest, since Impala doesn't
> > > > directly depend on YARN and the HDFS APIs have tended to be quite
> > > > stable. The main changes we've depended on from the HDFS codebase are
> > > > client changes (like hdfsUnbuffer() support). I can imagine we might
> > > > have to reconcile some of those to get things working correctly
> > > > against ASF Hadoop, but that would be achievable (it would basically
> > > > mean switching back to using older APIs in ASF mode).
> > > >
> > > > On Mon, Dec 7, 2020 at 9:59 AM Csaba Ringhofer <csringho...@cloudera.com> wrote:
> > > > > > Another motivation is that we need a branch to maintain the Sentry
> > > > > > support which is removed in the 4.0 branch
> > > > >
> > > > > +1, it would be great to have a support branch with Sentry.
> > > > >
> > > > > > More ambitiously, I'd love it if releases were compatible with
> > > > > > official released versions of our ASF dependencies like Hadoop and
> > > > > > Ranger
> > > > >
> > > > > Switching completely to ASF released dependencies looks like a
> > > > > potentially very hard task to me, for two reasons:
> > > > > 1. We have several dependencies (Hadoop, Hive, Sentry, Ranger, HBase,
> > > > >    Kudu, probably some others). If we can't find a proper release for
> > > > >    even a single one of these, we would be stuck and would have to
> > > > >    wait for another community.
> > > > >    As an example, SENTRY-2549
> > > > >    <https://issues.apache.org/jira/browse/SENTRY-2549> is not even
> > > > >    merged yet, while it breaks nearly all of our authorization tests.
> > > > > 2. Some of our tests depend deeply on the exact behaviour of some
> > > > >    components, e.g. we may assume a given table has a certain number
> > > > >    of files or size, which can easily be broken by valid differences
> > > > >    in Hive or parquet-mr/ORC.
> > > > >    This can lead to the dilemma of either a) rewriting a lot of tests
> > > > >    or b) skipping them, which weakens test coverage.
> > > > >
> > > > > A step in this direction could be to add a flag to the build like
> > > > > USE_ASF_DEPENDENCIES, which would replace some or all CDH
> > > > > dependencies with ASF ones, and then see how the build goes: e.g. is
> > > > > the build/dataload successful, and if so, which tests are red. I
> > > > > think that releasing with CDH dependencies + adding some information
> > > > > about the state with ASF ones could already be a big improvement for
> > > > > adoption, even if not everything works. E.g. someone may simply want
> > > > > to try Impala in a Hadoop + Hive cluster and not care about
> > > > > authorization or HBase/Kudu.
> > > > >
> > > > > On Mon, Dec 7, 2020 at 4:36 AM Jim Apple <apa...@jbapple.com> wrote:
> > > > > > Yeah, I suppose it depends on the version and the bug fixes.
> > > > > > Sometimes it's also new features, which it would be good to feature
> > > > > > gate anyway. IIRC, at some point Impala wouldn't build against any
> > > > > > released Hive version because Hive 3.0 made changes to Hive 2.0 that
> > > > > > Impala couldn't deal with, and Hive 2.x didn't contain a feature
> > > > > > Impala depended on just to compile the frontend. Or maybe it was
> > > > > > Hadoop.
> > > > > >
> > > > > > If we could set an example with older releases, I think it would be
> > > > > > lovely and perhaps help adoption, too!
> > > > > >
> > > > > > On Sun, Dec 6, 2020 at 5:49 PM Quanlong Huang <huangquanl...@gmail.com> wrote:
> > > > > > > Yeah, it'd be good to depend on Apache official versions. In my
> > > > > > > understanding, we depend on cdh/cdp snapshot versions since we
> > > > > > > need some bug fixes that haven't been released in Apache official
> > > > > > > versions. So it's more suitable to do this for an older release
> > > > > > > like impala-3.4, because all the features and bug fixes it depends
> > > > > > > on may already exist in some Apache official versions.
> > > > > > >
> > > > > > > On Sat, Dec 5, 2020 at 11:53 PM Jim Apple <jbap...@apache.org> wrote:
> > > > > > > > I think this is the right choice.
> > > > > > > >
> > > > > > > > More ambitiously, I'd love it if releases were compatible with
> > > > > > > > official released versions of our ASF dependencies like Hadoop
> > > > > > > > and Ranger. Perhaps this would limit the Cloudera Maven
> > > > > > > > dependencies for devs.
> > > > > > > >
> > > > > > > > On Sat, Dec 5, 2020 at 12:32 AM Quanlong Huang <huangquanl...@gmail.com> wrote:
> > > > > > > > > Hi all,
> > > > > > > > >
> > > > > > > > > Due to Cloudera's Maven repo changes, the latest released
> > > > > > > > > version, 3.4.0, is no longer compilable (it needs the patch
> > > > > > > > > of IMPALA-9815). I'm thinking about doing a minor release,
> > > > > > > > > 3.4.1.
> > > > > > > > >
> > > > > > > > > Another motivation is that we need a branch to maintain the
> > > > > > > > > Sentry support which is removed in the 4.0 branch
> > > > > > > > > (IMPALA-9708). One bug we recently found is IMPALA-10326
> > > > > > > > > (PrincipalPrivilegeTree doesn't handle empty strings and
> > > > > > > > > wildcards correctly). We have a fix downstream but can't put
> > > > > > > > > it upstream because upstream no longer has Sentry support.
> > > > > > > > > IMPALA-10130 is another Sentry issue that we may need to fix.
> > > > > > > > >
> > > > > > > > > We can also apply some critical fixes in this version. Here
> > > > > > > > > are bugs that affect 3.4.0 and are fixed in 4.0:
> > > > > > > > > https://issues.apache.org/jira/browse/IMPALA-9725?jql=project%20%3D%20IMPALA%20AND%20issuetype%20%3D%20Bug%20AND%20status%20%3D%20Resolved%20AND%20affectedVersion%20%3D%20%22Impala%203.4.0%22%20AND%20fixVersion%20%3D%20%22Impala%204.0%22%20ORDER%20BY%20priority%20DESC%2C%20updated%20DESC
> > > > > > > > >
> > > > > > > > > Any objections or suggestions?
> > > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > > Quanlong