Hi all,

I've just created the 3.x branch. It's based on branch-3.4.0, with these
fixes applied cleanly on top:
https://github.com/apache/impala/compare/branch-3.4.0...3.x

   - e41fc61 IMPALA-9921: Change error messages in checking needsQuotes to
   TRACE level logs
   - 4796d13 IMPALA-9809: Multi-aggregation query on particular dataset
   crashes impalad
   - b71187d IMPALA-9725: incorrect spilling join results for wide keys
   - f598819 IMPALA-9483 Add logs for debugging builtin functions throw
   unknown exception randomly
   - 33aba11 IMPALA-10044: Fix cleanup for bootstrap_toolchain.py failure
   case
   - 1ec0bdf IMPALA-9739: Fix data race during impala graceful shutdown
   - fe4de65 IMPALA-9858: Fix wrong partition metrics in LocalCatalog
   profile
   - d938d81 IMPALA-9787: fix spinning thread with memory-based table
   invalidation
   - e638bc0 IMPALA-7833 Audit and fix string builtins for long string
   handling
   - b68e610 IMPALA-9727: Fix HBaseScanNode explain formatting
   - 812ad40 IMPALA-9721: Fix minor python2/3 syntax regression
   - 3ba1ea1 IMPALA-9398: Fix shell history duplication when cmdloop breaks
   - 0a1266f IMPALA-9643: fix runtime filter race for mt_dop
   - c1af049 IMPALA-9650: Fix flakiness in RuntimeFilterTest
   - 317c65b IMPALA-9612: Fix race condition in
   RuntimeFilter::WaitForArrival
   - ed576c9 IMPALA-9618: fix some usability issues with dev env
   - 6364f4e IMPALA-9602: Fix case-sensitivity for local catalog
   - 40265c6 IMPALA-9815: Update URL for cdh-releases-rcs maven repo

Exhaustive tests passed here:

   - DEBUG build:
   https://jenkins.impala.io/job/ubuntu-16.04-from-scratch/13081/
   - RELEASE build:
   https://jenkins.impala.io/job/ubuntu-16.04-from-scratch/13085/

Note that the corresponding Impala-lzo branch for 3.x is asf-3.4:
https://github.com/cloudera/impala-lzo/tree/asf-3.4. Remember to check it
out when compiling impala-3.x.

Next steps:

   - I'll submit a fix for IMPALA-9708 to the 3.x branch.
   - Some fixes like IMPALA-9957 will be considered.
   - *Please let me know if any other fixes are critical and should be
   picked.*
   - After these I'll manage a 3.4.1 release.

Anyone can submit code reviews to the 3.x branch by "git push asf-gerrit
HEAD:refs/for/3.x". More about gerrit:
https://cwiki.apache.org/confluence/display/IMPALA/Using+Gerrit+to+submit+and+review+patches
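
For anyone who wants to try the refspec before touching the real server,
here is a minimal offline sketch. It assumes nothing about the actual
Gerrit setup: a throwaway local bare repository stands in for the
asf-gerrit remote, and the identity and commit messages (including
"IMPALA-XXXX") are placeholders. On real Gerrit, refs/for/3.x is a magic
ref that opens a review against 3.x; in this stand-in it simply becomes
an ordinary ref.

```shell
#!/bin/sh
set -e

# Throwaway workspace; a local bare repo stands in for Gerrit (assumption).
work=$(mktemp -d)
git init -q --bare "$work/gerrit.git"

git init -q "$work/clone"
cd "$work/clone"
git config user.email "dev@example.com"   # placeholder identity
git config user.name "Example Dev"        # placeholder identity
git remote add asf-gerrit "$work/gerrit.git"

# Publish a base commit as the 3.x branch on the stand-in remote.
git commit -q --allow-empty -m "Base of 3.x"
git push -q asf-gerrit HEAD:refs/heads/3.x

# Commit a (hypothetical) fix and push it for review against 3.x.
git commit -q --allow-empty -m "IMPALA-XXXX: example fix"
git push -q asf-gerrit HEAD:refs/for/3.x

# List the refs the stand-in remote now holds.
git ls-remote asf-gerrit
```

Against the real remote only the last push matters; everything before it
is scaffolding so the sketch can run without network access.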

Thanks,
Quanlong

On Mon, Dec 28, 2020 at 11:09 AM Quanlong Huang <huangquanl...@gmail.com>
wrote:

> Thanks for everyone's input! I'll create a 3.x branch first, and then
> manage a 3.4.1 release after applying IMPALA-9815 and other fixes on it.
>
> There are two other topics in this thread. I created two JIRAs for further
> discussion:
>
>    - IMPALA-10408: Build against Apache official versions
>    - IMPALA-10409: Reduce total size of artifacts downloaded from S3 in
>    building
>
> Thanks,
> Quanlong
>
> On Wed, Dec 9, 2020 at 9:36 PM Laszlo Gaal <laszlo.g...@cloudera.com>
> wrote:
>
>> For naming convention we could follow the example set by the 2.x branch,
>> which was created when major incompatible changes, including major changes
>> in Impala's Hadoop dependencies, started landing on the master branch.
>>
>> We could create 3.x as a long-term branch for Sentry fixes, Hadoop
>> dependency
>> changes and general maintenance.
>> At the same time 3.4.1 can also be released as a "maintenance" or bugfix
>> release,
>> ensuring that Impala 3.4.[x1] is buildable again.
>>
>> Thanks,
>>
>>   - Laszlo
>>
>> On Wed, Dec 9, 2020 at 6:39 AM Joe McDonnell <joemcdonn...@cloudera.com>
>> wrote:
>>
>> > I think a 3.4.1 branch is a good idea. It is nice to have a branch that
>> can
>> > accept changes to fix Sentry issues. Also, the Maven repo changes were a
>> > surprise for everyone, and the latest release should always be
>> buildable.
>> >
>> > On a side note, if we want to scrutinize where all the jars are coming
>> > from, the maven logs from
>> > https://jenkins.impala.io/job/all-build-options-ub1604/ are easy to
>> grep
>> > and that job prints some statistics about how many artifacts come from
>> each
>> > repo. Here is the output from the latest run on the master branch:
>> >
>> > Number of artifacts downloaded from each repo:
>> >      16 cdh.rcs.releases.repo
>> >    2067 central
>> >     203 impala.cdp.repo
>> >       2 impala.toolchain.kudu.repo
>> >
>> > Thanks,
>> >
>> > Joe
>> >
>> >
>> > On Tue, Dec 8, 2020 at 7:12 PM Quanlong Huang <huangquanl...@gmail.com>
>> > wrote:
>> >
>> > > Another benefit of depending on Apache releases is we can avoid
>> > downloading
>> > > lots of binaries from Cloudera's S3 buckets. E.g. downloading Hadoop,
>> > Hive
>> > > binaries from the Apache mirrors is much faster.
>> > >
>> > > I think we can create another branch for this purpose. But not sure
>> what
>> > > the branch name should be. I think 3.4.1 should only be used for
>> > > backporting bug fixes for 3.4.0.
>> > >
>> > >
>> > >
>> > > On Tue, Dec 8, 2020 at 7:26 AM Tim Armstrong <tarmstr...@cloudera.com
>> >
>> > > wrote:
>> > >
>> > > > We did manage to switch to ASF Kudu master via native-toolchain by
>> > > default,
>> > > > but that was probably the easiest switch. I don't think we've tried
>> > > pinning
>> > > > to Kudu release for our official release, but it's probably doable.
>> I
>> > > think
>> > > > the main concern would be if there wasn't a Kudu release
>> available
>> > > with
>> > > > a feature we depended on.
>> > > >
>> > > > Maybe ASF Hadoop would be the next easiest, since Impala doesn't
>> > directly
>> > > > depend on YARN and the HDFS APIs have tended to be quite stable. The
>> > main
>> > > > changes we've depended on from the HDFS codebase are client changes
>> > (like
>> > > > hdfsUnbuffer() support) - I can imagine we might have to reconcile
>> some
>> > > of
>> > > > those to get things working correctly against ASF hadoop, but that
>> > would
>> > > be
>> > > > achievable (it would basically mean switching back to using older
>> APIs
>> > in
>> > > > ASF mode).
>> > > >
>> > > > On Mon, Dec 7, 2020 at 9:59 AM Csaba Ringhofer <
>> > csringho...@cloudera.com
>> > > >
>> > > > wrote:
>> > > >
>> > > > > > Another motivation is that we need a branch to maintain the
>> Sentry
>> > > > > support
>> > > > > > which is removed in the 4.0 branch
>> > > > >
>> > > > > +1, it would be great to have a support branch with Sentry
>> > > > >
>> > > > > > More ambitiously, I'd love it if releases were compatible with
>> > > official
>> > > > > > released versions of our ASF dependencies like Hadoop and Ranger
>> > > > >
>> > > > > Switching completely to ASF released dependencies looks like a
>> potentially
>> > > > very
>> > > > > hard task to me for two reasons:
>> > > > > 1. We have several dependencies - Hadoop, Hive, Sentry, Ranger,
>> > HBase,
>> > > > > Kudu, probably some others - even if we can't find a proper
>> release
>> > > for a
>> > > > > single one of these, then we would be stuck and would have to wait
>> > for
>> > > > > another community.
>> > > > >     As an example SENTRY-2549
>> > > > > <https://issues.apache.org/jira/browse/SENTRY-2549> is not even
>> > merged
>> > > > > yet,
>> > > > > while it breaks nearly all of our authorization tests.
>> > > > > 2. Some of our tests depend deeply on the exact behaviour of some
>> > > > > components, e.g. we may assume a given table to have a certain
>> amount
>> > > of
>> > > > > files or size, which can be easily broken by valid differences in
>> > Hive
>> > > or
>> > > > > parquet-mr/ORC.
>> > > > >     This can lead to the dilemma of a: rewriting a lot of tests b:
>> > > > skipping
>> > > > > them, making the test coverage weaker.
>> > > > >
>> > > > > A step in this direction could be to add a flag to the build like
>> > > > > USE_ASF_DEPENDENCIES, which would replace some or all CDH
>> > > > > dependencies with ASF ones, and see how the build goes, e.g. is
>> the
>> > > > > build/dataload successful, if yes, then what tests are red. I
>> think
>> > > that
>> > > > > releasing with CDH dependencies + adding some information about
>> the
>> > > state
>> > > > > with ASF ones could already be a big improvement for adoption,
>> even
>> > if
>> > > > not
>> > > > > everything works. E.g. someone may simply want to try Impala in a
>> > > Hadoop
>> > > > +
>> > > > > Hive cluster and not care about authorization or HBase/Kudu.
>> > > > >
>> > > > >
>> > > > >
>> > > > >
>> > > > > On Mon, Dec 7, 2020 at 4:36 AM Jim Apple <apa...@jbapple.com>
>> wrote:
>> > > > >
>> > > > > > Yeah, I suppose it depends on the version and the bug fixes.
>> > > Sometimes
>> > > > > it's
>> > > > > > also new features, which it would be good to feature gate
>> anyway.
>> > > IIRC,
>> > > > > at
>> > > > > > some point Impala wouldn't build against any released Hive
>> version
>> > > > > because
>> > > > > > Hive 3.0 made changes to Hive 2.0 that Impala couldn't deal with
>> > and
>> > > > Hive
>> > > > > > 2.x didn't contain a feature Impala depended on just to compile
>> > > > > > the frontend. Or maybe it was Hadoop.
>> > > > > >
>> > > > > > If we could set an example with older releases, I think it
>> would be
>> > > > > lovely
>> > > > > > and perhaps help adoption, too!
>> > > > > >
>> > > > > > On Sun, Dec 6, 2020 at 5:49 PM Quanlong Huang <
>> > > huangquanl...@gmail.com
>> > > > >
>> > > > > > wrote:
>> > > > > >
>> > > > > > > Yeah, it'd be good to depend on Apache official versions. In
>> my
>> > > > > > > understanding, we depend on cdh/cdp snapshot versions since we
>> > need
>> > > > > some
>> > > > > > > bug fixes that haven't been released in Apache official
>> versions.
>> > > So
>> > > > > it's
>> > > > > > > more suitable to do this for an older release like impala-3.4.
>> > > > Because
>> > > > > > all
>> > > > > > > its dependent features/bug fixes may already exist in some
>> Apache
>> > > > > > official
>> > > > > > > versions.
>> > > > > > >
>> > > > > > > On Sat, Dec 5, 2020 at 11:53 PM Jim Apple <jbap...@apache.org
>> >
>> > > > wrote:
>> > > > > > >
>> > > > > > > > I think this is the right choice.
>> > > > > > > >
>> > > > > > > > More ambitiously, I'd love it if releases were compatible
>> with
>> > > > > official
>> > > > > > > > released versions of our ASF dependencies like Hadoop and
>> > Ranger.
>> > > > > > Perhaps
>> > > > > > > > this would limit the Cloudera Maven dependencies for devs.
>> > > > > > > >
>> > > > > > > > On Sat, Dec 5, 2020 at 12:32 AM Quanlong Huang <
>> > > > > > huangquanl...@gmail.com>
>> > > > > > > > wrote:
>> > > > > > > >
>> > > > > > > > > Hi all,
>> > > > > > > > >
>> > > > > > > > > Due to Cloudera's maven repo changes, the latest released
>> > > version
>> > > > > > 3.4.0
>> > > > > > > > is
>> > > > > > > > > not compilable now (need the patch of IMPALA-9815). I'm
>> > > thinking
>> > > > > > about
>> > > > > > > > > doing a minor release for 3.4.1.
>> > > > > > > > >
>> > > > > > > > > Another motivation is that we need a branch to maintain
>> the
>> > > > Sentry
>> > > > > > > > support
>> > > > > > > > > which is removed in the 4.0 branch (IMPALA-9708). One bug
>> we
>> > > > > recently
>> > > > > > > > found
>> > > > > > > > > is IMPALA-10326 (PrincipalPrivilegeTree doesn't handle
>> empty
>> > > > string
>> > > > > > and
>> > > > > > > > > wildcards correctly). We have a fix in downstream but
>> can't
>> > put
>> > > > it
>> > > > > > > > upstream
>> > > > > > > > > due to missing the Sentry support. IMPALA-10130 is another
>> > > Sentry
>> > > > > > issue
>> > > > > > > > > that we may need to fix.
>> > > > > > > > >
>> > > > > > > > > We can also apply some critical fixes in this version.
>> Here
>> > are
>> > > > > bugs
>> > > > > > > that
>> > > > > > > > > affect 3.4.0 and are fixed in 4.0:
>> > > > > > > > >
>> > > > > > > > >
>> > > > > > > >
>> > > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>> https://issues.apache.org/jira/browse/IMPALA-9725?jql=project%20%3D%20IMPALA%20AND%20issuetype%20%3D%20Bug%20AND%20status%20%3D%20Resolved%20AND%20affectedVersion%20%3D%20%22Impala%203.4.0%22%20AND%20fixVersion%20%3D%20%22Impala%204.0%22%20ORDER%20BY%20priority%20DESC%2C%20updated%20DESC
>> > > > > > > > >
>> > > > > > > > > Any objections or suggestions?
>> > > > > > > > >
>> > > > > > > > > Thanks,
>> > > > > > > > > Quanlong
>> > > > > > > > >
>> > > > > > > >
>> > > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>>
>
