Re: Impala 3.x (and 2.x)
On Fri, Jan 12, 2018 at 4:04 PM, Jim Applewrote: > This makes sense to me. > > In this mode, for 2.x-only changes and for changes on 3.0 that don't > apply cleanly, there will be a manual way to do the step labelled "1. > Cherrypick tool", and that way is the same way we send patches for > review now, but pushing to HEAD:refs/for/2.x rather than > HEAD:refs/for/master, yes? > Exactly. So, non-clean cherrypicks or 2.x-only changes go through review on Gerrit, but we give an implicit review pass to clean cherrypicks. We could have the cherrypick tool between gerrit/master and gerrit/2.x do the cherrypicks and run the tests on Jenkins. Do you think that's preferable? -- Philip > > On Fri, Jan 12, 2018 at 3:57 PM, Philip Zeyliger > wrote: > > Picture: > > https://gist.github.com/philz/323c8b4cb411dc12eb7231d922c195 > 1f#file-impala-branch-image-pdf > > > > > > On Fri, Jan 12, 2018 at 3:47 PM, Jim Apple wrote: > > > >> Often, this list seems to filter out images. Could you post it and send > a > >> link? > >> > >> Thanks for taking this on, Phil! > >> > >> On Fri, Jan 12, 2018 at 3:15 PM, Philip Zeyliger > >> wrote: > >> > >> > I think most patches go to Gerrit branch 'master', which happens to > >> > identify itself as 3.0. (Or 3.x?). > >> > > >> > Here's a picture: > >> > > >> > [image: Inline image 1] > >> > > >> > > >> > With this, every time "cherrypick_and_push_to_asf.py" is run, it > would > >> > first offer to cherrypick changes between master and 2.x. Then, it > would > >> > offer push those cherrypicks to gerrit/2.x. After that, it continues > on > >> as > >> > before and offers to push changes to ASF. I think this maintains the > >> > invariant that pushing to ASF is only done with a human trigger. (We > >> could > >> > also have step 1 be done by a Jenkins robot, since it's between Gerrit > >> and > >> > Gerrit.) > >> > > >> > I looked at the How to Release page, and the main difference would be > >> > that, for a 2.x release, the $COMMIT_HASH_YOU_CHOSE would come from > the > >> 2.x > >> > branch, as would any cherrypicks. > >> > > >> > Does this match what you're thinking? > >> > > >> > Thanks! > >> > > >> > -- Philip > >> > > >> > On Fri, Jan 12, 2018 at 11:07 AM, Jim Apple > >> wrote: > >> > > >> >> Which gerrit branch were you thinking most patches would go to? > >> >> > >> >> If they go to 3.0, then push_to_asf.py would have to be amended to > >> >> push to gerrit, bypassing code review. I think that's possible, but > >> >> I'm not 100%. > >> >> > >> >> There is also security to think about, since the push_to_asf.py users > >> >> can push a few commits at a time, including ones they didn't author > or > >> >> review. > >> >> > >> >> We'll also want to clarify > >> >> https://cwiki.apache.org/confluence/display/IMPALA/How+to+Release > and > >> >> keep it consistent with the git & gerrit statuses quo. > >> >> > >> >> On Fri, Jan 12, 2018 at 10:52 AM, Philip Zeyliger < > phi...@cloudera.com> > >> >> wrote: > >> >> > Hi! > >> >> > > >> >> >> Should we start tagging all candidates with a common label, e.g. > >> >> > include-in-v3? > >> >> > > >> >> > I agree with Lars's suggestion for tagging JIRAs with > include-in-v3. > >> >> I've > >> >> > done so, and the relevant query is > >> >> > https://issues.apache.org/jira/issues/?jql=labels%20%3D%20in > >> >> clude-in-v3%20and%20project%3Dimpala > >> >> > . > >> >> > > >> >> >> What sort of process were you thinking of for the automation? > >> >> > > >> >> > I think amending push_to_asf.py, as you suggest, is a great idea. I > >> >> think > >> >> > we have a string ("not for 2.x") which can be used in commit > messages > >> to > >> >> > discourage the cherrypick for the changes we want to be exclusive > >> until > >> >> we > >> >> > want to change the defaults in the other direction. (I.e., right > now > >> the > >> >> > string is "not for 2.x", but at some point the string may be > "should > >> be > >> >> > cherrypicked to 2.x".) > >> >> > > >> >> > I do think that we want to create a gerrit branch to allow 2.x-only > >> >> changes > >> >> > to be reviewed in the straight-forward fashion. > >> >> > > >> >> > -- Philip > >> >> > > >> >> > > >> >> > > >> >> > On Thu, Jan 11, 2018 at 9:31 AM, Jim Apple > >> >> wrote: > >> >> > > >> >> >> I'm on-board with all of this. (I also would be OK delaying 3.0, > if > >> >> >> that were the consensus). > >> >> >> > >> >> >> There is one issue in here I think we should dive into: > >> >> >> > >> >> >> > Both master and 2.x would be active, and, at least for the > >> beginning, > >> >> >> > changes would automatically be pulled into the 2.x line, unless > >> >> >> explicitly > >> >> >> > blacklisted. > >> >> >> > >> >> >> What sort of process were you thinking of for the automation? > >> >> >> > >> >> >> Some context, starting from what we all likely already know: > >> >> >>
Re: Impala 3.x (and 2.x)
This makes sense to me. In this mode, for 2.x-only changes and for changes on 3.0 that don't apply cleanly, there will be a manual way to do the step labelled "1. Cherrypick tool", and that way is the same way we send patches for review now, but pushing to HEAD:refs/for/2.x rather than HEAD:refs/for/master, yes? On Fri, Jan 12, 2018 at 3:57 PM, Philip Zeyligerwrote: > Picture: > https://gist.github.com/philz/323c8b4cb411dc12eb7231d922c1951f#file-impala-branch-image-pdf > > > On Fri, Jan 12, 2018 at 3:47 PM, Jim Apple wrote: > >> Often, this list seems to filter out images. Could you post it and send a >> link? >> >> Thanks for taking this on, Phil! >> >> On Fri, Jan 12, 2018 at 3:15 PM, Philip Zeyliger >> wrote: >> >> > I think most patches go to Gerrit branch 'master', which happens to >> > identify itself as 3.0. (Or 3.x?). >> > >> > Here's a picture: >> > >> > [image: Inline image 1] >> > >> > >> > With this, every time "cherrypick_and_push_to_asf.py" is run, it would >> > first offer to cherrypick changes between master and 2.x. Then, it would >> > offer push those cherrypicks to gerrit/2.x. After that, it continues on >> as >> > before and offers to push changes to ASF. I think this maintains the >> > invariant that pushing to ASF is only done with a human trigger. (We >> could >> > also have step 1 be done by a Jenkins robot, since it's between Gerrit >> and >> > Gerrit.) >> > >> > I looked at the How to Release page, and the main difference would be >> > that, for a 2.x release, the $COMMIT_HASH_YOU_CHOSE would come from the >> 2.x >> > branch, as would any cherrypicks. >> > >> > Does this match what you're thinking? >> > >> > Thanks! >> > >> > -- Philip >> > >> > On Fri, Jan 12, 2018 at 11:07 AM, Jim Apple >> wrote: >> > >> >> Which gerrit branch were you thinking most patches would go to? >> >> >> >> If they go to 3.0, then push_to_asf.py would have to be amended to >> >> push to gerrit, bypassing code review. I think that's possible, but >> >> I'm not 100%. >> >> >> >> There is also security to think about, since the push_to_asf.py users >> >> can push a few commits at a time, including ones they didn't author or >> >> review. >> >> >> >> We'll also want to clarify >> >> https://cwiki.apache.org/confluence/display/IMPALA/How+to+Release and >> >> keep it consistent with the git & gerrit statuses quo. >> >> >> >> On Fri, Jan 12, 2018 at 10:52 AM, Philip Zeyliger >> >> wrote: >> >> > Hi! >> >> > >> >> >> Should we start tagging all candidates with a common label, e.g. >> >> > include-in-v3? >> >> > >> >> > I agree with Lars's suggestion for tagging JIRAs with include-in-v3. >> >> I've >> >> > done so, and the relevant query is >> >> > https://issues.apache.org/jira/issues/?jql=labels%20%3D%20in >> >> clude-in-v3%20and%20project%3Dimpala >> >> > . >> >> > >> >> >> What sort of process were you thinking of for the automation? >> >> > >> >> > I think amending push_to_asf.py, as you suggest, is a great idea. I >> >> think >> >> > we have a string ("not for 2.x") which can be used in commit messages >> to >> >> > discourage the cherrypick for the changes we want to be exclusive >> until >> >> we >> >> > want to change the defaults in the other direction. (I.e., right now >> the >> >> > string is "not for 2.x", but at some point the string may be "should >> be >> >> > cherrypicked to 2.x".) >> >> > >> >> > I do think that we want to create a gerrit branch to allow 2.x-only >> >> changes >> >> > to be reviewed in the straight-forward fashion. >> >> > >> >> > -- Philip >> >> > >> >> > >> >> > >> >> > On Thu, Jan 11, 2018 at 9:31 AM, Jim Apple >> >> wrote: >> >> > >> >> >> I'm on-board with all of this. (I also would be OK delaying 3.0, if >> >> >> that were the consensus). >> >> >> >> >> >> There is one issue in here I think we should dive into: >> >> >> >> >> >> > Both master and 2.x would be active, and, at least for the >> beginning, >> >> >> > changes would automatically be pulled into the 2.x line, unless >> >> >> explicitly >> >> >> > blacklisted. >> >> >> >> >> >> What sort of process were you thinking of for the automation? >> >> >> >> >> >> Some context, starting from what we all likely already know: >> >> >> >> >> >> The bulk of the code review and pre-merge testing results are >> recorded >> >> >> in gerrit. Once the pre-merge testing passes, a patch is >> cherry-picked >> >> >> to the git repo hosted with gerrit. To get the patch to the Impala >> git >> >> >> repo hosted by the ASF, bin/push_to_asf.py is run by a human who >> >> >> supplies his or her ASF credentials. That script copies the commit to >> >> >> the ASF git repo. >> >> >> >> >> >> Often, 2-3 commits will pile up in gerrit before some committer runs >> >> >> that script and pushes them to ASF. >> >> >> >> >> >> We could edit that script (bin/push_to_asf.py) to help with the >> cherry >> >> >>
Re: Impala 3.x (and 2.x)
Often, this list seems to filter out images. Could you post it and send a link? Thanks for taking this on, Phil! On Fri, Jan 12, 2018 at 3:15 PM, Philip Zeyligerwrote: > I think most patches go to Gerrit branch 'master', which happens to > identify itself as 3.0. (Or 3.x?). > > Here's a picture: > > [image: Inline image 1] > > > With this, every time "cherrypick_and_push_to_asf.py" is run, it would > first offer to cherrypick changes between master and 2.x. Then, it would > offer push those cherrypicks to gerrit/2.x. After that, it continues on as > before and offers to push changes to ASF. I think this maintains the > invariant that pushing to ASF is only done with a human trigger. (We could > also have step 1 be done by a Jenkins robot, since it's between Gerrit and > Gerrit.) > > I looked at the How to Release page, and the main difference would be > that, for a 2.x release, the $COMMIT_HASH_YOU_CHOSE would come from the 2.x > branch, as would any cherrypicks. > > Does this match what you're thinking? > > Thanks! > > -- Philip > > On Fri, Jan 12, 2018 at 11:07 AM, Jim Apple wrote: > >> Which gerrit branch were you thinking most patches would go to? >> >> If they go to 3.0, then push_to_asf.py would have to be amended to >> push to gerrit, bypassing code review. I think that's possible, but >> I'm not 100%. >> >> There is also security to think about, since the push_to_asf.py users >> can push a few commits at a time, including ones they didn't author or >> review. >> >> We'll also want to clarify >> https://cwiki.apache.org/confluence/display/IMPALA/How+to+Release and >> keep it consistent with the git & gerrit statuses quo. >> >> On Fri, Jan 12, 2018 at 10:52 AM, Philip Zeyliger >> wrote: >> > Hi! >> > >> >> Should we start tagging all candidates with a common label, e.g. >> > include-in-v3? >> > >> > I agree with Lars's suggestion for tagging JIRAs with include-in-v3. >> I've >> > done so, and the relevant query is >> > https://issues.apache.org/jira/issues/?jql=labels%20%3D%20in >> clude-in-v3%20and%20project%3Dimpala >> > . >> > >> >> What sort of process were you thinking of for the automation? >> > >> > I think amending push_to_asf.py, as you suggest, is a great idea. I >> think >> > we have a string ("not for 2.x") which can be used in commit messages to >> > discourage the cherrypick for the changes we want to be exclusive until >> we >> > want to change the defaults in the other direction. (I.e., right now the >> > string is "not for 2.x", but at some point the string may be "should be >> > cherrypicked to 2.x".) >> > >> > I do think that we want to create a gerrit branch to allow 2.x-only >> changes >> > to be reviewed in the straight-forward fashion. >> > >> > -- Philip >> > >> > >> > >> > On Thu, Jan 11, 2018 at 9:31 AM, Jim Apple >> wrote: >> > >> >> I'm on-board with all of this. (I also would be OK delaying 3.0, if >> >> that were the consensus). >> >> >> >> There is one issue in here I think we should dive into: >> >> >> >> > Both master and 2.x would be active, and, at least for the beginning, >> >> > changes would automatically be pulled into the 2.x line, unless >> >> explicitly >> >> > blacklisted. >> >> >> >> What sort of process were you thinking of for the automation? >> >> >> >> Some context, starting from what we all likely already know: >> >> >> >> The bulk of the code review and pre-merge testing results are recorded >> >> in gerrit. Once the pre-merge testing passes, a patch is cherry-picked >> >> to the git repo hosted with gerrit. To get the patch to the Impala git >> >> repo hosted by the ASF, bin/push_to_asf.py is run by a human who >> >> supplies his or her ASF credentials. That script copies the commit to >> >> the ASF git repo. >> >> >> >> Often, 2-3 commits will pile up in gerrit before some committer runs >> >> that script and pushes them to ASF. >> >> >> >> We could edit that script (bin/push_to_asf.py) to help with the cherry >> >> picks, so that each time a commit is made, the committer must say >> >> whether the commit goes in 2.x, 3.0, or both, but the commits are >> >> often made by people who didn't author the patches, so they may not be >> >> sure which branch to go in. >> >> >> >> Additionally, gerrit code review is intimately tied to the git repo. >> >> Gerrit runs a git repo under-the-hood, and I believe that the branch >> >> on gerrit's git that changes are cherry-picked to after pre-merge >> >> testing is identical to the Impala git repo hosted by the ASF - down >> >> to the hashes, even. If we think 2.x and 3.0 will diverge enough that >> >> we'll want different code reviews for different branches, then we >> >> might want two different branches on gerrit, too. >> >> >> > >
Re: Impala 3.x (and 2.x)
Which gerrit branch were you thinking most patches would go to? If they go to 3.0, then push_to_asf.py would have to be amended to push to gerrit, bypassing code review. I think that's possible, but I'm not 100%. There is also security to think about, since the push_to_asf.py users can push a few commits at a time, including ones they didn't author or review. We'll also want to clarify https://cwiki.apache.org/confluence/display/IMPALA/How+to+Release and keep it consistent with the git & gerrit statuses quo. On Fri, Jan 12, 2018 at 10:52 AM, Philip Zeyligerwrote: > Hi! > >> Should we start tagging all candidates with a common label, e.g. > include-in-v3? > > I agree with Lars's suggestion for tagging JIRAs with include-in-v3. I've > done so, and the relevant query is > https://issues.apache.org/jira/issues/?jql=labels%20%3D%20include-in-v3%20and%20project%3Dimpala > . > >> What sort of process were you thinking of for the automation? > > I think amending push_to_asf.py, as you suggest, is a great idea. I think > we have a string ("not for 2.x") which can be used in commit messages to > discourage the cherrypick for the changes we want to be exclusive until we > want to change the defaults in the other direction. (I.e., right now the > string is "not for 2.x", but at some point the string may be "should be > cherrypicked to 2.x".) > > I do think that we want to create a gerrit branch to allow 2.x-only changes > to be reviewed in the straight-forward fashion. > > -- Philip > > > > On Thu, Jan 11, 2018 at 9:31 AM, Jim Apple wrote: > >> I'm on-board with all of this. (I also would be OK delaying 3.0, if >> that were the consensus). >> >> There is one issue in here I think we should dive into: >> >> > Both master and 2.x would be active, and, at least for the beginning, >> > changes would automatically be pulled into the 2.x line, unless >> explicitly >> > blacklisted. >> >> What sort of process were you thinking of for the automation? >> >> Some context, starting from what we all likely already know: >> >> The bulk of the code review and pre-merge testing results are recorded >> in gerrit. Once the pre-merge testing passes, a patch is cherry-picked >> to the git repo hosted with gerrit. To get the patch to the Impala git >> repo hosted by the ASF, bin/push_to_asf.py is run by a human who >> supplies his or her ASF credentials. That script copies the commit to >> the ASF git repo. >> >> Often, 2-3 commits will pile up in gerrit before some committer runs >> that script and pushes them to ASF. >> >> We could edit that script (bin/push_to_asf.py) to help with the cherry >> picks, so that each time a commit is made, the committer must say >> whether the commit goes in 2.x, 3.0, or both, but the commits are >> often made by people who didn't author the patches, so they may not be >> sure which branch to go in. >> >> Additionally, gerrit code review is intimately tied to the git repo. >> Gerrit runs a git repo under-the-hood, and I believe that the branch >> on gerrit's git that changes are cherry-picked to after pre-merge >> testing is identical to the Impala git repo hosted by the ASF - down >> to the hashes, even. If we think 2.x and 3.0 will diverge enough that >> we'll want different code reviews for different branches, then we >> might want two different branches on gerrit, too. >>
Re: Accumulo Integration [SEC=UNOFFICIAL]
https://www.quora.com/How-do-we-compare-Apache-HBase-vs-Apache-Accumulo HBase and Accumulo are both implementations of BigTable using HDFS as their storage mechanism.They are virtually identical in most regards from both an operational and architectual standpoint; they have different APIs, and use different nomenclature, but conceptually they do the same thing and performance is close enough that there's no significant advantage to using one over the other. I was under the general impression that the above was true. So it should be a near 1 to 1 port right? IE fundamentally both are BigTable clones. On Fri, Jan 12, 2018 at 11:59 AM, Jim Applewrote: > I don't know that Impala can scan Accumulo tables. Assuming it can't: > Impala can read HBase tables, so that might be a good place for you to > start looking on what this could look like. > > I suspect the most complex piece is going to be the auth integration. > > On Thu, Jan 11, 2018 at 7:16 PM, Dickson, Matt MR > wrote: > > UNOFFICIAL > > I'm investigate integrating Accumulo as a backed to Impala. At this > point it is simply an idea and I'm interested to know if anything has been > done to achieve this in the past/present? > > > > If so, is the code open source? > > > > If not, I'd be very keen to gain some detail on how this could be > achieved or issues that prevent it being feasible. > > > > Thanks in advance, > > Matt > > >
Re: Impala 3.x (and 2.x)
Hi! > Should we start tagging all candidates with a common label, e.g. include-in-v3? I agree with Lars's suggestion for tagging JIRAs with include-in-v3. I've done so, and the relevant query is https://issues.apache.org/jira/issues/?jql=labels%20%3D%20include-in-v3%20and%20project%3Dimpala . > What sort of process were you thinking of for the automation? I think amending push_to_asf.py, as you suggest, is a great idea. I think we have a string ("not for 2.x") which can be used in commit messages to discourage the cherrypick for the changes we want to be exclusive until we want to change the defaults in the other direction. (I.e., right now the string is "not for 2.x", but at some point the string may be "should be cherrypicked to 2.x".) I do think that we want to create a gerrit branch to allow 2.x-only changes to be reviewed in the straight-forward fashion. -- Philip On Thu, Jan 11, 2018 at 9:31 AM, Jim Applewrote: > I'm on-board with all of this. (I also would be OK delaying 3.0, if > that were the consensus). > > There is one issue in here I think we should dive into: > > > Both master and 2.x would be active, and, at least for the beginning, > > changes would automatically be pulled into the 2.x line, unless > explicitly > > blacklisted. > > What sort of process were you thinking of for the automation? > > Some context, starting from what we all likely already know: > > The bulk of the code review and pre-merge testing results are recorded > in gerrit. Once the pre-merge testing passes, a patch is cherry-picked > to the git repo hosted with gerrit. To get the patch to the Impala git > repo hosted by the ASF, bin/push_to_asf.py is run by a human who > supplies his or her ASF credentials. That script copies the commit to > the ASF git repo. > > Often, 2-3 commits will pile up in gerrit before some committer runs > that script and pushes them to ASF. > > We could edit that script (bin/push_to_asf.py) to help with the cherry > picks, so that each time a commit is made, the committer must say > whether the commit goes in 2.x, 3.0, or both, but the commits are > often made by people who didn't author the patches, so they may not be > sure which branch to go in. > > Additionally, gerrit code review is intimately tied to the git repo. > Gerrit runs a git repo under-the-hood, and I believe that the branch > on gerrit's git that changes are cherry-picked to after pre-merge > testing is identical to the Impala git repo hosted by the ASF - down > to the hashes, even. If we think 2.x and 3.0 will diverge enough that > we'll want different code reviews for different branches, then we > might want two different branches on gerrit, too. >
Re: .test file format
It is possible to get the test infra to generate .test files based on the actual output. For planner tests, they automatically get written to logs/fe_tests/PlannerTest/. If you're running end-to-end tests with impala-py.test, you can use the --update_results flag to get it to generate similar files for tests that fail. There are some more pitfalls here but it can be a useful tool. If you use these shortcuts it's still important to manually check the changes to the files. On Fri, Jan 12, 2018 at 9:03 AM, Jim Applewrote: > .test files are written by hand, with the exception of long complex > input, which I'm sure authors copy-paste from Impala shell sometimes. > > The formats are described in > > https://github.com/apache/impala/blob/master/fe/src/ > test/java/org/apache/impala/testutil/TestFileParser.java > > and > > https://github.com/apache/impala/blob/master/tests/util/ > test_file_parser.py > > On Fri, Jan 12, 2018 at 8:57 AM, Xinran Yu Tinney > wrote: > > Hi, > >Is there a link for how to generate the .test files and what is each > > part in it? Thanks! > > > > Xinran >
Re: .test file format
.test files are written by hand, with the exception of long complex input, which I'm sure authors copy-paste from Impala shell sometimes. The formats are described in https://github.com/apache/impala/blob/master/fe/src/test/java/org/apache/impala/testutil/TestFileParser.java and https://github.com/apache/impala/blob/master/tests/util/test_file_parser.py On Fri, Jan 12, 2018 at 8:57 AM, Xinran Yu Tinneywrote: > Hi, >Is there a link for how to generate the .test files and what is each > part in it? Thanks! > > Xinran
Re: Accumulo Integration [SEC=UNOFFICIAL]
I don't know that Impala can scan Accumulo tables. Assuming it can't: Impala can read HBase tables, so that might be a good place for you to start looking on what this could look like. I suspect the most complex piece is going to be the auth integration. On Thu, Jan 11, 2018 at 7:16 PM, Dickson, Matt MRwrote: > UNOFFICIAL > I'm investigate integrating Accumulo as a backed to Impala. At this point it > is simply an idea and I'm interested to know if anything has been done to > achieve this in the past/present? > > If so, is the code open source? > > If not, I'd be very keen to gain some detail on how this could be achieved or > issues that prevent it being feasible. > > Thanks in advance, > Matt >
.test file format
Hi, Is there a link for how to generate the .test files and what is each part in it? Thanks! Xinran