[Impala-ASF-CR] IMPALA-7061: Rework HBase splitting and assignment

2018-05-29 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10447 )

Change subject: IMPALA-7061: Rework HBase splitting and assignment
..


Patch Set 9:

This change did not cherry-pick successfully into branch 2.x. To resolve this,
please do the cherry-pick manually and submit it to Gerrit at refs/for/2.x or
add an exception to the branch 2.x copy of bin/ignored_commits.json. Thanks,
your friendly bot at https://jenkins.impala.io/job/cherrypick-2.x-and-test/557/.


--
To view, visit http://gerrit.cloudera.org:8080/10447
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3d639128a856254a6ccb93d6750f531974b5f897
Gerrit-Change-Number: 10447
Gerrit-PatchSet: 9
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Comment-Date: Wed, 30 May 2018 05:03:59 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7061: Rework HBase splitting and assignment

2018-05-24 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10447 )

Change subject: IMPALA-7061: Rework HBase splitting and assignment
..


Patch Set 8: Verified+1


--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3d639128a856254a6ccb93d6750f531974b5f897
Gerrit-Change-Number: 10447
Gerrit-PatchSet: 8
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Comment-Date: Fri, 25 May 2018 00:28:17 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7061: Rework HBase splitting and assignment

2018-05-24 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/10447 )

Change subject: IMPALA-7061: Rework HBase splitting and assignment
..

IMPALA-7061: Rework HBase splitting and assignment

Some frontend PlannerTests rely on HBase tables being
arranged in a deterministic way. Specifically, the
HBase tables need to be split with specific region
boundaries and those regions need to be assigned to
specific HBase region servers.

Currently, the tables are created without splits and
testdata/bin/split-hbase.sh runs Java code in
HBaseTestDataRegionAssignment to split and assign
the tables. This runs during dataload via
testdata/bin/create-load-data.sh and during tests
with bin/run-all-tests.sh. There are problems with
both parts of this process. The table splitting is
flaky. Since significant time can pass between the
assignments and the tests, rebalancing means the
assignments are not always stable.

This changes the process so that the HBase tables are
created with the splits already specified via the
HBase shell. The splits remain stable over time.
PlannerTestBase runs the assignment code in
HBaseTestDataRegionAssignment at the start of
the PlannerTests. This makes the assignments
deterministic. No other tests depend on the
exact assignments, so this does not regress anything.
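
A minimal sketch of the idea in the two paragraphs above: fixed split points
imply a fixed set of region boundaries, and a fixed ordering of regions and
servers yields the same assignment on every run. The round-robin policy, the
function names, and the server names are assumptions for illustration; the
logic in HBaseTestDataRegionAssignment may well differ.

```python
def region_boundaries(split_keys):
    """Return (start, end) key pairs for the regions implied by split_keys.

    An empty string stands for the open start or open end of the table.
    """
    keys = sorted(split_keys)
    starts = [""] + keys
    ends = keys + [""]
    return list(zip(starts, ends))


def assign_regions(regions, servers):
    """Assign regions to servers round-robin (hypothetical policy).

    Regions are ordered by start key and servers by name, so the result
    does not depend on the order in which either list was discovered.
    """
    ordered_servers = sorted(servers)
    ordered_regions = sorted(regions, key=lambda region: region[0])
    return {
        region: ordered_servers[i % len(ordered_servers)]
        for i, region in enumerate(ordered_regions)
    }


if __name__ == "__main__":
    regions = region_boundaries(["1", "3", "5", "7", "9"])
    # Server names are placeholders, deliberately given out of order.
    for region, server in assign_regions(regions, ["rs3", "rs1", "rs2"]).items():
        print(region, "->", server)
```

Five split keys give six regions, so with three servers each server would
hold two regions under this policy.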

Testing:
 - Local testing
 - Ran gerrit-verify-dryrun-external
 - Verified minicluster profile 2 compiles

Change-Id: I3d639128a856254a6ccb93d6750f531974b5f897
Reviewed-on: http://gerrit.cloudera.org:8080/10447
Reviewed-by: Philip Zeyliger 
Tested-by: Impala Public Jenkins 
---
M bin/run-all-tests.sh
A fe/src/compat-minicluster-profile-2/test/java/org/apache/impala/datagenerator/HBaseTestDataRegionAssignment.java
A fe/src/compat-minicluster-profile-3/test/java/org/apache/impala/datagenerator/HBaseTestDataRegionAssignment.java
M fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java
M testdata/bin/create-load-data.sh
M testdata/bin/generate-schema-statements.py
D testdata/bin/split-hbase.sh
M testdata/datasets/functional/functional_schema_template.sql
D testdata/src/compat-minicluster-profile-2/java/org/apache/impala/datagenerator/HBaseTestDataRegionAssigment.java
D testdata/src/compat-minicluster-profile-3/java/org/apache/impala/datagenerator/HBaseTestDataRegionAssigment.java
10 files changed, 329 insertions(+), 728 deletions(-)

Approvals:
  Philip Zeyliger: Looks good to me, approved
  Impala Public Jenkins: Verified

--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I3d639128a856254a6ccb93d6750f531974b5f897
Gerrit-Change-Number: 10447
Gerrit-PatchSet: 9
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Philip Zeyliger 


[Impala-ASF-CR] IMPALA-7061: Rework HBase splitting and assignment

2018-05-24 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10447 )

Change subject: IMPALA-7061: Rework HBase splitting and assignment
..


Patch Set 8:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/2550/


--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3d639128a856254a6ccb93d6750f531974b5f897
Gerrit-Change-Number: 10447
Gerrit-PatchSet: 8
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Comment-Date: Thu, 24 May 2018 18:15:08 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7061: Rework HBase splitting and assignment

2018-05-24 Thread Philip Zeyliger (Code Review)
Philip Zeyliger has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10447 )

Change subject: IMPALA-7061: Rework HBase splitting and assignment
..


Patch Set 8: Code-Review+2


--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3d639128a856254a6ccb93d6750f531974b5f897
Gerrit-Change-Number: 10447
Gerrit-PatchSet: 8
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Comment-Date: Thu, 24 May 2018 18:12:19 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7061: Rework HBase splitting and assignment

2018-05-23 Thread Joe McDonnell (Code Review)
Hello Philip Zeyliger, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/10447

to look at the new patch set (#7).

Change subject: IMPALA-7061: Rework HBase splitting and assignment
..

IMPALA-7061: Rework HBase splitting and assignment

Some frontend PlannerTests rely on HBase tables being
arranged in a deterministic way. Specifically, the
HBase tables need to be split with specific region
boundaries and those regions need to be assigned to
specific HBase region servers.

Currently, the tables are created without splits and
testdata/bin/split-hbase.sh runs Java code in
HBaseTestDataRegionAssignment to split and assign
the tables. This runs during dataload via
testdata/bin/create-load-data.sh and during tests
with bin/run-all-tests.sh. There are problems with
both parts of this process. The table splitting is
flaky. Since significant time can pass between the
assignments and the tests, rebalancing means the
assignments are not always stable.

This changes the process so that the HBase tables are
created with the splits already specified via the
HBase shell. The splits remain stable over time.
PlannerTestBase runs the assignment code in
HBaseTestDataRegionAssignment at the start of
the PlannerTests. This makes the assignments
deterministic. No other tests depend on the
exact assignments, so this does not regress anything.

Testing:
 - Local testing
 - Ran gerrit-verify-dryrun-external
 - Verified minicluster profile 2 compiles

Change-Id: I3d639128a856254a6ccb93d6750f531974b5f897
---
M bin/run-all-tests.sh
A fe/src/compat-minicluster-profile-2/test/java/org/apache/impala/datagenerator/HBaseTestDataRegionAssignment.java
A fe/src/compat-minicluster-profile-3/test/java/org/apache/impala/datagenerator/HBaseTestDataRegionAssignment.java
M fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java
M testdata/bin/create-load-data.sh
M testdata/bin/generate-schema-statements.py
D testdata/bin/split-hbase.sh
M testdata/datasets/functional/functional_schema_template.sql
D testdata/src/compat-minicluster-profile-2/java/org/apache/impala/datagenerator/HBaseTestDataRegionAssigment.java
D testdata/src/compat-minicluster-profile-3/java/org/apache/impala/datagenerator/HBaseTestDataRegionAssigment.java
10 files changed, 329 insertions(+), 728 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/47/10447/7
--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I3d639128a856254a6ccb93d6750f531974b5f897
Gerrit-Change-Number: 10447
Gerrit-PatchSet: 7
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Philip Zeyliger 


[Impala-ASF-CR] IMPALA-7061: Rework HBase splitting and assignment

2018-05-23 Thread Joe McDonnell (Code Review)
Joe McDonnell has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10447 )

Change subject: IMPALA-7061: Rework HBase splitting and assignment
..


Patch Set 6:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/10447/6/fe/src/compat-minicluster-profile-2/test/java/org/apache/impala/datagenerator/HBaseTestDataRegionAssignment.java
File 
fe/src/compat-minicluster-profile-2/test/java/org/apache/impala/datagenerator/HBaseTestDataRegionAssignment.java:

http://gerrit.cloudera.org:8080/#/c/10447/6/fe/src/compat-minicluster-profile-2/test/java/org/apache/impala/datagenerator/HBaseTestDataRegionAssignment.java@48
PS6, Line 48:  * Splits HBase tables into regions and deterministically assigns 
regions to region
> Gerrit is lame about not detecting the rename here and showing a useful dif
Updated the comments. This is only removing the splitting code. It is otherwise 
almost identical.


http://gerrit.cloudera.org:8080/#/c/10447/6/fe/src/compat-minicluster-profile-2/test/java/org/apache/impala/datagenerator/HBaseTestDataRegionAssignment.java@142
PS6, Line 142:   public static String printKey(byte[] key) {
> Remove this and use fe/src/main/java/org/apache/impala/planner/HBaseScanNod
Done


http://gerrit.cloudera.org:8080/#/c/10447/6/fe/src/compat-minicluster-profile-3/test/java/org/apache/impala/datagenerator/HBaseTestDataRegionAssignment.java
File 
fe/src/compat-minicluster-profile-3/test/java/org/apache/impala/datagenerator/HBaseTestDataRegionAssignment.java:

http://gerrit.cloudera.org:8080/#/c/10447/6/fe/src/compat-minicluster-profile-3/test/java/org/apache/impala/datagenerator/HBaseTestDataRegionAssignment.java@168
PS6, Line 168:   public static String printKey(byte[] key) {
> same here; we have another copy of this.
Done


http://gerrit.cloudera.org:8080/#/c/10447/6/testdata/datasets/functional/functional_schema_template.sql
File testdata/datasets/functional/functional_schema_template.sql:

http://gerrit.cloudera.org:8080/#/c/10447/6/testdata/datasets/functional/functional_schema_template.sql@144
PS6, Line 144: True
> Instead of this being a boolean, do you want to put the splits values here?
Switched this to be HBASE_REGION_SPLITS and specified the splits directly.
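
For illustration only, a sketch of how a section such as HBASE_REGION_SPLITS
might be read from a schema template and turned into a list of split keys.
The "---- NAME" header layout, the comma-separated quoted values, and both
function names are assumptions; the actual parsing in
generate-schema-statements.py may differ.

```python
def read_section(template_text, section_name):
    """Return the stripped body of the `---- section_name` section, or None."""
    body = None
    for line in template_text.splitlines():
        if line.startswith("----"):
            if body is not None:
                break  # The next header ends the section being collected.
            if line[4:].strip() == section_name:
                body = []
        elif body is not None:
            body.append(line)
    return None if body is None else "\n".join(body).strip()


def parse_splits(section_body):
    """Turn "'1','3'" into ['1', '3']; a missing or empty body means no splits."""
    if not section_body:
        return []
    return [
        part.strip().strip("'\"")
        for part in section_body.split(",")
        if part.strip()
    ]


if __name__ == "__main__":
    # Hypothetical template fragment, not copied from the real file.
    template = (
        "---- BASE_TABLE_NAME\n"
        "alltypes\n"
        "---- HBASE_REGION_SPLITS\n"
        "'1','3','5','7','9'\n"
    )
    print(parse_splits(read_section(template, "HBASE_REGION_SPLITS")))
```

Keeping the values in the template means each table carries its own split
points, at the cost of one copy per table.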


http://gerrit.cloudera.org:8080/#/c/10447/6/testdata/datasets/functional/functional_schema_template.sql@521
PS6, Line 521: True
> same
Done



--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3d639128a856254a6ccb93d6750f531974b5f897
Gerrit-Change-Number: 10447
Gerrit-PatchSet: 6
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Comment-Date: Wed, 23 May 2018 23:36:58 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7061: Rework HBase splitting and assignment

2018-05-23 Thread Philip Zeyliger (Code Review)
Philip Zeyliger has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10447 )

Change subject: IMPALA-7061: Rework HBase splitting and assignment
..


Patch Set 6:

(5 comments)

Thanks for tackling this. This looks great. I didn't manually look at the diff 
of HBaseTestDataRegionAssignment, but I assume you just removed the splitting 
stuff. If you did more interesting surgery, let me know, and I'll take a more 
careful look.

I think we can move the splitpoints into the template.sql file, but I'm ok 
either way.

http://gerrit.cloudera.org:8080/#/c/10447/6/fe/src/compat-minicluster-profile-2/test/java/org/apache/impala/datagenerator/HBaseTestDataRegionAssignment.java
File 
fe/src/compat-minicluster-profile-2/test/java/org/apache/impala/datagenerator/HBaseTestDataRegionAssignment.java:

http://gerrit.cloudera.org:8080/#/c/10447/6/fe/src/compat-minicluster-profile-2/test/java/org/apache/impala/datagenerator/HBaseTestDataRegionAssignment.java@48
PS6, Line 48:  * Splits HBase tables into regions and deterministically assigns 
regions to region
Gerrit is lame about not detecting the rename here and showing a useful diff. 
It looks like you removed the splitting part of the code. If so, update the 
comments?


http://gerrit.cloudera.org:8080/#/c/10447/6/fe/src/compat-minicluster-profile-2/test/java/org/apache/impala/datagenerator/HBaseTestDataRegionAssignment.java@142
PS6, Line 142:   public static String printKey(byte[] key) {
Remove this and use 
fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java's version?


http://gerrit.cloudera.org:8080/#/c/10447/6/fe/src/compat-minicluster-profile-3/test/java/org/apache/impala/datagenerator/HBaseTestDataRegionAssignment.java
File 
fe/src/compat-minicluster-profile-3/test/java/org/apache/impala/datagenerator/HBaseTestDataRegionAssignment.java:

http://gerrit.cloudera.org:8080/#/c/10447/6/fe/src/compat-minicluster-profile-3/test/java/org/apache/impala/datagenerator/HBaseTestDataRegionAssignment.java@168
PS6, Line 168:   public static String printKey(byte[] key) {
same here; we have another copy of this.


http://gerrit.cloudera.org:8080/#/c/10447/6/testdata/datasets/functional/functional_schema_template.sql
File testdata/datasets/functional/functional_schema_template.sql:

http://gerrit.cloudera.org:8080/#/c/10447/6/testdata/datasets/functional/functional_schema_template.sql@144
PS6, Line 144: True
Instead of this being a boolean, do you want to put the splits values here? It 
means that there will be two copies (for the two relevant tables), but that 
seems correct.


http://gerrit.cloudera.org:8080/#/c/10447/6/testdata/datasets/functional/functional_schema_template.sql@521
PS6, Line 521: True
same



--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3d639128a856254a6ccb93d6750f531974b5f897
Gerrit-Change-Number: 10447
Gerrit-PatchSet: 6
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Comment-Date: Wed, 23 May 2018 19:33:02 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7061: Rework HBase splitting and assignment

2018-05-23 Thread Joe McDonnell (Code Review)
Joe McDonnell has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10447 )

Change subject: IMPALA-7061: Rework HBase splitting and assignment
..


Patch Set 6:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/10447/6/testdata/bin/generate-schema-statements.py
File testdata/bin/generate-schema-statements.py:

http://gerrit.cloudera.org:8080/#/c/10447/6/testdata/bin/generate-schema-statements.py@487
PS6, Line 487: {SPLITS => ['1', '3', '5', '7', '9']}"
Thinking about how to avoid hard-coding this (or at least how to make it clear 
what is happening).
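
One possible way to avoid the hard-coded literal, sketched under assumptions:
build the HBase shell create statement from a list of split keys passed in by
the caller. The function name, table name, and column family below are
hypothetical; only the {SPLITS => [...]} fragment is taken from the line
quoted above.

```python
def hbase_create_statement(table, column_families, splits=None):
    """Build an HBase shell `create` statement, optionally with pre-splits."""
    parts = ["'%s'" % table] + ["'%s'" % cf for cf in column_families]
    if splits:
        quoted = ", ".join("'%s'" % key for key in splits)
        parts.append("{SPLITS => [%s]}" % quoted)
    return "create " + ", ".join(parts)


if __name__ == "__main__":
    # Placeholder table and column family names.
    print(hbase_create_statement("example_table", ["d"], ["1", "3", "5", "7", "9"]))
    print(hbase_create_statement("example_table", ["d"]))
```

With the splits supplied as data, the statement generator no longer needs to
know which tables are pre-split or where.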



--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3d639128a856254a6ccb93d6750f531974b5f897
Gerrit-Change-Number: 10447
Gerrit-PatchSet: 6
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Comment-Date: Wed, 23 May 2018 16:53:31 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7061: Rework HBase splitting and assignment

2018-05-23 Thread Joe McDonnell (Code Review)
Hello Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/10447

to look at the new patch set (#5).

Change subject: IMPALA-7061: Rework HBase splitting and assignment
..

IMPALA-7061: Rework HBase splitting and assignment

Some frontend PlannerTests rely on HBase tables being
arranged in a deterministic way. Specifically, the
HBase tables need to be split with specific region
boundaries and those regions need to be assigned to
specific HBase region servers.

Currently, the tables are created without splits and
testdata/bin/split-hbase.sh runs Java code in
HBaseTestDataRegionAssignment to split and assign
the tables. This runs during dataload via
testdata/bin/create-load-data.sh and during tests
with bin/run-all-tests.sh. There are problems with
both parts of this process. The table splitting is
flaky. Since significant time can pass between the
assignments and the tests, rebalancing means the
assignments are not always stable.

This changes the process so that the HBase tables are
created with the splits already specified via the
HBase shell. The splits remain stable over time.
PlannerTestBase runs the assignment code in
HBaseTestDataRegionAssignment at the start of
the PlannerTests. This makes the assignments
deterministic. No other tests depend on the
exact assignments, so this does not regress anything.

Testing:
 - Local testing
 - Ran gerrit-verify-dryrun-external
 - Verified minicluster profile 2 compiles

Change-Id: I3d639128a856254a6ccb93d6750f531974b5f897
---
M bin/run-all-tests.sh
A fe/src/compat-minicluster-profile-2/test/java/org/apache/impala/datagenerator/HBaseTestDataRegionAssignment.java
A fe/src/compat-minicluster-profile-3/test/java/org/apache/impala/datagenerator/HBaseTestDataRegionAssignment.java
M fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java
M testdata/bin/create-load-data.sh
M testdata/bin/generate-schema-statements.py
D testdata/bin/split-hbase.sh
M testdata/datasets/functional/functional_schema_template.sql
D testdata/src/compat-minicluster-profile-2/java/org/apache/impala/datagenerator/HBaseTestDataRegionAssigment.java
D testdata/src/compat-minicluster-profile-3/java/org/apache/impala/datagenerator/HBaseTestDataRegionAssigment.java
10 files changed, 362 insertions(+), 728 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/47/10447/5
--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I3d639128a856254a6ccb93d6750f531974b5f897
Gerrit-Change-Number: 10447
Gerrit-PatchSet: 5
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Impala Public Jenkins