[Impala-ASF-CR] IMPALA-7061: Rework HBase splitting and assignment
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/10447 ) Change subject: IMPALA-7061: Rework HBase splitting and assignment .. Patch Set 9: This change did not cherrypick successfully into branch 2.x. To resolve this, please do the cherry-pick manually and submit it to Gerrit at refs/for/2.x or add an exception to the branch 2.x copy of bin/ignored_commits.json. Thanks, your friendly bot at https://jenkins.impala.io/job/cherrypick-2.x-and-test/557/ . -- To view, visit http://gerrit.cloudera.org:8080/10447 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3d639128a856254a6ccb93d6750f531974b5f897 Gerrit-Change-Number: 10447 Gerrit-PatchSet: 9 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Philip Zeyliger Gerrit-Comment-Date: Wed, 30 May 2018 05:03:59 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7061: Rework HBase splitting and assignment
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/10447 ) Change subject: IMPALA-7061: Rework HBase splitting and assignment .. Patch Set 8: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/10447 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3d639128a856254a6ccb93d6750f531974b5f897 Gerrit-Change-Number: 10447 Gerrit-PatchSet: 8 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Philip Zeyliger Gerrit-Comment-Date: Fri, 25 May 2018 00:28:17 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7061: Rework HBase splitting and assignment
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/10447 )

Change subject: IMPALA-7061: Rework HBase splitting and assignment

IMPALA-7061: Rework HBase splitting and assignment

Some frontend PlannerTests rely on HBase tables being arranged in a deterministic way. Specifically, the HBase tables need to be split with specific region boundaries and those regions need to be assigned to specific HBase region servers.

Currently, the tables are created without splits and testdata/bin/split-hbase.sh runs Java code in HBaseTestDataRegionAssignment to split and assign the tables. This runs during dataload via testdata/bin/create-load-data.sh and during tests with bin/run-all-tests.sh.

There are problems with both parts of this process. The table splitting is flaky. Since significant time can pass between the assignments and the tests, rebalancing means the assignments are not always stable.

This changes the process so that the HBase tables are created with the splits already specified via the HBase shell. The splits remain stable over time. PlannerTestBase runs the assignment code in HBaseTestDataRegionAssignment at the start of the PlannerTests. This makes the assignments deterministic. No other tests depend on the exact assignments, so this does not regress anything.

Testing:
- Local testing
- Ran gerrit-verify-dryrun-external
- Verified minicluster profile 2 compiles

Change-Id: I3d639128a856254a6ccb93d6750f531974b5f897
Reviewed-on: http://gerrit.cloudera.org:8080/10447
Reviewed-by: Philip Zeyliger
Tested-by: Impala Public Jenkins
---
M bin/run-all-tests.sh
A fe/src/compat-minicluster-profile-2/test/java/org/apache/impala/datagenerator/HBaseTestDataRegionAssignment.java
A fe/src/compat-minicluster-profile-3/test/java/org/apache/impala/datagenerator/HBaseTestDataRegionAssignment.java
M fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java
M testdata/bin/create-load-data.sh
M testdata/bin/generate-schema-statements.py
D testdata/bin/split-hbase.sh
M testdata/datasets/functional/functional_schema_template.sql
D testdata/src/compat-minicluster-profile-2/java/org/apache/impala/datagenerator/HBaseTestDataRegionAssigment.java
D testdata/src/compat-minicluster-profile-3/java/org/apache/impala/datagenerator/HBaseTestDataRegionAssigment.java
10 files changed, 329 insertions(+), 728 deletions(-)

Approvals:
  Philip Zeyliger: Looks good to me, approved
  Impala Public Jenkins: Verified

--
To view, visit http://gerrit.cloudera.org:8080/10447
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I3d639128a856254a6ccb93d6750f531974b5f897
Gerrit-Change-Number: 10447
Gerrit-PatchSet: 9
Gerrit-Owner: Joe McDonnell
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Joe McDonnell
Gerrit-Reviewer: Philip Zeyliger
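The commit message says the region assignments are made deterministic, but the assignment code itself is not quoted in this thread. As a rough illustration only, one way to get an assignment that does not depend on the order in which regions or servers are reported is to sort both and distribute round-robin. The function name, the start-key representation, and the server addresses below are assumptions for the sketch, not the logic in HBaseTestDataRegionAssignment.

```python
def assign_regions(region_start_keys, region_servers):
    """Map each region (identified by its start key) to a region server.

    Hypothetical sketch: both inputs are sorted first, so the result is the
    same no matter what order the cluster reports them in.
    """
    if not region_servers:
        raise ValueError("at least one region server is required")
    servers = sorted(region_servers)
    regions = sorted(region_start_keys)
    # Round-robin over the sorted servers.
    return {key: servers[i % len(servers)] for i, key in enumerate(regions)}


# Split points '1', '3', '5', '7', '9' produce six regions; the first region
# has an empty start key. The server addresses are made up for illustration.
regions = ["", "1", "3", "5", "7", "9"]
servers = ["localhost:16022", "localhost:16020", "localhost:16021"]
for start_key, server in sorted(assign_regions(regions, servers).items()):
    print(repr(start_key), "->", server)
```

Because the mapping is a pure function of the sorted inputs, it can be recomputed at the start of a test run and will give the same answer even if the cluster rebalanced in the meantime.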
[Impala-ASF-CR] IMPALA-7061: Rework HBase splitting and assignment
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/10447 ) Change subject: IMPALA-7061: Rework HBase splitting and assignment .. Patch Set 8: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/2550/ -- To view, visit http://gerrit.cloudera.org:8080/10447 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3d639128a856254a6ccb93d6750f531974b5f897 Gerrit-Change-Number: 10447 Gerrit-PatchSet: 8 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Philip Zeyliger Gerrit-Comment-Date: Thu, 24 May 2018 18:15:08 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7061: Rework HBase splitting and assignment
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/10447 ) Change subject: IMPALA-7061: Rework HBase splitting and assignment .. Patch Set 8: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/10447 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3d639128a856254a6ccb93d6750f531974b5f897 Gerrit-Change-Number: 10447 Gerrit-PatchSet: 8 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Philip Zeyliger Gerrit-Comment-Date: Thu, 24 May 2018 18:12:19 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7061: Rework HBase splitting and assignment
Hello Philip Zeyliger, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/10447 to look at the new patch set (#7).

Change subject: IMPALA-7061: Rework HBase splitting and assignment

IMPALA-7061: Rework HBase splitting and assignment

Some frontend PlannerTests rely on HBase tables being arranged in a deterministic way. Specifically, the HBase tables need to be split with specific region boundaries and those regions need to be assigned to specific HBase region servers.

Currently, the tables are created without splits and testdata/bin/split-hbase.sh runs Java code in HBaseTestDataRegionAssignment to split and assign the tables. This runs during dataload via testdata/bin/create-load-data.sh and during tests with bin/run-all-tests.sh.

There are problems with both parts of this process. The table splitting is flaky. Since significant time can pass between the assignments and the tests, rebalancing means the assignments are not always stable.

This changes the process so that the HBase tables are created with the splits already specified via the HBase shell. The splits remain stable over time. PlannerTestBase runs the assignment code in HBaseTestDataRegionAssignment at the start of the PlannerTests. This makes the assignments deterministic. No other tests depend on the exact assignments, so this does not regress anything.

Testing:
- Local testing
- Ran gerrit-verify-dryrun-external
- Verified minicluster profile 2 compiles

Change-Id: I3d639128a856254a6ccb93d6750f531974b5f897
---
M bin/run-all-tests.sh
A fe/src/compat-minicluster-profile-2/test/java/org/apache/impala/datagenerator/HBaseTestDataRegionAssignment.java
A fe/src/compat-minicluster-profile-3/test/java/org/apache/impala/datagenerator/HBaseTestDataRegionAssignment.java
M fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java
M testdata/bin/create-load-data.sh
M testdata/bin/generate-schema-statements.py
D testdata/bin/split-hbase.sh
M testdata/datasets/functional/functional_schema_template.sql
D testdata/src/compat-minicluster-profile-2/java/org/apache/impala/datagenerator/HBaseTestDataRegionAssigment.java
D testdata/src/compat-minicluster-profile-3/java/org/apache/impala/datagenerator/HBaseTestDataRegionAssigment.java
10 files changed, 329 insertions(+), 728 deletions(-)

git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/47/10447/7

--
To view, visit http://gerrit.cloudera.org:8080/10447
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I3d639128a856254a6ccb93d6750f531974b5f897
Gerrit-Change-Number: 10447
Gerrit-PatchSet: 7
Gerrit-Owner: Joe McDonnell
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Joe McDonnell
Gerrit-Reviewer: Philip Zeyliger
[Impala-ASF-CR] IMPALA-7061: Rework HBase splitting and assignment
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/10447 )

Change subject: IMPALA-7061: Rework HBase splitting and assignment

Patch Set 6:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/10447/6/fe/src/compat-minicluster-profile-2/test/java/org/apache/impala/datagenerator/HBaseTestDataRegionAssignment.java
File fe/src/compat-minicluster-profile-2/test/java/org/apache/impala/datagenerator/HBaseTestDataRegionAssignment.java:

http://gerrit.cloudera.org:8080/#/c/10447/6/fe/src/compat-minicluster-profile-2/test/java/org/apache/impala/datagenerator/HBaseTestDataRegionAssignment.java@48
PS6, Line 48: * Splits HBase tables into regions and deterministically assigns regions to region
> Gerrit is lame about not detecting the rename here and showing a useful dif
Updated the comments. This is only removing the splitting code. It is otherwise almost identical.

http://gerrit.cloudera.org:8080/#/c/10447/6/fe/src/compat-minicluster-profile-2/test/java/org/apache/impala/datagenerator/HBaseTestDataRegionAssignment.java@142
PS6, Line 142: public static String printKey(byte[] key) {
> Remove this and use fe/src/main/java/org/apache/impala/planner/HBaseScanNod
Done

http://gerrit.cloudera.org:8080/#/c/10447/6/fe/src/compat-minicluster-profile-3/test/java/org/apache/impala/datagenerator/HBaseTestDataRegionAssignment.java
File fe/src/compat-minicluster-profile-3/test/java/org/apache/impala/datagenerator/HBaseTestDataRegionAssignment.java:

http://gerrit.cloudera.org:8080/#/c/10447/6/fe/src/compat-minicluster-profile-3/test/java/org/apache/impala/datagenerator/HBaseTestDataRegionAssignment.java@168
PS6, Line 168: public static String printKey(byte[] key) {
> same here; we have another copy of this.
Done

http://gerrit.cloudera.org:8080/#/c/10447/6/testdata/datasets/functional/functional_schema_template.sql
File testdata/datasets/functional/functional_schema_template.sql:

http://gerrit.cloudera.org:8080/#/c/10447/6/testdata/datasets/functional/functional_schema_template.sql@144
PS6, Line 144: True
> Instead of this being a boolean, do you want to put the splits values here?
Switched this to be HBASE_REGION_SPLITS and specified the splits directly.

http://gerrit.cloudera.org:8080/#/c/10447/6/testdata/datasets/functional/functional_schema_template.sql@521
PS6, Line 521: True
> same
Done

--
To view, visit http://gerrit.cloudera.org:8080/10447
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3d639128a856254a6ccb93d6750f531974b5f897
Gerrit-Change-Number: 10447
Gerrit-PatchSet: 6
Gerrit-Owner: Joe McDonnell
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Joe McDonnell
Gerrit-Reviewer: Philip Zeyliger
Gerrit-Comment-Date: Wed, 23 May 2018 23:36:58 +
Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7061: Rework HBase splitting and assignment
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/10447 )

Change subject: IMPALA-7061: Rework HBase splitting and assignment

Patch Set 6:

(5 comments)

Thanks for tackling this. This looks great. I didn't manually look at the diff of HBaseTestDataRegionAssignment, but I assume you just removed the splitting stuff. If you did more interesting surgery, let me know, and I'll take a more careful look.

I think we can move the splitpoints into the template.sql file, but I'm ok either way.

http://gerrit.cloudera.org:8080/#/c/10447/6/fe/src/compat-minicluster-profile-2/test/java/org/apache/impala/datagenerator/HBaseTestDataRegionAssignment.java
File fe/src/compat-minicluster-profile-2/test/java/org/apache/impala/datagenerator/HBaseTestDataRegionAssignment.java:

http://gerrit.cloudera.org:8080/#/c/10447/6/fe/src/compat-minicluster-profile-2/test/java/org/apache/impala/datagenerator/HBaseTestDataRegionAssignment.java@48
PS6, Line 48: * Splits HBase tables into regions and deterministically assigns regions to region
Gerrit is lame about not detecting the rename here and showing a useful diff. It looks like you removed the splitting part of the code. If so, update the comments?

http://gerrit.cloudera.org:8080/#/c/10447/6/fe/src/compat-minicluster-profile-2/test/java/org/apache/impala/datagenerator/HBaseTestDataRegionAssignment.java@142
PS6, Line 142: public static String printKey(byte[] key) {
Remove this and use fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java's version?

http://gerrit.cloudera.org:8080/#/c/10447/6/fe/src/compat-minicluster-profile-3/test/java/org/apache/impala/datagenerator/HBaseTestDataRegionAssignment.java
File fe/src/compat-minicluster-profile-3/test/java/org/apache/impala/datagenerator/HBaseTestDataRegionAssignment.java:

http://gerrit.cloudera.org:8080/#/c/10447/6/fe/src/compat-minicluster-profile-3/test/java/org/apache/impala/datagenerator/HBaseTestDataRegionAssignment.java@168
PS6, Line 168: public static String printKey(byte[] key) {
Same here; we have another copy of this.

http://gerrit.cloudera.org:8080/#/c/10447/6/testdata/datasets/functional/functional_schema_template.sql
File testdata/datasets/functional/functional_schema_template.sql:

http://gerrit.cloudera.org:8080/#/c/10447/6/testdata/datasets/functional/functional_schema_template.sql@144
PS6, Line 144: True
Instead of this being a boolean, do you want to put the splits values here? It means that there will be two copies (for the two relevant tables), but that seems correct.

http://gerrit.cloudera.org:8080/#/c/10447/6/testdata/datasets/functional/functional_schema_template.sql@521
PS6, Line 521: True
Same

--
To view, visit http://gerrit.cloudera.org:8080/10447
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3d639128a856254a6ccb93d6750f531974b5f897
Gerrit-Change-Number: 10447
Gerrit-PatchSet: 6
Gerrit-Owner: Joe McDonnell
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Joe McDonnell
Gerrit-Reviewer: Philip Zeyliger
Gerrit-Comment-Date: Wed, 23 May 2018 19:33:02 +
Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7061: Rework HBase splitting and assignment
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/10447 ) Change subject: IMPALA-7061: Rework HBase splitting and assignment .. Patch Set 6: (1 comment) http://gerrit.cloudera.org:8080/#/c/10447/6/testdata/bin/generate-schema-statements.py File testdata/bin/generate-schema-statements.py: http://gerrit.cloudera.org:8080/#/c/10447/6/testdata/bin/generate-schema-statements.py@487 PS6, Line 487: {SPLITS => ['1', '3', '5', '7', '9']}" Thinking about how to avoid hard-coding this (or at least making it clear what is happening) -- To view, visit http://gerrit.cloudera.org:8080/10447 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3d639128a856254a6ccb93d6750f531974b5f897 Gerrit-Change-Number: 10447 Gerrit-PatchSet: 6 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Philip Zeyliger Gerrit-Comment-Date: Wed, 23 May 2018 16:53:31 + Gerrit-HasComments: Yes
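For readers following the concern about the hard-coded SPLITS clause, here is a minimal sketch of how a schema generator could build the HBase shell statement from a list of split points instead of embedding the literal. The function name, table name, and column family are hypothetical, and this is not the code in testdata/bin/generate-schema-statements.py; only the {SPLITS => [...]} form is taken from the quoted line.

```python
def build_hbase_create_stmt(table_name, column_families, region_splits=None):
    """Return an HBase shell `create` command as a string.

    Hypothetical helper: when region_splits is given, the table is created
    already split at those row keys, so no separate split step is needed.
    """
    families = ", ".join("'{0}'".format(cf) for cf in column_families)
    stmt = "create '{0}', {1}".format(table_name, families)
    if region_splits:
        splits = ", ".join("'{0}'".format(s) for s in region_splits)
        # Doubled braces emit literal { and } in str.format.
        stmt += ", {{SPLITS => [{0}]}}".format(splits)
    return stmt


# Table and column family names are placeholders for illustration.
print(build_hbase_create_stmt("example_db.example_table", ["d"],
                              ["1", "3", "5", "7", "9"]))
```

Keeping the split points as data makes it visible which tables are pre-split and with what boundaries, which is the direction the later review comments take by moving them into the schema template.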
[Impala-ASF-CR] IMPALA-7061: Rework HBase splitting and assignment
Hello Impala Public Jenkins,

I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/10447 to look at the new patch set (#5).

Change subject: IMPALA-7061: Rework HBase splitting and assignment

IMPALA-7061: Rework HBase splitting and assignment

Some frontend PlannerTests rely on HBase tables being arranged in a deterministic way. Specifically, the HBase tables need to be split with specific region boundaries and those regions need to be assigned to specific HBase region servers.

Currently, the tables are created without splits and testdata/bin/split-hbase.sh runs Java code in HBaseTestDataRegionAssignment to split and assign the tables. This runs during dataload via testdata/bin/create-load-data.sh and during tests with bin/run-all-tests.sh.

There are problems with both parts of this process. The table splitting is flaky. Since significant time can pass between the assignments and the tests, rebalancing means the assignments are not always stable.

This changes the process so that the HBase tables are created with the splits already specified via the HBase shell. The splits remain stable over time. PlannerTestBase runs the assignment code in HBaseTestDataRegionAssignment at the start of the PlannerTests. This makes the assignments deterministic. No other tests depend on the exact assignments, so this does not regress anything.

Testing:
- Local testing
- Ran gerrit-verify-dryrun-external
- Verified minicluster profile 2 compiles

Change-Id: I3d639128a856254a6ccb93d6750f531974b5f897
---
M bin/run-all-tests.sh
A fe/src/compat-minicluster-profile-2/test/java/org/apache/impala/datagenerator/HBaseTestDataRegionAssignment.java
A fe/src/compat-minicluster-profile-3/test/java/org/apache/impala/datagenerator/HBaseTestDataRegionAssignment.java
M fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java
M testdata/bin/create-load-data.sh
M testdata/bin/generate-schema-statements.py
D testdata/bin/split-hbase.sh
M testdata/datasets/functional/functional_schema_template.sql
D testdata/src/compat-minicluster-profile-2/java/org/apache/impala/datagenerator/HBaseTestDataRegionAssigment.java
D testdata/src/compat-minicluster-profile-3/java/org/apache/impala/datagenerator/HBaseTestDataRegionAssigment.java
10 files changed, 362 insertions(+), 728 deletions(-)

git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/47/10447/5

--
To view, visit http://gerrit.cloudera.org:8080/10447
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I3d639128a856254a6ccb93d6750f531974b5f897
Gerrit-Change-Number: 10447
Gerrit-PatchSet: 5
Gerrit-Owner: Joe McDonnell
Gerrit-Reviewer: Impala Public Jenkins