[Impala-ASF-CR] IMPALA-4762: RECOVER PARTITIONS should batch partition updates
Impala Public Jenkins has submitted this change and it was merged. Change subject: IMPALA-4762: RECOVER PARTITIONS should batch partition updates .. IMPALA-4762: RECOVER PARTITIONS should batch partition updates Batch updates when doing a RECOVER PARTITIONS on over 500 partitions at a time to avoid HMS timeouts, possible OOM. Testing: Expanded test coverage with a new python test for this case. Test takes ~18s to run. Change-Id: I7f9334051b11ba8fa16159b7ca67ddc7f2392733 Reviewed-on: http://gerrit.cloudera.org:8080/6275 Reviewed-by: Dimitris Tsirogiannis Tested-by: Impala Public Jenkins --- M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M tests/metadata/test_recover_partitions.py 2 files changed, 59 insertions(+), 26 deletions(-) Approvals: Impala Public Jenkins: Verified Dimitris Tsirogiannis: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/6275 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: I7f9334051b11ba8fa16159b7ca67ddc7f2392733 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zach Amsden Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zach Amsden
[Impala-ASF-CR] IMPALA-4762: RECOVER PARTITIONS should batch partition updates
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-4762: RECOVER PARTITIONS should batch partition updates .. Patch Set 2: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/6275 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I7f9334051b11ba8fa16159b7ca67ddc7f2392733 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zach Amsden Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zach Amsden Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4762: RECOVER PARTITIONS should batch partition updates
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-4762: RECOVER PARTITIONS should batch partition updates .. Patch Set 2: Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/372/ -- To view, visit http://gerrit.cloudera.org:8080/6275 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I7f9334051b11ba8fa16159b7ca67ddc7f2392733 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zach Amsden Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zach Amsden Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4762: RECOVER PARTITIONS should batch partition updates
Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-4762: RECOVER PARTITIONS should batch partition updates .. Patch Set 2: Code-Review+2 (1 comment) http://gerrit.cloudera.org:8080/#/c/6275/2/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java: PS2, Line 2661: updateLastDdlTime(msTbl, msClient); > If we get some kind of exception during a later round of processing, it is Unfortunately, the whole operation is far from being atomic but I see your point :) -- To view, visit http://gerrit.cloudera.org:8080/6275 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I7f9334051b11ba8fa16159b7ca67ddc7f2392733 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zach Amsden Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Zach Amsden Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4762: RECOVER PARTITIONS should batch partition updates
Zach Amsden has posted comments on this change. Change subject: IMPALA-4762: RECOVER PARTITIONS should batch partition updates .. Patch Set 2: (2 comments) http://gerrit.cloudera.org:8080/#/c/6275/2/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java: PS2, Line 2661: updateLastDdlTime(msTbl, msClient); > Do we need to call this for every batch? I believe it is sufficient to call If we get some kind of exception during a later round of processing, it is better to have updated the time, and this could reasonably happen with a timeout or network error. http://gerrit.cloudera.org:8080/#/c/6275/2/tests/metadata/test_recover_partitions.py File tests/metadata/test_recover_partitions.py: PS2, Line 190: for i in xrange(1, 700): : PART_DIR = "part%d\t" % i : assert not self.has_value(PART_DIR, result.data) > This check may be an overkill since no partitions were added to this table. It does return one row, the 'Total', which is confusing as it isn't clear this is intended to be machine readable. I'd rather leave the check as is since it is more obviously correct (and also fast, this is a purely local operation). -- To view, visit http://gerrit.cloudera.org:8080/6275 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I7f9334051b11ba8fa16159b7ca67ddc7f2392733 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zach Amsden Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Zach Amsden Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4762: RECOVER PARTITIONS should batch partition updates
Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-4762: RECOVER PARTITIONS should batch partition updates .. Patch Set 2: (2 comments) http://gerrit.cloudera.org:8080/#/c/6275/2/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java: PS2, Line 2661: updateLastDdlTime(msTbl, msClient); Do we need to call this for every batch? I believe it is sufficient to call it once at the end. http://gerrit.cloudera.org:8080/#/c/6275/2/tests/metadata/test_recover_partitions.py File tests/metadata/test_recover_partitions.py: PS2, Line 190: for i in xrange(1, 700): : PART_DIR = "part%d\t" % i : assert not self.has_value(PART_DIR, result.data) This check may be an overkill since no partitions were added to this table. Why don't you simply verify that show partitions doesn't return any rows? -- To view, visit http://gerrit.cloudera.org:8080/6275 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I7f9334051b11ba8fa16159b7ca67ddc7f2392733 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zach Amsden Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Zach Amsden Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4762: RECOVER PARTITIONS should batch partition updates
Zach Amsden has uploaded a new patch set (#2). Change subject: IMPALA-4762: RECOVER PARTITIONS should batch partition updates .. IMPALA-4762: RECOVER PARTITIONS should batch partition updates Batch updates when doing a RECOVER PARTITIONS on over 500 partitions at a time to avoid HMS timeouts, possible OOM. Testing: Expanded test coverage with a new python test for this case. Test takes ~18s to run. Change-Id: I7f9334051b11ba8fa16159b7ca67ddc7f2392733 --- M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M tests/metadata/test_recover_partitions.py 2 files changed, 59 insertions(+), 26 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/75/6275/2 -- To view, visit http://gerrit.cloudera.org:8080/6275 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I7f9334051b11ba8fa16159b7ca67ddc7f2392733 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zach Amsden Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Zach Amsden
[Impala-ASF-CR] IMPALA-4762: RECOVER PARTITIONS should batch partition updates
Zach Amsden has posted comments on this change. Change subject: IMPALA-4762: RECOVER PARTITIONS should batch partition updates .. Patch Set 1: (6 comments) http://gerrit.cloudera.org:8080/#/c/6275/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java: PS1, Line 2641: // ifNotExists and needResults are true. > nit: move comment above L2644 fixed PS1, Line 2646: } catch (AlreadyExistsException e) { : throw new InternalException( : "AlreadyExistsException thrown although ifNotExists given", e); : } > I don't think we need to handle this case separately. Let the outer catch h killed it http://gerrit.cloudera.org:8080/#/c/6275/1/tests/metadata/test_recover_partitions.py File tests/metadata/test_recover_partitions.py: PS1, Line 171: test_recover_many_partitions > This test may be too expensive. How much time does it take to run? See my c ~30 seconds or so.. PS1, Line 182: # Preload 10% of the partitions with partition(i) = values(i) : for i in xrange(1, 700, 10): : self.execute_query_expect_success(self.client, : "INSERT INTO TABLE %s PARTITION(s='part%d') VALUES(%d)" % (FQ_TBL_NAME, i, i)) > I think we can live without this part. agreed. PS1, Line 193: self.filesystem_client.create_file(TBL_LOCATION + PART_DIR + FILE_PATH, :INSERTED_VALUE) > It should be sufficient to just create the directory, no? if I don't care about the values at all, this should be faster PS1, Line 196: result = self.execute_query_expect_success(self.client, : "SHOW PARTITIONS %s" % FQ_TBL_NAME) : for i in xrange(1, 700): : PART_DIR = "part%d\t" % i : if (i % 10 == 1): : assert self.has_value(PART_DIR, result.data) : else: : assert not self.has_value(PART_DIR, result.data) : : self.execute_query_expect_success(self.client, : "ALTER TABLE %s RECOVER PARTITIONS" % FQ_TBL_NAME) : result = self.execute_query_expect_success(self.client, : "select c from %s" % FQ_TBL_NAME) : for i in xrange(1, 700): : if (i % 10 == 1): : assert self.has_value(str(i), result.data) : else: : assert self.has_value(str(-i), result.data) > I would just do: a) show partitions and verify 0, b) recover partitions, c) Done -- To view, visit http://gerrit.cloudera.org:8080/6275 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I7f9334051b11ba8fa16159b7ca67ddc7f2392733 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zach Amsden Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Zach Amsden Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4762: RECOVER PARTITIONS should batch partition updates
Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-4762: RECOVER PARTITIONS should batch partition updates .. Patch Set 1: (6 comments) http://gerrit.cloudera.org:8080/#/c/6275/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java: PS1, Line 2641: // ifNotExists and needResults are true. nit: move comment above L2644 PS1, Line 2646: } catch (AlreadyExistsException e) { : throw new InternalException( : "AlreadyExistsException thrown although ifNotExists given", e); : } I don't think we need to handle this case separately. Let the outer catch handle it. Besides, we're not doing anything special here. http://gerrit.cloudera.org:8080/#/c/6275/1/tests/metadata/test_recover_partitions.py File tests/metadata/test_recover_partitions.py: PS1, Line 171: test_recover_many_partitions This test may be too expensive. How much time does it take to run? See my comments below for ideas on how to trim it a bit. PS1, Line 182: # Preload 10% of the partitions with partition(i) = values(i) : for i in xrange(1, 700, 10): : self.execute_query_expect_success(self.client, : "INSERT INTO TABLE %s PARTITION(s='part%d') VALUES(%d)" % (FQ_TBL_NAME, i, i)) I think we can live without this part. PS1, Line 193: self.filesystem_client.create_file(TBL_LOCATION + PART_DIR + FILE_PATH, :INSERTED_VALUE) It should be sufficient to just create the directory, no? PS1, Line 196: result = self.execute_query_expect_success(self.client, : "SHOW PARTITIONS %s" % FQ_TBL_NAME) : for i in xrange(1, 700): : PART_DIR = "part%d\t" % i : if (i % 10 == 1): : assert self.has_value(PART_DIR, result.data) : else: : assert not self.has_value(PART_DIR, result.data) : : self.execute_query_expect_success(self.client, : "ALTER TABLE %s RECOVER PARTITIONS" % FQ_TBL_NAME) : result = self.execute_query_expect_success(self.client, : "select c from %s" % FQ_TBL_NAME) : for i in xrange(1, 700): : if (i % 10 == 1): : assert self.has_value(str(i), result.data) : else: : assert self.has_value(str(-i), result.data) I would just do: a) show partitions and verify 0, b) recover partitions, c) show partitions and verify 700. -- To view, visit http://gerrit.cloudera.org:8080/6275 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I7f9334051b11ba8fa16159b7ca67ddc7f2392733 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zach Amsden Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4762: RECOVER PARTITIONS should batch partition updates
Zach Amsden has uploaded a new change for review. http://gerrit.cloudera.org:8080/6275 Change subject: IMPALA-4762: RECOVER PARTITIONS should batch partition updates .. IMPALA-4762: RECOVER PARTITIONS should batch partition updates Batch updates when doing a RECOVER PARTITIONS on over 500 partitions at a time to avoid HMS timeouts, possible OOM. Testing: Expanded test coverage with a new python test for this case. Change-Id: I7f9334051b11ba8fa16159b7ca67ddc7f2392733 --- M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M tests/metadata/test_recover_partitions.py 2 files changed, 79 insertions(+), 26 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/75/6275/1 -- To view, visit http://gerrit.cloudera.org:8080/6275 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I7f9334051b11ba8fa16159b7ca67ddc7f2392733 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zach Amsden