[Impala-ASF-CR] IMPALA-4762: RECOVER PARTITIONS should batch partition updates

2017-03-13 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged.

Change subject: IMPALA-4762: RECOVER PARTITIONS should batch partition updates
..


IMPALA-4762: RECOVER PARTITIONS should batch partition updates

Batch updates when doing a RECOVER PARTITIONS on over 500
partitions at a time to avoid HMS timeouts, possible OOM.

Testing: Expanded test coverage with a new python test
for this case.  Test takes ~18s to run.

Change-Id: I7f9334051b11ba8fa16159b7ca67ddc7f2392733
Reviewed-on: http://gerrit.cloudera.org:8080/6275
Reviewed-by: Dimitris Tsirogiannis 
Tested-by: Impala Public Jenkins
---
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M tests/metadata/test_recover_partitions.py
2 files changed, 59 insertions(+), 26 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Dimitris Tsirogiannis: Looks good to me, approved



-- 
To view, visit http://gerrit.cloudera.org:8080/6275
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I7f9334051b11ba8fa16159b7ca67ddc7f2392733
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Zach Amsden 


[Impala-ASF-CR] IMPALA-4762: RECOVER PARTITIONS should batch partition updates

2017-03-13 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-4762: RECOVER PARTITIONS should batch partition updates
..


Patch Set 2: Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/6275
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I7f9334051b11ba8fa16159b7ca67ddc7f2392733
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Zach Amsden 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4762: RECOVER PARTITIONS should batch partition updates

2017-03-13 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-4762: RECOVER PARTITIONS should batch partition updates
..


Patch Set 2:

Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/372/

-- 
To view, visit http://gerrit.cloudera.org:8080/6275
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I7f9334051b11ba8fa16159b7ca67ddc7f2392733
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Zach Amsden 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4762: RECOVER PARTITIONS should batch partition updates

2017-03-07 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-4762: RECOVER PARTITIONS should batch partition updates
..


Patch Set 2: Code-Review+2

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6275/2/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

PS2, Line 2661: updateLastDdlTime(msTbl, msClient);
> If we get some kind of exception during a later round of processing, it is 
Unfortunately, the whole operation is far from being atomic but I see your 
point :)


-- 
To view, visit http://gerrit.cloudera.org:8080/6275
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I7f9334051b11ba8fa16159b7ca67ddc7f2392733
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Zach Amsden 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4762: RECOVER PARTITIONS should batch partition updates

2017-03-07 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-4762: RECOVER PARTITIONS should batch partition updates
..


Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/6275/2/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

PS2, Line 2661: updateLastDdlTime(msTbl, msClient);
Do we need to call this for every batch? I believe it is sufficient to call it 
once at the end.


http://gerrit.cloudera.org:8080/#/c/6275/2/tests/metadata/test_recover_partitions.py
File tests/metadata/test_recover_partitions.py:

PS2, Line 190: for i in xrange(1, 700):
 : PART_DIR = "part%d\t" % i
 : assert not self.has_value(PART_DIR, result.data)
This check may be an overkill since no partitions were added to this table. Why 
don't you simply verify that show partitions doesn't return any rows?


-- 
To view, visit http://gerrit.cloudera.org:8080/6275
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I7f9334051b11ba8fa16159b7ca67ddc7f2392733
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Zach Amsden 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4762: RECOVER PARTITIONS should batch partition updates

2017-03-06 Thread Zach Amsden (Code Review)
Zach Amsden has posted comments on this change.

Change subject: IMPALA-4762: RECOVER PARTITIONS should batch partition updates
..


Patch Set 1:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/6275/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

PS1, Line 2641: // ifNotExists and needResults are true.
> nit: move comment above L2644
fixed


PS1, Line 2646: } catch (AlreadyExistsException e) {
  : throw new InternalException(
  : "AlreadyExistsException thrown although 
ifNotExists given", e);
  : }
> I don't think we need to handle this case separately. Let the outer catch h
killed it


http://gerrit.cloudera.org:8080/#/c/6275/1/tests/metadata/test_recover_partitions.py
File tests/metadata/test_recover_partitions.py:

PS1, Line 171: test_recover_many_partitions
> This test may be too expensive. How much time does it take to run? See my c
~30 seconds or so..


PS1, Line 182: # Preload 10% of the partitions with partition(i) = values(i)
 : for i in xrange(1, 700, 10):
 : self.execute_query_expect_success(self.client,
 : "INSERT INTO TABLE %s PARTITION(s='part%d') 
VALUES(%d)" % (FQ_TBL_NAME, i, i))
> I think we can live without this part.
agreed.


PS1, Line 193: self.filesystem_client.create_file(TBL_LOCATION + PART_DIR + 
FILE_PATH,
 :INSERTED_VALUE)
> It should be sufficient to just create the directory, no?
if I don't care about the values at all, this should be faster


PS1, Line 196: result = self.execute_query_expect_success(self.client,
 : "SHOW PARTITIONS %s" % FQ_TBL_NAME)
 : for i in xrange(1, 700):
 : PART_DIR = "part%d\t" % i
 : if (i % 10 == 1):
 : assert self.has_value(PART_DIR, result.data)
 : else:
 : assert not self.has_value(PART_DIR, result.data)
 : 
 : self.execute_query_expect_success(self.client,
 : "ALTER TABLE %s RECOVER PARTITIONS" % FQ_TBL_NAME)
 : result = self.execute_query_expect_success(self.client,
 : "select c from %s" % FQ_TBL_NAME)
 : for i in xrange(1, 700):
 : if (i % 10 == 1):
 : assert self.has_value(str(i), result.data)
 : else:
 : assert self.has_value(str(-i), result.data)
> I would just do: a) show partitions and verify 0, b) recover partitions, c)
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/6275
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I7f9334051b11ba8fa16159b7ca67ddc7f2392733
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Zach Amsden 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4762: RECOVER PARTITIONS should batch partition updates

2017-03-06 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-4762: RECOVER PARTITIONS should batch partition updates
..


Patch Set 1:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/6275/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

PS1, Line 2641: // ifNotExists and needResults are true.
nit: move comment above L2644


PS1, Line 2646: } catch (AlreadyExistsException e) {
  : throw new InternalException(
  : "AlreadyExistsException thrown although 
ifNotExists given", e);
  : }
I don't think we need to handle this case separately. Let the outer catch 
handle it. Besides, we're not doing anything special here.


http://gerrit.cloudera.org:8080/#/c/6275/1/tests/metadata/test_recover_partitions.py
File tests/metadata/test_recover_partitions.py:

PS1, Line 171: test_recover_many_partitions
This test may be too expensive. How much time does it take to run? See my 
comments below for ideas on how to trim it a bit.


PS1, Line 182: # Preload 10% of the partitions with partition(i) = values(i)
 : for i in xrange(1, 700, 10):
 : self.execute_query_expect_success(self.client,
 : "INSERT INTO TABLE %s PARTITION(s='part%d') 
VALUES(%d)" % (FQ_TBL_NAME, i, i))
I think we can live without this part.


PS1, Line 193: self.filesystem_client.create_file(TBL_LOCATION + PART_DIR + 
FILE_PATH,
 :INSERTED_VALUE)
It should be sufficient to just create the directory, no?


PS1, Line 196: result = self.execute_query_expect_success(self.client,
 : "SHOW PARTITIONS %s" % FQ_TBL_NAME)
 : for i in xrange(1, 700):
 : PART_DIR = "part%d\t" % i
 : if (i % 10 == 1):
 : assert self.has_value(PART_DIR, result.data)
 : else:
 : assert not self.has_value(PART_DIR, result.data)
 : 
 : self.execute_query_expect_success(self.client,
 : "ALTER TABLE %s RECOVER PARTITIONS" % FQ_TBL_NAME)
 : result = self.execute_query_expect_success(self.client,
 : "select c from %s" % FQ_TBL_NAME)
 : for i in xrange(1, 700):
 : if (i % 10 == 1):
 : assert self.has_value(str(i), result.data)
 : else:
 : assert self.has_value(str(-i), result.data)
I would just do: a) show partitions and verify 0, b) recover partitions, c) 
show partitions and verify 700.


-- 
To view, visit http://gerrit.cloudera.org:8080/6275
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I7f9334051b11ba8fa16159b7ca67ddc7f2392733
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4762: RECOVER PARTITIONS should batch partition updates

2017-03-06 Thread Zach Amsden (Code Review)
Zach Amsden has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/6275

Change subject: IMPALA-4762: RECOVER PARTITIONS should batch partition updates
..

IMPALA-4762: RECOVER PARTITIONS should batch partition updates

Batch updates when doing a RECOVER PARTITIONS on over 500
partitions at a time to avoid HMS timeouts, possible OOM.

Testing: Expanded test coverage with a new python test
for this case.

Change-Id: I7f9334051b11ba8fa16159b7ca67ddc7f2392733
---
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M tests/metadata/test_recover_partitions.py
2 files changed, 79 insertions(+), 26 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/75/6275/1
-- 
To view, visit http://gerrit.cloudera.org:8080/6275
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I7f9334051b11ba8fa16159b7ca67ddc7f2392733
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden