[Impala-ASF-CR] IMPALA-4762: RECOVER PARTITIONS should batch partition updates

2017-03-13 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged.

Change subject: IMPALA-4762: RECOVER PARTITIONS should batch partition updates
..


IMPALA-4762: RECOVER PARTITIONS should batch partition updates

Batch updates when doing a RECOVER PARTITIONS on over 500
partitions at a time to avoid HMS timeouts, possible OOM.

Testing: Expanded test coverage with a new python test
for this case.  Test takes ~18s to run.

Change-Id: I7f9334051b11ba8fa16159b7ca67ddc7f2392733
Reviewed-on: http://gerrit.cloudera.org:8080/6275
Reviewed-by: Dimitris Tsirogiannis 
Tested-by: Impala Public Jenkins
---
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M tests/metadata/test_recover_partitions.py
2 files changed, 59 insertions(+), 26 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Dimitris Tsirogiannis: Looks good to me, approved



-- 
To view, visit http://gerrit.cloudera.org:8080/6275
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I7f9334051b11ba8fa16159b7ca67ddc7f2392733
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Zach Amsden 


[Impala-ASF-CR] IMPALA-4762: RECOVER PARTITIONS should batch partition updates

2017-03-13 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-4762: RECOVER PARTITIONS should batch partition updates
..


Patch Set 2: Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/6275
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I7f9334051b11ba8fa16159b7ca67ddc7f2392733
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Zach Amsden 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4762: RECOVER PARTITIONS should batch partition updates

2017-03-13 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-4762: RECOVER PARTITIONS should batch partition updates
..


Patch Set 2:

Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/372/

-- 
To view, visit http://gerrit.cloudera.org:8080/6275
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I7f9334051b11ba8fa16159b7ca67ddc7f2392733
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Zach Amsden 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4762: RECOVER PARTITIONS should batch partition updates

2017-03-07 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-4762: RECOVER PARTITIONS should batch partition updates
..


Patch Set 2: Code-Review+2

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6275/2/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

PS2, Line 2661: updateLastDdlTime(msTbl, msClient);
> If we get some kind of exception during a later round of processing, it is 
Unfortunately, the whole operation is far from being atomic but I see your 
point :)


-- 
To view, visit http://gerrit.cloudera.org:8080/6275
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I7f9334051b11ba8fa16159b7ca67ddc7f2392733
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Zach Amsden 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4762: RECOVER PARTITIONS should batch partition updates

2017-03-07 Thread Zach Amsden (Code Review)
Zach Amsden has posted comments on this change.

Change subject: IMPALA-4762: RECOVER PARTITIONS should batch partition updates
..


Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/6275/2/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

PS2, Line 2661: updateLastDdlTime(msTbl, msClient);
> Do we need to call this for every batch? I believe it is sufficient to call
If we get some kind of exception during a later round of processing, it is 
better to have updated the time, and this could reasonably happen with a 
timeout or network error.


http://gerrit.cloudera.org:8080/#/c/6275/2/tests/metadata/test_recover_partitions.py
File tests/metadata/test_recover_partitions.py:

PS2, Line 190: for i in xrange(1, 700):
 : PART_DIR = "part%d\t" % i
 : assert not self.has_value(PART_DIR, result.data)
> This check may be an overkill since no partitions were added to this table.
It does return one row, the 'Total', which is confusing as it isn't clear this 
is intended to be machine readable.  I'd rather leave the check as is since it 
is more obviously correct (and also fast, this is a purely local operation).


-- 
To view, visit http://gerrit.cloudera.org:8080/6275
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I7f9334051b11ba8fa16159b7ca67ddc7f2392733
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Zach Amsden 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4762: RECOVER PARTITIONS should batch partition updates

2017-03-07 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-4762: RECOVER PARTITIONS should batch partition updates
..


Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/6275/2/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

PS2, Line 2661: updateLastDdlTime(msTbl, msClient);
Do we need to call this for every batch? I believe it is sufficient to call it 
once at the end.


http://gerrit.cloudera.org:8080/#/c/6275/2/tests/metadata/test_recover_partitions.py
File tests/metadata/test_recover_partitions.py:

PS2, Line 190: for i in xrange(1, 700):
 : PART_DIR = "part%d\t" % i
 : assert not self.has_value(PART_DIR, result.data)
This check may be an overkill since no partitions were added to this table. Why 
don't you simply verify that show partitions doesn't return any rows?


-- 
To view, visit http://gerrit.cloudera.org:8080/6275
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I7f9334051b11ba8fa16159b7ca67ddc7f2392733
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Zach Amsden 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4762: RECOVER PARTITIONS should batch partition updates

2017-03-06 Thread Zach Amsden (Code Review)
Zach Amsden has uploaded a new patch set (#2).

Change subject: IMPALA-4762: RECOVER PARTITIONS should batch partition updates
..

IMPALA-4762: RECOVER PARTITIONS should batch partition updates

Batch updates when doing a RECOVER PARTITIONS on over 500
partitions at a time to avoid HMS timeouts, possible OOM.

Testing: Expanded test coverage with a new python test
for this case.  Test takes ~18s to run.

Change-Id: I7f9334051b11ba8fa16159b7ca67ddc7f2392733
---
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M tests/metadata/test_recover_partitions.py
2 files changed, 59 insertions(+), 26 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/75/6275/2
-- 
To view, visit http://gerrit.cloudera.org:8080/6275
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7f9334051b11ba8fa16159b7ca67ddc7f2392733
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Zach Amsden 


[Impala-ASF-CR] IMPALA-4762: RECOVER PARTITIONS should batch partition updates

2017-03-06 Thread Zach Amsden (Code Review)
Zach Amsden has posted comments on this change.

Change subject: IMPALA-4762: RECOVER PARTITIONS should batch partition updates
..


Patch Set 1:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/6275/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

PS1, Line 2641: // ifNotExists and needResults are true.
> nit: move comment above L2644
fixed


PS1, Line 2646: } catch (AlreadyExistsException e) {
  : throw new InternalException(
  : "AlreadyExistsException thrown although 
ifNotExists given", e);
  : }
> I don't think we need to handle this case separately. Let the outer catch h
killed it


http://gerrit.cloudera.org:8080/#/c/6275/1/tests/metadata/test_recover_partitions.py
File tests/metadata/test_recover_partitions.py:

PS1, Line 171: test_recover_many_partitions
> This test may be too expensive. How much time does it take to run? See my c
~30 seconds or so..


PS1, Line 182: # Preload 10% of the partitions with partition(i) = values(i)
 : for i in xrange(1, 700, 10):
 : self.execute_query_expect_success(self.client,
 : "INSERT INTO TABLE %s PARTITION(s='part%d') 
VALUES(%d)" % (FQ_TBL_NAME, i, i))
> I think we can live without this part.
agreed.


PS1, Line 193: self.filesystem_client.create_file(TBL_LOCATION + PART_DIR + 
FILE_PATH,
 :INSERTED_VALUE)
> It should be sufficient to just create the directory, no?
if I don't care about the values at all, this should be faster


PS1, Line 196: result = self.execute_query_expect_success(self.client,
 : "SHOW PARTITIONS %s" % FQ_TBL_NAME)
 : for i in xrange(1, 700):
 : PART_DIR = "part%d\t" % i
 : if (i % 10 == 1):
 : assert self.has_value(PART_DIR, result.data)
 : else:
 : assert not self.has_value(PART_DIR, result.data)
 : 
 : self.execute_query_expect_success(self.client,
 : "ALTER TABLE %s RECOVER PARTITIONS" % FQ_TBL_NAME)
 : result = self.execute_query_expect_success(self.client,
 : "select c from %s" % FQ_TBL_NAME)
 : for i in xrange(1, 700):
 : if (i % 10 == 1):
 : assert self.has_value(str(i), result.data)
 : else:
 : assert self.has_value(str(-i), result.data)
> I would just do: a) show partitions and verify 0, b) recover partitions, c)
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/6275
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I7f9334051b11ba8fa16159b7ca67ddc7f2392733
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Zach Amsden 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4762: RECOVER PARTITIONS should batch partition updates

2017-03-06 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-4762: RECOVER PARTITIONS should batch partition updates
..


Patch Set 1:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/6275/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

PS1, Line 2641: // ifNotExists and needResults are true.
nit: move comment above L2644


PS1, Line 2646: } catch (AlreadyExistsException e) {
  : throw new InternalException(
  : "AlreadyExistsException thrown although 
ifNotExists given", e);
  : }
I don't think we need to handle this case separately. Let the outer catch 
handle it. Besides, we're not doing anything special here.


http://gerrit.cloudera.org:8080/#/c/6275/1/tests/metadata/test_recover_partitions.py
File tests/metadata/test_recover_partitions.py:

PS1, Line 171: test_recover_many_partitions
This test may be too expensive. How much time does it take to run? See my 
comments below for ideas on how to trim it a bit.


PS1, Line 182: # Preload 10% of the partitions with partition(i) = values(i)
 : for i in xrange(1, 700, 10):
 : self.execute_query_expect_success(self.client,
 : "INSERT INTO TABLE %s PARTITION(s='part%d') 
VALUES(%d)" % (FQ_TBL_NAME, i, i))
I think we can live without this part.


PS1, Line 193: self.filesystem_client.create_file(TBL_LOCATION + PART_DIR + 
FILE_PATH,
 :INSERTED_VALUE)
It should be sufficient to just create the directory, no?


PS1, Line 196: result = self.execute_query_expect_success(self.client,
 : "SHOW PARTITIONS %s" % FQ_TBL_NAME)
 : for i in xrange(1, 700):
 : PART_DIR = "part%d\t" % i
 : if (i % 10 == 1):
 : assert self.has_value(PART_DIR, result.data)
 : else:
 : assert not self.has_value(PART_DIR, result.data)
 : 
 : self.execute_query_expect_success(self.client,
 : "ALTER TABLE %s RECOVER PARTITIONS" % FQ_TBL_NAME)
 : result = self.execute_query_expect_success(self.client,
 : "select c from %s" % FQ_TBL_NAME)
 : for i in xrange(1, 700):
 : if (i % 10 == 1):
 : assert self.has_value(str(i), result.data)
 : else:
 : assert self.has_value(str(-i), result.data)
I would just do: a) show partitions and verify 0, b) recover partitions, c) 
show partitions and verify 700.


-- 
To view, visit http://gerrit.cloudera.org:8080/6275
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I7f9334051b11ba8fa16159b7ca67ddc7f2392733
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4762: RECOVER PARTITIONS should batch partition updates

2017-03-06 Thread Zach Amsden (Code Review)
Zach Amsden has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/6275

Change subject: IMPALA-4762: RECOVER PARTITIONS should batch partition updates
..

IMPALA-4762: RECOVER PARTITIONS should batch partition updates

Batch updates when doing a RECOVER PARTITIONS on over 500
partitions at a time to avoid HMS timeouts, possible OOM.

Testing: Expanded test coverage with a new python test
for this case.

Change-Id: I7f9334051b11ba8fa16159b7ca67ddc7f2392733
---
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M tests/metadata/test_recover_partitions.py
2 files changed, 79 insertions(+), 26 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/75/6275/1
-- 
To view, visit http://gerrit.cloudera.org:8080/6275
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I7f9334051b11ba8fa16159b7ca67ddc7f2392733
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden