Mike Percy has posted comments on this change. ( http://gerrit.cloudera.org:8080/13430 )

Change subject: [backup] KUDU-2786 Parallelize tables for backup and restore
......................................................................


Patch Set 1:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/13430/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/13430/1//COMMIT_MSG@10
PS1, Line 10:  parallel
in parallel


http://gerrit.cloudera.org:8080/#/c/13430/1//COMMIT_MSG@11
PS1, Line 11: is very difficult
How would queueing and resource management here differ from running a single 
backup or restore of one large table with hundreds or thousands of partitions?

The main difference I can think of is that one would have to give Spark enough 
memory to back up and restore the widest table with the heaviest cells. In 
other words, the most demanding table sets the memory requirement for every 
table in the job (lowest common denominator == highest memory required). 
Anything else?
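To make the sizing point concrete, here is a minimal sketch, assuming a crude per-table model in which memory pressure is proportional to row width times rows buffered per scan batch. `TableProfile`, `batchBytes`, and `requiredBatchBytes` are hypothetical names for illustration only; nothing like this exists in kudu-backup.

```scala
object SharedJobSizing {
  // Hypothetical per-table profile: estimated bytes per row and rows per scan batch.
  final case class TableProfile(name: String, rowWidthBytes: Long, rowsPerBatch: Long) {
    // Rough estimate of the memory needed to buffer one batch of this table.
    def batchBytes: Long = rowWidthBytes * rowsPerBatch
  }

  // When one Spark configuration serves every table, executors must be sized
  // for the most demanding table, even while they scan narrow ones.
  def requiredBatchBytes(tables: Seq[TableProfile]): Long = {
    require(tables.nonEmpty, "need at least one table")
    tables.map(_.batchBytes).max
  }
}
```

For example, a narrow table at 100 bytes/row and a wide one at 4000 bytes/row, both with 1000-row batches, force the whole job to be sized for 4,000,000 bytes per batch.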


http://gerrit.cloudera.org:8080/#/c/13430/1//COMMIT_MSG@24
PS1, Line 24:
            : foo
            :
            : Change-Id: Ib02f26fbfd6a714ad0797f8b5ed1eeeb8fd6e371
            :
            : b
            :
            : Change-Id: I02f0a818a6fa372ab3c696c11882284877ce207e
nit: remove these remnants of a git squash


http://gerrit.cloudera.org:8080/#/c/13430/1/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackup.scala
File java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackup.scala:

http://gerrit.cloudera.org:8080/#/c/13430/1/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackup.scala@128
PS1, Line 128:     val pool = new ForkJoinPool(options.numParallelBackups) // Need a clean-up reference.
Can you talk a little about the trade-offs involved in submitting parallel jobs 
vs adding support for running a single Spark job that handles multiple tables? 
The latter would seem more natural to me. I also wonder what the performance 
implications of fork() are for a driver running on YARN, especially on RHEL 6.
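For reference, one way to keep a clean-up reference to the pool is sketched below, assuming each per-table backup can be expressed as a blocking `String => Unit` call. `ParallelTableRunner` and `runAll` are hypothetical names, not the code in this patch. It bounds concurrency with a `ForkJoinPool`, records each table's outcome as a `Try` so one failure does not hide the others, and always shuts the pool down.

```scala
import java.util.concurrent.{ForkJoinPool, TimeUnit}
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration.Duration
import scala.util.Try

object ParallelTableRunner {
  // Runs `work` once per table with at most `numParallel` tasks in flight.
  // Returns (table, outcome) pairs in input order.
  def runAll(tables: Seq[String], numParallel: Int)
            (work: String => Unit): Seq[(String, Try[Unit])] = {
    require(numParallel > 0, "numParallel must be positive")
    // Keep a named reference to the pool so it can be shut down below.
    val pool = new ForkJoinPool(numParallel)
    implicit val ec: ExecutionContext = ExecutionContext.fromExecutorService(pool)
    try {
      // Wrap each table's work in Try so exceptions become per-table results.
      val futures = tables.map(t => Future(Try(work(t))).map(r => t -> r))
      Await.result(Future.sequence(futures), Duration.Inf)
    } finally {
      // All tasks have finished here; release the worker threads.
      pool.shutdown()
      pool.awaitTermination(1, TimeUnit.MINUTES)
    }
  }
}
```

The same bounded-pool pattern would apply whether each task submits its own Spark job or calls into a shared one, so it does not by itself settle the single-job vs parallel-jobs question.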



--
To view, visit http://gerrit.cloudera.org:8080/13430

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I02f0a818a6fa372ab3c696c11882284877ce207e
Gerrit-Change-Number: 13430
Gerrit-PatchSet: 1
Gerrit-Owner: Will Berkeley <wdberke...@gmail.com>
Gerrit-Reviewer: Grant Henke <granthe...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy <mpe...@apache.org>
Gerrit-Comment-Date: Tue, 28 May 2019 23:23:08 +0000
Gerrit-HasComments: Yes
