Mike Percy has posted comments on this change. ( http://gerrit.cloudera.org:8080/13430 )
Change subject: [backup] KUDU-2786 Parallelize tables for backup and restore
......................................................................


Patch Set 1:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/13430/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/13430/1//COMMIT_MSG@10
PS1, Line 10: parallel
in parallel


http://gerrit.cloudera.org:8080/#/c/13430/1//COMMIT_MSG@11
PS1, Line 11: is very difficult
How would queueing and resource management of this be any different than doing a single backup or restore of a large table with hundreds or thousands of partitions? The main thing I can think of is that one would have to allocate to Spark sufficient memory to handle backing up and restoring the widest table with the heaviest cells, so basically lowest common denominator == highest memory required. Anything else?


http://gerrit.cloudera.org:8080/#/c/13430/1//COMMIT_MSG@24
PS1, Line 24:
            : foo
            :
            : Change-Id: Ib02f26fbfd6a714ad0797f8b5ed1eeeb8fd6e371
            :
            : b
            :
            : Change-Id: I02f0a818a6fa372ab3c696c11882284877ce207e
nit: remove these remnants of a git squash


http://gerrit.cloudera.org:8080/#/c/13430/1/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackup.scala
File java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackup.scala:

http://gerrit.cloudera.org:8080/#/c/13430/1/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackup.scala@128
PS1, Line 128:     val pool = new ForkJoinPool(options.numParallelBackups) // Need a clean-up reference.
Can you talk a little about the tradeoffs involved in submitting parallel jobs vs adding support for running a single Spark job that handles multiple tables? The latter would seem more natural to me. I also wonder what the performance implications of fork() in the context of a driver running on YARN are, especially on RHEL 6.

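
To make the parallel-jobs side of that tradeoff concrete, here is a minimal sketch of per-table work submitted to a bounded pool, assuming a hypothetical backupTable helper in place of the real per-table Spark job. The names ParallelTablesSketch, runAll, and backupTable are illustrative only and do not come from the patch. As far as I know, java.util.concurrent.ForkJoinPool runs tasks on worker threads inside the driver JVM, so the sketch also shows where the pool would be cleaned up.

```scala
import java.util.concurrent.{Callable, ForkJoinPool, TimeUnit}

object ParallelTablesSketch {
  // Hypothetical stand-in for the per-table backup job; not the real KuduBackup code.
  def backupTable(table: String): String = s"backed up $table"

  // Runs backupTable for every table with at most numParallel tasks in flight.
  // Returns one result per table, in input order; a failure in one table is
  // captured as a Left and does not prevent the other tables from running.
  def runAll(tables: Seq[String], numParallel: Int): Seq[(String, Either[Throwable, String])] = {
    val pool = new ForkJoinPool(numParallel)
    try {
      // toList forces eager submission so all tasks are queued before any get().
      val submitted = tables.toList.map { t =>
        t -> pool.submit(new Callable[String] {
          override def call(): String = backupTable(t)
        })
      }
      submitted.map { case (t, task) =>
        t -> (try Right(task.get()) catch { case e: Exception => Left(e) })
      }
    } finally {
      // Release the worker threads even if a task or the caller fails.
      pool.shutdown()
      pool.awaitTermination(1, TimeUnit.MINUTES)
    }
  }

  def main(args: Array[String]): Unit = {
    runAll(Seq("table_a", "table_b", "table_c"), numParallel = 2).foreach {
      case (table, result) => println(s"$table: $result")
    }
  }
}
```

The alternative raised above, a single Spark job covering multiple tables, would drop the pool entirely and leave scheduling to Spark; this sketch does not attempt to show that.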
--
To view, visit http://gerrit.cloudera.org:8080/13430
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I02f0a818a6fa372ab3c696c11882284877ce207e
Gerrit-Change-Number: 13430
Gerrit-PatchSet: 1
Gerrit-Owner: Will Berkeley <wdberke...@gmail.com>
Gerrit-Reviewer: Grant Henke <granthe...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy <mpe...@apache.org>
Gerrit-Comment-Date: Tue, 28 May 2019 23:23:08 +0000
Gerrit-HasComments: Yes