Will Berkeley has uploaded this change for review. ( http://gerrit.cloudera.org:8080/13430 )
Change subject: [backup] KUDU-2786 Parallelize tables for backup and restore
......................................................................

[backup] KUDU-2786 Parallelize tables for backup and restore

This patch adds a hidden, experimental option to run backups and
restores in parallel across tables.

Managing resources across parallel backups and restores is very
difficult. Table sizes, both in number of tablets and in amount of
data, can vary by orders of magnitude across a cluster, and which
resources become constrained (CPU, memory, disk I/O, network, or the
number of available executors) depends on many factors.

This patch doesn't do any resource management. It kicks off the
per-table jobs in parallel and leaves it to Spark to manage resources
across them. This may or may not work well in practice, which is why
the option is experimental.

I tested this manually on a Spark cluster to verify that the jobs
actually run in parallel.

Change-Id: I79043b73bf4ecfa11b51f16a7f4369f93357029f
Change-Id: Ib02f26fbfd6a714ad0797f8b5ed1eeeb8fd6e371
Change-Id: I02f0a818a6fa372ab3c696c11882284877ce207e
---
M java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackup.scala
M java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduRestore.scala
M java/kudu-backup/src/main/scala/org/apache/kudu/backup/Options.scala
M java/kudu-backup/src/test/scala/org/apache/kudu/backup/TestKuduBackup.scala
4 files changed, 69 insertions(+), 14 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/30/13430/1
--
To view, visit http://gerrit.cloudera.org:8080/13430
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I02f0a818a6fa372ab3c696c11882284877ce207e
Gerrit-Change-Number: 13430
Gerrit-PatchSet: 1
Gerrit-Owner: Will Berkeley <wdberke...@gmail.com>
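The commit message describes launching one job per table concurrently and leaving resource management to Spark. A minimal sketch of that pattern follows. It assumes a hypothetical `processTable` function that stands in for the per-table backup or restore job, and a hypothetical `parallelism` bound. The actual option added in Options.scala is not shown in this notification. The sketch uses plain Scala futures on a fixed-size thread pool and does not depend on Spark:

```scala
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration.Duration
import scala.util.Try

object ParallelTableJobs {
  // Hypothetical stand-in for the per-table work. A real tool would build
  // and run the Spark job that backs up or restores this one table; here it
  // just rejects empty names and returns a placeholder count.
  def processTable(table: String): Long = {
    require(table.nonEmpty, "table name must not be empty")
    table.length.toLong
  }

  // Runs processTable for every table with at most `parallelism` tables in
  // flight at once, then waits for all of them to finish. Each outcome is
  // wrapped in a Try so a failure in one table doesn't discard the others.
  def runAll(tables: Seq[String], parallelism: Int): Map[String, Try[Long]] = {
    val pool = Executors.newFixedThreadPool(parallelism)
    implicit val ec: ExecutionContext = ExecutionContext.fromExecutorService(pool)
    try {
      val futures = tables.map(t => Future(t -> Try(processTable(t))))
      Await.result(Future.sequence(futures), Duration.Inf).toMap
    } finally {
      pool.shutdown()
    }
  }
}
```

With Spark, jobs submitted from separate threads against the same `SparkContext` can run concurrently. By default they are scheduled FIFO. Setting `spark.scheduler.mode` to `FAIR` shares executors more evenly between concurrent jobs. Either way, no per-table resource limits are enforced, which matches the "up to Spark" caveat above.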