Will Berkeley has uploaded this change for review. ( http://gerrit.cloudera.org:8080/13430 )


Change subject: [backup] KUDU-2786 Parallelize tables for backup and restore
......................................................................

[backup] KUDU-2786 Parallelize tables for backup and restore

This patch adds a hidden, experimental option to run backups and
restores in parallel across tables. Managing resources across parallel
backups and restores is very difficult. Tables in a cluster can differ
by orders of magnitude in both number of tablets and data size, and
depending on the workload and cluster, any of several resources may be
the constraint: CPU, memory, disk I/O, network, or the number of
available executors. This patch doesn't do any resource management. It
kicks off the per-table jobs in parallel and leaves it to Spark to
manage the resources of the concurrent jobs. Maybe this will work well,
maybe it won't; that's why the option is just experimental.
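
Because the patch leaves resource management to Spark, the core
mechanism amounts to submitting each table's job from its own thread:
within one Spark application, jobs submitted from separate threads can
run concurrently. The following is a minimal sketch of that pattern,
not the patch's actual code. `ParallelTablesSketch`, `runTables`, and
the per-table `job` function are hypothetical names, and the Spark
action a real job would trigger is replaced by a plain function.

```scala
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration.Duration
import scala.util.{Success, Try}

object ParallelTablesSketch {
  // Hypothetical stand-in for one table's backup or restore job. In the
  // real tool, this is where a Spark action for that table would run.
  type TableJob = String => Unit

  /** Runs `job` for every table, sequentially when parallelism <= 1 and
    * otherwise on a fixed-size thread pool. Returns one Try per table,
    * in input order, so a failure on one table doesn't hide the others. */
  def runTables(tables: Seq[String], parallelism: Int)(
      job: TableJob): Seq[(String, Try[Unit])] = {
    if (parallelism <= 1) {
      tables.map(t => t -> Try(job(t)))
    } else {
      val pool = Executors.newFixedThreadPool(parallelism)
      implicit val ec: ExecutionContext =
        ExecutionContext.fromExecutorService(pool)
      try {
        val futures = tables.map { t =>
          // Future.apply runs the job on the pool; a thrown exception
          // becomes a failed future, which we turn into a Failure value.
          Future(job(t)).transform(r => Success(t -> r))
        }
        Await.result(Future.sequence(futures), Duration.Inf)
      } finally {
        pool.shutdown()
      }
    }
  }
}
```

For example, `ParallelTablesSketch.runTables(Seq("a", "b"), parallelism = 2) { t => println(t) }`
runs both jobs concurrently and returns `Seq(("a", Success(())), ("b", Success(())))`.
Collecting a `Try` per table means one failed table does not cancel
the rest. How Spark shares executors among the concurrent jobs is then
governed by its scheduler settings (e.g. `spark.scheduler.mode`, FIFO
by default or FAIR), which is the "leave it to Spark" choice described
above.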

I tested this manually on a Spark cluster to verify that the jobs
actually run in parallel.

Change-Id: I02f0a818a6fa372ab3c696c11882284877ce207e
---
M java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackup.scala
M java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduRestore.scala
M java/kudu-backup/src/main/scala/org/apache/kudu/backup/Options.scala
M java/kudu-backup/src/test/scala/org/apache/kudu/backup/TestKuduBackup.scala
4 files changed, 69 insertions(+), 14 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/30/13430/1
--

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I02f0a818a6fa372ab3c696c11882284877ce207e
Gerrit-Change-Number: 13430
Gerrit-PatchSet: 1
Gerrit-Owner: Will Berkeley <wdberke...@gmail.com>
