Will Berkeley has posted comments on this change. ( http://gerrit.cloudera.org:8080/13430 )
Change subject: [backup] KUDU-2786 Parallelize tables for backup and restore
......................................................................

Patch Set 1:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/13430/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/13430/1//COMMIT_MSG@10
PS1, Line 10: parallel
> in parallel
Done

http://gerrit.cloudera.org:8080/#/c/13430/1//COMMIT_MSG@11
PS1, Line 11: is very difficult
> How would queueing and resource management of this be any different than do
I'm not sure, but I know there's a lot I don't know about Spark, and I know we haven't tested this very much.

http://gerrit.cloudera.org:8080/#/c/13430/1//COMMIT_MSG@24
PS1, Line 24:
            : foo
            :
            : Change-Id: Ib02f26fbfd6a714ad0797f8b5ed1eeeb8fd6e371
            :
            : b
            :
            : Change-Id: I02f0a818a6fa372ab3c696c11882284877ce207e
> nit: remove these remnants of a git squash
:(

http://gerrit.cloudera.org:8080/#/c/13430/1/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackup.scala
File java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackup.scala:

http://gerrit.cloudera.org:8080/#/c/13430/1/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackup.scala@128
PS1, Line 128: val pool = new ForkJoinPool(options.numParallelBackups) // Need a clean-up reference.
> Due to Data input and output format and layout assumptions it's easier to k
+1 to what Grant said. Separate jobs can fail separately.

Re: the ForkJoinPool, I don't think that the "Fork" here is the syscall fork. Of course the threads in the pool will be forked (or cloned) at some point, but I don't think the pool is forking for every task. See this for an explanation of why the pool is called a ForkJoinPool: http://tutorials.jenkov.com/java-util-concurrent/java-fork-and-join-forkjoinpool.html.

Also, this is the default type of ExecutorService used, not a specific choice by me. I needed to configure the parallelism explicitly, else I could have dispensed with configuring my own pool.
That would have resulted in parallelism equal to the number of processors on the driver node, which doesn't have much to do with the parallelism one might want in the restore job.

--
To view, visit http://gerrit.cloudera.org:8080/13430
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I02f0a818a6fa372ab3c696c11882284877ce207e
Gerrit-Change-Number: 13430
Gerrit-PatchSet: 1
Gerrit-Owner: Will Berkeley <wdberke...@gmail.com>
Gerrit-Reviewer: Grant Henke <granthe...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy <mpe...@apache.org>
Gerrit-Reviewer: Will Berkeley <wdberke...@gmail.com>
Gerrit-Comment-Date: Wed, 29 May 2019 18:05:21 +0000
Gerrit-HasComments: Yes
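
Illustrative note on the Line 128 comment: the difference between a default pool and one with explicit parallelism can be sketched roughly as below. This is only a sketch under stated assumptions, not the code in the patch. `ParallelTablesSketch`, `backupTable`, and `backupAll` are hypothetical names made up for the example, and running the tasks as Scala Futures is an assumption; the patch may hand the pool to something else.

```scala
import java.util.concurrent.ForkJoinPool
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object ParallelTablesSketch {
  // Hypothetical stand-in for the per-table work; not Kudu code.
  def backupTable(table: String): String = s"backed up $table"

  // Runs backupTable for every table on a pool whose parallelism is set
  // explicitly, instead of being derived from the driver's processor count.
  def backupAll(tables: Seq[String], numParallel: Int): Seq[String] = {
    val pool = new ForkJoinPool(numParallel)
    implicit val ec: ExecutionContext = ExecutionContext.fromExecutorService(pool)
    try {
      val futures = tables.map(table => Future(backupTable(table)))
      // Results come back in the same order as the input tables.
      Await.result(Future.sequence(futures), 1.minute)
    } finally {
      // Keeping a reference to the pool is what makes this clean-up possible.
      pool.shutdown()
    }
  }

  def main(args: Array[String]): Unit = {
    // The no-argument constructor sizes the pool from the local processor count.
    val defaultPool = new ForkJoinPool()
    println(s"default parallelism: ${defaultPool.getParallelism}")
    defaultPool.shutdown()

    backupAll(Seq("t1", "t2", "t3"), numParallel = 2).foreach(println)
  }
}
```

The pool reuses a fixed set of worker threads across tasks, so submitting one task per table does not create one thread (or process) per table.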