Will Berkeley has posted comments on this change. ( http://gerrit.cloudera.org:8080/13430 )

Change subject: [backup] KUDU-2786 Parallelize tables for backup and restore
......................................................................


Patch Set 1:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/13430/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/13430/1//COMMIT_MSG@10
PS1, Line 10:  parallel
> in parallel
Done


http://gerrit.cloudera.org:8080/#/c/13430/1//COMMIT_MSG@11
PS1, Line 11: is very difficult
> How would queueing and resource management of this be any different than do
I'm not sure, but I know there's a lot I don't know about Spark, and I know we haven't tested this very much.


http://gerrit.cloudera.org:8080/#/c/13430/1//COMMIT_MSG@24
PS1, Line 24:
            : foo
            :
            : Change-Id: Ib02f26fbfd6a714ad0797f8b5ed1eeeb8fd6e371
            :
            : b
            :
            : Change-Id: I02f0a818a6fa372ab3c696c11882284877ce207e
> nit: remove these remnants of a git squash
:(


http://gerrit.cloudera.org:8080/#/c/13430/1/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackup.scala
File java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackup.scala:

http://gerrit.cloudera.org:8080/#/c/13430/1/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackup.scala@128
PS1, Line 128:     val pool = new ForkJoinPool(options.numParallelBackups) // Need a clean-up reference.
> Due to Data input and output format and layout assumptions it's easier to k
+1 to what Grant said. Separate jobs can fail separately.

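Re: the ForkJoinPool, I don't think that the "Fork" here is the syscall fork. Of course the pool's threads have to be created at some point (on Linux, via clone rather than fork), but I don't think the pool forks anything per task. The "fork" in fork/join refers to splitting a task into subtasks that can run in parallel on the pool's worker threads, and "join" refers to waiting for their results. See this for an explanation of why the pool is called a ForkJoinPool:
http://tutorials.jenkov.com/java-util-concurrent/java-fork-and-join-forkjoinpool.html.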
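For anyone unfamiliar with the framework, here is a minimal Scala sketch of what "fork" and "join" mean for a ForkJoinPool, using only `java.util.concurrent.RecursiveTask`. The `SumTask` class, `parallelSum`, and the 1000-element threshold are illustrative assumptions, not anything from the Kudu backup code. `fork()` schedules a subtask on the pool's existing worker threads and `join()` waits for its result; no process is forked.

```scala
import java.util.concurrent.{ForkJoinPool, RecursiveTask}

// Hypothetical example task: sums arr(lo until hi) by recursively splitting the range.
class SumTask(arr: Array[Long], lo: Int, hi: Int) extends RecursiveTask[Long] {
  override def compute(): Long = {
    if (hi - lo <= 1000) {
      // Small enough: sum sequentially in the current worker thread.
      var s = 0L
      var i = lo
      while (i < hi) { s += arr(i); i += 1 }
      s
    } else {
      val mid = (lo + hi) >>> 1
      val left = new SumTask(arr, lo, mid)
      left.fork()                                      // "fork": queue the left half on the pool
      val right = new SumTask(arr, mid, hi).compute()  // compute the right half in this thread
      right + left.join()                              // "join": wait for the left half's result
    }
  }
}

object ForkJoinDemo {
  // Runs the task on a dedicated pool with the given parallelism, then shuts the pool down.
  def parallelSum(data: Array[Long], parallelism: Int): Long = {
    val pool = new ForkJoinPool(parallelism)
    try pool.invoke(new SumTask(data, 0, data.length))
    finally pool.shutdown()
  }

  def main(args: Array[String]): Unit =
    println(parallelSum(Array.tabulate(100000)(_.toLong), 4))
}
```

The pool's worker threads are reused across all subtasks; only the work is split.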

Also, ForkJoinPool is the default type of ExecutorService used here, not a specific choice by me. I only configured my own pool because I needed to set the parallelism explicitly; otherwise I could have used the default. The default would have given parallelism equal to the number of processors on the driver node, which has little to do with the parallelism one might want in the backup or restore job.
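A minimal Scala sketch of the two points above, under stated assumptions: the pool is created with an explicit parallelism rather than relying on a default sized to the driver's cores, and each table runs as its own job whose failure is recorded instead of aborting the others. `backupTable`, `runAll`, and `numParallelBackups` here are hypothetical stand-ins; this is not the code in KuduBackup.scala. Holding the pool in a local `val` and shutting it down in `finally` is one way to act on the "Need a clean-up reference" note.

```scala
import java.util.concurrent.{ForkJoinPool, TimeUnit}
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration.Duration
import scala.util.Try

object ParallelTablesSketch {
  // Hypothetical stand-in for one table's backup job; "bad_table" simulates a failure.
  def backupTable(table: String): Unit =
    if (table == "bad_table") throw new RuntimeException(s"backup failed: $table")

  def runAll(tables: Seq[String], numParallelBackups: Int): Map[String, Try[Unit]] = {
    // Explicit parallelism instead of one thread per driver core.
    val pool = new ForkJoinPool(numParallelBackups)
    implicit val ec: ExecutionContext = ExecutionContext.fromExecutorService(pool)
    try {
      // Each table is a separate job; Try captures its failure without cancelling the rest.
      val jobs = tables.map(t => Future(t -> Try(backupTable(t))))
      Await.result(Future.sequence(jobs), Duration.Inf).toMap
    } finally {
      // Clean up the pool we created so its threads don't outlive the jobs.
      pool.shutdown()
      pool.awaitTermination(1, TimeUnit.MINUTES)
    }
  }

  def main(args: Array[String]): Unit = {
    val results = runAll(Seq("t1", "bad_table", "t2"), numParallelBackups = 2)
    results.toSeq.sortBy(_._1).foreach { case (t, r) =>
      println(s"$t: ${if (r.isSuccess) "ok" else "failed"}")
    }
  }
}
```

Returning a per-table `Try` lets the caller decide how to report partial failures.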



--

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I02f0a818a6fa372ab3c696c11882284877ce207e
Gerrit-Change-Number: 13430
Gerrit-PatchSet: 1
Gerrit-Owner: Will Berkeley <wdberke...@gmail.com>
Gerrit-Reviewer: Grant Henke <granthe...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy <mpe...@apache.org>
Gerrit-Reviewer: Will Berkeley <wdberke...@gmail.com>
Gerrit-Comment-Date: Wed, 29 May 2019 18:05:21 +0000
Gerrit-HasComments: Yes
