[jira] [Commented] (GIRAPH-326) Writing input splits to ZooKeeper in parallel
[ https://issues.apache.org/jira/browse/GIRAPH-326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13467170#comment-13467170 ]

Hudson commented on GIRAPH-326:
-------------------------------

Integrated in Giraph-trunk-Commit #212 (See [https://builds.apache.org/job/Giraph-trunk-Commit/212/])
GIRAPH-326: Writing input splits to ZooKeeper in parallel (Revision 1392574)

Result = SUCCESS
maja : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1392574
Files :
* /giraph/trunk/CHANGELOG
* /giraph/trunk/src/main/java/org/apache/giraph/graph/BspServiceMaster.java
* /giraph/trunk/src/main/java/org/apache/giraph/utils/ProgressableUtils.java

Writing input splits to ZooKeeper in parallel
---------------------------------------------

Key: GIRAPH-326
URL: https://issues.apache.org/jira/browse/GIRAPH-326
Project: Giraph
Issue Type: Improvement
Reporter: Maja Kabiljo
Attachments: GIRAPH-326.patch, GIRAPH-326.patch, GIRAPH-326.patch

(Posting issue and the patch from a colleague)
Writing input splits to zookeeper can take a lot of time. From his experiments: serial 2m45s, with 16 cores 15s.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (GIRAPH-326) Writing input splits to ZooKeeper in parallel
[ https://issues.apache.org/jira/browse/GIRAPH-326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13463082#comment-13463082 ]

Avery Ching commented on GIRAPH-326:
------------------------------------

+1. One really minor thing: in "public Void call() {" there are a bunch of log messages that start with "createInputSplits:" and should start with "call:". Can you fix that and then commit?
[jira] [Commented] (GIRAPH-326) Writing input splits to ZooKeeper in parallel
[ https://issues.apache.org/jira/browse/GIRAPH-326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13458835#comment-13458835 ]

Alessandro Presta commented on GIRAPH-326:
------------------------------------------

Eli: our colleague had over 3000 list entries. Each entry is 10 KB, and apparently ZooKeeper can write around 20 of them per second. Hence he was getting a crash with "Forced a shutdown hook kill of the ZooKeeper process" on the master.
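The figures quoted in this thread can be cross-checked with a little arithmetic. The sketch below is illustrative only; the class and method names are hypothetical and not part of Giraph. At roughly 20 writes per second, 3000 entries would take about 150 seconds, which is in line with the reported serial time of 2m45s (165 seconds), and the reported timings correspond to roughly an 11x speedup.

```java
/** Back-of-the-envelope check of the figures quoted in this thread. */
final class SplitWriteEstimate {
    private SplitWriteEstimate() { }

    /** Seconds needed to write {@code entries} znodes one after another. */
    static double serialSeconds(int entries, double writesPerSecond) {
        if (writesPerSecond <= 0) {
            throw new IllegalArgumentException("writesPerSecond must be positive");
        }
        return entries / writesPerSecond;
    }

    /** Ratio between two measured wall-clock times. */
    static double speedup(double serialSeconds, double parallelSeconds) {
        if (parallelSeconds <= 0) {
            throw new IllegalArgumentException("parallelSeconds must be positive");
        }
        return serialSeconds / parallelSeconds;
    }

    public static void main(String[] args) {
        // 3000 entries at about 20 writes per second (figures from the comment).
        System.out.println(serialSeconds(3000, 20.0));  // prints 150.0
        // Reported timings: serial 2m45s = 165 s, parallel 15 s.
        System.out.println(speedup(165.0, 15.0));       // prints 11.0
    }
}
```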
[jira] [Commented] (GIRAPH-326) Writing input splits to ZooKeeper in parallel
[ https://issues.apache.org/jira/browse/GIRAPH-326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13458843#comment-13458843 ]

Alessandro Presta commented on GIRAPH-326:
------------------------------------------

(I see what I'm doing here, pressing ctrl-Enter like on Facebook...)
I understand your concern about resources on a shared cluster. Maybe we can leave multithreading as an option?
[jira] [Commented] (GIRAPH-326) Writing input splits to ZooKeeper in parallel
[ https://issues.apache.org/jira/browse/GIRAPH-326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13458948#comment-13458948 ]

Eli Reisman commented on GIRAPH-326:
------------------------------------

I agree. I would say if that spot needs progress calls, put up a JIRA to add them right away; we need those with or without the threaded writes. I am not against the multithreaded option to write to ZK.

I am still wondering: if the quorum agrees on the order of each proposed write before it is delivered, is the real speed bottleneck the rate at which writes reach ZK from the master, or the ZK quorum syncing itself on the writes as it delivers them? It's surprising to me that the extra writers would speed this up. Do the repeated write calls from the single-threaded writer block until the watch is signalled for each call, or something like that?

Since the system isn't really doing anything important during the ZK connections and input split write, it shouldn't hurt to have the threads used this way. I would like to see those thread resources cleaned up or repurposed before the next stages of the job begin. Other than that, it sounds good to me.
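The request to release the writer threads before the next stage could look roughly like the sketch below, which uses only the standard java.util.concurrent API. It is an illustration under assumptions, not the code from the patch: the class name, method name, and the 10-second timeout are hypothetical.

```java
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Runs a batch of tasks on a temporary pool and releases the threads afterwards. */
final class ScopedPool {
    private ScopedPool() { }

    /**
     * Returns true if all tasks ran and the pool terminated cleanly.
     * Task failures stay inside the Futures returned by invokeAll;
     * a real caller would inspect them.
     */
    static boolean runAndRelease(List<Callable<Void>> tasks, int numThreads) {
        ExecutorService pool = Executors.newFixedThreadPool(numThreads);
        try {
            pool.invokeAll(tasks);  // blocks until every task has completed
            pool.shutdown();        // accept no new work, let the workers exit
            return pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();  // preserve the interrupt status
            return false;
        } finally {
            pool.shutdownNow();     // has no effect if the pool already terminated
        }
    }
}
```

Creating the pool inside the method keeps its lifetime tied to the write phase, so no worker threads outlive it.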
[jira] [Commented] (GIRAPH-326) Writing input splits to ZooKeeper in parallel
[ https://issues.apache.org/jira/browse/GIRAPH-326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13459099#comment-13459099 ]

Avery Ching commented on GIRAPH-326:
------------------------------------

Default of 1 or 2 seems safe. I don't know why ZK is slow on this, but glad this workaround improves it.
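Making the thread count an option with a conservative default might be sketched as follows. The option key is invented for this illustration and is not a real Giraph setting, and java.util.Properties stands in for the job configuration.

```java
import java.util.Properties;

/** Reads a thread-count option, falling back to a conservative default. */
final class InputSplitThreadsOption {
    /** Hypothetical option name, invented for this sketch. */
    static final String KEY = "example.inputSplitWriteThreads";
    /** Conservative default, in the spirit of the comment above. */
    static final int DEFAULT_THREADS = 1;

    private InputSplitThreadsOption() { }

    static int numThreads(Properties conf) {
        String raw = conf.getProperty(KEY);
        if (raw == null) {
            return DEFAULT_THREADS;
        }
        // Clamp to at least one thread so a bad value cannot disable writing.
        return Math.max(1, Integer.parseInt(raw.trim()));
    }
}
```

With a default of 1 the behaviour stays serial unless a user on a dedicated cluster opts in to more threads.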
[jira] [Commented] (GIRAPH-326) Writing input splits to ZooKeeper in parallel
[ https://issues.apache.org/jira/browse/GIRAPH-326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13457631#comment-13457631 ]

Eugene Koontz commented on GIRAPH-326:
--------------------------------------

I like the idea of the ProgressableUtils class. Perhaps it could be used for other parallelizable operations, such as Netty client-server startup tasks like authentication (GIRAPH-211). I'd suggest splitting it out into a separate JIRA, though, because it's separate from the main point of this one.
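One plausible shape for a progress-reporting helper of this kind is sketched below. It is not the actual ProgressableUtils code: the class and method names are hypothetical, and a plain Runnable stands in for the framework's progress callback. The idea is to wait on a Future in short intervals and report progress between polls, so a long wait is not mistaken for a hung task.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Waits for a Future while periodically invoking a progress callback. */
final class ProgressWait {
    private ProgressWait() { }

    static <T> T getWithProgress(Future<T> future, Runnable progress, long pollMillis) {
        while (true) {
            progress.run();  // signal that the caller is still alive
            try {
                return future.get(pollMillis, TimeUnit.MILLISECONDS);
            } catch (TimeoutException e) {
                // Not finished yet: loop and report progress again.
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            } catch (ExecutionException e) {
                throw new IllegalStateException("task failed", e.getCause());
            }
        }
    }
}
```

Because the helper depends only on a Future and a callback, it could in principle be reused for other parallel startup tasks, which is the reuse suggested in the comment above.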
[jira] [Commented] (GIRAPH-326) Writing input splits to ZooKeeper in parallel
[ https://issues.apache.org/jira/browse/GIRAPH-326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13456009#comment-13456009 ]

Eli Reisman commented on GIRAPH-326:
------------------------------------

This makes a lot of sense. I have seen ZooKeeper really bog down when lots of writes and reads are all occurring concurrently and the quorum is syncing all the time. On the other hand, I have run countless jobs with many machines and many input splits, and have not encountered a big slowdown in this initial write phase. The threads should always be writing splits to different znode paths.

What was the setup where this problem was encountered? Lots of machines, or few machines and lots of splits? Interesting stuff. I look forward to trying this out and hearing more about this problem.
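The approach under discussion, several threads each writing a split to its own path, can be sketched as below. This is a minimal illustration and not the code in BspServiceMaster: a ConcurrentHashMap stands in for ZooKeeper, and the class name and the "/inputSplits/" path layout are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Writes each input split to its own path using a fixed pool of workers. */
final class ParallelSplitWriter {
    private ParallelSplitWriter() { }

    /** The returned map is an in-memory stand-in for ZooKeeper (assumption). */
    static Map<String, byte[]> writeSplits(List<byte[]> splits, int numThreads) {
        final Map<String, byte[]> store = new ConcurrentHashMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(numThreads);
        try {
            List<Future<Void>> futures = new ArrayList<>();
            for (int i = 0; i < splits.size(); i++) {
                final String path = "/inputSplits/" + i;  // hypothetical layout
                final byte[] data = splits.get(i);
                Callable<Void> task = () -> {
                    store.put(path, data);  // every task touches a distinct path
                    return null;
                };
                futures.add(pool.submit(task));
            }
            for (Future<Void> future : futures) {
                future.get();  // wait for completion and surface any failure
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while writing splits", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("split write failed", e.getCause());
        } finally {
            pool.shutdown();  // release the worker threads
        }
        return store;
    }
}
```

Since no two tasks share a path, the writers need no coordination among themselves. Any ordering is left to the store, which in the real system is the ZooKeeper quorum.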