[jira] [Commented] (GIRAPH-326) Writing input splits to ZooKeeper in parallel

2012-10-01 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13467170#comment-13467170
 ] 

Hudson commented on GIRAPH-326:
---

Integrated in Giraph-trunk-Commit #212 (See 
[https://builds.apache.org/job/Giraph-trunk-Commit/212/])
GIRAPH-326: Writing input splits to ZooKeeper in parallel (Revision 1392574)

 Result = SUCCESS
maja : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1392574
Files : 
* /giraph/trunk/CHANGELOG
* /giraph/trunk/src/main/java/org/apache/giraph/graph/BspServiceMaster.java
* /giraph/trunk/src/main/java/org/apache/giraph/utils/ProgressableUtils.java


 Writing input splits to ZooKeeper in parallel
 -

 Key: GIRAPH-326
 URL: https://issues.apache.org/jira/browse/GIRAPH-326
 Project: Giraph
  Issue Type: Improvement
Reporter: Maja Kabiljo
 Attachments: GIRAPH-326.patch, GIRAPH-326.patch, GIRAPH-326.patch


 (Posting issue and the patch from a colleague)
 Writing input splits to zookeeper can take a lot of time. From his 
 experiments: serial 2m45s, with 16 cores 15s.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (GIRAPH-326) Writing input splits to ZooKeeper in parallel

2012-09-25 Thread Avery Ching (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13463082#comment-13463082
 ] 

Avery Ching commented on GIRAPH-326:


+1.  Really minor thing.

In "public Void call() {" there are a bunch of log messages that start with 
"createInputSplits: " and should start with "call: ".  Can you fix that and 
then commit? 




[jira] [Commented] (GIRAPH-326) Writing input splits to ZooKeeper in parallel

2012-09-19 Thread Alessandro Presta (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13458835#comment-13458835
 ] 

Alessandro Presta commented on GIRAPH-326:
--

Eli: our colleague had over 3000 list entries. Each entry is 10KB, and 
apparently ZooKeeper can write around 20 of them per second. Hence he was 
getting a crash with "Forced a shutdown hook kill of the ZooKeeper process" on 
the master.
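
As a rough check of the figures quoted in this thread, the small Java sketch below redoes the arithmetic. The class and method names are made up for illustration; the inputs are the approximate numbers from the comment above and from the issue description (3000 entries, about 20 writes per second, 2m45s serial, 15s with 16 cores).

```java
// Back-of-the-envelope arithmetic only; nothing here talks to ZooKeeper.
final class SplitWriteEstimate {
  /** Seconds needed to write `entries` znodes one at a time. */
  static double serialSeconds(int entries, double writesPerSecond) {
    return entries / writesPerSecond;
  }

  /** Ratio between the reported serial and parallel wall-clock times. */
  static double speedup(double serialSeconds, double parallelSeconds) {
    return serialSeconds / parallelSeconds;
  }

  public static void main(String[] args) {
    // 3000 entries at roughly 20 writes per second.
    System.out.println(serialSeconds(3000, 20.0) + " s estimated serial time");
    // 2m45s = 165 s serial versus 15 s with 16 cores.
    System.out.println(speedup(165.0, 15.0) + "x reported speedup");
  }
}
```

The estimate comes to 150 seconds, in the same range as the reported 2m45s, and the reported times correspond to roughly an 11x speedup.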




[jira] [Commented] (GIRAPH-326) Writing input splits to ZooKeeper in parallel

2012-09-19 Thread Alessandro Presta (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13458843#comment-13458843
 ] 

Alessandro Presta commented on GIRAPH-326:
--

(I see what I'm doing here, pressing ctrl-Enter like on Facebook...)

I understand your concern about resources on a shared cluster. Maybe we can 
leave multithreading as an option?




[jira] [Commented] (GIRAPH-326) Writing input splits to ZooKeeper in parallel

2012-09-19 Thread Eli Reisman (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13458948#comment-13458948
 ] 

Eli Reisman commented on GIRAPH-326:


I agree. I would say if that spot needs progress calls, put up a JIRA to add 
them right away; we need those with or without the threaded writes. 

I am not against the multithreaded option to write to ZK. I am wondering still: 
if the quorum is agreeing on the order for each proposed write before it is 
delivered, is the real speed bottleneck the number of writes making it to ZK 
fast enough from the master, or the ZK quorum syncing itself on the writes as 
it delivers them? It's surprising to me that the extra writers would speed this 
up. Do the repeated write calls from the single-threaded writer block until the 
watch is signalled for each call or something?

Since the system isn't really doing anything important during the ZK 
connections and input split write, it shouldn't hurt to have the threads used 
this way. I would like to see those thread resources cleaned up or repurposed 
before the next stages of the job begin; other than that it sounds good to me.
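
The pattern discussed above (several writer threads, progress calls while waiting, and thread cleanup before the job moves on) could look roughly like the Java sketch below. This is not the code from the attached patch: SplitWriter and the progress callback are hypothetical stand-ins for the real znode write and for the job's progress reporting, and the one-second wait is an arbitrary choice.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class ParallelSplitWriteSketch {
  /** Hypothetical stand-in for whatever performs one znode write. */
  interface SplitWriter {
    void write(String path, byte[] data) throws Exception;
  }

  /** Writes every split under its own path and returns how many were written. */
  static int writeAll(List<byte[]> splits, String basePath, int numThreads,
                      SplitWriter writer, Runnable progress) {
    ExecutorService pool = Executors.newFixedThreadPool(numThreads);
    try {
      List<Future<Void>> pending = new ArrayList<>();
      for (int i = 0; i < splits.size(); i++) {
        final String path = basePath + "/" + i; // one distinct path per split
        final byte[] data = splits.get(i);
        Callable<Void> task = () -> {
          writer.write(path, data);
          return null;
        };
        pending.add(pool.submit(task));
      }
      int written = 0;
      for (Future<Void> future : pending) {
        boolean done = false;
        while (!done) {
          try {
            future.get(1, TimeUnit.SECONDS);
            done = true;
          } catch (TimeoutException e) {
            progress.run(); // still waiting: report that the job is alive
          }
        }
        progress.run();
        written++;
      }
      return written;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted while writing splits", e);
    } catch (ExecutionException e) {
      throw new IllegalStateException("a split write failed", e.getCause());
    } finally {
      pool.shutdownNow(); // release the writer threads before the next stage
    }
  }
}
```

Shutting the pool down in the finally block is one way to address the cleanup concern: the threads are released whether the writes succeed or fail. The sketch says nothing about why extra writers help, which is the open question raised above.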





[jira] [Commented] (GIRAPH-326) Writing input splits to ZooKeeper in parallel

2012-09-19 Thread Avery Ching (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13459099#comment-13459099
 ] 

Avery Ching commented on GIRAPH-326:


Default of 1 or 2 seems safe.  I don't know why ZK is slow on this, but glad 
this workaround improves it.
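
One way to expose the thread count as an option with a conservative default is sketched below in Java. The property key and class name are invented for illustration and are not taken from the patch; java.util.Properties stands in for the job configuration.

```java
import java.util.Properties;

final class WriteThreadsOption {
  /** Hypothetical option name, not an actual Giraph setting. */
  static final String KEY = "example.inputSplitWriteThreads";
  /** Conservative default, as suggested in the comment above. */
  static final int DEFAULT_THREADS = 1;

  /** Returns the configured thread count, never less than 1. */
  static int numThreads(Properties conf) {
    String raw = conf.getProperty(KEY);
    if (raw == null) {
      return DEFAULT_THREADS;
    }
    return Math.max(1, Integer.parseInt(raw.trim()));
  }
}
```

With no setting the writer stays single-threaded, so shared clusters see no change unless a user opts in to more threads.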




[jira] [Commented] (GIRAPH-326) Writing input splits to ZooKeeper in parallel

2012-09-18 Thread Eugene Koontz (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13457631#comment-13457631
 ] 

Eugene Koontz commented on GIRAPH-326:
--

I like the idea of the ProgressableUtils class. Perhaps it could also be used 
for other parallelizable operations, such as Netty client-server startup tasks 
like authentication (GIRAPH-211). I'd suggest splitting it out into a separate 
JIRA, though, because it's separate from the main point of this one.




[jira] [Commented] (GIRAPH-326) Writing input splits to ZooKeeper in parallel

2012-09-14 Thread Eli Reisman (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13456009#comment-13456009
 ] 

Eli Reisman commented on GIRAPH-326:


This makes a lot of sense. I have seen ZooKeeper really bog down when lots of 
writes and reads are all occurring concurrently and the quorum is syncing all 
the time.

On the other hand, I have run countless jobs with many machines and many 
inputsplits, and have not encountered a big slowdown in this initial write 
phase. The threads should be accessing different znode paths to write splits to 
all the time. What was the setup where this problem was encountered? Lots of 
machines, or few machines and lots of splits?

Interesting stuff, I look forward to trying this out and hearing more about this problem.


