[jira] [Commented] (GIRAPH-359) Parallelize the input (loading) / output (storing)

Eli Reisman (JIRA) Mon, 08 Oct 2012 15:10:07 -0700

    [ 
https://issues.apache.org/jira/browse/GIRAPH-359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13471907#comment-13471907
 ]


Eli Reisman commented on GIRAPH-359:
------------------------------------

By write speeds, I take it you mean the stage after the supersteps are done, 
when you're writing the final output to HDFS? Folks who run Giraph with more 
workers per machine seem to report this issue; for those of us who have logged 
more time in "fewer workers per machine" setups, this has never been a 
particularly long part of a job run. Could that be the cause (too many 
workers per machine vying to do IO at once, etc.), or is there a more general 
reason this is happening? Do you have an idea yet of how you want to approach 
fixing this problem?

When I say it wasn't a problem with my setup, I mean the write stage took 
about as long as an average superstep did during the calculation stages. 
That just wasn't very long for that setup and amount of input data. I know that 
with other hardware setups or input data sizes the calculation supersteps were 
longer too. Would you still say the write takes about as long as your average 
superstep, or is it much longer? Shorter? Just curious.

> Parallelize the input (loading) / output (storing)
> --------------------------------------------------
>
>                 Key: GIRAPH-359
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-359
>             Project: Giraph
>          Issue Type: Improvement
>            Reporter: Avery Ching
>            Assignee: Avery Ching
>
> Often we find that our write rates aren't great.  This could likely be 
> improved by parallelizing the input/output with multi-threading.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
