[ 
https://issues.apache.org/jira/browse/CASSANDRA-4813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13481658#comment-13481658
 ] 

Yuki Morishita commented on CASSANDRA-4813:
-------------------------------------------

This is a limitation of BulkOutputFormat right now. Currently, a streaming
session uses (IP, counter) as its ID. Since the counter is per JVM, running
two or more reducers on the same node, all streaming to one Cassandra node,
is likely to cause a session conflict, and I think that is causing the issue
here.
To resolve this, we need to change how sessions are distinguished (possibly
by switching to a UUID for the session ID).
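
To illustrate the conflict described above, here is a minimal sketch. It is
not Cassandra's streaming code: SessionIdSketch, ReducerJvm, nextCounterId,
nextUuidId and the 10.0.0.1 address are hypothetical names chosen for this
example. Each ReducerJvm object stands in for one reducer JVM with its own
counter, so two reducers on the same host produce the same (IP, counter)
pair, while random UUIDs do not depend on per-JVM state.

```java
import java.net.InetAddress;
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical illustration only; these are not Cassandra classes.
public class SessionIdSketch {

    /** Stands in for one reducer JVM: each JVM has its own counter starting at zero. */
    static final class ReducerJvm {
        private final InetAddress host;
        private final AtomicLong counter = new AtomicLong();

        ReducerJvm(InetAddress host) {
            this.host = host;
        }

        /** (IP, counter) style ID: unique only within a single JVM. */
        Map.Entry<InetAddress, Long> nextCounterId() {
            return new SimpleImmutableEntry<>(host, counter.incrementAndGet());
        }

        /** UUID style ID: generated without any per-JVM counter. */
        UUID nextUuidId() {
            return UUID.randomUUID();
        }
    }

    public static void main(String[] args) throws Exception {
        // Assumed address of the Hadoop node running both reducers.
        InetAddress node = InetAddress.getByAddress(new byte[] {10, 0, 0, 1});
        ReducerJvm reducerA = new ReducerJvm(node);
        ReducerJvm reducerB = new ReducerJvm(node);

        // Both reducers start counting from the same value, so the IDs are equal.
        boolean counterIdsCollide =
                reducerA.nextCounterId().equals(reducerB.nextCounterId());
        // Random UUIDs are independent of the host and of any counter.
        boolean uuidIdsCollide =
                reducerA.nextUuidId().equals(reducerB.nextUuidId());

        System.out.println("(IP, counter) IDs collide: " + counterIdsCollide);
        System.out.println("UUID IDs collide: " + uuidIdsCollide);
    }
}
```

In this sketch the receiving node would see the same ID for two different
sessions in the first case, which matches the suspected conflict; whether a
UUID is the right replacement in Cassandra itself is only the suggestion made
above.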

[~mkjellman] Do you run your reducer on a Cassandra node? If that is the
case, the session conflict I described above may be the cause. If not, I
think there is another issue in your single-reducer case.
                
> Problem using BulkOutputFormat while streaming several SSTables 
> simultaneously from a given node.
> -------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-4813
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4813
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 1.1.3, 1.1.5
>         Environment: I am using SLES 10 SP3, Java 6, 4 Cassandra + Hadoop 
> nodes, 3 Hadoop-only nodes (datanodes/tasktrackers), 1 namenode/jobtracker. 
> The machines used are Six-Core AMD Opteron(tm) Processor 8431, 24 cores and 
> 33 GB of RAM. I get the issue on both Cassandra 1.1.3 and 1.1.5, and I am 
> using Hadoop 0.20.2.
>            Reporter: Ralph Romanos
>            Assignee: Yuki Morishita
>              Labels: Bulkoutputformat, Hadoop, SSTables
>
> The issue occurs when streaming SSTables simultaneously from the same node to 
> a Cassandra cluster using SSTableloader. It seems to me that Cassandra cannot 
> handle receiving SSTables simultaneously from the same node. However, when it 
> receives SSTables simultaneously from two different nodes, everything works 
> fine. As a consequence, when using BulkOutputFormat to generate SSTables and 
> stream them to a Cassandra cluster, I cannot use more than one reducer per 
> node; otherwise I get a java.io.EOFException in the tasktracker's logs and a 
> java.io.IOException: Broken pipe in the Cassandra logs.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
