[ https://issues.apache.org/jira/browse/CASSANDRA-4813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13478352#comment-13478352 ]
Michael Kjellman commented on CASSANDRA-4813:
---------------------------------------------

Just confirmed that limiting the reducer to 1 does not change the behavior in my environment.

Also noticed that BulkRecordWriter will always throw an IOException (as mentioned in the original bug) if mapreduce.output.bulkoutputformat.maxfailedhosts is ever > 0 (assuming defaults). In my case future.getFailedHosts() always returns every node in my cluster when the condition occurs.

I'm doing about 50 million insertions into 50 million rows and the EOFExceptions seem to crop up after a good number of the sstables have already been successfully sent.

> Problem using BulkOutputFormat while streaming several SSTables simultaneously from a given node.
> -------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-4813
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4813
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 1.1.3, 1.1.5
>         Environment: I am using SLES 10 SP3, Java 6, 4 Cassandra + Hadoop
> nodes, 3 Hadoop only nodes (datanodes/tasktrackers), 1 namenode/jobtracker.
> The machines used are Six-Core AMD Opteron(tm) Processor 8431, 24 cores and
> 33 GB of RAM. I get the issue on both cassandra 1.1.3, 1.1.5 and I am using
> Hadoop 0.20.2.
>            Reporter: Ralph Romanos
>              Labels: Bulkoutputformat, Hadoop, SSTables
>
> The issue occurs when streaming simultaneously SSTables from the same node to
> a cassandra cluster using SSTableloader. It seems to me that Cassandra cannot
> handle receiving simultaneously SSTables from the same node. However, when it
> receives simultaneously SSTables from two different nodes, everything works
> fine.
> As a consequence, when using BulkOutputFormat to generate SSTables and
> stream them to a cassandra cluster, I cannot use more than one reducer per
> node otherwise I get a java.io.EOFException in the tasktracker's logs and a
> java.io.IOException: Broken pipe in the Cassandra logs.
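
To make the interaction between mapreduce.output.bulkoutputformat.maxfailedhosts and future.getFailedHosts() from the comment easier to reason about, here is a minimal sketch of a failed-host threshold check. It is not the BulkRecordWriter source: the class FailedHostCheck, its methods, and the node names are hypothetical, and the comparison (throw when the number of failed hosts exceeds the configured maximum) is an assumption about how such a threshold would typically be applied.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

public class FailedHostCheck {

    /**
     * Hypothetical stand-in for a post-streaming failure check.
     * Assumption: the job fails only when more hosts failed than the
     * configured maximum (maxFailedHosts) tolerates.
     */
    public static void checkFailedHosts(List<String> failedHosts, int maxFailedHosts)
            throws IOException {
        if (failedHosts.isEmpty()) {
            return; // no host reported a streaming failure
        }
        if (failedHosts.size() > maxFailedHosts) {
            throw new IOException("Too many hosts failed: " + failedHosts);
        }
        // Within tolerance: report the failures but let the task finish.
        System.err.println("Some hosts failed: " + failedHosts);
    }

    /** Convenience wrapper that reports whether the check would throw. */
    public static boolean wouldThrow(List<String> failedHosts, int maxFailedHosts) {
        try {
            checkFailedHosts(failedHosts, maxFailedHosts);
            return false;
        } catch (IOException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // Hypothetical four-node cluster in which every node is reported as failed,
        // mirroring the "every node in my cluster" observation in the comment.
        List<String> allNodes = Arrays.asList("node1", "node2", "node3", "node4");
        for (int max = 0; max <= allNodes.size(); max++) {
            System.out.println("maxfailedhosts=" + max
                    + " -> throws=" + wouldThrow(allNodes, max));
        }
    }
}
```

Under this assumed check, a run in which every node is reported as failed still throws for any threshold smaller than the cluster size, so raising the setting slightly above its default would not by itself avoid the IOException.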