[ https://issues.apache.org/jira/browse/CASSANDRA-5544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13668449#comment-13668449 ]
Alex Liu commented on CASSANDRA-5544:
-------------------------------------

[~shamim] I think you have already found the answer: SET pig.noSplitCombination true, so that Pig doesn't combine the small splits into one mapper. HBase's internal code does the same. I found that C*-1.2.1 updated Pig from version 0.9.0 to 0.10.0, which may have caused the change in behavior.

As far as items 4) and 5) are concerned, I think the empty maps/big maps are due to data skew. If you first print out the splits, you can then check the rows for each split.

I will add the following code to CassandraStorage.java:

job.getConfiguration().setBoolean("pig.noSplitCombination", true);

> Hadoop jobs assigns only one mapper in task
> --------------------------------------------
>
> Key: CASSANDRA-5544
> URL: https://issues.apache.org/jira/browse/CASSANDRA-5544
> Project: Cassandra
> Issue Type: Bug
> Components: Hadoop
> Affects Versions: 1.2.1
> Environment: Red hat linux 5.4, Hadoop 1.0.3, pig 0.11.1
> Reporter: Shamim Ahmed
> Assignee: Alex Liu
> Attachments: Screen Shot 2013-05-26 at 4.49.48 PM.png
>
>
> We are seeing very strange behaviour from our Hadoop cluster after upgrading
> Cassandra from 1.1.5 to 1.2.1. We have a 5-node Cassandra cluster, three
> nodes of which are Hadoop slaves. Now, when we submit a job through a Pig
> script, only one map task is assigned, running on one of the Hadoop slaves,
> regardless of the volume of data (already tried with more than a million rows).
> Pig is configured as follows:
> export PIG_HOME=/oracle/pig-0.10.0
> export PIG_CONF_DIR=${HADOOP_HOME}/conf
> export PIG_INITIAL_ADDRESS=192.168.157.103
> export PIG_RPC_PORT=9160
> export PIG_PARTITIONER=org.apache.cassandra.dht.Murmur3Partitioner
> We also have the following properties in Hadoop:
> <property>
>   <name>mapred.tasktracker.map.tasks.maximum</name>
>   <value>10</value>
> </property>
> <property>
>   <name>mapred.map.tasks</name>
>   <value>4</value>
> </property>
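
For illustration, here is a minimal sketch of why split combination can leave a job with a single mapper, and what turning it off with pig.noSplitCombination changes. This is a toy model only, not Pig's actual implementation: the class name, the method planMapTasks, the split sizes, and the 128 MB threshold are all assumptions made up for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model of input-split combination. NOT Pig's real code: all names,
// sizes, and the threshold below are hypothetical.
public class SplitCombinationSketch {

    // Returns the size of each map task scheduled for the given splits.
    // With noSplitCombination set, every split becomes its own map task.
    // Otherwise consecutive splits are merged greedily until adding the
    // next one would exceed maxCombinedSize.
    public static List<Long> planMapTasks(List<Long> splitSizes,
                                          long maxCombinedSize,
                                          boolean noSplitCombination) {
        List<Long> tasks = new ArrayList<>();
        if (noSplitCombination) {
            tasks.addAll(splitSizes);
            return tasks;
        }
        long current = 0;
        boolean open = false; // true while a combined split is being built
        for (long size : splitSizes) {
            if (open && current + size > maxCombinedSize) {
                tasks.add(current); // close the combined split
                current = 0;
            }
            current += size;
            open = true;
        }
        if (open) {
            tasks.add(current);
        }
        return tasks;
    }

    public static void main(String[] args) {
        // Five small splits (sizes in MB, assumed), threshold of 128 MB (assumed).
        List<Long> splits = Arrays.asList(4L, 6L, 5L, 3L, 7L);
        System.out.println(planMapTasks(splits, 128L, false).size()); // prints 1
        System.out.println(planMapTasks(splits, 128L, true).size());  // prints 5
    }
}
```

In this model, many small splits collapse into one map task whenever their total size stays under the threshold, which matches the symptom reported in the issue. Disabling combination restores one map task per split.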