[ https://issues.apache.org/jira/browse/CASSANDRA-5488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jeremy Hanna reopened CASSANDRA-5488:
-------------------------------------

There ended up being a secondary problem that was hidden by the first NPE. It seems to be related to getting the AbstractType. The NPE came from this line:
https://github.com/apache/cassandra/blob/cassandra-1.1/src/java/org/apache/cassandra/hadoop/pig/CassandraStorage.java#L307
which I decomposed to find out what it was NPEing on, and got this:
{code}
List<AbstractType> atList = getDefaultMarshallers(cfDef);
AbstractType at = atList.get(2);
Object o = at.compose(key); //NPE from this line
setTupleValue(tuple, 0, o);
//setTupleValue(tuple, 0, getDefaultMarshallers(cfDef).get(2).compose(key));
{code}

So it seems unrelated to the original NPE, but it still matches the description of this ticket.

To reproduce, here is my schema:
{code}
CREATE KEYSPACE circus
    with placement_strategy = 'SimpleStrategy'
    and strategy_options = {replication_factor:1};

use circus;

CREATE COLUMN FAMILY acrobats
    WITH comparator = UTF8Type
    AND key_validation_class=UTF8Type
    AND default_validation_class = UTF8Type;
{code}

Here is a pycassa script to create the data:
{code}
from pycassa.pool import ConnectionPool
from pycassa.columnfamily import ColumnFamily

pool = ConnectionPool('circus')
col_fam = ColumnFamily(pool, 'acrobats')

for i in range(1, 10):
    for j in range(1, 200000):
        col_fam.insert('row_key' + str(i), {str(j): 'val'})
{code}

Here is the Pig (0.9.2) script that I'm running in local mode:
{code}
rows = LOAD 'cassandra://circus/acrobats?widerows=true&limit=200000' USING CassandraStorage();
filtered = filter rows by key == 'row_key1';
columns = foreach filtered generate flatten(columns);
counted = foreach (group columns all) generate COUNT($1);
dump counted;
{code}

> CassandraStorage throws NullPointerException (NPE) when widerows is set to 'true'
> ---------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-5488
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-5488
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Hadoop
>    Affects Versions: 1.1.9, 1.2.4
>         Environment: Ubuntu 12.04.1 x64, Cassandra 1.2.4
>            Reporter: Sheetal Gosrani
>            Assignee: Sheetal Gosrani
>            Priority: Minor
>              Labels: cassandra, hadoop, pig
>             Fix For: 1.1.12, 1.2.6
>
>         Attachments: 5488-2.txt, 5488.txt
>
>
> CassandraStorage throws NPE when widerows is set to 'true'.
> 2 problems in getNextWide:
> 1. Creation of tuple without specifying size
> 2. Calling addKeyToTuple on lastKey instead of key
> java.lang.NullPointerException
>       at org.apache.cassandra.utils.ByteBufferUtil.string(ByteBufferUtil.java:167)
>       at org.apache.cassandra.utils.ByteBufferUtil.string(ByteBufferUtil.java:124)
>       at org.apache.cassandra.cql.jdbc.JdbcUTF8.getString(JdbcUTF8.java:73)
>       at org.apache.cassandra.cql.jdbc.JdbcUTF8.compose(JdbcUTF8.java:93)
>       at org.apache.cassandra.db.marshal.UTF8Type.compose(UTF8Type.java:34)
>       at org.apache.cassandra.db.marshal.UTF8Type.compose(UTF8Type.java:26)
>       at org.apache.cassandra.hadoop.pig.CassandraStorage.addKeyToTuple(CassandraStorage.java:313)
>       at org.apache.cassandra.hadoop.pig.CassandraStorage.getNextWide(CassandraStorage.java:196)
>       at org.apache.cassandra.hadoop.pig.CassandraStorage.getNext(CassandraStorage.java:224)
>       at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:194)
>       at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:532)
>       at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
>       at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
>       at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
>       at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
>       at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
>       at java.security.AccessController.doPrivileged(Native Method)
>       at javax.security.auth.Subject.doAs(Subject.java:415)
>       at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
>       at org.apache.hadoop.mapred.Child.main(Child.java:249)
> 2013-04-16 12:28:03,671 INFO org.apache.hadoop.mapred.Task: Runnning cleanup for the task

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira