[ https://issues.apache.org/jira/browse/CASSANDRA-5529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13646374#comment-13646374 ]
Jonathan Ellis edited comment on CASSANDRA-5529 at 5/1/13 4:41 AM:
-------------------------------------------------------------------

Rob, your analysis looks spot on. WTF. Creating a new TBinaryProtocol for each message would be pretty ludicrous.

The genesis of this readLength_ business is hidden in the murky archives of the Thrift incubator svn repo. It looks to me like it's kind of a really ugly hack for pre-Framed transports that could call setReadLength in between messages based on some kind of per-application knowledge, because I can't think of any other use for "expiring" a connection after X bytes. I don't think we should be using it at all.

Attached is a patch that rips it out, on the Cassandra server side as well. I feel sorry for any poor bastard who ever pulled his hair out over Cassandra erroring out his connection apparently randomly...
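For reference, a minimal sketch of what dropping the read-length cap amounts to on the Hadoop client side. This is not the attached 5529.txt patch itself; it assumes the repackaged Thrift keeps the stock single-argument TBinaryProtocol constructor, and it reuses the ConfigHelper/openTransport calls quoted in the description below:

    // Sketch (assumption): build the record reader's Thrift client without a
    // read-length cap, so checkReadLength() has no budget to exhaust across calls.
    TTransport transport = ConfigHelper.getInputTransportFactory(conf).openTransport(socket, conf);
    TBinaryProtocol binaryProtocol = new TBinaryProtocol(transport); // no max-message-length argument
    Cassandra.Client client = new Cassandra.Client(binaryProtocol);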
> ColumnFamilyRecordReader fails for large datasets
> -------------------------------------------------
>
>                 Key: CASSANDRA-5529
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-5529
>             Project: Cassandra
>          Issue Type: Bug
>          Components: API, Hadoop
>    Affects Versions: 0.6
>            Reporter: Rob Timpe
>            Assignee: Jonathan Ellis
>             Fix For: 1.1.12, 1.2.5
>
>         Attachments: 5529.txt
>
>
> When running mapreduce jobs that read directly from cassandra, the job will sometimes fail with an exception like this:
>
> java.lang.RuntimeException: com.rockmelt.org.apache.thrift.TException: Message length exceeded: 40
>         at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$StaticRowIterator.maybeInit(ColumnFamilyRecordReader.java:400)
>         at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$StaticRowIterator.computeNext(ColumnFamilyRecordReader.java:406)
>         at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$StaticRowIterator.computeNext(ColumnFamilyRecordReader.java:329)
>         at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
>         at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
>         at org.apache.cassandra.hadoop.ColumnFamilyRecordReader.getProgress(ColumnFamilyRecordReader.java:109)
>         at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.getProgress(MapTask.java:522)
>         at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:547)
>         at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
>         at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
>         at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:771)
>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:375)
>         at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:396)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1132)
>         at org.apache.hadoop.mapred.Child.main(Child.java:249)
> Caused by: com.rockmelt.org.apache.thrift.TException: Message length exceeded: 40
>         at com.rockmelt.org.apache.thrift.protocol.TBinaryProtocol.checkReadLength(TBinaryProtocol.java:393)
>         at com.rockmelt.org.apache.thrift.protocol.TBinaryProtocol.readBinary(TBinaryProtocol.java:363)
>         at org.apache.cassandra.thrift.Column.read(Column.java:528)
>         at org.apache.cassandra.thrift.ColumnOrSuperColumn.read(ColumnOrSuperColumn.java:507)
>         at org.apache.cassandra.thrift.KeySlice.read(KeySlice.java:408)
>         at org.apache.cassandra.thrift.Cassandra$get_range_slices_result.read(Cassandra.java:12422)
>         at com.rockmelt.org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:78)
>         at org.apache.cassandra.thrift.Cassandra$Client.recv_get_range_slices(Cassandra.java:696)
>         at org.apache.cassandra.thrift.Cassandra$Client.get_range_slices(Cassandra.java:680)
>         at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$StaticRowIterator.maybeInit(ColumnFamilyRecordReader.java:362)
>         ... 16 more
>
> In ColumnFamilyRecordReader#initialize, a TBinaryProtocol is created as follows:
>
>     TTransport transport = ConfigHelper.getInputTransportFactory(conf).openTransport(socket, conf);
>     TBinaryProtocol binaryProtocol = new TBinaryProtocol(transport, ConfigHelper.getThriftMaxMessageLength(conf));
>     client = new Cassandra.Client(binaryProtocol);
>
> But each time a call to cassandra is made, checkReadLength(int length) is called in TBinaryProtocol, which includes this:
>
>     readLength_ -= length;
>     if (readLength_ < 0) {
>       throw new TException("Message length exceeded: " + length);
>     }
>
> The result is that readLength_ is decreased on each read until it goes negative and an exception is thrown. This will only happen if you're reading a lot of data and your split size is large (which may be why people haven't noticed it earlier). It happens regardless of whether you use wide row support.
>
> I'm not sure what the right fix is. It seems like you could either reset the read length of the TBinaryProtocol after each call or just use a new TBinaryProtocol each time.
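For illustration only, a sketch of the two workarounds suggested above. It assumes the repackaged TBinaryProtocol exposes the setReadLength(int) method referenced in the comment, and the request arguments (parent, predicate, range, level) stand in for whatever the record reader already passes to get_range_slices:

    // Option 1 (hypothetical): top the read-length budget back up before each request,
    // so readLength_ never drains to a negative value across calls.
    binaryProtocol.setReadLength(ConfigHelper.getThriftMaxMessageLength(conf));
    client.get_range_slices(parent, predicate, range, level);

    // Option 2 (hypothetical): create a fresh protocol and client per request over the same transport.
    TBinaryProtocol fresh = new TBinaryProtocol(transport, ConfigHelper.getThriftMaxMessageLength(conf));
    Cassandra.Client freshClient = new Cassandra.Client(fresh);
    freshClient.get_range_slices(parent, predicate, range, level);

Neither sketch is necessarily the fix that shipped in 1.1.12/1.2.5; per the comment above, the attached 5529.txt removes the readLength_ mechanism entirely.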