[ https://issues.apache.org/jira/browse/CASSANDRA-2707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13039418#comment-13039418 ]
Brandon Williams commented on CASSANDRA-2707:
---------------------------------------------

That generally depends on how much I/O your data mount can pump out.

> Cassandra throws an exception when querying a very large dataset.
> -----------------------------------------------------------------
>
>                 Key: CASSANDRA-2707
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2707
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 0.8 beta 1
>         Environment: Eight Cassandra instances, with a replication factor of three.
>                      The DB is running on EC2; all machines are in the same availability zone.
>                      All machines are m1.xlarge, under 70% disk usage for the Cassandra data drive, and with 16G of RAM.
>                      java version "1.6.0_04"
>                      Java(TM) SE Runtime Environment (build 1.6.0_04-b12)
>                      Java HotSpot(TM) 64-Bit Server VM (build 10.0-b19, mixed mode)
>            Reporter: Michael Amygdalidis
>
> Cassandra reliably throws a runtime exception (without terminating) when querying a very large dataset.
> The cluster performs fine in normal situations with data sets of 10,000 items or so. However, when querying a column family (through either fauna/cassandra or the CLI) for all of the values matching a certain key, with a limit of 100, the following exception is thrown:
> ERROR [ReadStage:126] 2011-05-25 14:14:46,260 AbstractCassandraDaemon.java (line 113) Fatal exception in thread Thread[ReadStage:126,5,main]
> java.lang.RuntimeException: java.io.IOException: Corrupt (negative) value length encountered
>         at org.apache.cassandra.db.columniterator.IndexedSliceReader.computeNext(IndexedSliceReader.java:126)
>         at org.apache.cassandra.db.columniterator.IndexedSliceReader.computeNext(IndexedSliceReader.java:49)
>         at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:140)
>         at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:135)
>         at org.apache.cassandra.db.columniterator.SSTableSliceIterator.hasNext(SSTableSliceIterator.java:108)
>         at org.apache.commons.collections.iterators.CollatingIterator.set(CollatingIterator.java:283)
>         at org.apache.commons.collections.iterators.CollatingIterator.least(CollatingIterator.java:326)
>         at org.apache.commons.collections.iterators.CollatingIterator.next(CollatingIterator.java:230)
>         at org.apache.cassandra.utils.ReducingIterator.computeNext(ReducingIterator.java:69)
>         at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:140)
>         at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:135)
>         at org.apache.cassandra.db.filter.SliceQueryFilter.collectReducedColumns(SliceQueryFilter.java:116)
>         at org.apache.cassandra.db.filter.QueryFilter.collectCollatedColumns(QueryFilter.java:130)
>         at org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1302)
>         at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1187)
>         at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1144)
>         at org.apache.cassandra.db.Table.getRow(Table.java:385)
>         at org.apache.cassandra.db.SliceFromReadCommand.getRow(SliceFromReadCommand.java:61)
>         at org.apache.cassandra.db.ReadVerbHandler.doVerb(ReadVerbHandler.java:69)
>         at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:72)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:885)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
>         at java.lang.Thread.run(Thread.java:619)
> Caused by: java.io.IOException: Corrupt (negative) value length encountered
>         at org.apache.cassandra.utils.ByteBufferUtil.readWithLength(ByteBufferUtil.java:348)
>         at org.apache.cassandra.db.ColumnSerializer.deserialize(ColumnSerializer.java:126)
>         at org.apache.cassandra.db.ColumnSerializer.deserialize(ColumnSerializer.java:82)
>         at org.apache.cassandra.db.ColumnSerializer.deserialize(ColumnSerializer.java:72)
>         at org.apache.cassandra.db.ColumnSerializer.deserialize(ColumnSerializer.java:36)
>         at org.apache.cassandra.db.columniterator.IndexedSliceReader$IndexedBlockFetcher.getNextBlock(IndexedSliceReader.java:179)
>         at org.apache.cassandra.db.columniterator.IndexedSliceReader.computeNext(IndexedSliceReader.java:121)
>         ... 22 more
>
> Additional Info:
> * I can confirm that the same exception is reliably thrown on three instances at about the same time as the query is executed.
> * The timeout for a remote procedure call between nodes is 10 seconds, which is about the time it takes for the query to respond with null.
> * Asking for a forward or reverse search does not affect the results; however, in production we'd need to do a reverse search.
>
> Steps to Reproduce:
> Have a column family with at least 100 million values, including at least 30 million with the same key. Try to get 100 items of a given key from that column family.
>
> Expected behaviour:
> To get back the 100 items we queried for, which is what happens when the number of items under a given key is not so large. The unexpected behaviour only manifests itself when the number of possible items is extremely large.
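For context on the "Corrupt (negative) value length encountered" failure: `ByteBufferUtil.readWithLength` reads a length-prefixed value and rejects a negative length, which typically means the reader is misaligned within the SSTable or the on-disk bytes are corrupt. A minimal, self-contained sketch of that deserialization pattern (the class and helper names here are hypothetical, not Cassandra's actual implementation):

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

public class LengthPrefixedRead {
    // Reads a length-prefixed byte value. A negative length can never be
    // valid, so it signals corrupt or misaligned data, analogous to the
    // check in ByteBufferUtil.readWithLength.
    static byte[] readWithLength(DataInputStream in) throws IOException {
        int length = in.readInt();
        if (length < 0)
            throw new IOException("Corrupt (negative) value length encountered");
        byte[] value = new byte[length];
        in.readFully(value);
        return value;
    }

    public static void main(String[] args) throws IOException {
        // A well-formed record: 4-byte length prefix (3), then 3 bytes of value.
        byte[] good = {0, 0, 0, 3, 'a', 'b', 'c'};
        byte[] value = readWithLength(new DataInputStream(new ByteArrayInputStream(good)));
        System.out.println(new String(value)); // prints "abc"

        // A corrupt record: the reader interprets 0xFFFFFFFF as the length,
        // which is -1 as a signed int, and the guard above rejects it.
        byte[] bad = {(byte) 0xFF, (byte) 0xFF, (byte) 0xFF, (byte) 0xFF};
        try {
            readWithLength(new DataInputStream(new ByteArrayInputStream(bad)));
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

This is only an illustration of why the exception fires mid-read on a large row: once the iterator's position drifts from a real value boundary (or the data itself is damaged), arbitrary bytes get interpreted as a length prefix.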
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira