Re: Linux Filesystem for Cassandra
What OS are you using?

FreeBSD 8.3 64-bit PRERELEASE
Re: Is there a way to update column's TTL only?
You cannot set the TTL without also setting the column value. Could you keep a record of future deletes in a CF and then action them as a batch process?

Cheers
- Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 5/04/2012, at 2:00 PM, 金剑 wrote:
Hi, we would like to leverage Cassandra's expiration to manage our data's lifecycle. We need to delete data after a period (e.g. 1 hour) after the user clicks the Delete button. We would have to read and re-insert the column in order to update the TTL, but this is unacceptable in a system that might need to read out gigabytes of data. Is there a way to do this?
Best Regards!
Jian Jin
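The batch-delete idea above can be sketched in plain Python. This is only an illustration of the bookkeeping, not a Cassandra client: the CF of pending deletes is modeled as a dict of expiry-time buckets, and all names are hypothetical.

```python
from collections import defaultdict

class DeleteScheduler:
    """Model of a 'pending deletes' CF: one wide row per expiry bucket."""

    def __init__(self, bucket_seconds=3600):
        self.bucket_seconds = bucket_seconds
        self.pending = defaultdict(set)  # bucket start time -> row keys

    def schedule(self, row_key, expire_at):
        # Round the expiry down to its bucket. In the real CF this would
        # be one column insert into the bucket's row, so the batch job
        # only ever reads the buckets that are currently due.
        bucket = int(expire_at) // self.bucket_seconds * self.bucket_seconds
        self.pending[bucket].add(row_key)

    def due(self, now):
        # Return (and drop) every key whose bucket has fully elapsed;
        # the batch process would issue the actual deletes for these.
        expired = [b for b in self.pending if b + self.bucket_seconds <= now]
        keys = set()
        for b in expired:
            keys |= self.pending.pop(b)
        return keys
```

Because deletes are grouped per bucket, the batch job never scans the gigabytes of live data, only the small schedule rows that are due.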
Re: server down
Sun or OpenJDK? Either way I would suggest upgrading to the latest JDK, upgrading Cassandra to 1.0.8 and running nodetool upgradesstables. If the fault persists after that I would look at IO or memory issues. Hope that helps.

- Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 5/04/2012, at 4:31 PM, Michael Vaknin wrote:
I am using Cassandra 1.0.3, Java 6.0.17 on Ubuntu 10.04. This has been a stable version for 6 months now.

On Wed, Apr 4, 2012 at 9:51 PM, aaron morton aa...@thelastpickle.com wrote:
What version of Cassandra are you using? What Java vendor/version? What OS vendor/version?
Cheers
- Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 4/04/2012, at 11:33 PM, Michael Vaknine wrote:
I have recently been having problems with one of my 4 cluster servers. I would appreciate any help.

ERROR 11:23:14,331 Fatal exception in thread Thread[MutationStage:19,5,main]
java.lang.InternalError: a fault occurred in a recent unsafe memory access operation in compiled Java code
    at org.apache.cassandra.io.util.AbstractDataInput.readInt(AbstractDataInput.java:196)
    at org.apache.cassandra.utils.ByteBufferUtil.readWithLength(ByteBufferUtil.java:348)
    at org.apache.cassandra.db.ColumnSerializer.deserialize(ColumnSerializer.java:120)
    at org.apache.cassandra.db.ColumnSerializer.deserialize(ColumnSerializer.java:83)
    at org.apache.cassandra.db.ColumnSerializer.deserialize(ColumnSerializer.java:73)
    at org.apache.cassandra.db.ColumnSerializer.deserialize(ColumnSerializer.java:37)
    at org.apache.cassandra.db.columniterator.SimpleSliceReader.computeNext(SimpleSliceReader.java:79)
    at org.apache.cassandra.db.columniterator.SimpleSliceReader.computeNext(SimpleSliceReader.java:40)
    at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:140)
    at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:135)
    at org.apache.cassandra.db.columniterator.SSTableSliceIterator.hasNext(SSTableSliceIterator.java:107)
    at org.apache.cassandra.utils.MergeIterator$Candidate.advance(MergeIterator.java:145)
    at org.apache.cassandra.utils.MergeIterator$ManyToOne.advance(MergeIterator.java:124)
    at org.apache.cassandra.utils.MergeIterator$ManyToOne.computeNext(MergeIterator.java:98)
    at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:140)
    at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:135)
    at org.apache.cassandra.db.filter.SliceQueryFilter.collectReducedColumns(SliceQueryFilter.java:116)
    at org.apache.cassandra.db.filter.QueryFilter.collateColumns(QueryFilter.java:144)
    at org.apache.cassandra.db.CollationController.collectAllData(CollationController.java:227)
    at org.apache.cassandra.db.CollationController.getTopLevelColumns(CollationController.java:62)
    at org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1274)
    at org.apache.cassandra.db.ColumnFamilyStore.cacheRow(ColumnFamilyStore.java:1141)
    at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1170)
    at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1122)
    at org.apache.cassandra.db.Table.readCurrentIndexedColumns(Table.java:504)
    at org.apache.cassandra.db.Table.apply(Table.java:441)
    at org.apache.cassandra.db.commitlog.CommitLog$2.runMayThrow(CommitLog.java:338)
    at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:619)
INFO 11:23:17,640 Finished reading /var/lib/cassandra/commitlog/CommitLog-1333465745624.log
ERROR 11:23:17,641 Exception encountered during startup
java.lang.RuntimeException: java.util.concurrent.ExecutionException: java.lang.InternalError: a fault occurred in a recent unsafe memory access operation in compiled Java code
    at org.apache.cassandra.utils.FBUtilities.waitOnFutures(FBUtilities.java:508)

Thanks
Michael
Bulk loading errors with 1.0.8
Hi All, I'm experiencing the following errors while bulk loading data into a cluster:

ERROR [Thread-23] 2012-04-05 09:58:12,252 AbstractCassandraDaemon.java (line 139) Fatal exception in thread Thread[Thread-23,5,main]
java.lang.RuntimeException: Insufficient disk space to flush 7813594056494754913 bytes
    at org.apache.cassandra.db.ColumnFamilyStore.getFlushPath(ColumnFamilyStore.java:635)
    at org.apache.cassandra.streaming.StreamIn.getContextMapping(StreamIn.java:92)
    at org.apache.cassandra.streaming.IncomingStreamReader.init(IncomingStreamReader.java:68)
    at org.apache.cassandra.net.IncomingTcpConnection.stream(IncomingTcpConnection.java:185)
    at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:81)

Here I'm not really sure I was able to generate 7 exabytes of data ;)

ERROR [Thread-46] 2012-04-05 09:58:14,453 AbstractCassandraDaemon.java (line 139) Fatal exception in thread Thread[Thread-46,5,main]
java.lang.NullPointerException
    at org.apache.cassandra.io.sstable.SSTable.getMinimalKey(SSTable.java:156)
    at org.apache.cassandra.io.sstable.SSTableWriter.closeAndOpenReader(SSTableWriter.java:334)
    at org.apache.cassandra.io.sstable.SSTableWriter.closeAndOpenReader(SSTableWriter.java:302)
    at org.apache.cassandra.streaming.IncomingStreamReader.streamIn(IncomingStreamReader.java:155)
    at org.apache.cassandra.streaming.IncomingStreamReader.read(IncomingStreamReader.java:89)
    at org.apache.cassandra.net.IncomingTcpConnection.stream(IncomingTcpConnection.java:185)
    at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:81)

This one looks like a null key was added to the SSTable at some point, but I'm fairly confident I'm checking for key nullity. The errors are seen on different nodes; some nodes succeeded. All 5 cluster nodes run 1.0.8 with JNA enabled. Basically I'm generating SSTables in a Hadoop Reducer and storing them in HDFS. After the job finishes, I download them back onto a single node and stream the files into Cassandra (yes, I definitely need to try 1.1). Does someone have a hint or a pointer on where to start looking?

Thanks,
Benoit.
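A quick sanity check on the reported flush size shows why Benoit is right to be suspicious: the number is on the order of exbibytes, far beyond any real SSTable, which points at a corrupted stream header (possibly a truncated or damaged copy out of HDFS) rather than genuine data volume.

```python
# Size reported in the "Insufficient disk space to flush" error above.
reported = 7_813_594_056_494_754_913

# Convert to exbibytes (2**60 bytes) to see how implausible it is.
eib = reported / 2**60
print(f"reported flush size: {eib:.2f} EiB")

# The value still fits in a signed 64-bit long, so it is not an
# overflow artifact; it reads like garbage bytes decoded as a length.
assert reported < 2**63 - 1
```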
Re: Will Cassandra balance load across replicas?
Thanks for your reply. My question is about the impact of replication on load balancing. Say we have nodes A, B, C, D... in the ring. The replication factor is 3, so the data on A will also have replicas on B and C. If we are reading data owned by A, and A is already very busy, will the requests be forwarded to B and C? How about update requests?

Thanks,
Zhiming

On Thu, Apr 5, 2012 at 12:33 AM, Watanabe Maki watanabe.m...@gmail.com wrote:
I assume you are talking about nodes, rather than replicas. The data distribution over the ring depends on the Partitioner and Replica placement strategy you use. If you are using the Random Partitioner and Simple Strategy, your data will be automatically distributed over the nodes in the ring.
maki

On 2012/04/05, at 12:31, zhiming shen zhiming.s...@gmail.com wrote:
Hi, can anyone tell me whether Cassandra can do load balancing across replicas? How do I configure it for this purpose? Thanks very much.
Best Regards,
Zhiming
Re: Will Cassandra balance load across replicas?
On Thu, Apr 5, 2012 at 9:22 AM, zhiming shen zhiming.s...@gmail.com wrote:
Thanks for your reply. My question is about the impact of replication on load balancing. Say we have nodes A, B, C, D... in the ring. The replication factor is 3, so the data on A will also have replicas on B and C. If we are reading data owned by A, and A is already very busy, will the requests be forwarded to B and C? How about update requests?

Google "cassandra dynamic snitch".

--
=Robert Coli
AIM&GTALK - rc...@palominodb.com
YAHOO - rcoli.palominob
SKYPE - rcoli_palominodb
Request timeout and host marked down
Hi all, we are using Hector, and often we see lots of timeout exceptions in the log. I know that Hector can fail over to another node, but I want to reduce the number of timeouts. Is there any Hector parameter I should change to reduce this error? Also, on the server side, is there any tuning needed for the timeout? Thanks in advance.

12/04/04 15:13:20 ERROR com.netseer.services.keywordstat.io.KeywordServiceImpl: Timout 1 ms
12/04/04 15:13:25 ERROR me.prettyprint.cassandra.connection.HConnectionManager: MARK HOST AS DOWN TRIGGERED for host 10.28.78.123(10.28.78.123):9160
12/04/04 15:13:25 ERROR me.prettyprint.cassandra.connection.HConnectionManager: Pool state on shutdown: ConcurrentCassandraClientPoolByHost:{10.28.78.123(10.28.78.123):9160}; IsActive?: true; Active: 1; Blocked: 0; Idle: 5; NumBeforeExhausted: 19
12/04/04 15:13:44 ERROR me.prettyprint.cassandra.connection.HConnectionManager: MARK HOST AS DOWN TRIGGERED for host 10.240.113.171(10.240.113.171):9160
12/04/04 15:13:44 ERROR me.prettyprint.cassandra.connection.HConnectionManager: Pool state on shutdown: ConcurrentCassandraClientPoolByHost:{10.240.113.171(10.240.113.171):9160}; IsActive?: true; Active: 1; Blocked: 0; Idle: 5; NumBeforeExhausted: 19
12/04/04 15:13:46 ERROR me.prettyprint.cassandra.connection.HConnectionManager: MARK HOST AS DOWN TRIGGERED for host 10.28.78.123(10.28.78.123):9160
12/04/04 15:13:46 ERROR me.prettyprint.cassandra.connection.HConnectionManager: Pool state on shutdown: ConcurrentCassandraClientPoolByHost:{10.28.78.123(10.28.78.123):9160}; IsActive?: true; Active: 1; Blocked: 0; Idle: 5; NumBeforeExhausted: 19
12/04/04 15:13:46 ERROR me.prettyprint.cassandra.connection.HConnectionManager: MARK HOST AS DOWN TRIGGERED for host 10.123.83.114(10.123.83.114):9160
12/04/04 15:13:46 ERROR me.prettyprint.cassandra.connection.HConnectionManager: Pool state on shutdown: ConcurrentCassandraClientPoolByHost:{10.123.83.114(10.123.83.114):9160}; IsActive?: true; Active: 1; Blocked: 0; Idle: 5; NumBeforeExhausted: 19
12/04/04 15:13:46 ERROR me.prettyprint.cassandra.connection.HConnectionManager: MARK HOST AS DOWN TRIGGERED for host 10.6.115.239(10.6.115.239):9160
12/04/04 15:13:46 ERROR me.prettyprint.cassandra.connection.HConnectionManager: Pool state on shutdown: ConcurrentCassandraClientPoolByHost:{10.6.115.239(10.6.115.239):9160}; IsActive?: true; Active: 1; Blocked: 0; Idle: 5; NumBeforeExhausted: 19
12/04/04 15:13:49 ERROR com.netseer.services.keywordstat.io.KeywordServiceImpl: Timout 1 ms
12/04/04 15:13:49 ERROR me.prettyprint.cassandra.connection.HConnectionManager: MARK HOST AS DOWN TRIGGERED for host 10.120.205.48(10.120.205.48):9160
12/04/04 15:13:49 ERROR me.prettyprint.cassandra.connection.HConnectionManager: Pool state on shutdown: ConcurrentCassandraClientPoolByHost:{10.120.205.48(10.120.205.48):9160}; IsActive?: true; Active: 3; Blocked: 0; Idle: 3; NumBeforeExhausted: 17
12/04/04 15:13:50 ERROR me.prettyprint.cassandra.connection.HConnectionManager: MARK HOST AS DOWN TRIGGERED for host 10.28.20.200(10.28.20.200):9160
12/04/04 15:13:50 ERROR me.prettyprint.cassandra.connection.HConnectionManager: Pool state on shutdown: ConcurrentCassandraClientPoolByHost:{10.28.20.200(10.28.20.200):9160}; IsActive?: true; Active: 2; Blocked: 0; Idle: 4; NumBeforeExhausted: 18
12/04/04 15:13:51 ERROR com.netseer.services.keywordstat.io.KeywordServiceImpl: Timout 1 ms
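One client-side pattern that helps with this kind of cascade is retrying with capped, jittered exponential backoff instead of hammering a node that is already timing out, which is what drives the repeated "MARK HOST AS DOWN" cycles above. This is a generic sketch, not Hector's actual API:

```python
import random
import time

def call_with_backoff(op, retries=4, base_delay=0.05, sleep=time.sleep):
    """Run op(); on TimeoutError, retry with exponential backoff.

    op is any zero-argument callable wrapping the Cassandra request.
    The sleep parameter is injectable for testing.
    """
    for attempt in range(retries):
        try:
            return op()
        except TimeoutError:
            if attempt == retries - 1:
                raise  # out of retries; let the caller fail over
            # Backoff doubles each attempt: 50ms, 100ms, 200ms, ...
            # with random jitter so clients don't retry in lockstep.
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay / 2))
```

Server side, the complementary knob is `rpc_timeout_in_ms` in cassandra.yaml; the client's socket timeout should be a bit larger than the server's so the server gets a chance to answer before the client gives up.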
Re: Server side scripting support in Cassandra - go Python !
Just like an Oracle stored procedure.

2012/3/26 Data Craftsman database.crafts...@gmail.com:
Howdy, some Polyglot Persistence (NoSQL) products have started to support server-side scripting, similar to RDBMS stored procedures, e.g. Redis Lua scripting. I hope it is Python when Cassandra gets the server-side scripting feature. FYI:
http://antirez.com/post/250
http://nosql.mypopescu.com/post/19949274021/alchemydb-an-integrated-graphdb-rdbms-kv-store
"Server side scripting support is an extremely powerful tool. Having processing close to data (i.e. data locality) is a well known advantage, ..., it can open the doors to completely new features."
Thanks, Charlie (@mujiang) 一个 木匠
===
Data Architect Developer
http://mujiang.blogspot.com
Resident size growth
Hi, I'm experiencing steady growth in the resident size of the JVM running Cassandra 1.0.7. I disabled JNA and the off-heap row cache, tested with and without mlockall (which prevents paging), and upgraded to JRE 1.6.0_31 to stop this bug [1] from leaking memory. Still, the JVM's resident set size grows steadily. A process with Xmx=2048M has grown to 6GB resident size, and one with Xmx=8192M to 16GB, in a few hours, and they keep increasing. Has anyone experienced this? Any idea how to deal with this issue?

Thanks,
Omid

[1] http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7066129
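When tracking a leak like this, the useful signal is resident set size versus -Xmx: off-heap growth shows up as RSS climbing far above the configured max heap. A small Linux-only sketch (it reads /proc, so it returns None on other platforms; the helper name is just illustrative):

```python
def rss_kib(pid="self"):
    """Return the process's resident set size in KiB, or None.

    Reads VmRSS from /proc/<pid>/status, so this only works on Linux.
    For a Cassandra node you would pass the JVM's pid and compare the
    result against the -Xmx value from cassandra-env.sh.
    """
    try:
        with open(f"/proc/{pid}/status") as f:
            for line in f:
                if line.startswith("VmRSS:"):
                    return int(line.split()[1])  # field is in KiB
    except OSError:
        pass
    return None
```

Sampling this periodically (and alongside the JVM's reported heap usage from jstat or JMX) makes it easy to show the growth is outside the Java heap, which is what points at mmap'd SSTables, native allocations, or a glibc arena issue rather than a heap leak.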
Re: Materialized Views or Index CF - data model question
Will a 1500-byte row size be large or small for Cassandra, from your understanding?

That is small. Performance degradation starts at around 500 MB per row; it gets very slow if you hit that limit.
upgrade from cassandra 0.8 to 1.0
Hello,

It looks like Cassandra 1.0.x is stable and has interesting things like off-heap memtables and row caches, so we want to upgrade from 0.8 to 1.0. Is it possible to do this without cluster downtime (while we upgrade all the nodes)? I mean the following: when we begin the upgrade, at some point the working cluster will contain a mix of 0.8 nodes (not yet upgraded) and 1.0 nodes (already upgraded), so I am concerned about this situation, i.e. communication between nodes could break because of incompatibilities in the communication protocol version.
Re: upgrade from cassandra 0.8 to 1.0
On 04/05/2012 03:19 PM, ruslan usifov wrote:
Hello, it looks like Cassandra 1.0.x is stable and has interesting things like off-heap memtables and row caches, so we want to upgrade from 0.8 to 1.0. Is it possible to do this without cluster downtime (while we upgrade all the nodes)? When we begin the upgrade, at some point the working cluster will contain a mix of 0.8 nodes (not yet upgraded) and 1.0 nodes (already upgraded), so I am concerned that communication between nodes could break because of protocol version incompatibilities.

This is from the NEWS file for the 1.0 upgrade:

1.0
===
Upgrading
---------
- Upgrading from version 0.7.1+ or 0.8.2+ can be done with a rolling restart, one node at a time. (0.8.0 or 0.8.1 are NOT network-compatible with 1.0: upgrade to the most recent 0.8 release first.) You do not need to bring down the whole cluster at once.
- After upgrading, run nodetool scrub against each node before running repair, moving nodes, or adding new ones.
Re: really bad select performance
Thanks for all the help everyone. The values were meant to be binary. I ended up making the possible values between 0 and 50 instead of just 0 or 1, so that no single index row gets that wide. I now run queries for everything from 1 to 50 to get 'queued' items and set the value to 0 when I'm done (I will never query for row_loaded = 0). It's unfortunate that Cassandra doesn't delegate the query execution to a node that has the index row on it, but rather tries to move the entire index row to the node that is queried.

-Chris

----- Original Message -----
From: David Leimbach leim...@gmail.com
To: user@cassandra.apache.org
Sent: Monday, April 2, 2012 8:51:46 AM
Subject: Re: really bad select performance

This is all very hypothetical, but I've been bitten by this before. Does row_loaded happen to be a binary or boolean value? If so, the secondary index generated by Cassandra will have at most 2 rows, and they'll be REALLY wide if you have a lot of entries. Since Cassandra doesn't distribute columns over rows, those potentially very wide index rows, and their replicas, must live in SSTables in their entirety on the nodes that own them (and their replicas). Even though you limit 1, I'm not sure what behind-the-scenes things Cassandra does. I've received advice to avoid the built-in secondary indexes in Cassandra for some of these reasons. Also, if row_loaded is meant to implement some kind of queuing behavior, it could be the wrong problem space for Cassandra as a result of all of the above.

On Sat, Mar 31, 2012 at 12:22 PM, aaron morton aa...@thelastpickle.com wrote:
Is there anything in the logs when you run the queries? Try turning the logging up to DEBUG on the node that fails to return and see what happens. You will see it send messages to other nodes and do work itself. One thing to note: a query that uses secondary indexes runs on a node for each token range, so it will use more than CL number of nodes.
Cheers
- Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 30/03/2012, at 11:52 AM, Chris Hart wrote:
Hi, I have the following cluster:

136112946768375385385349842972707284580
ip address  MountainView  RAC1  Up  Normal  1.86 GB  20.00%  0
ip address  MountainView  RAC1  Up  Normal  2.17 GB  33.33%  56713727820156410577229101238628035242
ip address  MountainView  RAC1  Up  Normal  2.41 GB  33.33%  113427455640312821154458202477256070485
ip address  Rackspace     RAC1  Up  Normal  3.9 GB   13.33%  136112946768375385385349842972707284580

The following query runs quickly on all nodes except one MountainView node:

select * from Access_Log where row_loaded = 0 limit 1;

There is a secondary index on row_loaded. The query usually doesn't complete (but sometimes does) on the bad node and returns very quickly on all other nodes. I've upped the RPC timeout to a full minute (rpc_timeout_in_ms: 60000) in the yaml, but it still often doesn't complete within a minute. It seems just as likely to complete, and takes about the same amount of time, whether the limit is 1, 100 or 1000.

Thanks for any help,
Chris
Re: Will Cassandra balance load across replicas?
In your case, Cassandra will read the data from the nearest node, and read digests from the other two nodes. When those reads meet the requested consistency level, Cassandra will return the result.
maki
From iPhone

On 2012/04/06, at 1:22, zhiming shen zhiming.s...@gmail.com wrote:
Thanks for your reply. My question is about the impact of replication on load balancing. Say we have nodes A, B, C, D... in the ring. The replication factor is 3, so the data on A will also have replicas on B and C. If we are reading data owned by A, and A is already very busy, will the requests be forwarded to B and C? How about update requests?
Thanks, Zhiming

On Thu, Apr 5, 2012 at 12:33 AM, Watanabe Maki watanabe.m...@gmail.com wrote:
I assume you are talking about nodes, rather than replicas. The data distribution over the ring depends on the Partitioner and Replica placement strategy you use. If you are using the Random Partitioner and Simple Strategy, your data will be automatically distributed over the nodes in the ring.
maki

On 2012/04/05, at 12:31, zhiming shen zhiming.s...@gmail.com wrote:
Hi, can anyone tell me whether Cassandra can do load balancing across replicas? How do I configure it for this purpose? Thanks very much.
Best Regards, Zhiming
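The "nearest node" choice above is what the dynamic snitch makes latency-aware: Cassandra scores replicas by recent response times and routes each data read to the best-scoring one, so a busy node like A naturally sheds reads to B and C. A toy model of that idea (plain Python, not Cassandra's real DynamicEndpointSnitch):

```python
from collections import deque

class LatencyRouter:
    """Pick the replica with the lowest average recent read latency."""

    def __init__(self, replicas, window=100):
        # Keep a sliding window of latency samples per replica.
        self.samples = {r: deque(maxlen=window) for r in replicas}

    def record(self, replica, latency_ms):
        self.samples[replica].append(latency_ms)

    def pick(self):
        # Replicas with no samples score 0 so they get probed first;
        # otherwise score is the mean of the recent window.
        def score(r):
            s = self.samples[r]
            return sum(s) / len(s) if s else 0.0
        return min(self.samples, key=score)
```

Updates are different: writes always go to all replicas (the coordinator sends the mutation to every replica and waits for the consistency level's worth of acks), so there is no equivalent read-style shedding for them.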
leveled compaction - improve log message
It would be really helpful if leveled compaction printed the level into the syslog. Demo:

INFO [CompactionExecutor:891] 2012-04-05 22:39:27,043 CompactionTask.java (line 113) Compacting ***LEVEL 1*** [SSTableReader(path='/var/lib/cassandra/data/rapidshare/querycache-hc-19690-Data.db'), SSTableReader(path='/var/lib/cassandra/data/rapidshare/querycache-hc-19688-Data.db'), SSTableReader(path='/var/lib/cassandra/data/rapidshare/querycache-hc-19691-Data.db'), SSTableReader(path='/var/lib/cassandra/data/rapidshare/querycache-hc-19700-Data.db'), SSTableReader(path='/var/lib/cassandra/data/rapidshare/querycache-hc-19686-Data.db'), SSTableReader(path='/var/lib/cassandra/data/rapidshare/querycache-hc-19696-Data.db'), SSTableReader(path='/var/lib/cassandra/data/rapidshare/querycache-hc-19687-Data.db'), SSTableReader(path='/var/lib/cassandra/data/rapidshare/querycache-hc-19695-Data.db'), SSTableReader(path='/var/lib/cassandra/data/rapidshare/querycache-hc-19689-Data.db'), SSTableReader(path='/var/lib/cassandra/data/rapidshare/querycache-hc-19694-Data.db'), SSTableReader(path='/var/lib/cassandra/data/rapidshare/querycache-hc-19693-Data.db')]

INFO [CompactionExecutor:891] 2012-04-05 22:39:57,299 CompactionTask.java (line 221) *** LEVEL 1 *** Compacted to [/var/lib/cassandra/data/rapidshare/querycache-hc-19701-Data.db,/var/lib/cassandra/data/rapidshare/querycache-hc-19702-Data.db,/var/lib/cassandra/data/rapidshare/querycache-hc-19703-Data.db,/var/lib/cassandra/data/rapidshare/querycache-hc-19704-Data.db,/var/lib/cassandra/data/rapidshare/querycache-hc-19705-Data.db,/var/lib/cassandra/data/rapidshare/querycache-hc-19706-Data.db,/var/lib/cassandra/data/rapidshare/querycache-hc-19707-Data.db,/var/lib/cassandra/data/rapidshare/querycache-hc-19708-Data.db,/var/lib/cassandra/data/rapidshare/querycache-hc-19709-Data.db,/var/lib/cassandra/data/rapidshare/querycache-hc-19710-Data.db,/var/lib/cassandra/data/rapidshare/querycache-hc-19711-Data.db,]. 59,643,011 to 57,564,216 (~96% of original) bytes for 590,909 keys at 1.814434MB/s. Time: 30,256ms.
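The throughput figure in that log line can be reproduced from its own numbers, which is a handy cross-check when reading compaction logs: bytes written divided by elapsed time, where "MB" here means MiB.

```python
# Numbers taken verbatim from the compaction log line above.
bytes_in = 59_643_011    # total input bytes
bytes_out = 57_564_216   # bytes after compaction
elapsed_ms = 30_256      # "Time: 30,256ms"

# Output bytes per second, converted to MiB/s.
mib_per_s = bytes_out / (elapsed_ms / 1000) / 2**20

# Fraction of the original size retained ("~96% of original").
ratio = bytes_out / bytes_in

print(f"throughput: {mib_per_s:.6f} MB/s, retained: {ratio:.1%}")
```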
Re: Is there a way to update column's TTL only?
We do not have a central point to do such work yet, but it seems this is the only way to do it a bit more efficiently. Thanks.

Best Regards!
Jian Jin

2012/4/5 aaron morton aa...@thelastpickle.com
You cannot set the TTL without also setting the column value. Could you keep a record of future deletes in a CF and then action them as a batch process?
Cheers
- Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 5/04/2012, at 2:00 PM, 金剑 wrote:
Hi, we would like to leverage Cassandra's expiration to manage our data's lifecycle. We need to delete data after a period (e.g. 1 hour) after the user clicks the Delete button. We would have to read and re-insert the column in order to update the TTL, but this is unacceptable in a system that might need to read out gigabytes of data. Is there a way to do this?
Best Regards!
Jian Jin
Re: really bad select performance
But now, when you set the value to 0, that index row will get very wide as it collects everything completed. You may want to consider deleting the indexed column for completed rows when done. Cassandra is not a great queue to use with built-in indexes. You could write your own index here and potentially do better.

On Thursday, April 5, 2012, Chris Hart wrote:
Thanks for all the help everyone. The values were meant to be binary. I ended up making the possible values between 0 and 50 instead of just 0 or 1, so that no single index row gets that wide. I now run queries for everything from 1 to 50 to get 'queued' items and set the value to 0 when I'm done (I will never query for row_loaded = 0). It's unfortunate that Cassandra doesn't delegate the query execution to a node that has the index row on it, but rather tries to move the entire index row to the node that is queried.

-Chris

----- Original Message -----
From: David Leimbach leim...@gmail.com
To: user@cassandra.apache.org
Sent: Monday, April 2, 2012 8:51:46 AM
Subject: Re: really bad select performance

This is all very hypothetical, but I've been bitten by this before. Does row_loaded happen to be a binary or boolean value? If so, the secondary index generated by Cassandra will have at most 2 rows, and they'll be REALLY wide if you have a lot of entries. Since Cassandra doesn't distribute columns over rows, those potentially very wide index rows, and their replicas, must live in SSTables in their entirety on the nodes that own them (and their replicas). Even though you limit 1, I'm not sure what behind-the-scenes things Cassandra does. I've received advice to avoid the built-in secondary indexes in Cassandra for some of these reasons. Also, if row_loaded is meant to implement some kind of queuing behavior, it could be the wrong problem space for Cassandra as a result of all of the above.

On Sat, Mar 31, 2012 at 12:22 PM, aaron morton aa...@thelastpickle.com wrote:
Is there anything in the logs when you run the queries? Try turning the logging up to DEBUG on the node that fails to return and see what happens. You will see it send messages to other nodes and do work itself. One thing to note: a query that uses secondary indexes runs on a node for each token range, so it will use more than CL number of nodes.
Cheers
- Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 30/03/2012, at 11:52 AM, Chris Hart wrote:
Hi, I have the following cluster:

136112946768375385385349842972707284580
ip address  MountainView  RAC1  Up  Normal  1.86 GB  20.00%  0
ip address  MountainView  RAC1  Up  Normal  2.17 GB  33.33%  56713727820156410577229101238628035242
ip address  MountainView  RAC1  Up  Normal  2.41 GB  33.33%  113427455640312821154458202477256070485
ip address  Rackspace     RAC1  Up  Normal  3.9 GB   13.33%  136112946768375385385349842972707284580

The following query runs quickly on all nodes except one MountainView node:

select * from Access_Log where row_loaded = 0 limit 1;

There is a secondary index on row_loaded. The query usually doesn't complete (but sometimes does) on the bad node and returns very quickly on all other nodes. I've upped the RPC timeout to a full minute (rpc_timeout_in_ms: 60000) in the yaml, but it still often doesn't complete within a minute. It seems just as likely to complete, and takes about the same amount of time, whether the limit is 1, 100 or 1000.

Thanks for any help,
Chris
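The 0-to-50 workaround Chris describes can be sketched as follows. The table and column names come from the thread; everything else (helper names, the LIMIT value) is illustrative, and the CQL strings are only built, not executed:

```python
import random

N_SHARDS = 50  # spread "queued" items over this many index values

def queued_marker():
    """Value to write into row_loaded for a new (unprocessed) row.

    Picking a random shard in 1..N_SHARDS keeps each secondary index
    row roughly 1/N_SHARDS the size of a single boolean 'true' row.
    """
    return random.randint(1, N_SHARDS)

def queued_queries(limit=100):
    """Queries a consumer runs to drain every shard.

    0 is reserved for "done" and, per the advice above, is better
    handled by deleting the indexed column than by indexing 0.
    """
    return [
        f"SELECT * FROM Access_Log WHERE row_loaded = {v} LIMIT {limit}"
        for v in range(1, N_SHARDS + 1)
    ]
```

The trade-off is explicit: writers pay nothing extra, while readers must issue N_SHARDS queries per drain pass, so N_SHARDS is a dial between index-row width and read fan-out.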