Are there many rows like this? Did you check the logs on the other nodes for exceptions?

On Mon, Oct 19, 2009 at 7:40 PM, Edmond Lau <edm...@ooyala.com> wrote:
> Usually I'm trying to read 500 columns (~250KB) out of the 30K columns
> (~15MB) of the supercolumn. But the same issues happen when I drop
> down to 100 (~50KB) columns. The columns I request from get_slice()
> are named, i.e. I'm not reading 500 consecutive columns.
>
> Edmond
>
> On Mon, Oct 19, 2009 at 5:36 PM, Jonathan Ellis <jbel...@gmail.com> wrote:
>> How much of the row that fails are you trying to read at once?
>>
>> On Mon, Oct 19, 2009 at 7:30 PM, Edmond Lau <edm...@ooyala.com> wrote:
>>> Whenever I try to do a quorum read on a row with a particularly large
>>> supercolumn with get_slice under high load, cassandra throws timeouts.
>>> The reads for that row repeatedly fail until load decreases, but
>>> smaller reads still succeed during that time. bin/nodeprobe info
>>> shows that the read latency for the column family spikes up to 6-8
>>> seconds. I've run into this issue since I started to play with
>>> cassandra, but thought that it might go away with beefier nodes. I've
>>> since gotten more powerful machines, but the timeouts still happen.
>>>
>>> Some details:
>>> - cassandra 0.4.1
>>> - 5 nodes, each with 12-core 800MHz with 8GB RAM, 5GB heap size
>>> - replication factor of 3
>>> - RandomPartitioner
>>> - row that fails has a supercolumn with ~30K subcolumns, ~500 bytes
>>>   per cell, ~15MB total
>>> - my failed quorum read lists 500 columns to read in the get_slice
>>>   call, but the same happens even when I read 100.
>>>
>>> The nodes either time out with 0 or 1 responses (2 of 3 required for a
>>> quorum read):
>>>
>>> ERROR [pool-1-thread-24] 2009-10-20 00:07:43,851 Cassandra.java (line 679) Internal error processing get_slice
>>> java.lang.RuntimeException: java.util.concurrent.TimeoutException: Operation timed out - received only 0 responses from .
>>>     at org.apache.cassandra.service.CassandraServer.readColumnFamily(CassandraServer.java:108)
>>>     at org.apache.cassandra.service.CassandraServer.getSlice(CassandraServer.java:182)
>>>     at org.apache.cassandra.service.CassandraServer.multigetSliceInternal(CassandraServer.java:251)
>>>     at org.apache.cassandra.service.CassandraServer.get_slice(CassandraServer.java:220)
>>>     at org.apache.cassandra.service.Cassandra$Processor$get_slice.process(Cassandra.java:671)
>>>     at org.apache.cassandra.service.Cassandra$Processor.process(Cassandra.java:627)
>>>     at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:253)
>>>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>>>     at java.lang.Thread.run(Thread.java:619)
>>> Caused by: java.util.concurrent.TimeoutException: Operation timed out - received only 0 responses from .
>>>     at org.apache.cassandra.service.QuorumResponseHandler.get(QuorumResponseHandler.java:88)
>>>     at org.apache.cassandra.service.StorageProxy.strongRead(StorageProxy.java:395)
>>>     at org.apache.cassandra.service.StorageProxy.readProtocol(StorageProxy.java:317)
>>>     at org.apache.cassandra.service.CassandraServer.readColumnFamily(CassandraServer.java:100)
>>>     ... 9 more
>>>
>>> ERROR [pool-1-thread-32] 2009-10-19 23:47:21,045 Cassandra.java (line 679) Internal error processing get_slice
>>> java.lang.RuntimeException: java.util.concurrent.TimeoutException: Operation timed out - received only 1 responses from 172.16.129.75:7000 .
>>>     at org.apache.cassandra.service.CassandraServer.readColumnFamily(CassandraServer.java:108)
>>>     at org.apache.cassandra.service.CassandraServer.getSlice(CassandraServer.java:182)
>>>     at org.apache.cassandra.service.CassandraServer.multigetSliceInternal(CassandraServer.java:251)
>>>     at org.apache.cassandra.service.CassandraServer.get_slice(CassandraServer.java:220)
>>>     at org.apache.cassandra.service.Cassandra$Processor$get_slice.process(Cassandra.java:671)
>>>     at org.apache.cassandra.service.Cassandra$Processor.process(Cassandra.java:627)
>>>     at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:253)
>>>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>>>     at java.lang.Thread.run(Thread.java:619)
>>> Caused by: java.util.concurrent.TimeoutException: Operation timed out - received only 1 responses from 172.16.129.75:7000 .
>>>     at org.apache.cassandra.service.QuorumResponseHandler.get(QuorumResponseHandler.java:88)
>>>     at org.apache.cassandra.service.StorageProxy.strongRead(StorageProxy.java:395)
>>>     at org.apache.cassandra.service.StorageProxy.readProtocol(StorageProxy.java:317)
>>>     at org.apache.cassandra.service.CassandraServer.readColumnFamily(CassandraServer.java:100)
>>>     ... 9 more
>>>
>>> Any ideas what the issue might be?
>>>
>>> Edmond
>>>
>>
>
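
For reference, the numbers in the report above can be checked with a few lines of arithmetic. This is only a sketch: it assumes a quorum is a simple majority of replicas (floor(RF / 2) + 1), which matches the "2 of 3 required" stated in the report, and the function names are made up for illustration rather than taken from Cassandra.

```python
def quorum_size(replication_factor: int) -> int:
    """Smallest majority of replicas (assumed rule: floor(RF / 2) + 1)."""
    if replication_factor < 1:
        raise ValueError("replication factor must be at least 1")
    return replication_factor // 2 + 1


def quorum_met(responses: int, replication_factor: int) -> bool:
    """True if enough replicas answered to satisfy a quorum read."""
    return responses >= quorum_size(replication_factor)


def payload_bytes(columns: int, bytes_per_column: int) -> int:
    """Approximate payload size, ignoring per-column overhead."""
    return columns * bytes_per_column


if __name__ == "__main__":
    rf = 3  # replication factor from the report
    print("quorum size:", quorum_size(rf))
    for responses in (0, 1, 2):
        print(responses, "responses ->", quorum_met(responses, rf))
    # ~500 bytes per cell, as stated in the report
    print("500 columns:", payload_bytes(500, 500), "bytes")
    print("30K columns:", payload_bytes(30_000, 500), "bytes")
```

With a replication factor of 3, both failures in the traces (0 and 1 responses) fall short of the quorum, and the size estimates line up with the ~250KB slice and ~15MB supercolumn quoted in the thread.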