Question about consistency levels

2013-11-09 Thread graham sanderson
I’m trying to be more succinct this time, since I got no answers on my last attempt.

We are currently using 2.0.2 in test (no C* in production yet), and use 
(LOCAL_)QUORUM CL on reads and writes, which guarantees (if successful) that we 
read the latest data.

That said, it is highly likely that (LOCAL_)ONE would return our data, since 
the data isn’t read until quite some time after it is written.

Given that we must do our best to return data, we want to know what options we 
have when a quorum read fails (say 2 of 3 replicas go down with a replication 
factor of 3 - note we have also seen this issue with bugs related to CF 
deletion/re-creation during compaction or load causing data corruption, in 
which case one bad node can screw things up).

One option is to fall back to (LOCAL_)ONE on the client side if we detect the 
right exception from the (LOCAL_)QUORUM request, but that obviously degrades 
consistency.
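
Very roughly, I mean something like the following on the read path (just a 
sketch assuming the DataStax Java driver; the class and method names are 
placeholders, not our actual code):

import com.datastax.driver.core.ConsistencyLevel;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.SimpleStatement;
import com.datastax.driver.core.exceptions.ReadTimeoutException;
import com.datastax.driver.core.exceptions.UnavailableException;

public final class FallbackReads {

    // Read at LOCAL_QUORUM; only if a quorum cannot be reached do we degrade
    // to LOCAL_ONE rather than fail the read outright.
    public static Row readWithFallback(Session session, String cql) {
        SimpleStatement stmt = new SimpleStatement(cql);
        stmt.setConsistencyLevel(ConsistencyLevel.LOCAL_QUORUM);
        try {
            return session.execute(stmt).one();
        } catch (UnavailableException | ReadTimeoutException e) {
            // e.g. 2 of 3 replicas down, or one corrupt node stalling the
            // quorum: accept weaker consistency instead of returning nothing.
            stmt.setConsistencyLevel(ConsistencyLevel.LOCAL_ONE);
            return session.execute(stmt).one();
        }
    }
}

(If I understand correctly, the Java driver also ships a 
DowngradingConsistencyRetryPolicy that retries at a lower CL automatically, 
which may amount to much the same thing.)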

That said, we ONLY ever do idempotent writes, and NEVER delete. So once again I 
wonder whether there is a (reasonable) use case for a CL whereby you accept the 
first non-empty response from any replica?
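
Failing a new CL, the closest client-side approximation I can think of (again 
only a sketch, another method for the FallbackReads class above, and only 
acceptable because our writes are idempotent and we never delete) is to try 
the cheap read first and escalate only when it comes back empty:

    // A non-empty row from any single replica is good enough for us, because
    // nothing is ever deleted or overwritten with conflicting data. Only an
    // empty result forces the more expensive quorum read.
    public static Row readPreferNonEmpty(Session session, String cql) {
        SimpleStatement stmt = new SimpleStatement(cql);
        stmt.setConsistencyLevel(ConsistencyLevel.LOCAL_ONE);
        Row row = session.execute(stmt).one();
        if (row != null) {
            return row;
        }
        // An empty ONE read may simply mean this replica has not seen the
        // write yet, so retry at quorum before concluding the data is absent.
        stmt.setConsistencyLevel(ConsistencyLevel.LOCAL_QUORUM);
        return session.execute(stmt).one();
    }

The obvious caveat is that a genuine miss now costs two reads.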



A lot of MUTATION and REQUEST_RESPONSE messages dropped

2013-11-09 Thread srmore
I recently upgraded to 1.2.9 and I am seeing a lot of REQUEST_RESPONSE and
MUTATION messages being dropped.

This happens when I have multiple nodes in the cluster (about 3 nodes) and
I send traffic to only one node. I don't think the traffic is that high; it
is around 400 msg/sec with 100 threads. When I take down the other two nodes I
don't see any errors (at least on the client side). I am using Pelops.

On the client I get UnavailableException, but the nodes are up. Initially I
thought I was hitting CASSANDRA-6297 (gossip thread blocking), so I changed
memtable_flush_writers to 3. Still no luck.
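
(For reference, that change is the single cassandra.yaml line:

memtable_flush_writers: 3
)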

UnavailableException (client side):
org.scale7.cassandra.pelops.exceptions.UnavailableException: null
	at org.scale7.cassandra.pelops.exceptions.IExceptionTranslator$ExceptionTranslator.translate(IExceptionTranslator.java:61) ~[na:na]
	at ...

In the debug log on the Cassandra node, this is the exception I see:

DEBUG [Thrift:78] 2013-11-09 16:47:28,212 CustomTThreadPoolServer.java
Thrift transport error occurred during processing of message.
org.apache.thrift.transport.TTransportException
	at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)
	at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
	at org.apache.thrift.transport.TFramedTransport.readFrame(TFramedTransport.java:129)
	at org.apache.thrift.transport.TFramedTransport.read(TFramedTransport.java:101)
	at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
	at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:378)
	at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:297)
	at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:204)
	at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:22)
	at org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:206)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
	at java.lang.Thread.run(Thread.java:662)

Could this be because of high load? With Cassandra 1.0.011 I did not see
this issue.

Thanks,
Sandeep