Re: How can I scale my read rate?

2017-03-23 Thread Alain Rastoul

On 24/03/2017 01:00, Eric Stevens wrote:

Assuming an even distribution of data in your cluster, and an even
distribution across those keys by your readers, you would not need to
increase RF with cluster size to increase read performance.  If you have
3 nodes with RF=3, and do 3 million reads, with good distribution, each
node has served 1 million read requests.  If you increase to 6 nodes and
keep RF=3, then each node now owns half as much data and serves only
500,000 reads.  Or more meaningfully in the same time it takes to do 3
million reads under the 3 node cluster you ought to be able to do 6
million reads under the 6 node cluster since each node is just
responsible for 1 million total reads.


Hi Eric,

I think I got your point.
In the case of truly evenly distributed reads, it may (or should?) not make
any difference.


But when reads are not well distributed (and only in that case), my
understanding was that a higher RF could help spread the load:
in that case, with RF=4 instead of 3, and several clients accessing the
same key ranges, a coordinator could pick one node among 4 replicas
instead of one among 3, thus having more "workers" to handle the requests.

Am I wrong here?

Thank you for the clarification


--
best,
Alain



RE: [Cassandra 3.0.9] Cannot allocate memory

2017-03-23 Thread Abhishek Kumar Maheshwari
Thanks Jayesh,

I found the fix.

I made the following changes:

In /etc/sysctl.conf:
vm.max_map_count = 1048575

in a file under /etc/security/limits.d/:

root - memlock unlimited
root - nofile 10
root - nproc 32768
root - as unlimited
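For what it's worth, a quick way to confirm the new settings took effect (read-only checks; `sysctl -p` must be run as root to reload /etc/sysctl.conf, and the limits.d values apply to new login sessions):

```shell
# Read-only sanity checks for the limits set above (Linux paths).
if [ -r /proc/sys/vm/max_map_count ]; then
    cat /proc/sys/vm/max_map_count   # expect 1048575 after sysctl -p
fi
ulimit -l   # max locked memory (memlock)
ulimit -n   # max open files (nofile)
ulimit -u   # max user processes (nproc)
```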


Thanks & Regards,
Abhishek Kumar Maheshwari
+91- 805591 (Mobile)
Times Internet Ltd. | A Times of India Group Company
FC - 6, Sector 16A, Film City,  Noida,  U.P. 201301 | INDIA
Please do not print this email unless it is absolutely necessary. Spread
environmental awareness.

From: Thakrar, Jayesh [mailto:jthak...@conversantmedia.com]
Sent: Thursday, March 23, 2017 8:36 PM
To: Abhishek Kumar Maheshwari; user@cassandra.apache.org
Subject: Re: [Cassandra 3.0.9] Cannot allocate memory

Dmesg will often print a message saying that it had to kill a process if the 
server was short of memory, so you will have to dump the output to a file and 
check.
If a process is killed to reclaim memory for the system, then it will dump a 
list of all processes and the actual process that was killed.
So maybe, you can check for a kill like this - "dmesg | grep -i kill"
If you do find a line (or two), then you need to examine the output carefully.

In production, I tend to dump a lot of GC output also which helps 
troubleshooting.
E.g. Below is what I have.
If you look, I also have a flag that says that if the heap runs out of memory 
(which is rare), then dump files.
If dmesg does not show your processes being killed, then you may have to dump 
gc logging info to get some insight.

-XX:+UseThreadPriorities
-XX:ThreadPriorityPolicy=42
-Xms16G
-Xmx16G
-Xmn4800M
-XX:+HeapDumpOnOutOfMemoryError
-Xss256k
-XX:+UseConcMarkSweepGC
-XX:+CMSParallelRemarkEnabled
-XX:+ParallelRefProcEnabled
-XX:+CMSClassUnloadingEnabled
-XX:CMSInitiatingOccupancyFraction=80
-XX:+UseCMSInitiatingOccupancyOnly
-XX:+UseParNewGC
-XX:MaxTenuringThreshold=2
-XX:SurvivorRatio=8
-XX:+UnlockDiagnosticVMOptions
-XX:ParGCCardsPerStrideChunk=32768
-XX:NewSize=750m
-XX:MaxNewSize=750m
-XX:+UseCondCardMark
-XX:+PrintGCDetails
-XX:+PrintGCDateStamps
-XX:+PrintGCTimeStamps
-XX:+PrintHeapAtGC
-XX:+PrintTenuringDistribution
-XX:+PrintGCApplicationStoppedTime
-XX:+PrintPromotionFailure
-Xloggc:
-XX:+UseGCLogFileRotation
-XX:NumberOfGCLogFiles=10
-XX:GCLogFileSize=1M
-Djava.net.preferIPv4Stack=true
-Dcom.sun.management.jmxremote.port=7199
-Dcom.sun.management.jmxremote.ssl=
-Dcom.sun.management.jmxremote.authenticate=




From: Abhishek Kumar Maheshwari
Date: Wednesday, March 22, 2017 at 5:18 PM
To: "user@cassandra.apache.org"
Subject: RE: [Cassandra 3.0.9] Cannot allocate memory

JVM config is as below:

-Xms16G
-Xmx16G
-Xmn3000M

What do I need to check in dmesg?

From: Thakrar, Jayesh [mailto:jthak...@conversantmedia.com]
Sent: 23 March 2017 03:39
To: Abhishek Kumar Maheshwari; user@cassandra.apache.org
Subject: RE: [Cassandra 3.0.9] Cannot allocate memory


And what is the configured max heap?
Sometimes you may also be able to see some useful messages in "dmesg" output.

Jayesh


From: Abhishek Kumar Maheshwari
Sent: Wednesday, March 22, 2017 5:05:14 PM
To: Thakrar, Jayesh; user@cassandra.apache.org
Subject: RE: [Cassandra 3.0.9] Cannot allocate memory

No, only Cassandra is running on these servers.

From: Thakrar, Jayesh [mailto:jthak...@conversantmedia.com]
Sent: 22 March 2017 22:27
To: Abhishek Kumar Maheshwari; user@cassandra.apache.org
Subject: Re: [Cassandra 3.0.9] Cannot allocate memory

Is/are the Cassandra server(s) shared?
E.g. do they run mesos + spark?

From: Abhishek Kumar Maheshwari
Date: Wednesday, March 22, 2017 at 12:45 AM
To: "user@cassandra.apache.org"
Subject: [Cassandra 3.0.9] Cannot allocate memory

Hi all,

I am using Cassandra 3.0.9. While adding a new server, after some time I get
the exception below. The JVM options are attached.
Hardware info:
RAM: 64 GB
Cores: 40


Java HotSpot(TM) 64-Bit Server VM warning: INFO: 
os::commit_memory(0x7fe9c44ee000, 12288, 0) failed; error='Cannot allocate 
memory' (errno=12)
#
# There is insufficient memory for the Java Runtime Environment to continue.
# Native memory allocation (mmap) failed to map 12288 bytes for committing
reserved memory.

Re: How can I scale my read rate?

2017-03-23 Thread Eric Stevens
Assuming an even distribution of data in your cluster, and an even
distribution across those keys by your readers, you would not need to
increase RF with cluster size to increase read performance.  If you have 3
nodes with RF=3, and do 3 million reads, with good distribution, each node
has served 1 million read requests.  If you increase to 6 nodes and keep
RF=3, then each node now owns half as much data and serves only 500,000
reads.  Or more meaningfully in the same time it takes to do 3 million
reads under the 3 node cluster you ought to be able to do 6 million reads
under the 6 node cluster since each node is just responsible for 1 million
total reads.
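A back-of-the-envelope sketch of the arithmetic above (assuming CL=ONE, so each read is served by a single replica, and perfectly even distribution):

```python
def reads_per_node(total_reads: int, nodes: int) -> float:
    # At CL=ONE each read lands on one replica; with even distribution,
    # per-node read load depends only on node count, not on RF.
    return total_reads / nodes

print(reads_per_node(3_000_000, 3))  # 1000000.0 on the 3-node cluster
print(reads_per_node(3_000_000, 6))  # 500000.0 after growing to 6 nodes, RF still 3
```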

On Mon, Mar 20, 2017 at 11:24 PM Alain Rastoul wrote:

> On 20/03/2017 22:05, Michael Wojcikiewicz wrote:
> > Not sure if someone has suggested this, but I believe it's not
> > sufficient to simply add nodes to a cluster to increase read
> > performance: you also need to alter the ReplicationFactor of the
> > keyspace to a larger value as your cluster gets larger.
> >
> > ie. data is available from more nodes in the cluster for each query.
> >
> Yes, good point: in case of cluster growth, there would be more replicas
> to handle the same key ranges.
> Token ranges would also need readjusting:
> https://cassandra.apache.org/doc/latest/operating/topo_changes.html
>
> SG, can you give some information (or share your code) about how you
> generate your data and how you read it ?
>
> --
> best,
> Alain
>
>


Re: SSTable Ancestors information in Cassandra 3.0.x

2017-03-23 Thread Jeff Jirsa
The ancestors were used primarily to clean up leftovers in the case that
cassandra was killed right as compaction finished, where the
source/origin/ancestors were still on the disk at the same time as the
compaction result.

It's not timestamp based, though - that compaction process has moved to
using a transaction log, which tracks the source/results on a per
compaction basis, and cassandra uses those logs/journals rather than
inspecting the ancestors.

- Jeff



On Thu, Mar 23, 2017 at 4:35 PM, Rajath Subramanyam wrote:

> Thanks, Jeff. Did all the internal tasks and the compaction tasks move to a
> timestamp-based approach?
>
> Regards,
> Rajath
>
> 
> Rajath Subramanyam
>
>
> On Thu, Mar 23, 2017 at 2:12 PM, Jeff Jirsa  wrote:
>
> > That information was removed, because it was really meant to be used for
> a
> > handful of internal tasks, most of which were no longer used. The
> remaining
> > use was cleaning up compaction leftovers, and the compaction leftover
> code
> > was rewritten in 3.0 / CASSANDRA-7066 (note, though, that it's somewhat
> > incomplete in the upgrade case , so CASSANDRA-13313 may be interesting to
> > people who are very very very very very very very sensitive to data
> > consistency)
> >
> >
> > On Thu, Mar 23, 2017 at 2:00 PM, Rajath Subramanyam 
> > wrote:
> >
> > > Hello Cassandra-Users and Cassandra-dev,
> > >
> > > One of the handy features in sstablemetadata that was part of Cassandra
> > > 2.1.15 was that it displayed Ancestor information of an SSTable. Here
> is
> > a
> > > sample output of the sstablemetadata tool with the ancestors
> information
> > in
> > > C* 2.1.15:
> > > [centos@chen-datos test1-b83746000fef11e7bdfc8bb2d6662df7]$
> > > sstablemetadata
> > > ks3-test1-ka-2-Statistics.db | grep "Ancestors"
> > > Ancestors: [1]
> > > [centos@chen-datos test1-b83746000fef11e7bdfc8bb2d6662df7]$
> > >
> > > However, the same tool in Cassandra 3.0.x no longer gives us that
> > > information. Here is a sample output of the sstablemetadata grepping
> for
> > > Ancestors information in C* 3.0 (the output is empty since it is no
> > longer
> > > available):
> > > [centos@rj-cassandra-1 elsevier1-ab7389f0fafb11e6ac23e7ccf62f494b]$
> > > sstablemetadata mc-5-big-Statistics.db | grep "Ancestors"
> > > [centos@rj-cassandra-1 elsevier1-ab7389f0fafb11e6ac23e7ccf62f494b]$
> > >
> > > My question, how can I get this information in C* 3.0.x ?
> > >
> > > Thank you !
> > >
> > > Regards,
> > > Rajath
> > >
> > > 
> > > Rajath Subramanyam
> > >
> >
>


Re: SSTable Ancestors information in Cassandra 3.0.x

2017-03-23 Thread Rajath Subramanyam
Thanks, Jeff. Did all the internal tasks and the compaction tasks move to a
timestamp-based approach?

Regards,
Rajath


Rajath Subramanyam


On Thu, Mar 23, 2017 at 2:12 PM, Jeff Jirsa  wrote:

> That information was removed, because it was really meant to be used for a
> handful of internal tasks, most of which were no longer used. The remaining
> use was cleaning up compaction leftovers, and the compaction leftover code
> was rewritten in 3.0 / CASSANDRA-7066 (note, though, that it's somewhat
> incomplete in the upgrade case , so CASSANDRA-13313 may be interesting to
> people who are very very very very very very very sensitive to data
> consistency)
>
>
> On Thu, Mar 23, 2017 at 2:00 PM, Rajath Subramanyam 
> wrote:
>
> > Hello Cassandra-Users and Cassandra-dev,
> >
> > One of the handy features in sstablemetadata that was part of Cassandra
> > 2.1.15 was that it displayed Ancestor information of an SSTable. Here is
> a
> > sample output of the sstablemetadata tool with the ancestors information
> in
> > C* 2.1.15:
> > [centos@chen-datos test1-b83746000fef11e7bdfc8bb2d6662df7]$
> > sstablemetadata
> > ks3-test1-ka-2-Statistics.db | grep "Ancestors"
> > Ancestors: [1]
> > [centos@chen-datos test1-b83746000fef11e7bdfc8bb2d6662df7]$
> >
> > However, the same tool in Cassandra 3.0.x no longer gives us that
> > information. Here is a sample output of the sstablemetadata grepping for
> > Ancestors information in C* 3.0 (the output is empty since it is no
> longer
> > available):
> > [centos@rj-cassandra-1 elsevier1-ab7389f0fafb11e6ac23e7ccf62f494b]$
> > sstablemetadata mc-5-big-Statistics.db | grep "Ancestors"
> > [centos@rj-cassandra-1 elsevier1-ab7389f0fafb11e6ac23e7ccf62f494b]$
> >
> > My question, how can I get this information in C* 3.0.x ?
> >
> > Thank you !
> >
> > Regards,
> > Rajath
> >
> > 
> > Rajath Subramanyam
> >
>


Re: SSTable Ancestors information in Cassandra 3.0.x

2017-03-23 Thread Jeff Jirsa
That information was removed, because it was really meant to be used for a
handful of internal tasks, most of which were no longer used. The remaining
use was cleaning up compaction leftovers, and the compaction leftover code
was rewritten in 3.0 / CASSANDRA-7066 (note, though, that it's somewhat
incomplete in the upgrade case, so CASSANDRA-13313 may be interesting to
people who are very very very very very very very sensitive to data
consistency)


On Thu, Mar 23, 2017 at 2:00 PM, Rajath Subramanyam wrote:

> Hello Cassandra-Users and Cassandra-dev,
>
> One of the handy features in sstablemetadata that was part of Cassandra
> 2.1.15 was that it displayed Ancestor information of an SSTable. Here is a
> sample output of the sstablemetadata tool with the ancestors information in
> C* 2.1.15:
> [centos@chen-datos test1-b83746000fef11e7bdfc8bb2d6662df7]$
> sstablemetadata
> ks3-test1-ka-2-Statistics.db | grep "Ancestors"
> Ancestors: [1]
> [centos@chen-datos test1-b83746000fef11e7bdfc8bb2d6662df7]$
>
> However, the same tool in Cassandra 3.0.x no longer gives us that
> information. Here is a sample output of the sstablemetadata grepping for
> Ancestors information in C* 3.0 (the output is empty since it is no longer
> available):
> [centos@rj-cassandra-1 elsevier1-ab7389f0fafb11e6ac23e7ccf62f494b]$
> sstablemetadata mc-5-big-Statistics.db | grep "Ancestors"
> [centos@rj-cassandra-1 elsevier1-ab7389f0fafb11e6ac23e7ccf62f494b]$
>
> My question, how can I get this information in C* 3.0.x ?
>
> Thank you !
>
> Regards,
> Rajath
>
> 
> Rajath Subramanyam
>


SSTable Ancestors information in Cassandra 3.0.x

2017-03-23 Thread Rajath Subramanyam
Hello Cassandra-Users and Cassandra-dev,

One of the handy features in sstablemetadata that was part of Cassandra
2.1.15 was that it displayed Ancestor information of an SSTable. Here is a
sample output of the sstablemetadata tool with the ancestors information in
C* 2.1.15:
[centos@chen-datos test1-b83746000fef11e7bdfc8bb2d6662df7]$ sstablemetadata
ks3-test1-ka-2-Statistics.db | grep "Ancestors"
Ancestors: [1]
[centos@chen-datos test1-b83746000fef11e7bdfc8bb2d6662df7]$

However, the same tool in Cassandra 3.0.x no longer gives us that
information. Here is a sample output of the sstablemetadata grepping for
Ancestors information in C* 3.0 (the output is empty since it is no longer
available):
[centos@rj-cassandra-1 elsevier1-ab7389f0fafb11e6ac23e7ccf62f494b]$
sstablemetadata mc-5-big-Statistics.db | grep "Ancestors"
[centos@rj-cassandra-1 elsevier1-ab7389f0fafb11e6ac23e7ccf62f494b]$

My question: how can I get this information in C* 3.0.x?

Thank you!

Regards,
Rajath


Rajath Subramanyam


[ANNOUNCE] Apache Gora 0.7 Release

2017-03-23 Thread lewis john mcgibbney
Hi Folks,

The Apache Gora team are pleased to announce the immediate availability of
Apache Gora 0.7.
The Apache Gora open source framework provides an in-memory data model and
persistence for big data. Gora supports persisting to column stores, key
value stores, document stores and RDBMSs, and analyzing the data with
extensive Apache Hadoop™ MapReduce support.

The Gora DOAP can be found at http://gora.apache.org/current/doap_Gora.rdf

This release addresses 80 issues; for a breakdown, please see the release
report. Drop by our mailing lists with questions on any of the above.

Gora 0.7 provides support for the following projects:

   - Apache Avro 1.8.1
   - Apache Hadoop 2.5.2
   - Apache HBase 1.2.3
   - Apache Cassandra 2.0.2
   - Apache Solr 5.5.1
   - MongoDB (driver) 3.4.2
   - Apache Accumulo 1.7.1
   - Apache Spark 1.4.1
   - Apache CouchDB 1.4.2 (test containers 1.1.0)
   - Amazon DynamoDB (driver) 1.10.55
   - Infinispan 7.2.5.Final
   - JCache 1.0.0 with Hazelcast 3.6.4 support

Gora is released as source code, available from our downloads page, and as
Maven artifacts available on Maven Central.
Thanks


-- 
http://home.apache.org/~lewismc/
@hectorMcSpector
http://www.linkedin.com/in/lmcgibbney


Using datastax driver, how can I read a non-primitive column as a JSON string?

2017-03-23 Thread S G
Hi,

I have several non-primitive columns in my Cassandra tables.
Some of them are user-defined types (UDTs).

While querying them through the DataStax driver, I want to convert such UDTs
into JSON values.
More specifically, I want to get a JSON string for the value object below:

Row row = itr.next();
ColumnDefinitions cds = row.getColumnDefinitions();
cds.asList().forEach((ColumnDefinitions.Definition cd) -> {
    String name = cd.getName();
    Object value = row.getObject(name);
    // want a JSON string for this value
});

I have gone through
http://docs.datastax.com/en/developer/java-driver/3.1/manual/custom_codecs/
but I do not want to add a codec for every UDT I have.

Can the driver somehow return JSON directly, without explicit meddling with
codecs?
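One option that sidesteps codecs entirely is CQL's server-side JSON support (available since Cassandra 2.2): with SELECT JSON, the server renders each row as a single text column named "[json]", which the Java driver returns via row.getString("[json]"). A sketch (keyspace/table names are illustrative):

```sql
-- The server serializes each row, UDTs included, to one JSON string.
SELECT JSON * FROM my_keyspace.my_table LIMIT 10;
```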


Thanks

SG


Re: [Cassandra 3.0.9] Cannot allocate memory

2017-03-23 Thread Thakrar, Jayesh
Dmesg will often print a message saying that it had to kill a process if the 
server was short of memory, so you will have to dump the output to a file and 
check.
If a process is killed to reclaim memory for the system, then it will dump a 
list of all processes and the actual process that was killed.
So maybe, you can check for a kill like this - "dmesg | grep -i kill"
If you do find a line (or two), then you need to examine the output carefully.
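As a side note, on Linux the OOM killer typically logs lines containing "Out of memory" and "Killed process", so a slightly broader pattern than "kill" can also be used (a sketch; reading dmesg may require root on some systems):

```shell
# Search kernel messages for OOM-killer activity; note if nothing was found.
dmesg 2>/dev/null | grep -i -E 'out of memory|killed process' \
    || echo "no OOM-killer activity logged"
```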

In production, I tend to dump a lot of GC output also which helps 
troubleshooting.
E.g. Below is what I have.
If you look, I also have a flag that says that if the heap runs out of memory 
(which is rare), then dump files.
If dmesg does not show your processes being killed, then you may have to dump 
gc logging info to get some insight.

-XX:+UseThreadPriorities
-XX:ThreadPriorityPolicy=42
-Xms16G
-Xmx16G
-Xmn4800M
-XX:+HeapDumpOnOutOfMemoryError
-Xss256k
-XX:+UseConcMarkSweepGC
-XX:+CMSParallelRemarkEnabled
-XX:+ParallelRefProcEnabled
-XX:+CMSClassUnloadingEnabled
-XX:CMSInitiatingOccupancyFraction=80
-XX:+UseCMSInitiatingOccupancyOnly
-XX:+UseParNewGC
-XX:MaxTenuringThreshold=2
-XX:SurvivorRatio=8
-XX:+UnlockDiagnosticVMOptions
-XX:ParGCCardsPerStrideChunk=32768
-XX:NewSize=750m
-XX:MaxNewSize=750m
-XX:+UseCondCardMark
-XX:+PrintGCDetails
-XX:+PrintGCDateStamps
-XX:+PrintGCTimeStamps
-XX:+PrintHeapAtGC
-XX:+PrintTenuringDistribution
-XX:+PrintGCApplicationStoppedTime
-XX:+PrintPromotionFailure
-Xloggc:
-XX:+UseGCLogFileRotation
-XX:NumberOfGCLogFiles=10
-XX:GCLogFileSize=1M
-Djava.net.preferIPv4Stack=true
-Dcom.sun.management.jmxremote.port=7199
-Dcom.sun.management.jmxremote.ssl=
-Dcom.sun.management.jmxremote.authenticate=




From: Abhishek Kumar Maheshwari 
Date: Wednesday, March 22, 2017 at 5:18 PM
To: "user@cassandra.apache.org" 
Subject: RE: [Cassandra 3.0.9] Cannot allocate memory

JVM config is as below:

-Xms16G
-Xmx16G
-Xmn3000M

What do I need to check in dmesg?

From: Thakrar, Jayesh [mailto:jthak...@conversantmedia.com]
Sent: 23 March 2017 03:39
To: Abhishek Kumar Maheshwari ; 
user@cassandra.apache.org
Subject: RE: [Cassandra 3.0.9] Cannot allocate memory


And what is the configured max heap?
Sometimes you may also be able to see some useful messages in "dmesg" output.

Jayesh


From: Abhishek Kumar Maheshwari
Sent: Wednesday, March 22, 2017 5:05:14 PM
To: Thakrar, Jayesh; user@cassandra.apache.org
Subject: RE: [Cassandra 3.0.9] Cannot allocate memory

No, only Cassandra is running on these servers.

From: Thakrar, Jayesh [mailto:jthak...@conversantmedia.com]
Sent: 22 March 2017 22:27
To: Abhishek Kumar Maheshwari; user@cassandra.apache.org
Subject: Re: [Cassandra 3.0.9] Cannot allocate memory

Is/are the Cassandra server(s) shared?
E.g. do they run mesos + spark?

From: Abhishek Kumar Maheshwari
Date: Wednesday, March 22, 2017 at 12:45 AM
To: "user@cassandra.apache.org"
Subject: [Cassandra 3.0.9] Cannot allocate memory

Hi all,

I am using Cassandra 3.0.9. While adding a new server, after some time I get
the exception below. The JVM options are attached.
Hardware info:
RAM: 64 GB
Cores: 40


Java HotSpot(TM) 64-Bit Server VM warning: INFO: 
os::commit_memory(0x7fe9c44ee000, 12288, 0) failed; error='Cannot allocate 
memory' (errno=12)
#
# There is insufficient memory for the Java Runtime Environment to continue.
# Native memory allocation (mmap) failed to map 12288 bytes for committing 
reserved memory.
Java HotSpot(TM) 64-Bit Server VM warning: INFO: 
os::commit_memory(0x7f5c056ab000, 12288, 0) failed; error='Cannot allocate 
memory' (errno=12)
[thread 140033204860672 also had an error]
Java HotSpot(TM) 64-Bit Server VM warning: INFO: 
os::commit_memory(0x7f5c0566a000, 12288, 0) failed; error='Cannot allocate 
memory' (errno=12)
[thread 140033204594432 also had an error]Java HotSpot(TM) 64-Bit Server VM 
warning:
INFO: os::commit_memory(0x7fe9c420c000, 12288, 0) failed; error='Cannot 
allocate memory' (errno=12)
Java HotSpot(TM) 64-Bit Server VM warning: [thread 140641994852096 also had an 
error]INFO: os::commit_memory(0x7f5c055a7000, 12288, 0) failed; 
error='Cannot allocate memory' (errno=12)

Please let me know what I am missing.

Thanks & Regards,
Abhishek Kumar Maheshwari
+91- 805591 (Mobile)
Times Internet Ltd. | A Times of India Group Company
FC - 6, Sector 16A, Film City,  Noida,  U.P. 201301 | INDIA
Please do not print this email unless it is absolutely necessary.

Re: ONE has much higher latency than LOCAL_ONE

2017-03-23 Thread Shannon Carey
Thanks for the link, I hadn't seen that before.

It's unfortunate that they don't explain what they mean by "closest replica". 
The nodes in the remote DC should not be regarded as "closest". Also, it's not 
clear what the black arrows mean… the coordinator sends the read to all three 
replicas, but only one of them responds?

Reading further (assuming this article from 2012 is still accurate for 
Cassandra 3.0 
http://www.datastax.com/dev/blog/dynamic-snitching-in-cassandra-past-present-and-future),
 it appears that by "closest replica" what they really mean is the replica 
chosen by the "dynamic snitch". The structure of the documentation 
https://docs.datastax.com/en/cassandra/3.0/cassandra/architecture/archSnitchesAbout.html
 is misleading in this regard: it puts the "Dynamic snitching" section side by 
side with the other snitch implementations, implying that it's one of the 
choices you can configure as a snitch, which is why I hadn't read that section 
(I didn't want a snitch that "monitors the performance of reads"). Instead, the 
info about the dynamic snitch should be in the top-level page. In any case, the 
dynamic snitch is apparently governed by read latency, the node's state, and 
whether the node is doing a compaction ("severity"). So why is it routing 
requests to nodes with latency that's ~20 times larger? I don't know, but I 
wish it wasn't.

I guess it's important to differentiate between that and the load balancing 
policy called LatencyAwarePolicy… even if you're not using the 
LatencyAwarePolicy, internally the snitch is still doing stuff based on latency.

This is also unfortunate because it makes the DCAwareRoundRobinPolicy useless 
for anything but local consistency levels, and (if you read it at face value) 
contradicts the description in the documentation that "This policy queries 
nodes of the local data-center in a round-robin fashion; optionally, it can 
also try a configurable number of hosts in remote data centers if all local 
hosts failed."

Also, if you're right that the requests are getting routed to the remote DC, 
then those requests aren't showing up in my graph of read request rate… which 
is problematic because I'm not getting an accurate representation of what's 
actually happening. I can't find any other metric beyond 
org.apache.cassandra.metrics.ClientRequest.* which might include these internal 
read requests.

I am wondering if perhaps there's a mistake with the way that the dynamic 
snitch measures latency… if it's only measuring requests coming from clients, 
then if a remote node happens to win the dynamic snitch's favor momentarily, 
the latency of the local node will increase (because it's querying the remote 
node), and then the dynamic snitch will see that the local node is performing 
poorly, and will continue directing traffic to the remote cluster.  Or, perhaps 
they're measuring the latency of each node based not on how long the client 
request takes but based on how long the internal request takes… which, again, 
could mislead the snitch into thinking that the remote host is providing a 
better deal to the client than it really is. It seems like a mistake that the 
dynamic snitch would think that a remote node will be faster or less work than 
the local node which actually contains a copy of the data being queried.

Looks like I'm not the only one who's run into this: 
https://issues.apache.org/jira/browse/CASSANDRA-6908

I think I'm going to try setting the system property 
"cassandra.ignore_dynamic_snitch_severity" to "true" and see what happens. That 
or "dynamic_snitch: false"… it's not documented in 
https://docs.datastax.com/en/cassandra/3.0/cassandra/configuration/configCassandra_yaml.html
 but it appears to be a valid config option.




From: Eric Plowe
Reply-To: "user@cassandra.apache.org"
Date: Wednesday, March 22, 2017 at 11:44 AM
To: "user@cassandra.apache.org"
Subject: Re: ONE has much higher latency than LOCAL_ONE

Yes, your request from the client is going to the LocalDC that you've defined 
for the data center aware load balancing policy, but with a consistency level 
of ONE, there is a chance for the coordinator (the node your client has 
connected to) to route the request across DC's.

Please see: 
https://docs.datastax.com/en/cassandra/2.1/cassandra/dml/dmlClientRequestsRead.html#dmlClientRequestsRead__two-dc-one

"A two datacenter cluster with a consistency level of ONE
"In a multiple datacenter cluster with a replication factor of 3, and a read 
consistency of ONE, the closest replica for the given row, regardless of 
datacenter, is contacted to fulfill the read request. In the background a read 
repair is potentially initiated, based on the read_repair_chance setting of the 
table,