Re: Consistency Level vs. Retry Policy when no local nodes are available

2017-03-21 Thread Shannon Carey
Thanks for the perspective Ben, it's food for thought.

At minimum, it seems like the documentation should be updated to mention that 
the retry policy will not be consulted when using a local consistency level 
with no local nodes available. That way, people won't be surprised by it. It 
looks like the docs are included in the GitHub repo, so I guess I'll try to 
contribute an update there.
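For reference, here is a minimal sketch (assuming the DataStax Java driver 3.x 
RetryPolicy interface) of the kind of downgrading fallback discussed in this 
thread: retry a LOCAL_* consistency level once at its non-local equivalent. The 
class name and fallback mapping are illustrative only, not a tested 
implementation; and, as described below, the driver may throw 
NoHostAvailableException without ever consulting such a policy when no local 
hosts are available.

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ConsistencyLevel;
import com.datastax.driver.core.Statement;
import com.datastax.driver.core.WriteType;
import com.datastax.driver.core.exceptions.DriverException;
import com.datastax.driver.core.policies.RetryPolicy;

// Illustrative only: downgrade a LOCAL_* level to its non-local equivalent
// on the first retry, then give up. Not a recommended production policy.
public class NonLocalFallbackRetryPolicy implements RetryPolicy {

    private RetryDecision fallback(ConsistencyLevel cl, int nbRetry) {
        if (nbRetry != 0) return RetryDecision.rethrow();
        if (cl == ConsistencyLevel.LOCAL_QUORUM) return RetryDecision.retry(ConsistencyLevel.QUORUM);
        if (cl == ConsistencyLevel.LOCAL_ONE) return RetryDecision.retry(ConsistencyLevel.ONE);
        return RetryDecision.rethrow();
    }

    @Override
    public RetryDecision onReadTimeout(Statement stmt, ConsistencyLevel cl,
            int required, int received, boolean dataRetrieved, int nbRetry) {
        return fallback(cl, nbRetry);
    }

    @Override
    public RetryDecision onWriteTimeout(Statement stmt, ConsistencyLevel cl,
            WriteType writeType, int required, int received, int nbRetry) {
        return fallback(cl, nbRetry);
    }

    @Override
    public RetryDecision onUnavailable(Statement stmt, ConsistencyLevel cl,
            int required, int alive, int nbRetry) {
        return fallback(cl, nbRetry);
    }

    @Override
    public RetryDecision onRequestError(Statement stmt, ConsistencyLevel cl,
            DriverException e, int nbRetry) {
        return RetryDecision.tryNextHost(cl);
    }

    @Override public void init(Cluster cluster) {}
    @Override public void close() {}
}

A policy like this can be attached per query via Statement.setRetryPolicy(...) 
or cluster-wide via Cluster.Builder.withRetryPolicy(...); note that it is only 
consulted once a host has actually been tried, which is exactly the gap this 
thread is about.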


From: Ben Slater <ben.sla...@instaclustr.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Monday, March 20, 2017 at 6:25 PM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Re: Consistency Level vs. Retry Policy when no local nodes are 
available

I think the general assumption is that DC failover happens at the client app 
level rather than the Cassandra level, due to the potentially very significant 
difference in request latency if you move from an app-local DC to a remote DC. 
The preferred pattern for most people is that the app fails in a failed DC and 
some load balancer above the app redirects traffic to a different DC.

The other factor is that the fail-back scenario from a failed DC with LOCAL_* 
consistencies is potentially complex. Do you want to immediately start using 
the recovered DC when it becomes available (with missing data), or wait until 
it catches up on writes (and how do you know when that has happened)?

Note also that QUORUM is a clear majority of replicas across all DCs. Some 
people run 3 DCs with RF 3 in each and QUORUM to maintain strong consistency 
across DCs even with a DC failure: QUORUM over 9 replicas needs 5 acks, which 
is still achievable when one DC's 3 replicas are down.

Cheers
Ben

On Tue, 21 Mar 2017 at 10:00 Shannon Carey <sca...@expedia.com> wrote:
Specifically, this puts us in an awkward position: LOCAL_QUORUM is desirable 
so that we don't have unnecessary cross-DC traffic from the client by default, 
but we can't use it because it causes complete failure if the local DC goes 
down. And we can't use QUORUM because it fails whenever a majority of all 
replicas isn't reachable (as happens with two DCs when one DC goes down). So 
it seems like we are forced to use a lesser consistency level such as ONE or TWO.

-Shannon

From: Shannon Carey <sca...@expedia.com>
Date: Monday, March 20, 2017 at 5:25 PM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Consistency Level vs. Retry Policy when no local nodes are available

I am running DSE 5.0, and I have a Java client using the Datastax 3.0.0 client 
library.

The client is configured to use a DCAwareRoundRobinPolicy wrapped in a 
TokenAwarePolicy. Nothing special.

When I run my query, I set a custom retry policy.

I am testing cross-DC failover. I have disabled connectivity to the "local" DC 
(relative to my client) in order to perform the test. When I run a query with 
the first consistency level set to LOCAL_ONE (or local anything), my retry 
policy is never called and I always get this exception:
"com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) 
tried for query failed (no host was tried)"

getErrors() on the exception is empty.

This is contrary to my expectation that the first attempt would fail and allow 
my RetryPolicy to attempt a different (non-LOCAL) consistency level. As it 
stands, I have no choice but to avoid using any kind of LOCAL consistency level 
throughout my applications. Is this expected? Or is there anything I can do 
about it? Thanks! It certainly seems like a bug to me, or at least something 
that should be improved.

-Shannon
--

Ben Slater
Chief Product Officer


Re: question on maximum disk seeks

2017-03-21 Thread preetika tyagi
Thank you Jan & Jeff for the responses. That was really useful.

Jan - I have one follow-up question. When the data is spread over more than
one SSTable in the case of updates, as you mentioned, we need two seeks per
SSTable (one for the partition index and another for the SSTable itself). I'm
curious to know how the partition index is structured internally. I was
assuming it to be a table with <partition key, disk offset> pairs. If the same
key is updated several times, how is that recorded in the partition index?

Thanks,
Preetika

On Mon, Mar 20, 2017 at 10:37 PM, Jan wrote:

> Hi,
>
> You're right – one seek with a hit in the partition key cache and two if not.
>
> That's the theory – but two things to mention:
>
> First, you need two seeks per sstable, not per entire read. So if your data
> is spread over multiple sstables on disk you obviously need more than two
> reads. Think of often-updated partition keys – in combination with memory
> pressure you can easily end up with many sstables (ok, they will be
> compacted some time in the future).
>
> Second, there could be fragmentation on disk, which leads to seeks during
> sequential reads.
>
> Jan
>
> Sent from my Windows 10 Phone
>
> *From: *preetika tyagi 
> *Sent: *Monday, 20 March 2017 21:18
> *To: *user@cassandra.apache.org
> *Subject: *question on maximum disk seeks
>
> I'm trying to understand the maximum number of disk seeks required in a
> read operation in Cassandra. I looked at several online articles, including
> this one:
> https://docs.datastax.com/en/cassandra/3.0/cassandra/dml/dmlAboutReads.html
>
> As per my understanding, two disk seeks are required in the worst case.
> One is for reading the partition index and another is to read the actual
> data from the compressed partition. The offset of the data in a compressed
> partition is obtained from the compression offset table (which is stored
> in memory). Am I on the right track here? Will there ever be a case when
> more than two disk seeks are required to read the data?
>
> Thanks,
> Preetika


Re: question on maximum disk seeks

2017-03-21 Thread Jonathan Haddad
The partition index is never updated, as sstables are immutable.



Re: question on maximum disk seeks

2017-03-21 Thread preetika tyagi
Yes, I understand that. However, what I'm trying to understand is the
internal structure of the partition index. When a record associated with the
same partition key is updated, we have two different records with different
timestamps. There is a chance of these two records being split across two
different SSTables (as long as compaction has not merged them into one
SSTable yet). What does the partition index look like in such a case?
For the same key, we have two different records in different SSTables. How
does the partition index store such information? Can it have repeated
partition keys with different disk offsets pointing to different SSTables?



Re: question on maximum disk seeks

2017-03-21 Thread Jonathan Haddad
Each sstable has its own partition index, therefore the index is never updated.
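To make that concrete, here is a toy sketch (not Cassandra's actual data
structures, just an illustration under that assumption) of why immutable
per-SSTable indexes answer the question above: the same partition key can
appear in several SSTables' indexes, each holding an offset into that
SSTable's own data file, and a read consults every candidate index and merges
the fragments.

import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model: every flush writes a new immutable sstable with its own
// partition index, so no index is ever modified after it is written.
class ToySSTable {
    final Map<String, Long> partitionIndex; // partition key -> offset in this sstable's data file

    ToySSTable(Map<String, Long> index) {
        this.partitionIndex = Collections.unmodifiableMap(new HashMap<>(index));
    }
}

class ToyReadPath {
    // A read consults every candidate sstable's index; each hit costs up to
    // two seeks (index seek + data-file seek), which is why a frequently
    // updated key spread over N sstables can cost up to 2*N seeks.
    static List<Long> locate(String key, List<ToySSTable> sstables) {
        List<Long> offsets = new ArrayList<>();
        for (ToySSTable t : sstables) {
            Long off = t.partitionIndex.get(key);
            if (off != null) {
                offsets.add(off); // the row fragments are merged by timestamp afterwards
            }
        }
        return offsets;
    }
}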



Re: question on maximum disk seeks

2017-03-21 Thread preetika tyagi
Oh I see. I understand it now. Thank you for the clarification!

Preetika



How to add a node with zero downtime

2017-03-21 Thread Cogumelos Maravilha
Hi list,

I'm using C* 3.10;

authenticator: PasswordAuthenticator and authorizer: CassandraAuthorizer

When adding a node, before nodetool repair system_auth has finished, all
my clients die with:

cassandra.cluster.NoHostAvailable: ('Unable to connect to any servers',
{'10.100.100.19': AuthenticationFailed('Failed to authenticate to ...

Thanks in advance.



Re: How to add a node with zero downtime

2017-03-21 Thread daemeon reiydelle
Possible areas to check:
- too few nodes (node overload): you did not indicate replication factor or
number of nodes; assuming the nodes are *rather* full.
- network overload (check your top-of-rack switches' errors, and also the TCP
stats on the relevant nodes)
- look for stop-the-world garbage collection on multiple nodes.


Daemeon C.M. Reiydelle
USA (+1) 415.501.0198
London (+44) (0) 20 8144 9872



Re: Issue with Cassandra consistency in results

2017-03-21 Thread srinivasarao daruna
The same issue is appearing in CQL Shell as well.

1) Entered cqlsh
2) SET CONSISTENCY QUORUM;
3) Ran a SELECT * with the partition key in the WHERE clause.

The first query returned 0 records,
and subsequent queries returned results.

It's really freaking us out at the moment. And there is nothing in debug.log
or system.log.

Thank You,
Regards,
Srini

On Fri, Mar 17, 2017 at 2:33 AM, daemeon reiydelle 
wrote:

> The prepared statement is needed. If I recall correctly, it must remain in
> the cache for the query to complete. I don't have the docs at hand to dig
> out the yaml param that adjusts the query cache. I had run into this problem
> stress testing a smallish cluster with many queries at once.
>
> Do you have a sense of how many distinct queries are hitting the cluster
> at peak?
>
> If many clients, how do you balance the connection load or do you always
> hit the same node?
>
>
> sent from my mobile
> Daemeon Reiydelle
> skype daemeon.c.m.reiydelle
> USA 415.501.0198
>
> On Mar 16, 2017 3:25 PM, "srinivasarao daruna" 
> wrote:
>
>> Hi reiydelle,
>>
>> I cannot confirm the range, as the volume of data is huge and the query
>> frequency is also high.
>> If the cache is the cause of the issue, can we increase the cache size,
>> or is there a solution to avoid dropped prepared statements?
>>
>>
>>
>>
>>
>>
>> Thank You,
>> Regards,
>> Srini
>>
>> On Thu, Mar 16, 2017 at 2:13 PM, daemeon reiydelle 
>> wrote:
>>
>>> The discard due to oom is causing the zero returned. I would guess a
>>> cache miss problem of some sort, but not sure. Are you using row, index,
>>> etc. caches? Are you seeing the failed prep statement on random nodes (duh,
>>> nodes that have the relevant data ranges)?
>>>
>>>
>>> Daemeon C.M. Reiydelle
>>> USA (+1) 415.501.0198
>>> London (+44) (0) 20 8144 9872
>>>
>>> On Thu, Mar 16, 2017 at 10:56 AM, Ryan Svihla  wrote:
>>>
 Depends actually: restore just restores what's there, so if only one
 node had a copy of the data before the backup, then only one node has a
 copy after the restore, meaning quorum will still be wrong sometimes.

 On Thu, Mar 16, 2017 at 1:53 PM, Arvydas Jonusonis <
 arvydas.jonuso...@gmail.com> wrote:

> If the data was written at ONE, consistency is not guaranteed. But
> considering you just restored the cluster, there's a good chance something
> else is off.
>
> On Thu, Mar 16, 2017 at 18:19 srinivasarao daruna <
> sree.srin...@gmail.com> wrote:
>
>> Want to make read and write QUORUM as well.
>>
>>
>> On Mar 16, 2017 1:09 PM, "Ryan Svihla"  wrote:
>>
>> Replication factor is 3, and write consistency is ONE and
>> read consistency is QUORUM.
>>
>> That combination is not gonna work well:
>>
>> *Write succeeds on node A but fails on nodes B, C*
>>
>> *Read goes to nodes B, C*
>>
>> (With RF 3, QUORUM is 2; a write at ONE plus a read at QUORUM gives
>> 1 + 2 = 3, which does not guarantee the read overlaps the replica that
>> took the write, whereas QUORUM writes and reads give 2 + 2 > 3 and
>> always overlap.)
>>
>> If you can tolerate some temporary inaccuracy you can use QUORUM, but you
>> may still have the situation where:
>>
>> Write succeeds on node A at timestamp 1, and on node B at timestamp 2
>> Read succeeds on nodes B and C at timestamp 1
>>
>> If you need fully race-condition-free counts I'm afraid you need to
>> use SERIAL or LOCAL_SERIAL (for in-DC-only accuracy)
>>
>> On Thu, Mar 16, 2017 at 1:04 PM, srinivasarao daruna <
>> sree.srin...@gmail.com> wrote:
>>
>> Replication strategy is SimpleStrategy.
>>
>> Snitch is: EC2 snitch, as we deployed the cluster on EC2 instances.
>>
>> I was worried that CL=ALL has more read latency and read failures.
>> But I won't rule out trying it.
>>
>> Should I switch select count(*) to selecting the partition_key column?
>> Would that be of any help?
>>
>>
>> Thank you
>> Regards
>> Srini
>>
>> On Mar 16, 2017 12:46 PM, "Arvydas Jonusonis" <
>> arvydas.jonuso...@gmail.com> wrote:
>>
>> What are your replication strategy and snitch settings?
>>
>> Have you tried doing a read at CL=ALL? If it's an actual
>> inconsistency issue (missing data), this should cause the correct results
>> to be returned. You'll need to run a repair to fix the inconsistencies.
>>
>> If all the data is actually there, you might have one or several
>> nodes that aren't identifying the correct replicas.
>>
>> Arvydas
>>
>>
>>
>> On Thu, Mar 16, 2017 at 5:31 PM, srinivasarao daruna <
>> sree.srin...@gmail.com> wrote:
>>
>> Hi Team,
>>
>> We are struggling with a problem related to Cassandra counts after backup
>> and restore of the cluster. Aaron Morton has suggested sending this to the
>> user list, so someone on the list will be able to help me.
>>
>> We have a REST API to talk to Cassandra, and one of our queries, which
>> fetches a count, is creating problems for us.
>>
>> We have done backup and restore and copied all the data to the new
>> cluster. We have done nodetool refresh on

ONE has much higher latency than LOCAL_ONE

2017-03-21 Thread Shannon Carey
I am seeing unexpected behavior: consistency level ONE increases the 99th 
percentile read latency to ~108ms (and the 95th percentile to 5ms-90ms), up 
from ~5ms at the 99th percentile when using LOCAL_ONE.

I am using DSE 5.0 with the DataStax client 3.0.0. The client is configured 
with a TokenAwarePolicy wrapping a DCAwareRoundRobinPolicy with 
usedHostsPerRemoteDc set to a very high number. The Cassandra cluster has two 
datacenters.

I would expect that when the cluster is operating normally (all local nodes 
reachable), ONE would behave the same as LOCAL_ONE. Does anyone know why this 
is not the case?


Re: ONE has much higher latency than LOCAL_ONE

2017-03-21 Thread Matija Gobec
Are you running a multi-DC cluster? If yes, do you have the application in
both data centers/regions?



Re: spikes in blocked native transport requests

2017-03-21 Thread Nate McCall
See the details on: https://issues.apache.org/jira/browse/CASSANDRA-11363

You may need to add -Dcassandra.max_queued_native_transport_requests=4096
as a startup parameter. YMMV though, I suggest reading through the above to
get a complete picture.

On Mon, Mar 20, 2017 at 11:10 PM, Roland Otta 
wrote:
>
> Well, I checked it now.
>
> We have some STW collections from 100 to 200ms every 5 to 60 seconds.
> I am not sure whether the blocked threads are related to that, but either
> way these pauses are too long for low-latency applications.
>
> So I will check GC tuning first, and will check afterwards whether the
> blocked threads still exist.
>
>
>
> On Mon, 2017-03-20 at 08:55 +0100, benjamin roth wrote:
>
> Did you check STW GCs?
> You can do that with 'nodetool gcstats', by looking at the gc.log, or by
> observing GC-related JMX metrics.
>
2017-03-20 8:52 GMT+01:00 Roland Otta :
>
> We have a datacenter which is currently used exclusively for Spark batch
> jobs.
>
> When batch jobs are running against that environment, we can see very
> high peaks in blocked native transport requests (up to 10k / minute).
>
> I am concerned because I guess that will slow other queries (in case
> other applications are going to use that DC as well).
>
> I already tried increasing native_transport_max_threads +
> concurrent_reads, without success.
>
> During the jobs I can't find any resource limitations on my hardware
> (iops, disk usage, cpu, ... all fine).
>
> Am I missing something? Any suggestions on how to cope with that?
>
> br//
> roland
>
>
>
>
>



--
-
Nate McCall
Wellington, NZ
@zznate

CTO
Apache Cassandra Consulting
http://www.thelastpickle.com


Re: ONE has much higher latency than LOCAL_ONE

2017-03-21 Thread Shannon Carey
The cluster is in two DCs, and yes the client is deployed locally to each DC.




Re: ONE has much higher latency than LOCAL_ONE

2017-03-21 Thread Nate McCall
On Wed, Mar 22, 2017 at 12:48 PM, Shannon Carey  wrote:
>
> The cluster is in two DCs, and yes the client is deployed locally to each
> DC.

First off, what is the goal of using ONE instead of LOCAL_ONE? If it's
failover, this could be addressed with a RetryPolicy starting with LOCAL_ONE
and falling back to ONE.

Are you using the ".withLocalDc" option in the DCAwareRoundRobinPolicy
builder? (It's been a while since I've gone through this in detail,
though). If you could provide a snippet that included the complete options
passed to the builder that might be helpful.
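For illustration, a sketch of roughly what that builder configuration could
look like with the Java driver 3.0; the contact point and DC name below are
placeholders, not values taken from this thread:

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.policies.DCAwareRoundRobinPolicy;
import com.datastax.driver.core.policies.TokenAwarePolicy;

public class ClusterSetup {
    public static void main(String[] args) {
        Cluster cluster = Cluster.builder()
                .addContactPoint("10.0.0.1")                 // placeholder contact point
                .withLoadBalancingPolicy(new TokenAwarePolicy(
                        DCAwareRoundRobinPolicy.builder()
                                .withLocalDc("us-east")      // pin the local DC explicitly
                                .withUsedHostsPerRemoteDc(3) // remote hosts used for non-LOCAL CLs
                                .allowRemoteDCsForLocalConsistencyLevel() // opt-in, off by default
                                .build()))
                .build();
        System.out.println(cluster.getMetadata().getClusterName());
        cluster.close();
    }
}

Pinning the DC with withLocalDc avoids the driver inferring the local DC from
the contact points, which is one common source of the mismatch warnings
quoted below.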

Also, check for the complete forms of these two logging messages on the app
side during startup (the second one is at INFO so adjust if needed):
"Some contact points don't match local data center. Local DC = {}.
Non-conforming contact points: {}"
"Using data-center name '{}' for DCAwareRoundRobinPolicy..."

Make sure those line up with the cluster topology and your expectations.

Actually, in typing that up, it may be more appropriate to move the
conversation over here since this is probably driver specific:
https://groups.google.com/a/lists.datastax.com/forum/#!forum/java-driver-user



--
-
Nate McCall
Wellington, NZ
@zznate

CTO
Apache Cassandra Consulting
http://www.thelastpickle.com


Re: ONE has much higher latency than LOCAL_ONE

2017-03-21 Thread Nate McCall
>
Just read your previous thread about this. That's pretty un-intuitive and
counter to the way I remember that working (though admittedly, it's been a
while).

Do please open a thread on the driver mailing list, i'm curious about the
response.


Re: ONE has much higher latency than LOCAL_ONE

2017-03-21 Thread Eric Plowe
ONE requires at least one replica node to ack the write, but it doesn't
require that the coordinator route the request to a node in the local data
center.

LOCAL_ONE was introduced to handle the case where you have multiple data
centers and cross-data-center traffic is not desirable:

"In multiple datacenter clusters, a consistency level of ONE is often
desirable, but cross-DC traffic is not. LOCAL_ONE accomplishes this. For
security and quality reasons, you can use this consistency level in an
offline datacenter to prevent automatic connection to online nodes in other
datacenters if an offline node goes down."

From:
https://docs.datastax.com/en/cassandra/2.1/cassandra/dml/dml_config_consistency_c.html
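For completeness, setting the level per statement with the Java driver looks
roughly like this (a sketch only; the contact point and the ks.events table
are made up):

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ConsistencyLevel;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.SimpleStatement;
import com.datastax.driver.core.Statement;

public class LocalOneExample {
    public static void main(String[] args) {
        Cluster cluster = Cluster.builder().addContactPoint("10.0.0.1").build();
        Session session = cluster.connect();
        // LOCAL_ONE: the one acknowledging replica must be in the local DC.
        Statement read = new SimpleStatement("SELECT * FROM ks.events WHERE id = 42")
                .setConsistencyLevel(ConsistencyLevel.LOCAL_ONE);
        session.execute(read);
        cluster.close();
    }
}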

Regards,

Eric



Re: Scrubbing corrupted SStable.

2017-03-21 Thread Nate McCall
The snapshots are hard links on the file system, so everything is included.
You can use the "--no-snapshot" option to disable snapshots.

On Tue, Mar 21, 2017 at 5:01 PM, Pranay akula 
wrote:
>
> I am trying to scrub a column family using nodetool scrub. Is it going to
> create snapshots only for the sstables which are corrupted, or for all the
> sstables it is going to scrub? And to remove the snapshots created, is
> running nodetool clearsnapshot enough, or do I need to manually delete
> pre-scrub data from the snapshots of that column family?
>
> I can see a significant increase in data size after starting the scrub.
>
>
>
>
> Thanks
> Pranay.




--
-
Nate McCall
Wellington, NZ
@zznate

CTO
Apache Cassandra Consulting
http://www.thelastpickle.com


Re: Issue with Cassandra consistency in results

2017-03-21 Thread Shubham Jaju
Hi

This issue used to appear for me. What I figured out in my case:
  1. I had 3 machines.
  2. I inserted the data with ONE consistency (i.e. there is no guarantee
that the data has been propagated to the remaining nodes yet; Cassandra is
supposed to take care of that).
  3. Later I also figured out that one of the machines had a smaller hard
disk than the other two (and the data size was larger), i.e. it was not able
to hold the whole set of data.

So I think in cases (2, 3), if you query you will get different results, as
the nodes are not in sync.
nodetool repair should solve this problem, but it takes more time if you
have more data.
Check if this solves your problem.

Regards

Shubham Jaju


[Cassandra 3.0.9] Cannot allocate memory

2017-03-21 Thread Abhishek Kumar Maheshwari
Hi all,

I am using Cassandra 3.0.9. While adding a new server, after some time I get 
the exception below. The JVM options file is attached.
Hardware info:
RAM: 64 GB
Cores: 40


Java HotSpot(TM) 64-Bit Server VM warning: INFO: 
os::commit_memory(0x7fe9c44ee000, 12288, 0) failed; error='Cannot allocate 
memory' (errno=12)
#
# There is insufficient memory for the Java Runtime Environment to continue.
# Native memory allocation (mmap) failed to map 12288 bytes for committing 
reserved memory.
Java HotSpot(TM) 64-Bit Server VM warning: INFO: 
os::commit_memory(0x7f5c056ab000, 12288, 0) failed; error='Cannot allocate 
memory' (errno=12)
[thread 140033204860672 also had an error]
Java HotSpot(TM) 64-Bit Server VM warning: INFO: 
os::commit_memory(0x7f5c0566a000, 12288, 0) failed; error='Cannot allocate 
memory' (errno=12)
[thread 140033204594432 also had an error]Java HotSpot(TM) 64-Bit Server VM 
warning:
INFO: os::commit_memory(0x7fe9c420c000, 12288, 0) failed; error='Cannot 
allocate memory' (errno=12)
Java HotSpot(TM) 64-Bit Server VM warning: [thread 140641994852096 also had an 
error]INFO: os::commit_memory(0x7f5c055a7000, 12288, 0) failed; 
error='Cannot allocate memory' (errno=12)

Please let me know what I am missing.

Thanks & Regards,
Abhishek Kumar Maheshwari
+91- 805591 (Mobile)
Times Internet Ltd. | A Times of India Group Company
FC - 6, Sector 16A, Film City,  Noida,  U.P. 201301 | INDIA


[Attachment: jvm.options]