RE: Consistency Issues

2015-10-05 Thread Walsh, Stephen
It did, but I ran it again on one node, and that node never recovered. ☹

From: Robert Coli [mailto:rc...@eventbrite.com]
Sent: 02 October 2015 21:20
To: user@cassandra.apache.org
Subject: Re: Consistency Issues

On Fri, Oct 2, 2015 at 1:32 AM, Walsh, Stephen <stephen.wa...@aspect.com> wrote:
Sorry for the late reply. I ran nodetool resetlocalschema on all nodes, but
in the end it just removed all the schemas and crashed the applications.
I need to reset and try again. I’ll try to get you the GC stats today ☺

FTR, running resetlocalschema on all nodes (especially simultaneously) seems 
likely to nuke all of your schema.

=Rob

This email (including any attachments) is proprietary to Aspect Software, Inc. 
and may contain information that is confidential. If you have received this 
message in error, please do not read, copy or forward this message. Please 
notify the sender immediately, delete it from your system and destroy any 
copies. You may not further disclose or distribute this email or its 
attachments.


Re: Consistency Issues

2015-10-02 Thread Robert Coli
On Fri, Oct 2, 2015 at 1:32 AM, Walsh, Stephen 
wrote:

> Sorry for the late reply. I ran nodetool resetlocalschema on all
> nodes, but in the end it just removed all the schemas and crashed the
> applications.
>
> I need to reset and try again. I’ll try to get you the GC stats today ☺
>

FTR, running resetlocalschema on all nodes (especially simultaneously)
seems likely to nuke all of your schema.

=Rob
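Given Rob's warning, a safer pattern is to reset one node at a time and wait for the cluster to agree on a single schema version before touching the next one. The following is a minimal, hypothetical Python sketch, not a Cassandra tool: `rolling_reset`, `run` and `schema_agreed` are illustrative names, and only `nodetool -h <host> resetlocalschema` is a real nodetool invocation. Following Jake's advice further down the thread, `hosts` would be only the nodes with missing CFs.

```python
import time
from typing import Callable, Iterable, List


def rolling_reset(hosts: Iterable[str],
                  run: Callable[[List[str]], str],
                  schema_agreed: Callable[[], bool],
                  timeout_s: float = 120.0,
                  poll_s: float = 5.0) -> None:
    """Run `nodetool resetlocalschema` on one host at a time, never in parallel.

    After each reset, poll `schema_agreed` until it returns True before moving
    on to the next host; raise if agreement is not reached within `timeout_s`.
    """
    for host in hosts:
        run(["nodetool", "-h", host, "resetlocalschema"])
        deadline = time.monotonic() + timeout_s
        while not schema_agreed():
            if time.monotonic() >= deadline:
                raise RuntimeError(f"schema did not converge after resetting {host}")
            time.sleep(poll_s)
```

In practice `run` could be something like `lambda cmd: subprocess.run(cmd, check=True, capture_output=True, text=True).stdout`, and `schema_agreed` could check that `nodetool describecluster` lists a single schema version. Raising on timeout, rather than carrying on, is deliberate: it stops the rollout before a second node loses its schema.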


RE: Consistency Issues

2015-10-02 Thread Walsh, Stephen

Using the following command: sudo su cassandra -c "jstat -gccause 4162"

I got this output (not sure if it will display correctly on the web page).

But during load we only see data move between Eden and the survivor spaces;
the old gen never really grows.

    S0     S1      E      O      M    CCS   YGC    YGCT  FGC   FGCT     GCT  LGCC                GCC
  0.00  70.57  48.69  26.29  97.86  96.62   119  14.087    2  0.100  14.187  Allocation Failure  No GC
  0.00  70.57  49.02  26.29  97.86  96.62   119  14.087    2  0.100  14.187  Allocation Failure  No GC
  0.00  70.57  78.38  26.29  97.86  96.62   119  14.087    2  0.100  14.187  Allocation Failure  No GC
  0.00  70.57  83.99  26.29  97.86  96.62   119  14.087    2  0.100  14.187  Allocation Failure  No GC
  0.00  70.57  90.07  26.29  97.86  96.62   119  14.087    2  0.100  14.187  Allocation Failure  No GC
  0.00  70.57  90.30  26.29  97.86  96.62   119  14.087    2  0.100  14.187  Allocation Failure  No GC
  0.00  70.57  90.40  26.29  97.86  96.62   119  14.087    2  0.100  14.187  Allocation Failure  No GC
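To put numbers on Ricardo's and Jonathan's questions, the cumulative counters in the jstat output can be turned into average pause times. This is a minimal Python sketch assuming the JDK 8 `jstat -gccause` column layout shown above; `summarize_gccause` is a hypothetical helper. YGCT and FGCT are totals since JVM start, so an average can hide the occasional long pause that would actually matter for quorum.

```python
# Hypothetical helper; assumes the JDK 8 `jstat -gccause` column order.
NUMERIC_COLUMNS = ["S0", "S1", "E", "O", "M", "CCS",
                   "YGC", "YGCT", "FGC", "FGCT", "GCT"]


def summarize_gccause(line: str) -> dict:
    """Derive average GC pause times from one `jstat -gccause` sample line.

    The trailing LGCC/GCC columns (e.g. "Allocation Failure", "No GC") contain
    spaces, so only the leading numeric fields are parsed.
    """
    fields = line.split()[:len(NUMERIC_COLUMNS)]
    values = {name: float(f) for name, f in zip(NUMERIC_COLUMNS, fields)}
    ygc, fgc = int(values["YGC"]), int(values["FGC"])
    return {
        "eden_pct": values["E"],
        "old_pct": values["O"],
        # YGCT/FGCT are cumulative seconds since JVM start; convert to ms per event.
        "avg_young_ms": 1000 * values["YGCT"] / ygc if ygc else 0.0,
        "avg_full_ms": 1000 * values["FGCT"] / fgc if fgc else 0.0,
    }


sample = ("0.00 70.57 90.40 26.29 97.86 96.62 119 14.087 2 0.100 14.187 "
          "Allocation Failure No GC")
print(summarize_gccause(sample))
```

For the last sample, 14.087 s over 119 young collections works out to about 118 ms per young GC, and 0.100 s over 2 full GCs to about 50 ms each. Sampling jstat with an interval (e.g. `jstat -gccause <pid> 1000`) and diffing consecutive lines would show individual spikes instead.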

From: Walsh, Stephen [mailto:stephen.wa...@aspect.com]
Sent: 02 October 2015 09:32
To: user@cassandra.apache.org
Subject: RE: Consistency Issues

Sorry for the late reply. I ran nodetool resetlocalschema on all nodes, but
in the end it just removed all the schemas and crashed the applications.
I need to reset and try again. I’ll try to get you the GC stats today ☺


From: Jonathan Haddad [mailto:j...@jonhaddad.com]
Sent: 01 October 2015 16:01
To: user@cassandra.apache.org
Subject: Re: Consistency Issues

You say that you don't think GC is your issue... but did you actually check?
The reasons you suggest aren't very convincing. Can you provide your GC
settings, and take a look at jstat -gccause?

http://docs.oracle.com/javase/7/docs/technotes/tools/share/jstat.html#gccause_option


On Thu, Oct 1, 2015 at 4:50 AM Walsh, Stephen <stephen.wa...@aspect.com> wrote:
If you’re looking for the clean-up of the old gen in the JVM heap, it doesn’t
happen.
We have objects in the new gen surviving 15 collections before they’re pushed
to the old gen. Since all our data has a TTL of only 10 seconds, very little
data is sent to the old gen.

Add in a heap size of 8GB with a new gen size of 2GB, and I don’t think GC is
our issue.
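A quick sanity check of that heap layout, as a back-of-the-envelope Python sketch. The survivor ratio is an assumption (8 here); the thread does not say which value is in use, and `young_gen_layout` is a hypothetical helper.

```python
def young_gen_layout(new_gen_mb: float, survivor_ratio: int = 8) -> dict:
    """Split the young generation into Eden and two equal survivor spaces.

    With -XX:SurvivorRatio=N, Eden is N times one survivor space, so the
    young gen is made of N + 2 equal units.
    """
    unit = new_gen_mb / (survivor_ratio + 2)
    return {"eden_mb": unit * survivor_ratio, "survivor_mb": unit}


heap_mb, new_gen_mb = 8192, 2048       # 8GB heap and 2GB new gen, from the thread
layout = young_gen_layout(new_gen_mb)  # survivor_ratio=8 is an assumption
old_gen_mb = heap_mb - new_gen_mb      # whatever is left goes to the old gen
print(layout, "old gen MB:", old_gen_mb)
```

Under that assumption each survivor space is about 205 MB, so the S1 reading of 70.57% in the jstat output would be roughly 145 MB of surviving young objects being copied back and forth on every young GC.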


I’m more worried about the error messages in the Cassandra log file, which state:


UnknownColumnFamilyException reading from socket; closing
org.apache.cassandra.db.UnknownColumnFamilyException: Couldn't find 
cfId=cf411b50-6785-11e5-a435-e7be20c92086

and

cassandra OutboundTcpConnection.java:313 - error writing to Connection.



But I really need to understand the best practice on the number of CFs that
Jack Krupansky mentioned.
Does anyone have more information on this?


Many thanks for all your help, guys. Keep it coming ☺
Steve

From: Ricardo Sancho [mailto:sancho.rica...@gmail.com]
Sent: 01 October 2015 09:39
To: user@cassandra.apache.org
Subject: RE: Consistency Issues


Can you tell us how much time your GCs are taking?
Do you see any especially long ones?
On 1 Oct 2015 09:37, "Walsh, Stephen" <stephen.wa...@aspect.com> wrote:
There is no load balancer in front of Cassandra; it’s in front of our
application.
Everyone seems hung up on this point, but it’s not the root cause of the
inconsistency issue.

Can anyone verify the best practice for the number of CFs?


From: Robert Coli [mailto:rc...@eventbrite.com]
Sent: 30 September 2015 18:45
To: user@cassandra.apache.org
Subject: Re: Consistency Issues

On Wed, Sep 30, 2015 at 9:06 AM, Walsh, Stephen <stephen.wa...@aspect.com> wrote:

We never had these issues with our first run. It’s only when we added another
25% of writes.

As Jack said, you are probably pushing your GC over a threshold, leading to 
long pause times and inability to meet quorum.
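The arithmetic behind "inability to meet quorum" is short: QUORUM needs floor(RF/2) + 1 replicas to respond, so every replica stuck in a long GC pause counts against that. The replication factor is not stated in the thread, so this small Python sketch just tabulates a few illustrative values; the function names are hypothetical.

```python
def quorum(replication_factor: int) -> int:
    """Replicas that must respond for a QUORUM read or write: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def tolerable_unresponsive(replication_factor: int) -> int:
    """Replicas that may be down or paused (e.g. in a long GC) while QUORUM still succeeds."""
    return replication_factor - quorum(replication_factor)


# The thread does not state the RF; these values are illustrative only.
for rf in (2, 3, 4):
    print(f"RF={rf}: quorum={quorum(rf)}, can lose {tolerable_unresponsive(rf)}")
```

With RF=3, for example, one replica can be unresponsive and QUORUM still succeeds; a second unresponsive replica for the same partition makes the request fail.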

As Sebastian said, you probably shouldn't need a load balancer in front of 
Cassandra.

=Rob


RE: Consistency Issues

2015-10-02 Thread Walsh, Stephen
Sorry for the late reply. I ran nodetool resetlocalschema on all nodes, but
in the end it just removed all the schemas and crashed the applications.
I need to reset and try again. I’ll try to get you the GC stats today ☺




Re: Consistency Issues

2015-10-01 Thread Jonathan Haddad
You say that you don't think GC is your issue... but did you actually
check?  The reasons you suggest aren't very convincing.  Can you provide
your GC settings, and take a look at jstat -gccause?

http://docs.oracle.com/javase/7/docs/technotes/tools/share/jstat.html#gccause_option




Re: Consistency Issues

2015-10-01 Thread Carlos Alonso
Well... I wasn't expecting that, as OpsCenter 5.2.1 and cqlsh in
Cassandra 2.1.x both use the native protocol. I was expecting them to use
different protocols.

I have no further ideas :(

Carlos Alonso | Software Engineer | @calonso <https://twitter.com/calonso>



RE: Consistency Issues

2015-10-01 Thread Sebastian Estevez
You're running describe with CL quorum, aren't you?

To see the inconsistency, you'd have to check the system.schema_columnfamilies
table on each node.
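Following Sebastian's suggestion, one way to compare nodes is to run the same query against each node's local schema table and diff the results. A minimal Python sketch: the CQL targets Cassandra 2.1's `system.schema_columnfamilies` table, while the helper names and the sample data in the test are hypothetical. Because the system keyspace is local to each node, each node has to be queried directly (e.g. with `CQLSH_HOST=x.x.x.x cqlsh`, as in the thread) and its rows collected per node before diffing.

```python
from typing import Dict, Set, Tuple

# Run on each node separately; the system keyspace holds only that node's own view.
SCHEMA_QUERY = ("SELECT keyspace_name, columnfamily_name, cf_id "
                "FROM system.schema_columnfamilies;")

Table = Tuple[str, str, str]  # (keyspace_name, columnfamily_name, cf_id)


def missing_tables(per_node: Dict[str, Set[Table]]) -> Dict[str, Set[Table]]:
    """For each node, list tables that at least one other node knows about but it does not."""
    known_anywhere: Set[Table] = set().union(*per_node.values())
    return {node: known_anywhere - tables
            for node, tables in per_node.items()
            if known_anywhere - tables}


def nodes_without_cf_id(per_node: Dict[str, Set[Table]], cf_id: str) -> Set[str]:
    """Nodes with no table carrying `cf_id`, e.g. the id from an UnknownColumnFamilyException."""
    return {node for node, tables in per_node.items()
            if all(t[2] != cf_id for t in tables)}
```

Including `cf_id` in each tuple means a table that exists on every node under the same name but with different ids would also show up as a difference, which may be worth checking given that DESCRIBE showed all CFs on all four nodes. `nodes_without_cf_id` can be fed the id from the log (cf411b50-6785-11e5-a435-e7be20c92086) directly.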

RE: Consistency Issues

2015-10-01 Thread Walsh, Stephen
Thanks Jake. I’ll try testing 2.1.9 to see if it resolves the issue, and I’ll
try “nodetool resetlocalschema” now to see if it helps.

Cassandra is 2.1.6
OpsCenter is 5.2.1



Re: Consistency Issues

2015-10-01 Thread Jake Luciani
Onur, I was responding to Stephen's issue.




-- 
http://twitter.com/tjake


Re: Consistency Issues

2015-10-01 Thread Onur Yalazı

Thank you Jake.

The issue is that I do not have missing CFs, and upgrading beyond 2.1.3 is
not an option because of the deprecation of CQL dialects. Our
application uses Hector, and migrating to CQL3 would be a huge refactoring.






Re: Consistency Issues

2015-10-01 Thread Jake Luciani
A couple of things to try:

1. Run nodetool resetlocalschema on the nodes with missing CFs. This will
refresh the schema on the local node.
2. Upgrade to 2.1.9. There are some pretty major issues in 2.1.6 (nothing
specific to this problem, but worth upgrading).


Re: Consistency Issues

2015-10-01 Thread Carlos Alonso
Which versions of Cassandra and OpsCenter are you using? Perhaps
OpsCenter and your app are using CQL while cqlsh is using Thrift (or vice
versa), and that's why you see different things depending on where you
access from?

Carlos Alonso | Software Engineer | @calonso <https://twitter.com/calonso>


RE: Consistency Issues

2015-10-01 Thread Walsh, Stephen
No such thing as a stupid question ☺
I know they exist on some nodes, but whether they replicated correctly is a
different story.
I’m checking this one now.

OK, I hooked up OpsCenter to see what it was saying.
Out of the 100 keyspaces created:
9 are missing one CF
2 are missing two CFs
1 is missing three CFs

It looks like the replication of the tables did not complete to all nodes?

Looking at the keyspace with 3 missing CFs on each of the 4 nodes
(via CQLSH_HOST=x.x.x.x cqlsh and “Describe keyspace XXX;”):

Node 1: has all CFs
Node 2: has all CFs
Node 3: has all CFs
Node 4: has all CFs


This is indeed very strange….


From: Carlos Alonso [mailto:i...@mrcalonso.com]
Sent: 01 October 2015 12:05
To: user@cassandra.apache.org
Subject: Re: Consistency Issues

And that's a stupid one, I know, but does the column you're trying to access 
actually exist?

Carlos Alonso | Software Engineer | @calonso<https://twitter.com/calonso>

On 1 October 2015 at 11:09, Walsh, Stephen <stephen.wa...@aspect.com> wrote:
I did think of that, and they all have the same schema version ☺


From: Carlos Alonso [mailto:i...@mrcalonso.com]
Sent: 01 October 2015 10:11
To: user@cassandra.apache.org
Subject: Re: Consistency Issues

Hi Stephen.

The UnknownColumnFamilyException made me think of a possible schema
disagreement, in which one of your nodes has a different schema version and
therefore you cannot reach quorum?

Can you run nodetool describecluster and see if all nodes have the same schema 
versions?
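If this needs checking from a script (for example while waiting for agreement after a schema change), the "Schema versions:" section of `nodetool describecluster` can be parsed. A minimal Python sketch, assuming lines of the form `<schema-uuid>: [ip, ip, ...]`, plus an `UNREACHABLE: [...]` entry when nodes are down; `schema_versions` and `in_agreement` are hypothetical helpers, and the UUIDs and addresses in the test are made up.

```python
import re
from typing import Dict, List

# Matches lines such as "<schema-uuid>: [10.0.0.1, 10.0.0.2]" in the
# "Schema versions:" section of `nodetool describecluster` output (assumed format).
VERSION_LINE = re.compile(r"^\s*([0-9a-fA-F-]{36}|UNREACHABLE):\s*\[(.*)\]\s*$")


def schema_versions(output: str) -> Dict[str, List[str]]:
    """Map each schema version (or UNREACHABLE) to the node addresses reporting it."""
    versions: Dict[str, List[str]] = {}
    for line in output.splitlines():
        m = VERSION_LINE.match(line)
        if m:
            versions[m.group(1)] = [h.strip() for h in m.group(2).split(",") if h.strip()]
    return versions


def in_agreement(output: str) -> bool:
    """True when every reachable node reports the same single schema version."""
    reachable = {v for v in schema_versions(output) if v != "UNREACHABLE"}
    return len(reachable) == 1
```

Ignoring the UNREACHABLE entry is a design choice: a down node cannot disagree, but it also has not confirmed the schema, so a caller may want to treat any UNREACHABLE entry as "not yet agreed" instead.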

Cheers!

Carlos Alonso | Software Engineer | @calonso<https://twitter.com/calonso>


Re: Consistency Issues

2015-10-01 Thread Carlos Alonso
And that's a stupid one, I know, but does the column you're trying to
access actually exist?

Carlos Alonso | Software Engineer | @calonso <https://twitter.com/calonso>

On 1 October 2015 at 11:09, Walsh, Stephen  wrote:

> I did think of that and they are all the same version ☺


RE: Consistency Issues

2015-10-01 Thread Walsh, Stephen
I did think of that and they are all the same version ☺




Re: Consistency Issues

2015-10-01 Thread Carlos Alonso
Hi Stephen.

The UnknownColumnFamilyException made me think of a possible schema
disagreement: could one of your nodes have a different schema version,
so that you cannot reach quorum?

Can you run nodetool describecluster and see if all nodes have the same
schema version?

Cheers!

Carlos Alonso | Software Engineer | @calonso <https://twitter.com/calonso>
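Carlos's check can be scripted. The sketch below is a hypothetical helper (not part of nodetool) that parses text captured from `nodetool describecluster` and reports whether more than one schema version is in play. It assumes the 2.1-era output layout, where each schema version is printed on its own line as `<uuid>: [address, ...]` and unreachable nodes are grouped under `UNREACHABLE`; the sample UUIDs and addresses are made up.

```python
from __future__ import annotations

import re

# One line per schema version, e.g. "    <uuid>: [10.0.0.1, 10.0.0.2]".
# Assumption: unreachable nodes are grouped under an "UNREACHABLE" key.
VERSION_LINE = re.compile(r"^\s*([0-9a-fA-F-]{36}|UNREACHABLE):\s*\[(.*)\]\s*$")


def schema_versions(output: str) -> dict[str, list[str]]:
    """Map each schema version (or UNREACHABLE) to the nodes reporting it."""
    versions: dict[str, list[str]] = {}
    for line in output.splitlines():
        match = VERSION_LINE.match(line)
        if match:
            nodes = [n.strip() for n in match.group(2).split(",") if n.strip()]
            versions[match.group(1)] = nodes
    return versions


def has_schema_disagreement(output: str) -> bool:
    """True unless exactly one schema version was found.

    Also True when nothing could be parsed, so a format mismatch is not
    mistaken for agreement.
    """
    return len(schema_versions(output)) != 1


if __name__ == "__main__":
    # Made-up sample in the assumed layout: node 10.0.0.4 is on another version.
    sample = """Cluster Information:
\tName: Test Cluster
\tSchema versions:
\t\t65e78f0e-e81e-30d8-a631-a65dff93bf82: [10.0.0.1, 10.0.0.2, 10.0.0.3]
\t\t0b5c5e2a-1111-3222-8333-444455556666: [10.0.0.4]
"""
    print(has_schema_disagreement(sample))
```

The node list under a minority version shows which nodes to look at first.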



RE: Consistency Issues

2015-10-01 Thread Walsh, Stephen
If you’re looking for clean-up of the old gen in the JVM heap, it doesn’t 
happen.
Our new-gen objects survive 15 young collections before they are promoted to 
the old gen.
Since all our data has a TTL of only 10 seconds, very little data is promoted 
to the old gen.

Given an 8GB heap with a 2GB new gen, I don’t think GC is our issue.


I’m more worried about the following error messages in the Cassandra log file:


UnknownColumnFamilyException reading from socket; closing
org.apache.cassandra.db.UnknownColumnFamilyException: Couldn't find 
cfId=cf411b50-6785-11e5-a435-e7be20c92086

and

cassandra OutboundTcpConnection.java:313 - error writing to Connection.



But I really need to understand the best practice on the number of CFs that 
Jack Krupansky mentioned.
Does anyone have more information on this?


Many thanks for all your help, guys; keep it coming ☺
Steve
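To see which table the UnknownColumnFamilyException refers to, the cfId values can be pulled out of system.log and compared with the table IDs the schema currently knows about (in 2.1 these should be in the `cf_id` column of `system.schema_columnfamilies`; worth double-checking on your version). The hypothetical helper below only does the extraction step, and assumes each message sits on a single log line, as it does in the log file itself.

```python
from __future__ import annotations

import re
from collections.abc import Iterable

# Matches the table ID in messages like
# "UnknownColumnFamilyException: Couldn't find cfId=<uuid>".
CFID = re.compile(r"Couldn't find cfId=([0-9a-fA-F-]{36})")


def unknown_cf_ids(log_lines: Iterable[str]) -> list[str]:
    """Distinct cfIds reported as unknown, in the order first seen."""
    seen: list[str] = []
    for line in log_lines:
        for cf_id in CFID.findall(line):
            if cf_id not in seen:
                seen.append(cf_id)
    return seen


if __name__ == "__main__":
    sample = [
        "IncomingTcpConnection.java:97 - UnknownColumnFamilyException reading from socket; closing",
        "org.apache.cassandra.db.UnknownColumnFamilyException: "
        "Couldn't find cfId=cf411b50-6785-11e5-a435-e7be20c92086",
    ]
    print(unknown_cf_ids(sample))
```

Any iterable of lines works, so an open file handle for system.log can be passed directly.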



RE: Consistency Issues

2015-10-01 Thread Ricardo Sancho
Can you tell us how much time your GCs are taking?
Do you see any especially long ones?
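To answer this from `jstat` output, the cumulative counters can be turned into average pause times: YGCT and FGCT are total seconds spent in young and full collections, and YGC and FGC are the collection counts. The sketch below assumes the Java 8 `-gccause` column layout seen in this thread (S0 S1 E O M CCS YGC YGCT FGC FGCT GCT, then the free-text LGCC and GCC columns). The sample row is the one Stephen posted, with the run-together columns separated again; YGCT + FGCT = GCT (14.087 + 0.100 = 14.187) supports that reading.

```python
from __future__ import annotations

# Numeric columns of `jstat -gccause` on Java 8, in output order. The two
# trailing columns (LGCC, GCC) are free text such as "Allocation Failure".
NUMERIC_COLUMNS = ["S0", "S1", "E", "O", "M", "CCS",
                   "YGC", "YGCT", "FGC", "FGCT", "GCT"]


def parse_gccause_row(row: str) -> dict[str, float]:
    """Parse the numeric part of one jstat -gccause data row."""
    fields = row.split()[: len(NUMERIC_COLUMNS)]
    if len(fields) != len(NUMERIC_COLUMNS):
        raise ValueError("row has fewer numeric columns than expected")
    return {name: float(value) for name, value in zip(NUMERIC_COLUMNS, fields)}


def mean_pause_ms(total_seconds: float, count: float) -> float:
    """Average pause length in milliseconds (0.0 if no collections yet)."""
    return total_seconds / count * 1000 if count else 0.0


if __name__ == "__main__":
    row = ("0.00  70.57  48.69  26.29  97.86  96.62  119  14.087  2  0.100  "
           "14.187 Allocation Failure   No GC")
    stats = parse_gccause_row(row)
    print(f"young: {mean_pause_ms(stats['YGCT'], stats['YGC']):.1f} ms avg")
    print(f"full:  {mean_pause_ms(stats['FGCT'], stats['FGC']):.1f} ms avg")
```

For that row this gives roughly 118 ms per young collection and 50 ms per full collection. Averages hide outliers, though, which is what the question is really about; the longest individual pauses have to come from GC logging or Cassandra's GCInspector messages in system.log rather than from these cumulative counters.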


RE: Consistency Issues

2015-10-01 Thread Walsh, Stephen
There is no load balancer in front of Cassandra; it’s in front of our 
application.
Everyone seems hung up on this point, but it’s not the root cause of the 
inconsistency issue.

Can anyone verify the best practice for the number of CFs?




Re: Consistency Issues

2015-09-30 Thread Robert Coli
On Wed, Sep 30, 2015 at 9:06 AM, Walsh, Stephen 
wrote:

>
> We never had these issues with our first run. They only started when we
> added another 25% of writes.
>

As Jack said, you are probably pushing your GC over a threshold, leading to
long pause times and inability to meet quorum.

As Sebastian said, you probably shouldn't need a load balancer in front of
Cassandra.

=Rob
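Rob's point can be made concrete with a toy model of one LOCAL_QUORUM write at replication factor 3: the coordinator waits for 2 acknowledgements, and a replica stuck in a GC pause longer than the write timeout simply does not answer in time. The sketch assumes a 2000 ms timeout (the stock `write_request_timeout_in_ms` value in cassandra.yaml; check your own config), and the latencies are invented for illustration.

```python
from __future__ import annotations


def acks_in_time(replica_latencies_ms: list[float], timeout_ms: float = 2000) -> int:
    """Count replicas whose acknowledgement arrives within the write timeout."""
    return sum(1 for latency in replica_latencies_ms if latency <= timeout_ms)


def write_meets_quorum(replica_latencies_ms: list[float], required: int = 2,
                       timeout_ms: float = 2000) -> bool:
    """A LOCAL_QUORUM write at RF=3 needs `required` (2) timely acks."""
    return acks_in_time(replica_latencies_ms, timeout_ms) >= required


if __name__ == "__main__":
    # One replica paused: the other two still form a quorum.
    print(write_meets_quorum([4.0, 2600.0, 6.0]))
    # Two replicas paused at once: only 1 ack, the same shape as
    # "2 replica were required but only 1 acknowledged the write".
    print(write_meets_quorum([4.0, 2600.0, 2400.0]))
```

In this model a single slow node is tolerated, so seeing the error steadily suggests pauses (or overload) hitting two replicas of the same write at the same time, which is plausible when every node is under similar GC pressure.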


RE: Consistency Issues

2015-09-30 Thread Walsh, Stephen
Many thanks for your reply, Sebastian.
But the load balancer is used in front of our applications, not in front of Cassandra.
It just allows us to increase the throughput to Cassandra.

Our Generation Tool -> Load Balancer -> Our Processing Application -> Cassandra

Sebastian, is Jack correct about the best practice for the number of CFs?



Re: Consistency Issues

2015-09-30 Thread Sebastian Estevez
Can you provide exact details on where your load balancer is? Like Michael
said, you shouldn't need one between your client and the c* cluster if
you're using a DataStax driver.

All the best,


Sebastián Estévez

Solutions Architect | 954 905 8615 | sebastian.este...@datastax.com

DataStax is the fastest, most scalable distributed database technology,
delivering Apache Cassandra to the world’s most innovative enterprises.
Datastax is built to be agile, always-on, and predictably scalable to any
size. With more than 500 customers in 45 countries, DataStax is the
database technology and transactional backbone of choice for the world’s
most innovative companies such as Netflix, Adobe, Intuit, and eBay.


RE: Consistency Issues

2015-09-30 Thread Walsh, Stephen
Many thanks, all.

The load balancers sit only in front of our own nodes, not as a middleman to 
Cassandra. They just let us push more data into Cassandra.
The only reason we are not using 2.1.9 is time; we haven’t had time to test 
upgrades.

I wasn’t able to find any best practices for the number of CFs; where do you see 
this documented?
I see a lot of comments on 1,000 CFs vs. 1,000 keyspaces.

The errors occur a few times a second, about 10 or so.
They are constant.

Our TTL is 10 seconds on data, with gc_grace_seconds set to 0 on each CF.
We don’t seem to get any OOM errors.

We never had these issues with our first run. They only started when we added 
another 25% of writes.

Many thanks for taking the time to reply, Jack.
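For reference, the TTL and gc_grace_seconds settings above determine how long each write lingers. A minimal arithmetic sketch, assuming the usual rule that an expired cell is treated like a tombstone and becomes eligible for removal once gc_grace_seconds have passed after it expired (it is only actually dropped when a compaction touches its SSTable). The steady-state estimate assumes a constant write rate; the rate used is the one quoted in this thread.

```python
def expires_at(write_time_s: float, ttl_s: float) -> float:
    """Time at which a TTL'd cell stops being returned by reads."""
    return write_time_s + ttl_s


def purgeable_at(write_time_s: float, ttl_s: float, gc_grace_seconds: float) -> float:
    """Earliest time compaction may drop the expired cell."""
    return expires_at(write_time_s, ttl_s) + gc_grace_seconds


def live_writes(writes_per_s: float, ttl_s: float) -> float:
    """Approximate number of unexpired client writes at steady state (before replication)."""
    return writes_per_s * ttl_s


if __name__ == "__main__":
    print(purgeable_at(0, 10, 0))        # gc_grace_seconds = 0, as in this thread
    print(purgeable_at(0, 10, 864000))   # the 10-day default, for comparison
    print(live_writes(1250, 10))         # ~1250 writes/s with a 10 s TTL
```

With these numbers roughly 12,500 client writes (times the replication factor) are unexpired at any moment; expired data still occupies memtables and SSTables until it is flushed and compacted away.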



On Wed, Sep 30, 2015 at 11:22 AM, Walsh, Stephen <stephen.wa...@aspect.com> wrote:
More information:

I’ve just set up an NTP server to rule out any timing issues.
I also see this in the Cassandra node log files:

MessagingService-Incoming-/172.31.22.4] 2015-09-30 15:19:14,769 
IncomingTcpConnection.java:97 - UnknownColumnFamilyException reading from socket; closing
org.apache.cassandra.db.UnknownColumnFamilyException: Couldn't find cfId=cf411b50-6785-11e5-a435-e7be20c92086

Any idea what this is related to?
All these tests are run on a clean setup of Cassandra nodes, followed by a 
nodetool repair before any data hits them.


From: Walsh, Stephen [mailto:stephen.wa...@aspect.com]
Sent: 30 September 2015 15:17
To: user@cassandra.apache.org
Subject: Consistency Issues

Hi there,

We are having some issues with consistency. I’ll try my best to explain.

We have an application that was able to handle:
Write ~1000 p/s
Read ~300 p/s
Total CFs created: 400
Total keyspaces created: 80

On a 4-node Cassandra cluster with:
Version: 2.1.6
Replication factor: 3
Consistency (read & write): LOCAL_QUORUM
Cores: 4
RAM: 15 GB
Heap size: 8 GB

This was fine and worked, but was pushing our application to the max.

-

Next we added a load balancer (HAProxy) in front of our application.
So now we have 3 of our application nodes talking to 4 Cassandra nodes, with a load of:
Write ~1250 p/s
Read 0 p/s
Total CFs created: 450
Total keyspaces created: 100

On our application we now see
Cassandra timeout during write query at consistency LOCAL_QUORUM (2 replica 
were required but only 1 acknowledged the write)
(we are using the Java Cassandra driver 2.1.6)

So we increased the number of Cassandra nodes
to 5, then 6, and each time got the same replication error.

So then we doubled the spec of every node to:
8 cores
30 GB RAM
Heap size: 15 GB

And we still get this replication error (2 replica were required but only 1 
acknowledged the write)

We know that when we introduce the HAProxy load balancer with 3 of our nodes, 
it hits Cassandra 3 times quicker.
But we’ve now increased the Cassandra spec nearly threefold, for only an extra 
250 writes p/s, and it still doesn’t work.
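It can help to put these rates into per-node terms. Each client write is applied on every replica, so each node handles roughly client writes × replication factor ÷ nodes replica writes per second. The sketch assumes data and load are spread evenly across nodes, which a real cluster only approximates.

```python
def replica_writes_per_node(client_writes_per_s: float,
                            replication_factor: int, nodes: int) -> float:
    """Replica-level writes each node absorbs, assuming an even spread."""
    return client_writes_per_s * replication_factor / nodes


if __name__ == "__main__":
    # (client writes/s, nodes) for the setups described in this post, RF = 3.
    for writes, nodes in [(1000, 4), (1250, 4), (1250, 5), (1250, 6)]:
        per_node = replica_writes_per_node(writes, 3, nodes)
        print(f"{writes} writes/s on {nodes} nodes -> {per_node:.1f} per node")
```

By this rough measure, the 6-node cluster at 1250 writes/s carries less per-node write load (625/s) than the original 4-node setup at 1000 writes/s (750/s), yet the errors persisted, which is consistent with the suspicion that raw write volume alone is not the cause.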

We’re having a hard time working out why replication is an issue at this 
cluster size.
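Part of the answer is that quorum size depends on the replication factor, not on the number of nodes. LOCAL_QUORUM needs floor(RF / 2) + 1 acknowledgements from replicas in the local data centre, so with RF = 3 every write still needs 2 of its 3 replicas whether the cluster has 4, 5 or 6 nodes; adding nodes spreads the data more thinly but does not change how many replicas must answer. A minimal sketch of that arithmetic:

```python
def quorum(replication_factor: int) -> int:
    """Acknowledgements needed for QUORUM/LOCAL_QUORUM: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def tolerated_slow_replicas(replication_factor: int) -> int:
    """Replicas of a given write that may be down or too slow without failing it."""
    return replication_factor - quorum(replication_factor)


if __name__ == "__main__":
    for nodes in (4, 5, 6):
        # The cluster size never enters the calculation; only RF does.
        print(nodes, "nodes:", quorum(3), "acks required,",
              tolerated_slow_replicas(3), "replica may lag")
```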

We tried to get OpsCenter working to monitor the nodes, but due to the number 
of CFs in Cassandra the datastax-agent takes 90% of the CPU on every node.

Any suggestions or recommendations would be very welcome.

Regards
Stephen Walsh



This email (including any attachments) is proprietary to Aspect Software, Inc. 
and may contain information that is confidential. If you have received this 
message in error, please do not read, copy or forward this message. Please 
notify the sender immediately, delete it from your system and destroy any 
copies. You may not further disclose or distribute this email or its 
attachments.

Re: Consistency Issues

2015-09-30 Thread Jack Krupansky
More than "low hundreds" (200 or 300 max, and preferably under 100) of
tables/column families is not exactly a recommended best practice. You may
be able to get it to work, but probably only with very heavy tuning (i.e.,
lots of time and playing with options) on your own part. IOW, no quick and
easy solution.

The only immediate issue that comes to mind is that you are hitting GC
pauses due to the large heap size and high volume.

How frequently are these errors occurring? For example, how much data can you
load before the first one pops up, and are they then frequent/constant or just
occasional/rare?

Can you test whether you see similar timeouts with, say, only 100 or 50
tables? That might at least isolate whether the issue relates to the number
of tables rather than to the raw data rate or GC pauses.

Sometimes you can reduce/eliminate the GC pause issue by reducing the heap
so that it is only modestly above the minimum required to avoid OOM.
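For comparison with the 8 GB and 15 GB heaps mentioned in this thread: when MAX_HEAP_SIZE and HEAP_NEWSIZE are left unset, the 2.x-era cassandra-env.sh derives them from RAM and core count. The sketch below reproduces that formula, assuming your copy matches the stock script: max(min(RAM/2, 1 GB), min(RAM/4, 8 GB)) for the heap and min(100 MB per core, heap/4) for the young generation. Check it against the cassandra-env.sh shipped with your version; the RAM figures used here are nominal sizes, not what /proc/meminfo would report.

```python
from __future__ import annotations


def default_heap_sizes_mb(system_memory_mb: int, cpu_cores: int) -> tuple[int, int]:
    """Approximate cassandra-env.sh automatic sizing (assumed 2.x formula).

    MAX_HEAP_SIZE = max(min(RAM / 2, 1024 MB), min(RAM / 4, 8192 MB))
    HEAP_NEWSIZE  = min(100 MB * cores, MAX_HEAP_SIZE / 4)
    Integer division mirrors the shell script's `expr` arithmetic.
    """
    half = min(system_memory_mb // 2, 1024)
    quarter = min(system_memory_mb // 4, 8192)
    max_heap = max(half, quarter)
    new_size = min(100 * cpu_cores, max_heap // 4)
    return max_heap, new_size


if __name__ == "__main__":
    for ram_gb, cores in ((15, 4), (30, 8)):
        heap, young = default_heap_sizes_mb(ram_gb * 1024, cores)
        print(f"{ram_gb} GB RAM, {cores} cores: heap {heap} MB, young gen {young} MB")
```

On the nominal 30 GB / 8-core nodes this gives a heap of about 7.5 GB with an 800 MB young generation, roughly half the 15 GB heap configured earlier in the thread.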


-- Jack Krupansky

On Wed, Sep 30, 2015 at 11:22 AM, Walsh, Stephen 
wrote:



Re: Consistency Issues

2015-09-30 Thread Laing, Michael
What client are you using?

The official Java and Python clients should not have a load balancer between
them and the C* nodes, AFAIK.

Why aren't you using 2.1.9?

Have you checked for schema agreement amongst all nodes?

ml
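One way to act on the schema-agreement question: each node reports its schema version in the schema_version column of system.local (for itself) and system.peers (for the peers it knows about), and nodetool describecluster lists the same information. The sketch below covers only the comparison step. The input dictionary is hypothetical data standing in for whatever you collect from those sources. More than one distinct version while tables are being created would be consistent with the UnknownColumnFamilyException reported in this thread.

```python
from collections import defaultdict
from typing import Dict, List
from uuid import UUID


def group_by_schema_version(versions: Dict[str, UUID]) -> Dict[UUID, List[str]]:
    """Map each reported schema version to the sorted list of nodes reporting it."""
    groups: Dict[UUID, List[str]] = defaultdict(list)
    for node, version in versions.items():
        groups[version].append(node)
    return {version: sorted(nodes) for version, nodes in groups.items()}


def schema_in_agreement(versions: Dict[str, UUID]) -> bool:
    """True when every node reports the same schema version."""
    return len(set(versions.values())) <= 1


if __name__ == "__main__":
    # Hypothetical addresses and versions, for illustration only.
    v1 = UUID("00000000-0000-0000-0000-000000000001")
    v2 = UUID("00000000-0000-0000-0000-000000000002")
    reported = {"10.0.0.1": v1, "10.0.0.2": v1, "10.0.0.3": v2, "10.0.0.4": v1}
    print("agreement:", schema_in_agreement(reported))
    for version, nodes in group_by_schema_version(reported).items():
        print(version, nodes)
```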

On Wed, Sep 30, 2015 at 11:22 AM, Walsh, Stephen 
wrote:



RE: Consistency Issues

2015-09-30 Thread Walsh, Stephen
More information,

I've just set up an NTP server to rule out any timing issues.
I also see this in the Cassandra node log files:

MessagingService-Incoming-/172.31.22.4] 2015-09-30 15:19:14,769 
IncomingTcpConnection.java:97 - UnknownColumnFamilyException reading from 
socket; closing
org.apache.cassandra.db.UnknownColumnFamilyException: Couldn't find 
cfId=cf411b50-6785-11e5-a435-e7be20c92086

Any idea what this is related to?
All these tests are run on a clean setup of Cassandra nodes, followed by a
nodetool repair, before any data hits them.
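The cfId in that log line is a version-1 (time-based) UUID, so it carries a timestamp. Assuming it was generated when the table was created, which is what the version nibble suggests but is not confirmed here, decoding it shows when that schema change happened relative to the error. A small sketch using only Python's standard uuid module:

```python
import uuid
from datetime import datetime, timedelta, timezone

# Version-1 UUIDs count 100-nanosecond intervals since the start of the
# Gregorian calendar, 1582-10-15 00:00:00 UTC.
GREGORIAN_EPOCH = datetime(1582, 10, 15, tzinfo=timezone.utc)


def time_uuid_to_datetime(value: str) -> datetime:
    """Return the embedded timestamp of a version-1 UUID as an aware UTC datetime."""
    u = uuid.UUID(value)
    if u.version != 1:
        raise ValueError(f"{value} is version {u.version}, not a time-based UUID")
    # u.time is in 100 ns units; timedelta's finest unit is 1 microsecond.
    return GREGORIAN_EPOCH + timedelta(microseconds=u.time // 10)


if __name__ == "__main__":
    cf_id = "cf411b50-6785-11e5-a435-e7be20c92086"  # from the log line above
    print(time_uuid_to_datetime(cf_id))
```

For this cfId the sketch gives 2015-09-30 15:13:31 UTC, about six minutes before the 15:19:14 log entry (if the log timestamps are also UTC). A table created shortly before the error, which the receiving node did not yet know about, would fit a schema change that had not reached every node.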


From: Walsh, Stephen [mailto:stephen.wa...@aspect.com]
Sent: 30 September 2015 15:17
To: user@cassandra.apache.org
Subject: Consistency Issues



Re: Consistency Issues

2015-05-19 Thread Jared Rodriguez
It looks like NTP was the problem.  Thanks for the solution!!!

On Wed, May 13, 2015 at 9:20 AM, Robert Wille  wrote:



-- 
Jared Rodriguez


Re: Consistency Issues

2015-05-13 Thread Robert Wille
Timestamps have millisecond granularity. If you make multiple writes within the 
same millisecond, then the outcome is not deterministic.

Also, make sure you are running ntp. Clock skew will manifest itself similarly.
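To make the clock-skew point concrete, here is a deliberately simplified model of last-write-wins reconciliation. It is not Cassandra's actual code. Each write carries the timestamp assigned by whoever issued it, and the highest timestamp wins, so a writer whose clock runs slow can lose to a write that happened earlier in real time. The 50 ms skew is a made-up figure for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Write:
    value: str
    timestamp_us: int  # timestamp assigned by the writer's (possibly skewed) clock


def last_write_wins(writes: List[Write]) -> Write:
    """Pick the write with the highest timestamp (simplified LWW model).

    Equal timestamps (e.g. two writes in the same millisecond) are NOT
    reconciled the way Cassandra does it; Python's max() simply keeps the
    first of the tied writes.
    """
    return max(writes, key=lambda w: w.timestamp_us)


if __name__ == "__main__":
    # Writer A's clock is correct; writer B's clock runs 50 ms slow.
    # B writes 10 ms *after* A in real time, yet gets an older timestamp.
    a_time_us = 1_000_000
    b_skew_us = -50_000
    a = Write("from A", a_time_us)
    b = Write("from B", a_time_us + 10_000 + b_skew_us)
    print(last_write_wins([a, b]).value)  # the earlier write from A survives
```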

On May 13, 2015, at 3:47 AM, Jared Rodriguez
<jrodrig...@kitedesk.com> wrote:





Re: Consistency Issues

2015-05-13 Thread Jared Rodriguez
Thanks for the feedback. We have dug in deeper and upgraded to Cassandra
2.0.14 and are seeing the same issue. What appears to be happening is that
if a record is initially written, the first read is fine. But if we
immediately update that record with a second write, the second read is
problematic.

We have a 4-node cluster and a replication factor of 2. What seems to be
happening is that on the initial write the record is sent to nodes A and B.
If a second write (update) of the record occurs while the record is still in
the memtable and not yet flushed to an SSTable on A or B, the next read
returns nothing.

We are continuing to dig in and get as much detail as possible before
opening this as a JIRA.

On Tue, May 12, 2015 at 6:51 PM, Robert Coli  wrote:

> On Tue, May 12, 2015 at 12:35 PM, Michael Shuler 
> wrote:
>
>> This is a 4 node cluster running Cassandra 2.0.6
>>>
>>
>> Can you reproduce the same issue on 2.0.14? (or better yet, the
>> cassandra-2.0 branch HEAD, which will soon ship 2.0.15) If you get the same
>> results, please, open a JIRA with the reproduction steps.
>
>
> And if you do file such a JIRA, please let the list know the JIRA URL, to
> close the loop!
>
> =Rob
>
>



-- 
Jared Rodriguez


Re: Consistency Issues

2015-05-12 Thread Robert Coli
On Tue, May 12, 2015 at 12:35 PM, Michael Shuler 
wrote:

> This is a 4 node cluster running Cassandra 2.0.6
>>
>
> Can you reproduce the same issue on 2.0.14? (or better yet, the
> cassandra-2.0 branch HEAD, which will soon ship 2.0.15) If you get the same
> results, please, open a JIRA with the reproduction steps.


And if you do file such a JIRA, please let the list know the JIRA URL, to
close the loop!

=Rob


Re: Consistency Issues

2015-05-12 Thread Michael Shuler

On 05/12/2015 04:50 AM, Jared Rodriguez wrote:

I have a specific update and query that I need to ensure has strong
consistency.  To that end, when I do the write, I set the consistency
level to ALL.  Shortly afterwards, I do a query for that record with a
consistency of ONE and somehow get back stale data.

This is a 4 node cluster running Cassandra 2.0.6


Can you reproduce the same issue on 2.0.14? (or better yet, the 
cassandra-2.0 branch HEAD, which will soon ship 2.0.15) If you get the 
same results, please, open a JIRA with the reproduction steps.


--
Kind regards,
Michael
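The consistency levels in the original question can be checked with the usual overlap rule: a read is guaranteed to contact at least one replica that acknowledged the write when R + W > RF. A minimal sketch, using the replication factor of 2 mentioned in this thread; the helper names are illustrative:

```python
def replicas_required(level: str, rf: int) -> int:
    """Replica acknowledgements required for a few common consistency levels."""
    table = {"ONE": 1, "TWO": 2, "QUORUM": rf // 2 + 1, "ALL": rf}
    if level not in table:
        raise ValueError(f"unsupported consistency level: {level}")
    required = table[level]
    if required > rf:
        raise ValueError(f"{level} needs {required} replicas but RF is {rf}")
    return required


def read_overlaps_write(write_level: str, read_level: str, rf: int) -> bool:
    """True when every read replica set must intersect every write replica set."""
    return replicas_required(write_level, rf) + replicas_required(read_level, rf) > rf


if __name__ == "__main__":
    rf = 2  # replication factor from the thread
    for w, r in (("ALL", "ONE"), ("ONE", "ONE"), ("QUORUM", "QUORUM")):
        print(f"write {w}, read {r}: overlap = {read_overlaps_write(w, r, rf)}")
```

Writing at ALL and reading at ONE satisfies the rule, so the read should reach a replica holding the update. If it still returns the old value, the timestamps that decide which version wins become the suspect, which is consistent with NTP turning out to be the problem.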