RE: Consistency Issues
It did, but I ran it again on one node, and that node never recovered. ☹

From: Robert Coli [mailto:rc...@eventbrite.com]
Sent: 02 October 2015 21:20
To: user@cassandra.apache.org
Subject: Re: Consistency Issues

FTR, running resetlocalschema on all nodes (especially simultaneously) seems likely to nuke all of your schema.

=Rob

This email (including any attachments) is proprietary to Aspect Software, Inc. and may contain information that is confidential. If you have received this message in error, please do not read, copy or forward this message. Please notify the sender immediately, delete it from your system and destroy any copies. You may not further disclose or distribute this email or its attachments.
Re: Consistency Issues
On Fri, Oct 2, 2015 at 1:32 AM, Walsh, Stephen wrote:

> Sorry for the late reply. I ran nodetool resetlocalschema on all nodes, but in the end it just removed all the schemas and crashed the applications.
>
> I need to reset and try again. I’ll try to get you the GC stats today ☺

FTR, running resetlocalschema on all nodes (especially simultaneously) seems likely to nuke all of your schema.

=Rob
RE: Consistency Issues
Using the following command:

sudo su cassandra -c "jstat -gccause 4162"

gave the output below (not sure if it will display correctly on the webpage). During load, though, we only see data moving between Eden and the survivor spaces, and the old gen never really grows.

     S0     S1      E      O      M    CCS    YGC   YGCT    FGC   FGCT    GCT LGCC                GCC
   0.00  70.57  48.69  26.29  97.86  96.62    119 14.087      2  0.100 14.187 Allocation Failure  No GC
   0.00  70.57  49.02  26.29  97.86  96.62    119 14.087      2  0.100 14.187 Allocation Failure  No GC
   0.00  70.57  78.38  26.29  97.86  96.62    119 14.087      2  0.100 14.187 Allocation Failure  No GC
   0.00  70.57  83.99  26.29  97.86  96.62    119 14.087      2  0.100 14.187 Allocation Failure  No GC
   0.00  70.57  90.07  26.29  97.86  96.62    119 14.087      2  0.100 14.187 Allocation Failure  No GC
   0.00  70.57  90.30  26.29  97.86  96.62    119 14.087      2  0.100 14.187 Allocation Failure  No GC
   0.00  70.57  90.40  26.29  97.86  96.62    119 14.087      2  0.100 14.187 Allocation Failure  No GC

From: Walsh, Stephen [mailto:stephen.wa...@aspect.com]
Sent: 02 October 2015 09:32
To: user@cassandra.apache.org
Subject: RE: Consistency Issues

Sorry for the late reply. I ran nodetool resetlocalschema on all nodes, but in the end it just removed all the schemas and crashed the applications. I need to reset and try again. I’ll try to get you the GC stats today ☺
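To see how long the collections in this output actually take, the cumulative counters can be turned into average pauses. The following is a minimal Python sketch, assuming the 13-column `jstat -gccause` layout shown above (S0 through GCT, then the free-text LGCC and GCC cause columns); the function names are hypothetical. With the values in the table (119 young collections totalling 14.087 s, and 2 full collections totalling 0.100 s), it gives about 0.118 s per young GC and 0.05 s per full GC. These are lifetime averages since JVM start, so they can hide occasional long pauses.

```python
from typing import Dict, Tuple

# Numeric columns printed by `jstat -gccause` in the output above (assumed layout).
NUMERIC_COLUMNS = ["S0", "S1", "E", "O", "M", "CCS",
                   "YGC", "YGCT", "FGC", "FGCT", "GCT"]


def parse_gccause_line(line: str) -> Dict[str, object]:
    """Parse one data row of `jstat -gccause` output.

    The first 11 fields are numeric. The remainder holds the free-text
    LGCC (last GC cause) and GCC (current GC cause) columns, which can
    contain spaces, so they are kept together as a single string.
    """
    fields = line.split()
    if len(fields) < len(NUMERIC_COLUMNS):
        raise ValueError(f"unexpected jstat row: {line!r}")
    row: Dict[str, object] = {
        name: float(value) for name, value in zip(NUMERIC_COLUMNS, fields)
    }
    row["causes"] = " ".join(fields[len(NUMERIC_COLUMNS):])
    return row


def average_pauses(row: Dict[str, object]) -> Tuple[float, float]:
    """Return (mean young-GC pause, mean full-GC pause) in seconds.

    YGCT and FGCT are cumulative since JVM start, so these are lifetime
    averages rather than the pauses seen during the current load.
    """
    young = row["YGCT"] / row["YGC"] if row["YGC"] else 0.0
    full = row["FGCT"] / row["FGC"] if row["FGC"] else 0.0
    return young, full


sample = ("0.00 70.57 48.69 26.29 97.86 96.62 119 14.087 2 0.100 14.187 "
          "Allocation Failure No GC")
row = parse_gccause_line(sample)
young, full = average_pauses(row)
print(f"young: {young:.3f} s/GC, full: {full:.3f} s/GC, causes: {row['causes']}")
```

Splitting on whitespace and taking a fixed number of leading numeric fields avoids having to rely on column widths, which vary between JVM versions and terminal settings.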
RE: Consistency Issues
Sorry for the late reply. I ran nodetool resetlocalschema on all nodes, but in the end it just removed all the schemas and crashed the applications. I need to reset and try again. I’ll try to get you the GC stats today ☺

From: Jonathan Haddad [mailto:j...@jonhaddad.com]
Sent: 01 October 2015 16:01
To: user@cassandra.apache.org
Subject: Re: Consistency Issues

You say that you don't think GC is your issue... but did you actually check? The reasons you suggest aren't very convincing. Can you provide your GC settings, and take a look at jstat -gccause?
http://docs.oracle.com/javase/7/docs/technotes/tools/share/jstat.html#gccause_option
Re: Consistency Issues
You say that you don't think GC is your issue... but did you actually check? The reasons you suggest aren't very convincing. Can you provide your GC settings, and take a look at jstat -gccause?

http://docs.oracle.com/javase/7/docs/technotes/tools/share/jstat.html#gccause_option

On Thu, Oct 1, 2015 at 4:50 AM Walsh, Stephen wrote:

> If you’re looking for the clean-up of the old gen in the JVM heap, it doesn’t happen.
>
> Objects in our new gen go through 15 collections before they're promoted to the old gen, and all our data has a TTL of only 10 seconds, so very little of it is sent to the old gen.
>
> Add in a heap size of 8 GB with a new gen size of 2 GB, and I don’t think GC is our issue.
>
> I’m more worried about error messages in the Cassandra log file that state:
>
> UnknownColumnFamilyException reading from socket; closing
> org.apache.cassandra.db.UnknownColumnFamilyException: Couldn't find cfId=cf411b50-6785-11e5-a435-e7be20c92086
>
> and
>
> cassandra OutboundTcpConnection.java:313 - error writing to Connection.
>
> But I really need to understand the best practice that Jack Krupansky mentioned (on the number of CFs). Does anyone have more information on this?
>
> Many thanks for all your help guys, keep it coming ☺
> Steve
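A single jstat sample only shows totals since the JVM started. One way to check whether GC is actually eating into request time during the load test is to take two samples a known interval apart and look at the difference. The following is a minimal Python sketch, assuming two rows of `jstat -gccause` counters captured during load (for example with `jstat -gccause <pid> 1000`, which prints a row every 1000 ms). The function name and the numbers in the usage lines are hypothetical and for illustration only; they are not measurements from this cluster.

```python
from typing import Dict


def gc_share(before: Dict[str, float], after: Dict[str, float],
             interval_s: float) -> Dict[str, float]:
    """Summarize GC activity between two jstat samples.

    `before` and `after` map the cumulative jstat counters
    (YGC, YGCT, FGC, FGCT) to their values; `interval_s` is the
    wall-clock time in seconds between the two samples.
    """
    young_count = after["YGC"] - before["YGC"]
    young_time = after["YGCT"] - before["YGCT"]
    full_count = after["FGC"] - before["FGC"]
    full_time = after["FGCT"] - before["FGCT"]
    return {
        "young_gcs": young_count,
        "full_gcs": full_count,
        # Mean young-GC pause within this window only.
        "mean_young_pause_s": young_time / young_count if young_count else 0.0,
        # Fraction of the window's wall-clock time that jstat attributes to GC.
        "gc_fraction": (young_time + full_time) / interval_s,
    }


# Hypothetical samples taken 60 s apart (illustrative values only).
t0 = {"YGC": 100, "YGCT": 12.0, "FGC": 2, "FGCT": 0.1}
t1 = {"YGC": 112, "YGCT": 13.5, "FGC": 2, "FGCT": 0.1}
print(gc_share(t0, t1, 60.0))
```

Looking at the delta over a window under load answers the question asked above more directly than the lifetime totals do. A single very long pause would still only show up as a jump in YGCT or FGCT between two consecutive rows, so a short interval is more revealing than a long one.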
Re: Consistency Issues
Well... I wasn't expecting that, as OpsCenter 5.2.1 and cqlsh in Cassandra 2.1.x both use the native protocol. I was expecting them to use different protocols, so I have no further ideas :(

Carlos Alonso | Software Engineer | @calonso <https://twitter.com/calonso>

On 1 October 2015 at 14:36, Walsh, Stephen wrote:

> Thanks Jake, I’ll test out 2.1.9 to see if it resolves the issue, and I’ll try “nodetool resetlocalschema” now to see if it helps.
>
> Cassandra is 2.1.6
> OpsCenter is 5.2.1
RE: Consistency Issues
You're running describe with CL quorum, aren't you? To see the inconsistency you'd have to check the system.schema_columnfamilies table on each node.

On Oct 1, 2015 8:07 AM, "Walsh, Stephen" wrote:

> No such thing as a stupid question ☺
>
> I know they exist on some nodes, but whether they replicated correctly is a different story. I’m checking this one now.
>
> OK, I hooked up OpsCenter to see what it was saying. Out of the 100 keyspaces created:
> 9 are missing one CF
> 2 are missing two CFs
> 1 is missing three CFs
>
> It looks like the replication of the tables did not complete to all nodes?
>
> Looking at the keyspace with 3 missing CFs on each of the 4 nodes (via CQLSH_HOST=x.x.x.x cqlsh and “Describe keyspace XXX;”):
> Node 1: has all CFs
> Node 2: has all CFs
> Node 3: has all CFs
> Node 4: has all CFs
>
> This is indeed very strange…
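Checking each node's own schema table and diffing the results is easy to script. The following is a minimal Python sketch, assuming you have already collected, for every node, the set of `(keyspace_name, columnfamily_name)` pairs returned by `SELECT keyspace_name, columnfamily_name FROM system.schema_columnfamilies;` while connected to that node (the `system` keyspace is node-local, so each node answers with its own copy). The function name and the sample data are hypothetical. For each node, it reports the tables that some other node knows about but this one does not.

```python
from typing import Dict, Set, Tuple

Table = Tuple[str, str]  # (keyspace_name, columnfamily_name)


def missing_tables(per_node: Dict[str, Set[Table]]) -> Dict[str, Set[Table]]:
    """For each node, return the tables other nodes have but it lacks.

    Nodes whose schema matches the union of all nodes are omitted.
    """
    # Union of every table reported by any node.
    all_tables: Set[Table] = set().union(*per_node.values()) if per_node else set()
    return {
        node: all_tables - tables
        for node, tables in per_node.items()
        if all_tables - tables  # only report nodes with gaps
    }


# Hypothetical data: node3 is missing one table in keyspace "ks42".
schemas = {
    "node1": {("ks42", "events"), ("ks42", "sessions")},
    "node2": {("ks42", "events"), ("ks42", "sessions")},
    "node3": {("ks42", "events")},
}
for node, gaps in sorted(missing_tables(schemas).items()):
    print(node, sorted(gaps))  # node3 [('ks42', 'sessions')]
```

Comparing against the union, rather than against one reference node, means the check still works when different nodes are each missing different tables.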
RE: Consistency Issues
Thanks Jake, I’ll test out 2.1.9 to see if it resolves the issue, and I’ll try “nodetool resetlocalschema” now to see if it helps.

Cassandra is 2.1.6
OpsCenter is 5.2.1

From: Jake Luciani [mailto:jak...@gmail.com]
Sent: 01 October 2015 14:00
To: user
Subject: Re: Consistency Issues

Onur, I was responding to Stephen's issue.
Re: Consistency Issues
Onur, I was responding to Stephen's issue.

On Thu, Oct 1, 2015 at 8:56 AM, Onur Yalazı wrote:

> Thank you Jake.
>
> The issue is that I do not have missing CFs, and upgrading beyond 2.1.3 is not a possibility because of the deprecation of CQL dialects. Our application uses Hector, and migrating to CQL3 would be a huge refactoring.

--
http://twitter.com/tjake
Re: Consistency Issues
Thank you Jake.

The issue is that I do not have missing CFs, and upgrading beyond 2.1.3 is not a possibility because of the deprecation of CQL dialects. Our application uses Hector, and migrating to CQL3 would be a huge refactoring.

On 01/10/15 15:48, Jake Luciani wrote:

> A couple of things to try:
> 1. nodetool resetlocalschema on the nodes with missing CFs. This will refresh the schema on the local node.
> 2. Upgrade to 2.1.9. There are some pretty major issues in 2.1.6 (nothing specific to this problem, but worth upgrading).
Re: Consistency Issues
A couple of things to try:
1. nodetool resetlocalschema on the nodes with missing CFs. This will refresh the schema on the local node.
2. Upgrade to 2.1.9. There are some pretty major issues in 2.1.6 (nothing specific to this problem, but worth upgrading).
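Elsewhere in the thread, Rob warns that running resetlocalschema on all nodes at once is likely to wipe the schema. A more cautious way to apply suggestion 1 is to reset one affected node at a time and wait for the cluster to report a single schema version before moving on. The following is a minimal Python sketch of that loop. The `nodetool` helper (wrapping `nodetool -h <host> resetlocalschema`) and the `schema_agrees` callable (for example, a check built on `nodetool describecluster`) are hypothetical, and the timeout and polling values are assumptions, not recommendations.

```python
import subprocess
import time
from typing import Callable, Iterable


def nodetool(host: str, command: str) -> None:
    """Hypothetical helper: run one nodetool command against `host`."""
    subprocess.run(["nodetool", "-h", host, command], check=True)


def rolling_reset(hosts: Iterable[str],
                  run_nodetool: Callable[[str, str], None],
                  schema_agrees: Callable[[], bool],
                  timeout_s: float = 300.0,
                  poll_s: float = 10.0) -> None:
    """Run `resetlocalschema` on one host at a time, never in parallel.

    After each reset, poll `schema_agrees()` until it returns True. If the
    schema has not converged within `timeout_s`, stop with an error so the
    same problem is not repeated on the next node.
    """
    for host in hosts:
        run_nodetool(host, "resetlocalschema")
        deadline = time.monotonic() + timeout_s
        while not schema_agrees():
            if time.monotonic() >= deadline:
                raise RuntimeError(
                    f"schema did not converge after resetting {host}")
            time.sleep(poll_s)
```

A call might look like `rolling_reset(["10.0.0.4"], nodetool, my_agreement_check)`, where the address and `my_agreement_check` are placeholders. Passing only the nodes that are actually missing tables, rather than the whole cluster, follows Jake's wording above. Stopping at the first node that fails to converge keeps a bad reset from spreading.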
Re: Consistency Issues
Which versions of Cassandra and OpsCenter are you using? Perhaps OpsCenter and your app are using CQL while cqlsh is using Thrift (or vice versa), and that's why you see different things depending on where you access from?

Carlos Alonso | Software Engineer | @calonso <https://twitter.com/calonso>

On 1 October 2015 at 13:06, Walsh, Stephen wrote:

> No such thing as a stupid question ☺
>
> I know they exist on some nodes, but whether they replicated correctly is a different story. I’m checking this one now.
>
> OK, I hooked up OpsCenter to see what it was saying. Out of the 100 keyspaces created:
> 9 are missing one CF
> 2 are missing two CFs
> 1 is missing three CFs
>
> It looks like the replication of the tables did not complete to all nodes?
>
> Looking at the keyspace with 3 missing CFs on each of the 4 nodes (via CQLSH_HOST=x.x.x.x cqlsh and “Describe keyspace XXX;”):
> Node 1: has all CFs
> Node 2: has all CFs
> Node 3: has all CFs
> Node 4: has all CFs
>
> This is indeed very strange…
RE: Consistency Issues
No such thing as a stupid question ☺

I know they exist on some nodes, but whether they replicated correctly is a different story. I’m checking this one now.

OK, I hooked up OpsCenter to see what it was saying. Out of the 100 keyspaces created:
9 are missing one CF
2 are missing two CFs
1 is missing three CFs

It looks like the replication of the tables did not complete to all nodes?

Looking at the keyspace with 3 missing CFs on each of the 4 nodes (via CQLSH_HOST=x.x.x.x cqlsh and “Describe keyspace XXX;”):
Node 1: has all CFs
Node 2: has all CFs
Node 3: has all CFs
Node 4: has all CFs

This is indeed very strange…

From: Carlos Alonso [mailto:i...@mrcalonso.com]
Sent: 01 October 2015 12:05
To: user@cassandra.apache.org
Subject: Re: Consistency Issues

And that's a stupid one, I know, but does the column you're trying to access actually exist?
Re: Consistency Issues
And that's a stupid one, I know, but does the column you're trying to access actually exist? Carlos Alonso | Software Engineer | @calonso <https://twitter.com/calonso> On 1 October 2015 at 11:09, Walsh, Stephen wrote: > I did think of that and they are all the same version ☺
RE: Consistency Issues
I did think of that and they are all the same version ☺ From: Carlos Alonso [mailto:i...@mrcalonso.com] Sent: 01 October 2015 10:11 To: user@cassandra.apache.org Subject: Re: Consistency Issues
Re: Consistency Issues
Hi Stephen. The UnknownColumnFamilyException made me think of a possible schema disagreement: could one of your nodes have a different schema version, so that you cannot reach quorum? Can you run nodetool describecluster and see if all nodes have the same schema version? Cheers! Carlos Alonso | Software Engineer | @calonso <https://twitter.com/calonso> On 1 October 2015 at 09:49, Walsh, Stephen wrote:
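Carlos's check can be scripted. The sketch below is a minimal Python example, assuming the 2.1-era `nodetool describecluster` layout (a `Schema versions:` heading followed by `version: [host, host]` lines, with unreachable hosts grouped under `UNREACHABLE`); the helper names are hypothetical, so compare the parser against your own output before relying on it. More than one version key, or any `UNREACHABLE` entry, means the nodes do not agree on the schema.

```python
import re
import subprocess

# A schema line looks like "<uuid>: [host, host]"; hosts that could not be
# reached are listed under the literal key "UNREACHABLE".
_VERSION_LINE = re.compile(r"^\s*([0-9a-fA-F-]{36}|UNREACHABLE):\s*\[(.*)\]\s*$")


def parse_schema_versions(output: str) -> dict[str, list[str]]:
    """Map each schema version (or UNREACHABLE) to the hosts that report it."""
    versions: dict[str, list[str]] = {}
    in_section = False
    for line in output.splitlines():
        if line.strip() == "Schema versions:":
            in_section = True
            continue
        match = _VERSION_LINE.match(line) if in_section else None
        if match:
            key, hosts = match.groups()
            versions[key] = [h.strip() for h in hosts.split(",") if h.strip()]
    return versions


def schema_in_agreement(versions: dict[str, list[str]]) -> bool:
    """True only when exactly one version is reported and no host is unreachable."""
    return len(versions) == 1 and "UNREACHABLE" not in versions


def run_describecluster() -> str:
    """Run nodetool locally; assumes it is on PATH and JMX is reachable."""
    result = subprocess.run(
        ["nodetool", "describecluster"], capture_output=True, text=True, check=True
    )
    return result.stdout
```

On a node, `schema_in_agreement(parse_schema_versions(run_describecluster()))` returns False whenever the cluster reports more than one schema version.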
RE: Consistency Issues
If you’re looking for the clean-up of the old gen in the JVM heap, it doesn’t happen. We have the new gen turning over 15 times before data is pushed to the old gen. All our data has a TTL of only 10 seconds, so very little data is sent to the old gen. Add in a heap size of 8GB with a new gen size of 2GB, and I don’t think GC is our issue. I’m more worried about error messages in the Cassandra log file that state: UnknownColumnFamilyException reading from socket; closing org.apache.cassandra.db.UnknownColumnFamilyException: Couldn't find cfId=cf411b50-6785-11e5-a435-e7be20c92086 and cassandra OutboundTcpConnection.java:313 - error writing to Connection. But I really need to understand the best practice on the number of CFs that Jack Krupansky mentioned. Does anyone have more information on this? Many thanks for all your help, guys, keep it coming ☺ Steve From: Ricardo Sancho [mailto:sancho.rica...@gmail.com] Sent: 01 October 2015 09:39 To: user@cassandra.apache.org Subject: RE: Consistency Issues
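To see which tables the `Couldn't find cfId=...` errors refer to, and when those IDs were generated, a small Python sketch can count the IDs in a log and decode them. It assumes log lines shaped like the ones quoted in this thread, and that the cfId is a version-1 (time-based) UUID, which the `1` at the start of `11e5` indicates; the helper names are hypothetical. The decoded time is when the UUID was generated, so comparing it with the error timestamps, and with the table IDs each node reports (in 2.1 we believe `system.schema_columnfamilies` exposes these as `cf_id`; verify on your version), hints at whether a table was created on one node before its schema reached the others.

```python
import re
import uuid
from collections import Counter
from datetime import datetime, timedelta, timezone

# Version-1 UUID timestamps count 100 ns intervals from 1582-10-15 (RFC 4122).
_GREGORIAN_EPOCH = datetime(1582, 10, 15, tzinfo=timezone.utc)
_CFID = re.compile(r"Couldn't find cfId=([0-9a-fA-F-]{36})")


def missing_cf_ids(log_lines):
    """Count how often each unknown cfId appears in Cassandra log lines."""
    counts = Counter()
    for line in log_lines:
        for cf_id in _CFID.findall(line):
            counts[cf_id.lower()] += 1
    return counts


def uuid1_created_at(cf_id: str) -> datetime:
    """Return the UTC time embedded in a version-1 (time-based) UUID."""
    u = uuid.UUID(cf_id)
    if u.version != 1:
        raise ValueError(f"{cf_id} is not a time-based UUID")
    # u.time is in 100 ns units; // 10 converts to microseconds.
    return _GREGORIAN_EPOCH + timedelta(microseconds=u.time // 10)
```

For the cfId quoted above, `uuid1_created_at` returns a time on 2015-09-30 (UTC).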
RE: Consistency Issues
Can you tell us how much time your GCs are taking? Do you see any especially long ones? On 1 Oct 2015 09:37, "Walsh, Stephen" wrote:
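Ricardo's question can be partly answered from `jstat` alone: the `YGC`/`YGCT` and `FGC`/`FGCT` columns are cumulative collection counts and total pause seconds, so two rows taken some seconds apart give the average pause per collection in between. The Python sketch below (hypothetical helper names) picks those columns by name, so it should work with `-gcutil` or `-gccause` rows. It only yields averages; sampling at a short interval (for example `jstat -gccause <pid> 1000`) narrows the window, but individual long pauses are better read from GC logs.

```python
from dataclasses import dataclass


@dataclass
class GcSample:
    """Cumulative GC counters taken from one jstat row."""
    ygc: int     # young-generation collection count (YGC)
    ygct: float  # total young-generation GC time in seconds (YGCT)
    fgc: int     # full collection count (FGC)
    fgct: float  # total full GC time in seconds (FGCT)


def parse_jstat(header: str, row: str) -> GcSample:
    """Pick the YGC/YGCT/FGC/FGCT columns by name from a jstat header and row."""
    # The multi-word LGCC/GCC cause strings at the end of -gccause rows
    # misalign only those trailing columns, which are not used here.
    values = dict(zip(header.split(), row.split()))
    return GcSample(int(values["YGC"]), float(values["YGCT"]),
                    int(values["FGC"]), float(values["FGCT"]))


def mean_pause_ms(before: GcSample, after: GcSample) -> dict[str, float]:
    """Average pause per collection between two samples, in milliseconds."""
    young = after.ygc - before.ygc
    full = after.fgc - before.fgc
    return {
        "young_avg_ms": 1000 * (after.ygct - before.ygct) / young if young else 0.0,
        "full_avg_ms": 1000 * (after.fgct - before.fgct) / full if full else 0.0,
    }
```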
RE: Consistency Issues
There is no load balancer in front of Cassandra; it’s in front of our application. Everyone seems hung up on this point, but it’s not the root cause of the inconsistency issue. Can anyone verify the best practice for the number of CFs? From: Robert Coli [mailto:rc...@eventbrite.com] Sent: 30 September 2015 18:45 To: user@cassandra.apache.org Subject: Re: Consistency Issues
Re: Consistency Issues
On Wed, Sep 30, 2015 at 9:06 AM, Walsh, Stephen wrote: > > We never had these issue with our first run. Its only when we added > another 25% of writes. > As Jack said, you are probably pushing your GC over a threshold, leading to long pause times and inability to meet quorum. As Sebastian said, you probably shouldn't need a load balancer in front of Cassandra. =Rob
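The arithmetic behind Rob's point, and behind the driver error quoted in the original post ("2 replica were required but only 1 acknowledged the write"), fits in a few lines. A quorum is floor(RF / 2) + 1 replicas, where for LOCAL_QUORUM RF is the replication factor of the local data center. With replication 3 that is 2, so one slow or paused replica is tolerated but two are not. The node count appears nowhere: adding nodes spreads partitions over more machines, but each write still needs 2 of its 3 replicas to acknowledge in time. This Python sketch uses hypothetical helper names.

```python
def quorum(replication_factor: int) -> int:
    """Replicas that must acknowledge for (LOCAL_)QUORUM: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def tolerable_slow_replicas(replication_factor: int) -> int:
    """Replicas of one partition that may be down or paused before QUORUM fails."""
    return replication_factor - quorum(replication_factor)


def write_succeeds(replication_factor: int, acks: int) -> bool:
    """True if `acks` replica acknowledgements meet the quorum requirement."""
    return acks >= quorum(replication_factor)


if __name__ == "__main__":
    for rf in (1, 2, 3, 5):
        print(f"RF={rf}: quorum={quorum(rf)}, "
              f"tolerates {tolerable_slow_replicas(rf)} slow replica(s)")
```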
RE: Consistency Issues
Many thanks for your reply, Sebastian. But the load balancer is being used with our applications, not with Cassandra. It just allows us to increase the throughput to Cassandra: Our Generation Tool -> Load Balancer -> Our Processing Application -> Cassandra. Sebastian, is Jack correct about best practices for the number of CFs? From: Sebastian Estevez [mailto:sebastian.este...@datastax.com] Sent: 30 September 2015 17:29 To: user@cassandra.apache.org Subject: Re: Consistency Issues
Re: Consistency Issues
Can you provide exact details on where your load balancer is? Like Michael said, you shouldn't need one between your client and the c* cluster if you're using a DataStax driver. All the best, Sebastián Estévez Solutions Architect | 954 905 8615 | sebastian.este...@datastax.com DataStax is the fastest, most scalable distributed database technology, delivering Apache Cassandra to the world’s most innovative enterprises. Datastax is built to be agile, always-on, and predictably scalable to any size. With more than 500 customers in 45 countries, DataStax is the database technology and transactional backbone of choice for the worlds most innovative companies such as Netflix, Adobe, Intuit, and eBay. On Wed, Sep 30, 2015 at 12:06 PM, Walsh, Stephen wrote:
RE: Consistency Issues
Many thanks all, The load balancers are only between our own nodes and are not a middle-man to Cassandra. It’s just so we can push more data into Cassandra. The only reason we are not using 2.1.9 is time; we haven’t had time to test upgrades. I wasn’t able to find any best practices for the number of CFs; where do you see this documented? I see a lot of comments on 1,000 CFs vs 1,000 keyspaces. The errors occur a few times a second, about 10 or so, and they are constant. Our TTL is 10 seconds on data, with gc_grace_seconds set to 0 on each CF. We don’t seem to get any OOM errors. We never had these issues with our first run; it’s only when we added another 25% of writes. Many thanks for taking the time to reply, Jack From: Jack Krupansky [mailto:jack.krupan...@gmail.com] Sent: 30 September 2015 16:53 To: user@cassandra.apache.org Subject: Re: Consistency Issues
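One way to act on Jack's suggestion of re-running the load with 100 or 50 tables is to generate a reduced schema that keeps the other settings Stephen describes (replication 3, a 10-second TTL, `gc_grace_seconds` 0). The Python sketch below is illustrative only: the keyspace and table names, the two-column layout and `SimpleStrategy` are assumptions; swap in `NetworkTopologyStrategy` and the real table definitions if that is what production uses.

```python
def reduced_schema_cql(keyspaces: int, tables_per_keyspace: int,
                       replication_factor: int = 3, ttl_seconds: int = 10,
                       gc_grace_seconds: int = 0) -> list[str]:
    """Generate CQL for a reduced-size test schema (hypothetical names)."""
    statements = []
    for k in range(keyspaces):
        ks = f"loadtest_ks{k}"
        # Doubled braces produce the literal map braces CQL expects.
        statements.append(
            f"CREATE KEYSPACE IF NOT EXISTS {ks} WITH replication = "
            f"{{'class': 'SimpleStrategy', 'replication_factor': {replication_factor}}};"
        )
        for t in range(tables_per_keyspace):
            statements.append(
                f"CREATE TABLE IF NOT EXISTS {ks}.events{t} ("
                "id timeuuid PRIMARY KEY, payload text) "
                f"WITH default_time_to_live = {ttl_seconds} "
                f"AND gc_grace_seconds = {gc_grace_seconds};"
            )
    return statements
```

For example, `reduced_schema_cql(20, 5)` yields 20 keyspaces with 5 tables each (100 tables); the statements can be written to a file and applied with `cqlsh -f`. Applying them from a single client and checking schema agreement before starting the load seems prudent given the UnknownColumnFamilyException in this thread.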
Re: Consistency Issues
More than "low hundreds" (200 or 300 max, and preferably under 100) of tables/column families is not exactly a recommended best practice. You may be able to get it to work, but probably only with very heavy tuning (i.e., lots of time spent experimenting with options) on your own part. IOW, there is no quick and easy solution. The only immediate issue that comes to mind is that you may be hitting GC pauses due to the large heap size and high volume. How frequently are these errors occurring? For example, how much data can you load before the first one pops up, and are they then frequent/constant or only occasional/rare? Can you test whether you see similar timeouts with, say, only 100 or 50 tables? At least that might isolate whether the issue relates at all to the number of tables vs. raw data rate or GC pauses. Sometimes you can reduce or eliminate GC pauses by reducing the heap so that it is only modestly above the minimum required to avoid OOM.
-- Jack Krupansky
On Wed, Sep 30, 2015 at 11:22 AM, Walsh, Stephen wrote:
> More information,
>
> I've just set up an NTP server to rule out any timing issues.
> And I also see this in the Cassandra node log files:
>
> MessagingService-Incoming-/172.31.22.4] 2015-09-30 15:19:14,769 IncomingTcpConnection.java:97 - UnknownColumnFamilyException reading from socket; closing
> org.apache.cassandra.db.UnknownColumnFamilyException: Couldn't find cfId=cf411b50-6785-11e5-a435-e7be20c92086
>
> Any idea what this is related to?
> All these tests are run with a clean setup of Cassandra nodes followed by a nodetool repair, before any data hits them.
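Jack's suggestion of rerunning the same load against only 100 or 50 tables is easy to script so that only the table count changes between runs. The sketch below only generates CQL DDL strings; the keyspace and table names (`loadtest_ks<k>`, `events_<n>`), the datacenter name `dc1` and the two-column table layout are hypothetical placeholders, and actually executing the statements (with cqlsh or the Java driver) is left out.

```python
def build_test_schema(num_tables, num_keyspaces=1, dc="dc1", rf=3):
    """Return CQL DDL that spreads num_tables tables round-robin over num_keyspaces keyspaces.

    All names (loadtest_ks<k>, events_<n>) and the datacenter name "dc1" are hypothetical.
    """
    if num_tables < 1 or num_keyspaces < 1:
        raise ValueError("need at least one table and one keyspace")
    statements = []
    for k in range(num_keyspaces):
        # NetworkTopologyStrategy, so LOCAL_QUORUM behaves as in a datacenter-aware setup.
        statements.append(
            f"CREATE KEYSPACE IF NOT EXISTS loadtest_ks{k} WITH replication = "
            f"{{'class': 'NetworkTopologyStrategy', '{dc}': {rf}}};"
        )
    for t in range(num_tables):
        keyspace = f"loadtest_ks{t % num_keyspaces}"
        statements.append(
            f"CREATE TABLE IF NOT EXISTS {keyspace}.events_{t} (id uuid PRIMARY KEY, payload text);"
        )
    return statements


# Same replication settings, three table counts to compare under an identical write load.
for tables in (50, 100, 450):
    ddl = build_test_schema(tables, num_keyspaces=max(1, tables // 5))
    print(tables, "tables ->", len(ddl), "statements")
```

Keeping the replication settings fixed and varying only the table count is what lets the comparison separate "too many tables" from "too much write volume".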
Re: Consistency Issues
What client are you using? The official Java and Python clients should not have a load balancer between them and the C* nodes, AFAIK. Why aren't you using 2.1.9? Have you checked for schema agreement amongst all nodes?
ml
On Wed, Sep 30, 2015 at 11:22 AM, Walsh, Stephen wrote:
> More information,
>
> I've just set up an NTP server to rule out any timing issues.
> And I also see this in the Cassandra node log files:
>
> MessagingService-Incoming-/172.31.22.4] 2015-09-30 15:19:14,769 IncomingTcpConnection.java:97 - UnknownColumnFamilyException reading from socket; closing
> org.apache.cassandra.db.UnknownColumnFamilyException: Couldn't find cfId=cf411b50-6785-11e5-a435-e7be20c92086
>
> Any idea what this is related to?
> All these tests are run with a clean setup of Cassandra nodes followed by a nodetool repair, before any data hits them.
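Schema agreement can be checked with `nodetool describecluster`, which lists every schema version UUID together with the nodes reporting it. The parser below is a sketch that assumes that output layout (a `Schema versions:` section with `<uuid>: [ip, ip]` lines and an optional `UNREACHABLE` entry); the sample text and addresses are made up. More than one version means some nodes may not know about a table the others already use, which is the situation an `UnknownColumnFamilyException` reports.

```python
import re

# Matches e.g. "\t\t2207c2a9-...: [10.0.0.1, 10.0.0.2]" or "\t\tUNREACHABLE: [10.0.0.9]".
VERSION_LINE = re.compile(r"^\s*([0-9a-fA-F-]{36}|UNREACHABLE):\s*\[(.*)\]\s*$")


def parse_schema_versions(describecluster_output):
    """Map each schema version (or UNREACHABLE) to the node addresses reporting it."""
    versions = {}
    in_section = False
    for line in describecluster_output.splitlines():
        if line.strip() == "Schema versions:":
            in_section = True
            continue
        if not in_section:
            continue
        match = VERSION_LINE.match(line)
        if match:
            hosts = [h.strip() for h in match.group(2).split(",") if h.strip()]
            versions[match.group(1)] = hosts
    return versions


def schema_in_agreement(versions):
    """True only if exactly one schema version exists and no node is unreachable."""
    reachable = {v: h for v, h in versions.items() if v != "UNREACHABLE"}
    return len(reachable) == 1 and "UNREACHABLE" not in versions


# Illustrative output only: one node (10.0.0.4) is on a different schema version.
SAMPLE = """Cluster Information:
\tName: Test Cluster
\tSnitch: org.apache.cassandra.locator.DynamicEndpointSnitch
\tPartitioner: org.apache.cassandra.dht.Murmur3Partitioner
\tSchema versions:
\t\t2207c2a9-f598-3971-986b-2926e09e239d: [10.0.0.1, 10.0.0.2, 10.0.0.3]
\t\t59adb24e-f3cd-3e02-97f0-5b395827453f: [10.0.0.4]
"""

versions = parse_schema_versions(SAMPLE)
print(len(versions), "schema versions; in agreement:", schema_in_agreement(versions))
```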
RE: Consistency Issues
More information,
I've just set up an NTP server to rule out any timing issues. And I also see this in the Cassandra node log files:
MessagingService-Incoming-/172.31.22.4] 2015-09-30 15:19:14,769 IncomingTcpConnection.java:97 - UnknownColumnFamilyException reading from socket; closing
org.apache.cassandra.db.UnknownColumnFamilyException: Couldn't find cfId=cf411b50-6785-11e5-a435-e7be20c92086
Any idea what this is related to? All these tests are run with a clean setup of Cassandra nodes followed by a nodetool repair, before any data hits them.
From: Walsh, Stephen [mailto:stephen.wa...@aspect.com] Sent: 30 September 2015 15:17 To: user@cassandra.apache.org Subject: Consistency Issues
Hi there,
We are having some issues with consistency. I'll try my best to explain.
We have an application that was able to sustain:
Write ~1000 p/s
Read ~300 p/s
Total CFs created: 400
Total keyspaces created: 80
on a 4-node Cassandra cluster with:
Version: 2.1.6
Replication: 3
Consistency (read & write): LOCAL_QUORUM
Cores: 4
RAM: 15 GB
Heap size: 8 GB
This was fine and worked, but was pushing our application to the max.
Next we added a load balancer (HAProxy) to our application. So now we have 3 of our application nodes talking to 4 Cassandra nodes with a load of:
Write ~1250 p/s
Read 0 p/s
Total CFs created: 450
Total keyspaces created: 100
In our application we now see:
Cassandra timeout during write query at consistency LOCAL_QUORUM (2 replica were required but only 1 acknowledged the write)
(we are using Java Cassandra driver 2.1.6)
So we increased the number of Cassandra nodes to 5, then 6, and each time got the same replication error.
So then we doubled the spec of every node to:
8 cores
30 GB RAM
Heap size: 15 GB
And we still get this replication error (2 replica were required but only 1 acknowledged the write).
We know that when we introduce the HAProxy load balancer with 3 of our nodes, it hits Cassandra 3 times quicker. But we've now increased the Cassandra spec nearly threefold, for only an extra 250 writes p/s, and it still doesn't work.
We're having a hard time finding out why replication is an issue with a cluster of this size.
We tried to get OpsCenter working to monitor the nodes, but due to the number of CFs in Cassandra the datastax-agent takes 90% of the CPU on every node.
Any suggestion / recommendation would be very welcome.
Regards
Stephen Walsh
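The driver error follows from quorum arithmetic. LOCAL_QUORUM needs floor(RF / 2) + 1 acknowledgements from replicas in the local datacenter, so with a replication factor of 3 it needs 2, which is the "2 replica were required" in the message; "only 1 acknowledged" means two of the three replicas did not answer within the write timeout. The sketch assumes the replication factor of 3 applies to the local datacenter. Since node count does not appear in the formula, adding nodes alone would not be expected to change how many slow replicas a single write can survive.

```python
def quorum(replication_factor):
    """Acknowledgements a QUORUM / LOCAL_QUORUM request needs: floor(RF / 2) + 1."""
    if replication_factor < 1:
        raise ValueError("replication factor must be at least 1")
    return replication_factor // 2 + 1


def tolerated_slow_replicas(replication_factor):
    """Replicas that may be down or too slow before a quorum request times out."""
    return replication_factor - quorum(replication_factor)


# Only the replication factor matters here: growing the cluster from 4 to 6 nodes
# spreads partitions over more machines, but each write still goes to RF replicas.
for rf in (2, 3, 5):
    print(f"RF={rf}: quorum={quorum(rf)}, tolerates {tolerated_slow_replicas(rf)} slow replica(s)")
```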
Re: Consistency Issues
It looks like NTP was the problem. Thanks for the solution!!!
On Wed, May 13, 2015 at 9:20 AM, Robert Wille wrote:
> Timestamps have millisecond granularity. If you make multiple writes within the same millisecond, then the outcome is not deterministic.
>
> Also, make sure you are running ntp. Clock skew will manifest itself similarly.
-- Jared Rodriguez
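Clock skew matters because of how Cassandra reconciles versions of the same cell: the highest write timestamp wins, not the most recent arrival. The sketch below is a simplified model, not Cassandra's code. It assumes timestamps are assigned from the coordinator's clock (server-side timestamps), uses made-up clock offsets, and breaks equal timestamps by comparing values as a simplification.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    value: str
    timestamp_us: int  # write timestamp in microseconds


def reconcile(a, b):
    """Last-write-wins: the higher timestamp wins, whatever order the writes arrived in.

    Equal timestamps are broken by comparing values (a simplification of the real rules).
    """
    if a.timestamp_us != b.timestamp_us:
        return a if a.timestamp_us > b.timestamp_us else b
    return a if a.value >= b.value else b


def coordinator_timestamp(true_time_ms, clock_offset_ms):
    """Timestamp (in microseconds) assigned by a coordinator whose clock is off by clock_offset_ms."""
    return (true_time_ms + clock_offset_ms) * 1000


# Hypothetical timeline: the insert goes through a node whose clock runs 50 ms fast;
# the update arrives 20 ms later (real time) through a node with a correct clock.
insert = Cell("v1", coordinator_timestamp(true_time_ms=1_000, clock_offset_ms=50))
update = Cell("v2", coordinator_timestamp(true_time_ms=1_020, clock_offset_ms=0))
print(reconcile(insert, update).value)  # prints "v1": the earlier insert shadows the update
```

With synchronized clocks (both offsets 0) the same two writes reconcile to "v2", which is the behaviour the application expected.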
Re: Consistency Issues
Timestamps have millisecond granularity. If you make multiple writes within the same millisecond, then the outcome is not deterministic.
Also, make sure you are running ntp. Clock skew will manifest itself similarly.
On May 13, 2015, at 3:47 AM, Jared Rodriguez <jrodrig...@kitedesk.com> wrote:
> Thanks for the feedback. We have dug in deeper, upgraded to Cassandra 2.0.14, and are seeing the same issue. What appears to be happening is that if a record is initially written, the first read is fine. But if we immediately update that record with a second write, the second read is problematic.
>
> We have a 4-node cluster and a replication factor of 2. What seems to be happening is that on the initial write the record is sent to nodes A and B. If a second write (update) of the record occurs while the record is in the memtable and not yet written to the SSTable of A or B, the next read returns nothing.
>
> We are continuing to dig in and get as much detail as possible before opening this as a JIRA.
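The first point can be illustrated directly: a clock read in milliseconds and scaled to microseconds gives two writes in the same millisecond the same timestamp, so which one "wins" no longer depends on which was sent last. One client-side mitigation is a generator that never hands out the same value twice; the class below is a hypothetical sketch of that idea (its timestamps could be supplied through CQL's `USING TIMESTAMP` clause). It only orders writes issued by one process and does nothing about skew between machines, so NTP is still needed.

```python
import threading
import time


class MonotonicMicrosGenerator:
    """Hands out strictly increasing microsecond timestamps from a millisecond clock."""

    def __init__(self, clock_ms=None):
        # Callable returning wall-clock time in milliseconds (defaults to the system clock).
        self._clock_ms = clock_ms or (lambda: int(time.time() * 1000))
        self._last = 0
        self._lock = threading.Lock()

    def next_timestamp(self):
        with self._lock:
            candidate = self._clock_ms() * 1000
            # Within the same millisecond (or if the clock steps back), bump by 1 microsecond.
            self._last = candidate if candidate > self._last else self._last + 1
            return self._last


def frozen_clock():
    """Simulates two writes landing in the same millisecond."""
    return 1_431_500_000_123


naive = [frozen_clock() * 1000 for _ in range(2)]
gen = MonotonicMicrosGenerator(clock_ms=frozen_clock)
ordered = [gen.next_timestamp() for _ in range(2)]
print(naive[0] == naive[1], ordered[0] < ordered[1])  # prints: True True
```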
Re: Consistency Issues
Thanks for the feedback. We have dug in deeper, upgraded to Cassandra 2.0.14, and are seeing the same issue. What appears to be happening is that if a record is initially written, the first read is fine. But if we immediately update that record with a second write, the second read is problematic. We have a 4-node cluster and a replication factor of 2. What seems to be happening is that on the initial write the record is sent to nodes A and B. If a second write (update) of the record occurs while the record is in the memtable and not yet written to the SSTable of A or B, the next read returns nothing. We are continuing to dig in and get as much detail as possible before opening this as a JIRA.
On Tue, May 12, 2015 at 6:51 PM, Robert Coli wrote:
> And if you do file such a JIRA, please let the list know the JIRA URL, to close the loop!
>
> =Rob
-- Jared Rodriguez
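As a rough model of the read path behind this hypothesis, a read merges the row's cells from the memtable and any SSTables and keeps the newest timestamp per column, so whether the update has been flushed yet should not by itself change the answer; an update carrying a lower timestamp, however, is silently shadowed. The sketch below is a simplified model with made-up data, not Cassandra's storage engine, and it does not model deletions (tombstones).

```python
def merge_read(*sources):
    """Merge {column: (timestamp_us, value)} maps, e.g. one memtable and several SSTables.

    The newest timestamp wins per column; which source a cell came from does not matter.
    On equal timestamps this sketch keeps the first one seen (a simplification).
    """
    merged = {}
    for source in sources:
        for column, (timestamp_us, value) in source.items():
            if column not in merged or timestamp_us > merged[column][0]:
                merged[column] = (timestamp_us, value)
    return {column: value for column, (_, value) in merged.items()}


sstable = {"status": (1_000_000, "created")}   # flushed insert
memtable = {"status": (2_000_000, "updated")}  # unflushed update
skewed_memtable = {"status": (900_000, "updated")}  # update stamped by a clock that runs behind

print(merge_read(sstable, memtable), merge_read(memtable, sstable))
print(merge_read(sstable, skewed_memtable))
```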
Re: Consistency Issues
On Tue, May 12, 2015 at 12:35 PM, Michael Shuler wrote:
>> This is a 4 node cluster running Cassandra 2.0.6
>
> Can you reproduce the same issue on 2.0.14? (or better yet, the cassandra-2.0 branch HEAD, which will soon ship 2.0.15) If you get the same results, please open a JIRA with the reproduction steps.
And if you do file such a JIRA, please let the list know the JIRA URL, to close the loop!
=Rob
Re: Consistency Issues
On 05/12/2015 04:50 AM, Jared Rodriguez wrote:
> I have a specific update and query that I need to ensure has strong consistency. To that end, when I do the write, I set the consistency level to ALL. Shortly afterwards, I do a query for that record with a consistency of ONE and somehow get back stale data.
>
> This is a 4 node cluster running Cassandra 2.0.6
Can you reproduce the same issue on 2.0.14? (or better yet, the cassandra-2.0 branch HEAD, which will soon ship 2.0.15) If you get the same results, please open a JIRA with the reproduction steps.
-- Kind regards, Michael
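The expectation of strong consistency here rests on the usual overlap argument: if the replicas that acknowledge a write (W) plus the replicas consulted by a read (R) exceed the replication factor, every read touches at least one replica that took the write. The sketch checks that inequality for the setup described (RF 2, write at ALL, read at ONE, so 2 + 1 > 2); it covers only ONE, QUORUM and ALL in a single datacenter. The guarantee also assumes replicas order versions correctly, which is exactly what clock skew (the problem Jared eventually attributed to NTP) breaks.

```python
def acks_required(level, replication_factor):
    """Replica acknowledgements needed for a consistency level (single datacenter only)."""
    needed = {
        "ONE": 1,
        "QUORUM": replication_factor // 2 + 1,
        "ALL": replication_factor,
    }
    if level not in needed:
        raise ValueError(f"level not covered by this sketch: {level}")
    return needed[level]


def read_overlaps_write(write_level, read_level, replication_factor):
    """True when every read replica set must share a node with every write replica set (R + W > RF)."""
    w = acks_required(write_level, replication_factor)
    r = acks_required(read_level, replication_factor)
    return r + w > replication_factor


# Setup from the thread: RF = 2, write at ALL, read at ONE.
print(read_overlaps_write("ALL", "ONE", 2))  # prints True
```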