Re: Are Triggers in Cassandra 2.1.2 performace Hog??
DSE now has a queue that decouples the Cassandra insert from Solr indexing. A write blocks only if the queue fills up, and you can configure the size of the queue. So, to be clear, DSE no longer has the highlighted problem mentioned for ES.

-- Jack Krupansky

On Wed, Jan 7, 2015 at 9:46 AM, Ken Hancock wrote:
> When last I looked at Datastax Enterprise (DSE 3.0ish), it exhibits the
> same problem that you highlight, no different than your good idea of
> asynchronously pushing to ES.
>
> Each Cassandra write was indexed independently by each server in the
> replication group. If a node timed out or a mutation was dropped, that
> Solr node would have an out-of-sync index.
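The decoupling Jack describes in this message (writes hand documents to a queue and block only when it is full) can be pictured as a bounded producer/consumer queue. What follows is a minimal Python sketch of that general pattern, not DSE's actual implementation; `QUEUE_SIZE`, `index_document`, `indexer`, and `write` are hypothetical names introduced only for illustration.

```python
import queue
import threading

QUEUE_SIZE = 4  # hypothetical; stands in for the configurable queue size

index_queue = queue.Queue(maxsize=QUEUE_SIZE)
indexed = []  # stands in for the search index


def index_document(doc):
    """Hypothetical indexing step; a real one would call the search engine."""
    indexed.append(doc)


def indexer():
    """Drain the queue in the background until a None sentinel arrives."""
    while True:
        doc = index_queue.get()
        if doc is None:
            break
        index_document(doc)


def write(doc):
    """Store the row (omitted here), then hand it off for indexing.

    put() returns immediately while the queue has room and blocks only
    when the queue is full, which is the back-pressure described above.
    """
    index_queue.put(doc)


worker = threading.Thread(target=indexer)
worker.start()
for i in range(10):
    write({"id": i})
index_queue.put(None)  # tell the indexer to stop
worker.join()
print(len(indexed))  # 10
```

A larger `QUEUE_SIZE` absorbs longer indexing stalls before writers start to block, at the cost of more memory and a wider window in which the index lags the stored data.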
Re: Are Triggers in Cassandra 2.1.2 performace Hog??
+1. Don't use triggers.

On Wed, Jan 7, 2015 at 10:49 AM, Robert Coli wrote:
> I would not use triggers in production in their current form.
>
> =Rob

-- 
Jon Haddad
http://www.rustyrazorblade.com
twitter: rustyrazorblade
Re: Are Triggers in Cassandra 2.1.2 performace Hog??
On Wed, Jan 7, 2015 at 5:40 AM, Asit KAUSHIK wrote:
> We are trying to integrate elasticsearch with Cassandra and as the river
> plugin uses select * from any table it seems to be bad performance choice.
> So i was thinking of inserting into elasticsearch using Cassandra trigger.
> So i wanted your view does a Cassandra Trigger impacts the performance of
> read/Write of Cassandra.

I would not use triggers in production in their current form.

=Rob
Re: Are Triggers in Cassandra 2.1.2 performace Hog??
@Ken I support a lot of the DSE Search users and teach classes on it. As long as you're not dropping mutations, you're in sync. If you are dropping mutations, you're probably sized way too small anyway, and once you run repair (which you should be doing anyway when dropping mutations) you're back in sync. Because of that, I actually think the models work well together. FWIW, the improvement since 3.0 is MASSIVE (it's been what I'd call stable since 3.2.x, and we're on 4.6 now).

@Asit To answer the ES question: it's not really for me to say what the lag will be, or to advise on sizing ES, so that's probably more of a question for them.

On Wed, Jan 7, 2015 at 8:56 AM, Asit KAUSHIK wrote:
> What i intend to do is on every write i would push the code to
> elasticsearch using the Trigger. I know it would impact the Cassandra write
> but given that the WRITE is pretty performant on Cassandra would that lag
> be a big one.

-- 
Thanks,
Ryan Svihla
Re: Are Triggers in Cassandra 2.1.2 performace Hog??
Hi All,

What I intend to do is, on every write, push it to Elasticsearch using the trigger. I know it would impact the Cassandra write, but given that writes are pretty performant on Cassandra, would that lag be a big one?

Also, as far as I know, Solr has limitations with nested JSON documents, which Elasticsearch handles seamlessly, and hence it was our preference.

Please let me know your thoughts on this, as we are stuck, and I am looking into the streaming part of Cassandra in the hope that I can find something.

Regards
Asit
Re: Are Triggers in Cassandra 2.1.2 performace Hog??
When last I looked at Datastax Enterprise (DSE 3.0ish), it exhibited the same problem that you highlight, no different than your good idea of asynchronously pushing to ES.

Each Cassandra write was indexed independently by each server in the replication group. If a node timed out or a mutation was dropped, that Solr node would have an out-of-sync index. A Solr query such as a count(*) of users could return inconsistent results depending on which node you hit, since Solr didn't support Cassandra consistency levels.

I haven't seen any blog posts or docs as to whether this intrinsic mismatch between Cassandra's handling of eventual consistency and Solr has ever been resolved.

Ken

On Wed, Jan 7, 2015 at 9:05 AM, DuyHai Doan wrote:
> In your trigger, you can try to push the mutation asynchronously to ES but
> in this case it will mean managing a thread pool and all related issues.
>
> Not even mentioning atomicity issues like: what happen if the update to ES
> fails or the connection times out ? etc ...
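Ken's scenario in this message, where each replica maintains its own index and one of them misses a mutation, can be illustrated with a toy model. This is a minimal Python sketch under simplifying assumptions; `Replica`, `write`, and the node names are hypothetical and do not correspond to any Cassandra, DSE, or Solr API.

```python
class Replica:
    """One node: keeps its own local index of the rows it has applied."""

    def __init__(self, name):
        self.name = name
        self.index = set()

    def apply(self, user_id):
        self.index.add(user_id)

    def count_users(self):
        # A search query answered from this node's local index only.
        return len(self.index)


replicas = [Replica("n1"), Replica("n2"), Replica("n3")]


def write(user_id, dropped_on=()):
    """Send a write to every replica; nodes in dropped_on miss it."""
    for replica in replicas:
        if replica.name not in dropped_on:
            replica.apply(user_id)


write("alice")
write("bob", dropped_on={"n3"})  # n3 times out or drops the mutation
write("carol")

counts = {r.name: r.count_users() for r in replicas}
print(counts)  # {'n1': 3, 'n2': 3, 'n3': 2}
```

The count depends on which node answers the query, because nothing in this model reconciles the per-node indexes at read time.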
Re: Are Triggers in Cassandra 2.1.2 performace Hog??
Be very, very careful not to make blocking calls to ElasticSearch in your trigger; otherwise you will kill C* performance. The biggest danger of triggers in their current state is that they are on the write path.

In your trigger, you can try to push the mutation asynchronously to ES, but that means managing a thread pool and all the related issues.

Not to mention atomicity issues: what happens if the update to ES fails or the connection times out?

As an alternative to implementing the ES integration yourself, you can have a look at the Datastax Enterprise integration of Cassandra with Apache Solr (not free), or at some open-source alternatives like Stratio or TupleJump fork of Cassandra with Lucene integration.

On Wed, Jan 7, 2015 at 2:40 PM, Asit KAUSHIK wrote:
> HI All,
>
> We are trying to integrate elasticsearch with Cassandra and as the river
> plugin uses select * from any table it seems to be bad performance choice.
> So i was thinking of inserting into elasticsearch using Cassandra trigger.
> So i wanted your view does a Cassandra Trigger impacts the performance of
> read/Write of Cassandra.
>
> Also any other way you guys achieve this please guide me. I am struck on
> this .
>
> Regards
> Asit
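DuyHai's two points in this message, keeping blocking calls off the write path and the open question of what happens when the push to ES fails, can be sketched together. The following Python example shows only the general hand-off pattern and is not the Cassandra trigger API (triggers are written in Java); `on_write`, `send_to_search`, `worker`, and the queue bound are hypothetical names and values chosen for illustration.

```python
import queue
import threading

pending = queue.Queue(maxsize=1000)  # hypothetical bound
sent = []    # mutations that reached the search cluster
failed = []  # mutations that never reached it


def send_to_search(mutation):
    """Hypothetical stand-in for the slow network call to the search cluster."""
    if mutation.get("fail"):
        raise ConnectionError("search cluster unreachable")
    sent.append(mutation)


def on_write(mutation):
    """Called on the write path, so it must never block.

    put_nowait() raises queue.Full instead of waiting, so a slow search
    cluster cannot stall writes; the mutation is recorded as failed.
    """
    try:
        pending.put_nowait(mutation)
    except queue.Full:
        failed.append(mutation)


def worker():
    """Background thread that performs the slow, blocking calls."""
    while True:
        mutation = pending.get()
        if mutation is None:
            break
        try:
            send_to_search(mutation)
        except ConnectionError:
            # The write already succeeded; the index is now behind.
            failed.append(mutation)


thread = threading.Thread(target=worker)
thread.start()
on_write({"id": 1})
on_write({"id": 2, "fail": True})
on_write({"id": 3})
pending.put(None)  # stop the worker
thread.join()
print(len(sent), len(failed))  # 2 1
```

The `failed` list makes the atomicity problem visible: the write has already been accepted by the time the push fails, so something else (a retry, a replay, or a periodic resync) has to bring the index back in line.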