Re: A Big Data Trifecta: Storm, Kafka and Cassandra
--- On Sat, 8/4/12, Brian O'Neill wrote:

> From: Brian O'Neill
> Subject: A Big Data Trifecta: Storm, Kafka and Cassandra
> To: user@cassandra.apache.org
> Date: Saturday, August 4, 2012, 4:41 AM
>
> Philip,
>
> I figured I would reply via blog post. =)
> http://brianoneill.blogspot.com/2012/08/a-big-data-trifecta-storm-kafka-and.html

Brian -- thanks again for this. It's always great to get a reference to another approach.
Re: Secondary index impact on write performance
Thanks. That was what I expected, but wanted to confirm.

On Aug 4, 2012 11:24 AM, "Dave Brosius" wrote:

> There is a second (system managed) column family for each secondary index,
> so any write to a field that is indexed causes two writes, one to the main
> column family, and another to the index column family, where in this index
> column family the key is the value of the secondary column, and the value
> is the key of the original row.
>
> On 08/04/2012 11:40 AM, David McNelis wrote:
>
>> Morning,
>>
>> Was reading up on secondary indexes and on the Datastax post about them,
>> it mentions the additional management overhead, and also that if you alter
>> an existing column family, that data will be updated in the background.
>> But how do secondary indexes affect write performance?
>>
>> If the answer is "it doesn't", then how do brand new records get located
>> by a subsequent indexed query?
>>
>> If someone has a link to a post with some of this info, that would be
>> awesome.
>>
>> David
Re: Secondary index impact on write performance
There is a second (system managed) column family for each secondary index, so any write to a field that is indexed causes two writes, one to the main column family, and another to the index column family, where in this index column family the key is the value of the secondary column, and the value is the key of the original row.

On 08/04/2012 11:40 AM, David McNelis wrote:

> Morning,
>
> Was reading up on secondary indexes and on the Datastax post about them,
> it mentions the additional management overhead, and also that if you alter
> an existing column family, that data will be updated in the background.
> But how do secondary indexes affect write performance?
>
> If the answer is "it doesn't", then how do brand new records get located
> by a subsequent indexed query?
>
> If someone has a link to a post with some of this info, that would be
> awesome.
>
> David
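To make the double write described above concrete, here is a minimal in-memory sketch in Python. It is a toy model only, not Cassandra code: the class `IndexedTable` and its methods `put` and `query_by_index` are hypothetical names, and the handling of stale index entries is a simplifying assumption. It shows one write going to the main column family and, when the column is indexed, a second write going to an index structure keyed by the column's value and holding the original row keys.

```python
from collections import defaultdict


class IndexedTable:
    """Toy model of a column family with one secondary index.

    Hypothetical illustration only; this is not how Cassandra is implemented.
    """

    def __init__(self, indexed_column):
        self.indexed_column = indexed_column
        self.main = {}                  # row key -> {column name: value}
        self.index = defaultdict(set)   # indexed value -> set of row keys
        self.write_count = 0            # writes to either structure

    def put(self, row_key, column, value):
        row = self.main.setdefault(row_key, {})
        if column == self.indexed_column:
            old = row.get(column)
            if old is not None and old != value:
                # Assumption: drop the stale index entry on overwrite.
                self.index[old].discard(row_key)
            # Second write: index key is the column value,
            # index value is the key of the original row.
            self.index[value].add(row_key)
            self.write_count += 1
        # First write: the main column family.
        row[column] = value
        self.write_count += 1

    def query_by_index(self, value):
        # Look up row keys by indexed value without creating empty entries.
        return sorted(self.index.get(value, ()))
```

In this sketch, `put("user1", "state", "PA")` on a table indexed on `state` increments `write_count` by two, while a write to a non-indexed column increments it by one. Because the index entry is written as part of the same `put`, a brand new row is visible to `query_by_index` immediately, which is the point of David's second question.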
Secondary index impact on write performance
Morning,

Was reading up on secondary indexes and on the Datastax post about them, it mentions the additional management overhead, and also that if you alter an existing column family, that data will be updated in the background. But how do secondary indexes affect write performance?

If the answer is "it doesn't", then how do brand new records get located by a subsequent indexed query?

If someone has a link to a post with some of this info, that would be awesome.

David
A Big Data Trifecta: Storm, Kafka and Cassandra
Philip,

I figured I would reply via blog post. =)
http://brianoneill.blogspot.com/2012/08/a-big-data-trifecta-storm-kafka-and.html

That blog post shows how we pieced together Kafka and Cassandra (via Storm).

With LinkedIn behind Kafka, it is well supported. They use it in production. (and most likely we will too =)

Let me know if you end up using it. Thus far, I think it pairs nicely with Cassandra, but we don't have it in production yet.

-brian

On Fri, Aug 3, 2012 at 3:41 PM, Milind Parikh wrote:
> Kafka is relatively stable and has an active, well-supported newsgroup as
> well.
>
> As discussed by Brian, you would be inverting the paradigm of store-process.
> Essentially in your original approach, you are storing the messages first
> and then processing them after the fact. In the Kafka model, you would
> process the messages as they come in.
>
> Since you are thinking about parallelism anyways, I trust that your
> processing paradigm is inherently parallelizable.
>
> Regards
> Milind
>
> On Fri, Aug 3, 2012 at 12:22 PM, Philip Nelson wrote:
>>
>> Brian -- thanks.
>>
>> > We were looking to do the same thing, but in the end decided
>> > to go with Kafka.
>> > Given your throughput requirements, Kafka might be a good
>> > option for you as well.
>>
>> This might be off-topic, so I'll keep it short. Kafka is reasonably
>> stable? Mature (I see it's in the Incubator)? Relative to Cassandra?
>>
>> Philip

--
Brian ONeill
Lead Architect, Health Market Science (http://healthmarketscience.com)
mobile:215.588.6024
blog: http://weblogs.java.net/blog/boneill42/
blog: http://brianoneill.blogspot.com/