Re: Tracking word frequencies

2014-01-20 Thread David Tinker
I haven't actually tried that schema yet; it was just my first idea.
If we use that solution, our app would have to read the whole table once a
day or so to find the top 5,000 or so words.
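
As a rough illustration of that daily pass, here is a minimal sketch in Python (the app's language is not stated in the thread, so Python is an assumption). It assumes the rows of the table arrive as an iterable of (word, count) pairs from whatever driver is in use; the function name top_words is hypothetical. heapq.nlargest streams over the rows and keeps only n candidates in memory, so the full table never has to be held at once.

```python
import heapq
from typing import Iterable, List, Tuple


def top_words(rows: Iterable[Tuple[str, int]], n: int = 5000) -> List[Tuple[str, int]]:
    """Return the n most frequent (word, count) pairs, highest count first."""
    # nlargest consumes the iterable once and keeps at most n items in memory.
    return heapq.nlargest(n, rows, key=lambda row: row[1])


# Stand-in data; in practice these rows would come from a full scan of the table.
rows = [("the", 120), ("cassandra", 42), ("seek", 7), ("a", 95)]
print(top_words(rows, n=2))  # [('the', 120), ('a', 95)]
```

The result could then be written to a separate table as a single snapshot, so that readers of the top words do not need to scan the whole table themselves.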



-- 
http://qdb.io/ Persistent Message Queues With Replay and #RabbitMQ
Integration


Re: Tracking word frequencies

2014-01-20 Thread Colin
When updating, could you use a table that stores rows of words and increment the count?
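
One way to read this suggestion, sketched in Python under stated assumptions: the count column is a Cassandra counter, and each piece of text is pre-aggregated in the app so that every distinct word needs only one increment. The helper word_deltas, the tokenizing regex, and the statement text in INCREMENT_CQL are illustrative and not from the thread; nothing here talks to a cluster.

```python
import re
from collections import Counter
from typing import Dict

# Assumed statement for a counter column named "count"; bind values are (delta, word).
INCREMENT_CQL = "UPDATE word_count SET count = count + ? WHERE word = ?"


def word_deltas(text: str) -> Dict[str, int]:
    """Count each word in one piece of text so it can be applied as one increment per word."""
    # Simple tokenizer (an assumption): lowercase runs of letters, digits, and apostrophes.
    words = re.findall(r"[a-z0-9']+", text.lower())
    return dict(Counter(words))


deltas = word_deltas("The cat saw the other cat")
for word, delta in sorted(deltas.items()):
    # A real app would execute INCREMENT_CQL with these bind values via its driver.
    print(INCREMENT_CQL, (delta, word))
```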

--
Colin 
+1 320 221 9531

 



Re: Tracking word frequencies

2014-01-17 Thread Jonathan Lacefield
Hi David,

  How do you know that you are incurring a seek for each row?  Are you
querying for one specific word at a time, or do the queries span multiple
words, i.e. what's the query pattern? Also, what is your goal for read
latency?  Most customers can achieve microsecond partition-key-based query
reads with Cassandra.  This can be done through tuning, data modeling,
and/or scaling.  Please post the cfhistograms output for this table and
provide some details on the specific queries you are running.

Thanks,

Jonathan

Jonathan Lacefield
Solutions Architect, DataStax
(404) 822 3487
http://www.linkedin.com/in/jlacefield


http://www.datastax.com/what-we-offer/products-services/training/virtual-training





Tracking word frequencies

2014-01-16 Thread David Tinker
I have an app that stores lots of bits of text in Cassandra. One of
the things I need to do is keep a global word frequency table.
Something like this:

CREATE TABLE IF NOT EXISTS word_count (
  word text,
  count counter,
  PRIMARY KEY (word)
);

This is slow to read, as the rows (hundreds of thousands of them) each
need a seek. Is there a better way to model this in Cassandra? I could
periodically snapshot the rows into a fat row in another table, I
suppose.

Or should I use Redis or something instead? I would prefer to keep it
all Cassandra if possible.