Ryan, exactly. Eventually we will be storing data continuously on N beds in the ICU. If it's waveform data, it will probably be sampled at 125 Hz, which works out to about 3.9 billion points per bed per year, times N beds. I've been trying to figure out which search terms (e.g. "compound keys") to use to dive deeper into NoSQL solutions.
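A quick back-of-the-envelope check of that volume, assuming continuous 125 Hz sampling with no gaps (a simplifying assumption, ignoring leap years):

```python
# Points generated per bed per year at a continuous 125 Hz sample rate.
SAMPLE_RATE_HZ = 125
SECONDS_PER_YEAR = 60 * 60 * 24 * 365  # ignoring leap years

points_per_bed_per_year = SAMPLE_RATE_HZ * SECONDS_PER_YEAR
print(points_per_bed_per_year)  # 3942000000, i.e. ~3.9 billion
```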
You mention tall tables - this sounds consistent with what Erik and Andrey have said. Just to clarify my understanding: I'm probably looking at a single table with only one column (the value, which Andrey calls the "series"?) and billions of rows, right? In that case, the decision to break the values into multiple column families is purely a function of performance and how I want the data physically stored. Are there any other major points to consider when deciding which column families to have? (I drew this conclusion from your hbase-nosql presentation on SlideShare.)

Thanks all!

--Andrew

On Apr 24, 2010, at 12:59 PM, Ryan Rawson wrote:

> For example if you are storing timeseries data for a monitoring
> system, you might want to store it by row, since the number of points
> for a single system might be arbitrarily large (think: 2 years+ of
> data). In this case, if the expected data set size per row is larger
> than what a single machine could conceivably store, Cassandra would
> not work for you, since each row must be stored on a
> single (er, N) node(s).
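To make the tall-table / compound-key idea concrete, here's a minimal sketch of how one such row key might be built. The field names and byte widths (2-byte bed id, 8-byte millisecond timestamp) are my own illustrative assumptions, not anything specified in this thread:

```python
import struct

def make_row_key(bed_id: int, ts_millis: int) -> bytes:
    """Compound row key: 2-byte bed id + 8-byte timestamp, big-endian.

    Big-endian packing makes byte-wise (lexicographic) order match
    numeric order, so all samples for one bed sort contiguously by time.
    """
    return struct.pack(">HQ", bed_id, ts_millis)

# One tall-table row per sample: each row key maps to a single value cell.
k1 = make_row_key(7, 1_272_000_000_000)
k2 = make_row_key(7, 1_272_000_000_008)  # next 125 Hz sample, 8 ms later
assert k1 < k2  # HBase would store these rows adjacently, in time order
```

With this layout a time-range scan for one bed is just a row scan between two keys, and no single row ever grows unboundedly, which is the point Ryan makes about arbitrarily large per-row data sets.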