Re: Modeling column families

Andrey Stepachev Sat, 24 Apr 2010 00:22:54 -0700

2010/4/24 Andrew Nguyen <andrew-lists-hb...@ucsfcti.org>

> Hello all,
>
> Each row key is of the form "PatientName-PhysiologicParameter" and each
> column name is the timestamp of the reading.
>


With such a design in HBase (unlike Cassandra) you have to use row
filters to fetch only part of the data (for example, the last year), or
filter on the client while scanning the row.
If a data series gets big (>100) you will run into the intra-row
scanning issue, https://issues.apache.org/jira/browse/HBASE-1537,
as I did. Another issue, as mentioned before, is scaling: HBase splits
data by rows, so one very wide row cannot be spread across regions.
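
To illustrate the filtering point, here is a minimal sketch in plain
Python. It does not use any HBase client: a dict stands in for one wide
row, and the names `row` and `readings_between` as well as the sample
values are made up for this example. It only shows what client-side
filtering over timestamp-named columns amounts to: every column in the
row is read before the unwanted ones can be dropped.

```python
from datetime import date

# One wide row "Bob-BloodPressure": column name = reading date (ISO string),
# cell value = the reading. Hypothetical data, for illustration only.
row = {
    "2009-03-01": 120,
    "2009-11-15": 118,
    "2010-02-10": 125,
    "2010-04-20": 122,
}


def readings_between(row, start, end):
    """Client-side filter: walk every column, keep those in [start, end)."""
    lo, hi = start.isoformat(), end.isoformat()
    # ISO dates compare correctly as plain strings.
    return {col: value for col, value in sorted(row.items()) if lo <= col < hi}


last_year = readings_between(row, date(2009, 4, 24), date(2010, 4, 24))
print(last_year)
```

The cost of this approach grows with the total width of the row, not
with the size of the requested date range.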

You have to figure out how much data will be in a row, and if it runs to
hundreds, use a compound key (patient-code-date).
If rows are small, (patient-code) may be easier to use, because you can
use Get operations with locks (if you need them); with a dated key you
can't (because scans don't yet honor locks).


> Give me all blood pressures for Bob between two dates
> Give me all blood pressures, and intracranial pressures for Bob from <date>
> until present
>

Looks like patient-code-date is the preferred way. In your case the model
can be: patient-code-date -> series:value.
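
A rough sketch of why the compound key suits these queries, again in
plain Python rather than against a real HBase client. The sorted list
stands in for a table (HBase keeps rows ordered by key), and `row_key`,
`scan`, the short codes "BP"/"ICP" and the sample values are all
hypothetical. It assumes dates are written as ISO strings (YYYY-MM-DD)
so that they sort lexicographically, and that patient names and codes do
not contain the "-" delimiter.

```python
from bisect import bisect_left


def row_key(patient, code, day):
    """Build a compound key patient-code-date (day is an ISO date string)."""
    return f"{patient}-{code}-{day}"


# (row key, value) pairs kept sorted by key, as a stand-in for a table.
table = sorted([
    (row_key("Bob", "BP", "2010-01-05"), 120),
    (row_key("Bob", "BP", "2010-02-10"), 125),
    (row_key("Bob", "BP", "2010-04-20"), 122),
    (row_key("Bob", "ICP", "2010-02-10"), 11),
    (row_key("Alice", "BP", "2010-02-10"), 110),
])


def scan(table, start_key, stop_key):
    """Return rows with start_key <= key < stop_key, like a range scan."""
    keys = [key for key, _ in table]
    return table[bisect_left(keys, start_key):bisect_left(keys, stop_key)]


# "Give me all blood pressures for Bob between two dates"
result = scan(table,
              row_key("Bob", "BP", "2010-02-01"),
              row_key("Bob", "BP", "2010-05-01"))
for key, value in result:
    print(key, value)
```

With this layout the first query is one contiguous range of rows. The
second query (blood pressure and intracranial pressure from a date until
now) would be two such ranges, one per code.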


> In other words, the queries will be very patient-centric, or
> patient-physiologic parameter-centric.
>
> Thanks,
> Andrew
