2010/4/24 Andrew Nguyen <andrew-lists-hb...@ucsfcti.org>

> Hello all,
>
> Each row key is of the form "PatientName-PhysiologicParameter" and each
> column name is the timestamp of the reading.
>
With such a design in HBase (as opposed to Cassandra), you should use row filters to fetch only part of the data (for example, the last year), or filter on the client while scanning the row. If a data series gets big (>100), you will run into the intra-row scanning issue, https://issues.apache.org/jira/browse/HBASE-1537, as I did.

Another issue, as mentioned before, is scaling. HBase splits data by rows, so a single row is never divided across regions. You have to figure out how much data will end up in one row. If it runs into the hundreds, use a compound key (patient-code-date). If rows are small, (patient-code) may be easier to use, because you can use Get operations with locks (if you need them); with a dated key you can't, because scans don't yet honor locks.

> Give me all blood pressures for Bob between two dates
> Give me all blood pressures, and intracranial pressures for Bob from <date>
> until present
>

It looks like patient-code-date is the preferred way. In your case the model can be: patient-code-date -> series:value.

> In other words, the queries will be very patient-centric, or
> patient-physiologic parameter-centric.
>
> Thanks,
> Andrew
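
To make the patient-code-date suggestion concrete, here is a minimal sketch of how such keys turn a date-range query into a start/stop row pair. It does not touch the HBase client at all: the "table" is just a sorted list of keys, and the names `row_key`, `scan_range`, and `select` are hypothetical helpers. It assumes one row per day, a `YYYYMMDD` date encoding (so byte order matches date order), and the usual scan contract of an inclusive start row and an exclusive stop row.

```python
import bisect
from datetime import date

SEP = "-"  # separator from the patient-code-date layout


def row_key(patient, code, day):
    """Build a patient-code-date key (assumed YYYYMMDD, so keys sort by date)."""
    return f"{patient}{SEP}{code}{SEP}{day:%Y%m%d}".encode("utf-8")


def scan_range(patient, code, start, end=None):
    """Return (start_row, stop_row): start inclusive, stop exclusive.

    With end=None the range runs "from <date> until present", i.e. to the
    end of this patient-code prefix.
    """
    start_row = row_key(patient, code, start)
    if end is not None:
        # Smallest key strictly greater than the key for `end`,
        # so the end date itself is included.
        stop_row = row_key(patient, code, end) + b"\x00"
    else:
        prefix = f"{patient}{SEP}{code}{SEP}".encode("utf-8")
        # Bump the last byte of the prefix to get the first key past it.
        stop_row = prefix[:-1] + bytes([prefix[-1] + 1])
    return start_row, stop_row


def select(sorted_keys, start_row, stop_row):
    """Stand-in for a scan: slice a sorted key list by [start_row, stop_row)."""
    lo = bisect.bisect_left(sorted_keys, start_row)
    hi = bisect.bisect_left(sorted_keys, stop_row)
    return sorted_keys[lo:hi]


# Toy data: keys only, kept in the sorted order a table would store them in.
keys = sorted([
    row_key("Bob", "BP", date(2009, 12, 31)),
    row_key("Bob", "BP", date(2010, 1, 1)),
    row_key("Bob", "BP", date(2010, 4, 24)),
    row_key("Bob", "ICP", date(2010, 1, 1)),
    row_key("Bobby", "BP", date(2010, 1, 1)),
])

between = select(keys, *scan_range("Bob", "BP", date(2010, 1, 1), date(2010, 4, 24)))
until_now = select(keys, *scan_range("Bob", "BP", date(2010, 1, 1)))
```

Under this layout, the second query (blood pressures and intracranial pressures from a date onward) would be two such ranges, one per code. The trailing separator in the prefix is what keeps "Bob" from matching "Bobby"; patient names or codes that themselves contain "-" would need a different separator or fixed-width fields.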