On Thu, Apr 15, 2010 at 4:36 PM, Ryan Rawson <[email protected]> wrote:

> From an implementation point of view, extremely large rows can become
> a problem.  Since region splits are on the row, if a single row
> becomes larger than a region we become unable to split that to spread
> the load out.
>
> -ryan
>
> On Thu, Apr 15, 2010 at 11:54 AM, alex kamil <[email protected]> wrote:
> > Pierre-Alexandre,
> >
> > 'Temporal' as a term is a bit overloaded. If you mean 'temporal' in the
> > classic sense (http://en.wikipedia.org/wiki/Temporal_database), I'm not
> > sure HBase would be a good fit for that.
> >
> > if you simply want to store a time series, this is what i do to store a
> > sequence of events to allow range queries and sequential scans:
> >
> > RowKey=time (or sequence id)
> > ColumnFamilies ={event types}
> > Columns={event name:value}
> > *I don't use versioning
> >
> > If you are looking for aggregation reports on multi-dimensional time
> > series, you may find this tool useful: http://github.com/zohmg/zohmg
> >
> > Regards
> > Alex
> >
> > On Thu, Apr 15, 2010 at 2:34 PM, Pierre-Alexandre St-Jean <
> > [email protected]> wrote:
> >
> >> Hi,
> >>
> >> I am quite new to hbase but i love the simplified api and the way it
> >> scales.
> >> I currently have a 3 node cluster of virtual machines and removing and
> >> adding them is really easy.
> >>
> >>
> >> I am struggling with some data modeling. I want to build some type of
> >> temporal database, so here are my ideas; maybe you could tell me what
> >> would be best to do.
> >>
> >> I want to analyze data over time. Each data point has attributes and
> >> then multiple values over time.
> >>
> >>  #1 - infinite versions
> >>
> >>  Table  | Row Key    | Family     | Attributes
> >>  points | point name | attributes | Contains the column keys: description, unit. 1 version
> >>         |            | value      | No column key. Infinite versions
> >>
> >>  #2 - value column = time
> >>
> >>  Table  | Row Key    | Family     | Attributes
> >>  points | point name | attributes | Contains the column keys: description, unit. 1 version
> >>         |            | value      | Column keys = time
> >>
> >>  #3 - point name / time = value
> >>
> >>  Table  | Row Key           | Family     | Attributes
> >>  points | point name        | attributes | Contains the column keys: description, unit. 1 version
> >>         | point name / time | value      | No column key. 5 versions (to keep modifications)
> >>
> >>  #4 - value column = time
> >>
> >>  Table        | Row Key           | Family     | Attributes
> >>  points       | point name        | attributes | Contains the column keys: description, unit. 1 version
> >>  pointsValues | point name / time | value      | No column key. 5 versions (to keep modifications)
> >> ---------------------
> >>
> >> I thought #1 would be the simplest, then I tried to create an infinite
> >> versions family and it did not work (putting 0 as the number).
> >>
> >> #2 seems good but i think it would be hard to analyze the data over time
> >> like that.
> >>
> >> So #3 and #4 are remaining.
> >>
> >> I would do #3, but I don't know if it would be easy to iterate and know
> >> which data points exist, skipping the /time part.
> >> --
> >> Pierre-Alexandre St-Jean
> >>
> >
>

> RowKey=time (or sequence id)

Usually this is non-optimal.

If your keys are in order, every insert goes to the same region and you do
not get the throughput of multiple regions. Since every cell already carries
an implicit timestamp, your rowkey should probably be something else.
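
To make the hot-region effect concrete, here is a small simulation in plain
Python (no HBase client involved). It is only a sketch under stated
assumptions: the region split points, the point names, and the idea of
prefixing the key with two hex characters of a hash of the point name are all
illustrative choices of mine, not something prescribed in this thread. It
compares a row key of time alone with a key of the form
prefix/point name/time, close to option #3 above.

```python
import bisect
import hashlib
from collections import Counter

def region_for(row_key, split_points):
    """Index of the region holding row_key; split_points are sorted region start keys."""
    return bisect.bisect_right(split_points, row_key)

def time_key(ts):
    """Row key = zero-padded timestamp, so byte order matches time order."""
    return b"%013d" % ts

def prefixed_key(point, ts):
    """Row key = 2 hex chars of a hash of the point name, then name and time (hypothetical layout)."""
    name = point.encode()
    prefix = hashlib.sha1(name).hexdigest()[:2].encode()
    return prefix + b"/" + name + b"/" + time_key(ts)

T0 = 1271360000000  # arbitrary start time in milliseconds (illustrative)

# Table keyed by time: earlier splits happened at older timestamps.
time_splits = [time_key(T0 + 250), time_key(T0 + 500), time_key(T0 + 750)]
# Table keyed by hash prefix: splits fall evenly across the hex prefix space.
prefix_splits = [b"40", b"80", b"c0"]

# 1000 new events for 50 made-up points, all newer than any existing split.
new_writes = [("point%d" % (i % 50), T0 + 1000 + i) for i in range(1000)]

hot = Counter(region_for(time_key(ts), time_splits) for _, ts in new_writes)
spread = Counter(region_for(prefixed_key(p, ts), prefix_splits) for p, ts in new_writes)

print("time keys:    ", dict(hot))     # every write lands in the last region
print("prefixed keys:", dict(spread))  # writes are divided among several regions
```

The trade-off is that a single scan no longer returns all events in global
time order. Rows for one point still share a key prefix and sort by time, so
a range scan over that prefix reads one point's series in order.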
