Re: financial time series database

stack Thu, 02 Apr 2009 00:26:23 -0700

How many columns will you have?  Until we fix
https://issues.apache.org/jira/browse/HBASE-867, you are limited in the
number of columns you can have.
St.Ack


On Thu, Apr 2, 2009 at 4:48 AM, Bradford Cross
<[email protected]> wrote:

> Based on reading the HBase architecture wiki, I have changed my thinking due
> to the "Column Family Centric Storage."
>
> HBase stores column families physically close on disk, so the items in a
> given column family should have roughly the same read/write characteristics
> and contain similar data.  Although at a conceptual level, tables may be
> viewed as a sparse set of rows, physically they are stored on a per-column
> family basis. This is an important consideration for schema and application
> designers to keep in mind.
>
> This leads me to the thought of keeping an entire time series inside a
> single column family.
>
> Options:
>
> Row key is a ticker symbol:
> - Hijack the timestamp to be the time of each observation.  Use a column
>   family to hold all the data, and a column for each property of each
>   observation.
> - Don't hijack the timestamp; just ignore it.  Use a column family for all
>   the data, an individual column for the date/time of the observation, and
>   individual columns for each property of each observation.
>
> thoughts?
>
> On Tue, Mar 31, 2009 at 7:25 PM, Bradford Cross
> <[email protected]> wrote:
>
> > Greetings,
> >
> > I am prototyping a financial time series database on top of HBase and
> > trying to wrap my head around what a good design would look like.
> >
> > As I understand it, I have rows, column families, columns and cells.
> >
> > Since the only thing that HBase really "indexes" is row keys, it seems
> > natural in a way to use the date/time as the row key.
> >
> > As a simple example:
> >
> > Bar data:
> >
> > {
> >    "2009/1/17" : {
> >      "open":"100",
> >      "high":"102",
> >      "low":"99",
> >      "close":"101",
> >      "volume":"1000256"
> >    }
> > }
> >
> >
> > Quote data:
> >
> > {
> >    "2009/1/17:11:23:04" : {
> >      "bid":"100.01",
> >      "ask":"100.02",
> >      "bidsize":"10000",
> >      "asksize":"100200"
> >    }
> > }
> >
> > But there are many other issues to think about.
> >
> > In financial time series data we have small amounts of data within each
> > "observation" and we can have lots of observations.  We can have millions
> > of observations per time series (e.g. all historical trade and quote data
> > for a particular stock since 1993) across hundreds of thousands of
> > individual instruments (e.g. across all stocks that have traded since
> > 1993).
> >
> > The write patterns fit HBase nicely, because it is a write-once-and-append
> > pattern.  This is followed by loads of offline processes for simulating
> > trading models and such.  These query patterns look like "all quotes for
> > all stocks between the dates of 1/1/1996 and 12/31/2008."  So the querying
> > is typically across a date range, and we can further filter the query by
> > instrument types.
> >
> > So I am not sure what makes sense for efficiency because I do not
> > understand HBase well enough yet.
> >
> >  What kinds of mixes of rows, column families, and columns should I be
> > thinking about?
> >
> > Does my simplistic approach make any sense?  That would mean each row is
> > a key-value pair where the key is the date/time and the value is the
> > "observation."  I suppose this leads to a "table per time series" model.
> > Does that make sense or is there overhead to having lots of tables?
> >
>
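
A minimal sketch of the row-key trade-off discussed in this thread, assuming
only that rows are kept sorted lexicographically by row key (which is how HBase
orders rows).  It uses plain Python lists, not any HBase client API.  The
tickers, prices, key formats, and the build_table/scan helpers are all made up
for illustration.  Note that the timestamps are zero-padded so that string
order matches chronological order; a key like "2009/1/17" would not sort
correctly against "2009/10/1".

```python
from bisect import bisect_left

# Hypothetical quote observations: (ticker, timestamp, bid, ask).
QUOTES = [
    ("AAPL", "20090116093000", "99.50", "99.52"),
    ("AAPL", "20090117112304", "100.01", "100.02"),
    ("MSFT", "20090117112305", "19.70", "19.71"),
    ("MSFT", "20090120100000", "19.40", "19.42"),
]


def build_table(quotes, key_fn):
    """Return (row_key, cells) pairs sorted by row key, standing in for a sorted table."""
    rows = [(key_fn(t, ts), {"bid": bid, "ask": ask}) for t, ts, bid, ask in quotes]
    return sorted(rows, key=lambda r: r[0])


def scan(table, start_key, stop_key):
    """Return rows with start_key <= row_key < stop_key via binary search on the keys."""
    keys = [k for k, _ in table]
    lo = bisect_left(keys, start_key)
    hi = bisect_left(keys, stop_key)
    return table[lo:hi]


# Layout A: time first, so a date range over all instruments is one contiguous scan.
time_first = build_table(QUOTES, lambda ticker, ts: f"{ts}/{ticker}")
# Layout B: ticker first, so one instrument's full history is one contiguous scan.
ticker_first = build_table(QUOTES, lambda ticker, ts: f"{ticker}/{ts}")

all_on_jan_17 = scan(time_first, "20090117", "20090118")
# "0" is the ASCII character right after "/", so this range covers every "MSFT/..." key.
msft_history = scan(ticker_first, "MSFT/", "MSFT0")
```

With the time-first layout, "all quotes for all stocks between two dates" is a
single range scan, but one instrument's history is scattered across the table.
The ticker-first layout reverses that: per-instrument reads are contiguous,
while a date range across all instruments needs one scan per ticker.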
