[ 
https://issues.apache.org/jira/browse/CHUKWA-700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13816982#comment-13816982
 ] 

Eric Yang commented on CHUKWA-700:
----------------------------------

How about adopting DataNucleus as the ORM layer? That would let entities be 
described with annotations and still work in the Demux processor.

The DataNucleus way of writing entities seems like a reasonable approach:

http://www.datanucleus.org/products/accessplatform/datastores/hbase.html

The Demux processor dictates the schema.  We just need an implementation that 
works well for time series metrics for monitoring Hadoop.

> Revisit Chukwa metrics schema design for HBase
> ----------------------------------------------
>
>                 Key: CHUKWA-700
>                 URL: https://issues.apache.org/jira/browse/CHUKWA-700
>             Project: Chukwa
>          Issue Type: Bug
>          Components: Data Collection
>    Affects Versions: 0.6.0
>         Environment: MacOSX, Java
>            Reporter: Eric Yang
>
> The current Chukwa HBase schema looks like this:
> {code}
> <timestamp>-<primaryKey>   <columnFamily>:<cell>...
> {code}
> Monotonically increasing timestamps cannot be distributed evenly across 
> region servers without special handling and periodic care.
> It is time to revise the schema. The proposed schema looks like this:
> {code}
> <hhddmmyyyy>-<primaryId>  cf:<cell>...
> {code}
> The timestamp is stored with the cell, the row key splits data by hour, and 
> a full hour of metrics is stored on the same row.  The primary key is 
> replaced with a hash id of the primary key.  Metrics tables used to 
> aggregate metrics:
> chukwaMetrics -> chukwaMetricsMonthly -> chukwaMetricsYearly
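
The proposed row key in the description above could be built roughly as 
follows. This is only a sketch under assumptions the issue does not state: 
the hour bucket is formatted literally as hour, day, month, year in UTC, the 
"hash id" is an MD5 hex digest, and the class and method names (RowKeySketch, 
hashId, rowKey) are hypothetical, not Chukwa code.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Hypothetical helper, not part of Chukwa: builds "<hhddmmyyyy>-<primaryId>" row keys.
final class RowKeySketch {
    // Assumption: the bucket is hour, day, month, year in UTC, as the pattern in the issue reads.
    private static final DateTimeFormatter HOUR_BUCKET =
        DateTimeFormatter.ofPattern("HHddMMyyyy").withZone(ZoneOffset.UTC);

    // Assumption: MD5 hex stands in for the unspecified "hash id" of the primary key.
    static String hashId(String primaryKey) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(primaryKey.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    // All samples for one primary key within the same hour map to the same row key;
    // the exact sample time would be carried by the cell, not by the key.
    static String rowKey(long timestampMillis, String primaryKey) {
        String bucket = HOUR_BUCKET.format(Instant.ofEpochMilli(timestampMillis));
        return bucket + "-" + hashId(primaryKey);
    }

    public static void main(String[] args) {
        long t = 1383868800000L; // 2013-11-08 00:00:00 UTC
        System.out.println(rowKey(t, "host1/cpu"));
        System.out.println(rowKey(t + 3_600_000L, "host1/cpu")); // next hour, new row
    }
}
```

Two samples taken within the same hour for the same primary key share a row, 
while the next hour starts a new row.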



--
This message was sent by Atlassian JIRA
(v6.1#6144)
