[
https://issues.apache.org/jira/browse/CHUKWA-700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13818022#comment-13818022
]
Eric Yang commented on CHUKWA-700:
----------------------------------
I don't plan to reinvent an ORM in the Chukwa project, so it is better to pick
one that already works. In open source, there are two choices: Gora and
DataNucleus. DataNucleus seems like a better choice than Gora because it can
define Bloom filters and row-key sharding. Combined with a row key that is shorter
than the current monotonically increasing rowKey, this should make random data
access faster.
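The comment does not describe how DataNucleus itself configures sharding, so the following is only a library-independent sketch of one common way to shard a row key: salting it with a bucket prefix derived from a hash of the key. The class name `SaltedKey`, the bucket count of 16, and the use of CRC32 as the hash are hypothetical choices for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

/**
 * Illustrative row-key salting (not a DataNucleus API): the key is prefixed
 * with a bucket number derived from its hash, so consecutive keys are spread
 * over several key ranges. BUCKETS and CRC32 are hypothetical choices.
 */
final class SaltedKey {
    static final int BUCKETS = 16; // hypothetical number of shards

    // Bucket in [0, BUCKETS) computed from the CRC32 of the key bytes.
    static int bucket(String key) {
        CRC32 crc = new CRC32();
        crc.update(key.getBytes(StandardCharsets.UTF_8));
        return (int) (crc.getValue() % BUCKETS);
    }

    // Prepends a two-digit bucket prefix, e.g. "07-<key>".
    static String salt(String key) {
        return String.format("%02d-%s", bucket(key), key);
    }

    public static void main(String[] args) {
        // Consecutive timestamp-prefixed keys get (generally) different prefixes.
        for (long ts = 1383955200000L; ts < 1383955200005L; ts++) {
            System.out.println(salt(ts + "-host1.example.com"));
        }
    }
}
```

A salted key spreads consecutive writes over up to `BUCKETS` key ranges, but a reader must recompute the bucket for a point lookup, and a time-range scan has to be issued once per bucket.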
> Revisit Chukwa metrics schema design for HBase
> ----------------------------------------------
>
> Key: CHUKWA-700
> URL: https://issues.apache.org/jira/browse/CHUKWA-700
> Project: Chukwa
> Issue Type: New Feature
> Components: Data Collection
> Affects Versions: 0.6.0
> Environment: MacOSX, Java
> Reporter: Eric Yang
>
> The current Chukwa HBase schema looks like this:
> {code}
> <timestamp>-<primaryKey> <columnFamily>:<cell>...
> {code}
> Row keys that start with a monotonically increasing timestamp cannot be
> distributed evenly across region servers without special handling and periodic
> manual attention.
> It is time to revise the schema; the proposed schema looks like this:
> {code}
> <hhddmmyyyy>-<primaryId> cf:<cell>...
> {code}
> The timestamp is stored with each cell, the row key splits data by hour, and a
> full hour of metrics is stored in the same row. The primary key is replaced with
> a hash ID of the primary key. Metrics tables for aggregating metrics:
> chukwaMetrics -> chukwaMetricsMonthly -> chukwaMetricsYearly
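The quoted description's point that monotonically increasing row keys do not spread evenly across region servers can be illustrated with a toy model of HBase region assignment, in which a row belongs to the region with the greatest start key that is less than or equal to the row key. This is a plain-Java sketch, not the HBase client API; the class name `HotspotDemo` and the split points are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

/**
 * Toy model of region assignment: a row belongs to the region whose start key
 * is the greatest start key <= the row key (lexicographic order).
 * The split points are hypothetical.
 */
final class HotspotDemo {
    // Region start keys; "" is the first region, which has no lower bound.
    static final TreeSet<String> REGION_STARTS = new TreeSet<>(List.of(
            "", "1383900000000", "1383930000000", "1383950000000"));

    // Index of the region that would hold rowKey.
    static int regionOf(String rowKey) {
        String start = REGION_STARTS.floor(rowKey);
        return REGION_STARTS.headSet(start).size();
    }

    // Regions hit by n consecutive writes using the <timestamp>-<primaryKey> layout.
    static List<Integer> regionsForWrites(long firstTs, int n) {
        List<Integer> regions = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            String rowKey = (firstTs + i * 1000L) + "-host" + (i % 3);
            regions.add(regionOf(rowKey));
        }
        return regions;
    }

    public static void main(String[] args) {
        // Every new row sorts after the last split point: prints [3, 3, 3, 3, 3, 3]
        System.out.println(regionsForWrites(1383955200000L, 6));
    }
}
```

Because every new `<timestamp>-<primaryKey>` row sorts after the last split point, a single region absorbs all new writes until it is split again, which is presumably the kind of periodic care the description refers to.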
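A minimal sketch of building the proposed `<hhddmmyyyy>-<primaryId>` row key, assuming hours are computed in UTC and that the hash ID of the primary key is the hex-encoded MD5 digest of the key string. The issue names neither the time zone nor the hash function, and the class name `ProposedRowKey` is hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

/**
 * Sketch of the proposed <hhddmmyyyy>-<primaryId> row key.
 * Assumptions (not from the issue): UTC hours, MD5 as the primary-key hash.
 */
final class ProposedRowKey {
    // HH = hour of day, dd = day of month, MM = month, yyyy = year.
    private static final DateTimeFormatter HOUR_BUCKET =
            DateTimeFormatter.ofPattern("HHddMMyyyy").withZone(ZoneOffset.UTC);

    // Hex-encoded MD5 of the primary key: a fixed 32-character id.
    static String primaryId(String primaryKey) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(primaryKey.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide MD5.
            throw new IllegalStateException(e);
        }
    }

    // All samples from the same hour and primary key map to the same row.
    static String rowKey(long epochMillis, String primaryKey) {
        String hour = HOUR_BUCKET.format(Instant.ofEpochMilli(epochMillis));
        return hour + "-" + primaryId(primaryKey);
    }

    public static void main(String[] args) {
        // 2013-11-09T12:15:00Z and 12:45:00Z fall into the same hourly row.
        System.out.println(rowKey(1383999300000L, "host1.example.com"));
        System.out.println(rowKey(1384001100000L, "host1.example.com"));
    }
}
```

Note that because the hour comes first in `hhddmmyyyy`, keys sort by hour of day before date, so the rows for a single day are not contiguous in the table.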
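The aggregation chain `chukwaMetrics -> chukwaMetricsMonthly -> chukwaMetricsYearly` can be sketched with in-memory maps standing in for the HBase tables. The issue does not say which aggregate is kept; this sketch assumes an average stored as a (sum, count) pair, so monthly rows can be combined into yearly rows exactly. The names `Rollup` and `Agg` are hypothetical, and the record requires Java 16 or later.

```java
import java.time.Instant;
import java.time.YearMonth;
import java.time.ZoneOffset;
import java.util.Map;
import java.util.TreeMap;

/**
 * Sketch of rolling raw samples up into monthly, then yearly, aggregates.
 * Assumption: the aggregate is an average kept as (sum, count).
 */
final class Rollup {
    // Running sum and count; mean() derives the average on demand.
    record Agg(double sum, long count) {
        Agg add(Agg other) {
            return new Agg(sum + other.sum, count + other.count);
        }

        double mean() {
            return sum / count;
        }
    }

    // Raw samples (epoch millis -> value) grouped by UTC calendar month.
    static Map<YearMonth, Agg> toMonthly(Map<Long, Double> samples) {
        Map<YearMonth, Agg> monthly = new TreeMap<>();
        for (Map.Entry<Long, Double> e : samples.entrySet()) {
            YearMonth month = YearMonth.from(
                    Instant.ofEpochMilli(e.getKey()).atZone(ZoneOffset.UTC));
            monthly.merge(month, new Agg(e.getValue(), 1), Agg::add);
        }
        return monthly;
    }

    // Monthly aggregates combined into yearly ones by adding sums and counts.
    static Map<Integer, Agg> toYearly(Map<YearMonth, Agg> monthly) {
        Map<Integer, Agg> yearly = new TreeMap<>();
        for (Map.Entry<YearMonth, Agg> e : monthly.entrySet()) {
            yearly.merge(e.getKey().getYear(), e.getValue(), Agg::add);
        }
        return yearly;
    }

    public static void main(String[] args) {
        // Two samples in November 2013 and one in December 2013 (UTC).
        Map<Long, Double> samples = Map.of(
                1383999300000L, 1.0,
                1384001100000L, 3.0,
                1385859600000L, 8.0);
        Map<YearMonth, Agg> monthly = toMonthly(samples);
        monthly.forEach((month, agg) -> System.out.println(month + " mean=" + agg.mean()));
        // Prints "2013 mean=4.0"; averaging the two monthly means would give 5.0.
        toYearly(monthly).forEach((year, agg) -> System.out.println(year + " mean=" + agg.mean()));
    }
}
```

Keeping sums and counts rather than averages is what lets each coarser table be computed from the one below it without rereading the raw metrics.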
--
This message was sent by Atlassian JIRA
(v6.1#6144)