[https://issues.apache.org/jira/browse/CHUKWA-667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14300152#comment-14300152]
Eric Yang commented on CHUKWA-667:
----------------------------------
Hi Sreepathi,
In general, each column family is stored in its own directory in HDFS, and the
common access pattern is by column rather than by row. Therefore, using more
than one column family is fine: as long as a scan reads from a single column
family, there is no performance penalty. However, using a secondary table to
store the metric name has its own problems. The ID needs to be padded;
otherwise a query composed for "|100" can also match "|1000" if the keys are
not all padded to the same length. When storing a large number of key types,
padded IDs take only slightly less storage than storing the metric name
directly in the cell name. On the other hand, lookup is faster with a shorter
row key: one request locates the region, and the data is then deserialized
linearly from the same data blocks. The ID approach needs two requests, one to
decode the ID and then a linear scan of row keys to find the closest row key
to start from. A linear row-key scan is slower than reading a block of data
from a single column family for one row.
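A minimal sketch of the padding problem described above. It assumes row keys of the hypothetical form "<host>|<metricId>" and a fixed ID width of 6 digits; both are illustrative assumptions, not Chukwa's actual schema. It uses plain Java strings and a simulated prefix scan rather than the HBase client API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration, not Chukwa code: row keys of the assumed form
// "<host>|<metricId>", where the metric ID comes from a secondary lookup table.
public class RowKeyPadding {

    // Assumed fixed width; it must cover the largest ID that will ever be issued.
    static final int ID_WIDTH = 6;

    // Unpadded key, e.g. "host1|100".
    static String unpaddedKey(String host, int metricId) {
        return host + "|" + metricId;
    }

    // Zero-padded key, e.g. "host1|000100".
    static String paddedKey(String host, int metricId) {
        String digits = Integer.toString(metricId);
        if (metricId < 0 || digits.length() > ID_WIDTH) {
            throw new IllegalArgumentException(
                "ID does not fit in " + ID_WIDTH + " digits: " + metricId);
        }
        return host + "|" + String.format("%0" + ID_WIDTH + "d", metricId);
    }

    // Simulated prefix scan: returns every key that starts with the prefix,
    // mimicking a row-key scan composed from "<host>|<id>".
    static List<String> prefixScan(List<String> rowKeys, String prefix) {
        List<String> hits = new ArrayList<>();
        for (String key : rowKeys) {
            if (key.startsWith(prefix)) {
                hits.add(key);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        int[] ids = {100, 1000, 1001};
        List<String> unpadded = new ArrayList<>();
        List<String> padded = new ArrayList<>();
        for (int id : ids) {
            unpadded.add(unpaddedKey("host1", id));
            padded.add(paddedKey("host1", id));
        }
        // Without padding, the query for ID 100 also matches IDs 1000 and 1001.
        // prints [host1|100, host1|1000, host1|1001]
        System.out.println(prefixScan(unpadded, unpaddedKey("host1", 100)));
        // With fixed-width padding, only ID 100 matches.
        // prints [host1|000100]
        System.out.println(prefixScan(padded, paddedKey("host1", 100)));
    }
}
```

Padding also keeps the keys in numeric order when they are compared as text: unpadded, "host1|1000" sorts before "host1|200", while the padded forms "host1|000200" and "host1|001000" sort correctly. The cost is that the width has to be chosen up front to fit every ID that will ever be issued.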
> Optimize the HBase schema for Ganglia queries
> ---------------------------------------------
>
> Key: CHUKWA-667
> URL: https://issues.apache.org/jira/browse/CHUKWA-667
> Project: Chukwa
> Issue Type: Sub-task
> Components: Data Processors
> Affects Versions: 0.6.0
> Reporter: Saisai Shao
>
> Chukwa's HBase table schema is designed for HICC and cannot be fully adapted
> to the Ganglia web frontend, for several reasons:
> (1) It cannot quickly retrieve all clusters and their related host names.
> (2) System metrics have no attributes, such as type or unit, so it is hard to
> interpret the collected metrics in code.
> (3) It lacks a data consolidation function: selecting a metric over a large
> time range (e.g. 30 days) fetches all of the raw data to draw the graph,
> which severely degrades performance.
> We will redesign the table schema so that it is better suited to Ganglia web
> frontend queries.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)