[ https://issues.apache.org/jira/browse/GORA-413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Kevin Ratnasekera updated GORA-413:
-----------------------------------
    Fix Version/s: 1.0
                       (was: 0.9)

> Support creation of dynamic columns within Gora datastore mapping designs
> -------------------------------------------------------------------------
>
>                 Key: GORA-413
>                 URL: https://issues.apache.org/jira/browse/GORA-413
>             Project: Apache Gora
>          Issue Type: New Feature
>          Components: gora-hbase
>    Affects Versions: 0.6
>            Reporter: Lewis John McGibbney
>            Priority: Major
>             Fix For: 1.0
>
>
> The conversation taking place on [dynamically generating HBase
> columns|http://www.mail-archive.com/dev%40gora.apache.org/msg05754.html] has
> raised the issue that new functionality is needed to achieve this.
> The main driver for this issue is that Chukwa logs need to dynamically
> create a very large number of columns over time, directly dependent on the
> number of data chunks we receive. Each data chunk has a [Sequence ID]; this
> Sequence ID should be the column name.
> The table design will look like this:
> {code}
> Row Key: [Invert Date]:[Data Type]:[Primary Key]
> Column Family: log
> Column Name: [Sequence ID]
> Timestamp: [log entry timestamp]
>
> Example:
> Row Key: 2132013102:TT:host1.example.com
> Column Family: log
> Column Name: 1230
> Cell Value: 2013-01-23 12:01:30 INFO This is a log entry.
> Timestamp: 1358942490
> {code}
> The inverted date allows the table to be partitioned more easily by hour,
> day of the month, or month.
> Using consecutive sequence IDs as column names allows fast retrieval in a
> linear scan. This format is well suited to quickly retrieving an hour's worth
> of logs for a single node. Hence, if we batch-scan the table in a rolling
> window with a MapReduce job every hour, the workload is spread evenly across
> multiple MapReduce tasks.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
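The table design above can be sketched as a small Java helper that builds the row key, encodes the sequence-ID column name, and computes the row range for one hour of the rolling-window scan. This is a hypothetical sketch, not Gora or HBase API: `ChukwaLogKeys` and its methods are illustrative names, and two encodings are assumptions. The issue does not define the "inverted date" precisely, so the sketch assumes the UTC `yyyyMMddHH` hour bucket written back to front; it also assumes non-negative sequence IDs stored as 8 big-endian bytes so that byte-wise qualifier order matches numeric order.

```java
import java.nio.ByteBuffer;
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

/**
 * Hypothetical helper for the Chukwa log table layout described in GORA-413.
 * Not part of Gora or HBase; names and encodings are illustrative assumptions.
 */
public final class ChukwaLogKeys {
    // Assumption: "inverted date" = UTC hour bucket yyyyMMddHH reversed character
    // by character, so the fastest-changing digits (the hour) lead the row key.
    private static final DateTimeFormatter HOUR_BUCKET =
            DateTimeFormatter.ofPattern("yyyyMMddHH").withZone(ZoneOffset.UTC);

    private ChukwaLogKeys() {}

    /** Inverted hour bucket for a log entry timestamp given in epoch seconds. */
    static String invertedDate(long epochSeconds) {
        String bucket = HOUR_BUCKET.format(Instant.ofEpochSecond(epochSeconds));
        return new StringBuilder(bucket).reverse().toString();
    }

    /** Row Key: [Invert Date]:[Data Type]:[Primary Key] */
    static String rowKey(long epochSeconds, String dataType, String primaryKey) {
        return invertedDate(epochSeconds) + ":" + dataType + ":" + primaryKey;
    }

    /**
     * Column Name: [Sequence ID] as 8 big-endian bytes. For non-negative IDs,
     * unsigned byte-wise order equals numeric order, whereas the strings
     * "1230" and "999" would sort as "1230" < "999".
     */
    static byte[] sequenceQualifier(long sequenceId) {
        if (sequenceId < 0) {
            throw new IllegalArgumentException("sequence IDs are assumed to be non-negative");
        }
        return ByteBuffer.allocate(Long.BYTES).putLong(sequenceId).array();
    }

    /**
     * [start, stop) row range covering every row of one hour bucket. ';' is the
     * character right after ':', so the stop row is the first key past the
     * "[Invert Date]:" prefix.
     */
    static String[] hourScanRange(long epochSeconds) {
        String prefix = invertedDate(epochSeconds);
        return new String[] {prefix + ":", prefix + ";"};
    }

    public static void main(String[] args) {
        long ts = 1358942490L; // example timestamp from the issue: 2013-01-23 12:01:30 UTC
        System.out.println(rowKey(ts, "TT", "host1.example.com"));
        String[] range = hourScanRange(ts);
        System.out.println(range[0] + " .. " + range[1]);
    }
}
```

For the example timestamp, `main` prints `2132103102:TT:host1.example.com`, which does not reproduce the digits of the example key in the issue, because the inversion scheme here is only an assumption. The start and stop strings from `hourScanRange` could be used as the start and stop rows of an HBase scan, giving each hourly MapReduce run one contiguous key range.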