Dynamically generating HBase columns

Lewis John Mcgibbney Tue, 24 Feb 2015 08:57:07 -0800

Hi Folks,
I am currently supercharging persistence in Apache Chukwa [0] with Gora;
progress can be tracked in Jira [1].
The issue I have run into is that the required HBase schema looks as follows:


Row Key: [Invert Date]:[Data Type]:[Primary Key]
Column Family: log
Column Name: [Sequence ID]
Timestamp: [log entry timestamp]

Example:

Row Key: 2132013102:TT:host1.example.com
Column Family: log
Column Name: 1230
Cell Value: 2013-01-23 12:01:30 INFO This is a log entry.
Timestamp: 1358942490
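
To make the layout concrete, here is a minimal sketch in plain Java of how one log entry would turn into HBase cell coordinates. The class LogCell and its field names are hypothetical (they are not part of Chukwa, Gora, or HBase), and the inverted date is taken as an already computed input, since how it is derived is not shown here.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper for illustration only; not a Chukwa, Gora, or HBase class. */
final class LogCell {
    /** The single, fixed column family from the schema above. */
    static final String FAMILY = "log";

    final String rowKey;     // [Invert Date]:[Data Type]:[Primary Key]
    final String qualifier;  // column name, i.e. the sequence ID of the entry
    final long timestamp;    // log entry timestamp, passed through unchanged
    final byte[] value;      // the raw log line as UTF-8 bytes

    LogCell(String invertDate, String dataType, String primaryKey,
            long sequenceId, long timestamp, String logLine) {
        // The row key is the three parts joined with ':'.
        this.rowKey = invertDate + ":" + dataType + ":" + primaryKey;
        // The column name is not known up front; it comes from the data itself.
        this.qualifier = Long.toString(sequenceId);
        this.timestamp = timestamp;
        this.value = logLine.getBytes(StandardCharsets.UTF_8);
    }
}
```

Built from the example values, such a cell would have the row key 2132013102:TT:host1.example.com and the column name 1230 in family log.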

The issue is therefore that columns will be generated dynamically, and each
column name needs to be the value of the 'sequenceID' field, which comes
from the data bean itself.

I *think* that this conflicts with our current mapping workflow, where you
1) create the data model in JSON, 2) create the mapping file/datastore
schema, 3) compile the JSON... and so forth. The data is then mapped into
the PREDEFINED datastore-specific schema.

The proposed change in workflow would involve 1) create data model in JSON,
2) create mapping file/datastore schema, 3) compile JSON... and so forth.
The data is then mapped into the PREDEFINED datastore specific schema AND
ALSO DYNAMIC FIELDS CAN BE GENERATED ON THE FLY.
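
As a rough illustration of the difference, the sketch below maps a bean onto columns in plain Java. Everything in it is an assumption made for the example: the class DynamicMappingSketch, the method toColumns, and the idea of representing the bean and the mapping as simple maps. It does not use or describe the Gora API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch of "predefined plus dynamic" mapping; not Gora code. */
final class DynamicMappingSketch {

    /**
     * @param bean       field name -> field value of one data bean
     * @param predefined field name -> column name, fixed in the mapping file
     * @param nameField  bean field whose VALUE becomes the dynamic column name
     * @param valueField bean field whose value is stored in that dynamic column
     * @return column name -> cell value
     */
    static Map<String, String> toColumns(Map<String, String> bean,
                                         Map<String, String> predefined,
                                         String nameField,
                                         String valueField) {
        Map<String, String> columns = new LinkedHashMap<>();

        // Current workflow: every column name is known from the mapping file.
        for (Map.Entry<String, String> e : predefined.entrySet()) {
            String fieldValue = bean.get(e.getKey());
            if (fieldValue != null) {
                columns.put(e.getValue(), fieldValue);
            }
        }

        // Proposed addition: one column whose name is read from the bean.
        String dynamicName = bean.get(nameField);
        String dynamicValue = bean.get(valueField);
        if (dynamicName != null && dynamicValue != null) {
            columns.put(dynamicName, dynamicValue);
        }
        return columns;
    }
}
```

With nameField set to "sequenceID", a bean carrying sequenceID 1230 would end up with a column named 1230, which no mapping file could have listed in advance.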

Has anyone else required dynamic columns for any datastore?

I think that this is very handy and I would like to see what you guys think.

Thanks

[0] http://chukwa.apache.org
[1] https://issues.apache.org/jira/browse/CHUKWA-734

-- 
*Lewis*
