Re: Dynamically generating HBase columns

Alfonso Nishikawa Tue, 24 Feb 2015 09:24:10 -0800

Hi, Lewis.

In my use cases I always need a mix of static and dynamic columns.
In my first week I tried mapping a Map onto a column family that also held
static columns. It didn't work because Gora was not prepared for that (and
it indeed needs further thought).


What I do is separate the static columns into one column family (or several)
from the dynamic stuff (which goes in a map). One Map is mapped to one
column family, in which each column:value pair is a key=>value entry of the map.
I have several maps depending on my needs, but it can be just one big one with
key=column.
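
To make that layout concrete, here is a minimal sketch in plain Java. It
only models the resulting cells with ordinary collections and calls no Gora
or HBase API. The "meta" family and its qualifiers are made-up names for
illustration; the "log" family and the 1230 entry come from your example.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

// Illustrative model only: plain collections, no Gora or HBase API calls.
final class RowLayoutSketch {

    // Flattens one record into "family:qualifier" -> value cells.
    // Static fields land in one family under fixed qualifiers; every map
    // entry becomes one column in the family reserved for that map.
    static Map<String, String> toCells(String staticFamily,
                                       Map<String, String> staticFields,
                                       String mapFamily,
                                       Map<String, String> dynamicEntries) {
        Map<String, String> cells = new TreeMap<>();
        staticFields.forEach((qualifier, value) ->
                cells.put(staticFamily + ":" + qualifier, value));
        dynamicEntries.forEach((key, value) ->
                cells.put(mapFamily + ":" + key, value));
        return cells;
    }

    public static void main(String[] args) {
        // Hypothetical static columns (the family name "meta" is an assumption).
        Map<String, String> meta = new LinkedHashMap<>();
        meta.put("host", "host1.example.com");
        meta.put("dataType", "TT");

        // Dynamic columns: the map key (sequence ID) is the column name.
        Map<String, String> log = new LinkedHashMap<>();
        log.put("1230", "2013-01-23 12:01:30 INFO This is a log entry.");

        toCells("meta", meta, "log", log)
                .forEach((column, value) -> System.out.println(column + " = " + value));
    }
}
```

The point of keeping the two families apart is that the map owns every
qualifier in its family, so no static column can collide with a map key.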

What I don't fully understand is the timestamp you talk about, since we
don't handle HBase timestamps. Do you specifically need it?

I'm not quite sure whether this answers your question :S

Something important to ask is: how many columns will you store in the column
family?
Since we removed the StateManager, when you modify a map it deletes the
column family and sends all the data again to be written (
https://github.com/apache/gora/blob/master/gora-hbase/src/main/java/org/apache/gora/hbase/store/HBaseStore.java#L289),
so adding/removing just one column can be very expensive when persisting
several huge maps. What volume and write pattern are we talking about?
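
A rough way to see why this matters is the toy model below, in plain Java.
It is not Gora's code: it only counts how many cells would be written when
one entry is added to a large map, comparing the delete-and-rewrite
behaviour described above with a hypothetical per-entry write. The class
name, method names and the 100,000-entry size are assumptions.

```java
import java.util.HashMap;
import java.util.Map;

// Toy cost model only: counts cells, performs no HBase or Gora calls.
final class MapRewriteCostSketch {

    // Behaviour described above: the family is deleted and every entry of
    // the modified map is written again.
    static int cellsWrittenOnFullRewrite(Map<String, String> mapAfterChange) {
        return mapAfterChange.size();
    }

    // Hypothetical alternative for comparison: write only entries that are
    // new or whose value changed (removed entries would be deletes, which
    // are not counted here).
    static int cellsWrittenOnDiff(Map<String, String> before, Map<String, String> after) {
        int written = 0;
        for (Map.Entry<String, String> entry : after.entrySet()) {
            if (!entry.getValue().equals(before.get(entry.getKey()))) {
                written++;
            }
        }
        return written;
    }

    public static void main(String[] args) {
        Map<String, String> before = new HashMap<>();
        for (int sequenceId = 0; sequenceId < 100_000; sequenceId++) {
            before.put(Integer.toString(sequenceId), "log entry " + sequenceId);
        }

        // Add a single new column to the map.
        Map<String, String> after = new HashMap<>(before);
        after.put("100000", "one more log entry");

        System.out.println("full rewrite: " + cellsWrittenOnFullRewrite(after) + " cells");
        System.out.println("per-entry:    " + cellsWrittenOnDiff(before, after) + " cells");
    }
}
```

With an append-heavy pattern such as log entries, every flush of a modified
record pays for the whole map under the first strategy, which is why the
volume question matters.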

Best,

Alfonso Nishikawa


2015-02-24 17:55 GMT+01:00 Lewis John Mcgibbney <lewis.mcgibb...@gmail.com>:

> Hi Folks,
> I am currently supercharging persistence in Apache Chukwa [0] with Gora,
> progress can be tracked in Jira [1].
> The issue I run into is that the required HBase schema looks as follows:
>
> Row Key: [Invert Date]:[Data Type]:[Primary Key]
> Column Family: log
> Column Name: [Sequence ID]
> Timestamp: [log entry timestamp]
>
> Example:
>
> Row Key: 2132013102:TT:host1.example.com
> Column Family: log
> Column Name: 1230
> Cell Value: 2013-01-23 12:01:30 INFO This is a log entry.
> Timestamp: 1358942490
>
> The issue here is therefore that there will be dynamically generated
> columns, and the column names need to be the field 'sequenceID', which
> comes from the data bean itself.
>
> I *think* that this causes a conflict between our current mapping workflow
> where you 1) create data model in JSON, 2) create mapping file/datastore
> schema, 3) compile JSON... and so forth. The data is then mapped into the
> PREDEFINED datastore specific schema.
>
> The proposed change in workflow would involve 1) create data model in JSON,
> 2) create mapping file/datastore schema, 3) compile JSON... and so forth.
> The data is then mapped into the PREDEFINED datastore specific schema AND
> ALSO DYNAMIC FIELDS CAN BE GENERATED ON THE FLY.
>
> Has anyone else required dynamic columns for any datastore?
>
> I think that this is very handy and I would like to see what you guys
> think.
>
> Thanks
>
> [0] http://chukwa.apache.org
> [1] https://issues.apache.org/jira/browse/CHUKWA-734
>
> --
> *Lewis*
>
