[ 
https://issues.apache.org/jira/browse/FLUME-1275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13295231#comment-13295231
 ] 

Hari Shreedharan commented on FLUME-1275:
-----------------------------------------

1) No, we should not be incrementing and then reading cells. We should *never* 
be reading from HBase; that is a huge overhead.
2) Makes sense. Since you are using timestamp.machineIP, it might even be OK 
to use (nanoTimeDiff = System.nanoTime() - someFixedTime) for your timestamp, 
which would make your rowKey nanoTimeDiff.machineIP.pid (the pid is there to 
avoid potential issues with multiple agents on the same machine).
3) I like your approach to (2), or my approach to (2), better than (3).
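
A minimal sketch of the row key described in (2), assuming Java 9+ for 
ProcessHandle. The class name RowKeySketch, the FIXED_TIME field, and both 
method names are hypothetical and not part of Flume. Note that 
System.nanoTime() has an arbitrary origin in each JVM, so the difference is 
only meaningful against a fixed time captured in the same process.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical helper, not part of Flume: builds nanoTimeDiff.machineIP.pid keys.
final class RowKeySketch {
    // "someFixedTime" from the comment, captured once when the class loads.
    // System.nanoTime() has an arbitrary per-JVM origin, so only differences
    // taken within the same JVM are meaningful.
    private static final long FIXED_TIME = System.nanoTime();

    // Pure formatting step: joins the three parts with '.' as in the comment.
    static String buildRowKey(long nanoTimeDiff, String machineIp, long pid) {
        return nanoTimeDiff + "." + machineIp + "." + pid;
    }

    // Collects the live values for this process and formats them.
    static String nextRowKey() throws UnknownHostException {
        long nanoTimeDiff = System.nanoTime() - FIXED_TIME;
        String machineIp = InetAddress.getLocalHost().getHostAddress();
        long pid = ProcessHandle.current().pid(); // Java 9+
        return buildRowKey(nanoTimeDiff, machineIp, pid);
    }
}
```

Including the pid keeps two agents on the same machine from producing the 
same key even if their nanoTimeDiff values happen to coincide.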

One suggestion I would make, which would be quite helpful, is to create a new 
KeyGenerator interface that simply returns a rowKey given the event. It 
should be plugged into the serializer just the way the serializer is plugged 
into the sink. That way, a user could use the serializer with any logic they 
like to generate the row keys the data should go to. This is more of a wish, 
actually ;)
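
One possible shape for such an interface, as a sketch only. Everything here 
is hypothetical: SketchEvent is a stand-in for Flume's Event (headers plus 
body) so the example compiles without Flume on the classpath, and the method 
name getRowKey, HeaderKeyGenerator, and SketchSerializer are illustrative 
names, not an existing Flume API.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;

// Stand-in for a Flume event: headers plus an opaque body (assumption).
interface SketchEvent {
    Map<String, String> getHeaders();
    byte[] getBody();
}

// Trivial SketchEvent implementation used only for illustration.
final class SimpleSketchEvent implements SketchEvent {
    private final Map<String, String> headers;
    private final byte[] body;

    SimpleSketchEvent(Map<String, String> headers, byte[] body) {
        this.headers = headers;
        this.body = body;
    }

    public Map<String, String> getHeaders() { return headers; }
    public byte[] getBody() { return body; }
}

// The proposed plug-in point: returns a row key given the event.
interface KeyGenerator {
    byte[] getRowKey(SketchEvent event);
}

// Example strategy: use the value of one event header as the row key.
final class HeaderKeyGenerator implements KeyGenerator {
    private final String headerName;

    HeaderKeyGenerator(String headerName) {
        this.headerName = headerName;
    }

    public byte[] getRowKey(SketchEvent event) {
        String value = event.getHeaders().get(headerName);
        if (value == null) {
            throw new IllegalArgumentException("missing header: " + headerName);
        }
        return value.getBytes(StandardCharsets.UTF_8);
    }
}

// The serializer receives a KeyGenerator the same way a sink receives a
// serializer, and delegates row key selection to it.
final class SketchSerializer {
    private final KeyGenerator keyGenerator;

    SketchSerializer(KeyGenerator keyGenerator) {
        this.keyGenerator = keyGenerator;
    }

    byte[] rowKeyFor(SketchEvent event) {
        return keyGenerator.getRowKey(event);
    }
}
```

With this split, swapping the row key logic means supplying a different 
KeyGenerator; the serializer itself does not change.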
                
> Add Regex Serializer for HBaseSink
> ----------------------------------
>
>                 Key: FLUME-1275
>                 URL: https://issues.apache.org/jira/browse/FLUME-1275
>             Project: Flume
>          Issue Type: Improvement
>            Reporter: Patrick Wendell
>         Attachments: FLUME-1275.patch.v1.txt
>
>
> It would be nice to have an "out of the box" HBase serializer that can 
> extract column data from a regular expression. This is a feature in Hive and 
> it is widely used:
> https://issues.apache.org/jira/browse/HIVE-167

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
