[
https://issues.apache.org/jira/browse/FLUME-1275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13295231#comment-13295231
]
Hari Shreedharan commented on FLUME-1275:
-----------------------------------------
1) No, we should not be incrementing and then reading cells. We should *never*
be reading from HBase; that is a huge overhead.
2) Makes sense. Since you are using timestamp.machineIP, it might even be OK
to use (nanoTimeDiff = System.nanoTime() - someFixedTime) for your timestamp,
which would make your rowKey -> nanoTimeDiff.machineIP.pid (the pid is there to
avoid potential collisions between multiple agents on the same machine).
3) I like your approach to (2) or my approach to (2) better than (3).
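The row key layout suggested in (2) could look roughly like the sketch below.
RowKeys, format() and next() are hypothetical names, not part of Flume, and
the pid lookup assumes the common "pid@hostname" form of the runtime MXBean
name, which the JVM does not guarantee. Note that System.nanoTime() has an
arbitrary origin per JVM, so the diff only orders keys within one process.

```java
import java.lang.management.ManagementFactory;
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical helper, not part of Flume: builds nanoTimeDiff.machineIP.pid row keys.
final class RowKeys {
  // Arbitrary fixed reference point, captured once per JVM ("someFixedTime" above).
  private static final long FIXED_TIME = System.nanoTime();

  private RowKeys() {}

  // Pure formatting step, kept separate so it can be checked without host lookups.
  static String format(long nanoTimeDiff, String machineIp, String pid) {
    return nanoTimeDiff + "." + machineIp + "." + pid;
  }

  // Builds a key for the current instant on the current machine and process.
  static String next() throws UnknownHostException {
    long nanoTimeDiff = System.nanoTime() - FIXED_TIME;
    String ip = InetAddress.getLocalHost().getHostAddress();
    // Assumption: the MXBean name looks like "pid@hostname" (typical, not guaranteed).
    String pid = ManagementFactory.getRuntimeMXBean().getName().split("@")[0];
    return format(nanoTimeDiff, ip, pid);
  }
}
```

Nothing here reads from HBase; the key is computed entirely on the agent.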
One suggestion I would make, which would be quite helpful, is to create a new
KeyGenerator interface, which simply returns a rowKey given the event. This
should be plugged into the serializer just the way the serializer is plugged
into the sink. This way, a user could use the serializer with whatever logic
they like to generate the row keys the data should go to. This is more of a
wish, actually ;)
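A minimal sketch of what such a KeyGenerator could look like follows. The
method name getRowKey, the two example classes, and the "host" header are all
assumptions made for illustration; none of this is an existing Flume API. To
keep the sketch compilable without Flume on the classpath, the event is passed
as its headers and body rather than as a Flume Event.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;

// Hypothetical interface sketched from the suggestion above; not a Flume API.
interface KeyGenerator {
  // Returns the row key for one event, given that event's headers and body.
  byte[] getRowKey(Map<String, String> headers, byte[] body);
}

// Example implementation: an assumed "host" header plus a per-instance counter.
final class HostCounterKeyGenerator implements KeyGenerator {
  private long counter = 0;

  @Override
  public byte[] getRowKey(Map<String, String> headers, byte[] body) {
    String host = headers.getOrDefault("host", "unknown");
    return (host + "." + counter++).getBytes(StandardCharsets.UTF_8);
  }
}

// Stand-in for a serializer that delegates row key choice to the generator,
// in the same way the sink delegates serialization to the serializer.
final class SketchSerializer {
  private final KeyGenerator keyGenerator;

  SketchSerializer(KeyGenerator keyGenerator) {
    this.keyGenerator = keyGenerator;
  }

  String rowKeyFor(Map<String, String> headers, byte[] body) {
    return new String(keyGenerator.getRowKey(headers, body), StandardCharsets.UTF_8);
  }
}
```

With this shape, swapping the row key scheme means passing a different
KeyGenerator to the serializer, without touching the regex parsing itself.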
> Add Regex Serializer for HBaseSink
> ----------------------------------
>
> Key: FLUME-1275
> URL: https://issues.apache.org/jira/browse/FLUME-1275
> Project: Flume
> Issue Type: Improvement
> Reporter: Patrick Wendell
> Attachments: FLUME-1275.patch.v1.txt
>
>
> It would be nice to have an "out of the box" HBase serializer that can
> extract column data from a regular expression. This is a feature in Hive and
> it is widely used:
> https://issues.apache.org/jira/browse/HIVE-167