[
https://issues.apache.org/jira/browse/FLUME-1275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13294756#comment-13294756
]
Hari Shreedharan commented on FLUME-1275:
-----------------------------------------
Patrick:
Thanks for the patch! Overall looks good. I have some minor suggestions though:
* There is a final variable, ENCODING, which is used only for String.getBytes()
calls. Replacing those calls with String.getBytes(Charsets.UTF_8) gives the same
effect and does not throw UnsupportedEncodingException. Also, do you want to
make the encoding configurable? It is not a common use case, but it might be
worth considering. I don't really mind supporting only UTF-8, though.
* Please document the configuration.
* Using the SimpleRowKeyGenerator's timestamp key is not exactly a good idea,
or at least it should be configurable. That loop could run several times in
the same millisecond, creating several Puts with the same row key. I don't
really have a good solution for this, other than creating an interface for row
key generators and using a configuration-provided implementation of that
interface, which would also make it pluggable.
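
To make the last suggestion concrete, here is a rough sketch of what a
pluggable row key generator could look like. Everything in it is hypothetical:
the RowKeyGenerator interface, the TimestampCounterRowKeyGenerator class, and
the prefix parameter are illustrative names, not part of the patch or of Flume.
It uses java.nio.charset.StandardCharsets (Java 7+) in place of Guava's
Charsets so that the example stands alone; either constant avoids the checked
UnsupportedEncodingException.

```java
import java.nio.charset.StandardCharsets;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical interface: the sink would ask an implementation of this
// for a row key each time it builds a Put.
interface RowKeyGenerator {
    byte[] nextRowKey();
}

// Hypothetical implementation: timestamp plus a per-instance counter, so
// two keys generated in the same millisecond by one instance still differ.
final class TimestampCounterRowKeyGenerator implements RowKeyGenerator {
    private final String prefix;
    private final AtomicLong counter = new AtomicLong();

    TimestampCounterRowKeyGenerator(String prefix) {
        this.prefix = prefix;
    }

    @Override
    public byte[] nextRowKey() {
        String key = prefix + "-" + System.currentTimeMillis()
                + "-" + counter.getAndIncrement();
        // The Charset overload of getBytes declares no checked exception,
        // unlike getBytes(String charsetName).
        return key.getBytes(StandardCharsets.UTF_8);
    }
}
```

Note that the counter only guarantees distinct keys within a single generator
instance. Several sinks or agents writing to the same table would still need
something extra in the key (a host name or a random suffix, for example),
which is one more reason to leave the choice to a configured implementation.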
> Add Regex Serializer for HBaseSink
> ----------------------------------
>
> Key: FLUME-1275
> URL: https://issues.apache.org/jira/browse/FLUME-1275
> Project: Flume
> Issue Type: Improvement
> Reporter: Patrick Wendell
> Attachments: FLUME-1275.patch.v1.txt
>
>
> It would be nice to have an "out of the box" HBase serializer that can
> extract column data from a regular expression. This is a feature in Hive and
> it is widely used:
> https://issues.apache.org/jira/browse/HIVE-167