[
https://issues.apache.org/jira/browse/FLUME-1183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13271438#comment-13271438
]
[email protected] commented on FLUME-1183:
------------------------------------------------------
bq. On 2012-05-09 05:04:12, Brock Noland wrote:
bq. > flume-ng-sinks/flume-ng-hbase-sink/src/main/java/org/apache/flume/sink/hbase/HBaseSink.java, line 132
bq. > <https://reviews.apache.org/r/5073/diff/1/?file=107970#file107970line132>
bq. >
bq. > Wish: It'd be ideal if we could use Mockito to pass it a fake
bq. > HTable object and then test that transactions are handled correctly if Error
bq. > and RuntimeException are thrown.
bq.
bq. Hari Shreedharan wrote:
bq. I am not too familiar with Mockito. I will do that in a different
bq. patch, when I have time to pick up Mockito.
Sounds good. For future reference, FLUME-1131 uses Mockito.
bq. On 2012-05-09 05:04:12, Brock Noland wrote:
bq. > flume-ng-sinks/flume-ng-hbase-sink/src/main/java/org/apache/flume/sink/hbase/HBaseSink.java, line 231
bq. > <https://reviews.apache.org/r/5073/diff/1/?file=107970#file107970line231>
bq. >
bq. > Maybe we should use a different default row key? I am guessing the
bq. > row key prefix is supposed to be used to get around hot spotting due to the
bq. > timestamp. Maybe UUID would be a better default?
bq.
bq. Hari Shreedharan wrote:
bq. The idea is that the user gives an initial prefix in the conf. This
bq. way they can supply different prefixes for different sinks within the same
bq. agent (and later identify which sink each of the rows came from). I agree that
bq. using UUID is a better default, but the concerns I have are its size, and also
bq. that scans will return the rows in a different order than inserted, while
bq. inserting with timestamps guarantees that values inserted in a specific
bq. order will be returned together. I would like your feedback on that. If that is
bq. not a major use case, then I will change it to UUID, since the implementation
bq. is also cleaner. Please let me know.
Keys are so important that I wonder whether this is an area where we should
provide options via an interface, with three default implementations (1-3) and
a fourth route for custom keys:
1) Prefixed timestamp (current)
2) Prefixed reverse timestamp (Long.MAX_VALUE - timestamp), which sorts newest
to the top
3) UUID/md5sum (randomly distributed keys)
4) User can extend the interface and generate their own keys based on the
headers/body.
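
To make that concrete, here is a rough sketch in Java of what such an interface
might look like. All names (RowKeyGenerator, RowKeys, the factory methods) are
hypothetical and not taken from the patch. The injected clock and the zero
padding to 19 digits are my own assumptions, added so that the keys are easy to
test and sort byte-wise in numeric order for non-negative timestamps.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.UUID;
import java.util.function.LongSupplier;

/** Hypothetical interface; nothing here is taken from the patch under review. */
interface RowKeyGenerator {
  /** Builds a row key for one event from its headers and body. */
  byte[] rowKey(Map<String, String> headers, byte[] body);
}

/** Hypothetical factory for the three default strategies listed above. */
final class RowKeys {
  private RowKeys() {}

  private static byte[] utf8(String s) {
    return s.getBytes(StandardCharsets.UTF_8);
  }

  /** 1) prefix + timestamp: within a prefix, scans return oldest rows first. */
  static RowKeyGenerator prefixedTimestamp(String prefix, LongSupplier clock) {
    // Zero padding is an assumption of this sketch, not the sink's behaviour.
    return (headers, body) ->
        utf8(prefix + String.format("%019d", clock.getAsLong()));
  }

  /** 2) prefix + (Long.MAX_VALUE - timestamp): newest rows sort to the top. */
  static RowKeyGenerator prefixedReverseTimestamp(String prefix, LongSupplier clock) {
    return (headers, body) ->
        utf8(prefix + String.format("%019d", Long.MAX_VALUE - clock.getAsLong()));
  }

  /** 3) random UUID: keys spread across regions, insertion order is lost. */
  static RowKeyGenerator randomUuid() {
    return (headers, body) -> utf8(UUID.randomUUID().toString());
  }
}
```

Option 4 then needs no extra machinery: a user implements RowKeyGenerator
directly, for example by deriving the key from a header value, and the sink
would pick the implementation from its configuration.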
- Brock
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/5073/#review7720
-----------------------------------------------------------
On 2012-05-09 03:04:07, Hari Shreedharan wrote:
bq.
bq. -----------------------------------------------------------
bq. This is an automatically generated e-mail. To reply, visit:
bq. https://reviews.apache.org/r/5073/
bq. -----------------------------------------------------------
bq.
bq. (Updated 2012-05-09 03:04:07)
bq.
bq.
bq. Review request for Flume.
bq.
bq.
bq. Summary
bq. -------
bq.
bq. Hbase sink.
bq.
bq.
bq. This addresses bug FLUME-1183.
bq. https://issues.apache.org/jira/browse/FLUME-1183
bq.
bq.
bq. Diffs
bq. -----
bq.
bq. flume-ng-sinks/flume-ng-hbase-sink/src/test/java/org/apache/flume/sink/hbase/TestHBaseSink.java PRE-CREATION
bq. flume-ng-sinks/pom.xml acb3087
bq. pom.xml 8c11a2d
bq. flume-ng-dist/pom.xml 5bdcfe7
bq. flume-ng-sinks/flume-ng-hbase-sink/pom.xml PRE-CREATION
bq. flume-ng-sinks/flume-ng-hbase-sink/src/main/java/org/apache/flume/sink/hbase/HBaseSink.java PRE-CREATION
bq. bin/flume-ng 0108997
bq.
bq. Diff: https://reviews.apache.org/r/5073/diff
bq.
bq.
bq. Testing
bq. -------
bq.
bq. Unit tests added
bq.
bq.
bq. Thanks,
bq.
bq. Hari
bq.
bq.
> Implement an HBase Sink which supports table level access
> ---------------------------------------------------------
>
> Key: FLUME-1183
> URL: https://issues.apache.org/jira/browse/FLUME-1183
> Project: Flume
> Issue Type: New Feature
> Components: Sinks+Sources
> Affects Versions: v1.2.0
> Reporter: Hari Shreedharan
> Assignee: Hari Shreedharan
> Fix For: v1.2.0
>
>
> This is what I intend to do:
> * Insert the row key from event headers. Pick the column family and column
> from configuration, not from headers.
> * Allow configuration to specify default for row key. If no row key exists in
> header, then take the default value from the configuration. I don't want to
> dump everything into the table with the same row key.
> * I don't intend to support multiple tables, column families or columns in
> the same sink. We can use multiplexing channel selector to use different
> sinks for different tables/columns.
> I know there is another JIRA for porting the HBase sink from OG, but I
> haven't seen any activity on it for a while.