[ 
https://issues.apache.org/jira/browse/FLUME-1183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13271438#comment-13271438
 ] 

[email protected] commented on FLUME-1183:
------------------------------------------------------



bq.  On 2012-05-09 05:04:12, Brock Noland wrote:
bq.  > flume-ng-sinks/flume-ng-hbase-sink/src/main/java/org/apache/flume/sink/hbase/HBaseSink.java, line 132
bq.  > <https://reviews.apache.org/r/5073/diff/1/?file=107970#file107970line132>
bq.  >
bq.  >     Wish: It'd be ideal if we could use mockito to pass it a fake
bq.  >     HTable object and then test that transactions are handled correctly
bq.  >     if Error and RuntimeException are thrown.
bq.  
bq.  Hari Shreedharan wrote:
bq.      I am not too familiar with mockito. I will do that in a different
bq.      patch, when I have time to pick up Mockito.

Sounds good. For future reference, FLUME-1131 uses Mockito.
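
In the meantime, the kind of check I have in mind can be sketched without Mockito by using hand-written fakes. This is only an illustration: FakeableTable, RecordingTransaction and TxnSketch are hypothetical names, not classes in the patch, and the begin/commit/rollback/close shape of process() is an assumption about how the sink handles its transaction, not a copy of HBaseSink.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the table the sink writes to (not the HBase HTable API).
interface FakeableTable {
    void put(byte[] row);
}

// Hypothetical transaction that records which lifecycle calls were made.
final class RecordingTransaction {
    final List<String> calls = new ArrayList<>();

    void begin()    { calls.add("begin"); }
    void commit()   { calls.add("commit"); }
    void rollback() { calls.add("rollback"); }
    void close()    { calls.add("close"); }
}

final class TxnSketch {
    private TxnSketch() {}

    // Assumed transaction handling: commit on success, roll back and rethrow
    // on RuntimeException or Error, always close.
    static void process(FakeableTable table, RecordingTransaction txn, byte[] row) {
        txn.begin();
        try {
            table.put(row);
            txn.commit();
        } catch (RuntimeException | Error e) {
            txn.rollback();
            throw e;
        } finally {
            txn.close();
        }
    }

    public static void main(String[] args) {
        RecordingTransaction txn = new RecordingTransaction();
        try {
            // The fake table fails with an Error on every put.
            process(row -> { throw new OutOfMemoryError("simulated"); }, txn, new byte[0]);
        } catch (OutOfMemoryError expected) {
            System.out.println("calls after Error: " + txn.calls);
        }
    }
}
```

With Mockito the fake table would be a mock configured to throw, but the assertions on the transaction calls would be the same.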


bq.  On 2012-05-09 05:04:12, Brock Noland wrote:
bq.  > flume-ng-sinks/flume-ng-hbase-sink/src/main/java/org/apache/flume/sink/hbase/HBaseSink.java, line 231
bq.  > <https://reviews.apache.org/r/5073/diff/1/?file=107970#file107970line231>
bq.  >
bq.  >     Maybe we should use a different default row key? I am guessing the
bq.  >     row key prefix is supposed to be used to get around hot spotting due
bq.  >     to the timestamp. Maybe UUID would be a better default?
bq.  
bq.  Hari Shreedharan wrote:
bq.      The idea is that the user gives an initial prefix in the conf. This
bq.      way they can supply different prefixes for different sinks within the
bq.      same agent (and later identify which sink each row came from). I agree
bq.      that UUID is a better default, but my concerns are its size and that
bq.      scans will return rows in a different order than they were inserted,
bq.      whereas timestamp keys guarantee that values inserted in a specific
bq.      order are returned together. I would like your feedback on that. If
bq.      that is not a major use case, I will change it to UUID, since the
bq.      implementation is also cleaner. Please let me know.

Keys are so important that I wonder if this is an area where we should provide
options via an interface, with three default implementations plus a way for
users to supply their own:

1) Prefixed timestamp (current)
2) Prefixed reverse timestamp (Long.MAX_VALUE - timestamp), which sorts newest
to the top
3) UUID/md5sum (randomly distributed keys)
4) Custom: the user extends the interface and generates their own keys based on
the headers/body.
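
To make the options above concrete, here is a rough sketch of what such an interface could look like. Everything in it is hypothetical: RowKeyGenerator, RowKeySketch and the factory method names do not exist in the patch, the event is reduced to a header map and a body so the example stands alone, and the clock is passed in only so the behaviour can be tested. The md5sum variant is left out for brevity.

```java
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.Map;
import java.util.UUID;
import java.util.function.LongSupplier;

// Hypothetical interface: one method that turns an event into a row key.
interface RowKeyGenerator {
    byte[] rowKey(Map<String, String> headers, byte[] body);
}

final class RowKeySketch {
    private RowKeySketch() {}

    // 1) Prefixed timestamp: keys sort oldest first.
    static RowKeyGenerator prefixedTimestamp(String prefix, LongSupplier clock) {
        return (headers, body) -> utf8(prefix + clock.getAsLong());
    }

    // 2) Prefixed reverse timestamp: keys sort newest first.
    static RowKeyGenerator prefixedReverseTimestamp(String prefix, LongSupplier clock) {
        return (headers, body) -> utf8(prefix + (Long.MAX_VALUE - clock.getAsLong()));
    }

    // 3) Random UUID: keys are spread across the key space, no insertion order.
    static RowKeyGenerator randomUuid() {
        return (headers, body) -> utf8(UUID.randomUUID().toString());
    }

    private static byte[] utf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        Map<String, String> headers = Collections.singletonMap("host", "web01");
        byte[] body = utf8("payload");

        // 4) A user-supplied generator that derives the key from a header.
        RowKeyGenerator byHost =
            (h, b) -> utf8(h.getOrDefault("host", "unknown") + "-" + System.currentTimeMillis());

        RowKeyGenerator[] generators = {
            prefixedTimestamp("sinkA-", System::currentTimeMillis),
            prefixedReverseTimestamp("sinkA-", System::currentTimeMillis),
            randomUuid(),
            byHost
        };
        for (RowKeyGenerator g : generators) {
            System.out.println(new String(g.rowKey(headers, body), StandardCharsets.UTF_8));
        }
    }
}
```

The sink would then take the generator from configuration, so the choice between scan order and write distribution is left to the user.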


- Brock


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/5073/#review7720
-----------------------------------------------------------


On 2012-05-09 03:04:07, Hari Shreedharan wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/5073/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2012-05-09 03:04:07)
bq.  
bq.  
bq.  Review request for Flume.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  HBase sink.
bq.  
bq.  
bq.  This addresses bug FLUME-1183.
bq.      https://issues.apache.org/jira/browse/FLUME-1183
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    flume-ng-sinks/flume-ng-hbase-sink/src/test/java/org/apache/flume/sink/hbase/TestHBaseSink.java PRE-CREATION 
bq.    flume-ng-sinks/pom.xml acb3087 
bq.    pom.xml 8c11a2d 
bq.    flume-ng-dist/pom.xml 5bdcfe7 
bq.    flume-ng-sinks/flume-ng-hbase-sink/pom.xml PRE-CREATION 
bq.    flume-ng-sinks/flume-ng-hbase-sink/src/main/java/org/apache/flume/sink/hbase/HBaseSink.java PRE-CREATION 
bq.    bin/flume-ng 0108997 
bq.  
bq.  Diff: https://reviews.apache.org/r/5073/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  Unit tests added
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Hari
bq.  
bq.


                
> Implement an HBase Sink which supports table level access
> ---------------------------------------------------------
>
>                 Key: FLUME-1183
>                 URL: https://issues.apache.org/jira/browse/FLUME-1183
>             Project: Flume
>          Issue Type: New Feature
>          Components: Sinks+Sources
>    Affects Versions: v1.2.0
>            Reporter: Hari Shreedharan
>            Assignee: Hari Shreedharan
>             Fix For: v1.2.0
>
>
> This is what I intend to do:
> * Insert the row key from event headers. Pick the column family and column 
> from configuration, not from headers. 
> * Allow configuration to specify default for row key. If no row key exists in 
> header, then take the default value from the configuration. I don't want to 
> dump everything into the table with the same row key. 
> * I don't intend to support multiple tables, column families or columns in 
> the same sink. We can use multiplexing channel selector to use different 
> sinks for different tables/columns.
> I know there is another JIRA for porting the HBase sink from OG, but I 
> haven't seen any activity on it for a while. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
