[ 
https://issues.apache.org/jira/browse/FLUME-1734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13502241#comment-13502241
 ] 

Roshan Naik edited comment on FLUME-1734 at 11/21/12 7:41 PM:
--------------------------------------------------------------

Mike,
 1. There will be no MapReduce. This will all be client side (i.e., Flume 
agents) streaming data in parallel into HCatalog. Clients will compute the 
specific partition into which the data will be written. Periodically 
(configurable) they would 'commit' the currently open partition and roll over 
to a new partition. Until the partition is committed, the data will not be 
queryable. There is one restriction: once a partition is committed, its data 
cannot be modified.

 2. org.apache.hcatalog.data.transfer.* 

 3. I have not verified HCat operation in secure mode, but it appears to be 
supported. Will get back to you.

 4. At the moment, I don't see much code overlap with the HDFS sink for the 
core data movement functionality. There may always be room for sharing other 
smaller tidbits.
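
To make the commit/roll-over behaviour in point 1 concrete, here is a rough 
single-client sketch. It is not the sink implementation and does not use the 
HCatalog API: the class PartitionRoller, its methods, and the "bucket=" 
partition naming are all hypothetical, and it assumes roll-over is driven by 
event timestamps crossing a fixed interval rather than by a timer.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch; none of these names come from Flume or HCatalog. */
public class PartitionRoller {
    private final long rollIntervalMillis;  // the configurable commit period
    private final Map<String, List<String>> committed = new LinkedHashMap<>();
    private List<String> pending = new ArrayList<>();  // not queryable yet
    private String openPartition;                      // null until first write

    public PartitionRoller(long rollIntervalMillis) {
        if (rollIntervalMillis <= 0) {
            throw new IllegalArgumentException("roll interval must be positive");
        }
        this.rollIntervalMillis = rollIntervalMillis;
    }

    /** Client-side partition computation: bucket the event time by the interval. */
    public String partitionFor(long eventTimeMillis) {
        long bucketStart =
            Math.floorDiv(eventTimeMillis, rollIntervalMillis) * rollIntervalMillis;
        return "bucket=" + bucketStart;
    }

    /** Buffers a record, committing the open partition first if we rolled over. */
    public void write(long eventTimeMillis, String record) {
        String target = partitionFor(eventTimeMillis);
        if (committed.containsKey(target)) {
            // The one restriction: committed partitions cannot be modified.
            throw new IllegalStateException("partition already committed: " + target);
        }
        if (openPartition != null && !openPartition.equals(target)) {
            commit();  // roll over to the new partition
        }
        openPartition = target;
        pending.add(record);
    }

    /** Publishes the open partition; its data becomes queryable and immutable. */
    public void commit() {
        if (openPartition == null) {
            return;  // nothing open, nothing to commit
        }
        committed.put(openPartition, pending);
        pending = new ArrayList<>();
        openPartition = null;
    }

    public boolean isQueryable(String partition) {
        return committed.containsKey(partition);
    }
}
```

With a 1000 ms interval, a write at t=100 followed by a write at t=1200 
commits "bucket=0" and opens "bucket=1000"; a later write at t=500 is rejected 
because its partition is already committed.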



                



                  
> Create a HCatalog Sink 
> -----------------------
>
>                 Key: FLUME-1734
>                 URL: https://issues.apache.org/jira/browse/FLUME-1734
>             Project: Flume
>          Issue Type: New Feature
>          Components: Sinks+Sources
>    Affects Versions: v1.2.0
>            Reporter: Roshan Naik
>            Assignee: Roshan Naik
>              Labels: features
>
> Create a sink that would stream data into HCatalog partitions. The primary 
> goal is that once the data is loaded into Hadoop, it should be 
> automatically queryable (using, say, Hive or Pig) without requiring additional 
> post-processing steps on the part of the users. The sink should manage the 
> creation of new partitions and commit them periodically. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
