[jira] [Commented] (HIVE-5687) Streaming support in Hive

Alan Gates (JIRA) Mon, 17 Mar 2014 11:42:07 -0700

    [ https://issues.apache.org/jira/browse/HIVE-5687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13938187#comment-13938187 ]


Alan Gates commented on HIVE-5687:
----------------------------------

A few comments:
* Right now you're building one lock request and re-using it.  That won't work.
A new lock request needs to be constructed for each transaction and associated
with that transaction, so that the transaction manager knows to release the lock
when the transaction is committed or aborted.  This should be done in
beginNextTxn().
* The partition name is being built incorrectly: it uses only the partition
values.  It should be constructed using Warehouse.makePartName.
* The lock components are being constructed incorrectly.  You are building a 
component for every key/value pair in the partition.  You should only build one 
component for each partition you want to lock.  So in your case, each lock 
request will have exactly one lock component.
* The file is being written to the table location instead of the partition
location.  When I run this with table foo and partition bar, I get files in
/hive/warehouse/foo instead of /hive/warehouse/foo/bar.
* The metaStoreClient is being prematurely closed.  It certainly shouldn't be 
closed in createPartition.  I'm not sure it should be closed at all.
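
To make the first three points concrete, here is a rough, self-contained sketch.
It does not use the real Hive classes: LockComponent, LockRequest and
lockRequestFor below are hypothetical stand-ins, and the local makePartName only
imitates the key=value/key=value form that Warehouse.makePartName produces (the
real method also escapes special characters).  The table and partition names are
made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class StreamingLockSketch {

    /** Hypothetical stand-in for a lock component: one per partition to lock. */
    static final class LockComponent {
        final String db;
        final String table;
        final String partitionName;

        LockComponent(String db, String table, String partitionName) {
            this.db = db;
            this.table = table;
            this.partitionName = partitionName;
        }
    }

    /** Hypothetical stand-in for a lock request tied to exactly one transaction. */
    static final class LockRequest {
        final long txnId;
        final List<LockComponent> components;

        LockRequest(long txnId, List<LockComponent> components) {
            this.txnId = txnId;
            this.components = Collections.unmodifiableList(new ArrayList<>(components));
        }
    }

    /** Simplified imitation of the key=value/key=value name form (no escaping). */
    static String makePartName(List<String> keys, List<String> vals) {
        if (keys.size() != vals.size()) {
            throw new IllegalArgumentException("keys and values must have the same length");
        }
        StringBuilder name = new StringBuilder();
        for (int i = 0; i < keys.size(); i++) {
            if (i > 0) {
                name.append('/');
            }
            name.append(keys.get(i)).append('=').append(vals.get(i));
        }
        return name.toString();
    }

    /** Builds a fresh request for one transaction, with a single component for the whole partition. */
    static LockRequest lockRequestFor(long txnId, String db, String table,
                                      List<String> partKeys, List<String> partVals) {
        LockComponent component = new LockComponent(db, table, makePartName(partKeys, partVals));
        return new LockRequest(txnId, Collections.singletonList(component));
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("continent", "country");
        List<String> vals = Arrays.asList("Asia", "India");
        // What beginNextTxn() would do: build a new request for every transaction.
        for (long txnId = 1; txnId <= 2; txnId++) {
            LockRequest request = lockRequestFor(txnId, "default", "foo", keys, vals);
            System.out.println("txn " + request.txnId + " locks "
                    + request.components.get(0).partitionName
                    + " (" + request.components.size() + " component)");
        }
    }
}
```

The point of the sketch is the shape: each transaction gets its own request
object, and a two-key partition still yields one component, not two.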


> Streaming support in Hive
> -------------------------
>
>                 Key: HIVE-5687
>                 URL: https://issues.apache.org/jira/browse/HIVE-5687
>             Project: Hive
>          Issue Type: Sub-task
>            Reporter: Roshan Naik
>            Assignee: Roshan Naik
>         Attachments: 5687-api-spec4.pdf, 5687-draft-api-spec.pdf, 
> 5687-draft-api-spec2.pdf, 5687-draft-api-spec3.pdf, HIVE-5687.patch, 
> HIVE-5687.v2.patch
>
>
> Implement support for Streaming data into HIVE.
> - Provide a client streaming API 
> - Transaction support: Clients should be able to periodically commit a batch 
> of records atomically
> - Immediate visibility: Records should be immediately visible to queries on 
> commit
> - Should not overload HDFS with too many small files
> Use Cases:
>  - Streaming logs into HIVE via Flume
>  - Streaming results of computations from Storm



--
This message was sent by Atlassian JIRA
(v6.2#6252)
