[ 
https://issues.apache.org/jira/browse/HIVE-13403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-13403:
----------------------------------
    Description: 
As of HIVE-11983, when a TransactionBatch is opened in the Streaming API, a
full complement of bucket files (AbstractRecordWriter.createRecordUpdaters())
is created on disk even though some may end up receiving no data.

It would be better to create them on demand and not clog the FS.

Tez can handle missing (empty) buckets, and on MR the bucket join algorithms
will check whether all buckets are present and bail out if not.

  was:
As of HIVE-11983, when a TransactionBatch is opened in the Streaming API, a
full complement of bucket files is created on disk even though some may end
up receiving no data.

It would be better to create them on demand and not clog the FS.

Tez can handle missing (empty) buckets, and on MR the bucket join algorithms
will check whether all buckets are present and bail out if not.


> Make Streaming API not create empty buckets (at least as an option)
> -------------------------------------------------------------------
>
>                 Key: HIVE-13403
>                 URL: https://issues.apache.org/jira/browse/HIVE-13403
>             Project: Hive
>          Issue Type: Bug
>          Components: HCatalog, Transactions
>    Affects Versions: 1.3.0
>            Reporter: Eugene Koifman
>            Assignee: Eugene Koifman
>
> As of HIVE-11983, when a TransactionBatch is opened in the Streaming API, a
> full complement of bucket files (AbstractRecordWriter.createRecordUpdaters())
> is created on disk even though some may end up receiving no data.
> It would be better to create them on demand and not clog the FS.
> Tez can handle missing (empty) buckets, and on MR the bucket join algorithms
> will check whether all buckets are present and bail out if not.
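
The "create them on demand" idea can be sketched roughly as below. This is
only an illustration, not Hive code: the class LazyBucketWriters, its methods,
and the bucket_%05d file naming are all hypothetical stand-ins, and plain text
files stand in for whatever AbstractRecordWriter actually writes. The point is
that a bucket file is created on the first write to that bucket instead of
when the batch is opened.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: creates a bucket file only when its bucket first receives data. */
final class LazyBucketWriters {
    private final Path dir;
    private final int totalBuckets;
    // Only buckets that have been written to get an entry (and a file on disk).
    private final Map<Integer, Path> createdBuckets = new HashMap<>();

    LazyBucketWriters(Path dir, int totalBuckets) {
        this.dir = dir;
        this.totalBuckets = totalBuckets;
    }

    /** Appends one record to the given bucket, creating its file on first use. */
    void write(int bucketId, String record) {
        if (bucketId < 0 || bucketId >= totalBuckets) {
            throw new IllegalArgumentException("No such bucket: " + bucketId);
        }
        Path file = createdBuckets.computeIfAbsent(bucketId, this::createBucketFile);
        try {
            Files.write(file, (record + "\n").getBytes(StandardCharsets.UTF_8),
                    StandardOpenOption.APPEND);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Number of bucket files that exist so far; untouched buckets are not counted. */
    int createdBucketCount() {
        return createdBuckets.size();
    }

    // The file name pattern is illustrative only.
    private Path createBucketFile(int bucketId) {
        try {
            return Files.createFile(dir.resolve(String.format("bucket_%05d", bucketId)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With 8 buckets configured and records written only to buckets 3 and 5, this
sketch leaves two files on disk rather than eight. Readers of such a layout
would have to tolerate the missing buckets, which is the Tez vs. MR concern
noted in the description.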



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
