[ 
https://issues.apache.org/jira/browse/HIVE-21193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BELUGA BEHR updated HIVE-21193:
-------------------------------
    Status: Patch Available  (was: Open)

> Support LZO Compression with CombineHiveInputFormat
> ---------------------------------------------------
>
>                 Key: HIVE-21193
>                 URL: https://issues.apache.org/jira/browse/HIVE-21193
>             Project: Hive
>          Issue Type: Improvement
>          Components: Compression
>    Affects Versions: 4.0.0, 3.2.0
>            Reporter: BELUGA BEHR
>            Assignee: BELUGA BEHR
>            Priority: Major
>         Attachments: HIVE-21193.1.patch
>
>
> Regarding LZO compression with Hive:
> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+LZO
> LZO-compressed tables do not work out of the box if {{.lzo.index}} files are 
> present. As I understand it, this is because the default Hive input format, 
> {{CombineHiveInputFormat}}, does not handle them correctly. It does not 
> distinguish data files from index files: it lumps them all together when 
> building the combined splits, and Mappers then fail when they try to process 
> the {{.lzo.index}} files as data. The original {{HiveInputFormat}} correctly 
> identifies the {{.lzo.index}} files because it considers each file 
> individually.
> Proposal: allow {{CombineHiveInputFormat}} to short-circuit LZO files and not 
> combine them.
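
To make the proposed short-circuit concrete, here is a minimal sketch of the decision it implies. This is not the code in HIVE-21193.1.patch and it uses no Hive or Hadoop classes. The class name LzoSplitSketch, its methods, and the assumption that LZO data files end in ".lzo" are all hypothetical. Only the ".lzo.index" suffix comes from the description above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; paths are modeled as plain strings.
final class LzoSplitSketch {
    // Suffix named in the issue description.
    static final String LZO_INDEX_SUFFIX = ".lzo.index";
    // Assumed suffix for LZO data files.
    static final String LZO_SUFFIX = ".lzo";

    static boolean isLzoIndex(String path) {
        return path.endsWith(LZO_INDEX_SUFFIX);
    }

    static boolean isLzoData(String path) {
        return path.endsWith(LZO_SUFFIX);
    }

    // True if any input is an LZO data or index file, in which case the
    // inputs should be handled file by file instead of being combined.
    static boolean shouldSkipCombine(List<String> paths) {
        for (String path : paths) {
            if (isLzoData(path) || isLzoIndex(path)) {
                return true;
            }
        }
        return false;
    }

    // Index files are side files and must never be handed to a Mapper as data.
    static List<String> dataFiles(List<String> paths) {
        List<String> result = new ArrayList<>();
        for (String path : paths) {
            if (!isLzoIndex(path)) {
                result.add(path);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> paths = Arrays.asList(
            "/warehouse/t/part-0.lzo",
            "/warehouse/t/part-0.lzo.index");
        System.out.println(shouldSkipCombine(paths));
        System.out.println(dataFiles(paths));
    }
}
```

In this sketch a directory holding only non-LZO files would still be eligible for combining, while any directory containing LZO data or index files falls back to per-file handling, which is the behavior the description attributes to {{HiveInputFormat}}.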



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
