[ https://issues.apache.org/jira/browse/HIVE-4788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Navis updated HIVE-4788:
------------------------

    Assignee: Navis
      Status: Patch Available  (was: Open)

> RCFile and bzip2 compression not working
> ----------------------------------------
>
>                 Key: HIVE-4788
>                 URL: https://issues.apache.org/jira/browse/HIVE-4788
>             Project: Hive
>          Issue Type: Bug
>          Components: Compression
>    Affects Versions: 0.10.0
>         Environment: CDH4.2
>            Reporter: Johndee Burks
>            Assignee: Navis
>            Priority: Minor
>         Attachments: HIVE-4788.1.patch.txt
>
>
> The issue is that bzip2-compressed RCFile data encounters an error when
> queried, even with the simplest query, "select *". The issue is easily
> reproducible using the following steps.
>
> Create a table and load the sample data below.
>
> DDL:
> create table source_data (a string, b string) row format delimited
> fields terminated by ',';
>
> Sample data:
> apple,sauce
>
> Test:
> Run the following statements; the error listed below should appear for the
> RCFile table with bz2 compression.
>
> create table rc_nobz2 (a string, b string) stored as rcfile;
> insert into table rc_nobz2 select * from source_txt;
>
> SET io.seqfile.compression.type=BLOCK;
> SET hive.exec.compress.output=true;
> SET mapred.compress.map.output=true;
> SET mapred.output.compress=true;
> SET mapred.output.compression.codec=org.apache.hadoop.io.compress.BZip2Codec;
>
> create table rc_bz2 (a string, b string) stored as rcfile;
> insert into table rc_bz2 select * from source_txt;
>
> hive> select * from rc_bz2;
> Failed with exception java.io.IOException:java.io.IOException: Stream is not
> BZip2 formatted: expected 'h' as first byte but got '�'
>
> hive> select * from rc_nobz2;
> apple	sauce

--
This message was sent by Atlassian JIRA
(v6.2#6252)
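
For context on the error text above: a well-formed bzip2 stream begins with the bytes 'B', 'Z', 'h', followed by a block-size digit from '1' to '9'. The sketch below is not taken from the report or from the attached patch, and it says nothing about how Hive or Hadoop read RCFile data. It is only a minimal Python illustration of that header check, using the standard bz2 module; the helper name looks_like_bzip2 is hypothetical.

```python
import bz2


def looks_like_bzip2(data: bytes) -> bool:
    """Return True if data starts with a bzip2 stream header.

    The header is the three bytes b"BZh" followed by one ASCII digit
    '1'..'9' that encodes the block size.
    """
    if len(data) < 4:
        return False
    # data[3] is an int in Python 3; 0x31..0x39 are ASCII '1'..'9'.
    return data[:3] == b"BZh" and 0x31 <= data[3] <= 0x39


# Sample row from the report, used here purely as illustrative input.
sample = b"apple,sauce\n"
compressed = bz2.compress(sample)

print(looks_like_bzip2(compressed))  # True: starts with the bzip2 header
print(looks_like_bzip2(sample))      # False: plain text, no header
```

A reader that is handed bytes without this header, for example bytes taken from some offset inside a container file rather than from the start of a bzip2 stream, would be expected to fail a check of this kind.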