[ 
https://issues.apache.org/jira/browse/HIVE-4788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15975993#comment-15975993
 ] 

Biao Wu commented on HIVE-4788:
-------------------------------

I hit the same problem with BZip2Codec, while Gzip works fine. When the codec 
is BZip2Codec, createInputStream already does a resetState. Creating the input 
stream a second time then fails to read the header, because the stream 
position is already at the end.
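
As a rough analogy only (this uses Python's standard bz2 module, not Hadoop's 
BZip2Codec or the RCFile reader), the sketch below shows why a bzip2 decoder 
created on a stream that has already been read past the start of the header 
cannot work. The helper name decode_from_current_position is hypothetical and 
exists only for this illustration.

```python
import bz2
import io


def decode_from_current_position(stream):
    """Decode bzip2 data starting at the stream's current position.

    Hypothetical helper: it stands in for "create a fresh input stream on
    top of an already-open raw stream" and never rewinds the raw stream.
    """
    return bz2.BZ2Decompressor().decompress(stream.read())


def reopen_after_partial_read(compressed, consumed):
    """Consume `consumed` bytes, then try to decode from where we are.

    Returns the decoded bytes, or None if the decoder rejects the data
    because the bzip2 header is no longer at the current position.
    """
    stream = io.BytesIO(compressed)
    stream.read(consumed)  # a previous reader already ate part of the header
    try:
        return decode_from_current_position(stream)
    except OSError:
        # Python's bz2 module reports a bad or missing header as OSError.
        return None


compressed = bz2.compress(b"apple,sauce\n")

# A bzip2 stream starts with the magic bytes "BZ" followed by 'h'.
print(compressed[:3])

# Decoding from the very start of the stream works.
print(reopen_after_partial_read(compressed, 0))

# Decoding after the first two bytes were consumed fails: the decoder
# no longer sees a valid header at the current position.
print(reopen_after_partial_read(compressed, 2))
```

The point of the sketch is only that header parsing is position-sensitive: 
whichever code creates the decoding stream has to make sure the underlying 
stream is positioned at the start of the compressed data.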

> RCFile and bzip2 compression not working
> ----------------------------------------
>
>                 Key: HIVE-4788
>                 URL: https://issues.apache.org/jira/browse/HIVE-4788
>             Project: Hive
>          Issue Type: Bug
>          Components: Compression
>    Affects Versions: 0.10.0
>         Environment: CDH4.2
>            Reporter: Johndee Burks
>            Assignee: Navis
>            Priority: Minor
>         Attachments: HIVE-4788.1.patch.txt, HIVE-4788.2.patch.txt
>
>
> The issue is that Bzip2-compressed RCFile data fails with an error when 
> queried, even by the simplest query, "select *". The issue is easily 
> reproducible using the following. 
> Create a table and load the sample data below. 
> DDL: create table source_txt (a string, b string) row format delimited 
> fields terminated by ',';
> Sample data: 
> apple,sauce 
> Test: 
> Do the following and you should receive the error listed below for the rcfile 
> table with bz2 compression. 
> create table rc_nobz2 (a string, b string) stored as rcfile; 
> insert into table rc_nobz2 select * from source_txt; 
> SET io.seqfile.compression.type=BLOCK; 
> SET hive.exec.compress.output=true; 
> SET mapred.compress.map.output=true; 
> SET mapred.output.compress=true; 
> SET mapred.output.compression.codec=org.apache.hadoop.io.compress.BZip2Codec; 
> create table rc_bz2 (a string, b string) stored as rcfile; 
> insert into table rc_bz2 select * from source_txt; 
> hive> select * from rc_bz2; 
> Failed with exception java.io.IOException:java.io.IOException: Stream is not 
> BZip2 formatted: expected 'h' as first byte but got '�' 
> hive> select * from rc_nobz2; 
> apple sauce



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
