[jira] [Commented] (HIVE-9863) Querying parquet tables fails with IllegalStateException [Spark Branch]

2015-05-29 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14565196#comment-14565196
 ] 

Xuefu Zhang commented on HIVE-9863:
---

Okay. Let me try to reproduce it with the old release; I will post an update.

> Querying parquet tables fails with IllegalStateException [Spark Branch]
> ---
>
> Key: HIVE-9863
> URL: https://issues.apache.org/jira/browse/HIVE-9863
> Project: Hive
> Issue Type: Sub-task
> Components: Spark
> Reporter: Xuefu Zhang
>
> This does not necessarily happen only in the Spark branch. Queries such as
> select count(*) from table_name fail with the error:
> {code}
> hive> select * from content limit 2;
> OK
> Failed with exception java.io.IOException:java.lang.IllegalStateException: 
> All the offsets listed in the split should be found in the file. expected: 
> [4, 4] found: [BlockMetaData{69644, 881917418 [ColumnMetaData{GZIP [guid] 
> BINARY  [PLAIN, BIT_PACKED], 4}, ColumnMetaData{GZIP [collection_name] BINARY 
>  [PLAIN_DICTIONARY, BIT_PACKED], 389571}, ColumnMetaData{GZIP [doc_type] 
> BINARY  [PLAIN_DICTIONARY, BIT_PACKED], 389790}, ColumnMetaData{GZIP [stage] 
> INT64  [PLAIN_DICTIONARY, BIT_PACKED], 389887}, ColumnMetaData{GZIP 
> [meta_timestamp] INT64  [RLE, PLAIN_DICTIONARY, BIT_PACKED], 397673}, 
> ColumnMetaData{GZIP [doc_timestamp] INT64  [RLE, PLAIN_DICTIONARY, 
> BIT_PACKED], 422161}, ColumnMetaData{GZIP [meta_size] INT32  [RLE, 
> PLAIN_DICTIONARY, BIT_PACKED], 460215}, ColumnMetaData{GZIP [content_size] 
> INT32  [RLE, PLAIN_DICTIONARY, BIT_PACKED], 521728}, ColumnMetaData{GZIP 
> [source] BINARY  [RLE, PLAIN, BIT_PACKED], 683740}, ColumnMetaData{GZIP 
> [delete_flag] BOOLEAN  [RLE, PLAIN, BIT_PACKED], 683787}, ColumnMetaData{GZIP 
> [meta] BINARY  [RLE, PLAIN, BIT_PACKED], 683834}, ColumnMetaData{GZIP 
> [content] BINARY  [RLE, PLAIN, BIT_PACKED], 6992365}]}] out of: [4, 
> 129785482, 260224757] in range 0, 134217728
> Time taken: 0.253 seconds
> hive> 
> {code}
> I can reproduce the problem in either local or yarn-cluster mode. It seems to
> happen with MR as well, so I suspect this is a Parquet problem.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9863) Querying parquet tables fails with IllegalStateException [Spark Branch]

2015-05-29 Thread JIRA

[ 
https://issues.apache.org/jira/browse/HIVE-9863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14565190#comment-14565190
 ] 

Sergio Peña commented on HIVE-9863:
---

[~xuefuz]
I ran the same tests using the Hive CLI + Spark this time, and they work fine.
No exception is thrown.

{noformat}
hive> desc formatted parquet;
...
# Storage Information
SerDe Library:  
org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe  
InputFormat:
org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat
OutputFormat:   
org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat
...

hive> select count(*) from parquet;
Query ID = sergio_20150529133253_5fd9da28-d73b-4137-a04a-3975108dbba7
...
Starting Spark Job = 513800e8-2d6a-47af-830c-d18099e52bc3
2015-05-29 13:32:54,261 Stage-3_0: 1/1 Finished Stage-4_0: 1/1 Finished
Status: Finished successfully in 1.01 seconds
OK
500
Time taken: 1.198 seconds, Fetched: 1 row(s)
{noformat}



[jira] [Commented] (HIVE-9863) Querying parquet tables fails with IllegalStateException [Spark Branch]

2015-05-14 Thread JIRA

[ 
https://issues.apache.org/jira/browse/HIVE-9863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14543878#comment-14543878
 ] 

Sergio Peña commented on HIVE-9863:
---

I executed the following commands through the TestSparkCliDriver in 1.3.0 (with 
parquet 1.6.0) and 1.1.0 (with parquet 1.6.0rc3), and both versions are working 
correctly. I cannot reproduce the issue yet. Which version are you running?

{noformat}
set hive.compute.query.using.stats=false;

create table text(key int, value string);
load data local inpath '/opt/local/hive/upstream/data/files/kv1.txt' overwrite 
into table text;

create table parquet(key int, value string) stored as parquet;
insert overwrite table parquet select * from text;

select count(*) from parquet;
select * from parquet limit 2;
{noformat}





[jira] [Commented] (HIVE-9863) Querying parquet tables fails with IllegalStateException [Spark Branch]

2015-05-04 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14527419#comment-14527419
 ] 

Xuefu Zhang commented on HIVE-9863:
---

The steps are quite simple: just create a parquet table and run select count(*)
on it. Please make sure that query execution isn't using statistics. Thanks.



[jira] [Commented] (HIVE-9863) Querying parquet tables fails with IllegalStateException [Spark Branch]

2015-05-04 Thread JIRA

[ 
https://issues.apache.org/jira/browse/HIVE-9863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14526811#comment-14526811
 ] 

Sergio Peña commented on HIVE-9863:
---

[~xuefuz] Could you paste the steps to reproduce this issue?
I haven't been able to reproduce it.



[jira] [Commented] (HIVE-9863) Querying parquet tables fails with IllegalStateException [Spark Branch]

2015-04-28 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14518088#comment-14518088
 ] 

Xuefu Zhang commented on HIVE-9863:
---

[~spena], given that we already upgraded the version (HIVE-10372), could you 
please verify if that fixes the problem here and close this JIRA if so? Thanks.



[jira] [Commented] (HIVE-9863) Querying parquet tables fails with IllegalStateException [Spark Branch]

2015-04-27 Thread Ryan Blue (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14514560#comment-14514560
 ] 

Ryan Blue commented on HIVE-9863:
-

Good news: the [1.6.0 
artifacts|https://search.maven.org/#artifactdetails%7Ccom.twitter%7Cparquet-hadoop-bundle%7C1.6.0%7Cjar]
 have hit maven central. We should be able to get off of the RC release now!

And to give you guys a heads-up, 1.7.0 will be going out soon. That's just a 
rename for artifacts (com.twitter => org.apache.parquet) and packages (parquet 
=> org.apache.parquet). There are no other changes, so it should make updating 
to the new version clean, although it is an incompatible change.



[jira] [Commented] (HIVE-9863) Querying parquet tables fails with IllegalStateException [Spark Branch]

2015-04-26 Thread Ferdinand Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14513356#comment-14513356
 ] 

Ferdinand Xu commented on HIVE-9863:


Yes, the Maven repository hasn't been updated yet. I have HIVE-10372 tracking
this issue.



[jira] [Commented] (HIVE-9863) Querying parquet tables fails with IllegalStateException [Spark Branch]

2015-04-24 Thread JIRA

[ 
https://issues.apache.org/jira/browse/HIVE-9863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14511730#comment-14511730
 ] 

Sergio Peña commented on HIVE-9863:
---

Hive currently uses 1.6.0rc6, but we're waiting for the 1.6.0 bits to reach the
Maven repository. We have other changes we want to make based on 1.6.0.



[jira] [Commented] (HIVE-9863) Querying parquet tables fails with IllegalStateException [Spark Branch]

2015-04-24 Thread Ryan Blue (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14511685#comment-14511685
 ] 

Ryan Blue commented on HIVE-9863:
-

This was a problem in the upstream Parquet. I don't think HIVE-10076 fixed it 
because PARQUET-139 wasn't released until 1.6.0. But, maybe it was included in 
an RC. I seem to remember Hive depending on a Parquet RC.



[jira] [Commented] (HIVE-9863) Querying parquet tables fails with IllegalStateException [Spark Branch]

2015-04-24 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14511620#comment-14511620
 ] 

Xuefu Zhang commented on HIVE-9863:
---

[~Ferd], [~b...@cloudera.com], with HIVE-10076, do you know if the problem here 
is fixed? Thanks.



[jira] [Commented] (HIVE-9863) Querying parquet tables fails with IllegalStateException [Spark Branch]

2015-03-05 Thread Ryan Blue (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14349635#comment-14349635
 ] 

Ryan Blue commented on HIVE-9863:
-

[~spena], there is a work-around but it depends on the RC that you're depending 
on. You can use one of the other constructors in the ParquetInputSplit.

In 1.6.0, Parquet will 
[accept|https://github.com/apache/incubator-parquet-mr/blob/master/parquet-hadoop/src/main/java/parquet/hadoop/ParquetRecordReader.java#L204]
 mapreduce.FileSplit, mapred.FileSplit, and ParquetInputSplit, so there will be 
no need for Hive to depend on ParquetInputSplit at all. At that point we will 
probably deprecate it from the public API.



[jira] [Commented] (HIVE-9863) Querying parquet tables fails with IllegalStateException [Spark Branch]

2015-03-05 Thread JIRA

[ 
https://issues.apache.org/jira/browse/HIVE-9863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14349018#comment-14349018
 ] 

Sergio Peña commented on HIVE-9863:
---

[~xuefuz] How often does this error happen? Is there a workaround for this so 
that we can wait for parquet 1.6?






[jira] [Commented] (HIVE-9863) Querying parquet tables fails with IllegalStateException [Spark Branch]

2015-03-04 Thread Ryan Blue (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14347826#comment-14347826
 ] 

Ryan Blue commented on HIVE-9863:
-

This was fixed in PARQUET-108. The problem was that the constructor Hive uses 
converted the N block metadata entries to offsets incorrectly, returning the 
first block's offset N times. The fix will ship in the 1.6.0 release, but we 
can probably do a patch release sooner if it's a blocker for Hive.
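
To make the failure mode concrete, here is a minimal sketch of the kind of mistake described above. It is not the actual Parquet code: the class `RowGroupOffsets`, the `Block` type, and all method names are hypothetical, and it assumes a row group belongs to a split when its starting offset falls inside the split's range. The offsets and the range are the ones printed in the error message.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RowGroupOffsets {
    // Hypothetical stand-in for a row group's metadata; only the start offset matters here.
    static final class Block {
        final long startingPos;
        Block(long startingPos) { this.startingPos = startingPos; }
    }

    // Assumption: a row group is part of the split if it starts within [start, end).
    static List<Block> blocksInRange(long[] fileOffsets, long start, long end) {
        List<Block> result = new ArrayList<>();
        for (long offset : fileOffsets) {
            if (offset >= start && offset < end) {
                result.add(new Block(offset));
            }
        }
        return result;
    }

    // Illustrates the described bug: every entry is taken from the first block.
    static long[] offsetsBuggy(List<Block> blocks) {
        long[] offsets = new long[blocks.size()];
        for (int i = 0; i < offsets.length; i++) {
            offsets[i] = blocks.get(0).startingPos;
        }
        return offsets;
    }

    // Intended behavior: entry i is taken from block i.
    static long[] offsetsFixed(List<Block> blocks) {
        long[] offsets = new long[blocks.size()];
        for (int i = 0; i < offsets.length; i++) {
            offsets[i] = blocks.get(i).startingPos;
        }
        return offsets;
    }

    public static void main(String[] args) {
        // Row group offsets and split range taken from the error message.
        long[] fileOffsets = {4L, 129785482L, 260224757L};
        List<Block> inSplit = blocksInRange(fileOffsets, 0L, 134217728L);

        System.out.println(Arrays.toString(offsetsBuggy(inSplit))); // [4, 4]
        System.out.println(Arrays.toString(offsetsFixed(inSplit))); // [4, 129785482]
    }
}
```

Under these assumptions the buggy conversion yields `[4, 4]`, which matches the `expected: [4, 4]` in the exception, while the file only has row groups at 4, 129785482 and 260224757.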






[jira] [Commented] (HIVE-9863) Querying parquet tables fails with IllegalStateException [Spark Branch]

2015-03-04 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14347821#comment-14347821
 ] 

Xuefu Zhang commented on HIVE-9863:
---

cc: [~rdblue]






[jira] [Commented] (HIVE-9863) Querying parquet tables fails with IllegalStateException [Spark Branch]

2015-03-04 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14347809#comment-14347809
 ] 

Xuefu Zhang commented on HIVE-9863:
---

More errors in hive.log:
{code}
    at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderCreationException(HiveIOExceptionHandlerChain.java:97)
    at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderCreationException(HiveIOExceptionHandlerUtil.java:57)
    at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.initNextRecordReader(HadoopShimsSecure.java:265)
    at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.<init>(HadoopShimsSecure.java:212)
    at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileInputFormatShim.getRecordReader(HadoopShimsSecure.java:332)
    at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getRecordReader(CombineHiveInputFormat.java:715)
    at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:236)
    at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:212)
    at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:101)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
    at org.apache.spark.scheduler.Task.run(Task.scala:64)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:197)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:722)
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:525)
    at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.initNextRecordReader(HadoopShimsSecure.java:251)
    ... 18 more
Caused by: java.lang.IllegalStateException: All the offsets listed in the split 
should be found in the file. expected: [4, 4] found: [BlockMetaData{69644, 
881917418 [ColumnMetaData{GZIP [guid] BINARY  [PLAIN, BIT_PACKED], 4}, 
ColumnMetaData{GZIP [collection_name] BINARY  [PLAIN_DICTIONARY, BIT_PACKED], 
389571}, ColumnMetaData{GZIP [doc_type] BINARY  [PLAIN_DICTIONARY, BIT_PACKED], 
389790}, ColumnMetaData{GZIP [stage] INT64  [PLAIN_DICTIONARY, BIT_PACKED], 
389887}, ColumnMetaData{GZIP [meta_timestamp] INT64  [RLE, PLAIN_DICTIONARY, 
BIT_PACKED], 397673}, ColumnMetaData{GZIP [doc_timestamp] INT64  [RLE, 
PLAIN_DICTIONARY, BIT_PACKED], 422161}, ColumnMetaData{GZIP [meta_size] INT32  
[RLE, PLAIN_DICTIONARY, BIT_PACKED], 460215}, ColumnMetaData{GZIP 
[content_size] INT32  [RLE, PLAIN_DICTIONARY, BIT_PACKED], 521728}, 
ColumnMetaData{GZIP [source] BINARY  [RLE, PLAIN, BIT_PACKED], 683740}, 
ColumnMetaData{GZIP [delete_flag] BOOLEAN  [RLE, PLAIN, BIT_PACKED], 683787}, 
ColumnMetaData{GZIP [meta] BINARY  [RLE, PLAIN, BIT_PACKED], 683834}, 
ColumnMetaData{GZIP [content] BINARY  [RLE, PLAIN, BIT_PACKED], 6992365}]}] out 
of: [4, 129785482, 260224757] in range 0, 134217728
    at parquet.hadoop.ParquetRecordReader.initializeInternalReader(ParquetRecordReader.java:180)
    at parquet.hadoop.ParquetRecordReader.initialize(ParquetRecordReader.java:138)
    at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.<init>(ParquetRecordReaderWrapper.java:111)
    at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.<init>(ParquetRecordReaderWrapper.java:76)
    at org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat.getRecordReader(MapredParquetInputFormat.java:72)
    at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.<init>(CombineHiveRecordReader.java:66)
    ... 23 more
2015-03-04 15:54:52,374 WARN  [task-result-getter-1]: scheduler.TaskSetManager (Logging.scala:logWarning(71)) - Lost task 0.0 in stage 0.0 (TID 1, localhost): java.io.IOException: java.lang.reflect.InvocationTargetException
    at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderCreationException(HiveIOExceptionHandlerChain.java:97)
at 
org.apache.hadoop.hive.