[ https://issues.apache.org/jira/browse/HIVE-7886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14111429#comment-14111429 ]
Venkata Puneet Ravuri commented on HIVE-7886:
---------------------------------------------
The data files are in the correct RCFile format. When I run 'select *' on this
table, the data is returned correctly.
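
For reference, the behaviour comes down to this minimal pair of queries (a sketch, assuming the testtable definition quoted in the description below):

    SELECT * FROM testtable;          -- returns the data correctly
    SELECT COUNT(*) FROM testtable;   -- fails with the EOFException shown in the stack trace below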
> Aggregation queries fail with RCFile based Hive tables with S3 storage
> ----------------------------------------------------------------------
>
> Key: HIVE-7886
> URL: https://issues.apache.org/jira/browse/HIVE-7886
> Project: Hive
> Issue Type: Bug
> Components: File Formats
> Affects Versions: 0.13.1
> Reporter: Venkata Puneet Ravuri
>
> Aggregation queries on Hive tables that use the RCFile format with S3 storage
> are failing.
> My setup is Hadoop 2.5.0 and Hive 0.13.1.
> I created a table with the following schema:
> CREATE EXTERNAL TABLE `testtable`(
>   `col1` string,
>   `col2` tinyint,
>   `col3` int,
>   `col4` float,
>   `col5` boolean,
>   `col6` smallint)
> ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe'
> WITH SERDEPROPERTIES (
>   'serialization.format'='\t',
>   'line.delim'='\n',
>   'field.delim'='\t')
> STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.RCFileInputFormat'
> OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.RCFileOutputFormat'
> LOCATION 's3n://<testbucket>/testtable';
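> The table points at RCFile data that already exists in S3. Purely as an
> illustration (the actual files were produced elsewhere), data of this shape
> could be written with something like the following, where staging_table is a
> hypothetical text-format table with the same six columns:
>
> INSERT OVERWRITE TABLE testtable
> SELECT col1, col2, col3, col4, col5, col6
> FROM staging_table;   -- staging_table is hypothetical, for illustration only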
> When I run 'select count(*) from testtable', it fails with the following
> exception stack trace:
> Error: java.io.IOException: java.io.IOException: java.io.EOFException: Attempted to seek or read past the end of the file
>     at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121)
>     at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77)
>     at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.doNextWithExceptionHandler(HadoopShimsSecure.java:256)
>     at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.next(HadoopShimsSecure.java:171)
>     at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:198)
>     at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:184)
>     at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:52)
>     at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430)
>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
>     at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:415)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
>     at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
> Caused by: java.io.IOException: java.io.EOFException: Attempted to seek or read past the end of the file
>     at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121)
>     at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77)
>     at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:344)
>     at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.doNext(CombineHiveRecordReader.java:101)
>     at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.doNext(CombineHiveRecordReader.java:41)
>     at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.next(HiveContextAwareRecordReader.java:122)
>     at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.doNextWithExceptionHandler(HadoopShimsSecure.java:254)
>     ... 11 more
> Caused by: java.io.EOFException: Attempted to seek or read past the end of the file
>     at org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.processException(Jets3tNativeFileSystemStore.java:462)
>     at org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.handleException(Jets3tNativeFileSystemStore.java:411)
>     at org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.retrieve(Jets3tNativeFileSystemStore.java:234)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:601)
>     at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
>     at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
>     at org.apache.hadoop.fs.s3native.$Proxy17.retrieve(Unknown Source)
>     at org.apache.hadoop.fs.s3native.NativeS3FileSystem$NativeS3FsInputStream.seek(NativeS3FileSystem.java:205)
>     at org.apache.hadoop.fs.BufferedFSInputStream.seek(BufferedFSInputStream.java:96)
>     at org.apache.hadoop.fs.BufferedFSInputStream.skip(BufferedFSInputStream.java:67)
>     at java.io.DataInputStream.skipBytes(DataInputStream.java:220)
>     at org.apache.hadoop.hive.ql.io.RCFile$ValueBuffer.readFields(RCFile.java:739)
>     at org.apache.hadoop.hive.ql.io.RCFile$Reader.currentValueBuffer(RCFile.java:1720)
>     at org.apache.hadoop.hive.ql.io.RCFile$Reader.getCurrentRow(RCFile.java:1898)
>     at org.apache.hadoop.hive.ql.io.RCFileRecordReader.next(RCFileRecordReader.java:149)
>     at org.apache.hadoop.hive.ql.io.RCFileRecordReader.next(RCFileRecordReader.java:44)
>     at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:339)
>     ... 15 more