[ https://issues.apache.org/jira/browse/HIVE-25120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
George Song updated HIVE-25120:
-------------------------------
    Description: 
In parquet 1.12.0 the modular encryption feature is introduced.
https://issues.apache.org/jira/browse/PARQUET-1178

VectorizedParquetRecordReader can't read parquet files with encrypted footer. It throws the following exceptions.
{code:java}
Error: java.io.IOException: java.lang.reflect.InvocationTargetException
Error: java.io.IOException: java.lang.reflect.InvocationTargetException
    at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderCreationException(HiveIOExceptionHandlerChain.java:97)
    at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderCreationException(HiveIOExceptionHandlerUtil.java:57)
    at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.initNextRecordReader(HadoopShimsSecure.java:271)
    at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.<init>(HadoopShimsSecure.java:217)
    at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileInputFormatShim.getRecordReader(HadoopShimsSecure.java:345)
    at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getRecordReader(CombineHiveInputFormat.java:719)
    at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.<init>(MapTask.java:175)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:444)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:349)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:178)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1845)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:172)
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.initNextRecordReader(HadoopShimsSecure.java:257)
    ... 11 more
Caused by: java.lang.RuntimeException: org.apache.parquet.crypto.ParquetCryptoRuntimeException: Trying to read file with encrypted footer. No keys available
    at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.<init>(VectorizedParquetRecordReader.java:156)
    at org.apache.hadoop.hive.ql.io.parquet.VectorizedParquetInputFormat.getRecordReader(VectorizedParquetInputFormat.java:50)
    at org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat.getRecordReader(MapredParquetInputFormat.java:87)
    at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.<init>(CombineHiveRecordReader.java:99)
    ... 16 more
Caused by: org.apache.parquet.crypto.ParquetCryptoRuntimeException: Trying to read file with encrypted footer. No keys available
    at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:588)
    at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:527)
    at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:521)
    at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.readFooterFromFile(VectorizedParquetRecordReader.java:345)
    at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.readSplitFooter(VectorizedParquetRecordReader.java:310)
    at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.initialize(VectorizedParquetRecordReader.java:222)
    at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.<init>(VectorizedParquetRecordReader.java:151)
    ... 19 more
{code}

  was:
Taking an example of a parquet table having array of integers as below.
{code:java}
CREATE EXTERNAL TABLE (
  `list_of_ints` array<int>)
STORED AS PARQUET
LOCATION '{location}';
{code}

Parquet file generated using hive will have schema for Type as below:
{code:java}
group list_of_ints (LIST) {
  repeated group bag {
    optional int32 array;
  };
}
{code}

Parquet file generated using thrift or any custom tool (using org.apache.parquet.io.api.RecordConsumer) may have schema for Type as below:
{code:java}
required group list_of_ints (LIST) {
  repeated int32 list_of_tuple
}
{code}

VectorizedParquetRecordReader handles only parquet file generated using hive. It throws the following exception when parquet file generated using thrift is read because of the changes done as part of HIVE-18553.
{code:java}
Caused by: java.lang.ClassCastException: repeated int32 list_of_ints_tuple is not a group
    at org.apache.parquet.schema.Type.asGroupType(Type.java:207)
    at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.getElementType(VectorizedParquetRecordReader.java:479)
    at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.buildVectorizedParquetReader(VectorizedParquetRecordReader.java:532)
    at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.checkEndOfRowGroup(VectorizedParquetRecordReader.java:440)
    at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.nextBatch(VectorizedParquetRecordReader.java:401)
    at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.next(VectorizedParquetRecordReader.java:353)
    at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.next(VectorizedParquetRecordReader.java:92)
    at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:365)
{code}

I have done a small change to handle the case where the child type of group type can be PrimitiveType.
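The two list layouts described in the old description can be illustrated with a minimal, self-contained sketch. The `Node`, `Primitive`, and `Group` classes and the `elementType` method below are hypothetical stand-ins written for this illustration; they are not the `org.apache.parquet.schema` classes and this is not the actual Hive patch. The sketch only shows the idea of accepting a repeated primitive child instead of casting it to a group.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical stand-ins for Parquet schema nodes (NOT org.apache.parquet.schema types).
class ListElementTypeSketch {
    static class Node {
        final String name;
        Node(String name) { this.name = name; }
    }

    static final class Primitive extends Node {
        final String physicalType;
        Primitive(String name, String physicalType) {
            super(name);
            this.physicalType = physicalType;
        }
    }

    static final class Group extends Node {
        final List<Node> children;
        Group(String name, List<Node> children) {
            super(name);
            this.children = children;
        }
    }

    // Returns the element node of a LIST-annotated group for both layouts.
    static Node elementType(Group listGroup) {
        Node repeated = listGroup.children.get(0);
        if (repeated instanceof Primitive) {
            // Two-level layout (thrift or custom writers): the repeated
            // primitive is the element itself, so no group cast is attempted.
            return repeated;
        }
        // Three-level layout (Hive writer): the repeated group wraps the element.
        return ((Group) repeated).children.get(0);
    }

    public static void main(String[] args) {
        Group hiveStyle = new Group("list_of_ints", Collections.<Node>singletonList(
                new Group("bag", Collections.<Node>singletonList(
                        new Primitive("array", "int32")))));
        Group thriftStyle = new Group("list_of_ints", Collections.<Node>singletonList(
                new Primitive("list_of_ints_tuple", "int32")));

        System.out.println(elementType(hiveStyle).name);
        System.out.println(elementType(thriftStyle).name);
    }
}
```

Casting the repeated child to a group unconditionally is what would fail on the second layout, which corresponds to the ClassCastException quoted above.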
> VectorizedParquetRecordReader can't read parquet file with encrypted footer
> ---------------------------------------------------------------------------
>
>                 Key: HIVE-25120
>                 URL: https://issues.apache.org/jira/browse/HIVE-25120
>             Project: Hive
>          Issue Type: Bug
>          Components: Parquet
>    Affects Versions: 3.1.2, 4.0.0
>            Reporter: George Song
>            Assignee: Ganesha Shreedhara
>            Priority: Major
>
> In parquet 1.12.0 the modular encryption feature is introduced.
> https://issues.apache.org/jira/browse/PARQUET-1178
> VectorizedParquetRecordReader can't read parquet files with encrypted footer.
> It throws the following exceptions.
> {code:java}
> Error: java.io.IOException: java.lang.reflect.InvocationTargetException
> Error: java.io.IOException: java.lang.reflect.InvocationTargetException
>     at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderCreationException(HiveIOExceptionHandlerChain.java:97)
>     at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderCreationException(HiveIOExceptionHandlerUtil.java:57)
>     at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.initNextRecordReader(HadoopShimsSecure.java:271)
>     at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.<init>(HadoopShimsSecure.java:217)
>     at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileInputFormatShim.getRecordReader(HadoopShimsSecure.java:345)
>     at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getRecordReader(CombineHiveInputFormat.java:719)
>     at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.<init>(MapTask.java:175)
>     at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:444)
>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:349)
>     at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:178)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:422)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1845)
>     at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:172)
> Caused by: java.lang.reflect.InvocationTargetException
>     at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>     at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>     at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>     at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>     at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.initNextRecordReader(HadoopShimsSecure.java:257)
>     ... 11 more
> Caused by: java.lang.RuntimeException: org.apache.parquet.crypto.ParquetCryptoRuntimeException: Trying to read file with encrypted footer. No keys available
>     at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.<init>(VectorizedParquetRecordReader.java:156)
>     at org.apache.hadoop.hive.ql.io.parquet.VectorizedParquetInputFormat.getRecordReader(VectorizedParquetInputFormat.java:50)
>     at org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat.getRecordReader(MapredParquetInputFormat.java:87)
>     at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.<init>(CombineHiveRecordReader.java:99)
>     ... 16 more
> Caused by: org.apache.parquet.crypto.ParquetCryptoRuntimeException: Trying to read file with encrypted footer. No keys available
>     at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:588)
>     at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:527)
>     at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:521)
>     at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.readFooterFromFile(VectorizedParquetRecordReader.java:345)
>     at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.readSplitFooter(VectorizedParquetRecordReader.java:310)
>     at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.initialize(VectorizedParquetRecordReader.java:222)
>     at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.<init>(VectorizedParquetRecordReader.java:151)
>     ... 19 more
> {code}
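When triaging this failure it can help to confirm that a given file really uses encrypted-footer mode. In the Parquet modular encryption format, files with an encrypted footer end with the magic bytes `PARE`, while files with a plaintext footer end with `PAR1`. The following is a minimal diagnostic sketch, assuming the file has been copied to the local filesystem; the class and method names (`FooterMagicSketch`, `tailMagic`, `hasEncryptedFooter`) are hypothetical and are not part of Hive or parquet-mr. It does not decrypt anything and is not a fix for the issue.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;

// Hypothetical diagnostic helper; not part of Hive or parquet-mr.
class FooterMagicSketch {
    // Reads the last four bytes of a local file and returns them as ASCII.
    static String tailMagic(String path) throws IOException {
        try (RandomAccessFile file = new RandomAccessFile(path, "r")) {
            if (file.length() < 4) {
                throw new IOException("File too short to be Parquet: " + path);
            }
            byte[] magic = new byte[4];
            file.seek(file.length() - 4);
            file.readFully(magic);
            return new String(magic, StandardCharsets.US_ASCII);
        }
    }

    // "PARE" marks encrypted-footer mode; "PAR1" marks a plaintext footer.
    static boolean hasEncryptedFooter(String path) throws IOException {
        return "PARE".equals(tailMagic(path));
    }

    public static void main(String[] args) throws IOException {
        for (String path : args) {
            String magic = tailMagic(path);
            String kind = "PARE".equals(magic) ? "encrypted footer"
                    : "PAR1".equals(magic) ? "plaintext footer"
                    : "not a Parquet file (tail magic: " + magic + ")";
            System.out.println(path + ": " + kind);
        }
    }
}
```

A file reported as "encrypted footer" can only be opened by a reader that has been given decryption keys, which matches the "No keys available" message in the trace above.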