[ https://issues.apache.org/jira/browse/PARQUET-83?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14225986#comment-14225986 ]

Daniel Haviv commented on PARQUET-83:
-------------------------------------

Hi,
It seems something is broken now. I ran this query on a Parquet table from Hive 0.13:
0: jdbc:hive2://hdname:10000/default> select count(*) from A1;
+-----------+
|    _c0    |
+-----------+
|    ...    |
+-----------+
1 row selected (48.401 seconds)

and when I run it on 0.15 I get:
2014-11-26 02:01:54,556 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : java.io.IOException: java.lang.reflect.InvocationTargetException
        at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderCreationException(HiveIOExceptionHandlerChain.java:97)
        at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderCreationException(HiveIOExceptionHandlerUtil.java:57)
        at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.initNextRecordReader(HadoopShimsSecure.java:312)
        at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.<init>(HadoopShimsSecure.java:259)
        at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileInputFormatShim.getRecordReader(HadoopShimsSecure.java:386)
        at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getRecordReader(CombineHiveInputFormat.java:652)
        at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.<init>(MapTask.java:168)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:409)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1594)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
Caused by: java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
        at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.initNextRecordReader(HadoopShimsSecure.java:298)
        ... 11 more
Caused by: java.lang.IllegalStateException: All the offsets listed in the split should be found in the file. expected: [4, 4] found:
[BlockMetaData{1560100, 413986404 [
ColumnMetaData{GZIP [adformat] BINARY  [BIT_PACKED, RLE, PLAIN_DICTIONARY], 4},
ColumnMetaData{GZIP [adspaces] BINARY  [BIT_PACKED, RLE, PLAIN_DICTIONARY], 336814},
ColumnMetaData{GZIP [age] BINARY  [BIT_PACKED, RLE, PLAIN_DICTIONARY], 589854},
ColumnMetaData{GZIP [app_id] BINARY  [BIT_PACKED, RLE, PLAIN_DICTIONARY], 625872},
ColumnMetaData{GZIP [app_name] BINARY  [BIT_PACKED, RLE, PLAIN], 2112900},
ColumnMetaData{GZIP [bs_height] INT32  [BIT_PACKED, RLE, PLAIN_DICTIONARY], 2112949},
ColumnMetaData{GZIP [bs_width] INT32  [BIT_PACKED, RLE, PLAIN_DICTIONARY], 2162163},
ColumnMetaData{GZIP [categories, array] BINARY  [RLE, PLAIN_DICTIONARY], 2211377},
ColumnMetaData{GZIP [computer_id] INT32  [BIT_PACKED, RLE, PLAIN_DICTIONARY], 3835760},
ColumnMetaData{GZIP [deviceIdType] BINARY  [BIT_PACKED, RLE, PLAIN_DICTIONARY], 3836706},
ColumnMetaData{GZIP [deviceInfo, brand_name] BINARY  [BIT_PACKED, RLE, PLAIN_DICTIONARY], 4189159},
ColumnMetaData{GZIP [deviceInfo, device_matching_result] BINARY  [BIT_PACKED, RLE, PLAIN_DICTIONARY], 5293759},
ColumnMetaData{GZIP [deviceInfo, device_os] BINARY  [BIT_PACKED, RLE, PLAIN_DICTIONARY], 5434944},
ColumnMetaData{GZIP [deviceInfo, is_opera] BOOLEAN  [BIT_PACKED, RLE, PLAIN], 5801709},
ColumnMetaData{GZIP [deviceInfo, is_tablet] BOOLEAN  [BIT_PACKED, RLE, PLAIN], 5802020},
ColumnMetaData{GZIP [deviceInfo, is_touch_screen] BOOLEAN  [BIT_PACKED, RLE, PLAIN], 5966941},
ColumnMetaData{GZIP [deviceInfo, model_name] BINARY  [BIT_PACKED, RLE, PLAIN_DICTIONARY], 6088909},
ColumnMetaData{GZIP [deviceInfo, screen_height] INT32  [BIT_PACKED, RLE, PLAIN_DICTIONARY], 8287356},
ColumnMetaData{GZIP [deviceInfo, screen_width] INT32  [BIT_PACKED, RLE, PLAIN_DICTIONARY], 9167938},
ColumnMetaData{GZIP [deviceInfo, user_agent] BINARY  [BIT_PACKED, RLE, PLAIN, PLAIN_DICTIONARY], 10078081},
ColumnMetaData{GZIP [device_id_hash] BINARY  [BIT_PACKED, RLE, PLAIN_DICTIONARY], 30019984},
ColumnMetaData{GZIP [gender] BINARY  [BIT_PACKED, RLE, PLAIN_DICTIONARY], 30155884},
ColumnMetaData{GZIP [geoLocationInfo, carrier] BINARY  [BIT_PACKED, RLE, PLAIN_DICTIONARY], 30193050},
ColumnMetaData{GZIP [geoLocationInfo, city] BINARY  [BIT_PACKED, RLE, PLAIN_DICTIONARY], 32588517},
ColumnMetaData{GZIP [geoLocationInfo, connection_type] BINARY  [BIT_PACKED, RLE, PLAIN_DICTIONARY], 35068217},
ColumnMetaData{GZIP [geoLocationInfo, country] BINARY  [BIT_PACKED, RLE, PLAIN_DICTIONARY], 35260730},
ColumnMetaData{GZIP [geoLocationInfo, ip] BINARY  [BIT_PACKED, RLE, PLAIN, PLAIN_DICTIONARY], 36396542},
ColumnMetaData{GZIP [geoLocationInfo, ip_routing_type] BINARY  [BIT_PACKED, RLE, PLAIN_DICTIONARY], 45147363},
ColumnMetaData{GZIP [geoLocationInfo, is_cache] BOOLEAN  [BIT_PACKED, RLE, PLAIN], 45339937},
ColumnMetaData{GZIP [geoLocationInfo, is_longlat_from_req] BOOLEAN  [BIT_PACKED, RLE, PLAIN], 45340248},
ColumnMetaData{GZIP [geoLocationInfo, latitude] DOUBLE  [BIT_PACKED, RLE, PLAIN_DICTIONARY], 45533425},
ColumnMetaData{GZIP [geoLocationInfo, longitude] DOUBLE  [BIT_PACKED, RLE, PLAIN_DICTIONARY], 48382336},
ColumnMetaData{GZIP [geoLocationInfo, mobile_operator] BINARY  [BIT_PACKED, RLE, PLAIN_DICTIONARY], 51375539},
ColumnMetaData{GZIP [geoLocationInfo, postal_code] BINARY  [BIT_PACKED, RLE, PLAIN_DICTIONARY], 52219982},
ColumnMetaData{GZIP [geoLocationInfo, region] BINARY  [BIT_PACKED, RLE, PLAIN_DICTIONARY], 53257246},
ColumnMetaData{GZIP [handling_time] INT32  [BIT_PACKED, RLE, PLAIN_DICTIONARY], 54739550},
ColumnMetaData{GZIP [impression_id] BINARY  [BIT_PACKED, RLE, PLAIN], 56296474},
ColumnMetaData{GZIP [markup] BINARY  [BIT_PACKED, RLE, PLAIN_DICTIONARY], 75636083},
ColumnMetaData{GZIP [mode] BINARY  [BIT_PACKED, RLE, PLAIN_DICTIONARY], 75903538},
ColumnMetaData{GZIP [publisher_id] INT32  [BIT_PACKED, RLE, PLAIN_DICTIONARY], 75905807},
ColumnMetaData{GZIP [referrer] BINARY  [BIT_PACKED, RLE, PLAIN], 76560085},
ColumnMetaData{GZIP [request_date] INT64  [BIT_PACKED, RLE, PLAIN, PLAIN_DICTIONARY], 76560134},
ColumnMetaData{GZIP [request_id] BINARY  [BIT_PACKED, RLE, PLAIN], 79416424},
ColumnMetaData{GZIP [request_source] BINARY  [BIT_PACKED, RLE, PLAIN_DICTIONARY], 95503559},
ColumnMetaData{GZIP [request_type] INT32  [BIT_PACKED, RLE, PLAIN_DICTIONARY], 95770325},
ColumnMetaData{GZIP [selected_campaigns] BINARY  [BIT_PACKED, RLE, PLAIN_DICTIONARY], 95770977},
ColumnMetaData{GZIP [site_id] BINARY  [BIT_PACKED, RLE, PLAIN_DICTIONARY], 96469724},
ColumnMetaData{GZIP [source_ip] BINARY  [BIT_PACKED, RLE, PLAIN, PLAIN_DICTIONARY], 97567391},
ColumnMetaData{GZIP [udid] BINARY  [BIT_PACKED, RLE, PLAIN, PLAIN_DICTIONARY], 102171524},
ColumnMetaData{GZIP [user_id] BINARY  [BIT_PACKED, RLE, PLAIN, PLAIN_DICTIONARY], 110665022}]}]
out of: [4, 111186045, 221785906, 332758529, 450099685, 566558359, 677625032] in range 0, 134217728
        at parquet.hadoop.ParquetRecordReader.initializeInternalReader(ParquetRecordReader.java:180)
        at parquet.hadoop.ParquetRecordReader.initialize(ParquetRecordReader.java:138)
        at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.<init>(ParquetRecordReaderWrapper.java:99)
        at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.<init>(ParquetRecordReaderWrapper.java:71)
        at org.apache.hadoop.hive.ql.io.parquet.VectorizedParquetInputFormat$VectorizedParquetRecordReader.<init>(VectorizedParquetInputFormat.java:63)
        at org.apache.hadoop.hive.ql.io.parquet.VectorizedParquetInputFormat.getRecordReader(VectorizedParquetInputFormat.java:153)
        at org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat.getRecordReader(MapredParquetInputFormat.java:65)
        at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.<init>(CombineHiveRecordReader.java:65)
        ... 16 more

and

2014-11-26 02:01:55,579 INFO [main] org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator: 1 Close done
2014-11-26 02:01:55,579 INFO [main] org.apache.hadoop.hive.ql.exec.TableScanOperator: 0 Close done
2014-11-26 02:01:55,579 INFO [main] org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator: 11 Close done
2014-11-26 02:01:55,583 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : java.io.IOException: java.io.IOException: java.lang.NullPointerException
        at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121)
        at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77)
        at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.doNextWithExceptionHandler(HadoopShimsSecure.java:273)
        at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.next(HadoopShimsSecure.java:183)
        at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:198)
        at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:184)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:52)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1594)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
Caused by: java.io.IOException: java.lang.NullPointerException
        at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121)
        at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77)
        at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:352)
        at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.doNext(CombineHiveRecordReader.java:101)
        at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.doNext(CombineHiveRecordReader.java:41)
        at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.next(HiveContextAwareRecordReader.java:115)
        at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.doNextWithExceptionHandler(HadoopShimsSecure.java:271)
        ... 11 more
Caused by: java.lang.NullPointerException
        at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.next(ParquetRecordReaderWrapper.java:191)
        at org.apache.hadoop.hive.ql.io.parquet.VectorizedParquetInputFormat$VectorizedParquetRecordReader.next(VectorizedParquetInputFormat.java:117)
        at org.apache.hadoop.hive.ql.io.parquet.VectorizedParquetInputFormat$VectorizedParquetRecordReader.next(VectorizedParquetInputFormat.java:49)
        at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:347)
        ... 15 more
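
If I'm reading the first failure right, the split carries the starting offsets of the row groups it should read, and ParquetRecordReader keeps only the footer blocks whose starting positions match one of those offsets. Our split lists offset 4 twice, the duplicates collapse to a single matching block where two were expected, and the count check throws. A rough sketch of that check (simplified and with illustrative names, not the exact parquet-mr source):

{code}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Rough sketch of the row-group selection that throws above. blockStarts
// stands in for the footer's BlockMetaData starting positions; everything
// here is illustrative, simplified from parquet-mr, not the exact source.
class SplitOffsetCheckSketch {
  static List<Long> filterBlocks(long[] rowGroupOffsets, List<Long> blockStarts) {
    // The duplicate offsets from the split collapse here: [4, 4] -> {4}.
    Set<Long> offsets = new HashSet<>();
    for (long offset : rowGroupOffsets) {
      offsets.add(offset);
    }
    // Keep only the footer blocks whose starting position the split asked for.
    List<Long> matched = new ArrayList<>();
    for (long start : blockStarts) {
      if (offsets.contains(start)) {
        matched.add(start);
      }
    }
    // Two offsets were listed but only one block matched, so this throws.
    if (matched.size() != rowGroupOffsets.length) {
      throw new IllegalStateException(
          "All the offsets listed in the split should be found in the file."
          + " expected: " + Arrays.toString(rowGroupOffsets)
          + " found: " + matched
          + " out of: " + blockStarts);
    }
    return matched;
  }
}
{code}

The NullPointerException in the second trace then looks like a follow-on failure of a reader that never finished initializing. So the real question seems to be why the split ends up listing the same row-group offset twice in the first place.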


My Parquet files were generated by Spark (I can share them if you need them for testing purposes).
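In the meantime, the row-group layout of one of these files can be inspected with the parquet-tools "meta" command (e.g. "hadoop jar parquet-tools-1.6.0.jar meta /path/to/file.parquet"; jar name and path here are placeholders). The column chunk offsets it prints should line up with the block offsets in the exception above.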

Daniel

On Tue, Nov 25, 2014 at 7:47 PM, Ryan Blue (JIRA) <[email protected]> wrote:



> Hive Query failed if the data type is array<string> with parquet files
> ----------------------------------------------------------------------
>
>                 Key: PARQUET-83
>                 URL: https://issues.apache.org/jira/browse/PARQUET-83
>             Project: Parquet
>          Issue Type: Bug
>          Components: parquet-mr
>    Affects Versions: parquet-mr_1.6.0
>            Reporter: Sathish
>            Assignee: Ryan Blue
>              Labels: parquet, serde
>         Attachments: HIVE-7850.1.patch, HIVE-7850.2.patch, HIVE-7850.patch
>
>
> * Created a parquet file from an Avro file which has one array data type and 
> the rest are primitive types. Avro schema of the array data type, e.g.: 
> {code}
> { "name" : "action", "type" : [ { "type" : "array", "items" : "string" }, "null" ] }
> {code}
> * Created an external Hive table with the array type as below, 
> {code}
> create external table paraArray (action Array<string>) partitioned by (partitionid int)
> row format serde 'parquet.hive.serde.ParquetHiveSerDe'
> stored as
>   inputformat 'parquet.hive.MapredParquetInputFormat'
>   outputformat 'parquet.hive.MapredParquetOutputFormat'
> location '/testPara';
> alter table paraArray add partition(partitionid=1) location '/testPara';
> {code}
> * Ran the following query (select action from paraArray limit 10), and the 
> MapReduce jobs fail with the following exception.
> {code}
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row [Error getting row data with exception java.lang.ClassCastException: parquet.hive.writable.BinaryWritable$DicBinaryWritable cannot be cast to org.apache.hadoop.io.ArrayWritable
> at parquet.hive.serde.ParquetHiveArrayInspector.getList(ParquetHiveArrayInspector.java:125)
> at org.apache.hadoop.hive.serde2.SerDeUtils.buildJSONString(SerDeUtils.java:315)
> at org.apache.hadoop.hive.serde2.SerDeUtils.buildJSONString(SerDeUtils.java:371)
> at org.apache.hadoop.hive.serde2.SerDeUtils.getJSONString(SerDeUtils.java:236)
> at org.apache.hadoop.hive.serde2.SerDeUtils.getJSONString(SerDeUtils.java:222)
> at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:665)
> at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:144)
> at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
> at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:405)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:336)
> at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1126)
> at org.apache.hadoop.mapred.Child.main(Child.java:264)
> ]
> at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:671)
> at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:144)
> ... 8 more
> {code}
> This issue was posted on the Parquet issues list a while back. Since it is 
> related to the Parquet Hive serde, I have created the Hive issue here; the 
> details and history are in the link: 
> https://github.com/Parquet/parquet-mr/issues/281.


