[ 
https://issues.apache.org/jira/browse/DRILL-1058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14078926#comment-14078926
 ] 

Jacques Nadeau commented on DRILL-1058:
---------------------------------------

Fixed by 0d6befc or earlier.

> Unable to read or write nested/repeated data in PARQUET format
> --------------------------------------------------------------
>
>                 Key: DRILL-1058
>                 URL: https://issues.apache.org/jira/browse/DRILL-1058
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Storage - Writer
>         Environment: CentOS release 6.5
>            Reporter: Amit Katti
>             Fix For: 0.5.0
>
>         Attachments: complex.parquet
>
>
> =================================================
> DRILL WRITING A PARQUET TABLE WITH NESTED DATA
> =================================================
> I have a JSON file with nested data (schema present below):
> {"rownum":1,"name":"fred ovid","age":76,"gpa":1.55,"studentnum":692315658449,"create_time":"2014-05-27 00:26:07", "interests": [ "Reading", "Mountain Biking", "Hacking" ]}
> I am able to read this JSON file successfully from Drill and access the nested
> values. However, when I try to import this data and create a table in PARQUET
> format, the query fails:
> QUERY: create table test as select * from `/user/root/sample-data/nested_student.json`;
> ERROR: Query failed: org.apache.drill.exec.rpc.RpcException: Remote failure 
> while running query.[error_id: "3ce3dc1e-d920-4262-ae2d-28bd2d034597"
> endpoint {
>   address: "perfnode154.perf.lab"
>   user_port: 31010
>   control_port: 31011
>   data_port: 31012
> }
> error_type: 0
> message: "Failure while running fragment. < ParquetEncodingException:[ error 
> starting field interests at 6 ] < ClassCastException:[ 
> parquet.io.PrimitiveColumnIO cannot be cast to parquet.io.GroupColumnIO ]"
> ]
> Error: exception while executing query (state=,code=0)
> {code}
> 2014-06-24 00:41:18,646 [b10db58d-8d4d-4d02-9fb5-a5081e5cb254:frag:0:0] ERROR 
> o.a.d.e.w.f.AbstractStatusReporter - Error 
> 48602de2-8306-47d2-875f-8ad2cd2e964a: Failure while running fragment.
> java.lang.ClassCastException: parquet.io.PrimitiveColumnIO cannot be cast to 
> parquet.io.GroupColumnIO
>         at 
> parquet.io.MessageColumnIO$MessageColumnIORecordConsumer.startField(MessageColumnIO.java:171)
>  ~[parquet-column-1.5.0-20140513.004024-1.jar:na]
>         at 
> org.apache.drill.exec.store.ParquetOutputRecordWriter.addRepeatedVarCharHolder(ParquetOutputRecordWriter.java:761)
>  
> ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT]
>         at 
> org.apache.drill.exec.store.EventBasedRecordWriter$RepeatedVarCharFieldWriter.writeField(EventBasedRecordWriter.java:1156)
>  
> ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT]
>         at 
> org.apache.drill.exec.store.EventBasedRecordWriter.write(EventBasedRecordWriter.java:150)
>  
> ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT]
>         at 
> org.apache.drill.exec.physical.impl.WriterRecordBatch.innerNext(WriterRecordBatch.java:111)
>  
> ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT]
>         at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:91)
>  
> ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT]
>         at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:72)
>  
> ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT]
>         at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:65)
>  
> ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT]
>         at 
> org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:45)
>  
> ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT]
>         at 
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:94)
>  
> ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT]
>         at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:91)
>  
> ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT]
>         at 
> org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:56) 
> ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT]
>         at 
> org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext(ScreenCreator.java:85)
>  
> ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT]
>         at 
> org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:46) 
> ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT]
>         at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:100)
>  ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT]
> {code}
> =================================================
> DRILL READING A PARQUET TABLE WITH NESTED DATA
> =================================================
> I generated a Parquet file by reading the JSON file below into Pig and
> storing it in Parquet format:
> {"recipe":"Tacos","ingredients":[{"name":"Beef"},{"name":"Lettuce"},{"name":"Cheese"}],"inventor":{"name":"Alex","age":25}}
> {"recipe":"TomatoSoup","ingredients":[{"name":"Tomatoes"},{"name":"Milk"}],"inventor":{"name":"Steve","age":23}}
> When I try to read this Parquet table in Drill, the query fails:
> QUERY: Select * from `/user/root/complex.parquet`;
> ERROR: Query failed: org.apache.drill.exec.rpc.RpcException: Remote failure 
> while running query.[error_id: "c2e735f4-e11c-4e10-a410-959b3880dce0"
> endpoint {
>   address: "perfnode154.perf.lab"
>   user_port: 31010
>   control_port: 31011
>   data_port: 31012
> }
> error_type: 0
> message: "Failure while running fragment. < UnsupportedOperationException:[ 
> unsupported type: BINARY LIST ]"
> ]
> Error: exception while executing query (state=,code=0)
> {code}
> 2014-07-23 22:16:45,239 [d106ad59-595f-42e7-880a-ef9f6bff1ff0:frag:0:0] DEBUG 
> o.a.d.e.w.fragment.FragmentExecutor - Failure while initializing operator tree
> java.lang.UnsupportedOperationException: unsupported type: BINARY LIST
>       at 
> org.apache.drill.exec.store.parquet.ParquetRecordReader.toMajorType(ParquetRecordReader.java:446)
>  
> ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT]
>       at 
> org.apache.drill.exec.store.parquet.ParquetRecordReader.setup(ParquetRecordReader.java:219)
>  
> ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT]
>       at 
> org.apache.drill.exec.physical.impl.ScanBatch.<init>(ScanBatch.java:93) 
> ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT]
>       at 
> org.apache.drill.exec.store.parquet.ParquetScanBatchCreator.getBatch(ParquetScanBatchCreator.java:126)
>  
> ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT]
>       at 
> org.apache.drill.exec.store.parquet.ParquetScanBatchCreator.getBatch(ParquetScanBatchCreator.java:47)
>  
> ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT]
>       at 
> org.apache.drill.exec.physical.impl.ImplCreator.visitOp(ImplCreator.java:62) 
> ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT]
>       at 
> org.apache.drill.exec.physical.impl.ImplCreator.visitOp(ImplCreator.java:39) 
> ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT]
>       at 
> org.apache.drill.exec.physical.base.AbstractPhysicalVisitor.visitSubScan(AbstractPhysicalVisitor.java:113)
>  
> ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT]
>       at 
> org.apache.drill.exec.store.parquet.ParquetRowGroupScan.accept(ParquetRowGroupScan.java:113)
>  
> ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT]
>       at 
> org.apache.drill.exec.physical.impl.ImplCreator.getChildren(ImplCreator.java:74)
>  
> ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT]
>       at 
> org.apache.drill.exec.physical.impl.ImplCreator.visitOp(ImplCreator.java:62) 
> ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT]
>       at 
> org.apache.drill.exec.physical.impl.ImplCreator.visitOp(ImplCreator.java:39) 
> ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT]
>       at 
> org.apache.drill.exec.physical.base.AbstractPhysicalVisitor.visitIteratorValidator(AbstractPhysicalVisitor.java:196)
>  
> ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT]
>       at 
> org.apache.drill.exec.physical.config.IteratorValidator.accept(IteratorValidator.java:34)
>  
> ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT]
>       at 
> org.apache.drill.exec.physical.impl.ImplCreator.getChildren(ImplCreator.java:74)
>  
> ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT]
>       at 
> org.apache.drill.exec.physical.impl.ImplCreator.visitOp(ImplCreator.java:62) 
> ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT]
>       at 
> org.apache.drill.exec.physical.impl.ImplCreator.visitOp(ImplCreator.java:39) 
> ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT]
>       at 
> org.apache.drill.exec.physical.base.AbstractPhysicalVisitor.visitProducerConsumer(AbstractPhysicalVisitor.java:191)
>  
> ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT]
>       at 
> org.apache.drill.exec.physical.config.ProducerConsumer.accept(ProducerConsumer.java:42)
>  
> ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT]
>       at 
> org.apache.drill.exec.physical.impl.ImplCreator.getChildren(ImplCreator.java:74)
>  
> ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT]
>       at 
> org.apache.drill.exec.physical.impl.ImplCreator.visitOp(ImplCreator.java:62) 
> ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT]
>       at 
> org.apache.drill.exec.physical.impl.ImplCreator.visitOp(ImplCreator.java:39) 
> ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT]
>       at 
> org.apache.drill.exec.physical.base.AbstractPhysicalVisitor.visitIteratorValidator(AbstractPhysicalVisitor.java:196)
>  
> ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT]
>       at 
> org.apache.drill.exec.physical.config.IteratorValidator.accept(IteratorValidator.java:34)
>  
> ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT]
>       at 
> org.apache.drill.exec.physical.impl.ImplCreator.getChildren(ImplCreator.java:74)
>  
> ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT]
>       at 
> org.apache.drill.exec.physical.impl.ImplCreator.visitOp(ImplCreator.java:59) 
> ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT]
>       at 
> org.apache.drill.exec.physical.impl.ImplCreator.visitOp(ImplCreator.java:39) 
> ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT]
>       at 
> org.apache.drill.exec.physical.base.AbstractPhysicalVisitor.visitStore(AbstractPhysicalVisitor.java:118)
>  
> ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT]
>       at 
> org.apache.drill.exec.physical.base.AbstractPhysicalVisitor.visitScreen(AbstractPhysicalVisitor.java:176)
>  
> ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT]
>       at org.apache.drill.exec.physical.config.Screen.accept(Screen.java:95) 
> ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT]
>       at 
> org.apache.drill.exec.physical.impl.ImplCreator.getExec(ImplCreator.java:87) 
> ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT]
>       at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:81)
>  
> ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT]
>       at 
> org.apache.drill.exec.work.WorkManager$RunnableWrapper.run(WorkManager.java:242)
>  
> [drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT]
>       at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>  [na:1.7.0_60]
>       at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>  [na:1.7.0_60]
>       at java.lang.Thread.run(Thread.java:745) [na:1.7.0_60]
> {code}
> I am able to verify that it contains repeated data by dumping the Parquet file
> using parquet-tools:
> {code}
> ./parquet-tools dump badpigparquet 
> row group 0 
> -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> recipe:       BINARY UNCOMPRESSED DO:0 FPO:4 SZ:85/85/1.00 VC:6 
> ENC:BIT_PACKED,RLE,PLAIN_DICTIONARY
> ingredients: 
> .bag:        
> ..name:       BINARY UNCOMPRESSED DO:0 FPO:89 SZ:120/120/1.00 VC:15 
> ENC:RLE,PLAIN_DICTIONARY
> inventor:    
> .name:        BINARY UNCOMPRESSED DO:0 FPO:209 SZ:74/74/1.00 VC:6 
> ENC:BIT_PACKED,RLE,PLAIN_DICTIONARY
> .age:         INT32 UNCOMPRESSED DO:0 FPO:283 SZ:64/64/1.00 VC:6 
> ENC:BIT_PACKED,RLE,PLAIN_DICTIONARY
>     recipe TV=6 RL=0 DL=1 DS:                2 DE:PLAIN_DICTIONARY
>     
> -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>     page 0:                                   DLE:RLE RLE:BIT_PACKED 
> VLE:PLAIN_DICTIONARY SZ:9 VC:6
>     ingredients.bag.name TV=15 RL=1 DL=3 DS: 5 DE:PLAIN_DICTIONARY
>     
> -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>     page 0:                                   DLE:RLE RLE:RLE 
> VLE:PLAIN_DICTIONARY SZ:21 VC:15
>     inventor.name TV=6 RL=0 DL=2 DS:         2 DE:PLAIN_DICTIONARY
>     
> -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>     page 0:                                   DLE:RLE RLE:BIT_PACKED 
> VLE:PLAIN_DICTIONARY SZ:10 VC:6
>     inventor.age TV=6 RL=0 DL=2 DS:          2 DE:PLAIN_DICTIONARY
>     
> -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>     page 0:                                   DLE:RLE RLE:BIT_PACKED 
> VLE:PLAIN_DICTIONARY SZ:10 VC:6
> BINARY recipe 
> -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> *** row group 1 of 1, values 1 to 6 *** 
> value 1: R:0 D:1 V:Tacos
> value 2: R:0 D:1 V:TomatoSoup
> value 3: R:0 D:1 V:Tacos
> value 4: R:0 D:1 V:TomatoSoup
> value 5: R:0 D:1 V:Tacos
> value 6: R:0 D:1 V:TomatoSoup
> BINARY ingredients.bag.name 
> -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> *** row group 1 of 1, values 1 to 15 *** 
> value 1:  R:0 D:3 V:Beef
> value 2:  R:1 D:3 V:Lettuce
> value 3:  R:1 D:3 V:Cheese
> value 4:  R:0 D:3 V:Tomatoes
> value 5:  R:1 D:3 V:Milk
> value 6:  R:0 D:3 V:Beef
> value 7:  R:1 D:3 V:Lettuce
> value 8:  R:1 D:3 V:Cheese
> value 9:  R:0 D:3 V:Tomatoes
> value 10: R:1 D:3 V:Milk
> value 11: R:0 D:3 V:Beef
> value 12: R:1 D:3 V:Lettuce
> value 13: R:1 D:3 V:Cheese
> value 14: R:0 D:3 V:Tomatoes
> value 15: R:1 D:3 V:Milk
> BINARY inventor.name 
> -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> *** row group 1 of 1, values 1 to 6 *** 
> value 1: R:0 D:2 V:Alex
> value 2: R:0 D:2 V:Steve
> value 3: R:0 D:2 V:Alex
> value 4: R:0 D:2 V:Steve
> value 5: R:0 D:2 V:Alex
> value 6: R:0 D:2 V:Steve
> INT32 inventor.age 
> -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> *** row group 1 of 1, values 1 to 6 *** 
> value 1: R:0 D:2 V:25
> value 2: R:0 D:2 V:23
> value 3: R:0 D:2 V:25
> value 4: R:0 D:2 V:23
> value 5: R:0 D:2 V:25
> value 6: R:0 D:2 V:23
> {code}
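
For readers unfamiliar with the R: values in the dump above, here is a minimal Python sketch of how repetition levels group the flat ingredients.bag.name column back into one list per record. It is an illustration only, not Drill's or Parquet's reader code: the function name assemble_lists is hypothetical, and it assumes a single level of repetition with every value fully defined (D:3 matches the column's DL=3 in the dump), so definition levels, nulls, and empty lists are ignored.

```python
def assemble_lists(levels_and_values):
    """Group a flat column into per-record lists using repetition levels.

    Simplified on purpose: a repetition level of 0 starts a new record,
    any other level appends to the current record's list.
    """
    records = []
    for rep_level, value in levels_and_values:
        if rep_level == 0:
            # R:0 marks the first value of a new record.
            records.append([value])
        else:
            if not records:
                raise ValueError("column cannot start with a non-zero repetition level")
            # R:1 continues the repeated field of the current record.
            records[-1].append(value)
    return records


# Values 1 to 5 of ingredients.bag.name, copied from the dump as (R, V) pairs.
column = [
    (0, "Beef"),
    (1, "Lettuce"),
    (1, "Cheese"),
    (0, "Tomatoes"),
    (1, "Milk"),
]

records = assemble_lists(column)
print(records)
```

The two resulting lists correspond to the Tacos and TomatoSoup records in the JSON input, which is consistent with the file holding repeated data.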



--
This message was sent by Atlassian JIRA
(v6.2#6252)
