[jira] [Commented] (DRILL-7326) Support repeated lists for CTAS parquet format

ASF GitHub Bot (Jira) Wed, 21 Aug 2019 04:33:48 -0700


    [ https://issues.apache.org/jira/browse/DRILL-7326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16912205#comment-16912205 ]


ASF GitHub Bot commented on DRILL-7326:
---------------------------------------

ihuzenko commented on pull request #1844: DRILL-7326: Support repeated lists for CTAS parquet format
URL: https://github.com/apache/drill/pull/1844#discussion_r316135192
 
 

 ##########
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetRecordWriter.java
 ##########
 @@ -287,40 +290,70 @@ private Type getType(MaterializedField field) {
     DataMode dataMode = field.getType().getMode();
     switch (minorType) {
       case MAP:
-        List<Type> types = Lists.newArrayList();
-        for (MaterializedField childField : field.getChildren()) {
-          types.add(getType(childField));
-        }
+        List<Type> types = getChildrenTypes(field);
         return new GroupType(dataMode == DataMode.REPEATED ? Repetition.REPEATED : Repetition.OPTIONAL, field.getName(), types);
       case LIST:
-        MaterializedField elementField = field.getChildren().iterator().next();
+        MaterializedField elementField = getDataField(field);
         ListBuilder<GroupType> listBuilder = org.apache.parquet.schema.Types
             .list(dataMode == DataMode.OPTIONAL ? Repetition.OPTIONAL : Repetition.REQUIRED);
         addElementType(listBuilder, elementField);
         GroupType listType = listBuilder.named(field.getName());
         return listType;
       case NULL:
         MaterializedField newField = field.withType(
-          TypeProtos.MajorType.newBuilder().setMinorType(MinorType.INT).setMode(DataMode.OPTIONAL).build());
+            TypeProtos.MajorType.newBuilder().setMinorType(MinorType.INT).setMode(DataMode.OPTIONAL).build());
         return getPrimitiveType(newField);
       default:
         return getPrimitiveType(field);
     }
   }
 
+  /**
+   * Helper method for conversion of map child
+   * fields.
+   *
+   * @param field map
+   * @return converted child fields
+   */
+  private List<Type> getChildrenTypes(MaterializedField field) {
+    return field.getChildren().stream()
+        .map(this::getType)
+        .collect(Collectors.toList());
+  }
+
+  /**
+   * Finds data child field of list or repeated type.
+   *
+   * @param field parent repeated field
+   * @return child data field
+   */
+  private MaterializedField getDataField(MaterializedField field) {
+    return field.getChildren().stream()
+        .filter(child -> BaseRepeatedValueVector.DATA_VECTOR_NAME.equals(child.getName()))
+        .findFirst()
 
 Review comment:
   What's wrong with `findFirst` if this is simply a sequential stream with one data field and possibly one offsets field?
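
To make the question concrete, here is a minimal, self-contained sketch of the lookup under discussion. It is not the code from the pull request: `Field` is a hypothetical stand-in for `MaterializedField`, the class and method names are invented for illustration, and the literal values used for the data and offsets child names are assumptions.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

public class DataFieldLookup {

  // Assumed stand-in for BaseRepeatedValueVector.DATA_VECTOR_NAME;
  // the literal value is an assumption made for this sketch.
  static final String DATA_VECTOR_NAME = "$data$";

  // Hypothetical, heavily simplified stand-in for MaterializedField.
  static final class Field {
    private final String name;

    Field(String name) {
      this.name = name;
    }

    String getName() {
      return name;
    }
  }

  // Returns the first child whose name matches the data vector name.
  // On a sequential stream over a List, findFirst follows the list order.
  static Optional<Field> findDataField(List<Field> children) {
    return children.stream()
        .filter(child -> DATA_VECTOR_NAME.equals(child.getName()))
        .findFirst();
  }

  public static void main(String[] args) {
    // One offsets child (assumed name) and one data child, as in the review comment.
    List<Field> children = Arrays.asList(new Field("$offsets$"), new Field(DATA_VECTOR_NAME));
    // prints "$data$"
    System.out.println(findDataField(children).map(Field::getName).orElse("<none>"));
  }
}
```

With at most one child matching the filter, `findFirst` and `findAny` return the same element here; `findFirst` additionally guarantees encounter order, which costs nothing on a small sequential stream.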
 


> Support repeated lists for CTAS parquet format
> ----------------------------------------------
>
>                 Key: DRILL-7326
>                 URL: https://issues.apache.org/jira/browse/DRILL-7326
>             Project: Apache Drill
>          Issue Type: New Feature
>    Affects Versions: 1.16.0
>            Reporter: Pavel Semenov
>            Assignee: Igor Guzenko
>            Priority: Major
>
> *STEPS TO REPRODUCE*
>  # Create a JSON file that has a doubly nested array as a value, e.g.
> {code:java}"stringx2":[["asd","фывфы","asg"],["as","acz","gte"],["as","tdf","dsd"]]
>  {code}
>  # Use CTAS to create a table in Drill from the created JSON file
>  # Observe the result
> *EXPECTED RESULT*
>  Table is created
> *ACTUAL RESULT*
>  An UnsupportedOperationException is thrown when attempting to create the table
> *ADDITIONAL INFO*
>  It is possible to create a table with a *single* nested array
>  Error log
> {code:java}
> Error: SYSTEM ERROR: UnsupportedOperationException: Unsupported type LIST
> Fragment 0:0
> Please, refer to logs for more information.
> [Error Id: c48c6154-30a1-49c8-ac3b-7c2f898a7f4e on node1.cluster.com:31010]
> (java.lang.UnsupportedOperationException) Unsupported type LIST
>  org.apache.drill.exec.store.parquet.ParquetRecordWriter.getType():295
>  org.apache.drill.exec.store.parquet.ParquetRecordWriter.newSchema():226
>  org.apache.drill.exec.store.parquet.ParquetRecordWriter.updateSchema():211
>  org.apache.drill.exec.physical.impl.WriterRecordBatch.setupNewSchema():160
>  org.apache.drill.exec.physical.impl.WriterRecordBatch.innerNext():108
>  org.apache.drill.exec.record.AbstractRecordBatch.next():186
>  org.apache.drill.exec.record.AbstractRecordBatch.next():126
>  org.apache.drill.exec.record.AbstractRecordBatch.next():116
>  org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext():63
>  org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():141
>  org.apache.drill.exec.record.AbstractRecordBatch.next():186
>  org.apache.drill.exec.physical.impl.BaseRootExec.next():104
>  org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext():83
>  org.apache.drill.exec.physical.impl.BaseRootExec.next():94
>  org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():296
>  org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():283
>  java.security.AccessController.doPrivileged():-2
>  javax.security.auth.Subject.doAs():422
>  org.apache.hadoop.security.UserGroupInformation.doAs():1669
>  org.apache.drill.exec.work.fragment.FragmentExecutor.run():283
>  org.apache.drill.common.SelfCleaningRunnable.run():38
>  java.util.concurrent.ThreadPoolExecutor.runWorker():1149
>  java.util.concurrent.ThreadPoolExecutor$Worker.run():624
>  java.lang.Thread.run():748 (state=,code=0)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.2#803003)
