[jira] [Work logged] (HIVE-25553) Support Map data-type natively in Arrow format

ASF GitHub Bot (Jira) Sun, 03 Oct 2021 23:26:11 -0700


     [ 
https://issues.apache.org/jira/browse/HIVE-25553?focusedWorklogId=659417&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-659417
 ]


ASF GitHub Bot logged work on HIVE-25553:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 04/Oct/21 06:25
            Start Date: 04/Oct/21 06:25
    Worklog Time Spent: 10m 
      Work Description: sankarh commented on a change in pull request #2689:
URL: https://github.com/apache/hive/pull/2689#discussion_r721068047



##########
File path: 
ql/src/java/org/apache/hadoop/hive/ql/io/arrow/ArrowColumnarBatchSerDe.java
##########
@@ -170,7 +171,7 @@ private static Field toField(String name, TypeInfo 
typeInfo) {
         for (int i = 0; i < structSize; i++) {
           structFields.add(toField(fieldNames.get(i), fieldTypeInfos.get(i)));
         }
-        return new Field(name, FieldType.nullable(MinorType.STRUCT.getType()), 
structFields);
+        return new Field(name, new FieldType(false, new ArrowType.Struct(), 
null), structFields);

Review comment:
       I think, this point holds good only if Struct is a child element type of 
Map column. If the column type is STRUCT, then it should be nullable. Can we 
make this change only for first case?

##########
File path: 
ql/src/java/org/apache/hadoop/hive/llap/WritableByteChannelAdapter.java
##########
@@ -93,7 +93,7 @@ public int write(ByteBuffer src) throws IOException {
     int size = src.remaining();
     //Down the semaphore or block until available
     takeWriteResources(1);
-    ByteBuf buf = allocator.buffer(size);
+    ByteBuf buf = allocator.getAsByteBufAllocator().buffer(size);

Review comment:
       Why this change relevant to this jira?

##########
File path: ql/src/java/org/apache/hadoop/hive/ql/io/arrow/Serializer.java
##########
@@ -226,7 +226,7 @@ public ArrowWrapperWritable 
serializeBatch(VectorizedRowBatch vectorizedRowBatch
   }
 
   private static FieldType toFieldType(TypeInfo typeInfo) {
-    return new FieldType(true, toArrowType(typeInfo), null);
+    return new FieldType(false, toArrowType(typeInfo), null);

Review comment:
       I think, even this change should come only if list or struct are 
elements of other complex data types. We cannot make it global for all data 
types.

##########
File path: 
ql/src/java/org/apache/hadoop/hive/ql/io/arrow/ArrowColumnarBatchSerDe.java
##########
@@ -185,7 +186,7 @@ private static Field toField(String name, TypeInfo 
typeInfo) {
         final TypeInfo keyTypeInfo = mapTypeInfo.getMapKeyTypeInfo();
         final TypeInfo valueTypeInfo = mapTypeInfo.getMapValueTypeInfo();
         final StructTypeInfo mapStructTypeInfo = new StructTypeInfo();
-        mapStructTypeInfo.setAllStructFieldNames(Lists.newArrayList("keys", 
"values"));
+        mapStructTypeInfo.setAllStructFieldNames(Lists.newArrayList("key", 
"value"));

Review comment:
       Make sense.

##########
File path: 
itests/hive-unit/src/test/java/org/apache/hive/jdbc/TestJdbcWithMiniLlapArrow.java
##########
@@ -123,8 +123,8 @@ public static void afterTest() {
     return new LlapArrowRowInputFormat(Long.MAX_VALUE);
   }
 
-  // Currently MAP type is not supported. Add it back when Arrow 1.0 is 
released.
-  // See: SPARK-21187
+  // Currently, loading from a text file gives errors with Map dataType.
+  // This needs to be fixed when adding support for non-ORC writes (text and 
parquet) for the llap-ext-client.

Review comment:
       This comment doesn't make sense here as this test is not specific to 
llap-ext-client. Instead mention the Hive JIRA which would solve this issue.

##########
File path: 
itests/hive-unit/src/test/java/org/apache/hive/jdbc/TestJdbcWithMiniLlapArrow.java
##########
@@ -123,8 +123,8 @@ public static void afterTest() {
     return new LlapArrowRowInputFormat(Long.MAX_VALUE);
   }
 
-  // Currently MAP type is not supported. Add it back when Arrow 1.0 is 
released.
-  // See: SPARK-21187
+  // Currently, loading from a text file gives errors with Map dataType.
+  // This needs to be fixed when adding support for non-ORC writes (text and 
parquet) for the llap-ext-client.

Review comment:
       Ideally, anything that works for ORC should work for non-ORC as well. 
Pls enable the tests on Map data type and see if it works after Arrow upgrade.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
-------------------

    Worklog Id:     (was: 659417)
    Time Spent: 1h 10m  (was: 1h)

> Support Map data-type natively in Arrow format
> ----------------------------------------------
>
>                 Key: HIVE-25553
>                 URL: https://issues.apache.org/jira/browse/HIVE-25553
>             Project: Hive
>          Issue Type: Improvement
>          Components: llap, Serializers/Deserializers
>            Reporter: Adesh Kumar Rao
>            Assignee: Adesh Kumar Rao
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.0.0
>
>          Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Currently ArrowColumnarBatchSerDe converts map datatype as a list of structs 
> data-type (where stuct is containing the key-value pair of the map). This 
> causes issues when reading Map datatype using llap-ext-client as it reads a 
> list of structs instead. 
> HiveWarehouseConnector which uses the llap-ext-client throws exception when 
> the schema (containing Map data type) is different from actual data (list of 
> structs).
>  
> Fixing this issue requires upgrading arrow version (where map data-type is 
> supported), modifying ArrowColumnarBatchSerDe and corresponding 
> Serializer/Deserializer to not use list as a workaround for map and use the 
> arrow map data-type instead. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Work logged] (HIVE-25553) Support Map data-type natively in Arrow format

Reply via email to