Syed Shameerur Rahman created HIVE-25443:
--------------------------------------------

             Summary: Arrow SerDe Cannot serialize/deserialize complex data 
types When there are more than 1024 values
                 Key: HIVE-25443
                 URL: https://issues.apache.org/jira/browse/HIVE-25443
             Project: Hive
          Issue Type: Bug
          Components: Serializers/Deserializers
    Affects Versions: 3.1.2, 3.1.1, 3.0.0, 3.1.0
            Reporter: Syed Shameerur Rahman
            Assignee: Syed Shameerur Rahman
             Fix For: 4.0.0


Complex data types such as MAP and STRUCT cannot be serialized/deserialized 
using Arrow SerDe when there are more than 1024 values. This happens because 
the ColumnVector is always initialized with a size of 1024.

Issue #1 : 
https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/arrow/ArrowColumnarBatchSerDe.java#L213

Issue #2 : 
https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/arrow/ArrowColumnarBatchSerDe.java#L215
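
To illustrate why a fixed initial size of 1024 can break once a complex value carries more child values, here is a minimal, self-contained sketch. It is a toy model only: ToyVector, writeWithoutResize and writeWithResize are hypothetical names invented for this example, not Hive classes or methods. The overflow shown (an ArrayIndexOutOfBoundsException) is an assumed failure mode, and growing the array first is one possible remedy, not necessarily the change made for this issue.

```java
import java.util.Arrays;

// Toy model only: hypothetical stand-ins, not Hive's ColumnVector API.
final class VectorCapacitySketch {
    // Mirrors the fixed initial size of 1024 described in this issue.
    static final int DEFAULT_SIZE = 1024;

    /** Minimal vector backed by an array that starts at DEFAULT_SIZE slots. */
    static final class ToyVector {
        long[] vector = new long[DEFAULT_SIZE];

        /** Grows the backing array when fewer than `size` slots exist. */
        void ensureSize(int size) {
            if (size > vector.length) {
                vector = Arrays.copyOf(vector, size);
            }
        }
    }

    /** Writes `count` values without resizing; returns false on overflow. */
    static boolean writeWithoutResize(int count) {
        ToyVector v = new ToyVector();
        try {
            for (int i = 0; i < count; i++) {
                v.vector[i] = 1L;
            }
            return true;
        } catch (ArrayIndexOutOfBoundsException e) {
            // Assumed failure mode: the fixed-size array has no slot for index 1024.
            return false;
        }
    }

    /** Sizes the vector to `count` first, writes, and returns the final capacity. */
    static int writeWithResize(int count) {
        ToyVector v = new ToyVector();
        v.ensureSize(count);
        for (int i = 0; i < count; i++) {
            v.vector[i] = 1L;
        }
        return v.vector.length;
    }

    public static void main(String[] args) {
        System.out.println("1024 values, no resize: " + writeWithoutResize(1024));
        System.out.println("1025 values, no resize: " + writeWithoutResize(1025));
        System.out.println("1025 values, capacity after resize: " + writeWithResize(1025));
    }
}
```

In this sketch, exactly 1024 values fit, the 1025th write fails unless the vector is sized to the actual value count beforehand.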

Sample unit test that reproduces the issue when added to TestArrowColumnarBatchSerDe:


{code:java}
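@Test
public void testListBooleanWithMoreThan1024Values() throws SerDeException {
  // Single column holding a list of booleans.
  String[][] schema = {
      {"boolean_list", "array<boolean>"},
  };

  // 1025 entries: one more than the initial ColumnVector size of 1024.
  Object[][] rows = new Object[1025][1];
  for (int i = 0; i < 1025; i++) {
    rows[i][0] = new BooleanWritable(true);
  }

  // toList and initAndSerializeAndDeserialize are helpers of the test class.
  initAndSerializeAndDeserialize(schema, toList(rows));
}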
{code}





--
This message was sent by Atlassian Jira
(v8.3.4#803005)
