[ https://issues.apache.org/jira/browse/HIVE-25443?focusedWorklogId=636846&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-636846 ]
ASF GitHub Bot logged work on HIVE-25443:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 11/Aug/21 10:12
Start Date: 11/Aug/21 10:12
Worklog Time Spent: 10m
Work Description:

shameersss1 opened a new pull request #2581:
URL: https://github.com/apache/hive/pull/2581

…pes When there are more than 1024 values

### What changes were proposed in this pull request?
Instead of initializing the ColumnVector with the default size of 1024, initialize it with the size required by the records being processed.

### Why are the changes needed?
The changes are needed to allow the Arrow SerDe to serialize/deserialize complex data types when there are more than 1024 values.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Unit tests were added to confirm the behaviour.

Issue Time Tracking
-------------------

Worklog Id: (was: 636846)
Remaining Estimate: 0h
Time Spent: 10m

> Arrow SerDe Cannot serialize/deserialize complex data types When there are more than 1024 values
> ------------------------------------------------------------------------------------------------
>
> Key: HIVE-25443
> URL: https://issues.apache.org/jira/browse/HIVE-25443
> Project: Hive
> Issue Type: Bug
> Components: Serializers/Deserializers
> Affects Versions: 3.1.0, 3.0.0, 3.1.1, 3.1.2
> Reporter: Syed Shameerur Rahman
> Assignee: Syed Shameerur Rahman
> Priority: Major
> Fix For: 4.0.0
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Complex data types like MAP and STRUCT cannot be serialized/deserialized using Arrow SerDe when there are more than 1024 values.
> This happens because the ColumnVector is always initialized with a size of 1024.
> Issue #1 : https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/arrow/ArrowColumnarBatchSerDe.java#L213
> Issue #2 : https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/arrow/ArrowColumnarBatchSerDe.java#L215
> Sample unit test in TestArrowColumnarBatchSerDe that reproduces the case:
> {code:java}
> @Test
> public void testListBooleanWithMoreThan1024Values() throws SerDeException {
>   String[][] schema = {
>       {"boolean_list", "array<boolean>"},
>   };
>
>   // 1025 rows: one more than the 1024-value default ColumnVector size.
>   Object[][] rows = new Object[1025][1];
>   for (int i = 0; i < 1025; i++) {
>     rows[i][0] = new BooleanWritable(true);
>   }
>
>   initAndSerializeAndDeserialize(schema, toList(rows));
> }
> {code}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
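To illustrate the failure mode described in the issue, here is a minimal, self-contained Java sketch. It does not use Hive's ColumnVector classes: `ColumnSizingDemo`, `BooleanColumnSketch`, `withDefaultSize`, `withRequiredSize` and `tryFill` are hypothetical names, the 0/1 encoding of booleans is an assumption of the sketch, and keeping 1024 as a lower bound in the sized case is an assumption rather than necessarily what PR #2581 does. It only models the core problem: a backing array allocated once with a fixed length of 1024 cannot hold a 1025th value.

```java
import java.util.Arrays;

// Hypothetical demo; not Hive code.
class ColumnSizingDemo {

    // A column whose backing array is allocated once with a fixed length,
    // loosely modelling a ColumnVector created with a default size.
    static final class BooleanColumnSketch {
        // The default capacity HIVE-25443 identifies as the limit.
        static final int DEFAULT_SIZE = 1024;

        // 0/1 encoding of boolean values (an assumption of this sketch).
        final long[] values;

        BooleanColumnSketch(int capacity) {
            this.values = new long[capacity];
        }

        // Copies every input value into the backing array. Throws
        // ArrayIndexOutOfBoundsException once i reaches values.length.
        void fill(boolean[] input) {
            for (int i = 0; i < input.length; i++) {
                values[i] = input[i] ? 1L : 0L;
            }
        }

        // Pattern reported in the issue: capacity is always the default size.
        static BooleanColumnSketch withDefaultSize() {
            return new BooleanColumnSketch(DEFAULT_SIZE);
        }

        // Pattern proposed in the PR: size the column for the values it holds.
        // Using DEFAULT_SIZE as a lower bound is an assumption of this sketch.
        static BooleanColumnSketch withRequiredSize(int valueCount) {
            return new BooleanColumnSketch(Math.max(DEFAULT_SIZE, valueCount));
        }
    }

    // Returns true if all values fit, false if the fixed capacity overflowed.
    static boolean tryFill(BooleanColumnSketch column, boolean[] input) {
        try {
            column.fill(input);
            return true;
        } catch (ArrayIndexOutOfBoundsException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // 1025 values, mirroring the reproduction test in the issue.
        boolean[] input = new boolean[1025];
        Arrays.fill(input, true);

        boolean defaultOk = tryFill(BooleanColumnSketch.withDefaultSize(), input);
        boolean sizedOk =
            tryFill(BooleanColumnSketch.withRequiredSize(input.length), input);

        System.out.println("default size fits 1025 values: " + defaultOk);  // false
        System.out.println("required size fits 1025 values: " + sizedOk);   // true
    }
}
```

Running `main` prints `false` for the default-size column, because writing the 1025th value indexes past a 1024-slot array, and `true` for the column sized from the input length. Sizing from the actual value count avoids the overflow without allocating a larger array in the common case of 1024 or fewer values.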