[ https://issues.apache.org/jira/browse/HIVE-26840?focusedWorklogId=836011&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-836011 ]
ASF GitHub Bot logged work on HIVE-26840: ----------------------------------------- Author: ASF GitHub Bot Created on: 28/Dec/22 20:46 Start Date: 28/Dec/22 20:46 Worklog Time Spent: 10m Work Description: cnauroth commented on PR #3859: URL: https://github.com/apache/hive/pull/3859#issuecomment-1366907555 Hello @amanraj2520 . Referring to the options above: 1. I think the ideal path is to get the test fixed. See below for more analysis from me and a proposed path forward. 2. I'd prefer not to revert back to the earlier version, because we set a goal on the 3.2 release to upgrade dependencies and clear out their CVEs as much as possible. That said, I reviewed CVEs on the older release, and I don't think they have much practical impact on Hive, so I'm not opposed to this as a fallback option if we get stuck. 3. I'd be concerned about a significant major version bump all the way to Arrow 2.0.0. I don't know Arrow well enough to comment on backward-compatibility of that upgrade. The test is failing specifically on [serializing a row of nulls in all columns](https://github.com/apache/hive/blob/branch-3/ql/src/test/org/apache/hadoop/hive/ql/io/arrow/TestArrowColumnarBatchSerDe.java#L374-L378). I confirmed that it's this row specifically by commenting out the row and seeing the test pass. I also confirmed that it's specifically failing while serializing a struct column. (Null values for primitive types are fine.) The problem appears to be that the serializer does not correctly track null values within a null struct. We should be calling the Arrow `vector.setNull`, but instead, it ends up calling `vector.setSafe`. There is another patch, [HIVE-25243](https://issues.apache.org/jira/browse/HIVE-25243) / PR #2391 that fixed this on master. I tried applying both your patch and a slightly different version of HIVE-25243, and then the test passed locally. The only thing I don't understand is why this was ever passing with the old version. I guess there are some versions of Arrow + Netty that are just more tolerant of clients calling `vector.setSafe` with null values. I propose that first, we merge in the current pull request, even with a known test failure. There are already a lot of changes in this patch. Then, I can queue up a separate backport of HIVE-25243. This is non-binding though, so let's see if we can get confirmation on the plan from a committer. Issue Time Tracking ------------------- Worklog Id: (was: 836011) Time Spent: 3h 40m (was: 3.5h) > Backport of HIVE-23073 and HIVE-24138 > ------------------------------------- > > Key: HIVE-26840 > URL: https://issues.apache.org/jira/browse/HIVE-26840 > Project: Hive > Issue Type: Sub-task > Reporter: Aman Raj > Assignee: Aman Raj > Priority: Critical > Labels: pull-request-available > Time Spent: 3h 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.10#820010)