[ https://issues.apache.org/jira/browse/HIVE-26840?focusedWorklogId=836011&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-836011 ]

ASF GitHub Bot logged work on HIVE-26840:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 28/Dec/22 20:46
            Start Date: 28/Dec/22 20:46
    Worklog Time Spent: 10m 
      Work Description: cnauroth commented on PR #3859:
URL: https://github.com/apache/hive/pull/3859#issuecomment-1366907555

   Hello @amanraj2520. Referring to the options above:
   
   1. I think the ideal path is to get the test fixed. See below for my
analysis and a proposed path forward.
   2. I'd prefer not to revert to the earlier version, because we set a goal
for the 3.2 release to upgrade dependencies and clear out their CVEs as much
as possible. That said, I reviewed the CVEs on the older release, and I don't
think they have much practical impact on Hive, so I'm not opposed to this as a
fallback option if we get stuck.
   3. I'd be concerned about a major version bump all the way to Arrow 2.0.0.
I don't know Arrow well enough to comment on the backward compatibility of
that upgrade.
   
   The test fails specifically on [serializing a row of nulls in all 
columns](https://github.com/apache/hive/blob/branch-3/ql/src/test/org/apache/hadoop/hive/ql/io/arrow/TestArrowColumnarBatchSerDe.java#L374-L378).
I confirmed that this row is the culprit by commenting it out, after which the
test passed. I also confirmed that the failure occurs while serializing a
struct column; null values for primitive types serialize fine.
   
   The problem appears to be that the serializer does not correctly track null
values within a null struct. It should call the Arrow `vector.setNull`, but
instead it ends up calling `vector.setSafe`. Another patch,
[HIVE-25243](https://issues.apache.org/jira/browse/HIVE-25243) / PR #2391,
fixed this on master. With both your patch and a slightly modified version of
HIVE-25243 applied, the test passed locally. The only thing I don't
understand is why this ever passed with the old version. My guess is that some
combinations of Arrow and Netty versions are simply more tolerant of clients
calling `vector.setSafe` with null values.
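
   To make the `setNull` vs. `setSafe` distinction concrete, here is a minimal,
self-contained Java sketch of the null-propagation idea. It does not use Arrow
and is not the HIVE-25243 change: `StructNullSketch` and `IntColumn` are
hypothetical stand-ins that only mimic the split between `setSafe` (write a
value and mark the slot valid) and `setNull` (mark the slot null). When the
whole struct is null, every child slot is marked null as well, instead of
being written through `setSafe`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: NOT Hive's Arrow serializer and NOT the Apache Arrow API.
// It only mimics the setSafe/setNull split to show null propagation into
// the children of a null struct.
class StructNullSketch {

    // Minimal nullable int column: a value list plus a validity list.
    static final class IntColumn {
        private final List<Integer> values = new ArrayList<>();
        private final List<Boolean> valid = new ArrayList<>();

        // Grow both lists so `index` is addressable; new slots start out null.
        private void grow(int index) {
            while (values.size() <= index) {
                values.add(0);
                valid.add(false);
            }
        }

        // Store a real value and mark the slot non-null.
        void setSafe(int index, int value) {
            grow(index);
            values.set(index, value);
            valid.set(index, true);
        }

        // Mark the slot null without storing a value.
        void setNull(int index) {
            grow(index);
            valid.set(index, false);
        }

        boolean isNull(int index) {
            return index >= valid.size() || !valid.get(index);
        }

        int get(int index) {
            if (isNull(index)) {
                throw new IllegalStateException("slot " + index + " is null");
            }
            return values.get(index);
        }
    }

    // Hypothetical struct column with two int fields plus its own validity list.
    final IntColumn[] children = { new IntColumn(), new IntColumn() };
    private final List<Boolean> structValid = new ArrayList<>();

    boolean isStructNull(int index) {
        return index >= structValid.size() || !structValid.get(index);
    }

    // Writes one row; `fields == null` means the whole struct value is null.
    void writeRow(int index, Integer[] fields) {
        while (structValid.size() <= index) {
            structValid.add(false);
        }
        if (fields == null) {
            // Null struct: mark the struct slot AND every child slot null,
            // rather than writing placeholder values via setSafe.
            structValid.set(index, false);
            for (IntColumn child : children) {
                child.setNull(index);
            }
            return;
        }
        structValid.set(index, true);
        for (int i = 0; i < children.length; i++) {
            if (fields[i] == null) {
                children[i].setNull(index); // null field inside a non-null struct
            } else {
                children[i].setSafe(index, fields[i]);
            }
        }
    }

    public static void main(String[] args) {
        StructNullSketch s = new StructNullSketch();
        s.writeRow(0, new Integer[] {1, 2});
        s.writeRow(1, null); // the "nulls in all columns" row
        s.writeRow(2, new Integer[] {null, 5});
        System.out.println(s.isStructNull(1) + " " + s.children[0].isNull(1)
                + " " + s.children[1].isNull(1));
    }
}
```

Running `main` prints `true true true`: the all-null row leaves the struct slot
and both child slots null, while row 2 shows a null field inside a non-null
struct.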
   
   I propose that we first merge the current pull request, even with the known
test failure, since this patch already contains a lot of changes. Then I can
queue up a separate backport of HIVE-25243. This is non-binding, though, so
let's see if we can get a committer to confirm the plan.




Issue Time Tracking
-------------------

    Worklog Id:     (was: 836011)
    Time Spent: 3h 40m  (was: 3.5h)

> Backport of HIVE-23073 and HIVE-24138
> -------------------------------------
>
>                 Key: HIVE-26840
>                 URL: https://issues.apache.org/jira/browse/HIVE-26840
>             Project: Hive
>          Issue Type: Sub-task
>            Reporter: Aman Raj
>            Assignee: Aman Raj
>            Priority: Critical
>              Labels: pull-request-available
>          Time Spent: 3h 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)