Max Gekk created SPARK-57184:
--------------------------------

             Summary: Null struct corrupts nested CalendarInterval column values
                 Key: SPARK-57184
                 URL: https://issues.apache.org/jira/browse/SPARK-57184
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 4.3.0
            Reporter: Max Gekk
            Assignee: Max Gekk


SPARK-56981 / the nanosecond-timestamp column-vector work surfaced a latent bug
in WritableColumnVector.appendStruct(boolean isNull).

When a struct column is appended as NULL via appendStruct(true), the method
recurses into child columns that are themselves struct-shaped (StructType,
VariantType) so that their grandchild cursors stay aligned. A CalendarInterval
child column is also struct-shaped: it is backed by three grandchild primitive
columns (months as int, days as int, microseconds as long). However, the
recursion guard did not include CalendarIntervalType, so an interval child took
the else branch (c.appendNull()), which advances only the interval column's own
cursor and leaves its three grandchild columns un-advanced.

As a result, for a struct column with a CalendarInterval field, appending a
NULL parent row leaves the interval's grandchild cursors behind by one. A
subsequent non-null row then writes its months/days/microseconds into the wrong
(earlier) grandchild slots, and reading that row back returns a skewed/garbage
interval value. This is silent data corruption for the nested
struct-of-interval case.
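
The skew described above can be illustrated with a small toy model. This is
only a sketch: ToyIntervalColumn, its fields, and readBack are hypothetical
names invented for illustration and are not Spark's WritableColumnVector API.
It also assumes, as a simplification, one cursor shared by the three child
arrays and readers that index the children by the parent row id.

```java
// Toy model of the cursor skew. These classes are illustrative, NOT Spark's API.
public class IntervalCursorSkewDemo {

    // Stand-in for an interval column backed by three primitive child arrays.
    static final class ToyIntervalColumn {
        final int[] months = new int[8];
        final int[] days = new int[8];
        final long[] micros = new long[8];
        int rows = 0;       // the interval column's own cursor
        int childRows = 0;  // cursor shared by the three child arrays (simplification)

        // Mirrors the else branch: only the column's own cursor moves.
        void appendNullOwnCursorOnly() {
            rows++;
        }

        // Mirrors the cascading path: the children skip a slot as well.
        void appendNullCascading() {
            rows++;
            childRows++;
        }

        // Appends a non-null interval at the children's current cursor.
        void append(int m, int d, long us) {
            months[childRows] = m;
            days[childRows] = d;
            micros[childRows] = us;
            childRows++;
            rows++;
        }

        // Readers index the child arrays by the parent row id.
        String read(int rowId) {
            return months[rowId] + " months " + days[rowId] + " days "
                + micros[rowId] + " us";
        }
    }

    // Appends NULL, (1, 2, 3), (4, 5, 6) and reads the given row back.
    static String readBack(boolean cascade, int rowId) {
        ToyIntervalColumn c = new ToyIntervalColumn();
        if (cascade) {
            c.appendNullCascading();
        } else {
            c.appendNullOwnCursorOnly();
        }
        c.append(1, 2, 3L);
        c.append(4, 5, 6L);
        return c.read(rowId);
    }

    public static void main(String[] args) {
        // Row 1 was written as (1, 2, 3).
        System.out.println(readBack(false, 1)); // 4 months 5 days 6 us
        System.out.println(readBack(true, 1));  // 1 months 2 days 3 us
    }
}
```

In the toy model, the non-cascading path stores row 1 in slot 0, so reading
row 1 returns whatever sits in slot 1 (here, the next row's values).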

Fix: include CalendarIntervalType in the recursion guard in appendStruct so
that a null parent struct cascades appendStruct(true) into the interval child,
advancing all three grandchild cursors.
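
A minimal sketch of the guard change, under stated assumptions: ToyVector, Kind,
and the structShaped set are hypothetical stand-ins written for illustration,
not the Spark source. The guard is passed in as a set so the same toy code can
show the behaviour with and without CALENDAR_INTERVAL in it.

```java
import java.util.EnumSet;
import java.util.Set;

// Toy sketch of the recursion guard. All names are hypothetical, NOT Spark's API.
public class AppendStructGuardSketch {
    enum Kind { INT, LONG, STRUCT, VARIANT, CALENDAR_INTERVAL }

    static final class ToyVector {
        final Kind kind;
        final ToyVector[] children;
        int elementsAppended = 0; // this vector's own cursor

        ToyVector(Kind kind, ToyVector... children) {
            this.kind = kind;
            this.children = children;
        }

        // Advances only this vector's cursor.
        void appendNull() {
            elementsAppended++;
        }

        // On a NULL row, cascades into children whose kind is in the guard set
        // and plain-nulls the rest. On a non-null row, the caller is expected
        // to append the child values itself.
        void appendStruct(boolean isNull, Set<Kind> structShaped) {
            elementsAppended++;
            if (!isNull) {
                return;
            }
            for (ToyVector c : children) {
                if (structShaped.contains(c.kind)) {
                    c.appendStruct(true, structShaped);
                } else {
                    c.appendNull();
                }
            }
        }
    }

    // Interval vector backed by months (int), days (int), microseconds (long).
    static ToyVector interval() {
        return new ToyVector(Kind.CALENDAR_INTERVAL,
            new ToyVector(Kind.INT), new ToyVector(Kind.INT), new ToyVector(Kind.LONG));
    }

    // Returns the months-child cursor after appending one NULL parent row.
    static int monthsCursorAfterNullParent(Set<Kind> structShaped) {
        ToyVector iv = interval();
        ToyVector parent = new ToyVector(Kind.STRUCT, iv);
        parent.appendStruct(true, structShaped);
        return iv.children[0].elementsAppended;
    }

    public static void main(String[] args) {
        Set<Kind> before = EnumSet.of(Kind.STRUCT, Kind.VARIANT);
        Set<Kind> after = EnumSet.of(Kind.STRUCT, Kind.VARIANT, Kind.CALENDAR_INTERVAL);
        System.out.println(monthsCursorAfterNullParent(before)); // 0
        System.out.println(monthsCursorAfterNullParent(after));  // 1
    }
}
```

With the interval kind in the guard set, the grandchild cursor advances together
with the parent's; without it, the grandchild stays one row behind.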

This was split out of the nanosecond-timestamp ColumnVector PR (SPARK-57100)
per review, since it is an independent fix.



