Kris Mok created SPARK-39839:
--------------------------------

             Summary: Handle special case of null variable-length Decimal with non-zero offsetAndSize in UnsafeRow structural integrity check
                 Key: SPARK-39839
                 URL: https://issues.apache.org/jira/browse/SPARK-39839
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 3.3.0, 3.2.0, 3.1.0
            Reporter: Kris Mok


The {{UnsafeRow}} structural integrity check in 
{{UnsafeRowUtils.validateStructuralIntegrity}} was added in Spark 3.1.0. It is 
supposed to validate that a given {{UnsafeRow}} conforms to the format that the 
{{UnsafeRowWriter}} would have produced.

Currently the check expects that every field marked as null also has its 
fixed-length part set to all zeros. It needs to be updated to handle a special 
case for variable-length {{Decimal}}s, where the {{UnsafeRowWriter}} may mark a 
field as null but leave the fixed-length part of the field as 
{{OffsetAndSize(offset=current_offset, size=0)}}. This may happen when the 
{{Decimal}} being written is either a real {{null}} or has overflowed the 
specified precision.
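A minimal sketch of what the relaxed validation could look like. This is a hypothetical illustration, not the actual {{UnsafeRowUtils}} code; {{NullFieldCheck}} and {{isValidNullField}} are made-up names, and {{fixedPart}} stands for the 8-byte word stored in the fixed-length region for the field being validated:

{code:java}
// Hypothetical sketch of the relaxed null-field check; not the actual
// Spark implementation.
public class NullFieldCheck {
  public static boolean isValidNullField(long fixedPart, boolean isVarLenDecimal) {
    if (fixedPart == 0L) {
      return true;              // general case: setNullAt zeroed the slot
    }
    int size = (int) fixedPart; // lower 32 bits of offsetAndSize hold the size
    // special case: a null variable-length Decimal may keep
    // OffsetAndSize(offset=current_offset, size=0) in the fixed-length part
    return isVarLenDecimal && size == 0;
  }
}
{code}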

Logic in {{UnsafeRowWriter}}:

in general:
{code:java}
  public void setNullAt(int ordinal) {
    BitSetMethods.set(getBuffer(), startingOffset, ordinal); // set null bit
    write(ordinal, 0L);                                      // also zero out the fixed-length field
  }
{code}
special case for {{DecimalType}}:
{code:java}
      // Make sure Decimal object has the same scale as DecimalType.
      // Note that we may pass in null Decimal object to set null for it.
      if (input == null || !input.changePrecision(precision, scale)) {
        BitSetMethods.set(getBuffer(), startingOffset, ordinal); // set null bit
        // keep the offset for future update
        setOffsetAndSize(ordinal, 0);                            // doesn't zero out the fixed-length field
      }
{code}
The special case was introduced to allow all {{DecimalType}}s (both 
fixed-length and variable-length ones) to be mutable, so the writer needs to 
reserve space for the variable-length part even when the current value is null.
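To see why the fixed-length slot ends up non-zero for a null variable-length Decimal, here is a simplified sketch of the offset/size packing that {{setOffsetAndSize}} performs (the class and method names below are illustrative, not Spark's):

{code:java}
// Sketch of how the fixed-length slot packs a relative offset and a size
// into one 8-byte word: offset in the high 32 bits, size in the low 32 bits.
public class OffsetAndSizeSketch {
  public static long pack(int relativeOffset, int size) {
    return ((long) relativeOffset << 32) | (size & 0xFFFFFFFFL);
  }
  public static int offsetOf(long packed) { return (int) (packed >>> 32); }
  public static int sizeOf(long packed)   { return (int) packed; }
}
{code}

With a non-zero current offset and size 0, the packed word is non-zero even though the field is null, which is exactly the pattern the integrity check currently rejects.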

Note that this special case in {{UnsafeRowWriter}} has been there since Spark 
1.6.0, whereas the integrity check was only added in Spark 3.1.0. The check was 
originally added for Structured Streaming's checkpoint evolution validation, so 
that a newer version of Spark can check whether an older checkpoint file for 
Structured Streaming queries is still supported, and/or whether the contents of 
the checkpoint file are corrupted.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
