Steve Loughran created SPARK-56637:
--------------------------------------

             Summary: Variant getFieldByKey() on large objects silently fails 
if variant metadata is unsorted
                 Key: SPARK-56637
                 URL: https://issues.apache.org/jira/browse/SPARK-56637
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
    Affects Versions: 4.2.0
            Reporter: Steve Loughran


Variant method getFieldByKey(String key) looks up a key with a linear walk if 
the key count is < 32, and with a binary search otherwise. But the binary 
search assumes the metadata is sorted. Sorting is optional according to the 
format spec; there's a bit in the variant to indicate whether or not a 
variant's metadata is sorted. On a large object with unsorted metadata the 
lookup therefore silently fails.
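
To illustrate the expected behaviour, here is a minimal sketch of a lookup 
that only takes the binary search path when the metadata is flagged as sorted. 
It is not the Spark code: the class KeyLookupSketch, the method findKey and 
the plain String[] of keys are hypothetical stand-ins for the real variant 
decoding, and the "sorted" argument is assumed to have been read from the 
variant's sorted bit. It also assumes String.compareTo matches the ordering 
used when the metadata was written.

```java
// Hypothetical sketch, not Spark's Variant class. "keys" stands in for the
// decoded key strings and "sorted" for the flag read from the variant.
final class KeyLookupSketch {
    // Threshold from the description above: linear walk below 32 keys.
    static final int BINARY_SEARCH_THRESHOLD = 32;

    /** Returns the index of key in keys, or -1 if it is absent. */
    static int findKey(String[] keys, boolean sorted, String key) {
        // Binary search is only valid when the keys are known to be sorted.
        if (sorted && keys.length >= BINARY_SEARCH_THRESHOLD) {
            int low = 0;
            int high = keys.length - 1;
            while (low <= high) {
                int mid = (low + high) >>> 1;
                int cmp = keys[mid].compareTo(key);
                if (cmp == 0) {
                    return mid;
                }
                if (cmp < 0) {
                    low = mid + 1;
                } else {
                    high = mid - 1;
                }
            }
            return -1;
        }
        // Small or unsorted key sets: fall back to a full scan.
        for (int i = 0; i < keys.length; i++) {
            if (keys[i].equals(key)) {
                return i;
            }
        }
        return -1;
    }
}
```

With this shape, an unsorted object with 32 or more keys pays for a full 
scan but can no longer miss a key that is present.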

The Spark Variant class must do a full scan on unsorted variants. (That's 
ignoring the performance penalty of the scans; Iceberg has adopted caching 
there and Parquet is adopting it.)
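
As a rough idea of what caching could look like, the sketch below builds a 
key-to-index map on the first lookup and reuses it afterwards. This is an 
assumption for illustration only and is not taken from the Iceberg or Parquet 
code; the class CachedKeyIndex and its find method are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of lookup caching; not the Iceberg or Parquet code.
final class CachedKeyIndex {
    private final String[] keys;
    // Built lazily on the first lookup. Not thread-safe as written.
    private Map<String, Integer> index;

    CachedKeyIndex(String[] keys) {
        this.keys = keys;
    }

    /** Returns the index of key, or -1 if it is absent. */
    int find(String key) {
        if (index == null) {
            Map<String, Integer> built = new HashMap<>(keys.length * 2);
            for (int i = 0; i < keys.length; i++) {
                // Keep the first position if a key is repeated.
                built.putIfAbsent(keys[i], i);
            }
            index = built;
        }
        Integer position = index.get(key);
        return position == null ? -1 : position;
    }
}
```

The one-off scan costs the same as a single unsorted lookup, and later 
lookups no longer depend on whether the metadata is sorted.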

Parquet has its own version of this bug: 
https://github.com/apache/parquet-java/issues/3529



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
