Steve Loughran created SPARK-56637:
--------------------------------------
Summary: Variant getFieldByKey() on large objects silently fails
if variant metadata is unsorted
Key: SPARK-56637
URL: https://issues.apache.org/jira/browse/SPARK-56637
Project: Spark
Issue Type: Bug
Components: Spark Core
Affects Versions: 4.2.0
Reporter: Steve Loughran
Variant.getFieldByKey(String key) looks up a key with a linear walk if the
object has fewer than 32 keys, and with a binary search otherwise. The binary
search assumes the metadata is sorted, but sorting is optional according to
the format spec: a bit in the variant indicates whether or not its metadata
is sorted. When it is not, the binary search can miss a key that is present,
and the lookup silently fails.
The Spark Variant class must do a full scan on unsorted variants. (That
ignores the performance penalty of the scans; Iceberg has adopted caching
there, and Parquet is adopting it.)
Parquet has its own version of this bug:
https://github.com/apache/parquet-java/issues/3529
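A minimal sketch of the lookup strategy described above, and of the guard it
needs. This is not Spark's Variant code: the class FieldLookupSketch, its
methods, and the plain String[] key table are hypothetical stand-ins, and the
"sorted" parameter models the sorted bit from the format spec. It also uses
Java's String ordering, which is an assumption and may differ from the
ordering the spec defines.

```java
import java.util.Arrays;

/** Hypothetical stand-in for a variant object's key table; not Spark's Variant class. */
final class FieldLookupSketch {
    // Threshold quoted in the report: below it, a linear walk is used.
    static final int BINARY_SEARCH_THRESHOLD = 32;

    /**
     * Guarded lookup: binary search only when the key table is flagged as
     * sorted and is large enough; otherwise fall back to a full scan.
     * Returns the index of the key, or -1 if it is absent.
     */
    static int findKey(String[] keys, boolean sorted, String key) {
        if (sorted && keys.length >= BINARY_SEARCH_THRESHOLD) {
            int idx = Arrays.binarySearch(keys, key);
            return idx >= 0 ? idx : -1;
        }
        return linearScan(keys, key);
    }

    /**
     * Unguarded lookup, modelling the reported behaviour: the sorted flag is
     * never consulted, so large unsorted tables are binary searched anyway.
     */
    static int findKeyIgnoringSortedBit(String[] keys, String key) {
        if (keys.length < BINARY_SEARCH_THRESHOLD) {
            return linearScan(keys, key);
        }
        int idx = Arrays.binarySearch(keys, key);
        return idx >= 0 ? idx : -1;
    }

    /** Full scan; correct regardless of key order. */
    private static int linearScan(String[] keys, String key) {
        for (int i = 0; i < keys.length; i++) {
            if (keys[i].equals(key)) {
                return i;
            }
        }
        return -1;
    }
}
```

With 40 keys stored in descending order (k39 down to k00), the unguarded
lookup of "k05" returns -1 even though the key is present, while the guarded
lookup called with sorted=false finds it by scanning.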