Anupam Yadav created SPARK-57052:
------------------------------------
Summary: [SS] Add state row format validation to multiGet and
valuesIterator in RocksDBStateStoreProvider
Key: SPARK-57052
URL: https://issues.apache.org/jira/browse/SPARK-57052
Project: Spark
Issue Type: Improvement
Components: Structured Streaming
Affects Versions: 4.0.0
Reporter: Anupam Yadav
RocksDBStateStoreProvider performs state row format validation (via
StateStoreProvider.validateStateRowFormat) on most read-path methods to catch
schema incompatibilities early. However, two read methods that decode rows are
missing this validation:
||Method||Decodes rows?||Has validation?||
|get()|Yes|(/)|
|iterator()|Yes|(/)|
|prefixScan()|Yes|(/)|
|rangeScan()|Yes|(/)|
|multiGet()|Yes|(x)|
|valuesIterator()|Yes|(x)|
multiGet() decodes values via encodedValues.map(kvEncoder._2.decodeValue) but
never calls validateStateRowFormat. Similarly, valuesIterator() decodes values
via valueEncoder.decodeValues() without validation.
If a stateful operator evolves its schema between restarts, these methods will
silently return corrupted data instead of failing fast with
StateStoreKeyRowFormatValidationFailure /
StateStoreValueRowFormatValidationFailure.
h3. Fix
Add the same validateStateRowFormat guard (gated by !isValidated && value !=
null && !useColumnFamilies) to multiGet() and valuesIterator(), matching the
existing pattern in get() and iterator().
h3. Acceptance Criteria
* multiGet() calls validateStateRowFormat on the first non-null result,
consistent with get()
* valuesIterator() calls validateStateRowFormat on the first decoded row,
consistent with iterator()
* Add unit tests that verify StateStoreKeyRowFormatValidationFailure is thrown
from both methods when a mismatched schema is used
* Existing RocksDBStateStoreProviderSuite tests pass with no regressions
----
*Prior art:* SPARK-56539 / [PR
#55468|https://github.com/apache/spark/pull/55468] added the same validation to
prefixScan and rangeScan.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]