HeartSaVioR edited a comment on pull request #28707: URL: https://github.com/apache/spark/pull/28707#issuecomment-639239790
My alternative with wrapping state store is something like below: ``` class RowValidatingStateStore( underlying: StateStore, keyType: Seq[DataType], valueType: Seq[DataType]) extends StateStore { private var isValidated = false override def get(key: UnsafeRow): UnsafeRow = { val value = underlying.get(key) if (!isValidated) { validateRow(value, valueType) isValidated = true } value } override def id: StateStoreId = underlying.id override def version: Long = underlying.version override def put(key: UnsafeRow, value: UnsafeRow): Unit = underlying.put(key, value) override def remove(key: UnsafeRow): Unit = underlying.remove(key) override def commit(): Long = underlying.commit() override def abort(): Unit = underlying.abort() override def iterator(): Iterator[UnsafeRowPair] = underlying.iterator() override def metrics: StateStoreMetrics = underlying.metrics override def hasCommitted: Boolean = underlying.hasCommitted private def validateRow(row: UnsafeRow, rowDataType: Seq[DataType]): Unit = { // TODO: call util method with row and data type to validate - note that it can only check with value schema } } def get(...): StateStore = { require(version >= 0) val storeProvider = loadedProviders.synchronized { ... } // TODO: add if statement to see whether it should wrap state store or not new RowValidatingStateStore(storeProvider.getStore(version, keySchema, valueSchema)) } ``` The example code only checks in get operation, which is insufficient to check "key" row in state. That said, iterator approach still provides more possibility of validation, though the validation of unsafe row itself doesn't have enough coverage of checking various incompatibility issues (Definitely we should have another guards as well) so that's a sort of OK to only cover value side. ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org