HeartSaVioR edited a comment on pull request #28707:
URL: https://github.com/apache/spark/pull/28707#issuecomment-639239790


   My alternative with wrapping state store is something like below:
   
   ```
     class RowValidatingStateStore(
         underlying: StateStore,
         keyType: Seq[DataType],
         valueType: Seq[DataType]) extends StateStore {
       private var isValidated = false
   
       override def get(key: UnsafeRow): UnsafeRow = {
         val value = underlying.get(key)
         if (!isValidated) {
           validateRow(value, valueType)
           isValidated = true
         }
         value
       }
   
       override def id: StateStoreId = underlying.id
       override def version: Long = underlying.version
       override def put(key: UnsafeRow, value: UnsafeRow): Unit = 
underlying.put(key, value)
       override def remove(key: UnsafeRow): Unit = underlying.remove(key)
       override def commit(): Long = underlying.commit()
       override def abort(): Unit = underlying.abort()
       override def iterator(): Iterator[UnsafeRowPair] = underlying.iterator()
       override def metrics: StateStoreMetrics = underlying.metrics
       override def hasCommitted: Boolean = underlying.hasCommitted
   
       private def validateRow(row: UnsafeRow, rowDataType: Seq[DataType]): 
Unit = {
         // TODO: call util method with row and data type to validate - note 
that it can only check with value schema
       }
     }
   
     def get(...): StateStore = {
       require(version >= 0)
       val storeProvider = loadedProviders.synchronized {
         ...
       }
       // TODO: add if statement to see whether it should wrap state store or 
not
       new RowValidatingStateStore(storeProvider.getStore(version, keySchema, 
valueSchema))
     }
   ```
   
   The example code only checks in get operation, which is insufficient to 
check "key" row in state. That said, iterator approach still provides more 
possibility of validation, though the validation of unsafe row itself doesn't 
have enough coverage of checking various incompatibility issues (Definitely we 
should have another guards as well) so that's a sort of OK to only cover value 
side.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

Reply via email to