GitHub user tdas opened a pull request: https://github.com/apache/spark/pull/19416
[SPARK-22187][SS] Update unsaferow format for saved state such that we can set timeouts when state is null

## What changes were proposed in this pull request?

Currently, the group state of a user-defined type is encoded as top-level columns in the UnsafeRows stored in the state store. The timeout timestamp is also saved (when needed) as the last top-level column. Since the group state is serialized to top-level columns, you cannot save "null" as a value of state (setting null in all the top-level columns is not equivalent). So we don't let the user set the timeout without initializing the state for a key. Based on user experience, this leads to confusion.

This PR changes the row format so that the state is saved as nested columns. This allows the state to be set to null and avoids these confusing corner cases.

## How was this patch tested?

Refactored tests.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/tdas/spark SPARK-22187

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/19416.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #19416

----
commit 301e0a15b87be8cd1c71090ece3497191bbd3881
Author: Tathagata Das <tathagata.das1...@gmail.com>
Date:   2017-09-29T03:10:34Z

    Refactored all state operations into separate inner class

commit 64a8d865f71a92ed9f76879eb6c5a24d1fef8cec
Author: Tathagata Das <tathagata.das1...@gmail.com>
Date:   2017-10-03T02:39:05Z

    Refactored and changed state format

----
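
To illustrate the row-format issue described in the PR, here is a minimal toy model in plain Python. It is not Spark code and does not reproduce the actual UnsafeRow layout: the `UserState` type, its fields, and the four helper functions are hypothetical names invented for this sketch, and tuples stand in for rows. It only shows why a null state cannot be told apart from a state with all-null fields when the fields are top-level columns, and why nesting the state in a single column removes the ambiguity.

```python
from typing import NamedTuple, Optional, Tuple


class UserState(NamedTuple):
    """Hypothetical user-defined state type with two nullable fields."""
    count: Optional[int]
    last_seen: Optional[str]


def encode_flat(state: Optional[UserState], timeout_ms: int) -> Tuple:
    # Flat layout: each state field is a top-level column,
    # and the timeout timestamp is the last top-level column.
    if state is None:
        # The only way to express "no state" is to null out every column.
        return (None, None, timeout_ms)
    return (state.count, state.last_seen, timeout_ms)


def decode_flat(row: Tuple) -> Tuple[Optional[UserState], int]:
    count, last_seen, timeout_ms = row
    # The decoder cannot know whether the state was null or merely had
    # all-null fields, so it always has to rebuild a state object.
    return UserState(count, last_seen), timeout_ms


def encode_nested(state: Optional[UserState], timeout_ms: int) -> Tuple:
    # Nested layout: the whole state sits in one nullable column,
    # followed by the timeout timestamp.
    nested = None if state is None else tuple(state)
    return (nested, timeout_ms)


def decode_nested(row: Tuple) -> Tuple[Optional[UserState], int]:
    nested, timeout_ms = row
    state = None if nested is None else UserState(*nested)
    return state, timeout_ms


if __name__ == "__main__":
    empty_fields = UserState(None, None)

    # Flat: null state and all-null fields produce the same row.
    print(encode_flat(None, 5000) == encode_flat(empty_fields, 5000))

    # Nested: the two cases stay distinct, and the timeout survives
    # a round trip even though the state is null.
    print(encode_nested(None, 5000) == encode_nested(empty_fields, 5000))
    print(decode_nested(encode_nested(None, 5000)))
```

In this model the flat encoding loses the distinction between "no state" and "state with null fields", which matches the reason the PR gives for previously disallowing a timeout on an uninitialized state. The nested encoding keeps the timeout column independent of whether the state column is null.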