GitHub user tdas opened a pull request:

    https://github.com/apache/spark/pull/19416

    [SPARK-22187][SS] Update unsaferow format for saved state such that we can set timeouts when state is null

    ## What changes were proposed in this pull request?
    
    Currently, the group state of a user-defined type is encoded as top-level 
columns in the UnsafeRows stored in the state store. The timeout timestamp is 
also saved (when needed) as the last top-level column. Since the group state 
is serialized to top-level columns, you cannot save "null" as a value of state 
(setting null in all the top-level columns is not equivalent). So we don't let 
the user set the timeout without initializing the state for a key. Based on 
user experience, this leads to confusion.
    
    This PR changes the row format so that the state is saved as nested 
columns. This allows the state to be set to null, and avoids these 
confusing corner cases.
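
    As a rough illustration of the difference between the two layouts, here is 
a minimal sketch in plain Python. It is not the Spark implementation: rows are 
modeled as tuples rather than UnsafeRows, and `RunningCount`, `encode_flat`, 
`encode_nested`, and the decode helpers are hypothetical names invented for 
this example.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class RunningCount:
    """Hypothetical user-defined state type with nullable fields."""
    count: Optional[int]
    last_event: Optional[str]


def encode_flat(state: Optional[RunningCount], timeout_ms: int) -> Tuple:
    # Old-style layout (as described above): state fields are top-level
    # columns and the timeout timestamp is the last top-level column.
    if state is None:
        # The only way to write "no state" is to null every column.
        return (None, None, timeout_ms)
    return (state.count, state.last_event, timeout_ms)


def encode_nested(state: Optional[RunningCount], timeout_ms: int) -> Tuple:
    # New-style layout: the whole state is one nested column that can
    # itself be null, independent of the timeout column.
    nested = None if state is None else (state.count, state.last_event)
    return (nested, timeout_ms)


def decode_nested(row: Tuple) -> Tuple[Optional[RunningCount], int]:
    nested, timeout_ms = row
    state = None if nested is None else RunningCount(*nested)
    return state, timeout_ms


# Flat layout: a null state and a state whose fields are all null
# produce the same row, so the reader cannot tell them apart.
ambiguous = encode_flat(None, 1000) == encode_flat(RunningCount(None, None), 1000)

# Nested layout: the two cases stay distinct, so a timeout can be
# stored for a key whose state has not been initialized.
distinct = encode_nested(None, 1000) != encode_nested(RunningCount(None, None), 1000)
```

    In this sketch `ambiguous` and `distinct` are both true, which is the 
corner case the description refers to: only the nested layout can round-trip a 
null state together with a timeout.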
    
    ## How was this patch tested?
    Refactored tests.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/tdas/spark SPARK-22187

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/19416.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #19416
    
----
commit 301e0a15b87be8cd1c71090ece3497191bbd3881
Author: Tathagata Das <tathagata.das1...@gmail.com>
Date:   2017-09-29T03:10:34Z

    Refactored all state operations into separate inner class

commit 64a8d865f71a92ed9f76879eb6c5a24d1fef8cec
Author: Tathagata Das <tathagata.das1...@gmail.com>
Date:   2017-10-03T02:39:05Z

    Refactored and changed state format

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
