GitHub user tdas opened a pull request:

    https://github.com/apache/spark/pull/21739

    [SPARK-22187][SS] Update unsaferow format for saved state such that we can set timeouts when state is null

    ## What changes were proposed in this pull request?
    
    Currently, the group state, which is of a user-defined type, is encoded as
    top-level columns in the UnsafeRows stored in the state store. The timeout
    timestamp is also saved (when needed) as the last top-level column. Since
    the group state is serialized to top-level columns, you cannot save "null"
    as a value of state (setting null in all the top-level columns is not
    equivalent). So we don't let the user set the timeout without initializing
    the state for a key. Based on user experience, this leads to confusion.
    
    This PR changes the row format such that the state is saved as nested
    columns. This allows the state to be set to null and avoids these
    confusing corner cases. However, queries recovering from an existing
    checkpoint will continue to use the previous format, to maintain
    compatibility with existing production queries.
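
As a rough illustration (not the code in this patch), the difference between the two layouts can be modeled with plain Python tuples. The function names, the tuple representation, and the timeout value are all hypothetical; the real implementation works on UnsafeRows and encoders.

```python
from typing import Any, Optional, Tuple

State = Optional[Tuple[Any, ...]]


def encode_flat(state: State, num_fields: int, timeout_ms: int) -> tuple:
    """Model of the previous layout: state fields are top-level columns,
    and the timeout timestamp is the last top-level column."""
    # A null state can only be written as "every state column is null".
    fields = state if state is not None else (None,) * num_fields
    return (*fields, timeout_ms)


def encode_nested(state: State, timeout_ms: int) -> tuple:
    """Model of the new layout: the whole state is one nullable nested column."""
    return (state, timeout_ms)


def decode_nested(row: tuple) -> Tuple[State, int]:
    """Recover (state, timeout) from a row in the nested layout."""
    state, timeout_ms = row
    return state, timeout_ms


# Hypothetical two-field state type and timeout timestamp.
null_state = None
all_null_fields = (None, None)

# Flat layout: a null state and a state whose fields are all null
# produce the same row, so "null" cannot be saved faithfully.
flat_is_ambiguous = (
    encode_flat(null_state, 2, 5000) == encode_flat(all_null_fields, 2, 5000)
)

# Nested layout: the two cases stay distinct, so a timeout can be
# stored alongside a null state.
nested_is_distinct = (
    encode_nested(null_state, 5000) != encode_nested(all_null_fields, 5000)
)
```

In this model, `flat_is_ambiguous` and `nested_is_distinct` both evaluate to true, which is the corner case described above: only the nested layout can carry a timeout for a key whose state has not been initialized.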
    
    ## How was this patch tested?
    Refactored existing end-to-end tests and added new tests for explicitly
    testing obj-to-row conversion for both state formats.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/tdas/spark SPARK-22187-1

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21739.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #21739
    
----
commit ef509c8986dbcc9b37387b0bde56c3d71abb7602
Author: Tathagata Das <tathagata.das1565@...>
Date:   2017-10-05T02:25:22Z

    Partial implementation

commit 976a7ea3d5d528e6f1091c696c7f6e865027ee23
Author: Tathagata Das <tathagata.das1565@...>
Date:   2018-07-09T11:05:10Z

    Fixed and added tests

commit cfc3f68aabeb4e83bfe8131e93e5f0133fba4869
Author: Tathagata Das <tathagata.das1565@...>
Date:   2018-07-09T11:19:01Z

    Refactored

commit 9525484a444ce231ff366bc556fe5a1d46ac4d4f
Author: Tathagata Das <tathagata.das1565@...>
Date:   2018-07-09T17:38:43Z

    Minor refactoring

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
