GitHub user HeartSaVioR opened a pull request:

    https://github.com/apache/spark/pull/21733

    [SPARK-24763][SS] Remove redundant key data from value in streaming aggregation

    * Add an option to enable the new feature: removing redundant key data from the value
    * Modify code to respect the new option (turning the feature on/off)
    * Modify tests to run with the option both on and off
    * Add a guard in OffsetSeqMetadata to prevent modifying the option after the query has been executed
    
    ## What changes were proposed in this pull request?
    
    This patch proposes a new flag option for stateful aggregation: removing redundant key data from the value.
    With the option enabled, the query runs similarly to the current implementation and uses less memory for state; the saving depends on the key/value fields of the state operator.
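
To illustrate the idea, here is a minimal standalone sketch in Python, not the Spark implementation. It assumes a state row is a flat mapping of column names to values and that the key columns are repeated inside the value. The names `split_value` and `restore_row` are hypothetical and do not appear in the patch.

```python
from typing import Dict, Sequence, Tuple

Row = Dict[str, object]


def split_value(key_fields: Sequence[str], row: Row) -> Tuple[Row, Row]:
    """Split a full row into (key, value), dropping the key columns from the value."""
    key = {name: row[name] for name in key_fields}
    value = {name: v for name, v in row.items() if name not in key_fields}
    return key, value


def restore_row(key: Row, value: Row, field_order: Sequence[str]) -> Row:
    """Rebuild the full row on read by joining the key back onto the trimmed value."""
    merged = {**key, **value}
    return {name: merged[name] for name in field_order}


# Hypothetical aggregation row: grouping columns plus aggregate buffers.
row = {"window": "10:00", "user": "a", "count": 3, "sum": 42}
key, value = split_value(("window", "user"), row)
restored = restore_row(key, value, list(row))
```

In this sketch the stored value carries only the aggregate columns, so the saving grows with the size of the key relative to the value, which matches the remark that the gain depends on the key/value fields.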
    
    Please refer to the link below for the detailed performance test results:
    
    https://issues.apache.org/jira/browse/SPARK-24763?focusedCommentId=16536539&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16536539
    
    Since state written with the option enabled is not compatible with state written with it disabled, the option is disabled by default (to ensure backward compatibility), and OffsetSeqMetadata prevents modifying the option after the query has been executed.
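
One way such a guard could behave is sketched below in standalone Python. This is an assumption-laden illustration, not the code in OffsetSeqMetadata: the option name `removeRedundantKeyData` and the function `effective_conf` are hypothetical, and the sketch assumes the value recorded at the first run simply takes precedence on restart. The description does not say whether the real guard overrides the new value or rejects it.

```python
from typing import Dict, Optional

# Hypothetical option name, used for illustration only.
OPTION = "removeRedundantKeyData"


def effective_conf(
    checkpoint_conf: Optional[Dict[str, str]],
    session_conf: Dict[str, str],
    default: str = "false",
) -> Dict[str, str]:
    """Return the option value the query should actually run with."""
    requested = session_conf.get(OPTION, default)
    if checkpoint_conf is None:
        # First run: no state exists yet, so the requested value is recorded.
        return {OPTION: requested}
    # Restart: existing state was written under the recorded value, so that
    # value wins. A checkpoint without the option falls back to the default.
    recorded = checkpoint_conf.get(OPTION, default)
    return {OPTION: recorded}


first_run = effective_conf(None, {OPTION: "true"})
restart = effective_conf({OPTION: "true"}, {OPTION: "false"})
old_checkpoint = effective_conf({}, {OPTION: "true"})
```

Falling back to the disabled default for a checkpoint that never recorded the option keeps state written before the patch readable, which is the backward-compatibility concern raised above.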
    
    ## How was this patch tested?
    
    Modified unit tests to cover the option both disabled and enabled.
    Also ran manual tests to check whether the proposed patch improves state memory usage.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/HeartSaVioR/spark SPARK-24763

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21733.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #21733
    
----
commit 2a9cc496bb7f832b75b0090ef9a612f4fbc0f206
Author: Jungtaek Lim <kabhwan@...>
Date:   2018-07-08T09:37:12Z

    [SPARK-24763][SS] Remove redundant key data from value in streaming aggregation
    
    * add option to configure enabling new feature: remove redundant key data from value
    * modify code to respect new option (turning on/off feature)
    * modify tests to run tests with both on/off
    * Add guard in OffsetSeqMetadata to prevent modifying option after executing query

----

