GitHub user HeartSaVioR opened a pull request: https://github.com/apache/spark/pull/21733
[SPARK-24763][SS] Remove redundant key data from value in streaming aggregation * add option to configure enabling new feature: remove redundant key data from value * modify code to respect new option (turning on/off feature) * modify tests to run tests with both on/off * Add guard in OffsetSeqMetadata to prevent modifying option after executing query ## What changes were proposed in this pull request? This patch proposes a new flag option for stateful aggregation: remove redundant key data from value. Enabling new option runs similar with current, and uses less memory for state according to key/value fields of state operator. Please refer below link to see detailed perf. test result: https://issues.apache.org/jira/browse/SPARK-24763?focusedCommentId=16536539&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16536539 Since the state between enabling the option and disabling the option is not compatible, the option is set to 'disable' by default (to ensure backward compatibility), and OffsetSeqMetadata would prevent modifying the option after executing query. ## How was this patch tested? Modify unit tests to cover both disabling option and enabling option. Also did manual tests to see whether propose patch improves state memory usage. You can merge this pull request into a Git repository by running: $ git pull https://github.com/HeartSaVioR/spark SPARK-24763 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/21733.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #21733 ---- commit 2a9cc496bb7f832b75b0090ef9a612f4fbc0f206 Author: Jungtaek Lim <kabhwan@...> Date: 2018-07-08T09:37:12Z [SPARK-24763][SS] Remove redundant key data from value in streaming aggregation * add option to configure enabling new feature: remove redundant key data from value * modify code to respect new option (turning on/off feature) * modify tests to run tests with both on/off * Add guard in OffsetSeqMetadata to prevent modifying option after executing query ---- --- --------------------------------------------------------------------- To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org