[jira] [Commented] (SPARK-24763) Remove redundant key data from value in streaming aggregation

Tathagata Das (JIRA) Tue, 21 Aug 2018 18:54:20 -0700


    [ https://issues.apache.org/jira/browse/SPARK-24763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16588234#comment-16588234 ]


Tathagata Das commented on SPARK-24763:
---------------------------------------

They will be. The merge script always puts the major version (i.e. 3.0.0)
there. Those will be redirected to 2.4.0 as well when we make the 2.4.0
release.

On Tue, Aug 21, 2018 at 3:39 PM, Jungtaek Lim (JIRA) <j...@apache.org> wrote:



> Remove redundant key data from value in streaming aggregation
> -------------------------------------------------------------
>
>                 Key: SPARK-24763
>                 URL: https://issues.apache.org/jira/browse/SPARK-24763
>             Project: Spark
>          Issue Type: Improvement
>          Components: Structured Streaming
>    Affects Versions: 2.4.0
>            Reporter: Jungtaek Lim
>            Assignee: Jungtaek Lim
>            Priority: Major
>             Fix For: 2.4.0, 3.0.0
>
>
> The key/value of state in streaming aggregation is formatted as follows:
>  * key: UnsafeRow containing the group-by fields
>  * value: UnsafeRow containing the key fields and the other fields for aggregation results
> This means the key data is stored in both the key and the value.
> This is done to boost performance: it avoids projecting the row into the value
> while storing, and avoids joining key and value to restore the original row
> while loading. However, in a simple benchmark test I found it not much more
> helpful than "project and join". (I will paste the test results in a comment.)
> So I would propose a new option: remove redundant key data from the value in
> stateful aggregation. I'm avoiding modifying the default behavior of stateful
> aggregation, because the state value will not be compatible between the
> current format and the option-enabled format.
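The two state layouts discussed in the issue can be modeled roughly as below. This is a minimal plain-Python sketch, assuming tuples as a stand-in for UnsafeRow and a dict as a stand-in for the state store; it is not Spark's actual state store API. The schema (word, count), KEY_ORDINALS, VALUE_ORDINALS, and the put_*/get_* helpers are all hypothetical names chosen for illustration.

```python
from typing import Dict, Sequence, Tuple

# Hypothetical stand-in for Spark's UnsafeRow: a plain tuple of column values.
Row = Tuple[object, ...]

# Hypothetical schema for illustration: (word, count), grouped by "word".
KEY_ORDINALS: Sequence[int] = (0,)    # positions of the group-by fields
VALUE_ORDINALS: Sequence[int] = (1,)  # positions of the aggregation results


def project(row: Row, ordinals: Sequence[int]) -> Row:
    """Pick the columns at `ordinals` out of `row`."""
    return tuple(row[i] for i in ordinals)


# Current layout: the value is the whole row, so key fields are stored twice.
def put_redundant(store: Dict[Row, Row], row: Row) -> None:
    store[project(row, KEY_ORDINALS)] = row


def get_redundant(store: Dict[Row, Row], key: Row) -> Row:
    return store[key]  # no join needed: the value already is the full row


# Proposed layout: the value keeps only the non-key (aggregation) fields.
def put_compact(store: Dict[Row, Row], row: Row) -> None:
    store[project(row, KEY_ORDINALS)] = project(row, VALUE_ORDINALS)


def get_compact(store: Dict[Row, Row], key: Row) -> Row:
    """Join key and value back into the original column order."""
    value = store[key]
    full = [None] * (len(KEY_ORDINALS) + len(VALUE_ORDINALS))
    for pos, v in zip(KEY_ORDINALS, key):
        full[pos] = v
    for pos, v in zip(VALUE_ORDINALS, value):
        full[pos] = v
    return tuple(full)


if __name__ == "__main__":
    row = ("spark", 3)
    redundant: Dict[Row, Row] = {}
    compact: Dict[Row, Row] = {}
    put_redundant(redundant, row)
    put_compact(compact, row)
    print(redundant)  # {('spark',): ('spark', 3)}
    print(compact)    # {('spark',): (3,)}
    assert get_redundant(redundant, ("spark",)) == get_compact(compact, ("spark",)) == row
```

The sketch also hints at the compatibility concern: a value written with put_redundant holds the full row, so reading it back with get_compact would place the wrong columns, which is why the new layout is proposed as an opt-in option rather than a change to the default.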



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
