[ https://issues.apache.org/jira/browse/SPARK-18471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15670795#comment-15670795 ]

Anthony Truchet commented on SPARK-18471:
-----------------------------------------

Sure. But in our use case we do want to aggregate on a DenseVector.

Here is some context: we learn a logistic regression on a very big hash space 
and volume of data. For each piece of data the features and the gradient are 
sparse, but the aggregate gets denser and denser, up to the point where it is 
almost fully dense (as observed in our current in-house implementation).

So we do want to aggregate into a DenseVector, but we neither need nor want to 
send hundreds of MB of zeros as part of the closure.
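For illustration, here is a tiny pure-Scala sketch (with made-up numbers, not our in-house implementation) of why the aggregate densifies even though each per-example gradient is sparse:

```scala
// Each example contributes a sparse gradient (index -> value); summing
// them into one dense array fills more and more slots over time.
val dim = 8
val agg = Array.fill(dim)(0.0) // the dense aggregate
val sparseGrads = Seq(
  Map(0 -> 0.5),
  Map(3 -> 1.0, 5 -> 0.2),
  Map(0 -> 0.1, 7 -> 2.0))
for (g <- sparseGrads; (i, v) <- g) agg(i) += v
val nonZero = agg.count(_ != 0.0) // already 4 of 8 slots non-zero
```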

> In treeAggregate, generate (big) zeros instead of sending them.
> ---------------------------------------------------------------
>
>                 Key: SPARK-18471
>                 URL: https://issues.apache.org/jira/browse/SPARK-18471
>             Project: Spark
>          Issue Type: Improvement
>          Components: MLlib, Spark Core
>            Reporter: Anthony Truchet
>            Priority: Minor
>
> When using an optimization routine like LBFGS, treeAggregate currently sends 
> the zero vector as part of the closure. This zero can be huge (e.g. ML 
> vectors with millions of zeros) but can be easily generated.
> Several options are possible (patches coming soon for some of them).
> One is to provide a treeAggregateWithZeroGenerator method (either in core or 
> in MLlib) which wraps treeAggregate in an Option and generates the zero if None.
> Another is to rewrite treeAggregate to wrap an underlying implementation 
> which uses a zero generator directly.
> There might be other, better alternatives we have not spotted...
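A minimal sketch of the first option, with hypothetical names and plain Scala collections standing in for RDD partitions (the real patch would sit on treeAggregate itself): the closure ships only a None, and each partition rebuilds the zero from a generator.

```scala
// Hypothetical aggregateWithZeroGenerator, simulated without Spark:
// `partitions` stands in for an RDD's partitions. The zero shipped in
// the closure is Option.empty (a few bytes); zeroGen recreates the big
// zero locally on each partition.
def aggregateWithZeroGenerator[T, U](partitions: Seq[Seq[T]])(
    zeroGen: () => U,
    seqOp: (U, T) => U,
    combOp: (U, U) => U): U = {
  val perPartition: Seq[Option[U]] = partitions.map { part =>
    if (part.isEmpty) None // an empty partition never materializes a zero
    else Some(part.foldLeft(zeroGen())(seqOp))
  }
  perPartition
    .reduceOption { (a, b) =>
      (a, b) match {
        case (Some(x), Some(y)) => Some(combOp(x, y))
        case (x, None)          => x
        case (None, y)          => y
      }
    }
    .flatten
    .getOrElse(zeroGen())
}

// Usage: sum sparse (index, value) updates into a dense vector whose
// zero is generated, never shipped.
val parts = Seq(Seq((0, 1.0), (2, 3.0)), Seq((1, 2.0)), Seq.empty[(Int, Double)])
val dense = aggregateWithZeroGenerator(parts)(
  () => Array.fill(3)(0.0),
  (acc: Array[Double], kv: (Int, Double)) => { acc(kv._1) += kv._2; acc },
  (a: Array[Double], b: Array[Double]) => {
    var i = 0; while (i < a.length) { a(i) += b(i); i += 1 }; a
  })
// dense.toSeq == Seq(1.0, 2.0, 3.0)
```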



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
