[
https://issues.apache.org/jira/browse/SPARK-10785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14990948#comment-14990948
]
holdenk commented on SPARK-10785:
---------------------------------
So looking at the tree work, it looks like it just did a groupByKey for each
column index, which isn't very useful if we've only got a single column
(although it's quite possible I misread some of that). I can do something
useful with just a single column though (sort the RDD and use the sorted RDD
to compute the quantiles), if that sounds like what we are looking for?
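
To make the single-column idea concrete, here is a rough local sketch in plain Python. It is not the Spark implementation: `quantile_splits` and `num_buckets` are made-up names for illustration, and the in-memory `sorted` call only stands in for a distributed sort of the RDD. The point is just that once the column is sorted, the split points can be read off at evenly spaced ranks.

```python
from typing import List, Sequence


def quantile_splits(values: Sequence[float], num_buckets: int) -> List[float]:
    """Pick up to num_buckets - 1 split points from one column of values.

    Hypothetical helper: the local sort stands in for sorting the RDD.
    """
    if num_buckets < 2 or not values:
        return []
    ordered = sorted(values)  # stand-in for a distributed sort
    n = len(ordered)
    splits: List[float] = []
    for i in range(1, num_buckets):
        # Value at the i-th evenly spaced rank of the sorted column.
        candidate = ordered[min(n - 1, (i * n) // num_buckets)]
        # Skip repeated values so the splits stay strictly increasing.
        if not splits or candidate > splits[-1]:
            splits.append(candidate)
    return splits


if __name__ == "__main__":
    print(quantile_splits([5, 1, 4, 2, 3, 6, 8, 7], 4))
```

With heavily repeated values this returns fewer splits than requested, which may or may not be what a discretizer should do; that choice is an assumption of the sketch.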
> Scale QuantileDiscretizer using distributed binning
> ---------------------------------------------------
>
> Key: SPARK-10785
> URL: https://issues.apache.org/jira/browse/SPARK-10785
> Project: Spark
> Issue Type: Improvement
> Components: ML
> Reporter: Joseph K. Bradley
>
> [SPARK-10064] improves binning in decision trees by distributing the
> computation. QuantileDiscretizer should do the same.