How to calculate percentiles with Spark?

2014-10-21 Thread sparkuser
Hi,

What would be the best way to get percentiles from a Spark RDD? I can see
that JavaDoubleRDD and MLlib's MultivariateStatisticalSummary
(https://spark.apache.org/docs/latest/mllib-statistics.html) provide
mean() but not percentiles.
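One possible approach (a sketch, not an official Spark API): sort the RDD, count it, and take the element whose global index is ceil(p/100 * n) - 1. The rank arithmetic itself, the nearest-rank method, can be shown in plain Java independent of Spark:

```java
public class Percentile {
    // Nearest-rank percentile: data must be sorted ascending, p in (0, 100].
    // In Spark, the same rank could be looked up after sortBy + zipWithIndex.
    static double percentile(double[] sorted, double p) {
        int n = sorted.length;
        int rank = (int) Math.ceil(p / 100.0 * n); // 1-based nearest rank
        return sorted[rank - 1];
    }

    public static void main(String[] args) {
        double[] data = {15, 20, 35, 40, 50};      // already sorted
        System.out.println(percentile(data, 50));  // median -> 35.0
        System.out.println(percentile(data, 90));  // -> 50.0
    }
}
```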

Thank you!

Horace



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/How-to-calculate-percentiles-with-Spark-tp16937.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: How to calculate percentiles with Spark?

2014-10-21 Thread lordjoe
A rather more general question is - assume I have a JavaRDD<K> which is
sorted -
How can I convert this into a JavaPairRDD<Integer, K> where the Integer is
the index - 0...N - 1?
Easy to do on one machine:

JavaRDD<K> values = ... // create here

JavaPairRDD<Integer, K> positions = values.mapToPair(new PairFunction<K,
Integer, K>() {
    private int index = 0;
    @Override
    public Tuple2<Integer, K> call(final K t) throws Exception {
        return new Tuple2<Integer, K>(index++, t);
    }
});

but will this code do the right thing on a cluster?
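It will not: the mutable index field lives in each task's own copy of the closure, so every partition numbers its elements from 0. RDD.zipWithIndex() handles this by first counting each partition's elements and then adding each partition's starting offset to its local indices. A plain-Java sketch of that offset scheme (the partition lists and helper name here are illustrative, not Spark API):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class IndexPartitions {
    // Assign global indices across partitions the way zipWithIndex does:
    // global index = (count of elements in earlier partitions) + local index.
    static List<Integer> globalIndices(List<List<String>> partitions) {
        List<Integer> indices = new ArrayList<>();
        int offset = 0; // elements seen in earlier partitions
        for (List<String> part : partitions) {
            for (int local = 0; local < part.size(); local++) {
                indices.add(offset + local);
            }
            offset += part.size();
        }
        return indices;
    }

    public static void main(String[] args) {
        List<List<String>> partitions = Arrays.asList(
                Arrays.asList("a", "b"),
                Arrays.asList("c", "d", "e"));
        // A per-closure counter would yield [0, 1, 0, 1, 2] here;
        // the offset scheme yields the correct [0, 1, 2, 3, 4].
        System.out.println(globalIndices(partitions));
    }
}
```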



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/How-to-calculate-percentiles-with-Spark-tp16937p16945.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
