Re: How to discretize Continuous Variable with Spark DataFrames

2016-01-25 Thread Jeff Zhang
It's very straightforward; please refer to the documentation here: http://spark.apache.org/docs/latest/ml-features.html#bucketizer On Mon, Jan 25, 2016 at 10:09 PM, Eli Super wrote: > Thanks Joshua, > > I can't understand what algorithm behind Bucketizer, how discretization >

Re: How to discretize Continuous Variable with Spark DataFrames

2016-01-25 Thread Eli Super
Thanks Joshua, I can't understand what algorithm is behind Bucketizer. How is the discretization done? Best Regards On Mon, Jan 25, 2016 at 3:40 PM, Joshua TAYLOR wrote: > It sounds like you may want the Bucketizer in SparkML. The overview docs > [1] include, "Bucketizer

Re: How to discretize Continuous Variable with Spark DataFrames

2016-01-25 Thread Joshua TAYLOR
It sounds like you may want the Bucketizer in SparkML. The overview docs [1] include, "Bucketizer transforms a column of continuous features to a column of feature buckets, where the buckets are specified by users." [1]: http://spark.apache.org/docs/latest/ml-features.html#bucketizer On Mon,
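
To make "buckets are specified by users" concrete, here is a minimal plain-Python sketch of the lookup such a transformer performs. It is not Spark's code: the function name bucketize is hypothetical, and the boundary rule (a bucket covers [x, y), except the last bucket, which also includes its upper bound) is taken from the Bucketizer documentation as I understand it.

```python
from bisect import bisect_right


def bucketize(value, splits):
    """Return the index of the bucket that `value` falls into.

    Hypothetical helper, not part of Spark. `splits` holds the bucket
    boundaries in strictly increasing order, so n + 1 splits give n buckets.
    """
    if len(splits) < 2 or any(a >= b for a, b in zip(splits, splits[1:])):
        raise ValueError("splits must be strictly increasing, with at least two entries")
    if value < splits[0] or value > splits[-1]:
        raise ValueError(f"{value} lies outside the range covered by splits")
    if value == splits[-1]:
        # The last bucket is closed on the right as well.
        return len(splits) - 2
    # Binary search: count the boundaries <= value, then shift to a 0-based index.
    return bisect_right(splits, value) - 1


# Infinite end points make every finite value fall into some bucket.
splits = [float("-inf"), -0.5, 0.0, 0.5, float("inf")]
buckets = [bucketize(x, splits) for x in [-999.9, -0.5, -0.3, 0.0, 0.2, 999.9]]
print(buckets)  # [0, 1, 1, 2, 2, 3]
```

So there is no fitting step in this picture: the user picks the boundaries and each value is simply looked up against them.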

How to discretize Continuous Variable with Spark DataFrames

2016-01-25 Thread Eli Super
Hi, what is the best way to discretize a continuous variable within Spark DataFrames? I want to discretize a variable 1) by equal frequency 2) by k-means. I usually use R for this purpose: http://www.inside-r.org/packages/cran/arules/docs/discretize R code for example: ### equal frequency
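
Both requested methods come down to choosing cut points, which could then be handed to something like Bucketizer as its splits. The sketch below shows one way to compute them in plain Python on a small in-memory list. It is not the arules implementation and not a Spark API: the function names are hypothetical, and the one-dimensional k-means uses a simple deterministic initialisation chosen for illustration.

```python
def equal_frequency_splits(values, n_bins):
    """Interior cut points so that each bin holds roughly the same number of values."""
    ordered = sorted(values)
    n = len(ordered)
    cuts = []
    for k in range(1, n_bins):
        cut = ordered[(k * n) // n_bins]
        # Skip repeated cut points, which heavy ties in the data can produce.
        if not cuts or cut > cuts[-1]:
            cuts.append(cut)
    return cuts


def kmeans_splits(values, k, max_iter=100):
    """Interior cut points halfway between the centers of a 1-D k-means fit."""
    ordered = sorted(values)
    n = len(ordered)
    # Assumption: start the centers at evenly spaced positions in the sorted data.
    centers = [ordered[((2 * i + 1) * n) // (2 * k)] for i in range(k)]
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for v in ordered:
            nearest = min(range(k), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # An empty cluster keeps its previous center.
        updated = [sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)]
        if updated == centers:
            break
        centers = updated
    centers = sorted(set(centers))
    return [(a + b) / 2 for a, b in zip(centers, centers[1:])]


def with_open_ends(cuts):
    """Add infinite end points so the splits cover every finite value."""
    return [float("-inf")] + list(cuts) + [float("inf")]


ef_cuts = equal_frequency_splits(range(1, 13), 3)
print(ef_cuts)  # [5, 9]

km_cuts = kmeans_splits([1.0, 1.2, 0.8, 5.0, 5.2, 4.8, 9.0, 9.2, 8.8], 3)
print(with_open_ends(km_cuts))
```

On a real DataFrame the sort and the k-means passes would have to run distributed (or on a sample); the sketch only shows what the cut points mean.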