Re: Spark ML : One hot Encoding for multiple columns

2016-11-13 Thread Nicholas Sharkey
Amen

> On Nov 13, 2016, at 7:55 PM, janardhan shetty wrote:
>
> These JIRAs are still unresolved:
> https://issues.apache.org/jira/browse/SPARK-11215
>
> Also there is https://issues.apache.org/jira/browse/SPARK-8418
>
>> On Wed, Aug 17, 2016 at 11:15 AM, Nisha

Re: Spark ML : One hot Encoding for multiple columns

2016-11-13 Thread janardhan shetty
These JIRAs are still unresolved:
https://issues.apache.org/jira/browse/SPARK-11215

Also there is https://issues.apache.org/jira/browse/SPARK-8418

On Wed, Aug 17, 2016 at 11:15 AM, Nisha Muktewar wrote:
>
> The OneHotEncoder does *not* accept multiple columns.
>
> You can

Re: Spark ML : One hot Encoding for multiple columns

2016-08-17 Thread Nisha Muktewar
The OneHotEncoder does *not* accept multiple columns. You can use Michal's suggestion where he uses Pipeline to set the stages and then executes them. The other option is to write a function that performs one hot encoding on a column and returns a dataframe with the encoded column and then call
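The second option above (a function that encodes one column and returns a DataFrame) might look roughly like the sketch below. The helper name `oneHotColumn`, the `Index` and `Vec` column suffixes, and the sample data are all made up for illustration and do not come from the thread. The helper wraps a `StringIndexer` and a `OneHotEncoder` in a two-stage `Pipeline`, on the assumption that the input columns hold strings and need indexing first.

```scala
import org.apache.spark.ml.{Pipeline, PipelineStage}
import org.apache.spark.ml.feature.{OneHotEncoder, StringIndexer}
import org.apache.spark.sql.{DataFrame, SparkSession}

val spark = SparkSession.builder().master("local[*]").appName("ohe-helper").getOrCreate()
import spark.implicits._

// Hypothetical helper: index one string column, one-hot encode the index,
// and return the DataFrame with the extra columns appended.
def oneHotColumn(df: DataFrame, col: String): DataFrame = {
  val stages: Array[PipelineStage] = Array(
    new StringIndexer().setInputCol(col).setOutputCol(col + "Index"),
    new OneHotEncoder().setInputCol(col + "Index").setOutputCol(col + "Vec")
  )
  new Pipeline().setStages(stages).fit(df).transform(df)
}

// Hypothetical sample data.
val df = Seq(("a", "x"), ("b", "y"), ("a", "y")).toDF("category", "newone")

// Call the helper once per column, feeding each result into the next call.
val encoded = Seq("category", "newone").foldLeft(df)(oneHotColumn)
encoded.show(false)
```

Each call re-fits on the growing DataFrame, so for many columns a single pipeline with all stages is probably the tidier choice.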

Re: Spark ML : One hot Encoding for multiple columns

2016-08-17 Thread janardhan shetty
I had already tried this way:

scala> val featureCols = Array("category","newone")
featureCols: Array[String] = Array(category, newone)

scala> val indexer = new StringIndexer().setInputCol(featureCols).setOutputCol("categoryIndex").fit(df1)
<console>:29: error: type mismatch; found : Array[String]
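The type mismatch comes from `setInputCol`, which in Spark 2.0 takes a single `String` rather than an `Array[String]`. One possible workaround, sketched below, is to fit a separate `StringIndexer` per column and thread the DataFrame through each one. The contents of `df1` are invented here as a stand-in for the poster's data, and the `Index` suffix is only an assumed naming choice.

```scala
import org.apache.spark.ml.feature.StringIndexer
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").appName("index-columns").getOrCreate()
import spark.implicits._

// Hypothetical data standing in for df1 from the thread.
val df1 = Seq(("a", "x"), ("b", "y"), ("a", "y")).toDF("category", "newone")
val featureCols = Array("category", "newone")

// Fit and apply one StringIndexer per column; each step appends "<col>Index".
val indexed = featureCols.foldLeft(df1) { (df, c) =>
  new StringIndexer().setInputCol(c).setOutputCol(c + "Index").fit(df).transform(df)
}
indexed.show()
```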

Re: Spark ML : One hot Encoding for multiple columns

2016-08-17 Thread Nisha Muktewar
I don't think it does. From the documentation (https://spark.apache.org/docs/2.0.0-preview/ml-features.html#onehotencoder), I see that it still accepts one column at a time.

On Wed, Aug 17, 2016 at 10:18 AM, janardhan shetty wrote:
> 2.0:
>
> One hot encoding currently

Re: Spark ML : One hot Encoding for multiple columns

2016-08-17 Thread Michał Zieliński
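You can just map over your columns and create a pipeline:

import org.apache.spark.ml.{Pipeline, PipelineStage}
import org.apache.spark.ml.feature.OneHotEncoder

val columns = Array("colA", "colB", "colC")
// One encoder stage per column; each writes to "<column>Encoded".
val transformers: Array[PipelineStage] = columns.map { x =>
  new OneHotEncoder().setInputCol(x).setOutputCol(x + "Encoded")
}
val pipeline = new Pipeline()
  .setStages(transformers)

On 17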
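As a rough end-to-end sketch of this pipeline idea: `OneHotEncoder` expects numeric category indices, so if the columns hold strings, a `StringIndexer` stage per column would need to run first. The sample data and the `Index` suffix below are assumptions made for illustration; the `Encoded` suffix follows the message above.

```scala
import org.apache.spark.ml.{Pipeline, PipelineStage}
import org.apache.spark.ml.feature.{OneHotEncoder, StringIndexer}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").appName("ohe-pipeline").getOrCreate()
import spark.implicits._

// Hypothetical sample data with two string-valued categorical columns.
val df = Seq(("a", "x"), ("b", "y"), ("c", "x")).toDF("colA", "colB")
val columns = Array("colA", "colB")

// One indexer per column: string -> numeric category index.
val indexers: Array[PipelineStage] = columns.map { c =>
  new StringIndexer().setInputCol(c).setOutputCol(c + "Index")
}
// One encoder per column: category index -> one-hot vector.
val encoders: Array[PipelineStage] = columns.map { c =>
  new OneHotEncoder().setInputCol(c + "Index").setOutputCol(c + "Encoded")
}

// All indexers run before the encoders that consume their output.
val pipeline = new Pipeline().setStages(indexers ++ encoders)
val encoded = pipeline.fit(df).transform(df)
encoded.show(false)
```

Putting everything in one `Pipeline` means a single `fit` call, and the fitted model can be reused on new data.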

Spark ML : One hot Encoding for multiple columns

2016-08-17 Thread janardhan shetty
Spark 2.0: OneHotEncoder currently accepts a single input column. Is there a way to include multiple columns?