Re: Spark ML : One hot Encoding for multiple columns
Amen

On Nov 13, 2016, at 7:55 PM, janardhan shetty wrote:
> These JIRAs are still unresolved:
> https://issues.apache.org/jira/browse/SPARK-11215
>
> Also there is https://issues.apache.org/jira/browse/SPARK-8418
Re: Spark ML : One hot Encoding for multiple columns
These JIRAs are still unresolved:
https://issues.apache.org/jira/browse/SPARK-11215

Also there is https://issues.apache.org/jira/browse/SPARK-8418

On Wed, Aug 17, 2016 at 11:15 AM, Nisha Muktewar wrote:
> The OneHotEncoder does *not* accept multiple columns.
>
> You can use Michal's suggestion where he uses Pipeline to set the stages
> and then executes them.
>
> The other option is to write a function that performs one hot encoding on
> a column and returns a dataframe with the encoded column and then call it
> multiple times for the rest of the columns.
Re: Spark ML : One hot Encoding for multiple columns
The OneHotEncoder does *not* accept multiple columns.

You can use Michal's suggestion, where he uses a Pipeline to set the stages and then executes them.

The other option is to write a function that performs one-hot encoding on a single column and returns a DataFrame with the encoded column, and then call it once for each of the remaining columns.

On Wed, Aug 17, 2016 at 10:59 AM, janardhan shetty wrote:
> I had already tried this way:
>
> scala> val featureCols = Array("category","newone")
> featureCols: Array[String] = Array(category, newone)
>
> scala> val indexer = new StringIndexer().setInputCol(featureCols).setOutputCol("categoryIndex").fit(df1)
> <console>:29: error: type mismatch;
>  found   : Array[String]
>  required: String
>        val indexer = new StringIndexer().setInputCol(featureCols).setOutputCol("categoryIndex").fit(df1)
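The second option above (a per-column function called repeatedly) could be sketched roughly as follows. This is only an illustration under stated assumptions, not code from the thread: the object name `EncodeColumns`, the helper `oneHotEncodeColumn`, the `Index`/`Vec` column suffixes, and the sample data are all made up. It also assumes the columns hold strings, so each one is run through a StringIndexer first, since OneHotEncoder expects a numeric category-index column. The two stages are wrapped in a small Pipeline so that `fit` and `transform` are called the same way regardless of how the encoder is implemented.

```scala
import org.apache.spark.ml.{Pipeline, PipelineStage}
import org.apache.spark.ml.feature.{OneHotEncoder, StringIndexer}
import org.apache.spark.sql.{DataFrame, SparkSession}

object EncodeColumns {
  // Hypothetical helper: index one string column, one-hot encode the index,
  // and return the input DataFrame with a "<column>Vec" column appended.
  def oneHotEncodeColumn(df: DataFrame, column: String): DataFrame = {
    val indexer = new StringIndexer()
      .setInputCol(column)
      .setOutputCol(column + "Index")
    val encoder = new OneHotEncoder()
      .setInputCol(column + "Index")
      .setOutputCol(column + "Vec")
    val stages: Array[PipelineStage] = Array(indexer, encoder)
    // Fit and apply both stages, then drop the intermediate index column.
    new Pipeline().setStages(stages).fit(df).transform(df).drop(column + "Index")
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("ohe-per-column").getOrCreate()
    import spark.implicits._

    // Made-up sample data using the column names from the thread.
    val df1 = Seq(("a", "x"), ("b", "y"), ("a", "z")).toDF("category", "newone")

    // Call the helper once per column, threading the DataFrame through.
    val encoded = Array("category", "newone").foldLeft(df1)(oneHotEncodeColumn)
    encoded.show(false)

    spark.stop()
  }
}
```

Each call fits its own small pipeline, so this makes one pass over the data per column; for many columns a single Pipeline with all stages is probably the tidier choice.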
Re: Spark ML : One hot Encoding for multiple columns
I had already tried this way:

scala> val featureCols = Array("category","newone")
featureCols: Array[String] = Array(category, newone)

scala> val indexer = new StringIndexer().setInputCol(featureCols).setOutputCol("categoryIndex").fit(df1)
<console>:29: error: type mismatch;
 found   : Array[String]
 required: String
       val indexer = new StringIndexer().setInputCol(featureCols).setOutputCol("categoryIndex").fit(df1)

On Wed, Aug 17, 2016 at 10:56 AM, Nisha Muktewar wrote:
> I don't think it does. From the documentation:
> https://spark.apache.org/docs/2.0.0-preview/ml-features.html#onehotencoder,
> I see that it still accepts one column at a time.
Re: Spark ML : One hot Encoding for multiple columns
I don't think it does. From the documentation: https://spark.apache.org/docs/2.0.0-preview/ml-features.html#onehotencoder, I see that it still accepts one column at a time.

On Wed, Aug 17, 2016 at 10:18 AM, janardhan shetty wrote:
> 2.0:
>
> One-hot encoding currently accepts a single input column. Is there a way to
> include multiple columns?
Re: Spark ML : One hot Encoding for multiple columns
You can just map over your columns and create a pipeline:

import org.apache.spark.ml.{Pipeline, PipelineStage}
import org.apache.spark.ml.feature.OneHotEncoder

// One OneHotEncoder stage per input column
val columns = Array("colA", "colB", "colC")
val transformers: Array[PipelineStage] = columns.map { x =>
  new OneHotEncoder().setInputCol(x).setOutputCol(x + "Encoded")
}
val pipeline = new Pipeline()
  .setStages(transformers)

On 17 August 2016 at 18:18, janardhan shetty wrote:
> 2.0:
>
> One-hot encoding currently accepts a single input column. Is there a way to
> include multiple columns?
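The pipeline above only builds the stages. A fuller sketch, run end to end, might look like the following. It is an illustration rather than Michal's exact code, and it rests on assumptions: the columns are taken to be strings, so a StringIndexer stage is placed before each OneHotEncoder (which expects numeric category indices), and the object name `MultiColumnPipeline`, the helper `buildStages`, the `Index` suffix, and the sample rows are invented for the example.

```scala
import org.apache.spark.ml.{Pipeline, PipelineStage}
import org.apache.spark.ml.feature.{OneHotEncoder, StringIndexer}
import org.apache.spark.sql.SparkSession

object MultiColumnPipeline {
  // Hypothetical helper: for every column emit an indexer followed by an
  // encoder, so each encoder reads the index column produced just before it.
  def buildStages(columns: Array[String]): Array[PipelineStage] =
    columns.flatMap { c =>
      val indexer = new StringIndexer().setInputCol(c).setOutputCol(c + "Index")
      val encoder = new OneHotEncoder().setInputCol(c + "Index").setOutputCol(c + "Encoded")
      Seq[PipelineStage](indexer, encoder)
    }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("ohe-pipeline").getOrCreate()
    import spark.implicits._

    // Made-up sample data with the column names used in the message above.
    val df = Seq(("a", "x", "p"), ("b", "y", "q"), ("a", "z", "p")).toDF("colA", "colB", "colC")

    val pipeline = new Pipeline().setStages(buildStages(Array("colA", "colB", "colC")))

    // fit() learns the string-to-index mappings; transform() appends the
    // "<col>Index" and "<col>Encoded" columns.
    val encoded = pipeline.fit(df).transform(df)
    encoded.show(false)

    spark.stop()
  }
}
```

If the input columns already contain numeric category indices, the StringIndexer stages can be left out and the encoders pointed at the original columns, as in the message above.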
Spark ML : One hot Encoding for multiple columns
2.0:

One-hot encoding currently accepts a single input column. Is there a way to include multiple columns?