Re: Spark ML : One hot Encoding for multiple columns

2016-11-13 Thread Nicholas Sharkey
Amen 

> On Nov 13, 2016, at 7:55 PM, janardhan shetty  wrote:
> 
> These JIRAs are still unresolved:
> https://issues.apache.org/jira/browse/SPARK-11215
> 
> Also there is https://issues.apache.org/jira/browse/SPARK-8418
> 
>> On Wed, Aug 17, 2016 at 11:15 AM, Nisha Muktewar  wrote:
>> 
>> The OneHotEncoder does not accept multiple columns.
>> 
>> You can use Michal's suggestion where he uses Pipeline to set the stages and 
>> then executes them. 
>> 
>> The other option is to write a function that performs one-hot encoding on a 
>> column and returns a DataFrame with the encoded column, and then call it 
>> once for each of the remaining columns.
>> 
>> 
>> 
>> 
>>> On Wed, Aug 17, 2016 at 10:59 AM, janardhan shetty wrote:
>>> I had already tried it this way:
>>> 
>>> scala> val featureCols = Array("category","newone")
>>> featureCols: Array[String] = Array(category, newone)
>>> 
>>> scala> val indexer = new StringIndexer().setInputCol(featureCols).setOutputCol("categoryIndex").fit(df1)
>>> <console>:29: error: type mismatch;
>>>  found   : Array[String]
>>>  required: String
>>> val indexer = new StringIndexer().setInputCol(featureCols).setOutputCol("categoryIndex").fit(df1)
>>> 
>>> 
>>>> On Wed, Aug 17, 2016 at 10:56 AM, Nisha Muktewar wrote:
>>>> I don't think it does. From the documentation: 
>>>> https://spark.apache.org/docs/2.0.0-preview/ml-features.html#onehotencoder,
>>>> I see that it still accepts one column at a time.
>>>> 
>>>>> On Wed, Aug 17, 2016 at 10:18 AM, janardhan shetty wrote:
>>>>> 2.0:
>>>>> 
>>>>> One-hot encoding currently accepts a single input column. Is there a way to 
>>>>> include multiple columns?


Re: Spark ML : One hot Encoding for multiple columns

2016-11-13 Thread janardhan shetty
These JIRAs are still unresolved:
https://issues.apache.org/jira/browse/SPARK-11215

Also there is https://issues.apache.org/jira/browse/SPARK-8418

On Wed, Aug 17, 2016 at 11:15 AM, Nisha Muktewar  wrote:

>
> The OneHotEncoder does *not* accept multiple columns.
>
> You can use Michal's suggestion where he uses Pipeline to set the stages
> and then executes them.
>
> The other option is to write a function that performs one-hot encoding on
> a column and returns a DataFrame with the encoded column, and then call it
> once for each of the remaining columns.
>
>
>
>
> On Wed, Aug 17, 2016 at 10:59 AM, janardhan shetty wrote:
>
>> I had already tried it this way:
>>
>> scala> val featureCols = Array("category","newone")
>> featureCols: Array[String] = Array(category, newone)
>>
>> scala> val indexer = new StringIndexer().setInputCol(featureCols).setOutputCol("categoryIndex").fit(df1)
>> <console>:29: error: type mismatch;
>>  found   : Array[String]
>>  required: String
>> val indexer = new StringIndexer().setInputCol(featureCols).setOutputCol("categoryIndex").fit(df1)
>>
>>
>> On Wed, Aug 17, 2016 at 10:56 AM, Nisha Muktewar 
>> wrote:
>>
>>> I don't think it does. From the documentation:
>>> https://spark.apache.org/docs/2.0.0-preview/ml-features.html#onehotencoder,
>>> I see that it still accepts one column at a time.
>>>
>>> On Wed, Aug 17, 2016 at 10:18 AM, janardhan shetty <
>>> janardhan...@gmail.com> wrote:
>>>
>>>> 2.0:
>>>>
>>>> One-hot encoding currently accepts a single input column. Is there a way
>>>> to include multiple columns?
>>>>
>>>
>>>
>>
>


Re: Spark ML : One hot Encoding for multiple columns

2016-08-17 Thread Nisha Muktewar
The OneHotEncoder does *not* accept multiple columns.

You can use Michal's suggestion where he uses Pipeline to set the stages
and then executes them.

The other option is to write a function that performs one-hot encoding on a
column and returns a DataFrame with the encoded column, and then call it
once for each of the remaining columns.
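
The second option, a per-column function called once per column, might look
roughly like the sketch below. It assumes the Spark 2.0 ML API discussed in
this thread, where OneHotEncoder is a plain Transformer that expects a numeric
category-index column, so each string column is indexed first. The helper
names oneHotEncodeColumn and oneHotEncodeColumns and the "Index"/"Encoded"
suffixes are hypothetical, chosen only for illustration.

```scala
import org.apache.spark.ml.feature.{OneHotEncoder, StringIndexer}
import org.apache.spark.sql.DataFrame

// Hypothetical helper: index one string column, one-hot encode the index,
// and return the DataFrame with both new columns appended.
def oneHotEncodeColumn(df: DataFrame, col: String): DataFrame = {
  val indexed = new StringIndexer()
    .setInputCol(col)
    .setOutputCol(col + "Index")
    .fit(df)
    .transform(df)
  // Assumes Spark 2.0, where OneHotEncoder is a Transformer (no fit step).
  new OneHotEncoder()
    .setInputCol(col + "Index")
    .setOutputCol(col + "Encoded")
    .transform(indexed)
}

// Hypothetical helper: apply oneHotEncodeColumn to every column in turn.
def oneHotEncodeColumns(df: DataFrame, cols: Seq[String]): DataFrame =
  cols.foldLeft(df)((acc, c) => oneHotEncodeColumn(acc, c))
```

Calling oneHotEncodeColumns(df1, Seq("category", "newone")) would then add one
encoded vector column per input column. Each call fits its own StringIndexer,
so the data is scanned once per column.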




On Wed, Aug 17, 2016 at 10:59 AM, janardhan shetty 
wrote:

> I had already tried it this way:
>
> scala> val featureCols = Array("category","newone")
> featureCols: Array[String] = Array(category, newone)
>
> scala> val indexer = new StringIndexer().setInputCol(featureCols).setOutputCol("categoryIndex").fit(df1)
> <console>:29: error: type mismatch;
>  found   : Array[String]
>  required: String
> val indexer = new StringIndexer().setInputCol(featureCols).setOutputCol("categoryIndex").fit(df1)
>
>
> On Wed, Aug 17, 2016 at 10:56 AM, Nisha Muktewar 
> wrote:
>
>> I don't think it does. From the documentation:
>> https://spark.apache.org/docs/2.0.0-preview/ml-features.html#onehotencoder,
>> I see that it still accepts one column at a time.
>>
>> On Wed, Aug 17, 2016 at 10:18 AM, janardhan shetty <
>> janardhan...@gmail.com> wrote:
>>
>>> 2.0:
>>>
>>> One-hot encoding currently accepts a single input column. Is there a way to
>>> include multiple columns?
>>>
>>
>>
>


Re: Spark ML : One hot Encoding for multiple columns

2016-08-17 Thread janardhan shetty
I had already tried it this way:

scala> val featureCols = Array("category","newone")
featureCols: Array[String] = Array(category, newone)

scala> val indexer = new StringIndexer().setInputCol(featureCols).setOutputCol("categoryIndex").fit(df1)
<console>:29: error: type mismatch;
 found   : Array[String]
 required: String
val indexer = new StringIndexer().setInputCol(featureCols).setOutputCol("categoryIndex").fit(df1)
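
The type mismatch arises because setInputCol takes a single String, not an
Array[String]. One way around it, sketched here under the assumption that
every entry in featureCols is a string column of df1, is to build one
StringIndexer per column. The "Index" output suffix is an arbitrary choice
for illustration.

```scala
import org.apache.spark.ml.feature.StringIndexer

// Column names taken from the REPL session above.
val featureCols = Array("category", "newone")

// One StringIndexer per column, since setInputCol accepts a single String.
// The "Index" suffix on the output column is an arbitrary naming choice.
val indexers: Array[StringIndexer] = featureCols.map { c =>
  new StringIndexer().setInputCol(c).setOutputCol(c + "Index")
}
```

Each indexer can then be fitted on df1 separately or placed in a Pipeline as
one stage per column.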


On Wed, Aug 17, 2016 at 10:56 AM, Nisha Muktewar  wrote:

> I don't think it does. From the documentation:
> https://spark.apache.org/docs/2.0.0-preview/ml-features.html#onehotencoder,
> I see that it still accepts one column at a time.
>
> On Wed, Aug 17, 2016 at 10:18 AM, janardhan shetty wrote:
>
>> 2.0:
>>
>> One-hot encoding currently accepts a single input column. Is there a way to
>> include multiple columns?
>>
>
>


Re: Spark ML : One hot Encoding for multiple columns

2016-08-17 Thread Nisha Muktewar
I don't think it does. From the documentation:
https://spark.apache.org/docs/2.0.0-preview/ml-features.html#onehotencoder,
I see that it still accepts one column at a time.

On Wed, Aug 17, 2016 at 10:18 AM, janardhan shetty 
wrote:

> 2.0:
>
> One-hot encoding currently accepts a single input column. Is there a way to
> include multiple columns?
>


Re: Spark ML : One hot Encoding for multiple columns

2016-08-17 Thread Michał Zieliński
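You can just map over your columns and create a pipeline:

import org.apache.spark.ml.{Pipeline, PipelineStage}
import org.apache.spark.ml.feature.OneHotEncoder

val columns = Array("colA", "colB", "colC")
// One encoder per column; each input column must already hold numeric category indices.
val transformers: Array[PipelineStage] = columns.map { x =>
  new OneHotEncoder().setInputCol(x).setOutputCol(x + "Encoded")
}
val pipeline = new Pipeline()
  .setStages(transformers)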
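
If the columns hold raw strings rather than numeric category indices, the
same idea can be extended by pairing each encoder with a StringIndexer. This
is only a sketch against the Spark 2.0 API from this thread. The "Index" and
"Encoded" suffixes are arbitrary, and df in the trailing comment stands for a
hypothetical DataFrame that contains colA, colB and colC.

```scala
import org.apache.spark.ml.{Pipeline, PipelineStage}
import org.apache.spark.ml.feature.{OneHotEncoder, StringIndexer}

val columns = Array("colA", "colB", "colC")

// For each column, add two stages: index the strings, then encode the index.
val stages: Array[PipelineStage] = columns.flatMap { c =>
  Seq[PipelineStage](
    new StringIndexer().setInputCol(c).setOutputCol(c + "Index"),
    new OneHotEncoder().setInputCol(c + "Index").setOutputCol(c + "Encoded")
  )
}

val pipeline = new Pipeline().setStages(stages)

// Hypothetical usage, given a DataFrame df with the three columns:
// val encoded = pipeline.fit(df).transform(df)
```

Keeping the indexer and encoder for a column adjacent in the stage list makes
sure each index column exists before its encoder runs.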



On 17 August 2016 at 18:18, janardhan shetty  wrote:

> 2.0:
>
> One-hot encoding currently accepts a single input column. Is there a way to
> include multiple columns?
>


Spark ML : One hot Encoding for multiple columns

2016-08-17 Thread janardhan shetty
2.0:

One-hot encoding currently accepts a single input column. Is there a way to
include multiple columns?