Re: Best way to transform string label into long label for classification problem

2016-06-28 Thread Jaonary Rabarisoa
Thank you, Xinh. That's what I need.

On Tue, Jun 28, 2016 at 17:43, Xinh Huynh wrote:

> Hi Jao,
>
> Here's one option:
> http://spark.apache.org/docs/latest/ml-features.html#stringindexer
> "StringIndexer encodes a string column of labels to a column of label
> indices. The indices are in [0, numLabels), ordered by label frequencies."
>
> Xinh
>
> On Tue, Jun 28, 2016 at 12:29 AM, Jaonary Rabarisoa wrote:
>
>> Dear all,
>>
>> I'm trying to find a way to transform a DataFrame into data that is
>> more suitable for a third-party classification algorithm. The DataFrame has
>> two columns: "feature", represented by a vector, and "label", represented
>> by a string. I want "label" to be an integer in [0, number of classes - 1].
>> Do you have any ideas on how to do this efficiently?
>>
>>  Cheers,
>>
>> Jao
>>
>
>


Re: Best way to transform string label into long label for classification problem

2016-06-28 Thread Xinh Huynh
Hi Jao,

Here's one option:
http://spark.apache.org/docs/latest/ml-features.html#stringindexer
"StringIndexer encodes a string column of labels to a column of label
indices. The indices are in [0, numLabels), ordered by label frequencies."

Xinh

On Tue, Jun 28, 2016 at 12:29 AM, Jaonary Rabarisoa wrote:

> Dear all,
>
> I'm trying to find a way to transform a DataFrame into data that is
> more suitable for a third-party classification algorithm. The DataFrame has
> two columns: "feature", represented by a vector, and "label", represented
> by a string. I want "label" to be an integer in [0, number of classes - 1].
> Do you have any ideas on how to do this efficiently?
>
>  Cheers,
>
> Jao
>
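To make the requested mapping concrete without a Spark cluster, here is a plain-Python sketch of the rule the StringIndexer documentation describes: each distinct string gets an integer in [0, number of classes - 1], with more frequent labels getting smaller indices. The function name `index_labels` is hypothetical, and breaking frequency ties alphabetically is an assumption of this sketch (newer Spark versions document the same tie-break, but check the version you run).

```python
from collections import Counter

def index_labels(labels):
    """Hypothetical helper: map string labels to integers in [0, n_classes - 1].

    More frequent labels get smaller indices; ties are broken alphabetically
    (an assumption of this sketch). Returns the encoded labels and the label
    order, so index i corresponds to ordered[i].
    """
    counts = Counter(labels)
    # Sort by descending frequency, then alphabetically for equal counts.
    ordered = sorted(counts, key=lambda s: (-counts[s], s))
    mapping = {label: i for i, label in enumerate(ordered)}
    return [mapping[label] for label in labels], ordered

indices, ordered = index_labels(["dog", "cat", "bird", "cat", "dog", "cat"])
print(indices)  # [1, 0, 2, 0, 1, 0]
print(ordered)  # ['cat', 'dog', 'bird']
```

Keeping `ordered` around plays the same role as keeping the fitted indexer: it is what lets you translate integer predictions back into the original string labels.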