Re: How to binarize data in spark

praveen S Thu, 06 Aug 2015 20:03:15 -0700

Use StringIndexer in MLlib 1.4:
https://spark.apache.org/docs/1.4.0/api/java/org/apache/spark/ml/feature/StringIndexer.html
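
To make the idea concrete, here is a minimal plain-Python sketch (not Spark code) of what indexing buys you, using the sample rows from the question. It mimics StringIndexer's behaviour of giving the most frequent label index 0. The alphabetical tie-breaking and the helper names string_index and to_sparse_rows are assumptions made for illustration only. Once each product has an integer index, each user's vector can be written directly as (size, indices, values), the same three arguments Vectors.sparse takes in MLlib, so the dense vector with the 0s never has to be built.

```python
from collections import Counter, defaultdict

# Sample rows from the question: (user, class, product).
rows = [
    ("user1", "class1", "product1"),
    ("user1", "class1", "product2"),
    ("user1", "class1", "product5"),
    ("user2", "class1", "product2"),
    ("user2", "class1", "product5"),
    ("user3", "class2", "product1"),
]


def string_index(values):
    """Map each distinct string to an integer, most frequent first.

    Hypothetical helper that imitates the idea behind StringIndexer.
    Ties are broken alphabetically here, which is an assumption made
    only to keep the output deterministic.
    """
    counts = Counter(values)
    ordered = sorted(counts, key=lambda v: (-counts[v], v))
    return {v: i for i, v in enumerate(ordered)}


def to_sparse_rows(rows, product_index, class_index):
    """Build one (label, size, indices, values) tuple per user.

    Only the positions of the products a user actually has are stored,
    so no dense vector is ever materialised.
    """
    per_user = defaultdict(set)
    label = {}
    for user, cls, product in rows:
        per_user[user].add(product_index[product])
        label[user] = class_index[cls]
    size = len(product_index)
    return {
        user: (label[user], size, sorted(ix), [1.0] * len(ix))
        for user, ix in per_user.items()
    }


product_index = string_index(p for _, _, p in rows)
class_index = string_index(c for _, c, _ in rows)
sparse = to_sparse_rows(rows, product_index, class_index)

for user in sorted(sparse):
    print(user, sparse[user])
```

In Spark itself the indexing step would be done by StringIndexer on a DataFrame column and the grouping by a per-user aggregation; the sketch only shows the shape of the result.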


On Thu, Aug 6, 2015 at 8:49 PM, Adamantios Corais <
adamantios.cor...@gmail.com> wrote:

> I have a set of data based on which I want to create a classification
> model. Each row has the following form:
>
>> user1,class1,product1
>> user1,class1,product2
>> user1,class1,product5
>> user2,class1,product2
>> user2,class1,product5
>> user3,class2,product1
>> etc
>
>
> There are about 1M users, 2 classes, and 1M products. What I would like to
> do next is create the sparse vectors (something already supported by MLlib)
> BUT in order to apply that function I have to create the dense vectors
> (with the 0s), first. In other words, I have to binarize my data. What's
> the easiest (or most elegant) way of doing that?
>
>
> *// Adamantios*
>
>
>
