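Use StringIndexer in MLlib 1.4: https://spark.apache.org/docs/1.4.0/api/java/org/apache/spark/ml/feature/StringIndexer.html
It maps each distinct string value to a numeric index, so you can build the sparse vectors from those indices directly and never need the dense ones.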
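
To make the idea concrete, here is a rough sketch in plain Python (not the Spark API) of what the indexing step buys you. It assumes rows in the "user,class,product" form from your message; the helper build_index and the variable names are made up for illustration. Like StringIndexer, it gives the most frequent value index 0; breaking ties alphabetically is my own assumption, added only so the output is deterministic.

```python
from collections import Counter, defaultdict

# Sample rows in the "user,class,product" form from the question.
rows = [
    ("user1", "class1", "product1"),
    ("user1", "class1", "product2"),
    ("user1", "class1", "product5"),
    ("user2", "class1", "product2"),
    ("user2", "class1", "product5"),
    ("user3", "class2", "product1"),
]


def build_index(values):
    """Map each distinct string to an integer, most frequent first.

    Hypothetical helper: ties are broken alphabetically, which is an
    assumption made here for determinism.
    """
    counts = Counter(values)
    ordered = sorted(counts, key=lambda v: (-counts[v], v))
    return {value: i for i, value in enumerate(ordered)}


product_index = build_index(p for _, _, p in rows)
class_index = build_index(c for _, c, _ in rows)

# Collect, per user, the class label and the set of active product indices.
user_label = {}
user_products = defaultdict(set)
for user, cls, product in rows:
    user_label[user] = class_index[cls]
    user_products[user].add(product_index[product])

# Sparse representation: (label, vector size, sorted non-zero indices).
# Every non-zero value is 1.0, so no dense vector of 0s is ever built.
num_features = len(product_index)
examples = {
    user: (user_label[user], num_features, sorted(user_products[user]))
    for user in user_label
}

for user in sorted(examples):
    print(user, examples[user])
```

Each (size, indices) pair, together with a list of 1.0 values of the same length, is the kind of input a sparse-vector constructor such as MLlib's Vectors.sparse(size, indices, values) expects. With 1M users and 1M products you would do the grouping in Spark rather than in local dictionaries; this only shows the shape of the result.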

On Thu, Aug 6, 2015 at 8:49 PM, Adamantios Corais <adamantios.cor...@gmail.com> wrote:

> I have a set of data based on which I want to create a classification
> model. Each row has the following form:
>
>> user1,class1,product1
>> user1,class1,product2
>> user1,class1,product5
>> user2,class1,product2
>> user2,class1,product5
>> user3,class2,product1
>> etc
>
> There are about 1M users, 2 classes, and 1M products. What I would like to
> do next is create the sparse vectors (something already supported by MLlib)
> BUT in order to apply that function I have to create the dense vectors
> (with the 0s), first. In other words, I have to binarize my data. What's
> the easiest (or most elegant) way of doing that?
>
> *// Adamantios*