I have a set of data based on which I want to create a classification model. Each row has the following form:
user1,class1,product1 > user1,class1,product2 > user1,class1,product5 > user2,class1,product2 > user2,class1,product5 > user3,class2,product1 > etc There are about 1M users, 2 classes, and 1M products. What I would like to do next is create the sparse vectors (something already supported by MLlib) BUT in order to apply that function I have to create the dense vectors (with the 0s), first. In other words, I have to binarize my data. What's the easiest (or most elegant) way of doing that? *// Adamantios*