converting categorical values in csv file to numerical values

2015-11-05 Thread Balachandar R.A.
Hi, I am new to Spark MLlib and machine learning. I have a CSV file that consists of around 100 thousand rows and 20 columns. Of these 20 columns, 10 contain string values. The values in these columns are not necessarily unique. They are essentially categorical, that is, the values could be one

Re: converting categorical values in csv file to numerical values

2015-11-05 Thread tog
Hi Bala, Can't you build a simple dictionary and map those values to numbers? Cheers, Guillaume On 5 November 2015 at 09:54, Balachandar R.A. wrote: > HI > > > I am new to spark MLlib and machine learning. I have a csv file that > consists of around 100 thousand rows and
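
A minimal sketch of the dictionary approach suggested above, in plain Python rather than Spark. The helper names (`build_category_index`, `encode_rows`), the sample data, and the choice of which columns are categorical are all assumptions made for illustration; the real file's layout is not shown in the thread.

```python
import csv
import io

def build_category_index(rows, column):
    """Map each distinct string in `column` to an integer, in first-seen order."""
    index = {}
    for row in rows:
        value = row[column]
        if value not in index:
            index[value] = len(index)
    return index

def encode_rows(rows, categorical_columns):
    """Replace categorical strings with integer codes; parse the rest as floats."""
    indexes = {col: build_category_index(rows, col) for col in categorical_columns}
    encoded = [
        [indexes[i][v] if i in indexes else float(v) for i, v in enumerate(row)]
        for row in rows
    ]
    return encoded, indexes

# Hypothetical three-row sample standing in for the real CSV file.
sample_csv = "red,1.5,small\nblue,2.0,large\nred,3.25,large\n"
rows = list(csv.reader(io.StringIO(sample_csv)))
encoded, indexes = encode_rows(rows, categorical_columns=[0, 2])
```

Each categorical column gets its own dictionary, so the same string in two different columns is encoded independently. Note that the resulting integers carry no ordering, so many models would need them one-hot encoded afterwards.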

Re: converting categorical values in csv file to numerical values

2015-11-05 Thread tog
If your corpus is large (NLP), this is indeed the best solution; otherwise (few words, i.e. categories), I guess you will end up with the same result. On Friday, 6 November 2015, Balachandar R.A. wrote: > Hi Guillaume, > > > This is always an option. However, I read about

Re: converting categorical values in csv file to numerical values

2015-11-05 Thread Balachandar R.A.
Hi Guillaume, This is always an option. However, I read about HashingTF, which does exactly this quite efficiently and can scale too. Hence, I am looking for a solution using this technique. Regards, Bala On 5 November 2015 at 18:50, tog wrote: > Hi Bala > > Can't you
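
The idea behind hashing-based encoders such as HashingTF can be sketched in plain Python as follows. This is not Spark's implementation: the use of MD5, the bucket count of 16, the `column=value` token format, and the function names are all assumptions chosen only to illustrate the hashing trick.

```python
import hashlib

def hash_bucket(value, num_buckets):
    """Map a string to a bucket index with a hash that is stable across runs."""
    digest = hashlib.md5(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_buckets

def hash_encode(values, num_buckets=16):
    """Turn a list of categorical tokens into a fixed-length count vector."""
    vector = [0.0] * num_buckets
    for value in values:
        vector[hash_bucket(value, num_buckets)] += 1.0
    return vector

# Prefix each value with a (hypothetical) column name so that equal strings
# in different columns are hashed as different tokens.
row = ["color=red", "size=large"]
vector = hash_encode(row, num_buckets=16)
```

No dictionary has to be built or stored, which is why the approach scales, but distinct values can collide in the same bucket. With only a handful of categories and enough buckets, the result is close to what an explicit dictionary would give.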