From: ssti...@live.com
To: men...@gmail.com
Subject: RE: Decision tree: categorical variables
Date: Wed, 20 Aug 2014 12:09:52 -0700

Hi Xiangrui,
My data is in the following format:
0,1,5,A,8,1,M
0,1,5,B,4,1,M
1,0,2,B,7,0,U
0,1,3,C,8,0,M
0,0,5,C,1,0,M
1,1,5,C,8,0,U
0,0,5,B,8,0,M
1,0,3,B,2,1,M
0,1,5,B,8,0,F
1,0,2,B,4,0,F
0,1,5,A,8,0,F
I can create a map like this: val catmap = Map(3 -> 3, 6 -> 2)
However, I am not sure what I should do when I parse the data. In the
default case, I parse it like this:

  val parsedData = data.map { line =>
    val parts = line.split(',').map(_.toDouble)
    LabeledPoint(parts(0), Vectors.dense(parts.tail))
  }

Do I need to do something explicit for columns 3 and 6, or will specifying
the map suffice?
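
To make the question concrete, here is a rough sketch of the kind of
encoding I have in mind (the 0-based codes for A/B/C and M/U/F are my own
assumptions; note that after dropping the label column, raw columns 3 and 6
become feature indices 2 and 5):

  import org.apache.spark.mllib.linalg.Vectors
  import org.apache.spark.mllib.regression.LabeledPoint

  // Hypothetical 0-based encodings for the two categorical columns.
  val col3Codes = Map("A" -> 0.0, "B" -> 1.0, "C" -> 2.0)
  val col6Codes = Map("M" -> 0.0, "U" -> 1.0, "F" -> 2.0)

  // data: RDD[String], one record per line as shown above.
  val parsedData = data.map { line =>
    val parts = line.split(',')
    val features = parts.tail.zipWithIndex.map {
      case (v, 2) => col3Codes(v)  // raw column 3 -> feature index 2
      case (v, 5) => col6Codes(v)  // raw column 6 -> feature index 5
      case (v, _) => v.toDouble
    }
    LabeledPoint(parts(0).toDouble, Vectors.dense(features))
  }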

> Date: Tue, 19 Aug 2014 16:45:35 -0700
> Subject: Re: Decision tree: categorical variables
> From: men...@gmail.com
> To: ssti...@live.com
> CC: user@spark.apache.org
> 
> The categorical features must be encoded into indices starting from 0:
> 0, 1, ..., numCategories - 1. Then you can provide the
> categoricalFeaturesInfo map to specify which columns contain
> categorical features and the number of categories in each. Joseph is
> updating the user guide. But if you want to try something now, you can
> take a look at the docs of DecisionTree.trainClassifier and
> trainRegressor:
> https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/tree/DecisionTree.scala#L360
> 
> -Xiangrui
> 
> On Tue, Aug 19, 2014 at 4:24 PM, Sameer Tilak <ssti...@live.com> wrote:
> > Hi All,
> >
> > Is there any example of the MLlib decision tree handling categorical variables?
> > My dataset includes a few categorical variables (20 out of 100 features), so I
> > was interested in knowing how I can use the current version of the decision
> > tree implementation to handle this situation. I looked at LabeledPoint and I
> > am not sure if that is the way to go.
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
> 
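
For reference, a minimal sketch of the trainClassifier call described
above, building on the parsing sketch earlier (the hyperparameter values
are placeholders, not from the thread):

  import org.apache.spark.mllib.tree.DecisionTree

  // Feature index -> number of categories, assuming the encoding sketched
  // earlier: feature 2 (A/B/C) and feature 5 (M/U/F) each have 3 values.
  val categoricalFeaturesInfo = Map(2 -> 3, 5 -> 3)

  // Binary label, Gini impurity, depth 5, 32 bins -- placeholder settings.
  val model = DecisionTree.trainClassifier(
    parsedData, 2, categoricalFeaturesInfo, "gini", 5, 32)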