Hi,
Could you please share how you resolved the parsing issue? It would be a
great help.
Thanks.
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Decision-tree-categorical-variables-tp12433p22943.html
Sent from the Apache Spark User List mailing list
Hi Keerthi,
As Xiangrui mentioned in the reply, categorical variables are assumed
to be encoded as integers between 0 and k - 1, where k is the number of
categories you pass for that feature in the category info map. So you will
need to handle this during parsing (your columns 3 and 6 need to be
converted into ints
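For illustration, here is a minimal sketch of that conversion in plain Python. It assumes comma-separated rows and reads "columns 3 and 6" as 1-based positions (zero-based 2 and 5). The sample rows and the helper names `build_category_index` and `encode_row` are made up for the example; they are not part of Spark.

```python
# Hypothetical sample rows; real data would come from your input file.
raw_rows = [
    "5.1,3.5,red,1.4,0.2,small,0",
    "4.9,3.0,blue,1.4,0.2,large,1",
    "6.2,2.9,red,4.3,1.3,medium,1",
]

# Assumption: "columns 3 and 6" are 1-based, i.e. zero-based positions 2 and 5.
CATEGORICAL_COLUMNS = [2, 5]


def build_category_index(rows, columns):
    """Map each distinct value in each categorical column to an index 0..k-1."""
    index = {}
    for col in columns:
        values = sorted({row.split(",")[col] for row in rows})
        index[col] = {value: i for i, value in enumerate(values)}
    return index


def encode_row(row, index):
    """Return the row as floats, with categorical values replaced by indices."""
    fields = row.split(",")
    return [
        float(index[col][value]) if col in index else float(value)
        for col, value in enumerate(fields)
    ]


category_index = build_category_index(raw_rows, CATEGORICAL_COLUMNS)
encoded_rows = [encode_row(row, category_index) for row in raw_rows]
```

Sorting the distinct values keeps the mapping stable between runs, so the same category always receives the same index.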
Was able to resolve the parsing issue. Thanks!
From: ssti...@live.com
To: user@spark.apache.org
Subject: FW: Decision tree: categorical variables
Date: Wed, 20 Aug 2014 12:48:10 -0700
From: ssti...@live.com
To: men...@gmail.com
Subject: RE: Decision tree: categorical variables
Date: Wed, 20
The categorical features must be encoded into indices starting from 0:
0, 1, ..., numCategories - 1. Then you can provide the
categoricalFeaturesInfo map to specify which columns contain
categorical features and the number of categories in each. Joseph is
updating the user guide. But if you want to
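The map described above could be built along these lines. This is a rough sketch in plain Python, not the official approach: the feature vectors, the choice of categorical positions, and the helper `build_categorical_info` are assumptions for illustration. In PySpark MLlib the corresponding argument of `DecisionTree.trainClassifier` is named `categoricalFeaturesInfo`; the call is shown only as a comment so the snippet runs without Spark.

```python
# Hypothetical, already-encoded feature vectors (labels kept separately).
features = [
    [5.1, 3.5, 1.0, 1.4, 0.2, 2.0],
    [4.9, 3.0, 0.0, 1.4, 0.2, 0.0],
    [6.2, 2.9, 1.0, 4.3, 1.3, 1.0],
]

# Assumption: features 2 and 5 (zero-based) are the categorical ones.
categorical_columns = [2, 5]


def build_categorical_info(vectors, columns):
    """Return {feature index: number of categories}, checking the encoding."""
    info = {}
    for col in columns:
        values = {vector[col] for vector in vectors}
        # Every categorical value must be a non-negative whole number.
        if any(v < 0 or v != int(v) for v in values):
            raise ValueError(f"feature {col} contains a non-index value")
        # Largest index + 1; assumes the highest category occurs in the data.
        info[col] = int(max(values)) + 1
    return info


categorical_features_info = build_categorical_info(features, categorical_columns)

# With PySpark MLlib available, the map would be passed roughly like this:
#   DecisionTree.trainClassifier(data, numClasses=2,
#                                categoricalFeaturesInfo=categorical_features_info)
```

Deriving the category count from the data only works if the highest index actually appears; if you already know the number of categories per feature, writing the map by hand is safer.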