Hi Mike,

just to make sure we are on the same page,

> I have mixed data types (continuous and categorical). Should I use 
> tree.DecisionTreeClassifier() or tree.DecisionTreeRegressor()?

that question is independent of the previous email. The comment 

> > "scikit-learn implementation does not support categorical variables for 
> > now". 

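that we discussed in the previous email refers to the feature variables. Whether 
you choose the DT regressor or classifier depends on your target variable: the 
classifier is for a categorical target (class labels), the regressor for a 
continuous one. With a Yes/No target like Attendance in your example, the 
classifier is the one to use.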
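
To make this concrete, here is a minimal sketch (not the only way to do it) that 
one-hot encodes the categorical feature columns and then fits a classifier on 
the categorical target. It assumes pandas is installed; the DataFrame is simply 
the toy table from your first email, and the variable names (df, encoder, clf) 
are only for illustration.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder
from sklearn.tree import DecisionTreeClassifier

# Toy data copied from the example table in the original question.
df = pd.DataFrame({
    "Gender": ["Male", "Female", "Male"],
    "Age": [30, 35, 50],
    "Income": [10000, 9000, 12000],
    "Car": ["BMW", "Toyota", "Audi"],
    "Attendance": ["Yes", "No", "Yes"],
})

X = df.drop(columns="Attendance")
y = df["Attendance"]  # categorical target (Yes/No) -> use the classifier

# One-hot encode the categorical feature columns; the numeric columns
# (Age, Income) are passed through unchanged.
encoder = ColumnTransformer(
    [("onehot", OneHotEncoder(handle_unknown="ignore"), ["Gender", "Car"])],
    remainder="passthrough",
)

clf = make_pipeline(encoder, DecisionTreeClassifier(random_state=0))
clf.fit(X, y)

# Predictions on the training rows, just to show the pipeline runs end to end.
pred = clf.predict(X)
print(pred)
```

If the target were a continuous quantity instead (say, predicting Income), you 
would keep the same encoding step and swap in tree.DecisionTreeRegressor().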

Best,
Sebastian

> On Sep 13, 2019, at 11:41 PM, C W <tmrs...@gmail.com> wrote:
> 
> Thanks, Sebastian. It's great to know that it works, just need to do 
> one-hot-encoding first.
> 
> I have mixed data types (continuous and categorical). Should I use 
> tree.DecisionTreeClassifier() or tree.DecisionTreeRegressor()?
> 
> I'm guessing tree.DecisionTreeClassifier()?
> 
> Best,
> 
> Mike
> 
> On Fri, Sep 13, 2019 at 11:59 PM Sebastian Raschka 
> <m...@sebastianraschka.com> wrote:
> Hi,
> 
> if you have the category "car" as shown in your example, this would 
> effectively be something like
> 
> BMW=0
> Toyota=1
> Audi=2
> 
> Sure, the algorithm will execute just fine on the feature column with values 
> in {0, 1, 2}. However, the problem is that it will come up with binary rules 
> like x_i >= 0.5 and x_i >= 1.5, i.e., it will treat it as a continuous 
> (ordered) variable, even though the car brands have no natural order. 
> 
> What you can do is encode this feature via one-hot encoding -- basically 
> expand it into 2 (or 3) binary variables. This has its own problems (if you 
> have a feature with many possible values, you will end up with a large number 
> of binary variables, and they may dominate in the resulting tree over other 
> feature variables).
> 
> In any case, I guess this is what 
> 
> > "scikit-learn implementation does not support categorical variables for 
> > now". 
> 
> 
> means ;).
> 
> Best,
> Sebastian
> 
> > On Sep 13, 2019, at 9:38 PM, C W <tmrs...@gmail.com> wrote:
> > 
> > Hello all,
> > I'm very confused. Can the decision tree module handle both continuous and 
> > categorical features in the dataset? In this case, it's just CART 
> > (Classification and Regression Trees).
> > 
> > For example,
> > Gender  Age  Income  Car     Attendance
> > Male    30   10000   BMW     Yes
> > Female  35    9000   Toyota  No
> > Male    50   12000   Audi    Yes
> > 
> > According to the documentation 
> > https://scikit-learn.org/stable/modules/tree.html#tree-algorithms-id3-c4-5-c5-0-and-cart,
> > it cannot!
> > 
> > It says: "scikit-learn implementation does not support categorical 
> > variables for now". 
> > 
> > Is this true? If not, can someone point me to an example? If yes, what do 
> > people do?
> > 
> > Thank you very much!
> > 
> > 
> > 
> > _______________________________________________
> > scikit-learn mailing list
> > scikit-learn@python.org
> > https://mail.python.org/mailman/listinfo/scikit-learn
> 
