Hi Mike,

just to make sure we are on the same page,

> I have mixed data types (continuous and categorical). Should I use
> tree.DecisionTreeClassifier() or tree.DecisionTreeRegressor()?

that's independent from the previous email. The comment

> > "scikit-learn implementation does not support categorical variables for
> > now".

we discussed via the previous email was referring to feature variables.
Whether you choose the DT regressor or classifier depends on the format of
your target variable: a classifier if the target is categorical (class
labels), a regressor if it is continuous.

Best,
Sebastian

> On Sep 13, 2019, at 11:41 PM, C W <tmrs...@gmail.com> wrote:
>
> Thanks, Sebastian. It's great to know that it works, just need to do
> one-hot-encoding first.
>
> I have mixed data types (continuous and categorical). Should I use
> tree.DecisionTreeClassifier() or tree.DecisionTreeRegressor()?
>
> I'm guessing tree.DecisionTreeClassifier()?
>
> Best,
>
> Mike
>
> On Fri, Sep 13, 2019 at 11:59 PM Sebastian Raschka
> <m...@sebastianraschka.com> wrote:
> Hi,
>
> if you have the category "car" as shown in your example, this would
> effectively be something like
>
> BMW=0
> Toyota=1
> Audi=2
>
> Sure, the algorithm will execute just fine on the feature column with values
> in {0, 1, 2}. However, the problem is that it will come up with binary
> threshold rules like x_i >= 0.5 and x_i >= 1.5. I.e., it will treat it as a
> continuous (ordered) variable, even though the integer codes are arbitrary.
>
> What you can do is to encode this feature via one-hot encoding -- basically
> extend it into 2 (or 3) binary variables. This has its own problems (if you
> have a feature with many possible values, you will end up with a large number
> of binary variables, and they may dominate in the resulting tree over other
> feature variables).
>
> In any case, I guess this is what
>
> > "scikit-learn implementation does not support categorical variables for
> > now".
>
> means ;).
>
> Best,
> Sebastian
>
> > On Sep 13, 2019, at 9:38 PM, C W <tmrs...@gmail.com> wrote:
> >
> > Hello all,
> > I'm very confused. Can the decision tree module handle both continuous and
> > categorical features in the dataset? In this case, it's just CART
> > (Classification and Regression Trees).
> >
> > For example,
> > Gender   Age   Income   Car      Attendance
> > Male     30    10000    BMW      Yes
> > Female   35    9000     Toyota   No
> > Male     50    12000    Audi     Yes
> >
> > According to the documentation
> > https://scikit-learn.org/stable/modules/tree.html#tree-algorithms-id3-c4-5-c5-0-and-cart,
> > it cannot!
> >
> > It says: "scikit-learn implementation does not support categorical
> > variables for now".
> >
> > Is this true? If not, can someone point me to an example? If yes, what do
> > people do?
> >
> > Thank you very much!

_______________________________________________
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn
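
For reference, the one-hot-encoding advice in this thread could look roughly like the sketch below. It was not posted in the thread and is only one way to do it. It assumes pandas is installed, reuses the toy table from the original question, treats Attendance as the target (a categorical target, hence the classifier), and uses ColumnTransformer with OneHotEncoder for the encoding. The variable names and random_state=0 are illustrative choices.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder
from sklearn.tree import DecisionTreeClassifier

# Toy data copied from the table in the original question.
df = pd.DataFrame({
    "Gender": ["Male", "Female", "Male"],
    "Age": [30, 35, 50],
    "Income": [10000, 9000, 12000],
    "Car": ["BMW", "Toyota", "Audi"],
    "Attendance": ["Yes", "No", "Yes"],
})

X = df.drop(columns="Attendance")
y = df["Attendance"]  # categorical target, so a classifier is used

# One-hot encode the categorical feature columns; the numeric columns
# (Age, Income) are passed through unchanged.
preprocess = ColumnTransformer(
    [("onehot", OneHotEncoder(handle_unknown="ignore"), ["Gender", "Car"])],
    remainder="passthrough",
)

# Chain the encoding and the tree so the same encoding is applied
# at fit time and at predict time.
model = make_pipeline(preprocess, DecisionTreeClassifier(random_state=0))
model.fit(X, y)

predictions = model.predict(X)
encoded_shape = preprocess.fit_transform(X).shape  # 2 + 3 one-hot columns + 2 numeric
```

If the target were a continuous quantity (say, predicting Income instead of Attendance), the same pipeline would apply with DecisionTreeRegressor swapped in for the classifier.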