Ahh, you are right. Regression vs. classification is about the type of the target variable, not the features.
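A minimal sketch of that point, assuming scikit-learn and NumPy are installed: the same kind of numeric feature matrix goes into `DecisionTreeClassifier` when the target is categorical (Attendance, Yes/No) and into `DecisionTreeRegressor` when the target is continuous. Using Income as a regression target is a hypothetical choice for illustration only; in the original example Income is a feature.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

# Numeric feature columns from the example table in the thread: Age, Income.
X = np.array([[30, 10000], [35, 9000], [50, 12000]])

# Categorical target (Attendance: "Yes"/"No") -> classification tree.
y_attendance = np.array(["Yes", "No", "Yes"])
clf = DecisionTreeClassifier(random_state=0).fit(X, y_attendance)

# Continuous target -> regression tree. Hypothetical setup for illustration:
# treat Income as the quantity to predict and use Age as the only feature.
X_age = X[:, [0]]
y_income = X[:, 1].astype(float)
reg = DecisionTreeRegressor(random_state=0).fit(X_age, y_income)

print(clf.predict(X))      # discrete class labels drawn from clf.classes_
print(reg.predict(X_age))  # real-valued predictions
```

The features are identical in kind in both cases; only the target decides which estimator fits.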
Thanks, clearer now.

Mike

On Sat, Sep 14, 2019 at 1:23 AM Sebastian Raschka <m...@sebastianraschka.com> wrote:
> Hi Mike,
>
> just to make sure we are on the same page,
>
> > I have mixed data types (continuous and categorical). Should I use tree.DecisionTreeClassifier() or tree.DecisionTreeRegressor()?
>
> that's independent of the previous email. The comment
>
> > "scikit-learn implementation does not support categorical variables for now".
>
> that we discussed in the previous email was referring to feature variables. Whether you choose the DT regressor or classifier depends on the format of your target variable.
>
> Best,
> Sebastian
>
> > On Sep 13, 2019, at 11:41 PM, C W <tmrs...@gmail.com> wrote:
> >
> > Thanks, Sebastian. It's great to know that it works, I just need to do one-hot encoding first.
> >
> > I have mixed data types (continuous and categorical). Should I use tree.DecisionTreeClassifier() or tree.DecisionTreeRegressor()?
> >
> > I'm guessing tree.DecisionTreeClassifier()?
> >
> > Best,
> >
> > Mike
> >
> > On Fri, Sep 13, 2019 at 11:59 PM Sebastian Raschka <m...@sebastianraschka.com> wrote:
> > Hi,
> >
> > if you have the category "car" as shown in your example, this would effectively be something like
> >
> > BMW=0
> > Toyota=1
> > Audi=2
> >
> > Sure, the algorithm will execute just fine on the feature column with values in {0, 1, 2}. However, the problem is that it will come up with binary rules like x_i >= 0.5 and x_i >= 1.5, i.e., it will treat it as a continuous (ordered) variable.
> >
> > What you can do is to encode this feature via one-hot encoding, basically extending it into 2 (or 3) binary variables. This has its own problems (if you have a feature with many possible values, you will end up with a large number of binary variables, and they may dominate in the resulting tree over other feature variables).
> >
> > In any case, I guess this is what
> >
> > > "scikit-learn implementation does not support categorical variables for now".
> >
> > means ;).
> >
> > Best,
> > Sebastian
> >
> > > On Sep 13, 2019, at 9:38 PM, C W <tmrs...@gmail.com> wrote:
> > >
> > > Hello all,
> > > I'm very confused. Can the decision tree module handle both continuous and categorical features in the dataset? In this case, it's just CART (Classification and Regression Trees).
> > >
> > > For example,
> > > Gender  Age  Income  Car     Attendance
> > > Male    30   10000   BMW     Yes
> > > Female  35   9000    Toyota  No
> > > Male    50   12000   Audi    Yes
> > >
> > > According to the documentation https://scikit-learn.org/stable/modules/tree.html#tree-algorithms-id3-c4-5-c5-0-and-cart, it cannot!
> > >
> > > It says: "scikit-learn implementation does not support categorical variables for now".
> > >
> > > Is this true? If not, can someone point me to an example? If yes, what do people do?
> > >
> > > Thank you very much!
> > >
> > > _______________________________________________
> > > scikit-learn mailing list
> > > scikit-learn@python.org
> > > https://mail.python.org/mailman/listinfo/scikit-learn
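A minimal sketch of the two encodings discussed in the thread, using the toy table from the original question and assuming scikit-learn and NumPy are installed. The integer codes (BMW=0, Toyota=1, Audi=2) come from the thread; the variable names and the use of `OneHotEncoder` plus `np.hstack` are illustrative choices, not the only way to do it. Part 1 fits a tree on the integer-coded Car column and reads the learned split thresholds from `tree_`, showing that the codes are treated as ordered numbers. Part 2 one-hot encodes Gender and Car and keeps Age and Income numeric.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder
from sklearn.tree import DecisionTreeClassifier

# Feature columns from the example table: Gender, Age, Income, Car.
X_raw = np.array([
    ["Male", 30, 10000, "BMW"],
    ["Female", 35, 9000, "Toyota"],
    ["Male", 50, 12000, "Audi"],
], dtype=object)
y = np.array(["Yes", "No", "Yes"])  # target: Attendance

# Part 1: integer-code Car as BMW=0, Toyota=1, Audi=2 (the coding used in the thread).
car_codes = {"BMW": 0, "Toyota": 1, "Audi": 2}
x_car = np.array([[car_codes[c]] for c in X_raw[:, 3]])
tree_int = DecisionTreeClassifier(random_state=0).fit(x_car, y)

# Split nodes have a non-negative feature index; leaves are marked with a negative value.
internal = tree_int.tree_.feature >= 0
thresholds = sorted(tree_int.tree_.threshold[internal].tolist())
print(thresholds)  # [0.5, 1.5]: the codes are split as if they were ordered numbers

# Part 2: one-hot encode Gender and Car, keep Age and Income as numeric columns.
ohe = OneHotEncoder(handle_unknown="ignore")
X_cat = ohe.fit_transform(X_raw[:, [0, 3]]).toarray()  # 2 Gender + 3 Car indicator columns
X_num = X_raw[:, [1, 2]].astype(float)
X_encoded = np.hstack([X_cat, X_num])
print(X_encoded.shape)  # (3, 7)

tree_ohe = DecisionTreeClassifier(random_state=0).fit(X_encoded, y)
print(tree_ohe.predict(X_encoded))
```

Because Toyota (code 1) sits between BMW and Audi, isolating the "No" row on the integer-coded column takes two threshold splits, whereas after one-hot encoding a single split on the Toyota indicator column would be enough. In practice the same preprocessing is usually wrapped in `sklearn.compose.ColumnTransformer` so it is applied identically at prediction time; `handle_unknown="ignore"` makes the encoder emit all-zero indicator columns for car brands not seen during fitting.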