I will just add that if you have heterogeneous types, you might want to look at the ColumnTransformer: https://scikit-learn.org/stable/auto_examples/compose/plot_column_transformer_mixed_types.html
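As a rough sketch of that idea (not taken from the linked example), the snippet below builds a ColumnTransformer that ordinal-encodes the categorical columns, passes the numeric columns through unchanged, and feeds the result to a decision tree. The toy data is the three-row table from the original question; the step name "cat", the use of DecisionTreeClassifier, and random_state=0 are assumptions made for illustration only.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OrdinalEncoder
from sklearn.tree import DecisionTreeClassifier

# Toy data copied from the table in the original question.
X = pd.DataFrame({
    "Gender": ["Male", "Female", "Male"],
    "Age": [30, 35, 50],
    "Income": [10000, 9000, 12000],
    "Car": ["BMW", "Toyota", "Audi"],
})
y = ["Yes", "No", "Yes"]  # the "Attendance" column, used as the target

# Ordinal-encode the categorical columns; numeric columns are passed
# through untouched because trees do not need scaling.
preprocessor = ColumnTransformer(
    transformers=[("cat", OrdinalEncoder(), ["Gender", "Car"])],
    remainder="passthrough",
)

# Chain the preprocessing and the tree so both are fitted together.
model = make_pipeline(preprocessor, DecisionTreeClassifier(random_state=0))
model.fit(X, y)
predictions = model.predict(X)
```

The target here is categorical (Yes/No), which is why a classifier is used; with a continuous target the same pipeline could wrap DecisionTreeRegressor instead.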
You might want to apply some scaling (not relevant for trees, though) and encode the categories (ordinal encoding for tree-based models), and then dispatch these data to a decision tree. The linked example shows how to construct such a preprocessor and pipeline it with a predictor.

On Sat, 14 Sep 2019 at 07:29, C W <tmrs...@gmail.com> wrote:

> Ahh, you are right. Regression vs. Classification is about the type of
> target variable, not features.
>
> Thanks, more clear now.
>
> Mike
>
> On Sat, Sep 14, 2019 at 1:23 AM Sebastian Raschka
> <m...@sebastianraschka.com> wrote:
>
>> Hi Mike,
>>
>> just to make sure we are on the same page,
>>
>> > I have mixed data type (continuous and categorical). Should I use
>> > tree.DecisionTreeClassifier() or tree.DecisionTreeRegressor()?
>>
>> that's independent from the previous email. The comment
>>
>> > > "scikit-learn implementation does not support categorical variables
>> > > for now".
>>
>> we discussed via the previous email was referring to feature variables.
>> Whether you choose the DT regressor or classifier depends on the format of
>> your target variable.
>>
>> Best,
>> Sebastian
>>
>> > On Sep 13, 2019, at 11:41 PM, C W <tmrs...@gmail.com> wrote:
>> >
>> > Thanks, Sebastian. It's great to know that it works, just need to do
>> > one-hot-encoding first.
>> >
>> > I have mixed data type (continuous and categorical). Should I use
>> > tree.DecisionTreeClassifier() or tree.DecisionTreeRegressor()?
>> >
>> > I'm guessing tree.DecisionTreeClassifier()?
>> >
>> > Best,
>> >
>> > Mike
>> >
>> > On Fri, Sep 13, 2019 at 11:59 PM Sebastian Raschka
>> > <m...@sebastianraschka.com> wrote:
>> > Hi,
>> >
>> > if you have the category "car" as shown in your example, this would
>> > effectively be something like
>> >
>> > BMW=0
>> > Toyota=1
>> > Audi=2
>> >
>> > Sure, the algorithm will execute just fine on the feature column with
>> > values in {0, 1, 2}. However, the problem is that it will come up with
>> > binary rules like x_i >= 0.5 and x_i >= 1.5. I.e., it will treat it as
>> > a continuous variable.
>> >
>> > What you can do is to encode this feature via one-hot encoding --
>> > basically extend it into 2 (or 3) binary variables. This has its own
>> > problems (if you have a feature with many possible values, you will
>> > end up with a large number of binary variables, and they may dominate
>> > in the resulting tree over other feature variables).
>> >
>> > In any case, I guess this is what
>> >
>> > > "scikit-learn implementation does not support categorical variables
>> > > for now".
>> >
>> > means ;).
>> >
>> > Best,
>> > Sebastian
>> >
>> > > On Sep 13, 2019, at 9:38 PM, C W <tmrs...@gmail.com> wrote:
>> > >
>> > > Hello all,
>> > > I'm very confused. Can the decision tree module handle both
>> > > continuous and categorical features in the dataset? In this case,
>> > > it's just CART (Classification and Regression Trees).
>> > >
>> > > For example,
>> > > Gender  Age  Income  Car     Attendance
>> > > Male    30   10000   BMW     Yes
>> > > Female  35   9000    Toyota  No
>> > > Male    50   12000   Audi    Yes
>> > >
>> > > According to the documentation
>> > > https://scikit-learn.org/stable/modules/tree.html#tree-algorithms-id3-c4-5-c5-0-and-cart,
>> > > it cannot!
>> > >
>> > > It says: "scikit-learn implementation does not support categorical
>> > > variables for now".
>> > >
>> > > Is this true? If not, can someone point me to an example? If yes,
>> > > what do people do?
>> > >
>> > > Thank you very much!
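Sebastian's point quoted above, that an integer-coded category gets split like a continuous variable while one-hot encoding gives the tree one binary column per category, can be checked with a small sketch. The integer codes follow the BMW=0, Toyota=1, Audi=2 mapping from the thread; the target values are hypothetical and were chosen so that only Toyota behaves differently.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder
from sklearn.tree import DecisionTreeClassifier

# Integer coding from the thread: BMW=0, Toyota=1, Audi=2.
car_codes = np.array([[0], [1], [2], [0], [1], [2]])
# Hypothetical target: only Toyota (code 1) belongs to class 0.
y = np.array([1, 0, 1, 1, 0, 1])

# Tree on the integer codes: the column is treated as a number, so
# isolating the middle category needs two threshold splits.
tree_int = DecisionTreeClassifier(random_state=0).fit(car_codes, y)
# Internal nodes have a feature index >= 0; leaves use a negative marker.
is_split_int = tree_int.tree_.feature >= 0
int_thresholds = sorted(tree_int.tree_.threshold[is_split_int])

# Tree on one-hot columns: a single split on the "is Toyota" column suffices.
onehot = OneHotEncoder().fit_transform(car_codes).toarray()
tree_oh = DecisionTreeClassifier(random_state=0).fit(onehot, y)
n_splits_oh = int((tree_oh.tree_.feature >= 0).sum())

print("thresholds on integer codes:", int_thresholds)
print("number of splits with one-hot columns:", n_splits_oh)
```

The thresholds fall halfway between adjacent codes, which only makes sense if the codes have a meaningful order; with the one-hot columns each category can be tested on its own.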
--
Guillaume Lemaitre
INRIA Saclay - Parietal team
Center for Data Science Paris-Saclay
https://glemaitre.github.io/
_______________________________________________
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn