I'm getting some funny results. I am fitting a regression decision tree, and the response variables are assigned to levels.
The funny part is: the tree is taking one-hot-encoding (BMW=0, Toyota=1, Audi=2) as numerical values, not categories. The tree splits at 0.5 and 1.5. Am I doing one-hot-encoding wrong? How does sklearn know internally that 0 vs. 1 is categorical, not numerical?

In R, for instance, you call as.factor(), which explicitly states the data type.

Thank you!

On Wed, Sep 18, 2019 at 11:13 AM Andreas Mueller <t3k...@gmail.com> wrote:

>
> On 9/15/19 8:16 AM, Guillaume Lemaître wrote:
>
> On Sat, 14 Sep 2019 at 20:59, C W <tmrs...@gmail.com> wrote:
>
>> Thanks, Guillaume.
>> Column transformer looks pretty neat. I've also heard, though, that this
>> pipeline can be tedious to set up? Specifying what you want for every
>> feature is a pain.
>>
>
> It would be interesting for us to know which part of the pipeline is
> tedious to set up, to see if we can improve something there.
> Do you mean that you would like to automatically detect the type of each
> feature (categorical/numerical) and apply a
> default encoder/scaling, as discussed there:
> https://github.com/scikit-learn/scikit-learn/issues/10603#issuecomment-401155127
>
> IMO, from a user perspective, it would be cleaner in some cases, at the cost
> of blindly applying a black box,
> which might be dangerous.
>
> Also see
> https://amueller.github.io/dabl/dev/generated/dabl.EasyPreprocessor.html#dabl.EasyPreprocessor
> Which basically does that.
>
>>
>> Javier,
>> Actually, you guessed right. My real data has only one numerical
>> variable, and looks more like this:
>>
>> Gender  Date       Income  Car     Attendance
>> Male    2019/3/01  10000   BMW     Yes
>> Female  2019/5/02  9000    Toyota  No
>> Male    2019/7/15  12000   Audi    Yes
>>
>> I am predicting income using all other categorical variables. Maybe it is
>> catboost!
>>
>> Thanks,
>>
>> M
>>
>> On Sat, Sep 14, 2019 at 9:25 AM Javier López <jlo...@ende.cc> wrote:
>>
>>> If you have datasets with many categorical features, and perhaps many
>>> categories, the tools in sklearn are quite limited,
>>> but there are alternative implementations of boosted trees that are
>>> designed with categorical features in mind. Take a look
>>> at catboost [1], which has an sklearn-compatible API.
>>>
>>> J
>>>
>>> [1] https://catboost.ai/
>>>
>>> On Sat, Sep 14, 2019 at 3:40 AM C W <tmrs...@gmail.com> wrote:
>>>
>>>> Hello all,
>>>> I'm very confused. Can the decision tree module handle both continuous
>>>> and categorical features in the dataset? In this case, it's just CART
>>>> (Classification and Regression Trees).
>>>>
>>>> For example,
>>>> Gender  Age  Income  Car     Attendance
>>>> Male    30   10000   BMW     Yes
>>>> Female  35   9000    Toyota  No
>>>> Male    50   12000   Audi    Yes
>>>>
>>>> According to the documentation
>>>> https://scikit-learn.org/stable/modules/tree.html#tree-algorithms-id3-c4-5-c5-0-and-cart,
>>>> it can not!
>>>>
>>>> It says: "scikit-learn implementation does not support categorical
>>>> variables for now".
>>>>
>>>> Is this true? If not, can someone point me to an example? If yes, what
>>>> do people do?
>>>>
>>>> Thank you very much!
>>>>
>>>> _______________________________________________
>>>> scikit-learn mailing list
>>>> scikit-learn@python.org
>>>> https://mail.python.org/mailman/listinfo/scikit-learn
>
> --
> Guillaume Lemaitre
> INRIA Saclay - Parietal team
> Center for Data Science Paris-Saclay
> https://glemaitre.github.io/
_______________________________________________ scikit-learn mailing list scikit-learn@python.org https://mail.python.org/mailman/listinfo/scikit-learn
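
On the 0.5 and 1.5 splits asked about at the top of this message: the sketch below is an illustration added for clarity, not code from the thread. It uses only the Car and Income values from the example table, and all variable and helper names are hypothetical. scikit-learn trees treat every input column as numeric, so a single column of integer codes (0, 1, 2) is split by thresholds between the codes, whereas OneHotEncoder produces one 0/1 column per category and a split at 0.5 on such a column amounts to asking "is this row that category?". Note that OrdinalEncoder sorts categories alphabetically, so its codes (Audi=0, BMW=1, Toyota=2) differ from the BMW=0, Toyota=1, Audi=2 mapping mentioned in the message.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder
from sklearn.tree import DecisionTreeRegressor, export_text

# Toy rows taken from the example table in this thread (Car -> Income).
cars = np.array([["BMW"], ["Toyota"], ["Audi"]], dtype=object)
income = np.array([10000, 9000, 12000])


def split_thresholds(fitted_tree):
    """Return the thresholds used at internal (non-leaf) nodes."""
    internal = fitted_tree.tree_.feature >= 0  # leaves are marked with a negative feature index
    return fitted_tree.tree_.threshold[internal].tolist()


# Integer codes: ONE ordered column. Categories are sorted alphabetically,
# so the codes are Audi=0, BMW=1, Toyota=2.
ordinal = OrdinalEncoder()
X_ord = ordinal.fit_transform(cars)
tree_ord = DecisionTreeRegressor(random_state=0).fit(X_ord, income)
ord_thresholds = split_thresholds(tree_ord)

# One-hot: one 0/1 indicator column per category (Audi, BMW, Toyota).
onehot = OneHotEncoder()
X_hot = onehot.fit_transform(cars).toarray()
tree_hot = DecisionTreeRegressor(random_state=0).fit(X_hot, income)
hot_thresholds = split_thresholds(tree_hot)

print("integer-code thresholds:", sorted(ord_thresholds))
print("one-hot thresholds:     ", sorted(hot_thresholds))

# Human-readable view of the one-hot tree; the column labels are built by hand.
hot_names = [f"Car={c}" for c in onehot.categories_[0]]
print(export_text(tree_hot, feature_names=hot_names))
```

With several columns of mixed types, the ColumnTransformer mentioned earlier in the thread can apply OneHotEncoder to the categorical columns only and pass the numerical ones through unchanged.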