On 9/15/19 8:16 AM, Guillaume Lemaître wrote:


On Sat, 14 Sep 2019 at 20:59, C W <tmrs...@gmail.com <mailto:tmrs...@gmail.com>> wrote:

    Thanks, Guillaume.
    ColumnTransformer looks pretty neat. I've also heard, though, that this
    pipeline can be tedious to set up: specifying what you want for
    every feature is a pain.


It would be interesting for us to know which part of the pipeline is tedious to set up, so we can see whether we can improve something there. Do you mean that you would like to automatically detect the type of each feature (categorical/numerical) and apply a default encoder/scaler, as discussed here: https://github.com/scikit-learn/scikit-learn/issues/10603#issuecomment-401155127

IMO, from a user's perspective, it would be cleaner in some cases, at the cost
of blindly applying a black box, which might be dangerous.
Also see https://amueller.github.io/dabl/dev/generated/dabl.EasyPreprocessor.html#dabl.EasyPreprocessor,
which basically does that.
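
For what it's worth, here is a minimal sketch of such a pipeline, detecting
column types from the pandas dtypes instead of listing every feature by hand
(the toy data and the choice of estimator are mine, purely for illustration):

import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Toy data mimicking the example further down in the thread.
df = pd.DataFrame({
    "Gender": ["Male", "Female", "Male"],
    "Car": ["BMW", "Toyota", "Audi"],
    "Age": [30, 35, 50],
    "Income": [10000, 9000, 12000],
})
X = df.drop(columns="Income")
y = df["Income"]

# Detect feature types from the dtypes rather than naming each column.
categorical = X.select_dtypes(include="object").columns
numerical = X.select_dtypes(include="number").columns

preprocess = ColumnTransformer([
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
    ("num", StandardScaler(), numerical),
])

model = Pipeline([
    ("preprocess", preprocess),
    ("regressor", RandomForestRegressor(n_estimators=100)),
])
model.fit(X, y)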



    Javier,
    Actually, you guessed right. My real data has only one numerical
    variable and looks more like this:

    Gender  Date       Income  Car     Attendance
    Male    2019/3/01  10000   BMW     Yes
    Female  2019/5/02  9000    Toyota  No
    Male    2019/7/15  12000   Audi    Yes

    I am predicting income using all other categorical variables.
    Maybe catboost is the answer!

    Thanks,

    M






    On Sat, Sep 14, 2019 at 9:25 AM Javier López <jlo...@ende.cc> wrote:

        If you have datasets with many categorical features, and
        perhaps many categories, the tools in sklearn are quite limited,
        but there are alternative implementations of boosted trees
        that are designed with categorical features in mind. Take a look
        at catboost [1], which has an sklearn-compatible API.

        J

        [1] https://catboost.ai/
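
        As a rough sketch of that, assuming the catboost package is
        installed (the column names mirror the toy example in this
        thread and the settings are only illustrative):

        import pandas as pd
        from catboost import CatBoostRegressor

        # Toy data with only categorical predictors, as in the thread's example.
        df = pd.DataFrame({
            "Gender": ["Male", "Female", "Male"],
            "Car": ["BMW", "Toyota", "Audi"],
            "Attendance": ["Yes", "No", "Yes"],
            "Income": [10000, 9000, 12000],
        })
        X = df.drop(columns="Income")
        y = df["Income"]

        # cat_features tells catboost which columns to treat as categorical,
        # so no manual one-hot encoding is needed.
        model = CatBoostRegressor(iterations=100, verbose=False)
        model.fit(X, y, cat_features=["Gender", "Car", "Attendance"])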

        On Sat, Sep 14, 2019 at 3:40 AM C W <tmrs...@gmail.com
        <mailto:tmrs...@gmail.com>> wrote:

            Hello all,
            I'm very confused. Can the decision tree module handle
            both continuous and categorical features in the dataset?
            In this case, it's just CART (Classification and
            Regression Trees).

            For example,
            Gender  Age  Income  Car     Attendance
            Male    30   10000   BMW     Yes
            Female  35   9000    Toyota  No
            Male    50   12000   Audi    Yes

            According to the documentation at
            https://scikit-learn.org/stable/modules/tree.html#tree-algorithms-id3-c4-5-c5-0-and-cart,
            it cannot!

            It says: "scikit-learn implementation does not support
            categorical variables for now".

            Is this true? If not, can someone point me to an example?
            If yes, what do people do?

            Thank you very much!
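
            For reference, the usual workaround is to encode the
            categorical columns before fitting the tree. A minimal sketch
            under that assumption (the toy data above is reused and the
            encoder/model choices are only illustrative):

            import pandas as pd
            from sklearn.compose import ColumnTransformer
            from sklearn.pipeline import Pipeline
            from sklearn.preprocessing import OrdinalEncoder
            from sklearn.tree import DecisionTreeClassifier

            df = pd.DataFrame({
                "Gender": ["Male", "Female", "Male"],
                "Age": [30, 35, 50],
                "Income": [10000, 9000, 12000],
                "Car": ["BMW", "Toyota", "Audi"],
                "Attendance": ["Yes", "No", "Yes"],
            })
            X = df.drop(columns="Attendance")
            y = df["Attendance"]

            # The tree only accepts numbers, so categorical columns are
            # ordinal-encoded; numeric columns pass through unchanged.
            preprocess = ColumnTransformer(
                [("cat", OrdinalEncoder(), ["Gender", "Car"])],
                remainder="passthrough",
            )
            clf = Pipeline([
                ("preprocess", preprocess),
                ("tree", DecisionTreeClassifier()),
            ])
            clf.fit(X, y)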





--
Guillaume Lemaitre
INRIA Saclay - Parietal team
Center for Data Science Paris-Saclay
https://glemaitre.github.io/

_______________________________________________
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn
