On Oct 4, 2019, at 11:48 AM, C W <tmrs...@gmail.com> wrote:
I'm getting some funny results. I am doing a regression decision
tree, the response variables are assigned to levels.
The funny part is: the tree is treating my one-hot encoding (BMW=0,
Toyota=1, Audi=2) as numerical values, not as categories.
The tree splits at 0.5 and 1.5. Am I doing one-hot encoding
wrong? How does sklearn know internally whether 0 vs. 1 is
categorical or numerical?
In R, for instance, you call as.factor(), which explicitly states
the data type.
Thank you!
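For reference, the behaviour described above can be reproduced with a minimal sketch. Note that a single column holding BMW=0, Toyota=1, Audi=2 is an integer (ordinal) encoding rather than a one-hot encoding; one-hot encoding produces one 0/1 column per category. The toy data and the helper name `split_thresholds` below are made up for illustration and are not from the thread.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder
from sklearn.tree import DecisionTreeRegressor

# Toy data (made up for illustration): car brand -> income.
cars = np.array([["BMW"], ["Toyota"], ["Audi"]] * 2)
income = np.array([10000.0, 9000.0, 12000.0] * 2)


def split_thresholds(fitted_tree):
    """Return the sorted thresholds of all internal (non-leaf) nodes."""
    t = fitted_tree.tree_
    internal = t.children_left != -1  # leaves have children_left == -1
    return sorted(float(x) for x in t.threshold[internal])


# 1) Integer codes in ONE column: the tree sees an ordinary number,
#    so it splits between consecutive codes.
code_of = {"BMW": 0, "Toyota": 1, "Audi": 2}
codes = np.array([[code_of[c]] for c in cars[:, 0]], dtype=float)
tree_codes = DecisionTreeRegressor(random_state=0).fit(codes, income)
code_thresholds = split_thresholds(tree_codes)
print(code_thresholds)  # [0.5, 1.5]

# 2) One-hot encoding: one 0/1 column per category, so every split
#    is an "is this category or not" test at 0.5.
encoder = OneHotEncoder()
onehot = encoder.fit_transform(cars).toarray()
tree_onehot = DecisionTreeRegressor(random_state=0).fit(onehot, income)
onehot_thresholds = split_thresholds(tree_onehot)
print(encoder.categories_[0])  # ['Audi' 'BMW' 'Toyota']
print(onehot_thresholds)       # [0.5, 0.5]
```

In both cases the tree only ever sees numbers; sklearn's trees have no equivalent of R's as.factor(), so the encoding step is what tells the model how to treat the column.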
On Wed, Sep 18, 2019 at 11:13 AM Andreas Mueller <t3k...@gmail.com> wrote:
On 9/15/19 8:16 AM, Guillaume Lemaître wrote:
On Sat, 14 Sep 2019 at 20:59, C W <tmrs...@gmail.com> wrote:
Thanks, Guillaume.
ColumnTransformer looks pretty neat. I've also heard,
though, that this pipeline can be tedious to set up?
Specifying what you want for every feature is a pain.
It would be interesting for us to know which part of the pipeline is
tedious to set up, so we can see whether we can improve something there.
Do you mean that you would like to automatically detect the
type of each feature (categorical/numerical) and apply a
default encoder/scaler, as discussed there:
https://github.com/scikit-learn/scikit-learn/issues/10603#issuecomment-401155127
IMO, from a user perspective, it would be cleaner in some
cases, at the cost of blindly applying a black box,
which might be dangerous.
Also see
https://amueller.github.io/dabl/dev/generated/dabl.EasyPreprocessor.html#dabl.EasyPreprocessor
which basically does that.
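As a rough illustration of the manual ColumnTransformer setup discussed above, here is a minimal sketch using the three example rows posted at the start of this thread. The choice to one-hot encode Gender, Car and Attendance and pass Age through unchanged is an assumption made for the sketch, not a recommendation from anyone in the thread.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder
from sklearn.tree import DecisionTreeRegressor

# The three example rows from the original question.
df = pd.DataFrame({
    "Gender": ["Male", "Female", "Male"],
    "Age": [30, 35, 50],
    "Income": [10000, 9000, 12000],
    "Car": ["BMW", "Toyota", "Audi"],
    "Attendance": ["Yes", "No", "Yes"],
})

# Assumption for this sketch: these columns are categorical, Age is numerical.
categorical = ["Gender", "Car", "Attendance"]
X = df.drop(columns="Income")
y = df["Income"]

# One-hot encode the categorical columns; pass the rest (Age) through as-is.
preprocess = ColumnTransformer(
    transformers=[("cat", OneHotEncoder(handle_unknown="ignore"), categorical)],
    remainder="passthrough",
)

model = Pipeline([
    ("preprocess", preprocess),
    ("tree", DecisionTreeRegressor(random_state=0)),
])
model.fit(X, y)

# An unpruned tree reproduces its (distinct) training rows exactly.
predictions = model.predict(X)
print(predictions)
```

The tedious part is the explicit `categorical` list: it has to be written out per dataset, which is the step that automatic type detection would replace.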
Javier,
Actually, you guessed right. My real data has only one
numerical variable, looks more like this:
Gender  Date       Income  Car     Attendance
Male    2019/3/01  10000   BMW     Yes
Female  2019/5/02  9000    Toyota  No
Male    2019/7/15  12000   Audi    Yes
I am predicting income using all other categorical
variables. Maybe it is catboost!
Thanks,
M
On Sat, Sep 14, 2019 at 9:25 AM Javier López <jlo...@ende.cc> wrote:
If you have datasets with many categorical features,
and perhaps many categories, the tools in sklearn
are quite limited,
but there are alternative implementations of boosted
trees that are designed with categorical features in
mind. Take a look
at catboost [1], which has an sklearn-compatible API.
J
[1] https://catboost.ai/
On Sat, Sep 14, 2019 at 3:40 AM C W <tmrs...@gmail.com> wrote:
Hello all,
I'm very confused. Can the decision tree module
handle both continuous and categorical features
in the dataset? In this case, it's just CART
(Classification and Regression Trees).
For example,
Gender  Age  Income  Car     Attendance
Male    30   10000   BMW     Yes
Female  35   9000    Toyota  No
Male    50   12000   Audi    Yes
According to the documentation
https://scikit-learn.org/stable/modules/tree.html#tree-algorithms-id3-c4-5-c5-0-and-cart,
it cannot!
It says: "scikit-learn implementation does not
support categorical variables for now".
Is this true? If not, can someone point me to an
example? If yes, what do people do?
Thank you very much!
_______________________________________________
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn
--
Guillaume Lemaitre
INRIA Saclay - Parietal team
Center for Data Science Paris-Saclay
https://glemaitre.github.io/