2013/2/27 David Montgomery <[email protected]>:
> Ok....now I am really confused on how to interpret the tree.
>
> So...I am trying to build a Prob est tree. All of the independent variables
> are categorical and created dummies. What is throwing me off are the <=.
>
> I should have a rule that says e.g. if city=LA,NY and TIME=Noon then .20.
>
> In the chart I see city=Dubai<=.500 What does that mean? What I am trying
> to see is a chart that I would usually see in SPSS answer tree or SAS etc.
>
> So..how do I interpret the city=Dubai<=.500?
This is a decision node that tests whether the dummy column "city=Dubai"
is <= 0.5. That column is a boolean variable that can only have 0 or 1
as value, so the test is True exactly when "city=Dubai" == 0, i.e. when
the city is not Dubai.
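To make that concrete, here is a small sketch in plain Python (no
scikit-learn needed). The function name and the two sample rows are made
up for illustration; only the "value <= 0.5" test comes from the chart.

```python
# Hypothetical illustration of how a split "city=Dubai <= 0.5" routes samples.
# The dummy column only holds 0 or 1, so 0.5 simply separates the two values.

def takes_true_branch(dummy_value, threshold=0.5):
    """Return True when the sample satisfies the node test (value <= threshold)."""
    return dummy_value <= threshold

# Made-up rows: the original category next to its 0/1 dummy encoding.
rows = [
    {"city": "Dubai", "city=Dubai": 1},
    {"city": "LA", "city=Dubai": 0},
]

for row in rows:
    if takes_true_branch(row["city=Dubai"]):
        print(row["city"], "-> test is True, city is not Dubai")
    else:
        print(row["city"], "-> test is False, city is Dubai")
```

So in SQL terms the True branch would read as city <> 'Dubai' and the
False branch as city = 'Dubai'.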
> My aim is to get a node id and to create sql rules to extract data.
>
> Unless I am wrong, it appears that the dtree algo is not designed to extract
> rules and even assign a rule to a node id. Dtrees in scikits are solely for
> prediction. Is this a fair statement?
Yes. In general a single decision tree is a weak predictive model anyway.
It's better to ensemble them as RandomForests / ExtraTrees, GBRT, or
AdaBoost. But then extracting rules is even less straightforward,
although probably doable if you are ready to dive into the code and
understand how the ensemble models work internally.
> I will be taking the *.dot file not to graph but to somehow parse the file
> so I can create my rules.
You can also have a look at the source code for the
tree.export_graphviz function and write a similar export_sql(tree)
function on your own.
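A rough sketch of what such a function could look like, under some
assumptions: export_sql is a hypothetical name, not part of
scikit-learn, and it takes the tree as plain parallel arrays. As far as
I know a fitted tree exposes arrays of this shape as
clf.tree_.children_left, clf.tree_.children_right, clf.tree_.feature and
clf.tree_.threshold, with -1 marking a leaf, but please check that
against the version you have installed. The toy tree and the
double-quoted column names are made up for the example.

```python
def export_sql(children_left, children_right, feature, threshold, feature_names):
    """Return {leaf_node_id: SQL condition} for a binary tree stored as parallel arrays.

    Assumption: a child index of -1 marks a leaf, and node 0 is the root.
    """
    TREE_LEAF = -1
    rules = {}
    stack = [(0, [])]  # (node id, conditions collected on the path from the root)
    while stack:
        node_id, conditions = stack.pop()
        if children_left[node_id] == TREE_LEAF:
            # Leaf reached: the path conditions form the rule for this node id.
            rules[node_id] = " AND ".join(conditions) if conditions else "1=1"
            continue
        column = '"%s"' % feature_names[feature[node_id]]
        value = threshold[node_id]
        # The left child holds samples passing the test, the right child the rest.
        stack.append((children_right[node_id], conditions + ["%s > %s" % (column, value)]))
        stack.append((children_left[node_id], conditions + ["%s <= %s" % (column, value)]))
    return rules


# Made-up toy tree: node 0 splits on "city=Dubai", node 1 on "time=Noon",
# and nodes 2, 3 and 4 are leaves.
children_left = [1, 2, -1, -1, -1]
children_right = [4, 3, -1, -1, -1]
feature = [0, 1, -2, -2, -2]
threshold = [0.5, 0.5, -2.0, -2.0, -2.0]
feature_names = ["city=Dubai", "time=Noon"]

rules = export_sql(children_left, children_right, feature, threshold, feature_names)
for node_id in sorted(rules):
    print("node %d: WHERE %s" % (node_id, rules[node_id]))
```

Each leaf id then maps to one WHERE clause. To attach the estimated
probability to a rule you would also need the per-leaf class counts
(clf.tree_.value, if I remember correctly), which this sketch leaves out.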
--
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel
_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general