2013/2/27 David Montgomery <[email protected]>:
> OK... now I am really confused about how to interpret the tree.
>
> So... I am trying to build a probability-estimation tree. All of the
> independent variables are categorical, and I created dummies. What is
> throwing me off are the <=.
>
> I should have a rule that says e.g. if city=LA,NY and TIME=Noon then .20.
>
> In the chart I see city=Dubai<=.500. What does that mean?
city=Dubai <= 0.5 means that if the indicator variable city=Dubai is
smaller than 0.5 (i.e. if city=Dubai is 0), then examples get routed down
the left child; otherwise they get routed down the right child.

> What I am trying to see is a chart that I would usually see in SPSS
> Answer Tree or SAS etc.

Since both SPSS and SAS are proprietary, I have no clue what those look like.

> So... how do I interpret the city=Dubai<=.500?

The split node basically asks: is the city feature not Dubai? If so, go
down the left branch, else the right. In order to generate rules from
decision trees you have to look at a whole path (from root to leaf).
Currently, there is no way of extracting rules from decision trees - you
have to write your own code that analyzes the tree structure.

> My aim is to get a node id and to create SQL rules to extract data.
>
> Unless I am wrong, it appears that the dtree algo is not designed to
> extract rules or even assign a rule to a node id. Dtrees in scikits are
> solely for prediction. Is this a fair statement?

Correct, scikit-learn is mostly a machine learning library; in fact,
AFAIK you were the first user to request such a feature.

> I will be taking the *.dot file not to graph but to somehow parse the
> file so I can create my rules.

You would be better off operating on the
DecisionTreeRegressor/Classifier.tree_ object. It represents the binary
decision tree as a number of parallel arrays; you can find the
documentation/code here:
https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/tree/_tree.pyx#L38

best,
 Peter

> Thanks
>
> On Wed, Feb 27, 2013 at 11:57 PM, Peter Prettenhofer
> <[email protected]> wrote:
>>
>> Looks good to me - save the output to a file (e.g. foobar.dot) and run
>> the following command:
>>
>>     $ dot -Tpdf foobar.dot -o foobar.pdf
>>
>> When I open the PDF all labels are correctly displayed - remember that
>> they are now indicator features - so the thresholds are usually
>> "country=AU <= 0.5".
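[Editor's sketch] The "write your own code that analyzes the tree structure" step Peter describes could look like the following. `extract_rules` is a hypothetical helper name (not a scikit-learn API); it assumes the `children_left`, `children_right`, `feature`, `threshold`, and `value` parallel arrays of the fitted `tree_` object, with `children_left[node] == -1` marking a leaf:

```python
# Sketch: turn a fitted decision tree into one human-readable rule per
# leaf by walking the parallel arrays of clf.tree_.
from sklearn.tree import DecisionTreeRegressor


def extract_rules(clf, feature_names):
    """Return a list of (conditions, leaf_value) pairs, one per leaf."""
    t = clf.tree_
    rules = []

    def recurse(node, conditions):
        if t.children_left[node] == -1:
            # Leaf: record the full root-to-leaf path and the leaf's value.
            rules.append((conditions, float(t.value[node].ravel()[0])))
            return
        name = feature_names[t.feature[node]]
        thr = t.threshold[node]
        # Left child holds samples with feature <= threshold, right child the rest.
        recurse(t.children_left[node], conditions + ["%s <= %.3f" % (name, thr)])
        recurse(t.children_right[node], conditions + ["%s > %.3f" % (name, thr)])

    recurse(0, [])
    return rules


# Toy usage with a single indicator feature, mirroring the thread's example:
clf = DecisionTreeRegressor().fit([[0], [0], [1], [1]], [0, 0, 1, 1])
for conditions, value in extract_rules(clf, ["country=AU"]):
    print(" AND ".join(conditions), "->", value)
```

Each printed line is one rule; rewriting "country=AU <= 0.500" into SQL such as "country <> 'AU'" is then a string-rewriting exercise over the conditions.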
>>
>> You can find more information here:
>> http://scikit-learn.org/dev/modules/tree.html#classification
>>
>> 2013/2/27 David Montgomery <[email protected]>:
>> > Thanks, I used DictVectorizer().
>> >
>> > I am now trying to add labels to the tree graph. Below are the labels
>> > and the digraph Tree. However, I don't see labels on the tree nodes.
>> > Did I not use feature names correctly?
>> >
>> >     measurements = [
>> >         {'country': 'US', 'city': 'Dubai'},
>> >         {'country': 'US', 'city': 'London'},
>> >         {'country': 'US', 'city': 'San Fransisco'},
>> >         {'country': 'US', 'city': 'Dubai'},
>> >         {'country': 'AU', 'city': 'Mel'},
>> >         {'country': 'AU', 'city': 'Sydney'},
>> >         {'country': 'AU', 'city': 'Mel'},
>> >         {'country': 'AU', 'city': 'Sydney'},
>> >         {'country': 'AU', 'city': 'Mel'},
>> >         {'country': 'AU', 'city': 'Sydney'},
>> >     ]
>> >     y = [0, 0, 0, 1, 1, 1, 1, 1, 1, 1]
>> >
>> >     vec = DictVectorizer()
>> >     X = vec.fit_transform(measurements)
>> >     feature_name = vec.get_feature_names()
>> >     clf = tree.DecisionTreeRegressor()
>> >     clf = clf.fit(X.todense(), y)
>> >     with open("au.dot", 'w') as f:
>> >         f = tree.export_graphviz(clf, out_file=f, feature_names=feature_name)
>> >
>> >     feature_name = ['city=Dubai', 'city=London', 'city=Mel',
>> >                     'city=San Fransisco', 'city=Sydney',
>> >                     'country=AU', 'country=US']
>> >
>> >     digraph Tree {
>> >     0 [label="country=AU <= 0.5000\nerror = 2.1\nsamples = 10\nvalue = [ 0.7]", shape="box"] ;
>> >     1 [label="city=Dubai <= 0.5000\nerror = 0.75\nsamples = 4\nvalue = [ 0.25]", shape="box"] ;
>> >     0 -> 1 ;
>> >     2 [label="error = 0.0000\nsamples = 2\nvalue = [ 0.]", shape="box"] ;
>> >     1 -> 2 ;
>> >     3 [label="error = 0.5000\nsamples = 2\nvalue = [ 0.5]", shape="box"] ;
>> >     1 -> 3 ;
>> >     4 [label="error = 0.0000\nsamples = 6\nvalue = [ 1.]", shape="box"] ;
>> >     0 -> 4 ;
>> >     }
>> >
>> > On Wed, Feb 27, 2013 at 9:50 PM, Peter Prettenhofer
>> > <[email protected]> wrote:
>> >>
>> >> Hi David,
>> >>
>> >> I recommend that you load the data using Pandas (``pandas.read_csv``).
>> >> Scikit-learn does not support categorical features out of the box; you
>> >> need to encode them as dummy variables (aka one-hot encoding) - you
>> >> can do this either using ``sklearn.preprocessing.DictVectorizer`` or
>> >> via ``pandas.get_dummies``.
>> >>
>> >> HTH,
>> >> Peter
>> >>
>> >> 2013/2/27 David Montgomery <[email protected]>:
>> >> > Hi,
>> >> >
>> >> > I have a data structure that looks like this:
>> >> >
>> >> >     1 NewYork 1 6 high
>> >> >     0 LA      3 4 low
>> >> >     .......
>> >> >
>> >> > I am trying to predict probability where Y is column one. All of the
>> >> > attributes of the X are categorical, and I will use a dtree
>> >> > regression. How do I load this data into the y and X?
>> >> >
>> >> > Thanks
>> >> >
>> >> > ------------------------------------------------------------------------------
>> >> > Everyone hates slow websites. So do we.
>> >> > Make your web apps faster with AppDynamics
>> >> > Download AppDynamics Lite for free today:
>> >> > http://p.sf.net/sfu/appdyn_d2d_feb
>> >> > _______________________________________________
>> >> > Scikit-learn-general mailing list
>> >> > [email protected]
>> >> > https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>> >>
>> >> --
>> >> Peter Prettenhofer
>> >
>>
>> --
>> Peter Prettenhofer
>

--
Peter Prettenhofer
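[Editor's sketch] The loading pipeline Peter recommends earlier in the thread (``pandas.read_csv`` plus one-hot encoding via ``pandas.get_dummies``) could look like the following. The column names (``y``, ``city``, ``a``, ``b``, ``level``) are assumptions invented to match David's sample rows:

```python
# Sketch of the suggested loading pipeline: read the CSV with pandas,
# then one-hot encode the categorical columns with get_dummies.
# Column names here are hypothetical, matching David's sample rows.
from io import StringIO

import pandas as pd

csv = StringIO(
    "y,city,a,b,level\n"
    "1,NewYork,1,6,high\n"
    "0,LA,3,4,low\n"
)
df = pd.read_csv(csv)

y = df["y"]
# get_dummies expands each categorical column into 0/1 indicator columns
# named like "city_NewYork" - the same idea as DictVectorizer's "city=NewYork".
X = pd.get_dummies(df[["city", "a", "b", "level"]])
print(sorted(X.columns))
```

``y`` and ``X`` can then be passed straight to ``DecisionTreeRegressor.fit``, and the indicator column names double as the ``feature_names`` for ``export_graphviz``.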
