Thanks, I used DictVectorizer().

I am now trying to add labels to the tree graph. Below are the feature names
and the digraph Tree. However, I don't see labels on the tree nodes. Did I
not use feature_names correctly?




from sklearn import tree
from sklearn.feature_extraction import DictVectorizer

measurements = [
{'country':'US','city': 'Dubai'},
{'country':'US','city': 'London'},
{'country':'US','city': 'San Fransisco'},
{'country':'US','city': 'Dubai'},
{'country':'AU','city': 'Mel'},
{'country':'AU','city': 'Sydney'},
{'country':'AU','city': 'Mel'},
{'country':'AU','city': 'Sydney'},
{'country':'AU','city': 'Mel'},
{'country':'AU','city': 'Sydney'},
]
y = [0,0,0,1,1,1,1,1,1,1]


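vec = DictVectorizer()
X = vec.fit_transform(measurements)
# get_feature_names_out() replaced get_feature_names() in scikit-learn 1.0;
# older releases only provide vec.get_feature_names().
feature_name = list(vec.get_feature_names_out())
clf = tree.DecisionTreeRegressor()
# toarray() gives a plain ndarray rather than the np.matrix from todense().
clf = clf.fit(X.toarray(), y)
with open("au.dot", "w") as f:
    # export_graphviz writes into f; its return value need not replace f.
    tree.export_graphviz(clf, out_file=f, feature_names=feature_name)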
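Before going through Graphviz, one way to check whether feature_names is being picked up is to print the fitted tree as text. This is a sketch that assumes a scikit-learn version providing tree.export_text and the DictVectorizer.feature_names_ attribute; it rebuilds the same toy data so it runs on its own.

```python
from sklearn import tree
from sklearn.feature_extraction import DictVectorizer

measurements = [
    {'country': 'US', 'city': 'Dubai'},
    {'country': 'US', 'city': 'London'},
    {'country': 'US', 'city': 'San Fransisco'},
    {'country': 'US', 'city': 'Dubai'},
    {'country': 'AU', 'city': 'Mel'},
    {'country': 'AU', 'city': 'Sydney'},
    {'country': 'AU', 'city': 'Mel'},
    {'country': 'AU', 'city': 'Sydney'},
    {'country': 'AU', 'city': 'Mel'},
    {'country': 'AU', 'city': 'Sydney'},
]
y = [0, 0, 0, 1, 1, 1, 1, 1, 1, 1]

vec = DictVectorizer()
X = vec.fit_transform(measurements).toarray()
feature_name = list(vec.feature_names_)

clf = tree.DecisionTreeRegressor(random_state=0).fit(X, y)

# Feature names are printed only for split (internal) nodes;
# leaf lines show the predicted value and no feature test.
text = tree.export_text(clf, feature_names=feature_name)
print(text)
```

If the names were not passed through, the split lines would read feature_0, feature_1, and so on instead of country=AU or city=Dubai.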


feature_name = ['city=Dubai', 'city=London', 'city=Mel', 'city=San Fransisco',
                'city=Sydney', 'country=AU', 'country=US']
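These names are the one-hot columns DictVectorizer builds: one column per distinct key=value string, sorted alphabetically by default, so column j of X is 1 exactly when a row has that value. A small sketch that inspects the fitted vectorizer through its feature_names_ and vocabulary_ attributes and lists the active columns for each row:

```python
from sklearn.feature_extraction import DictVectorizer

measurements = [
    {'country': 'US', 'city': 'Dubai'},
    {'country': 'US', 'city': 'London'},
    {'country': 'US', 'city': 'San Fransisco'},
    {'country': 'US', 'city': 'Dubai'},
    {'country': 'AU', 'city': 'Mel'},
    {'country': 'AU', 'city': 'Sydney'},
    {'country': 'AU', 'city': 'Mel'},
    {'country': 'AU', 'city': 'Sydney'},
    {'country': 'AU', 'city': 'Mel'},
    {'country': 'AU', 'city': 'Sydney'},
]

# sparse=False returns a dense NumPy array instead of a SciPy sparse matrix.
vec = DictVectorizer(sparse=False)
X = vec.fit_transform(measurements)

# feature_names_ lists the "key=value" columns in order (sorted by default);
# vocabulary_ maps each of those names to its column index in X.
print(vec.feature_names_)
print(vec.vocabulary_)

# Each row has one active column for city and one for country.
active_per_row = [
    [vec.feature_names_[j] for j, v in enumerate(row) if v == 1]
    for row in X
]
for active in active_per_row:
    print(active)
```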

digraph Tree {
0 [label="country=AU <= 0.5000\nerror = 2.1\nsamples = 10\nvalue = [ 0.7]", shape="box"] ;
1 [label="city=Dubai <= 0.5000\nerror = 0.75\nsamples = 4\nvalue = [ 0.25]", shape="box"] ;
0 -> 1 ;
2 [label="error = 0.0000\nsamples = 2\nvalue = [ 0.]", shape="box"] ;
1 -> 2 ;
3 [label="error = 0.5000\nsamples = 2\nvalue = [ 0.5]", shape="box"] ;
1 -> 3 ;
4 [label="error = 0.0000\nsamples = 6\nvalue = [ 1.]", shape="box"] ;
0 -> 4 ;
}
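The numbers in these labels can be reproduced by hand, which also shows why only nodes 0 and 1 carry a feature name: they are the split nodes, while nodes 2 to 4 are leaves with no test to name. This sketch assumes "value" is the mean target in a node and "error" is the sum of squared deviations from that mean (that reading matches every node above); the row masks for each node are read off the tree by hand.

```python
import numpy as np

y = np.array([0, 0, 0, 1, 1, 1, 1, 1, 1, 1])
country_us = np.array([True] * 4 + [False] * 6)                 # rows 0-3 are US
city_dubai = np.array([True, False, False, True] + [False] * 6)  # rows 0 and 3


def node_stats(mask):
    """Return (samples, mean value, sum of squared errors) for the rows in mask."""
    values = y[mask]
    mean = values.mean()
    return len(values), mean, ((values - mean) ** 2).sum()


nodes = {
    0: np.ones(len(y), dtype=bool),   # root: all 10 rows
    1: country_us,                     # country=AU <= 0.5 is true for US rows
    2: country_us & ~city_dubai,       # city=Dubai <= 0.5: London, San Fransisco
    3: country_us & city_dubai,        # the two Dubai rows
    4: ~country_us,                    # AU rows
}
for node_id, mask in nodes.items():
    n, mean, sse = node_stats(mask)
    print(f"node {node_id}: samples = {n}, value = {mean:.2f}, error = {sse:.2f}")
```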




On Wed, Feb 27, 2013 at 9:50 PM, Peter Prettenhofer <[email protected]> wrote:

> Hi David,
>
> I recommend that you load the data using Pandas (``pandas.read_csv``).
> Scikit-learn does not support categorical features out-of-the-box; you
> need to encode them as dummy variables (aka one-hot encoding) - you
> can do this either using ``sklearn.preprocessing.DictVectorizer`` or
> via ``pandas.get_dummies`` .
>
> HTH,
>  Peter
>
> 2013/2/27 David Montgomery <[email protected]>:
> > Hi,
> >
> > I have a data structure that looks like this:
> >
> > 1 NewYork 1 6 high
> > 0 LA 3 4 low
> > .......
> >
> > I am trying to predict probability where Y is column one. All of the
> > attributes of X are categorical and I will use a decision tree regressor.
> > How do I load this data into y and X?
> >
> > Thanks
> >
> >
> ------------------------------------------------------------------------------
> > Everyone hates slow websites. So do we.
> > Make your web apps faster with AppDynamics
> > Download AppDynamics Lite for free today:
> > http://p.sf.net/sfu/appdyn_d2d_feb
> > _______________________________________________
> > Scikit-learn-general mailing list
> > [email protected]
> > https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
> >
>
>
>
> --
> Peter Prettenhofer
>
>
>
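Peter's suggestion, reading the file with pandas.read_csv and one-hot encoding it with pandas.get_dummies, might look like the sketch below for the whitespace-separated sample in the original question. The column names (city, attr2, attr3, level) are hypothetical, io.StringIO stands in for the real file, and the two numeric-looking columns are cast to strings so that get_dummies treats them as categories, as the question describes.

```python
import io

import pandas as pd
from sklearn.tree import DecisionTreeRegressor

# Stand-in for the real file; the two rows come from the question.
raw = io.StringIO("1 NewYork 1 6 high\n0 LA 3 4 low\n")

# Hypothetical column names; column one is the target y.
df = pd.read_csv(raw, sep=r"\s+", header=None,
                 names=["y", "city", "attr2", "attr3", "level"])

y = df["y"]
# Cast every feature to str so get_dummies one-hot encodes all of them,
# including the numeric-looking attr2 and attr3 columns.
X = pd.get_dummies(df[["city", "attr2", "attr3", "level"]].astype(str), dtype=int)

clf = DecisionTreeRegressor(random_state=0).fit(X, y)
print(list(X.columns))
print(clf.predict(X))
```

Using DictVectorizer instead would mean converting each row to a dict first (for example with df.to_dict("records")), which is the route taken in the code at the top of this message.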