[Scikit-learn-general] clf.fit freezes on small dataset in scikit-learn

Josh Wasserstein Wed, 03 Jul 2013 11:38:11 -0700

This is odd. I can successfully run the example `grid_search_digits.py`.
However, I am unable to do a grid search on my own data.


I have the following setup:
===============
    import sklearn
    from sklearn.svm import SVC
    from sklearn.grid_search import GridSearchCV
    from sklearn.cross_validation import LeaveOneOut, StratifiedKFold
    from sklearn.metrics import auc_score

    # ... Build X and y ....

    tuned_parameters = [{'kernel': ['rbf'], 'gamma': [1e-3, 1e-4],
                         'C': [1, 10, 100, 1000]},
                        {'kernel': ['linear'], 'C': [1, 10, 100, 1000]}]

    loo = LeaveOneOut(len(y))
    clf = GridSearchCV(SVC(C=1), tuned_parameters, score_func=auc_score)
    clf.fit(X, y, cv=loo)
    # ...
    print(clf.best_estimator_)
    # ...
===============
But I never get past the `clf.fit` call (I left it running for about an hour).
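One way to narrow down which parameter setting stalls is to fit each candidate on its own and time it. The sketch below is written against the current scikit-learn API (not 0.13.1) and is not the code run in this report. It assumes randomly generated stand-in data shaped like the X and y described in this post (27 x 26, with 5 positives), and it caps libsvm with `max_iter` so a fit that does not converge returns with a `ConvergenceWarning` instead of running indefinitely. The helper name `time_candidates` and the cap value are hypothetical.

```python
import time
import warnings

import numpy as np
from sklearn.exceptions import ConvergenceWarning
from sklearn.model_selection import ParameterGrid
from sklearn.svm import SVC

# Same candidate grid as in the post.
tuned_parameters = [{'kernel': ['rbf'], 'gamma': [1e-3, 1e-4],
                     'C': [1, 10, 100, 1000]},
                    {'kernel': ['linear'], 'C': [1, 10, 100, 1000]}]


def time_candidates(X, y, param_grid, max_iter=100000):
    """Hypothetical helper: fit one SVC per candidate and record its timing.

    max_iter caps the libsvm solver, so a candidate that fails to converge
    stops early (and emits a ConvergenceWarning) instead of hanging.
    """
    results = []
    for params in ParameterGrid(param_grid):
        with warnings.catch_warnings(record=True) as caught:
            warnings.simplefilter("always", ConvergenceWarning)
            start = time.perf_counter()
            SVC(max_iter=max_iter, **params).fit(X, y)
            elapsed = time.perf_counter() - start
        converged = not any(issubclass(w.category, ConvergenceWarning)
                            for w in caught)
        results.append((params, elapsed, converged))
    return results


# Stand-in data (assumption): 27 samples, 26 features on very different
# scales, 22 negatives followed by 5 positives.
rng = np.random.RandomState(0)
X = rng.randn(27, 26) * np.logspace(0, 5, 26)
y = np.array([0] * 22 + [1] * 5)

for params, elapsed, converged in time_candidates(X, y, tuned_parameters):
    print(params, "%.3fs" % elapsed,
          "converged" if converged else "hit max_iter")
```

A candidate reported as "hit max_iter" is one that would likely keep the uncapped solver busy for a long time.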

I have also tried

    clf.fit(X, y, cv=10)

and with

    skf = StratifiedKFold(y,2)
    clf.fit(X, y, cv=skf)

and hit the same problem (the `clf.fit` call never returns). My data is
simple:

    > X.shape
    (27,26)

    > y.shape
    (27,)

    > y.dtype
    dtype('int64')


    > ?y
    Type:       ndarray
    String Form:[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1]
    Length:     27
    File:       /home/jacob04/opt/python/numpy/numpy-1.7.1/lib/python2.7/site-packages/numpy/__init__.py
    Docstring:  <no docstring>
    Class Docstring:
    ndarray(shape, dtype=float, buffer=None, offset=0,
            strides=None, order=None)
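The y shown above has only 5 positives out of 27 samples, which matters for both cross-validation schemes tried. With LeaveOneOut every test fold holds a single sample, so a ranking metric such as ROC AUC, which needs both classes in the test fold, cannot be computed per fold. With 10 folds, at most 5 test folds can contain a positive. A minimal check, written against the current `sklearn.model_selection` API rather than 0.13.1, using a y rebuilt from the string form above and placeholder features (an assumption, since only y matters here):

```python
import numpy as np
from sklearn.model_selection import LeaveOneOut, StratifiedKFold

# Labels rebuilt from the String Form above: 22 zeros, then 5 ones.
y = np.array([0] * 22 + [1] * 5)
X = np.zeros((len(y), 26))  # placeholder features; only y matters here

print("class counts:", np.bincount(y))  # number of samples per class

# LeaveOneOut: every test fold contains exactly one sample, hence one class.
loo_test_sizes = [len(test) for _, test in LeaveOneOut().split(X)]
print("LOO folds:", len(loo_test_sizes),
      "test size per fold:", set(loo_test_sizes))

# 10 folds with only 5 positives: count test folds with no positive at all.
# StratifiedKFold emits a warning here because the minority class has
# fewer members than n_splits.
skf = StratifiedKFold(n_splits=10)
folds_without_positive = sum(1 for _, test in skf.split(X, y)
                             if y[test].sum() == 0)
print("10-fold test sets with no positive:", folds_without_positive)
```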

    > ?X
    Type:       ndarray
    String Form:
           [[ -3.61238468e+03  -3.61253920e+03  -3.61290196e+03  -3.61326679e+03
               7.84590361e+02   0.0000 <...> 0000e+00   2.22389150e+00   2.53252959e+00
               2.11606216e+00  -1.99613432e+05  -1.99564828e+05]]
    Length:     27
    File:       /home/jacob04/opt/python/numpy/numpy-1.7.1/lib/python2.7/site-packages/numpy/__init__.py
    Docstring:  <no docstring>
    Class Docstring:
    ndarray(shape, dtype=float, buffer=None, offset=0,
            strides=None, order=None)
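The X printout above mixes values around 1e0 with values near -3.6e3 and -2e5. Whether this spread is what slows libsvm down here is an assumption, not a confirmed diagnosis, but SVMs are generally sensitive to feature scale. A quick check of per-feature spread before and after standardizing, using randomly generated stand-in data with a similar spread because the real X is not available:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Stand-in data (assumption): 27 x 26 with column scales from 1e0 to 1e5,
# loosely imitating the magnitudes visible in the X printout.
rng = np.random.RandomState(0)
X = rng.randn(27, 26) * np.logspace(0, 5, 26)

std_before = X.std(axis=0)
print("std before scaling: min %.3g, max %.3g"
      % (std_before.min(), std_before.max()))

# Standardize each column to zero mean and unit variance.
X_scaled = StandardScaler().fit_transform(X)
std_after = X_scaled.std(axis=0)
print("std after scaling:  min %.3g, max %.3g"
      % (std_after.min(), std_after.max()))
```

In an actual grid search the scaler should be fitted on each training fold only (for example inside a `Pipeline`), so the test folds do not leak into the scaling statistics.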

This is all on scikit-learn 0.13.1 (the latest release), with the following packages installed:

    $ pip freeze
    Cython==0.19.1
    PIL==1.1.7
    PyXB==1.2.2
    PyYAML==3.10
    argparse==1.2.1
    distribute==0.6.34
    epc==0.0.5
    ipython==0.13.2
    jedi==0.6.0
    matplotlib==1.3.x
    nltk==2.0.4
    nose==1.3.0
    numexpr==2.1
    numpy==1.7.1
    pandas==0.11.0
    pyparsing==1.5.7
    python-dateutil==2.1
    pytz==2013b
    rpy2==2.3.1
    scikit-learn==0.13.1
    scipy==0.12.0
    sexpdata==0.0.3
    six==1.3.0
    stemming==1.0.1
    -e git+https://github.com/PyTables/PyTables.git@df7b20444b0737cf34686b5d88b4e674ec85575b#egg=tables-dev
    tornado==3.0.1
    wsgiref==0.1.2
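Version note: scikit-learn 0.13.1 predates several API changes. Later releases moved `GridSearchCV`, `LeaveOneOut` and `StratifiedKFold` from `sklearn.grid_search` and `sklearn.cross_validation` into `sklearn.model_selection`, accept the splitter only through the `cv` constructor argument, and use `scoring` (e.g. `"roc_auc"`) instead of `score_func`. The sketch below is the same grid search written against a current release, not the code actually run in this report. It assumes stand-in random data of the same shape, wraps `SVC` in a `Pipeline` with `StandardScaler` (an addition, not part of the original setup), and uses 5 stratified folds so each test fold contains one of the 5 positives:

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Stand-in data (assumption): 27 x 26, 22 negatives and 5 positives.
rng = np.random.RandomState(0)
X = rng.randn(27, 26) * np.logspace(0, 5, 26)
y = np.array([0] * 22 + [1] * 5)

# Scaling lives inside the pipeline, so each training fold is standardized
# with statistics computed from that fold only.
pipe = Pipeline([("scale", StandardScaler()), ("svc", SVC())])

# Same grid as in the post; step names prefix each parameter.
param_grid = [{"svc__kernel": ["rbf"], "svc__gamma": [1e-3, 1e-4],
               "svc__C": [1, 10, 100, 1000]},
              {"svc__kernel": ["linear"], "svc__C": [1, 10, 100, 1000]}]

# cv is a constructor argument; 5 stratified folds put one positive in
# each test fold, so ROC AUC is defined for every fold.
search = GridSearchCV(pipe, param_grid, scoring="roc_auc",
                      cv=StratifiedKFold(n_splits=5))
search.fit(X, y)
print(search.best_params_)
print(search.best_estimator_)
```

`cv=LeaveOneOut()` is still accepted by `GridSearchCV`, but not in combination with a per-fold ROC AUC score, for the reason given above for single-sample test folds.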

Thanks,

Jacob

PS: This thread is based on the following Stack Overflow post:
http://stackoverflow.com/questions/17455302/clf-fit-freezes-on-small-dataset-in-scikit-learn

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
