I am not aware of a definitive, complete solution. Lars has built an
NLTK-compatible classifier interface in nltk.classify.scikitlearn, while
scikit-learn provides components in sklearn.feature_extraction that
either handle text directly or let you readily produce arrays from
feature dicts.
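
As a rough sketch of how those pieces fit together (the toy documents,
labels, and the word_counts helper are invented for illustration; the
choice of MultinomialNB is just an assumption, any scikit-learn
classifier that accepts sparse input should work similarly):

```python
from nltk.classify.scikitlearn import SklearnClassifier
from sklearn.feature_extraction import DictVectorizer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

# Toy corpus, made up purely for this example.
docs = [
    "the cat sat on the mat",
    "dogs chase cats",
    "stocks fell sharply today",
    "markets rallied on earnings",
]
labels = ["pets", "pets", "finance", "finance"]

# (1) sklearn.feature_extraction handling text directly:
#     raw strings to a sparse TF-IDF matrix over unigrams and bigrams.
tfidf = TfidfVectorizer(ngram_range=(1, 2))
X_text = tfidf.fit_transform(docs)


# (2) Producing arrays from NLTK-style feature dicts.
#     word_counts is a hypothetical helper, not part of either library.
def word_counts(doc):
    counts = {}
    for token in doc.split():
        counts[token] = counts.get(token, 0) + 1
    return counts


feature_dicts = [word_counts(d) for d in docs]
dict_vec = DictVectorizer()
X_dict = dict_vec.fit_transform(feature_dicts)

# (3) The NLTK-compatible wrapper: train on (feature dict, label) pairs
#     and classify a new feature dict through the NLTK interface.
clf = SklearnClassifier(MultinomialNB())
clf.train(list(zip(feature_dicts, labels)))
predicted = clf.classify(word_counts("the cat chased the dogs"))
print(predicted)
```

The wrapper does its own dict-to-array conversion internally, so step
(2) is only needed if you want the matrix yourself, e.g. to hand it to
pandas or to a scikit-learn estimator directly.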



NLTK has some limited provisions for feature extraction, but it doesn't
directly target feature extraction or provide a consistent interface for
it. Similar to scikit-learn, it


On Sun, Jul 7, 2013 at 2:53 AM, Tom Fawcett <[email protected]> wrote:

> Hi.  I’m trying to figure out a good general framework for working with
> text (classification and clustering).  There is an odd intersection of
> Python packages and no clear way to integrate them optimally:
>
> - NLTK seems like the best at handling natural language.
> - sklearn has the strongest components of learning and evaluation.
> - Pandas is very good for data storage, transformation, and visualization.
>
> Each can do a little of what the others can do, and some integrations
> exist (pandas and sklearn both use numpy arrays so they’re pretty
> compatible), but it seems like there’s no clear, good way to integrate
> them.  It’s very common to want to go from raw text to stemming and
> n-grams, term frequencies, and finally to TFIDF matrices for learning.  But
> from my searching, people either stay in one package or write ad hoc glue
> code to transform the data.
>
> My question: Is there any interface package, or best practices
> documentation, for using them together to do large-scale text processing?
>  I can write my own glue code if I have to, but I’d rather not reinvent the
> wheel.
>
> Thanks,
> -Tom
>
>
>
> ------------------------------------------------------------------------------
> This SF.net email is sponsored by Windows:
>
> Build for Windows Store.
>
> http://p.sf.net/sfu/windows-dev2dev
> _______________________________________________
> Scikit-learn-general mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>
