2013/7/6 Joel Nothman <[email protected]>:
> Sorry, that sent prematurely.
>
>
> On Sun, Jul 7, 2013 at 6:58 AM, Joel Nothman <[email protected]>
> wrote:
>>
>> I am not aware of a definitive, complete solution. Lars has built an
>> NLTK-compatible classifier interface in nltk.classify.scikitlearn, while
>> scikit-learn provides the various components in sklearn.feature_extraction
>> that handle text directly, or would allow you to readily produce arrays from
>> feature dicts.
>
>
> I don't think there's any clear, generic way for them to interface better:
> both systems prefer to interface with native types (dicts, numpy arrays)
> rather than sophisticated framework components. (But I'm also not convinced
> that NLTK is the right tool for a lot of large-scale feature extraction
> jobs.)
>
> I also don't know what data you want to analyse in Pandas: the feature data?
> the classification results?
>
> In each of these packages' attempts to remain singular in their purpose and
> therefore independent, you only really get occasional blog posts and PyCon
> tutorials from the likes of Olivier that tie them together. Frustratingly,
> something like
> http://www.slideshare.net/ogrisel/statistical-machine-learning-for-text-classification-with-scikitlearn-and-nltk
> is rapidly outdated.
>
> I think it would be in scikit-learn's best interests to provide up-to-date
> examples of both these interactions, although it means maintaining an
> examples package with more external dependencies.

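
For the "arrays from feature dicts" route mentioned in the quoted message,
here is a minimal sketch using sklearn.feature_extraction.DictVectorizer. The
feature names and values (last_letter, length) are made up for illustration
and only mimic the kind of dict an NLTK feature extractor might return.

```python
from sklearn.feature_extraction import DictVectorizer

# Hypothetical NLTK-style feature dicts, one per sample (illustrative only).
feature_dicts = [
    {"last_letter": "a", "length": 5},
    {"last_letter": "k", "length": 4},
]

# String values are one-hot encoded as "name=value" columns;
# numeric values are passed through as a single column.
vectorizer = DictVectorizer(sparse=False)
X = vectorizer.fit_transform(feature_dicts)

print(vectorizer.feature_names_)
print(X)
```

Going the other way, nltk.classify.scikitlearn wraps a scikit-learn estimator
so that it can be trained on such dicts from the NLTK side.
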
The documentation on feature extraction is up to date and quite complete, and
it has an example snippet showing how to use NLTK for text pre-processing
(lemmatization and tokenization):

http://scikit-learn.org/dev/modules/feature_extraction.html#text-feature-extraction

However, this is just for text classification / clustering.

--
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
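
As a rough sketch of the kind of NLTK pre-processing hook described above,
the following passes a custom callable as the tokenizer of a CountVectorizer.
It is not the snippet from the documentation: to avoid downloading NLTK
corpora it swaps in a regex tokenizer and NLTK's PorterStemmer in place of
word_tokenize and a lemmatizer. The StemTokenizer class and the sample
documents are hypothetical.

```python
import re

from nltk.stem import PorterStemmer
from sklearn.feature_extraction.text import CountVectorizer


class StemTokenizer:
    """Callable tokenizer: regex word split, then NLTK Porter stemming."""

    def __init__(self):
        self.stemmer = PorterStemmer()
        self.word_re = re.compile(r"[A-Za-z]+")

    def __call__(self, doc):
        return [self.stemmer.stem(tok) for tok in self.word_re.findall(doc)]


# Illustrative documents only.
docs = ["The cats are running", "A cat runs"]

# token_pattern=None because the custom tokenizer replaces the default regex.
vectorizer = CountVectorizer(tokenizer=StemTokenizer(), token_pattern=None)
X = vectorizer.fit_transform(docs)

print(sorted(vectorizer.vocabulary_))
```

With stemming in place, "cats"/"cat" and "running"/"runs" should collapse onto
shared columns, which is the main reason to plug NLTK in at this point.
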
