2013/7/6 Joel Nothman <[email protected]>:
> Sorry, that sent prematurely.
>
>
> On Sun, Jul 7, 2013 at 6:58 AM, Joel Nothman <[email protected]>
> wrote:
>>
>> I am not aware of a definitive, complete solution. Lars has built an
>> NLTK-compatible classifier interface in nltk.classify.scikitlearn, while
>> scikit-learn provides the various components in sklearn.feature_extraction
>> that handle text directly, or would allow you to readily produce arrays from
>> feature dicts.
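For reference, here is a minimal sketch of the NLTK-side wrapper mentioned above. It assumes both NLTK and scikit-learn are installed. `SklearnClassifier` wraps a scikit-learn estimator so that it can be trained on NLTK-style `(featureset, label)` pairs. The choice of `BernoulliNB` and the toy `contains(...)` features are illustrative only.

```python
# Assumes NLTK and scikit-learn are installed; the toy training data is made up.
from nltk.classify.scikitlearn import SklearnClassifier
from sklearn.naive_bayes import BernoulliNB  # any scikit-learn classifier would do

# NLTK-style training data: (feature dict, label) pairs.
train = [
    ({"contains(good)": True, "contains(plot)": True}, "pos"),
    ({"contains(great)": True}, "pos"),
    ({"contains(bad)": True, "contains(plot)": True}, "neg"),
    ({"contains(awful)": True}, "neg"),
]

# The wrapper turns the dicts into a matrix internally (with a DictVectorizer),
# encodes the labels, and exposes NLTK's train / classify / classify_many API.
classifier = SklearnClassifier(BernoulliNB()).train(train)

print(classifier.classify({"contains(good)": True}))  # prints: pos
predictions = classifier.classify_many(
    [{"contains(bad)": True}, {"contains(great)": True}]
)
print(list(predictions))
```

Features never seen during training are silently ignored at classification time, because the internal vectorizer only knows the training vocabulary.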
>
>
> I don't think there's any clear, generic way for them to interface better:
> both systems prefer to interface with native types (dicts, numpy arrays)
> rather than sophisticated framework components. (But I'm also not convinced
> that NLTK is the right tool for a lot of large-scale feature extraction
> jobs.)
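The "native types" hand-off can be made concrete with scikit-learn's `DictVectorizer`, which turns plain feature dicts (the kind an NLTK feature extractor returns) into a NumPy array. This is a minimal sketch that assumes scikit-learn is installed. The `last_letter`/`length` features are made up for illustration.

```python
# Assumes scikit-learn (and therefore NumPy) is installed; the feature dicts are made up.
from sklearn.feature_extraction import DictVectorizer

# Feature dicts of the kind an NLTK feature extractor might return.
featuresets = [
    {"last_letter": "a", "length": 4},
    {"last_letter": "k", "length": 4},
    {"last_letter": "a", "length": 6},
]

# sparse=False returns a dense numpy.ndarray instead of a scipy.sparse matrix.
vec = DictVectorizer(sparse=False)
X = vec.fit_transform(featuresets)

# String values become one-hot "name=value" columns; numeric values pass through.
columns = sorted(vec.vocabulary_, key=vec.vocabulary_.get)
print(columns)  # ['last_letter=a', 'last_letter=k', 'length']
print(X)

# At transform time, values unseen during fit ("z" here) are dropped.
X_new = vec.transform([{"last_letter": "z", "length": 5}])
print(X_new)
```

The resulting array can go straight into any scikit-learn estimator, or into a pandas `DataFrame` using `columns` as the column names.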
>
> I also don't know what data you want to analyse in Pandas: the feature data?
> the classification results?
>
> Because each of these packages tries to remain singular in purpose, and
> therefore independent, all that really ties them together is the occasional
> blog post or PyCon tutorial from the likes of Olivier. Frustratingly,
> something like
> http://www.slideshare.net/ogrisel/statistical-machine-learning-for-text-classification-with-scikitlearn-and-nltk
> is rapidly outdated.
>
> I think it would be in scikit-learn's best interests to provide up-to-date
> examples of both these interactions, although it means maintaining an
> examples package with more external dependencies.

The documentation on feature extraction is up to date, quite complete,
and has an example snippet that uses NLTK for text pre-processing
(lemmatization and tokenization):

http://scikit-learn.org/dev/modules/feature_extraction.html#text-feature-extraction

However, this only covers text classification / clustering.
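The pattern in those docs is to pass a callable as the `tokenizer` parameter of a scikit-learn vectorizer. The following is a minimal sketch that assumes NLTK and scikit-learn are installed. It swaps the WordNet lemmatizer used in the docs for NLTK's `PorterStemmer`, which does stemming rather than lemmatization, because the stemmer needs no extra NLTK data downloads. `StemTokenizer` is a hypothetical name made up for this sketch.

```python
# Assumes NLTK and scikit-learn are installed. PorterStemmer is used instead of
# a WordNet lemmatizer because it needs no downloaded NLTK data.
from nltk.stem import PorterStemmer
from sklearn.feature_extraction.text import CountVectorizer


class StemTokenizer:
    """Hypothetical callable tokenizer: sklearn's default regex split + NLTK stemming."""

    def __init__(self):
        self.stemmer = PorterStemmer()
        # Reuse scikit-learn's default token pattern (words of 2+ characters).
        self.split = CountVectorizer().build_tokenizer()

    def __call__(self, doc):
        return [self.stemmer.stem(tok) for tok in self.split(doc)]


docs = ["The cats were running", "A cat runs"]

# token_pattern=None avoids the warning that the pattern is unused when a
# custom tokenizer is given. Lowercasing still happens before tokenization.
vectorizer = CountVectorizer(tokenizer=StemTokenizer(), token_pattern=None)
X = vectorizer.fit_transform(docs)

print(sorted(vectorizer.vocabulary_))
```

Because "cats"/"cat" and "running"/"runs" are stemmed to the same tokens, each pair shares a single column in `X`.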

--
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general