2012/5/25 Philipp Singer <[email protected]>:
> Hey!
>
> Is it possible to easily add stemming to text feature extraction in
> scikit-learn?
>
> I know that NLTK has an implementation of the Porter stemmer, but if
> possible I do not want to move my whole text feature extraction process
> to NLTK. It would be nice if I could plug the stemmer in easily.

Since version 0.11 you can pass custom preprocessor, tokenizer and/or
analyzer callables to the CountVectorizer and TfidfVectorizer classes:

http://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.CountVectorizer.html
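For example, a minimal sketch of the tokenizer route, assuming a recent scikit-learn release (argument handling in 0.11 may differ) and NLTK's PorterStemmer. The regex below mimics CountVectorizer's default token_pattern, and the name stemming_tokenizer is just an illustration:

```python
import re

from nltk.stem import PorterStemmer
from sklearn.feature_extraction.text import CountVectorizer

stemmer = PorterStemmer()
# Same pattern as CountVectorizer's default token_pattern: runs of 2+ word chars
token_re = re.compile(r"(?u)\b\w\w+\b")


def stemming_tokenizer(text):
    """Split text into word tokens and reduce each token to its Porter stem."""
    return [stemmer.stem(token) for token in token_re.findall(text)]


# The default preprocessor still lowercases the text before our tokenizer runs.
# token_pattern=None avoids a warning that the pattern is ignored.
vectorizer = CountVectorizer(tokenizer=stemming_tokenizer, token_pattern=None)
X = vectorizer.fit_transform(["The runner was running", "She runs every day"])
print(sorted(vectorizer.vocabulary_))
```

Here "running" and "runs" both end up counted under the single feature "run".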

Alternatively, you can subclass one of those classes and override the
build_preprocessor, build_tokenizer and/or build_analyzer methods to
customize those steps as needed, calling the NLTK stemmer from there.
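A rough sketch of the subclassing route, again assuming a recent scikit-learn and NLTK (the class name StemmedTfidfVectorizer is hypothetical). It wraps the default analyzer so each token is stemmed after lowercasing, tokenizing and stop-word removal. This assumes the default unigram setting; with ngram_range > (1, 1) you would override build_tokenizer instead, so stemming happens before n-grams are built:

```python
from nltk.stem import PorterStemmer
from sklearn.feature_extraction.text import TfidfVectorizer


class StemmedTfidfVectorizer(TfidfVectorizer):
    """TfidfVectorizer that Porter-stems every token from the default analyzer."""

    def build_analyzer(self):
        # Default pipeline: preprocess (lowercase), tokenize, drop stop words
        analyzer = super().build_analyzer()
        stemmer = PorterStemmer()
        # Stem each token the default analyzer emits
        return lambda doc: [stemmer.stem(token) for token in analyzer(doc)]


stemmed = StemmedTfidfVectorizer()
X_tfidf = stemmed.fit_transform(["The runner was running", "She runs every day"])
print(sorted(stemmed.vocabulary_))
```

Because only build_analyzer is overridden, all the usual constructor parameters (stop_words, min_df, and so on) keep working unchanged.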

-- 
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general