[Scikit-learn-general] CountVectorizer needs too much input?

Doug Coleman Fri, 23 Nov 2012 12:50:25 -0800

Hi,

I just want some n-grams--I don't necessarily want to tell CountVectorizer
my life story. It's pretty stingy about giving n-grams unless you pass it a
ton of data or something.


Am I using it wrong? Are there kwargs that I missed that would support this
kind of use case?

Thanks,
Doug


In [225]: cv = CountVectorizer(analyzer='char', stop_words=None, ngram_range=(1,5))

In [226]: cv.fit(['Gimme n-grams!'])
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-226-ccd2238d644b> in <module>()
----> 1 cv.fit(['Gimme n-grams!'])

/usr/local/lib/python2.7/site-packages/sklearn/feature_extraction/text.pyc in fit(self, raw_documents, y)
    430         self
    431         """
--> 432         self.fit_transform(raw_documents)
    433         return self
    434

/usr/local/lib/python2.7/site-packages/sklearn/feature_extraction/text.pyc in fit_transform(self, raw_documents, y)
    518         vocab = dict(((t, i) for i, t in enumerate(sorted(terms))))
    519         if not vocab:
--> 520             raise ValueError("empty vocabulary; training set may have"
    521                              " contained only stop words")
    522         self.vocabulary_ = vocab

ValueError: empty vocabulary; training set may have contained only stop words
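
For reference, here is a minimal sketch of the output being asked for: every character n-gram of length 1 through 5 from a single string, in plain Python with no scikit-learn involved. The helper name char_ngrams and its min_n/max_n parameters are made up for illustration; it does no lowercasing or whitespace normalization, so it is not a stand-in for what CountVectorizer's 'char' analyzer would produce.

```python
def char_ngrams(text, min_n=1, max_n=5):
    """Return all character n-grams of length min_n..max_n (hypothetical helper).

    Plain sliding window over the raw string; no preprocessing is applied.
    """
    grams = []
    for n in range(min_n, max_n + 1):
        # A window of width n fits at len(text) - n + 1 start positions;
        # range() is empty when the text is shorter than n.
        for start in range(len(text) - n + 1):
            grams.append(text[start:start + n])
    return grams


grams = char_ngrams('Gimme n-grams!')

# Sorted unique n-grams, similar in spirit to a vocabulary.
vocab = sorted(set(grams))
```

If the vectorizer itself is needed, passing min_df=1 explicitly may be worth trying, on the assumption that this release applies a document-frequency cutoff above 1 by default. Nothing in the traceback above confirms that.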
