Hi,
I just want some n-grams. I don't necessarily want to tell CountVectorizer
my life story, but it's pretty stingy about handing out n-grams unless you
pass it a ton of data.
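For reference, "just some n-grams" here means every character n-gram of lengths 1 through 5 from a single string. The sketch below is plain Python with no scikit-learn involved. It is not how CountVectorizer works internally, and the helper name `char_ngrams` is hypothetical. Unlike CountVectorizer's default, it does not lowercase the text.

```python
from collections import Counter


def char_ngrams(text, n_min=1, n_max=5):
    """Return every character n-gram of text with n_min <= n <= n_max.

    Hypothetical helper: slides a window of each width across the string
    and keeps every slice, in order of appearance (no lowercasing).
    """
    grams = []
    for n in range(n_min, n_max + 1):
        for start in range(len(text) - n + 1):
            grams.append(text[start:start + n])
    return grams


grams = char_ngrams("Gimme n-grams!", 1, 5)
counts = Counter(grams)  # n-gram -> number of occurrences, like one row of counts

print(len(grams))    # 60  (14 + 13 + 12 + 11 + 10 windows)
print(grams[:3])     # ['G', 'i', 'm']
print(counts["m"])   # 3
```

A one-document input is enough here because no n-gram is filtered by how many documents it appears in.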
Am I using it wrong? Are there kwargs that I missed that would support this
kind of use case?
Thanks,
Doug
In [225]: cv = CountVectorizer(analyzer='char', stop_words=None, ngram_range=(1,5))
In [226]: cv.fit(['Gimme n-grams!'])
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-226-ccd2238d644b> in <module>()
----> 1 cv.fit(['Gimme n-grams!'])

/usr/local/lib/python2.7/site-packages/sklearn/feature_extraction/text.pyc in fit(self, raw_documents, y)
    430         self
    431         """
--> 432         self.fit_transform(raw_documents)
    433         return self
    434

/usr/local/lib/python2.7/site-packages/sklearn/feature_extraction/text.pyc in fit_transform(self, raw_documents, y)
    518         vocab = dict(((t, i) for i, t in enumerate(sorted(terms))))
    519         if not vocab:
--> 520             raise ValueError("empty vocabulary; training set may have"
    521                              " contained only stop words")
    522         self.vocabulary_ = vocab

ValueError: empty vocabulary; training set may have contained only stop words
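One kwarg worth checking is `min_df`. The following is a minimal sketch that assumes the empty vocabulary comes from a document-frequency cutoff. Under that assumption, a default `min_df` above 1 would prune every n-gram that appears in fewer documents than the cutoff, and a one-document corpus could never pass it. The traceback above does not confirm this, so verify the default in the installed version. The sketch targets a recent scikit-learn on Python 3, not the 2.7 install shown above. It sets `min_df=1` explicitly and drops `stop_words=None`, which is already the default.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Assumption: the installed version may prune n-grams seen in too few
# documents; min_df=1 keeps anything that occurs in at least one document.
cv = CountVectorizer(analyzer="char", ngram_range=(1, 5), min_df=1)
X = cv.fit_transform(["Gimme n-grams!"])

print(X.shape[0])                  # one row per document
print("gimme" in cv.vocabulary_)   # lowercase=True by default, so the 5-gram is "gimme"
```

If this still fails on an older release, compare that release's constructor defaults (notably `min_df` and `max_df`) against the values passed here.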
_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general