2012/5/14 JAGANADH G <[email protected]>:
>
>
> On Fri, May 11, 2012 at 3:06 PM, Olivier Grisel <[email protected]>
> wrote:
>>
>> 2012/5/10 JAGANADH G <[email protected]>:
>> > Hi all
>> >
>> > Is there any way to get the TF-IDF value mapped to each word of the
>> > vocabulary in sklearn?
>> >
>> > I would like to get output like
>> >
>> > w1 -> TF-IDF
>> > w2 -> TF-IDF
>>
>> TF is sample-dependent, but the IDF weights for each feature index are
>> stored as an array attribute named `idf_` on the fitted vectorizer,
>> along with the `vocabulary_` dict that maps each word to its feature
>> index. Combining the two gives you the mapping from words to IDF weights.
>>
>> See the documentation for more details:
>>
>>
>> http://scikit-learn.org/dev/modules/feature_extraction.html#text-feature-extraction
>>
>
> Thanks Olivier,
> I tried that and am pasting the code below. Am I following the correct
> procedure?
>
> [code]
> from sklearn.datasets import load_files
> categories = ["pos","neg"]
> mov_train = load_files("/usr/share/nltk_data/corpora/movie_reviews",
>                        categories=categories, shuffle=True, random_state=42)
> from sklearn.feature_extraction.text import CountVectorizer
> from sklearn.feature_extraction.text import TfidfTransformer
>
> cvect = CountVectorizer()
> train_counts = cvect.fit_transform(mov_train.data)
> tfidf_tr = TfidfTransformer(use_idf=True).fit(train_counts)
>
> for word,fr in zip(cvect.vocabulary_,tfidf_tr.idf_):
>     print '%r => %r' % (word, fr)
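`vocabulary_` is a dict that maps each word to its feature index, so its
iteration order is not guaranteed to match the order of `idf_`. Look up each
weight by its feature index instead:

for word, feature_idx in cvect.vocabulary_.iteritems():
    print '%r => %r' % (word, tfidf_tr.idf_[feature_idx])
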
(untested).
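
For reference, here is a minimal pure-Python sketch of the same index lookup,
written for Python 3 and with no scikit-learn dependency. The toy corpus, the
names `docs`, `vocabulary`, `idf`, `word_to_idf` and `naive`, and the smoothed
IDF formula are all assumptions made for illustration: `vocabulary` and `idf`
only stand in for `cvect.vocabulary_` and `tfidf_tr.idf_`, and the weights
scikit-learn computes may differ.

```python
import math

# Toy corpus (hypothetical); stands in for mov_train.data.
docs = ["a good movie", "a bad movie", "a good good plot"]
tokenized = [doc.split() for doc in docs]

# Words in order of first appearance.
first_seen = []
for tokens in tokenized:
    for tok in tokens:
        if tok not in first_seen:
            first_seen.append(tok)

# Stand-in for cvect.vocabulary_: word -> feature index.
# Indices follow sorted order, while the dict itself iterates in
# first-seen order, so the two orders differ.
index_of = {word: i for i, word in enumerate(sorted(first_seen))}
vocabulary = {word: index_of[word] for word in first_seen}

# Stand-in for tfidf_tr.idf_: one weight per feature index.
# Assumed formula: ln((1 + n_docs) / (1 + df)) + 1.
n_docs = len(docs)
idf = [0.0] * len(vocabulary)
for word, feature_idx in vocabulary.items():
    df = sum(1 for tokens in tokenized if word in tokens)
    idf[feature_idx] = math.log((1 + n_docs) / (1 + df)) + 1

# Correct pairing: look up each weight by feature index.
word_to_idf = {word: idf[feature_idx] for word, feature_idx in vocabulary.items()}

# Naive pairing from the question: relies on dict iteration order.
naive = dict(zip(vocabulary, idf))

for word in sorted(word_to_idf):
    print('%r => %r' % (word, word_to_idf[word]))
```

On this toy data `naive` disagrees with `word_to_idf` for "good" and "bad",
which is the kind of mismatch the zip-based loop can produce.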
--
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel