Does anybody here know how to access a Python compressed sparse row format 
(CSR) object? [1]

I am using Python to do a bit of topic modeling (think “classification”), and 
so far the results are more than plausible, but they list only the topics, not 
the documents corresponding to each topic. Along the way, my script creates a 
compressed sparse row format object, which looks something like this when printed:

  (0, 16099)    0.055924002143
  (0, 9497)     0.0256051292226
  (0, 16202)    0.140746540109
  (0, 38982)    0.000842900625312
  :     :
  (309, 40805)  0.0435077792741
  (309, 45679)  0.0435077792741
  (309, 19462)  0.0435077792741
  (309, 8346)   0.0435077792741
  (309, 31204)  0.0435077792741

Where the first column denotes a document identifier, the second column denotes 
a topic identifier, and the third column denotes the score of the topic in the 
document. In the example above, document #0 is a lot about topic #16202 but not 
a lot about topic #38982.
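
To make the layout concrete, here is a minimal sketch that assumes the object is 
a scipy.sparse.csr_matrix. The tiny matrix and the names toy and triples are 
made up for illustration and are not part of my script. It walks the stored 
entries as (document, topic, score) triples, the same information as the 
listing above:

```python
import numpy as np
from scipy.sparse import csr_matrix

# Hypothetical 3-document x 5-topic matrix standing in for the real object.
dense = np.array([
    [0.0, 0.5, 0.0, 0.1, 0.0],
    [0.2, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.3, 0.0, 0.0, 0.4],
])
toy = csr_matrix(dense)

def triples(matrix):
    """Return (row, column, value) for every stored entry of a sparse matrix."""
    coo = matrix.tocoo()  # coordinate format exposes parallel row/col/data arrays
    return [(int(r), int(c), float(v))
            for r, c, v in zip(coo.row, coo.col, coo.data)]

for doc_id, topic_id, score in triples(toy):
    print((doc_id, topic_id), score)
```

Only the non-zero cells are stored, which is why most (document, topic) pairs 
never appear in the printed listing.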

I want to query my CSR object. For example, given a topic identifier (e.g., 
48692), return a list of all document identifiers and scores from the object. I 
will then sort the scores to find which documents most significantly use the 
given topic.
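
One possible way to express that query, sketched under the assumption that the 
object is a scipy.sparse.csr_matrix. The function name documents_for_topic and 
the toy matrix are hypothetical, and this has not been run against the real 
data. The idea is to convert to compressed sparse column (CSC) format, where 
each column's entries sit in one contiguous slice:

```python
import numpy as np
from scipy.sparse import csr_matrix

def documents_for_topic(matrix, topic_id):
    """Return [(document_id, score), ...] for one column, highest score first."""
    csc = matrix.tocsc()  # column-oriented copy of the same data
    # indptr[j]:indptr[j + 1] delimits the stored entries of column j.
    start, end = csc.indptr[topic_id], csc.indptr[topic_id + 1]
    rows = csc.indices[start:end]    # document identifiers
    scores = csc.data[start:end]     # matching scores
    order = np.argsort(-scores, kind="stable")  # descending by score
    return [(int(rows[i]), float(scores[i])) for i in order]

# Hypothetical 3-document x 5-topic matrix standing in for the real object.
toy = csr_matrix(np.array([
    [0.0, 0.5, 0.0, 0.1, 0.0],
    [0.2, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.3, 0.0, 0.0, 0.4],
]))

print(documents_for_topic(toy, 1))
```

If many topics need to be looked up, it would probably be worth calling tocsc() 
once and reusing the result rather than converting on every query.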

I can’t for the life of me figure out how to get what I need. I can get 
individual values like this, where tfidf is my CSR object:

  >>> print( tfidf[ 309, 31204 ] )
  0.0435077792741

Any help would be greatly appreciated.

[1] CSR - http://bit.ly/2fPj42V

—
Eric Morgan
