Joseph K. Bradley created SPARK-9245:
----------------------------------------

             Summary: DistributedLDAModel predict top topic per doc-term instance
                 Key: SPARK-9245
                 URL: https://issues.apache.org/jira/browse/SPARK-9245
             Project: Spark
          Issue Type: New Feature
          Components: MLlib
            Reporter: Joseph K. Bradley


For each (document, term) pair, return the top topic.  Note that instances of a 
(doc, term) pair within a document (a.k.a. "tokens") are exchangeable, so we 
should provide one estimate per document-term pair rather than one per token.

Synopsis for DistributedLDAModel:
{code}
/** @return RDD of (doc ID, vector of top topic index for each term) */
def topTopicAssignments: RDD[(Long, Vector)]
{code}
Note that using Vector gives us a sparse encoding that is Java-friendly.
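
As a rough illustration of the idea (not the proposed implementation), the 
sketch below picks the top topic for each term that occurs in one document. It 
assumes point estimates of the per-document topic weights and the per-topic 
term weights are available, and it scores topic t for term w as the product of 
the two. The object TopTopicSketch, the method topTopics, and all parameter 
names are hypothetical. The result is returned as parallel (indices, values) 
arrays, which is the shape that Vectors.sparse in org.apache.spark.mllib.linalg 
accepts, so plain Scala is enough to run it.

```scala
// Hypothetical helper, not part of Spark: picks the highest-scoring topic for
// each term that occurs in a document, given point estimates of the LDA
// parameters. The scoring rule is an assumption made for illustration.
object TopTopicSketch {
  /**
   * @param docTopics   topic weights for one document (length = number of topics)
   * @param topicTerms  topicTerms(t)(w) is the weight of term w in topic t
   * @param termIndices sorted indices of the terms that occur in the document
   * @return (term indices, top topic index per term) as parallel arrays
   */
  def topTopics(
      docTopics: Array[Double],
      topicTerms: Array[Array[Double]],
      termIndices: Array[Int]): (Array[Int], Array[Double]) = {
    val topics = termIndices.map { w =>
      // Score of topic t for term w: docTopics(t) * topicTerms(t)(w)
      val scores = docTopics.indices.map(t => docTopics(t) * topicTerms(t)(w))
      // Index of the first maximal score, stored as Double to match Vector values
      scores.indexOf(scores.max).toDouble
    }
    (termIndices, topics)
  }

  def main(args: Array[String]): Unit = {
    val theta = Array(0.7, 0.3)  // document leans towards topic 0
    val phi = Array(
      Array(0.5, 0.4, 0.1),      // topic 0 favours terms 0 and 1
      Array(0.1, 0.1, 0.8))      // topic 1 favours term 2
    val (idx, top) = topTopics(theta, phi, Array(0, 2))
    println(idx.zip(top).mkString(", "))  // prints (0,0.0), (2,1.0)
  }
}
```

Because tokens of the same term within a document are exchangeable, the sketch 
computes one answer per distinct term index rather than one per token.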



