Joseph K. Bradley created SPARK-9245:
----------------------------------------
             Summary: DistributedLDAModel predict top topic per doc-term instance
                 Key: SPARK-9245
                 URL: https://issues.apache.org/jira/browse/SPARK-9245
             Project: Spark
          Issue Type: New Feature
          Components: MLlib
            Reporter: Joseph K. Bradley

For each (document, term) pair, return top topic. Note that instances of (doc, term) pairs within a document (a.k.a. "tokens") are exchangeable, so we should provide an estimate per document-term, rather than per token.

Synopsis for DistributedLDAModel:
{code}
/** @return RDD of (doc ID, vector of top topic index for each term) */
def topTopicAssignments: RDD[(Long, Vector)]
{code}

Note that using Vector will let us have a sparse encoding which is Java-friendly.
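
As a rough illustration of what "top topic per (document, term) pair" could mean, here is a minimal sketch in plain Scala, without Spark. It assumes the per-document topic weights and per-topic term weights are already estimated, and it scores each topic k for term w by the product of the two, which is the standard LDA posterior up to normalization. The object `TopTopicSketch`, the method `topTopicPerTerm`, and its parameters are hypothetical names for illustration only. This is not the proposed DistributedLDAModel implementation, which may well use a different estimator.

```scala
// Hypothetical helper, not part of Spark: picks the most likely topic for each
// distinct term of one document, given already-estimated distributions.
object TopTopicSketch {
  /**
   * @param docTopics  topic weights for one document (one entry per topic)
   * @param topicTerms topicTerms(k)(w) is the weight of term w under topic k
   * @param termIds    distinct term indices that occur in the document
   * @return map from term index to the index of its top topic
   */
  def topTopicPerTerm(
      docTopics: Array[Double],
      topicTerms: Array[Array[Double]],
      termIds: Seq[Int]): Map[Int, Int] = {
    termIds.map { w =>
      // Assumed scoring rule: docTopics(k) * topicTerms(k)(w).
      // Normalizing over k would not change which topic wins.
      val best = docTopics.indices.maxBy(k => docTopics(k) * topicTerms(k)(w))
      w -> best
    }.toMap
  }

  def main(args: Array[String]): Unit = {
    // Toy numbers: 2 topics, 3 terms in the vocabulary.
    val docTopics = Array(0.7, 0.3)
    val topicTerms = Array(
      Array(0.5, 0.4, 0.1),
      Array(0.1, 0.1, 0.8))
    // The document contains terms 0 and 2 only.
    println(topTopicPerTerm(docTopics, topicTerms, Seq(0, 2)))
  }
}
```

With the toy numbers above, term 0 is assigned topic 0 and term 2 is assigned topic 1. Only terms that occur in the document get an entry, which is the kind of result a sparse Vector indexed by term could hold.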