[ 
https://issues.apache.org/jira/browse/SPARK-13161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15178124#comment-15178124
 ] 

John Hogue commented on SPARK-13161:
------------------------------------

The author-topic model is well suited to extremely short documents such as 
social media posts because it adds a distribution that relates each author to 
topics. The use case would be to apply the author-topic extension of LDA 
(LDA AT) to extract the topics that appear in the corpus, with much 
higher-quality output that resembles a news ticker instead of a jumble of 
words. Combined with the ability to save previously learned topics and predict 
against them, this would let you see how topics change over time, both in word 
distribution and in volume.
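
To make the extra author-topic distribution concrete, here is a minimal sketch of the generative story as I understand it from the linked paper: each word first picks one of the document's authors uniformly, then a topic from that author's topic distribution, then a word from that topic's word distribution. This is plain Python, not Spark or MLlib code; the function names, hyperparameter values, and the toy authors "alice" and "bob" are all assumptions made for illustration.

```python
import random


def sample_dirichlet(alpha, size, rng):
    """Draw one sample from a symmetric Dirichlet via normalized gamma draws."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(size)]
    total = sum(draws)
    return [d / total for d in draws]


def generate_document(authors, theta, phi, length, rng):
    """Generate (author, topic, word) triples for one document.

    authors: author ids attached to the document
    theta:   dict mapping author id -> distribution over topics
    phi:     list mapping topic index -> distribution over vocabulary indices
    """
    tokens = []
    for _ in range(length):
        # Each word is attributed to one of the document's authors, uniformly.
        author = rng.choice(authors)
        # The topic comes from that author's topic distribution (not a
        # per-document one, as in plain LDA).
        topic = rng.choices(range(len(phi)), weights=theta[author])[0]
        # The word comes from the chosen topic's word distribution.
        word = rng.choices(range(len(phi[topic])), weights=phi[topic])[0]
        tokens.append((author, topic, word))
    return tokens


# Toy setup: hypothetical sizes and hyperparameters, for illustration only.
rng = random.Random(0)
num_topics, vocab_size = 3, 8
theta = {a: sample_dirichlet(0.5, num_topics, rng) for a in ("alice", "bob")}
phi = [sample_dirichlet(0.5, vocab_size, rng) for _ in range(num_topics)]
doc = generate_document(["alice", "bob"], theta, phi, length=6, rng=rng)
```

Because topic proportions belong to authors rather than to individual documents, a very short post can borrow statistical strength from everything else its author has written, which is the reason short social media posts are a good fit.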

> Extend MLlib LDA to include options for Author Topic Modeling
> -------------------------------------------------------------
>
>                 Key: SPARK-13161
>                 URL: https://issues.apache.org/jira/browse/SPARK-13161
>             Project: Spark
>          Issue Type: Improvement
>          Components: MLlib
>    Affects Versions: 1.6.0
>            Reporter: John Hogue
>            Priority: Minor
>
> The author-topic model is a generative model for documents that extends 
> Latent Dirichlet Allocation.
> By modeling the interests of authors, we can answer a range of important 
> queries about the content of document collections. With an appropriate author 
> model, we can establish which subjects an author writes about, which authors 
> are likely to have written documents similar to an observed document, and 
> which authors produce similar work.
> Full whitepaper here.
> http://mimno.infosci.cornell.edu/info6150/readings/398.pdf



