[ https://issues.apache.org/jira/browse/SPARK-8555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15076488#comment-15076488 ]
Sean Owen commented on SPARK-8555:
----------------------------------

Have a look at https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark#ContributingtoSpark-MLlib-specificContributionGuidelines ; generally speaking there are so many algorithms to implement and most aren't that useful or widely used, and so few really belong in MLlib itself. I'm not commenting on HDP here, though I don't think it's that commonly used. The idea is that it should prove itself out externally.

> Online Variational Inference for the Hierarchical Dirichlet Process
> -------------------------------------------------------------------
>
>                 Key: SPARK-8555
>                 URL: https://issues.apache.org/jira/browse/SPARK-8555
>             Project: Spark
>          Issue Type: New Feature
>          Components: MLlib
>            Reporter: yuhao yang
>            Priority: Minor
>
> The task is created for exploration on the online HDP algorithm described in
> http://jmlr.csail.mit.edu/proceedings/papers/v15/wang11a/wang11a.pdf.
> Major advantage for the algorithm: one pass on corpus, streaming friendly,
> automatic K (topic number).
> Currently the scope is to support online HDP for topic modeling, i.e.
> probably an optimizer for LDA.