[GitHub] spark pull request: [SPARK-2199] [mllib] topic modeling

jkbradley Mon, 15 Dec 2014 15:03:33 -0800

Github user jkbradley commented on the pull request:

    https://github.com/apache/spark/pull/1269#issuecomment-67083325
  
    @akopich   Thanks for the updates!  (Much easier to see the diff now)
    
    The decision about setters vs. constructor arguments was from [this JIRA 
(design doc linked from 
JIRA)](https://issues.apache.org/jira/browse/SPARK-3530).  The main issues were 
binary compatibility (adding a new parameter to an algorithm requires adding a 
new constructor when we don't use setters) and problems with Java not 
supporting named parameters and default arguments.
    
    For testing, I agree it's tough to test "accuracy" for topic models.  It 
would be great to see comparisons on the same dataset, just to get a sense of 
whether perplexity decreases similarly and is in the same ballpark.  Same with 
scalability & speed.
    As far as other implementations, your other non-Spark implementation would 
be helpful if it's been tested before.  It would be great to see comparisons 
with other Spark LDA implementations.  E.g., 
[https://github.com/apache/spark/pull/2388] is LDA using GraphX, and it would 
be very helpful to see how performance varies.  (It might also take a bit more 
work though.)
    What sounds reasonable to you?
    
    I'm recommending this testing as separate from unit tests.  (A unit test 
for perplexity sounds a bit unstable.)  Perhaps the test code could become an 
example instead of a unit test.




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-2199] [mllib] topic modeling

Reply via email to