[ https://issues.apache.org/jira/browse/SPARK-6259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14584934#comment-14584934 ]
Yu Ishikawa edited comment on SPARK-6259 at 6/14/15 4:50 AM:
-------------------------------------------------------------

[~josephkb] and [~mengxr], I'm having trouble implementing {{LDAModel.describeTopics()}} in Python, because PySpark can't interpret the return value of the Scala method of the same name, which is {{Array[(Array[Int], Array[Double])]}}. It seems that PySpark can't convert a tuple that contains multiple arrays, although I have confirmed that it can handle a tuple containing a single array. So I think creating a wrapper class for an LDA topic would be nice. What do you think about that? Or is there a good way for PySpark to deal with a return value that includes a complicated tuple in Scala?

h4. Current API
{noformat}
def describeTopics(maxTermsPerTopic: Int): Array[(Array[Int], Array[Double])]
{noformat}

h4. My Suggestion
We should also implement a pickler class for the new class in {{PythonMLLibAPI}}, like {{RatingPickler}}.
{noformat}
case class LDATopic(terms: Array[Int], termWeights: Array[Double])
def describeTopics(maxTermsPerTopic: Int): Array[LDATopic]
{noformat}

- Pros
-- We can probably implement the method in Python.
- Cons
-- We would have to deprecate the current implementation of {{LDAModel.describeTopics}} in Scala.

> Python API for LDA
> ------------------
>
>                 Key: SPARK-6259
>                 URL: https://issues.apache.org/jira/browse/SPARK-6259
>             Project: Spark
>          Issue Type: Improvement
>          Components: MLlib, PySpark
>    Affects Versions: 1.3.0
>            Reporter: Joseph K. Bradley
>
> Add Python API for LDA.
> This task may be blocked by ongoing work on LDA which may require API changes:
> * [SPARK-5563]
> * [SPARK-5556]
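
For illustration, the Python side of the wrapper proposed in the comment could be sketched roughly as follows. This is only a sketch, not the Spark implementation: the {{LDATopic}} namedtuple and the helper {{topics_from_pairs}} are hypothetical names chosen to mirror the proposed Scala case class, the example values are made up, and the JVM-to-Python pickling step is left out entirely. It assumes each topic arrives as a (terms, termWeights) pair.

```python
from collections import namedtuple


class LDATopic(namedtuple("LDATopic", ["terms", "termWeights"])):
    """Hypothetical Python mirror of the proposed Scala case class LDATopic."""

    def __reduce__(self):
        # Reduce to plain Python ints and floats so that pickling does not
        # depend on the element types the values were built from.
        return LDATopic, ([int(t) for t in self.terms],
                          [float(w) for w in self.termWeights])


def topics_from_pairs(pairs):
    """Wrap (terms, termWeights) pairs in LDATopic records (hypothetical helper)."""
    return [LDATopic(list(terms), list(weights)) for terms, weights in pairs]


# Made-up example values: two topics with two terms each.
topics = topics_from_pairs([([3, 7], [0.6, 0.4]), ([1, 2], [0.9, 0.1])])
```

With a named record like this, each topic is a single object with two array fields, so a caller reads {{topics[0].terms}} and {{topics[0].termWeights}} instead of unpacking a tuple of arrays.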