[jira] [Comment Edited] (SPARK-6259) Python API for LDA

Yu Ishikawa (JIRA) Sat, 13 Jun 2015 21:51:09 -0700

    [ 
https://issues.apache.org/jira/browse/SPARK-6259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14584934#comment-14584934
 ]


Yu Ishikawa edited comment on SPARK-6259 at 6/14/15 4:50 AM:
-------------------------------------------------------------

[~josephkb] and [~mengxr], I'm having trouble implementing 
{{LDAModel.describeTopics()}} in Python, because pyspark can't interpret the 
return value of the Scala method of the same name, which is 
{{Array[(Array\[Int\], Array\[Double\])\]}}. It seems that pyspark can't 
interpret a tuple that contains multiple arrays, although I have confirmed 
that it can handle a tuple that contains a single array.

So I think creating a wrapper class for an LDA topic would be nice. What do 
you think about that?
Or is there a good way for pyspark to deal with a return value that includes 
a complicated tuple in Scala?

h4. Current API

{noformat}
def describeTopics(maxTermsPerTopic: Int): Array[(Array[Int], Array[Double])] 
{noformat}

h4. My Suggestion

We would also need to implement a pickler for the wrapper class in 
{{PythonMLLibAPI}}, like {{RatingPickler}}.

{noformat}
case class LDATopic(terms: Array[Int], termWeights: Array[Double])
def describeTopics(maxTermsPerTopic: Int): Array[LDATopic]
{noformat}
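
For illustration only, here is a minimal sketch of what the Python side of such a wrapper might look like. Everything in it is an assumption: the Python {{LDATopic}} class, the {{describe_topics}} helper, and the {{raw_topics}} argument (a stand-in for whatever the JVM side would return) are hypothetical names, not existing pyspark API. It only shows that a small named record with an explicit {{__reduce__}} carries both arrays as plain lists of ints and floats.

```python
from collections import namedtuple


class LDATopic(namedtuple("LDATopic", ["terms", "termWeights"])):
    """Hypothetical Python-side counterpart of the proposed Scala LDATopic."""

    def __reduce__(self):
        # Describe the record as (constructor, args) using plain ints and
        # floats, which is the form pickle would use to rebuild it.
        return LDATopic, (
            [int(t) for t in self.terms],
            [float(w) for w in self.termWeights],
        )


def describe_topics(raw_topics):
    """Wrap raw (terms, weights) pairs as LDATopic records.

    `raw_topics` is an assumed stand-in for the value returned by the
    JVM side; it is not an existing pyspark call.
    """
    return [LDATopic(list(terms), list(weights)) for terms, weights in raw_topics]


# Made-up example data: two topics with term indices and term weights.
topics = describe_topics([([0, 3, 7], [0.5, 0.3, 0.2]), ([1, 2], [0.6, 0.4])])
```

With this shape, each element of the returned list exposes {{terms}} and {{termWeights}} by name, so callers would not need to unpack a nested tuple.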

- Pros
-- We can probably implement the method in Python.
- Cons
-- We would have to deprecate the current implementation of 
{{LDAModel.describeTopics}} in Scala.


> Python API for LDA
> ------------------
>
>                 Key: SPARK-6259
>                 URL: https://issues.apache.org/jira/browse/SPARK-6259
>             Project: Spark
>          Issue Type: Improvement
>          Components: MLlib, PySpark
>    Affects Versions: 1.3.0
>            Reporter: Joseph K. Bradley
>
> Add Python API for LDA.
> This task may be blocked by ongoing work on LDA which may require API changes:
> * [SPARK-5563]
> * [SPARK-5556]



