[GitHub] spark pull request: [SPARK-8467][MLlib][PySpark] Add LDAModel.desc...

2015-11-09 Thread jkbradley
Github user jkbradley commented on the pull request:

https://github.com/apache/spark/pull/8643#issuecomment-155148314
  
@yu-iskw Thanks for this!  Quick request: Could you please send a little 
follow-up PR to document (in the Python doc) what is being returned?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-8467][MLlib][PySpark] Add LDAModel.desc...

2015-11-09 Thread yu-iskw
Github user yu-iskw commented on the pull request:

https://github.com/apache/spark/pull/8643#issuecomment-155175314
  
@jkbradley sure!





[GitHub] spark pull request: [SPARK-8467][MLlib][PySpark] Add LDAModel.desc...

2015-11-09 Thread yu-iskw
Github user yu-iskw commented on the pull request:

https://github.com/apache/spark/pull/8643#issuecomment-155232384
  
@jkbradley I sent the follow-up PR at https://github.com/apache/spark/pull/9577.





[GitHub] spark pull request: [SPARK-8467][MLlib][PySpark] Add LDAModel.desc...

2015-11-06 Thread yu-iskw
Github user yu-iskw commented on the pull request:

https://github.com/apache/spark/pull/8643#issuecomment-154585403
  
@jkbradley @davies could you review it? I modified the type conversion 
using `SerDe.dumps`.





[GitHub] spark pull request: [SPARK-8467][MLlib][PySpark] Add LDAModel.desc...

2015-11-06 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/8643#issuecomment-154632203
  
**[Test build #2002 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2002/consoleFull)**
 for PR 8643 at commit 
[`e91c65a`](https://github.com/apache/spark/commit/e91c65aef57ae870b0ceb0970068cefcc1231198).





[GitHub] spark pull request: [SPARK-8467][MLlib][PySpark] Add LDAModel.desc...

2015-11-06 Thread yu-iskw
Github user yu-iskw commented on the pull request:

https://github.com/apache/spark/pull/8643#issuecomment-154629369
  
Jenkins, test this please





[GitHub] spark pull request: [SPARK-8467][MLlib][PySpark] Add LDAModel.desc...

2015-11-06 Thread yu-iskw
Github user yu-iskw commented on the pull request:

https://github.com/apache/spark/pull/8643#issuecomment-154643744
  
Thank you for merging it and your great support!





[GitHub] spark pull request: [SPARK-8467][MLlib][PySpark] Add LDAModel.desc...

2015-11-06 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/8643





[GitHub] spark pull request: [SPARK-8467][MLlib][PySpark] Add LDAModel.desc...

2015-11-06 Thread davies
Github user davies commented on a diff in the pull request:

https://github.com/apache/spark/pull/8643#discussion_r44207597
  
--- Diff: python/pyspark/mllib/clustering.py ---
@@ -773,10 +776,10 @@ def train(cls, rdd, k=10, maxIterations=20, docConcentration=-1.0,
 :param optimizer:   LDAOptimizer used to perform the actual calculation.
 Currently "em", "online" are supported. Default to "em".
 """
-model = callMLlibFunc("trainLDAModel", rdd, k, maxIterations,
-  docConcentration, topicConcentration, seed,
-  checkpointInterval, optimizer)
-return LDAModel(model)
+wrapper_model = callMLlibFunc("trainLDAModel", rdd, k, maxIterations,
--- End diff --

we could still call it `model`





[GitHub] spark pull request: [SPARK-8467][MLlib][PySpark] Add LDAModel.desc...

2015-11-06 Thread davies
Github user davies commented on a diff in the pull request:

https://github.com/apache/spark/pull/8643#discussion_r44207604
  
--- Diff: python/pyspark/mllib/clustering.py ---
@@ -745,9 +749,8 @@ def load(cls, sc, path):
 raise TypeError("sc should be a SparkContext, got type %s" % type(sc))
 if not isinstance(path, basestring):
 raise TypeError("path should be a basestring, got type %s" % type(path))
-java_model = sc._jvm.org.apache.spark.mllib.clustering.DistributedLDAModel.load(
-sc._jsc.sc(), path)
-return cls(java_model)
+wrapper_model = callMLlibFunc("loadLDAModel", sc, path)
--- End diff --

call it `model` for short





[GitHub] spark pull request: [SPARK-8467][MLlib][PySpark] Add LDAModel.desc...

2015-11-06 Thread davies
Github user davies commented on the pull request:

https://github.com/apache/spark/pull/8643#issuecomment-154615987
  
LGTM, but a few minor comments.





[GitHub] spark pull request: [SPARK-8467][MLlib][PySpark] Add LDAModel.desc...

2015-11-06 Thread yu-iskw
Github user yu-iskw commented on the pull request:

https://github.com/apache/spark/pull/8643#issuecomment-154618704
  
Jenkins, test this please





[GitHub] spark pull request: [SPARK-8467][MLlib][PySpark] Add LDAModel.desc...

2015-11-06 Thread davies
Github user davies commented on the pull request:

https://github.com/apache/spark/pull/8643#issuecomment-154642699
  
LGTM, merging this into master and 1.6 branch, thanks!





[GitHub] spark pull request: [SPARK-8467][MLlib][PySpark] Add LDAModel.desc...

2015-11-06 Thread yu-iskw
Github user yu-iskw commented on the pull request:

https://github.com/apache/spark/pull/8643#issuecomment-154617157
  
Jenkins, test this please





[GitHub] spark pull request: [SPARK-8467][MLlib][PySpark] Add LDAModel.desc...

2015-11-06 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/8643#issuecomment-154640453
  
**[Test build #2002 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2002/consoleFull)**
 for PR 8643 at commit 
[`e91c65a`](https://github.com/apache/spark/commit/e91c65aef57ae870b0ceb0970068cefcc1231198).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
   * `class LDAModel(JavaModelWrapper, JavaSaveable, Loader):`





[GitHub] spark pull request: [SPARK-8467][MLlib][PySpark] Add LDAModel.desc...

2015-11-06 Thread yu-iskw
Github user yu-iskw commented on the pull request:

https://github.com/apache/spark/pull/8643#issuecomment-154640689
  
@davies thanks for the review. I fixed them.





[GitHub] spark pull request: [SPARK-8467][MLlib][PySpark] Add LDAModel.desc...

2015-11-06 Thread davies
Github user davies commented on a diff in the pull request:

https://github.com/apache/spark/pull/8643#discussion_r44207559
  
--- Diff: python/pyspark/mllib/clustering.py ---
@@ -687,9 +687,14 @@ class LDAModel(JavaModelWrapper):
 ... [2, SparseVector(2, {0: 1.0})],
 ... ]
 >>> rdd =  sc.parallelize(data)
->>> model = LDA.train(rdd, k=2)
+>>> model = LDA.train(rdd, k=2, seed = 1)
--- End diff --

no space around `=`





[GitHub] spark pull request: [SPARK-8467][MLlib][PySpark] Add LDAModel.desc...

2015-11-06 Thread davies
Github user davies commented on a diff in the pull request:

https://github.com/apache/spark/pull/8643#discussion_r44172629
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/api/python/LDAModelWrapper.scala ---
@@ -0,0 +1,45 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.mllib.api.python
+
+import scala.collection.JavaConverters
+
+import org.apache.spark.SparkContext
+import org.apache.spark.mllib.clustering.LDAModel
+import org.apache.spark.mllib.linalg.Matrix
+
+/**
+ * Wrapper around LDAModel to provide helper methods in Python
+ */
+private[python] class LDAModelWrapper(model: LDAModel) {
+
+  def topicsMatrix(): Matrix = model.topicsMatrix
+
+  def vocabSize(): Int = model.vocabSize
+
+  def describeTopics(): java.util.List[Array[Any]] = describeTopics(this.model.vocabSize)
+
+  def describeTopics(maxTermsPerTopic: Int): java.util.List[Array[Any]] = {
+
+val seq = model.describeTopics(maxTermsPerTopic).map { case (terms, termWeights) =>
+Array.empty[Any] ++ terms ++ termWeights
+  }.toSeq
+JavaConverters.seqAsJavaListConverter(seq).asJava
--- End diff --

you could return this in Java :  List
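
For readers following the quoted hunk: `Array.empty[Any] ++ terms ++ termWeights` concatenates each topic's term indices and term weights into one flat array. A minimal Python sketch of that shape, and of how a caller could split such a row back into two halves, might look like the following. The helper names `flatten_topic` and `split_topic` are hypothetical and are not part of PySpark; the sketch only mirrors the revision quoted above, not necessarily what was finally merged.

```python
def flatten_topic(terms, term_weights):
    """Mimic `Array.empty[Any] ++ terms ++ termWeights` from the quoted diff.

    Hypothetical helper for illustration only; not a PySpark API.
    """
    if len(terms) != len(term_weights):
        raise ValueError("terms and term_weights must have the same length")
    return list(terms) + list(term_weights)


def split_topic(row):
    """Split a flattened row back into (terms, term_weights).

    Assumes the first half holds term indices and the second half holds
    the matching weights, as produced by flatten_topic above.
    """
    if len(row) % 2 != 0:
        raise ValueError("a flattened row must have an even length")
    half = len(row) // 2
    return row[:half], row[half:]
```

Returning the two arrays as a pair instead of one flat row would spare the caller this split, at the cost of a nested return type on the JVM side.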

[GitHub] spark pull request: [SPARK-8467][MLlib][PySpark] Add LDAModel.desc...

2015-11-06 Thread yu-iskw
Github user yu-iskw commented on a diff in the pull request:

https://github.com/apache/spark/pull/8643#discussion_r44164309
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/api/python/LDAModelWrapper.scala ---
@@ -0,0 +1,45 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.mllib.api.python
+
+import scala.collection.JavaConverters
+
+import org.apache.spark.SparkContext
+import org.apache.spark.mllib.clustering.LDAModel
+import org.apache.spark.mllib.linalg.Matrix
+
+/**
+ * Wrapper around LDAModel to provide helper methods in Python
+ */
+private[python] class LDAModelWrapper(model: LDAModel) {
+
+  def topicsMatrix(): Matrix = model.topicsMatrix
+
+  def vocabSize(): Int = model.vocabSize
+
+  def describeTopics(): java.util.List[Array[Any]] = describeTopics(this.model.vocabSize)
+
+  def describeTopics(maxTermsPerTopic: Int): java.util.List[Array[Any]] = {
+
+val seq = model.describeTopics(maxTermsPerTopic).map { case (terms, termWeights) =>
+Array.empty[Any] ++ terms ++ termWeights
+  }.toSeq
+JavaConverters.seqAsJavaListConverter(seq).asJava
--- End diff --

ping @davies 





[GitHub] spark pull request: [SPARK-8467][MLlib][PySpark] Add LDAModel.desc...

2015-11-06 Thread yu-iskw
Github user yu-iskw commented on a diff in the pull request:

https://github.com/apache/spark/pull/8643#discussion_r44168908
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/api/python/LDAModelWrapper.scala ---
@@ -0,0 +1,45 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.mllib.api.python
+
+import scala.collection.JavaConverters
+
+import org.apache.spark.SparkContext
+import org.apache.spark.mllib.clustering.LDAModel
+import org.apache.spark.mllib.linalg.Matrix
+
+/**
+ * Wrapper around LDAModel to provide helper methods in Python
+ */
+private[python] class LDAModelWrapper(model: LDAModel) {
+
+  def topicsMatrix(): Matrix = model.topicsMatrix
+
+  def vocabSize(): Int = model.vocabSize
+
+  def describeTopics(): java.util.List[Array[Any]] = describeTopics(this.model.vocabSize)
+
+  def describeTopics(maxTermsPerTopic: Int): java.util.List[Array[Any]] = {
+
+val seq = model.describeTopics(maxTermsPerTopic).map { case (terms, termWeights) =>
+Array.empty[Any] ++ terms ++ termWeights
+  }.toSeq
+JavaConverters.seqAsJavaListConverter(seq).asJava
--- End diff --

I tried testing the serialization directly, and it worked well. Why do we fail
to serialize the return value of `describeTopics`?
https://gist.github.com/yu-iskw/22fb83895024a29ea048





[GitHub] spark pull request: [SPARK-8467][MLlib][PySpark] Add LDAModel.desc...

2015-11-06 Thread davies
Github user davies commented on a diff in the pull request:

https://github.com/apache/spark/pull/8643#discussion_r44169316
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/api/python/LDAModelWrapper.scala ---
@@ -0,0 +1,45 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.mllib.api.python
+
+import scala.collection.JavaConverters
+
+import org.apache.spark.SparkContext
+import org.apache.spark.mllib.clustering.LDAModel
+import org.apache.spark.mllib.linalg.Matrix
+
+/**
+ * Wrapper around LDAModel to provide helper methods in Python
+ */
+private[python] class LDAModelWrapper(model: LDAModel) {
+
+  def topicsMatrix(): Matrix = model.topicsMatrix
+
+  def vocabSize(): Int = model.vocabSize
+
+  def describeTopics(): java.util.List[Array[Any]] = describeTopics(this.model.vocabSize)
+
+  def describeTopics(maxTermsPerTopic: Int): java.util.List[Array[Any]] = {
+
+val seq = model.describeTopics(maxTermsPerTopic).map { case (terms, termWeights) =>
+Array.empty[Any] ++ terms ++ termWeights
+  }.toSeq
+JavaConverters.seqAsJavaListConverter(seq).asJava
--- End diff --

I think it failed in Python 3, you may need to specify the encoding:
```
r = PickleSerializer().loads(bytes(r), encoding=encoding)
```
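
To illustrate why the encoding matters, here is a minimal sketch using only the standard-library `pickle` module rather than PySpark's `PickleSerializer`. It assumes the payload contains a Python 2 style byte string, which Python 3 tries to decode as ASCII by default. The payload `PY2_PICKLE` is hand-written for the example, and the helper names are hypothetical.

```python
import pickle

# Hand-written protocol-2 pickle of the Python 2 byte string '\xc3\xa9'
# (the UTF-8 bytes for "é"): PROTO 2, SHORT_BINSTRING of length 2,
# BINPUT 0, STOP. This stands in for data pickled outside Python 3.
PY2_PICKLE = b"\x80\x02U\x02\xc3\xa9q\x00."


def load_with_encoding(data, encoding="bytes"):
    """Unpickle `data`, decoding Python 2 str objects with `encoding`.

    encoding="bytes" keeps them as raw bytes instead of decoding.
    """
    return pickle.loads(data, encoding=encoding)


def default_load_fails(data):
    """Return True if unpickling with the default ASCII decoding raises."""
    try:
        pickle.loads(data)
    except UnicodeDecodeError:
        return True
    return False
```

With this payload the default call raises `UnicodeDecodeError`, while passing `encoding="bytes"` returns the raw bytes and `encoding="utf-8"` returns the decoded text.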





[GitHub] spark pull request: [SPARK-8467][MLlib][PySpark] Add LDAModel.desc...

2015-11-06 Thread yu-iskw
Github user yu-iskw commented on a diff in the pull request:

https://github.com/apache/spark/pull/8643#discussion_r44171965
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/api/python/LDAModelWrapper.scala ---
@@ -0,0 +1,45 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.mllib.api.python
+
+import scala.collection.JavaConverters
+
+import org.apache.spark.SparkContext
+import org.apache.spark.mllib.clustering.LDAModel
+import org.apache.spark.mllib.linalg.Matrix
+
+/**
+ * Wrapper around LDAModel to provide helper methods in Python
+ */
+private[python] class LDAModelWrapper(model: LDAModel) {
+
+  def topicsMatrix(): Matrix = model.topicsMatrix
+
+  def vocabSize(): Int = model.vocabSize
+
+  def describeTopics(): java.util.List[Array[Any]] = describeTopics(this.model.vocabSize)
+
+  def describeTopics(maxTermsPerTopic: Int): java.util.List[Array[Any]] = {
+
+val seq = model.describeTopics(maxTermsPerTopic).map { case (terms, termWeights) =>
+Array.empty[Any] ++ terms ++ termWeights
+  }.toSeq
+JavaConverters.seqAsJavaListConverter(seq).asJava
--- End diff --

Thanks! Let me think about it, because `call` in Python's mllib doesn't have an
`encoding` option.

Sorry, one more thing: what is the difference between the two cases in this
gist? Comparing the return value with `1` seems to work, but comparing it with
the expected value fails.
https://gist.github.com/yu-iskw/59c66bb90d9311c0b408





[GitHub] spark pull request: [SPARK-8467][MLlib][PySpark] Add LDAModel.desc...

2015-11-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8643#issuecomment-154526569
  
Build started sha1 is merged.






[GitHub] spark pull request: [SPARK-8467][MLlib][PySpark] Add LDAModel.desc...

2015-11-06 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/8643#issuecomment-154526739
  
**[Test build #45250 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45250/consoleFull)**
 for PR 8643 at commit 
[`56b69f9`](https://github.com/apache/spark/commit/56b69f9b5e1d71f6850dbd70d521ca067cc4b671).





[GitHub] spark pull request: [SPARK-8467][MLlib][PySpark] Add LDAModel.desc...

2015-11-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8643#issuecomment-154527979
  

Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45250/






[GitHub] spark pull request: [SPARK-8467][MLlib][PySpark] Add LDAModel.desc...

2015-11-06 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/8643#issuecomment-154527959
  
**[Test build #45250 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45250/consoleFull)**
 for PR 8643 at commit 
[`56b69f9`](https://github.com/apache/spark/commit/56b69f9b5e1d71f6850dbd70d521ca067cc4b671).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
   * `class LDAModel(JavaModelWrapper, JavaSaveable, Loader):`





[GitHub] spark pull request: [SPARK-8467][MLlib][PySpark] Add LDAModel.desc...

2015-11-06 Thread yu-iskw
Github user yu-iskw commented on a diff in the pull request:

https://github.com/apache/spark/pull/8643#discussion_r44185186
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/api/python/LDAModelWrapper.scala ---
@@ -0,0 +1,45 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.mllib.api.python
+
+import scala.collection.JavaConverters
+
+import org.apache.spark.SparkContext
+import org.apache.spark.mllib.clustering.LDAModel
+import org.apache.spark.mllib.linalg.Matrix
+
+/**
+ * Wrapper around LDAModel to provide helper methods in Python
+ */
+private[python] class LDAModelWrapper(model: LDAModel) {
+
+  def topicsMatrix(): Matrix = model.topicsMatrix
+
+  def vocabSize(): Int = model.vocabSize
+
+  def describeTopics(): java.util.List[Array[Any]] = describeTopics(this.model.vocabSize)
+
+  def describeTopics(maxTermsPerTopic: Int): java.util.List[Array[Any]] = {
+
+val seq = model.describeTopics(maxTermsPerTopic).map { case (terms, termWeights) =>
+Array.empty[Any] ++ terms ++ termWeights
+  }.toSeq
+JavaConverters.seqAsJavaListConverter(seq).asJava
--- End diff --

Thanks! I did it!!





[GitHub] spark pull request: [SPARK-8467][MLlib][PySpark] Add LDAModel.desc...

2015-11-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8643#issuecomment-154526527
  
Build triggered. sha1 is merged.





[GitHub] spark pull request: [SPARK-8467][MLlib][PySpark] Add LDAModel.desc...

2015-11-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8643#issuecomment-154529620
  
Build started sha1 is merged.






[GitHub] spark pull request: [SPARK-8467][MLlib][PySpark] Add LDAModel.desc...

2015-11-06 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/8643#issuecomment-154546841
  
**[Test build #45258 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45258/consoleFull)** for PR 8643 at commit [`81ee096`](https://github.com/apache/spark/commit/81ee096a04fc77dd3b2940be98f9b86bfae69efd).





[GitHub] spark pull request: [SPARK-8467][MLlib][PySpark] Add LDAModel.desc...

2015-11-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8643#issuecomment-154529543
  
Build triggered. sha1 is merged.





[GitHub] spark pull request: [SPARK-8467][MLlib][PySpark] Add LDAModel.desc...

2015-11-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8643#issuecomment-154531334
  

Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45252/






[GitHub] spark pull request: [SPARK-8467][MLlib][PySpark] Add LDAModel.desc...

2015-11-06 Thread yu-iskw
Github user yu-iskw commented on the pull request:

https://github.com/apache/spark/pull/8643#issuecomment-154532750
  
Jenkins, test this please.





[GitHub] spark pull request: [SPARK-8467][MLlib][PySpark] Add LDAModel.desc...

2015-11-06 Thread yu-iskw
Github user yu-iskw commented on the pull request:

https://github.com/apache/spark/pull/8643#issuecomment-154537228
  
Jenkins, test this please





[GitHub] spark pull request: [SPARK-8467][MLlib][PySpark] Add LDAModel.desc...

2015-11-06 Thread shaneknapp
Github user shaneknapp commented on the pull request:

https://github.com/apache/spark/pull/8643#issuecomment-154543591
  
jenkins, test this please





[GitHub] spark pull request: [SPARK-8467][MLlib][PySpark] Add LDAModel.desc...

2015-11-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8643#issuecomment-154544425
  
Merged build started.





[GitHub] spark pull request: [SPARK-8467][MLlib][PySpark] Add LDAModel.desc...

2015-11-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8643#issuecomment-154544398
  
 Merged build triggered.





[GitHub] spark pull request: [SPARK-8467][MLlib][PySpark] Add LDAModel.desc...

2015-11-06 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/8643#issuecomment-154558189
  
**[Test build #45258 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45258/consoleFull)** for PR 8643 at commit [`81ee096`](https://github.com/apache/spark/commit/81ee096a04fc77dd3b2940be98f9b86bfae69efd).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
   * `class LDAModel(JavaModelWrapper, JavaSaveable, Loader):`





[GitHub] spark pull request: [SPARK-8467][MLlib][PySpark] Add LDAModel.desc...

2015-11-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8643#issuecomment-154558405
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-8467][MLlib][PySpark] Add LDAModel.desc...

2015-11-03 Thread davies
Github user davies commented on a diff in the pull request:

https://github.com/apache/spark/pull/8643#discussion_r43723078
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/api/python/LDAModelWrapper.scala ---
@@ -0,0 +1,45 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.mllib.api.python
+
+import scala.collection.JavaConverters
+
+import org.apache.spark.SparkContext
+import org.apache.spark.mllib.clustering.LDAModel
+import org.apache.spark.mllib.linalg.Matrix
+
+/**
+ * Wrapper around LDAModel to provide helper methods in Python
+ */
+private[python] class LDAModelWrapper(model: LDAModel) {
+
+  def topicsMatrix(): Matrix = model.topicsMatrix
+
+  def vocabSize(): Int = model.vocabSize
+
+  def describeTopics(): java.util.List[Array[Any]] = describeTopics(this.model.vocabSize)
+
+  def describeTopics(maxTermsPerTopic: Int): java.util.List[Array[Any]] = {
+
+    val seq = model.describeTopics(maxTermsPerTopic).map { case (terms, termWeights) =>
+        Array.empty[Any] ++ terms ++ termWeights
+      }.toSeq
+    JavaConverters.seqAsJavaListConverter(seq).asJava
--- End diff --

Before calling `SerDe.dumps()`, you still need to convert each Tuple2 into an 
`Array[Any]`.





[GitHub] spark pull request: [SPARK-8467][MLlib][PySpark] Add LDAModel.desc...

2015-11-03 Thread yu-iskw
Github user yu-iskw commented on a diff in the pull request:

https://github.com/apache/spark/pull/8643#discussion_r43723787
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/api/python/LDAModelWrapper.scala ---
@@ -0,0 +1,45 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.mllib.api.python
+
+import scala.collection.JavaConverters
+
+import org.apache.spark.SparkContext
+import org.apache.spark.mllib.clustering.LDAModel
+import org.apache.spark.mllib.linalg.Matrix
+
+/**
+ * Wrapper around LDAModel to provide helper methods in Python
+ */
+private[python] class LDAModelWrapper(model: LDAModel) {
+
+  def topicsMatrix(): Matrix = model.topicsMatrix
+
+  def vocabSize(): Int = model.vocabSize
+
+  def describeTopics(): java.util.List[Array[Any]] = describeTopics(this.model.vocabSize)
+
+  def describeTopics(maxTermsPerTopic: Int): java.util.List[Array[Any]] = {
+
+    val seq = model.describeTopics(maxTermsPerTopic).map { case (terms, termWeights) =>
+        Array.empty[Any] ++ terms ++ termWeights
+      }.toSeq
+    JavaConverters.seqAsJavaListConverter(seq).asJava
--- End diff --

Got it. I'll give it a try. Thanks!





[GitHub] spark pull request: [SPARK-8467][MLlib][PySpark] Add LDAModel.desc...

2015-11-03 Thread yu-iskw
Github user yu-iskw commented on a diff in the pull request:

https://github.com/apache/spark/pull/8643#discussion_r43823836
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/api/python/LDAModelWrapper.scala ---
@@ -0,0 +1,45 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.mllib.api.python
+
+import scala.collection.JavaConverters
+
+import org.apache.spark.SparkContext
+import org.apache.spark.mllib.clustering.LDAModel
+import org.apache.spark.mllib.linalg.Matrix
+
+/**
+ * Wrapper around LDAModel to provide helper methods in Python
+ */
+private[python] class LDAModelWrapper(model: LDAModel) {
+
+  def topicsMatrix(): Matrix = model.topicsMatrix
+
+  def vocabSize(): Int = model.vocabSize
+
+  def describeTopics(): java.util.List[Array[Any]] = describeTopics(this.model.vocabSize)
+
+  def describeTopics(maxTermsPerTopic: Int): java.util.List[Array[Any]] = {
+
+    val seq = model.describeTopics(maxTermsPerTopic).map { case (terms, termWeights) =>
+        Array.empty[Any] ++ terms ++ termWeights
+      }.toSeq
+    JavaConverters.seqAsJavaListConverter(seq).asJava
--- End diff --

@davies could you give me a little help? I tried serializing the entire 
list after converting the tuples into `Array[Any](...)`. When deserializing it 
in Python, however, pickle failed with 
`TypeError: must be a unicode character, not bytes`.

- My patch: 
https://github.com/yu-iskw/spark/compare/SPARK-8467-2...yu-iskw:SPARK-8467-2.trial
- Unit testing errors: 
https://gist.github.com/yu-iskw/60e0db67b1e222fc7fd4





[GitHub] spark pull request: [SPARK-8467][MLlib][PySpark] Add LDAModel.desc...

2015-11-02 Thread davies
Github user davies commented on a diff in the pull request:

https://github.com/apache/spark/pull/8643#discussion_r43657655
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/api/python/LDAModelWrapper.scala ---
@@ -0,0 +1,45 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.mllib.api.python
+
+import org.apache.spark.SparkContext
+import org.apache.spark.api.java.JavaSparkContext
+import org.apache.spark.mllib.clustering.LDAModel
+import org.apache.spark.mllib.linalg.Matrix
+import org.apache.spark.sql.{DataFrame, SQLContext}
+
+/**
+ * Wrapper around LDAModel to provide helper methods in Python
+ */
+private[python] class LDAModelWrapper(model: LDAModel) {
+
+  def topicsMatrix(): Matrix = model.topicsMatrix
+
+  def vocabSize(): Int = model.vocabSize
+
+  def describeTopics(jsc: JavaSparkContext): DataFrame = describeTopics(this.model.vocabSize, jsc)
+
+  def describeTopics(maxTermsPerTopic: Int, jsc: JavaSparkContext): DataFrame = {
+    // Since the return value of `describeTopics` is a little complicated,
+    // it is converted into `Row` to take advantage of DataFrame serialization.
+    val sqlContext = new SQLContext(jsc.sc)
+    val topics = model.describeTopics(maxTermsPerTopic)
+    sqlContext.createDataFrame(topics).toDF("terms", "termWeights")
--- End diff --

Serializing a DataFrame will trigger a Spark job. We could still use Pickle 
to serialize the topics without a DataFrame, via `PythonMLLibAPI.dumps()`.





[GitHub] spark pull request: [SPARK-8467][MLlib][PySpark] Add LDAModel.desc...

2015-11-02 Thread yu-iskw
Github user yu-iskw commented on a diff in the pull request:

https://github.com/apache/spark/pull/8643#discussion_r43658405
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/api/python/LDAModelWrapper.scala ---
@@ -0,0 +1,45 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.mllib.api.python
+
+import org.apache.spark.SparkContext
+import org.apache.spark.api.java.JavaSparkContext
+import org.apache.spark.mllib.clustering.LDAModel
+import org.apache.spark.mllib.linalg.Matrix
+import org.apache.spark.sql.{DataFrame, SQLContext}
+
+/**
+ * Wrapper around LDAModel to provide helper methods in Python
+ */
+private[python] class LDAModelWrapper(model: LDAModel) {
+
+  def topicsMatrix(): Matrix = model.topicsMatrix
+
+  def vocabSize(): Int = model.vocabSize
+
+  def describeTopics(jsc: JavaSparkContext): DataFrame = describeTopics(this.model.vocabSize, jsc)
+
+  def describeTopics(maxTermsPerTopic: Int, jsc: JavaSparkContext): DataFrame = {
+    // Since the return value of `describeTopics` is a little complicated,
+    // it is converted into `Row` to take advantage of DataFrame serialization.
+    val sqlContext = new SQLContext(jsc.sc)
+    val topics = model.describeTopics(maxTermsPerTopic)
+    sqlContext.createDataFrame(topics).toDF("terms", "termWeights")
--- End diff --

@davies thanks for the comment. Should we use `PythonMLLibAPI.dumps()` 
rather than Java `Any` types, as in the commit below?

https://github.com/yu-iskw/spark/commit/e1c66d050f7c4edbe1bf4e3b57b145cc62c23630#diff-71f42172be0b5fc14827b7bb31f4e80bR34





[GitHub] spark pull request: [SPARK-8467][MLlib][PySpark] Add LDAModel.desc...

2015-11-02 Thread davies
Github user davies commented on a diff in the pull request:

https://github.com/apache/spark/pull/8643#discussion_r43720867
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/api/python/LDAModelWrapper.scala ---
@@ -0,0 +1,45 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.mllib.api.python
+
+import scala.collection.JavaConverters
+
+import org.apache.spark.SparkContext
+import org.apache.spark.mllib.clustering.LDAModel
+import org.apache.spark.mllib.linalg.Matrix
+
+/**
+ * Wrapper around LDAModel to provide helper methods in Python
+ */
+private[python] class LDAModelWrapper(model: LDAModel) {
+
+  def topicsMatrix(): Matrix = model.topicsMatrix
+
+  def vocabSize(): Int = model.vocabSize
+
+  def describeTopics(): java.util.List[Array[Any]] = describeTopics(this.model.vocabSize)
+
+  def describeTopics(maxTermsPerTopic: Int): java.util.List[Array[Any]] = {
+
+    val seq = model.describeTopics(maxTermsPerTopic).map { case (terms, termWeights) =>
+        Array.empty[Any] ++ terms ++ termWeights
+      }.toSeq
+    JavaConverters.seqAsJavaListConverter(seq).asJava
--- End diff --

If the result is large, each element in the list will need an RPC call to 
get the data into Python. I'd prefer to serialize the entire list in the JVM 
first, return it as `Array[Byte]`, and deserialize it in Python.

Could you benchmark the difference for larger `k`?
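
To illustrate the suggestion above, here is a minimal Python sketch of the "serialize once, deserialize once" pattern. It uses the standard `pickle` module as a stand-in for the JVM-side serializer; `fake_jvm_describe_topics` and the sample values are hypothetical and not part of Spark.

```python
import pickle


def fake_jvm_describe_topics():
    """Hypothetical stand-in for the JVM side: returns the whole result as one byte string."""
    topics = [([1, 0], [0.5, 0.49]), ([0, 1], [0.5, 0.49])]
    # Each tuple becomes a plain two-element list before serialization,
    # loosely mirroring the Tuple2 -> Array[Any] conversion discussed in this thread.
    rows = [[terms, weights] for terms, weights in topics]
    return pickle.dumps(rows)


def describe_topics():
    """Python side: one transfer of bytes followed by one deserialization."""
    payload = fake_jvm_describe_topics()
    return [(terms, weights) for terms, weights in pickle.loads(payload)]


topics = describe_topics()
```

The point of the design is that the number of cross-process round trips no longer depends on the number of topics; whether that matters in practice is what the requested benchmark would show.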





[GitHub] spark pull request: [SPARK-8467][MLlib][PySpark] Add LDAModel.desc...

2015-11-02 Thread yu-iskw
Github user yu-iskw commented on a diff in the pull request:

https://github.com/apache/spark/pull/8643#discussion_r43722360
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/api/python/LDAModelWrapper.scala ---
@@ -0,0 +1,45 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.mllib.api.python
+
+import scala.collection.JavaConverters
+
+import org.apache.spark.SparkContext
+import org.apache.spark.mllib.clustering.LDAModel
+import org.apache.spark.mllib.linalg.Matrix
+
+/**
+ * Wrapper around LDAModel to provide helper methods in Python
+ */
+private[python] class LDAModelWrapper(model: LDAModel) {
+
+  def topicsMatrix(): Matrix = model.topicsMatrix
+
+  def vocabSize(): Int = model.vocabSize
+
+  def describeTopics(): java.util.List[Array[Any]] = describeTopics(this.model.vocabSize)
+
+  def describeTopics(maxTermsPerTopic: Int): java.util.List[Array[Any]] = {
+
+    val seq = model.describeTopics(maxTermsPerTopic).map { case (terms, termWeights) =>
+        Array.empty[Any] ++ terms ++ termWeights
+      }.toSeq
+    JavaConverters.seqAsJavaListConverter(seq).asJava
--- End diff --

@davies thank you for the comment. Could you please give me more 
information about the type conversion?

How do we deserialize the return value in Python? In particular, I don't know 
how to deserialize `scala.Tuple2` in Python. I tried serializing the entire 
list with `SerDe.dumps()` in `LDAModelWrapper.describeTopics()` and then 
checked the return value in Python:

https://github.com/yu-iskw/spark/commit/54e5fda86ac3f3bac76b165615a0231d88a717aa

The test result is as follows:
```
**
Failed example:
model.describeTopics()
Expected:
[([1, 0], [0.5..., 0.49...]), ([0, 1], [0.5..., 0.49...])]
Got:
({u'__class__': u'scala.Tuple2'}, {u'__class__': u'scala.Tuple2'})
```





[GitHub] spark pull request: [SPARK-8467][MLlib][PySpark] Add LDAModel.desc...

2015-11-02 Thread davies
Github user davies commented on a diff in the pull request:

https://github.com/apache/spark/pull/8643#discussion_r43720929
  
--- Diff: python/pyspark/mllib/clustering.py ---
@@ -690,6 +690,21 @@ class LDAModel(JavaModelWrapper):
 >>> model = LDA.train(rdd, k=2)
 >>> model.vocabSize()
 2
+>>> topics = model.describeTopics()
+>>> len(topics)
+2
+>>> len(list(topics[0])[0])
+2
+>>> len(list(topics[0])[1])
+2
+>>> topics = model.describeTopics(1)
--- End diff --

One test case should be enough.

These tests will become examples in the docs, so they should be more 
readable, for example:
```
>>> list(sorted(model.describeTopics()))
[([0, 1], [0.50..., 0.49...]), ([1, 0], [0.50..., 0.49...])]
```
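
A small self-contained sketch of why the suggested form is stable: sorting gives a deterministic order regardless of how the topics come back, and doctest's `ELLIPSIS` option lets the weights be abbreviated. The function `sample_topics` and its values are invented stand-ins for `model.describeTopics()`, and the option is enabled inline here as an assumption about how the real doctests are run.

```python
import doctest


def sample_topics():
    """Hypothetical stand-in for model.describeTopics(); the values are invented.

    >>> list(sorted(sample_topics()))  # doctest: +ELLIPSIS
    [([0, 1], [0.50..., 0.49...]), ([1, 0], [0.50..., 0.49...])]
    """
    # Deliberately returned in "unsorted" order to show that sorting fixes the output.
    return [([1, 0], [0.5012, 0.4988]), ([0, 1], [0.5034, 0.4966])]


# Parse the docstring into a doctest and run it programmatically.
parser = doctest.DocTestParser()
test = parser.get_doctest(
    sample_topics.__doc__, {"sample_topics": sample_topics}, "sample_topics", None, 0
)
runner = doctest.DocTestRunner(verbose=False)
results = runner.run(test)
```

`results` is a `(failed, attempted)` pair, so a passing run reports zero failures for the single example.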





[GitHub] spark pull request: [SPARK-8467][MLlib][PySpark] Add LDAModel.desc...

2015-11-02 Thread davies
Github user davies commented on a diff in the pull request:

https://github.com/apache/spark/pull/8643#discussion_r43720248
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/api/python/LDAModelWrapper.scala ---
@@ -0,0 +1,45 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.mllib.api.python
+
+import scala.collection.JavaConverters
+
+import org.apache.spark.SparkContext
+import org.apache.spark.mllib.clustering.LDAModel
+import org.apache.spark.mllib.linalg.Matrix
+
+/**
+ * Wrapper around LDAModel to provide helper methods in Python
+ */
+private[python] class LDAModelWrapper(model: LDAModel) {
+
+  def topicsMatrix(): Matrix = model.topicsMatrix
+
+  def vocabSize(): Int = model.vocabSize
+
+  def describeTopics(): java.util.List[Array[Any]] = describeTopics(this.model.vocabSize)
+
+  def describeTopics(maxTermsPerTopic: Int): java.util.List[Array[Any]] = {
+
+    val seq = model.describeTopics(maxTermsPerTopic).map { case (terms, termWeights) =>
+      Array.empty[Any] ++ terms ++ termWeights
--- End diff --

nit: this line could be `Array[Any](terms, termWeights)`
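
To make the difference concrete, here is a rough Python analogue (not the Scala code from the diff, and with made-up values): concatenating with `++` produces one flat array in which term indices and weights run together, while `Array[Any](terms, termWeights)` keeps them as two separate elements that the receiver can unpack.

```python
# Hypothetical values for a single topic; not taken from the PR.
terms = [0, 1]
term_weights = [0.5, 0.49]

# Analogue of `Array.empty[Any] ++ terms ++ termWeights`: one flat sequence.
flat = [] + terms + term_weights

# Analogue of `Array[Any](terms, termWeights)`: a two-element pair.
nested = [terms, term_weights]

print(flat)    # [0, 1, 0.5, 0.49]
print(nested)  # [[0, 1], [0.5, 0.49]]

# With the nested layout the two parts can be unpacked directly.
topic_terms, topic_weights = nested
```

With the flat layout the receiver would have to know how many leading elements are indices before it could split the array again.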





[GitHub] spark pull request: [SPARK-8467][MLlib][PySpark] Add LDAModel.desc...

2015-11-02 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/8643#issuecomment-153237999
  
**[Test build #44881 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44881/consoleFull)**
 for PR 8643 at commit 
[`0bc114e`](https://github.com/apache/spark/commit/0bc114e9fe8f61745436a1a1d49a8dac41e6db26).





[GitHub] spark pull request: [SPARK-8467][MLlib][PySpark] Add LDAModel.desc...

2015-11-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8643#issuecomment-153247346
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44881/
Test PASSed.





[GitHub] spark pull request: [SPARK-8467][MLlib][PySpark] Add LDAModel.desc...

2015-11-02 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/8643#issuecomment-153247107
  
**[Test build #44881 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44881/consoleFull)**
 for PR 8643 at commit 
[`0bc114e`](https://github.com/apache/spark/commit/0bc114e9fe8f61745436a1a1d49a8dac41e6db26).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class LDAModel(JavaModelWrapper, JavaSaveable, Loader):`





[GitHub] spark pull request: [SPARK-8467][MLlib][PySpark] Add LDAModel.desc...

2015-11-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8643#issuecomment-153247342
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-8467][MLlib][PySpark] Add LDAModel.desc...

2015-11-02 Thread yu-iskw
Github user yu-iskw commented on the pull request:

https://github.com/apache/spark/pull/8643#issuecomment-153247670
  
I reverted from DataFrame serialization back to Java `Any` types.





[GitHub] spark pull request: [SPARK-8467][MLlib][PySpark] Add LDAModel.desc...

2015-11-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8643#issuecomment-152976265
  
Build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-8467][MLlib][PySpark] Add LDAModel.desc...

2015-11-02 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/8643#issuecomment-152976149
  
**[Test build #44799 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44799/consoleFull)**
 for PR 8643 at commit 
[`353a6b0`](https://github.com/apache/spark/commit/353a6b0e5b532d8d786e9ac42a6c355a2d675a85).
 * This patch passes all tests.
 * This patch **does not merge cleanly**.
 * This patch adds the following public classes _(experimental)_:
  * `class LDAModel(JavaModelWrapper, JavaSaveable, Loader):`





[GitHub] spark pull request: [SPARK-8467][MLlib][PySpark] Add LDAModel.desc...

2015-11-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8643#issuecomment-152976267
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44799/
Test PASSed.





[GitHub] spark pull request: [SPARK-8467][MLlib][PySpark] Add LDAModel.desc...

2015-11-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8643#issuecomment-153237504
  
 Merged build triggered.





[GitHub] spark pull request: [SPARK-8467][MLlib][PySpark] Add LDAModel.desc...

2015-11-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8643#issuecomment-153237516
  
Merged build started.





[GitHub] spark pull request: [SPARK-8467][MLlib][PySpark] Add LDAModel.desc...

2015-11-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8643#issuecomment-152938719
  
 Build triggered.





[GitHub] spark pull request: [SPARK-8467][MLlib][PySpark] Add LDAModel.desc...

2015-11-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8643#issuecomment-152938736
  
Build started.





[GitHub] spark pull request: [SPARK-8467][MLlib][PySpark] Add LDAModel.desc...

2015-11-01 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/8643#issuecomment-152939997
  
**[Test build #44799 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44799/consoleFull)**
 for PR 8643 at commit 
[`353a6b0`](https://github.com/apache/spark/commit/353a6b0e5b532d8d786e9ac42a6c355a2d675a85).





[GitHub] spark pull request: [SPARK-8467][MLlib][PySpark] Add LDAModel.desc...

2015-10-22 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/8643#issuecomment-150372009
  
**[Test build #44182 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44182/consoleFull)**
 for PR 8643 at commit 
[`2f70193`](https://github.com/apache/spark/commit/2f701930677c7d359c0a706d16c2438509528b77).





[GitHub] spark pull request: [SPARK-8467][MLlib][PySpark] Add LDAModel.desc...

2015-10-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8643#issuecomment-150371516
  
Merged build started.





[GitHub] spark pull request: [SPARK-8467][MLlib][PySpark] Add LDAModel.desc...

2015-10-22 Thread yu-iskw
Github user yu-iskw commented on the pull request:

https://github.com/apache/spark/pull/8643#issuecomment-150372929
  
@jkbradley sorry for the delay in updating. I tried using DataFrame 
serialization in 
https://github.com/yu-iskw/spark/commit/2f701930677c7d359c0a706d16c2438509528b77.
Could you review it?





[GitHub] spark pull request: [SPARK-8467][MLlib][PySpark] Add LDAModel.desc...

2015-10-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8643#issuecomment-150392863
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44182/
Test PASSed.





[GitHub] spark pull request: [SPARK-8467][MLlib][PySpark] Add LDAModel.desc...

2015-10-22 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/8643#issuecomment-150392759
  
**[Test build #44182 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44182/consoleFull)**
 for PR 8643 at commit 
[`2f70193`](https://github.com/apache/spark/commit/2f701930677c7d359c0a706d16c2438509528b77).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class LDAModel(JavaModelWrapper, JavaSaveable, Loader):`





[GitHub] spark pull request: [SPARK-8467][MLlib][PySpark] Add LDAModel.desc...

2015-10-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8643#issuecomment-150392862
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-8467][MLlib][PySpark] Add LDAModel.desc...

2015-09-17 Thread yu-iskw
Github user yu-iskw commented on the pull request:

https://github.com/apache/spark/pull/8643#issuecomment-140981031
  
@jkbradley thank you for the comment. Just to be sure: 
`LDAModelWrapper.describeTopics()` should return a DataFrame, and PySpark 
should then extract the return value from that DataFrame, right?
If so, I'm not sure about that. It seems a little strange to me to use 
DataFrame-related APIs under spark.mllib.

@davies what do you think?





[GitHub] spark pull request: [SPARK-8467][MLlib][PySpark] Add LDAModel.desc...

2015-09-17 Thread jkbradley
Github user jkbradley commented on the pull request:

https://github.com/apache/spark/pull/8643#issuecomment-141212929
  
@yu-iskw I think it's OK to use DataFrame internally in spark.mllib.  It 
already has the dependency, and it would be a private API.





[GitHub] spark pull request: [SPARK-8467][MLlib][PySpark] Add LDAModel.desc...

2015-09-11 Thread jkbradley
Github user jkbradley commented on the pull request:

https://github.com/apache/spark/pull/8643#issuecomment-139610362
  
@yu-iskw Rather than using Java Any types and the old serialization 
patterns, would it be easier to convert to a local DataFrame?  We should be 
able to take advantage of DataFrame serialization.





[GitHub] spark pull request: [SPARK-8467][MLlib][PySpark] Add LDAModel.desc...

2015-09-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8643#issuecomment-138304711
  
 Merged build triggered.





[GitHub] spark pull request: [SPARK-8467][MLlib][PySpark] Add LDAModel.desc...

2015-09-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8643#issuecomment-138304729
  
Merged build started.





[GitHub] spark pull request: [SPARK-8467][MLlib][PySpark] Add LDAModel.desc...

2015-09-07 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/8643#issuecomment-138327468
  
  [Test build #42099 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42099/console)
 for   PR 8643 at commit 
[`97e78b6`](https://github.com/apache/spark/commit/97e78b6dd7dce4e9a34aae5b1de626dece40a7ac).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class LDAModel(JavaModelWrapper, JavaSaveable, Loader):`






[GitHub] spark pull request: [SPARK-8467][MLlib][PySpark] Add LDAModel.desc...

2015-09-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8643#issuecomment-138327571
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42099/
Test PASSed.





[GitHub] spark pull request: [SPARK-8467][MLlib][PySpark] Add LDAModel.desc...

2015-09-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8643#issuecomment-138327569
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-8467][MLlib][PySpark] Add LDAModel.desc...

2015-09-07 Thread yu-iskw
GitHub user yu-iskw opened a pull request:

https://github.com/apache/spark/pull/8643

[SPARK-8467][MLlib][PySpark] Add LDAModel.describeTopics() in Python

Could @jkbradley and @davies review it?

- Create a wrapper class, `LDAModelWrapper`, for `LDAModel`, because we 
can't handle the return value of the Scala `describeTopics` directly from 
PySpark: `Array[(Array[Int], Array[Double])]` is too complicated to convert.
- Add `loadLDAModel` to `PythonMLlibAPI`, since `LDAModel` in Scala is an 
abstract class and we need to call `load` on `DistributedLDAModel`.
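
For reference, a minimal sketch of how the Python side might consume the result once it arrives as a list of `(terms, termWeights)` pairs, one pair per topic. The sample data, the `vocab` list, and the `top_terms` helper are all made up for illustration; in practice the list would come from `model.describeTopics()` on a trained model.

```python
# Hypothetical result in the shape `describeTopics` is meant to return:
# one (term indices, term weights) pair per topic.
topics = [
    ([0, 1], [0.51, 0.49]),
    ([1, 0], [0.52, 0.48]),
]

# Hypothetical vocabulary mapping term index -> word.
vocab = ["spark", "python"]


def top_terms(topics, vocab, n):
    """Return, for each topic, its first n (word, weight) pairs."""
    return [
        [(vocab[i], w) for i, w in zip(terms[:n], weights[:n])]
        for terms, weights in topics
    ]


for topic_id, pairs in enumerate(top_terms(topics, vocab, n=1)):
    print(topic_id, pairs)
```

Keeping the indices and weights as two parallel lists per topic mirrors the Scala `(Array[Int], Array[Double])` tuple, so no information is lost in the conversion.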

[[SPARK-8467] Add LDAModel.describeTopics() in Python - ASF JIRA](https://issues.apache.org/jira/browse/SPARK-8467)

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/yu-iskw/spark SPARK-8467-2

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/8643.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #8643


commit f300798ca1c400452f3d5d0f48566ccea1dd1455
Author: Yu ISHIKAWA 
Date:   2015-09-07T13:32:28Z

[SPARK-8467][MLlib][PySpark] Add LDAModel.describeTopics() in Python







[GitHub] spark pull request: [SPARK-8467][MLlib][PySpark] Add LDAModel.desc...

2015-09-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8643#issuecomment-138306190
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-8467][MLlib][PySpark] Add LDAModel.desc...

2015-09-07 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/8643#issuecomment-138306187
  
  [Test build #42096 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42096/console)
 for   PR 8643 at commit 
[`f300798`](https://github.com/apache/spark/commit/f300798ca1c400452f3d5d0f48566ccea1dd1455).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class LDAModel(JavaModelWrapper, JavaSaveable, Loader):`






[GitHub] spark pull request: [SPARK-8467][MLlib][PySpark] Add LDAModel.desc...

2015-09-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8643#issuecomment-138306191
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42096/
Test FAILed.





[GitHub] spark pull request: [SPARK-8467][MLlib][PySpark] Add LDAModel.desc...

2015-09-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8643#issuecomment-138318657
  
Merged build started.





[GitHub] spark pull request: [SPARK-8467][MLlib][PySpark] Add LDAModel.desc...

2015-09-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8643#issuecomment-138318637
  
 Merged build triggered.





[GitHub] spark pull request: [SPARK-8467][MLlib][PySpark] Add LDAModel.desc...

2015-09-07 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/8643#issuecomment-138305856
  
  [Test build #42096 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42096/consoleFull)
 for   PR 8643 at commit 
[`f300798`](https://github.com/apache/spark/commit/f300798ca1c400452f3d5d0f48566ccea1dd1455).





[GitHub] spark pull request: [SPARK-8467][MLlib][PySpark] Add LDAModel.desc...

2015-09-07 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/8643#issuecomment-138319885
  
  [Test build #42099 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42099/consoleFull)
 for   PR 8643 at commit 
[`97e78b6`](https://github.com/apache/spark/commit/97e78b6dd7dce4e9a34aae5b1de626dece40a7ac).

