[GitHub] spark pull request: [SPARK-8467][MLlib][PySpark] Add LDAModel.desc...
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/8643#issuecomment-155148314 @yu-iskw Thanks for this! Quick request: Could you please send a little follow-up PR to document (in the Python doc) what is being returned? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-8467][MLlib][PySpark] Add LDAModel.desc...
Github user yu-iskw commented on the pull request: https://github.com/apache/spark/pull/8643#issuecomment-155175314 @jkbradley sure!
[GitHub] spark pull request: [SPARK-8467][MLlib][PySpark] Add LDAModel.desc...
Github user yu-iskw commented on the pull request: https://github.com/apache/spark/pull/8643#issuecomment-155232384 @jkbradley I sent the PR at https://github.com/apache/spark/pull/9577.
[GitHub] spark pull request: [SPARK-8467][MLlib][PySpark] Add LDAModel.desc...
Github user yu-iskw commented on the pull request: https://github.com/apache/spark/pull/8643#issuecomment-154585403 @jkbradley @davies could you review it? I modified the type conversion using `SerDe.dumps`.
[GitHub] spark pull request: [SPARK-8467][MLlib][PySpark] Add LDAModel.desc...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8643#issuecomment-154632203 **[Test build #2002 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2002/consoleFull)** for PR 8643 at commit [`e91c65a`](https://github.com/apache/spark/commit/e91c65aef57ae870b0ceb0970068cefcc1231198).
[GitHub] spark pull request: [SPARK-8467][MLlib][PySpark] Add LDAModel.desc...
Github user yu-iskw commented on the pull request: https://github.com/apache/spark/pull/8643#issuecomment-154629369 Jenkins, test this please
[GitHub] spark pull request: [SPARK-8467][MLlib][PySpark] Add LDAModel.desc...
Github user yu-iskw commented on the pull request: https://github.com/apache/spark/pull/8643#issuecomment-154643744 Thank you for merging it and your great support!
[GitHub] spark pull request: [SPARK-8467][MLlib][PySpark] Add LDAModel.desc...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/8643
[GitHub] spark pull request: [SPARK-8467][MLlib][PySpark] Add LDAModel.desc...
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/8643#discussion_r44207597
--- Diff: python/pyspark/mllib/clustering.py ---
@@ -773,10 +776,10 @@ def train(cls, rdd, k=10, maxIterations=20, docConcentration=-1.0,
         :param optimizer: LDAOptimizer used to perform the actual calculation.
             Currently "em", "online" are supported. Default to "em".
         """
-        model = callMLlibFunc("trainLDAModel", rdd, k, maxIterations,
-                              docConcentration, topicConcentration, seed,
-                              checkpointInterval, optimizer)
-        return LDAModel(model)
+        wrapper_model = callMLlibFunc("trainLDAModel", rdd, k, maxIterations,
--- End diff --
we could still call it `model`
[GitHub] spark pull request: [SPARK-8467][MLlib][PySpark] Add LDAModel.desc...
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/8643#discussion_r44207604
--- Diff: python/pyspark/mllib/clustering.py ---
@@ -745,9 +749,8 @@ def load(cls, sc, path):
             raise TypeError("sc should be a SparkContext, got type %s" % type(sc))
         if not isinstance(path, basestring):
             raise TypeError("path should be a basestring, got type %s" % type(path))
-        java_model = sc._jvm.org.apache.spark.mllib.clustering.DistributedLDAModel.load(
-            sc._jsc.sc(), path)
-        return cls(java_model)
+        wrapper_model = callMLlibFunc("loadLDAModel", sc, path)
--- End diff --
call it `model` for short
[GitHub] spark pull request: [SPARK-8467][MLlib][PySpark] Add LDAModel.desc...
Github user davies commented on the pull request: https://github.com/apache/spark/pull/8643#issuecomment-154615987 LGTM, but a few minor comments.
[GitHub] spark pull request: [SPARK-8467][MLlib][PySpark] Add LDAModel.desc...
Github user yu-iskw commented on the pull request: https://github.com/apache/spark/pull/8643#issuecomment-154618704 Jenkins, test this please
[GitHub] spark pull request: [SPARK-8467][MLlib][PySpark] Add LDAModel.desc...
Github user davies commented on the pull request: https://github.com/apache/spark/pull/8643#issuecomment-154642699 LGTM, merging this into master and 1.6 branch, thanks!
[GitHub] spark pull request: [SPARK-8467][MLlib][PySpark] Add LDAModel.desc...
Github user yu-iskw commented on the pull request: https://github.com/apache/spark/pull/8643#issuecomment-154617157 Jenkins, test this please
[GitHub] spark pull request: [SPARK-8467][MLlib][PySpark] Add LDAModel.desc...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8643#issuecomment-154640453 **[Test build #2002 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2002/consoleFull)** for PR 8643 at commit [`e91c65a`](https://github.com/apache/spark/commit/e91c65aef57ae870b0ceb0970068cefcc1231198).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
   * `class LDAModel(JavaModelWrapper, JavaSaveable, Loader):`
[GitHub] spark pull request: [SPARK-8467][MLlib][PySpark] Add LDAModel.desc...
Github user yu-iskw commented on the pull request: https://github.com/apache/spark/pull/8643#issuecomment-154640689 @davies thanks for the review. I fixed them.
[GitHub] spark pull request: [SPARK-8467][MLlib][PySpark] Add LDAModel.desc...
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/8643#discussion_r44207559
--- Diff: python/pyspark/mllib/clustering.py ---
@@ -687,9 +687,14 @@ class LDAModel(JavaModelWrapper):
     ...     [2, SparseVector(2, {0: 1.0})],
     ... ]
     >>> rdd = sc.parallelize(data)
-    >>> model = LDA.train(rdd, k=2)
+    >>> model = LDA.train(rdd, k=2, seed = 1)
--- End diff --
no space around `=`
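The spacing remark above is a style point (PEP 8 asks for no spaces around `=` in keyword arguments), not a behavioural one. A minimal sketch, using only the standard-library `ast` module, of why both spellings are the same call; the helper name `same_syntax_tree` is hypothetical and nothing here touches Spark:

```python
import ast


def same_syntax_tree(source_a, source_b):
    """Return True if both snippets parse to the same abstract syntax tree."""
    # ast.dump omits line and column attributes by default, so pure
    # whitespace differences do not show up in the comparison.
    return ast.dump(ast.parse(source_a)) == ast.dump(ast.parse(source_b))


# The two spellings from the review; they are only parsed, never executed,
# so `LDA` and `rdd` do not need to exist.
SPACED = "model = LDA.train(rdd, k=2, seed = 1)"
COMPACT = "model = LDA.train(rdd, k=2, seed=1)"
```

Calling `same_syntax_tree(SPACED, COMPACT)` compares the two lines structurally, so the requested change only affects how the doctest reads.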
[GitHub] spark pull request: [SPARK-8467][MLlib][PySpark] Add LDAModel.desc...
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/8643#discussion_r44172629
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/api/python/LDAModelWrapper.scala ---
@@ -0,0 +1,45 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.mllib.api.python
+
+import scala.collection.JavaConverters
+
+import org.apache.spark.SparkContext
+import org.apache.spark.mllib.clustering.LDAModel
+import org.apache.spark.mllib.linalg.Matrix
+
+/**
+ * Wrapper around LDAModel to provide helper methods in Python
+ */
+private[python] class LDAModelWrapper(model: LDAModel) {
+
+  def topicsMatrix(): Matrix = model.topicsMatrix
+
+  def vocabSize(): Int = model.vocabSize
+
+  def describeTopics(): java.util.List[Array[Any]] = describeTopics(this.model.vocabSize)
+
+  def describeTopics(maxTermsPerTopic: Int): java.util.List[Array[Any]] = {
+
+    val seq = model.describeTopics(maxTermsPerTopic).map { case (terms, termWeights) =>
+        Array.empty[Any] ++ terms ++ termWeights
+      }.toSeq
+    JavaConverters.seqAsJavaListConverter(seq).asJava
--- End diff --
you could return this in Java : List
[GitHub] spark pull request: [SPARK-8467][MLlib][PySpark] Add LDAModel.desc...
Github user yu-iskw commented on a diff in the pull request: https://github.com/apache/spark/pull/8643#discussion_r44164309
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/api/python/LDAModelWrapper.scala ---
@@ -0,0 +1,45 @@
[...]
+  def describeTopics(maxTermsPerTopic: Int): java.util.List[Array[Any]] = {
+
+    val seq = model.describeTopics(maxTermsPerTopic).map { case (terms, termWeights) =>
+        Array.empty[Any] ++ terms ++ termWeights
+      }.toSeq
+    JavaConverters.seqAsJavaListConverter(seq).asJava
--- End diff --
ping @davies
[GitHub] spark pull request: [SPARK-8467][MLlib][PySpark] Add LDAModel.desc...
Github user yu-iskw commented on a diff in the pull request: https://github.com/apache/spark/pull/8643#discussion_r44168908
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/api/python/LDAModelWrapper.scala ---
@@ -0,0 +1,45 @@
[...]
+  def describeTopics(maxTermsPerTopic: Int): java.util.List[Array[Any]] = {
+
+    val seq = model.describeTopics(maxTermsPerTopic).map { case (terms, termWeights) =>
+        Array.empty[Any] ++ terms ++ termWeights
+      }.toSeq
+    JavaConverters.seqAsJavaListConverter(seq).asJava
--- End diff --
I tried to test serialization directly, and it worked well. Why did we fail to serialize the return value of `describeTopics`? https://gist.github.com/yu-iskw/22fb83895024a29ea048
[GitHub] spark pull request: [SPARK-8467][MLlib][PySpark] Add LDAModel.desc...
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/8643#discussion_r44169316
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/api/python/LDAModelWrapper.scala ---
@@ -0,0 +1,45 @@
[...]
+  def describeTopics(maxTermsPerTopic: Int): java.util.List[Array[Any]] = {
+
+    val seq = model.describeTopics(maxTermsPerTopic).map { case (terms, termWeights) =>
+        Array.empty[Any] ++ terms ++ termWeights
+      }.toSeq
+    JavaConverters.seqAsJavaListConverter(seq).asJava
--- End diff --
I think it failed in Python 3, you may need to specify the encoding:
```
r = PickleSerializer().loads(bytes(r), encoding=encoding)
```
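For readers unfamiliar with the `encoding` argument mentioned above, here is a minimal sketch using only the standard-library `pickle` module rather than PySpark's `PickleSerializer`. It assumes a hand-written protocol-2 payload of the kind a Python 2 style `str` produces; the names `PY2_PICKLE`, `load_with_encoding` and `default_load_fails` are hypothetical, and this is not the code path the PR actually exercises:

```python
import pickle

# Hand-built protocol-2 pickle of an 8-bit string holding the UTF-8 bytes of U+3042.
# Opcode layout: PROTO 2, SHORT_BINSTRING with length 3, BINPUT 0, STOP.
PY2_PICKLE = b"\x80\x02U\x03\xe3\x81\x82q\x00."


def load_with_encoding(data, encoding):
    """Unpickle data, decoding 8-bit string payloads with the given encoding."""
    return pickle.loads(data, encoding=encoding)


def default_load_fails(data):
    """Return True if unpickling with the default ASCII encoding raises."""
    try:
        pickle.loads(data)
    except UnicodeDecodeError:
        return True
    return False
```

With `encoding="bytes"` the payload comes back as raw `bytes`, with `encoding="utf-8"` it is decoded to text, and the default ASCII decoding rejects the non-ASCII bytes. That difference is the kind of Python 3 behaviour the comment points at.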
[GitHub] spark pull request: [SPARK-8467][MLlib][PySpark] Add LDAModel.desc...
Github user yu-iskw commented on a diff in the pull request: https://github.com/apache/spark/pull/8643#discussion_r44171965
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/api/python/LDAModelWrapper.scala ---
@@ -0,0 +1,45 @@
[...]
+  def describeTopics(maxTermsPerTopic: Int): java.util.List[Array[Any]] = {
+
+    val seq = model.describeTopics(maxTermsPerTopic).map { case (terms, termWeights) =>
+        Array.empty[Any] ++ terms ++ termWeights
+      }.toSeq
+    JavaConverters.seqAsJavaListConverter(seq).asJava
--- End diff --
Thanks! Let me think about it, because the `call` method in Python's mllib doesn't have an `encoding` option. Sorry, one more thing: what is the difference between the two cases in this gist? When comparing the return value with `1`, it seems to work. However, when comparing it with the expected value, it fails. https://gist.github.com/yu-iskw/59c66bb90d9311c0b408
[GitHub] spark pull request: [SPARK-8467][MLlib][PySpark] Add LDAModel.desc...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8643#issuecomment-154526569 Build started sha1 is merged.
[GitHub] spark pull request: [SPARK-8467][MLlib][PySpark] Add LDAModel.desc...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8643#issuecomment-154526739 **[Test build #45250 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45250/consoleFull)** for PR 8643 at commit [`56b69f9`](https://github.com/apache/spark/commit/56b69f9b5e1d71f6850dbd70d521ca067cc4b671).
[GitHub] spark pull request: [SPARK-8467][MLlib][PySpark] Add LDAModel.desc...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8643#issuecomment-154527979 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45250/
[GitHub] spark pull request: [SPARK-8467][MLlib][PySpark] Add LDAModel.desc...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8643#issuecomment-154527959 **[Test build #45250 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45250/consoleFull)** for PR 8643 at commit [`56b69f9`](https://github.com/apache/spark/commit/56b69f9b5e1d71f6850dbd70d521ca067cc4b671).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
   * `class LDAModel(JavaModelWrapper, JavaSaveable, Loader):`
[GitHub] spark pull request: [SPARK-8467][MLlib][PySpark] Add LDAModel.desc...
Github user yu-iskw commented on a diff in the pull request: https://github.com/apache/spark/pull/8643#discussion_r44185186
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/api/python/LDAModelWrapper.scala ---
@@ -0,0 +1,45 @@
[...]
+  def describeTopics(maxTermsPerTopic: Int): java.util.List[Array[Any]] = {
+
+    val seq = model.describeTopics(maxTermsPerTopic).map { case (terms, termWeights) =>
+        Array.empty[Any] ++ terms ++ termWeights
+      }.toSeq
+    JavaConverters.seqAsJavaListConverter(seq).asJava
--- End diff --
Thanks! I did it!!
[GitHub] spark pull request: [SPARK-8467][MLlib][PySpark] Add LDAModel.desc...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8643#issuecomment-154526527 Build triggered. sha1 is merged.
[GitHub] spark pull request: [SPARK-8467][MLlib][PySpark] Add LDAModel.desc...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8643#issuecomment-154529620 Build started sha1 is merged.
[GitHub] spark pull request: [SPARK-8467][MLlib][PySpark] Add LDAModel.desc...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8643#issuecomment-154546841 **[Test build #45258 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45258/consoleFull)** for PR 8643 at commit [`81ee096`](https://github.com/apache/spark/commit/81ee096a04fc77dd3b2940be98f9b86bfae69efd).
[GitHub] spark pull request: [SPARK-8467][MLlib][PySpark] Add LDAModel.desc...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8643#issuecomment-154529543 Build triggered. sha1 is merged.
[GitHub] spark pull request: [SPARK-8467][MLlib][PySpark] Add LDAModel.desc...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8643#issuecomment-154531334 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45252/
[GitHub] spark pull request: [SPARK-8467][MLlib][PySpark] Add LDAModel.desc...
Github user yu-iskw commented on the pull request: https://github.com/apache/spark/pull/8643#issuecomment-154532750 Jenkins, test this please.
[GitHub] spark pull request: [SPARK-8467][MLlib][PySpark] Add LDAModel.desc...
Github user yu-iskw commented on the pull request: https://github.com/apache/spark/pull/8643#issuecomment-154537228 Jenkins, test this please
[GitHub] spark pull request: [SPARK-8467][MLlib][PySpark] Add LDAModel.desc...
Github user shaneknapp commented on the pull request: https://github.com/apache/spark/pull/8643#issuecomment-154543591 jenkins, test this please
[GitHub] spark pull request: [SPARK-8467][MLlib][PySpark] Add LDAModel.desc...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8643#issuecomment-154544425 Merged build started.
[GitHub] spark pull request: [SPARK-8467][MLlib][PySpark] Add LDAModel.desc...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8643#issuecomment-154544398 Merged build triggered.
[GitHub] spark pull request: [SPARK-8467][MLlib][PySpark] Add LDAModel.desc...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8643#issuecomment-154558189 **[Test build #45258 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45258/consoleFull)** for PR 8643 at commit [`81ee096`](https://github.com/apache/spark/commit/81ee096a04fc77dd3b2940be98f9b86bfae69efd).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `class LDAModel(JavaModelWrapper, JavaSaveable, Loader):`
[GitHub] spark pull request: [SPARK-8467][MLlib][PySpark] Add LDAModel.desc...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8643#issuecomment-154558405 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-8467][MLlib][PySpark] Add LDAModel.desc...
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/8643#discussion_r43723078

--- Diff: mllib/src/main/scala/org/apache/spark/mllib/api/python/LDAModelWrapper.scala ---
@@ -0,0 +1,45 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.mllib.api.python
+
+import scala.collection.JavaConverters
+
+import org.apache.spark.SparkContext
+import org.apache.spark.mllib.clustering.LDAModel
+import org.apache.spark.mllib.linalg.Matrix
+
+/**
+ * Wrapper around LDAModel to provide helper methods in Python
+ */
+private[python] class LDAModelWrapper(model: LDAModel) {
+
+  def topicsMatrix(): Matrix = model.topicsMatrix
+
+  def vocabSize(): Int = model.vocabSize
+
+  def describeTopics(): java.util.List[Array[Any]] = describeTopics(this.model.vocabSize)
+
+  def describeTopics(maxTermsPerTopic: Int): java.util.List[Array[Any]] = {
+
+    val seq = model.describeTopics(maxTermsPerTopic).map { case (terms, termWeights) =>
+      Array.empty[Any] ++ terms ++ termWeights
+    }.toSeq
+    JavaConverters.seqAsJavaListConverter(seq).asJava
--- End diff --

Before calling `SerDe.dumps()`, you still need to convert the Tuple2 into `Array[Any]()`.
[GitHub] spark pull request: [SPARK-8467][MLlib][PySpark] Add LDAModel.desc...
Github user yu-iskw commented on a diff in the pull request: https://github.com/apache/spark/pull/8643#discussion_r43723787

--- Diff: mllib/src/main/scala/org/apache/spark/mllib/api/python/LDAModelWrapper.scala ---
@@ -0,0 +1,45 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.mllib.api.python
+
+import scala.collection.JavaConverters
+
+import org.apache.spark.SparkContext
+import org.apache.spark.mllib.clustering.LDAModel
+import org.apache.spark.mllib.linalg.Matrix
+
+/**
+ * Wrapper around LDAModel to provide helper methods in Python
+ */
+private[python] class LDAModelWrapper(model: LDAModel) {
+
+  def topicsMatrix(): Matrix = model.topicsMatrix
+
+  def vocabSize(): Int = model.vocabSize
+
+  def describeTopics(): java.util.List[Array[Any]] = describeTopics(this.model.vocabSize)
+
+  def describeTopics(maxTermsPerTopic: Int): java.util.List[Array[Any]] = {
+
+    val seq = model.describeTopics(maxTermsPerTopic).map { case (terms, termWeights) =>
+      Array.empty[Any] ++ terms ++ termWeights
+    }.toSeq
+    JavaConverters.seqAsJavaListConverter(seq).asJava
--- End diff --

Got it. I'll give it a try. Thanks!
[GitHub] spark pull request: [SPARK-8467][MLlib][PySpark] Add LDAModel.desc...
Github user yu-iskw commented on a diff in the pull request: https://github.com/apache/spark/pull/8643#discussion_r43823836

--- Diff: mllib/src/main/scala/org/apache/spark/mllib/api/python/LDAModelWrapper.scala ---
@@ -0,0 +1,45 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.mllib.api.python
+
+import scala.collection.JavaConverters
+
+import org.apache.spark.SparkContext
+import org.apache.spark.mllib.clustering.LDAModel
+import org.apache.spark.mllib.linalg.Matrix
+
+/**
+ * Wrapper around LDAModel to provide helper methods in Python
+ */
+private[python] class LDAModelWrapper(model: LDAModel) {
+
+  def topicsMatrix(): Matrix = model.topicsMatrix
+
+  def vocabSize(): Int = model.vocabSize
+
+  def describeTopics(): java.util.List[Array[Any]] = describeTopics(this.model.vocabSize)
+
+  def describeTopics(maxTermsPerTopic: Int): java.util.List[Array[Any]] = {
+
+    val seq = model.describeTopics(maxTermsPerTopic).map { case (terms, termWeights) =>
+      Array.empty[Any] ++ terms ++ termWeights
+    }.toSeq
+    JavaConverters.seqAsJavaListConverter(seq).asJava
--- End diff --

@davies could you give me a little bit of help? I tried to serialize the entire list after converting it into `Array[Any](...)`. Then, when deserializing it in Python, something went wrong with pickle: `TypeError: must be a unicode character, not bytes`.

- My patch: https://github.com/yu-iskw/spark/compare/SPARK-8467-2...yu-iskw:SPARK-8467-2.trial
- Unit testing errors: https://gist.github.com/yu-iskw/60e0db67b1e222fc7fd4
[GitHub] spark pull request: [SPARK-8467][MLlib][PySpark] Add LDAModel.desc...
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/8643#discussion_r43657655

--- Diff: mllib/src/main/scala/org/apache/spark/mllib/api/python/LDAModelWrapper.scala ---
@@ -0,0 +1,45 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.mllib.api.python
+
+import org.apache.spark.SparkContext
+import org.apache.spark.api.java.JavaSparkContext
+import org.apache.spark.mllib.clustering.LDAModel
+import org.apache.spark.mllib.linalg.Matrix
+import org.apache.spark.sql.{DataFrame, SQLContext}
+
+/**
+ * Wrapper around LDAModel to provide helper methods in Python
+ */
+private[python] class LDAModelWrapper(model: LDAModel) {
+
+  def topicsMatrix(): Matrix = model.topicsMatrix
+
+  def vocabSize(): Int = model.vocabSize
+
+  def describeTopics(jsc: JavaSparkContext): DataFrame = describeTopics(this.model.vocabSize, jsc)
+
+  def describeTopics(maxTermsPerTopic: Int, jsc: JavaSparkContext): DataFrame = {
+    // Since the return value of `describeTopics` is a little complicated,
+    // it is converted into `Row` to take advantage of DataFrame serialization.
+    val sqlContext = new SQLContext(jsc.sc)
+    val topics = model.describeTopics(maxTermsPerTopic)
+    sqlContext.createDataFrame(topics).toDF("terms", "termWeights")
--- End diff --

Serializing a DataFrame will trigger a Spark job. We could still use Pickle to serialize them without a DataFrame, via `PythonMLLibAPI.dumps()`.
[GitHub] spark pull request: [SPARK-8467][MLlib][PySpark] Add LDAModel.desc...
Github user yu-iskw commented on a diff in the pull request: https://github.com/apache/spark/pull/8643#discussion_r43658405

--- Diff: mllib/src/main/scala/org/apache/spark/mllib/api/python/LDAModelWrapper.scala ---
@@ -0,0 +1,45 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.mllib.api.python
+
+import org.apache.spark.SparkContext
+import org.apache.spark.api.java.JavaSparkContext
+import org.apache.spark.mllib.clustering.LDAModel
+import org.apache.spark.mllib.linalg.Matrix
+import org.apache.spark.sql.{DataFrame, SQLContext}
+
+/**
+ * Wrapper around LDAModel to provide helper methods in Python
+ */
+private[python] class LDAModelWrapper(model: LDAModel) {
+
+  def topicsMatrix(): Matrix = model.topicsMatrix
+
+  def vocabSize(): Int = model.vocabSize
+
+  def describeTopics(jsc: JavaSparkContext): DataFrame = describeTopics(this.model.vocabSize, jsc)
+
+  def describeTopics(maxTermsPerTopic: Int, jsc: JavaSparkContext): DataFrame = {
+    // Since the return value of `describeTopics` is a little complicated,
+    // it is converted into `Row` to take advantage of DataFrame serialization.
+    val sqlContext = new SQLContext(jsc.sc)
+    val topics = model.describeTopics(maxTermsPerTopic)
+    sqlContext.createDataFrame(topics).toDF("terms", "termWeights")
--- End diff --

@davies thanks for the comment. Should we use `PythonMLLibAPI.dumps()` rather than Java Any types like below? https://github.com/yu-iskw/spark/commit/e1c66d050f7c4edbe1bf4e3b57b145cc62c23630#diff-71f42172be0b5fc14827b7bb31f4e80bR34
[GitHub] spark pull request: [SPARK-8467][MLlib][PySpark] Add LDAModel.desc...
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/8643#discussion_r43720867

--- Diff: mllib/src/main/scala/org/apache/spark/mllib/api/python/LDAModelWrapper.scala ---
@@ -0,0 +1,45 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.mllib.api.python
+
+import scala.collection.JavaConverters
+
+import org.apache.spark.SparkContext
+import org.apache.spark.mllib.clustering.LDAModel
+import org.apache.spark.mllib.linalg.Matrix
+
+/**
+ * Wrapper around LDAModel to provide helper methods in Python
+ */
+private[python] class LDAModelWrapper(model: LDAModel) {
+
+  def topicsMatrix(): Matrix = model.topicsMatrix
+
+  def vocabSize(): Int = model.vocabSize
+
+  def describeTopics(): java.util.List[Array[Any]] = describeTopics(this.model.vocabSize)
+
+  def describeTopics(maxTermsPerTopic: Int): java.util.List[Array[Any]] = {
+
+    val seq = model.describeTopics(maxTermsPerTopic).map { case (terms, termWeights) =>
+      Array.empty[Any] ++ terms ++ termWeights
+    }.toSeq
+    JavaConverters.seqAsJavaListConverter(seq).asJava
--- End diff --

If the result is large, each element in the list will need an RPC call to get the data into Python. I'd prefer to serialize the entire list in the JVM first, return an `Array[Byte]`, and deserialize it in Python. Could you benchmark the difference for larger `k`?
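The trade-off raised in this comment can be illustrated without Spark. What follows is only a plain-Python analogy, not PySpark's actual JVM bridge: `CountingBridge`, `fetch_per_element`, and `fetch_as_bytes` are hypothetical names, the topic values are made up, and the standard `pickle` module stands in for the JVM-side serializer. The sketch counts simulated round trips for the two strategies.

```python
import pickle

# Made-up stand-in for the JVM-side result of describeTopics():
# one (terms, termWeights) pair per topic.
topics = [([1, 0], [0.6, 0.4]), ([0, 1], [0.7, 0.3])]


class CountingBridge:
    """Toy bridge (hypothetical) that counts simulated round trips."""

    def __init__(self, data):
        self._data = data
        self.calls = 0

    def size(self):
        self.calls += 1
        return len(self._data)

    def get(self, i):
        self.calls += 1
        return self._data[i]

    def dumps(self):
        # Convert each pair to a plain list, then serialize everything at once.
        self.calls += 1
        return pickle.dumps([list(pair) for pair in self._data])


def fetch_per_element(bridge):
    # One call for the size plus one call per element.
    return [bridge.get(i) for i in range(bridge.size())]


def fetch_as_bytes(bridge):
    # A single call returns bytes; deserialization happens locally.
    return [tuple(pair) for pair in pickle.loads(bridge.dumps())]


slow = CountingBridge(topics)
fast = CountingBridge(topics)
per_element = fetch_per_element(slow)
batched = fetch_as_bytes(fast)
print(slow.calls, fast.calls)
```

In this toy model the per-element strategy grows by one call per topic, while the batched strategy stays at one call regardless of `k`; a real benchmark would still be needed to see how much that matters in practice.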
[GitHub] spark pull request: [SPARK-8467][MLlib][PySpark] Add LDAModel.desc...
Github user yu-iskw commented on a diff in the pull request: https://github.com/apache/spark/pull/8643#discussion_r43722360

--- Diff: mllib/src/main/scala/org/apache/spark/mllib/api/python/LDAModelWrapper.scala ---
@@ -0,0 +1,45 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.mllib.api.python
+
+import scala.collection.JavaConverters
+
+import org.apache.spark.SparkContext
+import org.apache.spark.mllib.clustering.LDAModel
+import org.apache.spark.mllib.linalg.Matrix
+
+/**
+ * Wrapper around LDAModel to provide helper methods in Python
+ */
+private[python] class LDAModelWrapper(model: LDAModel) {
+
+  def topicsMatrix(): Matrix = model.topicsMatrix
+
+  def vocabSize(): Int = model.vocabSize
+
+  def describeTopics(): java.util.List[Array[Any]] = describeTopics(this.model.vocabSize)
+
+  def describeTopics(maxTermsPerTopic: Int): java.util.List[Array[Any]] = {
+
+    val seq = model.describeTopics(maxTermsPerTopic).map { case (terms, termWeights) =>
+      Array.empty[Any] ++ terms ++ termWeights
+    }.toSeq
+    JavaConverters.seqAsJavaListConverter(seq).asJava
--- End diff --

@davies thank you for the comment. Could you please give me more information about the type conversion? How do we deserialize the return value in Python? In particular, I don't know how to deserialize `scala.Tuple2` in Python.

I tried to serialize the entire list with `SerDe.dumps()` in `LDAModelWrapper.describeTopics()` and then checked the return value in Python. https://github.com/yu-iskw/spark/commit/54e5fda86ac3f3bac76b165615a0231d88a717aa

The test result is as follows:

```
**
Failed example:
    model.describeTopics()
Expected:
    [([1, 0], [0.5..., 0.49...]), ([0, 1], [0.5..., 0.49...])]
Got:
    ({u'__class__': u'scala.Tuple2'}, {u'__class__': u'scala.Tuple2'})
```
[GitHub] spark pull request: [SPARK-8467][MLlib][PySpark] Add LDAModel.desc...
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/8643#discussion_r43720929

--- Diff: python/pyspark/mllib/clustering.py ---
@@ -690,6 +690,21 @@ class LDAModel(JavaModelWrapper):
     >>> model = LDA.train(rdd, k=2)
     >>> model.vocabSize()
     2
+    >>> topics = model.describeTopics()
+    >>> len(topics)
+    2
+    >>> len(list(topics[0])[0])
+    2
+    >>> len(list(topics[0])[1])
+    2
+    >>> topics = model.describeTopics(1)
--- End diff --

One test case should be enough. These tests will become examples in the docs, so it is better to make them more readable, for example:

```
>>> list(sorted(model.describeTopics()))
[([0, 1], [0.50..., 0.49...]), ([1, 0], [0.50..., 0.49...])]
```
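The `0.50...` notation in the suggested doctest relies on doctest's `ELLIPSIS` option, which lets `...` match any substring of the actual output. A minimal, Spark-free sketch of that mechanism follows; `describe_topics` is a hypothetical stand-in for `model.describeTopics()`, and its return values are invented purely for illustration.

```python
import doctest


def describe_topics():
    """Hypothetical stand-in for model.describeTopics().

    >>> sorted(describe_topics())
    [([0, 1], [0.50..., 0.49...]), ([1, 0], [0.50..., 0.49...])]
    """
    # Invented weights; a trained model would produce slightly different digits.
    return [([1, 0], [0.5032, 0.4968]), ([0, 1], [0.5011, 0.4989])]


# Parse the docstring explicitly so the example does not depend on __main__.
parser = doctest.DocTestParser()
test = parser.get_doctest(
    describe_topics.__doc__,
    {"describe_topics": describe_topics},
    "describe_topics",
    None,
    0,
)
runner = doctest.DocTestRunner(verbose=False, optionflags=doctest.ELLIPSIS)
results = runner.run(test)
print(results)
```

Sorting the result first keeps the expected output stable even if the topics come back in a different order, and the ellipses absorb the trailing digits that vary between runs.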
[GitHub] spark pull request: [SPARK-8467][MLlib][PySpark] Add LDAModel.desc...
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/8643#discussion_r43720248

--- Diff: mllib/src/main/scala/org/apache/spark/mllib/api/python/LDAModelWrapper.scala ---
@@ -0,0 +1,45 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.mllib.api.python
+
+import scala.collection.JavaConverters
+
+import org.apache.spark.SparkContext
+import org.apache.spark.mllib.clustering.LDAModel
+import org.apache.spark.mllib.linalg.Matrix
+
+/**
+ * Wrapper around LDAModel to provide helper methods in Python
+ */
+private[python] class LDAModelWrapper(model: LDAModel) {
+
+  def topicsMatrix(): Matrix = model.topicsMatrix
+
+  def vocabSize(): Int = model.vocabSize
+
+  def describeTopics(): java.util.List[Array[Any]] = describeTopics(this.model.vocabSize)
+
+  def describeTopics(maxTermsPerTopic: Int): java.util.List[Array[Any]] = {
+
+    val seq = model.describeTopics(maxTermsPerTopic).map { case (terms, termWeights) =>
+      Array.empty[Any] ++ terms ++ termWeights
--- End diff --

nit: this line could be `Array[Any](terms, termWeights)`
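Note that the two Scala expressions in this nit do not produce the same shape: `Array.empty[Any] ++ terms ++ termWeights` concatenates the elements into one flat array, while `Array[Any](terms, termWeights)` is a two-element array holding the two arrays. A rough Python analogue, with lists standing in for Scala arrays, made-up values, and a hypothetical `split_flat` helper, shows what the receiving side would have to do in each case:

```python
terms = [1, 0]
term_weights = [0.6, 0.4]

# Analogue of `Array.empty[Any] ++ terms ++ termWeights`: one flat sequence.
flat = [] + terms + term_weights

# Analogue of `Array[Any](terms, termWeights)`: two nested sequences.
nested = [terms, term_weights]


def split_flat(values):
    """Recover (terms, weights) from the flat layout by halving it."""
    half = len(values) // 2
    return values[:half], values[half:]


print(flat)
print(nested)
print(split_flat(flat))
```

With the nested form the consumer can unpack the pair directly; with the flat form it has to rely on both halves having the same length.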
[GitHub] spark pull request: [SPARK-8467][MLlib][PySpark] Add LDAModel.desc...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8643#issuecomment-153237999 **[Test build #44881 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44881/consoleFull)** for PR 8643 at commit [`0bc114e`](https://github.com/apache/spark/commit/0bc114e9fe8f61745436a1a1d49a8dac41e6db26).
[GitHub] spark pull request: [SPARK-8467][MLlib][PySpark] Add LDAModel.desc...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8643#issuecomment-153247346 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44881/ Test PASSed.
[GitHub] spark pull request: [SPARK-8467][MLlib][PySpark] Add LDAModel.desc...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8643#issuecomment-153247107 **[Test build #44881 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44881/consoleFull)** for PR 8643 at commit [`0bc114e`](https://github.com/apache/spark/commit/0bc114e9fe8f61745436a1a1d49a8dac41e6db26). * This patch passes all tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class LDAModel(JavaModelWrapper, JavaSaveable, Loader):`
[GitHub] spark pull request: [SPARK-8467][MLlib][PySpark] Add LDAModel.desc...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8643#issuecomment-153247342 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-8467][MLlib][PySpark] Add LDAModel.desc...
Github user yu-iskw commented on the pull request: https://github.com/apache/spark/pull/8643#issuecomment-153247670 I reverted the DataFrame serialization and went back to Java Any types.
[GitHub] spark pull request: [SPARK-8467][MLlib][PySpark] Add LDAModel.desc...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8643#issuecomment-152976265 Build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-8467][MLlib][PySpark] Add LDAModel.desc...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8643#issuecomment-152976149 **[Test build #44799 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44799/consoleFull)** for PR 8643 at commit [`353a6b0`](https://github.com/apache/spark/commit/353a6b0e5b532d8d786e9ac42a6c355a2d675a85). * This patch passes all tests. * This patch **does not merge cleanly**. * This patch adds the following public classes _(experimental)_: * `class LDAModel(JavaModelWrapper, JavaSaveable, Loader):`
[GitHub] spark pull request: [SPARK-8467][MLlib][PySpark] Add LDAModel.desc...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8643#issuecomment-152976267 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44799/ Test PASSed.
[GitHub] spark pull request: [SPARK-8467][MLlib][PySpark] Add LDAModel.desc...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8643#issuecomment-153237504 Merged build triggered.
[GitHub] spark pull request: [SPARK-8467][MLlib][PySpark] Add LDAModel.desc...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8643#issuecomment-153237516 Merged build started.
[GitHub] spark pull request: [SPARK-8467][MLlib][PySpark] Add LDAModel.desc...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8643#issuecomment-152938719 Build triggered.
[GitHub] spark pull request: [SPARK-8467][MLlib][PySpark] Add LDAModel.desc...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8643#issuecomment-152938736 Build started.
[GitHub] spark pull request: [SPARK-8467][MLlib][PySpark] Add LDAModel.desc...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8643#issuecomment-152939997 **[Test build #44799 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44799/consoleFull)** for PR 8643 at commit [`353a6b0`](https://github.com/apache/spark/commit/353a6b0e5b532d8d786e9ac42a6c355a2d675a85).
[GitHub] spark pull request: [SPARK-8467][MLlib][PySpark] Add LDAModel.desc...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8643#issuecomment-150372009 **[Test build #44182 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44182/consoleFull)** for PR 8643 at commit [`2f70193`](https://github.com/apache/spark/commit/2f701930677c7d359c0a706d16c2438509528b77).
[GitHub] spark pull request: [SPARK-8467][MLlib][PySpark] Add LDAModel.desc...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8643#issuecomment-150371516 Merged build started.
[GitHub] spark pull request: [SPARK-8467][MLlib][PySpark] Add LDAModel.desc...
Github user yu-iskw commented on the pull request: https://github.com/apache/spark/pull/8643#issuecomment-150372929 @jkbradley sorry for the delay in updating. I tried to use DataFrame serialization at https://github.com/yu-iskw/spark/commit/2f701930677c7d359c0a706d16c2438509528b77. Could you review it?
[GitHub] spark pull request: [SPARK-8467][MLlib][PySpark] Add LDAModel.desc...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8643#issuecomment-150392863 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44182/ Test PASSed.
[GitHub] spark pull request: [SPARK-8467][MLlib][PySpark] Add LDAModel.desc...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8643#issuecomment-150392759 **[Test build #44182 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44182/consoleFull)** for PR 8643 at commit [`2f70193`](https://github.com/apache/spark/commit/2f701930677c7d359c0a706d16c2438509528b77). * This patch passes all tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class LDAModel(JavaModelWrapper, JavaSaveable, Loader):`
[GitHub] spark pull request: [SPARK-8467][MLlib][PySpark] Add LDAModel.desc...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8643#issuecomment-150392862 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-8467][MLlib][PySpark] Add LDAModel.desc...
Github user yu-iskw commented on the pull request: https://github.com/apache/spark/pull/8643#issuecomment-140981031 @jkbradley thank you for the comment. Just to be sure: `LDAModelWrapper.describeTopics()` should return a DataFrame, and pyspark should then extract the return value from that DataFrame, right? If so, I'm not sure about that. It looks a little strange to me to use DataFrame-related APIs under spark.mllib. @davies what do you think?
[GitHub] spark pull request: [SPARK-8467][MLlib][PySpark] Add LDAModel.desc...
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/8643#issuecomment-141212929 @yu-iskw I think it's OK to use DataFrame internally in spark.mllib. It already has the dependency, and it would be a private API.
[GitHub] spark pull request: [SPARK-8467][MLlib][PySpark] Add LDAModel.desc...
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/8643#issuecomment-139610362 @yu-iskw Rather than using Java Any types and the old serialization patterns, would it be easier to convert to a local DataFrame? We should be able to take advantage of DataFrame serialization.
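To illustrate the suggestion above, here is a minimal pure-Python sketch of what the Python side might do with rows collected from such a local DataFrame. It does not use Spark: `TopicRow` is a hypothetical stand-in for a DataFrame row, and the column names `terms` and `termWeights` are assumptions made for illustration, not names taken from the pull request.

```python
from collections import namedtuple

# Hypothetical stand-in for a collected DataFrame row.
# The column names are assumptions, not taken from the pull request.
TopicRow = namedtuple("TopicRow", ["terms", "termWeights"])


def rows_to_topics(rows):
    """Turn one row per topic into a list of (term indices, term weights) pairs."""
    return [(list(row.terms), list(row.termWeights)) for row in rows]


# Two made-up topics, each with two terms.
rows = [
    TopicRow(terms=[0, 2], termWeights=[0.6, 0.4]),
    TopicRow(terms=[1, 3], termWeights=[0.7, 0.3]),
]
topics = rows_to_topics(rows)
```

With named columns, the Python side only needs to read two fields per row, which is the kind of simplification the comment is asking about.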
[GitHub] spark pull request: [SPARK-8467][MLlib][PySpark] Add LDAModel.desc...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8643#issuecomment-138304711 Merged build triggered.
[GitHub] spark pull request: [SPARK-8467][MLlib][PySpark] Add LDAModel.desc...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8643#issuecomment-138304729 Merged build started.
[GitHub] spark pull request: [SPARK-8467][MLlib][PySpark] Add LDAModel.desc...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8643#issuecomment-138327468 [Test build #42099 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42099/console) for PR 8643 at commit [`97e78b6`](https://github.com/apache/spark/commit/97e78b6dd7dce4e9a34aae5b1de626dece40a7ac). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class LDAModel(JavaModelWrapper, JavaSaveable, Loader):`
[GitHub] spark pull request: [SPARK-8467][MLlib][PySpark] Add LDAModel.desc...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8643#issuecomment-138327571 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42099/ Test PASSed.
[GitHub] spark pull request: [SPARK-8467][MLlib][PySpark] Add LDAModel.desc...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8643#issuecomment-138327569 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-8467][MLlib][PySpark] Add LDAModel.desc...
GitHub user yu-iskw opened a pull request: https://github.com/apache/spark/pull/8643

[SPARK-8467][MLlib][PySpark] Add LDAModel.describeTopics() in Python

Could @jkbradley and @davies review it?

- Create a wrapper class, `LDAModelWrapper`, for `LDAModel`, because we can't handle the return value of the Scala `describeTopics` directly from pyspark: `Array[(Array[Int], Array[Double])]` is too complicated to convert.
- Add `loadLDAModel` in `PythonMLlibAPI`, since `LDAModel` in Scala is an abstract class and we need to call `load` of `DistributedLDAModel`.

[[SPARK-8467] Add LDAModel.describeTopics() in Python - ASF JIRA](https://issues.apache.org/jira/browse/SPARK-8467)

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/yu-iskw/spark SPARK-8467-2

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/8643.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #8643

commit f300798ca1c400452f3d5d0f48566ccea1dd1455
Author: Yu ISHIKAWA
Date: 2015-09-07T13:32:28Z

[SPARK-8467][MLlib][PySpark] Add LDAModel.describeTopics() in Python
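As a rough illustration of why a wrapper helps here, the sketch below imitates the idea in plain Python: a list of (term indices, term weights) tuples is rewritten as plain nested lists before serialization and rebuilt into tuples afterwards. This is not the code from the pull request. The function names are hypothetical, and `pickle` is used only as a stand-in for whatever serialization actually crosses the JVM/Python boundary.

```python
import pickle


def wrap_describe_topics(topics):
    """Hypothetical wrapper side: replace tuples with plain nested lists, then serialize."""
    nested = [[list(terms), list(weights)] for terms, weights in topics]
    return pickle.dumps(nested)


def unwrap_describe_topics(payload):
    """Hypothetical Python side: deserialize and rebuild (terms, weights) pairs."""
    return [(terms, weights) for terms, weights in pickle.loads(payload)]


# Made-up result for two topics, shaped like Array[(Array[Int], Array[Double])].
topics = [([0, 2], [0.6, 0.4]), ([1, 3], [0.7, 0.3])]
payload = wrap_describe_topics(topics)
restored = unwrap_describe_topics(payload)
```

The point of the sketch is only that the wrapper owns the conversion, so the Python caller receives a simple structure instead of dealing with Scala tuples.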
[GitHub] spark pull request: [SPARK-8467][MLlib][PySpark] Add LDAModel.desc...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8643#issuecomment-138306190 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-8467][MLlib][PySpark] Add LDAModel.desc...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8643#issuecomment-138306187 [Test build #42096 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42096/console) for PR 8643 at commit [`f300798`](https://github.com/apache/spark/commit/f300798ca1c400452f3d5d0f48566ccea1dd1455). * This patch **fails Scala style tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class LDAModel(JavaModelWrapper, JavaSaveable, Loader):`
[GitHub] spark pull request: [SPARK-8467][MLlib][PySpark] Add LDAModel.desc...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8643#issuecomment-138306191 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42096/ Test FAILed.
[GitHub] spark pull request: [SPARK-8467][MLlib][PySpark] Add LDAModel.desc...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8643#issuecomment-138318657 Merged build started.
[GitHub] spark pull request: [SPARK-8467][MLlib][PySpark] Add LDAModel.desc...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8643#issuecomment-138318637 Merged build triggered.
[GitHub] spark pull request: [SPARK-8467][MLlib][PySpark] Add LDAModel.desc...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8643#issuecomment-138305856 [Test build #42096 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42096/consoleFull) for PR 8643 at commit [`f300798`](https://github.com/apache/spark/commit/f300798ca1c400452f3d5d0f48566ccea1dd1455).
[GitHub] spark pull request: [SPARK-8467][MLlib][PySpark] Add LDAModel.desc...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8643#issuecomment-138319885 [Test build #42099 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42099/consoleFull) for PR 8643 at commit [`97e78b6`](https://github.com/apache/spark/commit/97e78b6dd7dce4e9a34aae5b1de626dece40a7ac).