[GitHub] spark pull request: [SPARK-6390] [SQL] [MLlib] Port MatrixUDT to P...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/6354 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-6390] [SQL] [MLlib] Port MatrixUDT to P...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6354#issuecomment-112886719 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-6390] [SQL] [MLlib] Port MatrixUDT to P...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6354#issuecomment-112886667

[Test build #35046 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/35046/console) for PR 6354 at commit [`fc4dc1e`](https://github.com/apache/spark/commit/fc4dc1e8d69e0eb6803fab23e8835b9753908f3a).
* This patch **passes all tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `class MatrixUDT(UserDefinedType):`
[GitHub] spark pull request: [SPARK-6390] [SQL] [MLlib] Port MatrixUDT to P...
Github user davies commented on the pull request: https://github.com/apache/spark/pull/6354#issuecomment-112875483 LGTM, waiting for the tests.
[GitHub] spark pull request: [SPARK-6390] [SQL] [MLlib] Port MatrixUDT to P...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6354#issuecomment-112858102 Merged build started.
[GitHub] spark pull request: [SPARK-6390] [SQL] [MLlib] Port MatrixUDT to P...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6354#issuecomment-112858304 [Test build #35046 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/35046/consoleFull) for PR 6354 at commit [`fc4dc1e`](https://github.com/apache/spark/commit/fc4dc1e8d69e0eb6803fab23e8835b9753908f3a).
[GitHub] spark pull request: [SPARK-6390] [SQL] [MLlib] Port MatrixUDT to P...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6354#issuecomment-112858082 Merged build triggered.
[GitHub] spark pull request: [SPARK-6390] [SQL] [MLlib] Port MatrixUDT to P...
Github user MechCoder commented on the pull request: https://github.com/apache/spark/pull/6354#issuecomment-112857730 jenkins retest this please
[GitHub] spark pull request: [SPARK-6390] [SQL] [MLlib] Port MatrixUDT to P...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6354#issuecomment-112830735 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-6390] [SQL] [MLlib] Port MatrixUDT to P...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6354#issuecomment-112830642

[Test build #35043 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/35043/console) for PR 6354 at commit [`fc4dc1e`](https://github.com/apache/spark/commit/fc4dc1e8d69e0eb6803fab23e8835b9753908f3a).
* This patch **fails PySpark unit tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `class MatrixUDT(UserDefinedType):`
[GitHub] spark pull request: [SPARK-6390] [SQL] [MLlib] Port MatrixUDT to P...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6354#issuecomment-112805307 [Test build #35043 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/35043/consoleFull) for PR 6354 at commit [`fc4dc1e`](https://github.com/apache/spark/commit/fc4dc1e8d69e0eb6803fab23e8835b9753908f3a).
[GitHub] spark pull request: [SPARK-6390] [SQL] [MLlib] Port MatrixUDT to P...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6354#issuecomment-112803651 Merged build started.
[GitHub] spark pull request: [SPARK-6390] [SQL] [MLlib] Port MatrixUDT to P...
Github user MechCoder commented on the pull request: https://github.com/apache/spark/pull/6354#issuecomment-112803758 ping @davies anything left?
[GitHub] spark pull request: [SPARK-6390] [SQL] [MLlib] Port MatrixUDT to P...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6354#issuecomment-112803607 Merged build triggered.
[GitHub] spark pull request: [SPARK-6390] [SQL] [MLlib] Port MatrixUDT to P...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6354#issuecomment-107773521

[Test build #868 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/868/consoleFull) for PR 6354 at commit [`c940a44`](https://github.com/apache/spark/commit/c940a44191e072894289e67924922860b30e4e8d).
* This patch **passes all tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `class MatrixUDT(UserDefinedType):`
[GitHub] spark pull request: [SPARK-6390] [SQL] [MLlib] Port MatrixUDT to P...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6354#issuecomment-107751555 [Test build #868 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/868/consoleFull) for PR 6354 at commit [`c940a44`](https://github.com/apache/spark/commit/c940a44191e072894289e67924922860b30e4e8d).
[GitHub] spark pull request: [SPARK-6390] [SQL] [MLlib] Port MatrixUDT to P...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6354#issuecomment-105685053 **[Test build #33533 timed out](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/33533/consoleFull)** for PR 6354 at commit [`c940a44`](https://github.com/apache/spark/commit/c940a44191e072894289e67924922860b30e4e8d) after a configured wait of `150m`.
[GitHub] spark pull request: [SPARK-6390] [SQL] [MLlib] Port MatrixUDT to P...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6354#issuecomment-105685060 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-6390] [SQL] [MLlib] Port MatrixUDT to P...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6354#issuecomment-105685061 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/33533/ Test FAILed.
[GitHub] spark pull request: [SPARK-6390] [SQL] [MLlib] Port MatrixUDT to P...
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/6354#discussion_r31074471

--- Diff: python/pyspark/sql/dataframe.py ---
@@ -224,7 +224,10 @@ def schema(self):
         StructType(List(StructField(age,IntegerType,true),StructField(name,StringType,true)))
         """
         if self._schema is None:
-            self._schema = _parse_datatype_json_string(self._jdf.schema().json())
+            try:
+                self._schema = _parse_datatype_json_string(self._jdf.schema().json())
+            except AttributeError:
+                raise Exception("Unable to parse datatype from schema.")
--- End diff --

Could you put something about the original exception into the message?
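One way to act on this request is to format the caught exception into the new message. The sketch below is only an illustration under stated assumptions: `parse_datatype` and `load_schema` are hypothetical stand-ins, not PySpark functions, and the AttributeError text is made up for the example.

```python
def parse_datatype(json_string):
    """Hypothetical stand-in for a schema parser that fails with AttributeError."""
    raise AttributeError("'NoneType' object has no attribute 'fromJson'")


def load_schema(json_string):
    """Wrap the parser and keep the original error text in the new message."""
    try:
        return parse_datatype(json_string)
    except AttributeError as e:
        # Raise a non-AttributeError, but carry the original details along.
        raise Exception("Unable to parse datatype from schema. %s" % e)


def describe_failure(json_string):
    """Return the message a caller would see, or None on success."""
    try:
        load_schema(json_string)
    except Exception as e:
        return str(e)
    return None
```

Calling `describe_failure("{}")` returns a message that starts with the generic text and ends with the parser's own error, so the root cause is not lost when the exception type changes.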
[GitHub] spark pull request: [SPARK-6390] [SQL] [MLlib] Port MatrixUDT to P...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6354#issuecomment-105644996 [Test build #33533 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/33533/consoleFull) for PR 6354 at commit [`c940a44`](https://github.com/apache/spark/commit/c940a44191e072894289e67924922860b30e4e8d).
[GitHub] spark pull request: [SPARK-6390] [SQL] [MLlib] Port MatrixUDT to P...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6354#issuecomment-105644587 Merged build triggered.
[GitHub] spark pull request: [SPARK-6390] [SQL] [MLlib] Port MatrixUDT to P...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6354#issuecomment-105644614 Merged build started.
[GitHub] spark pull request: [SPARK-6390] [SQL] [MLlib] Port MatrixUDT to P...
Github user MechCoder commented on the pull request: https://github.com/apache/spark/pull/6354#issuecomment-105644210 @davies thanks. that worked. could you give a pass now?
[GitHub] spark pull request: [SPARK-6390] [SQL] [MLlib] Port MatrixUDT to P...
Github user davies commented on the pull request: https://github.com/apache/spark/pull/6354#issuecomment-105608870 @MechCoder I had the same problem, but it turned out that there was an outdated python/lib/pyspark.zip there. After removing it, it worked fine.
[GitHub] spark pull request: [SPARK-6390] [SQL] [MLlib] Port MatrixUDT to P...
Github user MechCoder commented on the pull request: https://github.com/apache/spark/pull/6354#issuecomment-105596699

Sure.

    from pyspark.mllib.linalg import DenseMatrix, SparseMatrix, MatrixUDT
    dm1 = DenseMatrix(3, 2, [0, 1, 4, 5, 9, 10])
    sm1 = SparseMatrix(1, 1, [0, 1], [0], [2.0])
    rdd = sc.parallelize([("dense", dm1)])
    df = rdd.toDF()
    df.collect()

I know there is something silly, but I'm not able to figure it out on my own.
[GitHub] spark pull request: [SPARK-6390] [SQL] [MLlib] Port MatrixUDT to P...
Github user davies commented on the pull request: https://github.com/apache/spark/pull/6354#issuecomment-105574591 @MechCoder Could you add a unit test to reproduce the error?
[GitHub] spark pull request: [SPARK-6390] [SQL] [MLlib] Port MatrixUDT to P...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6354#issuecomment-104879867 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-6390] [SQL] [MLlib] Port MatrixUDT to P...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6354#issuecomment-104879844

[Test build #33404 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/33404/consoleFull) for PR 6354 at commit [`aa9c391`](https://github.com/apache/spark/commit/aa9c3914d761c63ace177974b74d3eeb01d389e6).
* This patch **passes all tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `class MatrixUDT(UserDefinedType):`
[GitHub] spark pull request: [SPARK-6390] [SQL] [MLlib] Port MatrixUDT to P...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6354#issuecomment-104879869 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/33404/ Test PASSed.
[GitHub] spark pull request: [SPARK-6390] [SQL] [MLlib] Port MatrixUDT to P...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6354#issuecomment-104868248 [Test build #33404 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/33404/consoleFull) for PR 6354 at commit [`aa9c391`](https://github.com/apache/spark/commit/aa9c3914d761c63ace177974b74d3eeb01d389e6).
[GitHub] spark pull request: [SPARK-6390] [SQL] [MLlib] Port MatrixUDT to P...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6354#issuecomment-104868178 Merged build triggered.
[GitHub] spark pull request: [SPARK-6390] [SQL] [MLlib] Port MatrixUDT to P...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6354#issuecomment-104868180 Merged build started.
[GitHub] spark pull request: [SPARK-6390] [SQL] [MLlib] Port MatrixUDT to P...
Github user MechCoder commented on the pull request: https://github.com/apache/spark/pull/6354#issuecomment-104855363

> pyUDT should be defined in MatrixUDT

Thanks. I am now able to create a dataframe, but when I do `df.collect` it crashes.
[GitHub] spark pull request: [SPARK-6390] [SQL] [MLlib] Port MatrixUDT to P...
Github user davies commented on the pull request: https://github.com/apache/spark/pull/6354#issuecomment-104790464 Could you also catch the exception from `_parse_datatype_json_string` in DataFrame.schema() and raise a different one (for example, just Exception())? Otherwise the AttributeError will be handled specially, causing an infinite loop.
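The infinite loop can be reproduced outside Spark. The sketch below is a minimal illustration rather than the PySpark code: `Frame`, `SafeFrame`, and `outcome` are hypothetical names, and it assumes a DataFrame-like class whose `__getattr__` resolves column names through `columns`, which in turn reads `schema`. Python calls `__getattr__` whenever normal lookup raises AttributeError, including an AttributeError raised inside a property getter, so a failing `schema` property re-enters itself.

```python
class Frame(object):
    """Hypothetical DataFrame-like class; not the PySpark implementation."""

    def __init__(self, fail):
        self._fail = fail

    @property
    def schema(self):
        if self._fail:
            # Simulates a parser failure that surfaces as AttributeError.
            raise AttributeError("parser blew up")
        return ["age", "name"]

    @property
    def columns(self):
        return list(self.schema)

    def __getattr__(self, name):
        # Invoked only after normal lookup has raised AttributeError.
        if name not in self.columns:
            raise AttributeError(name)
        return "column:" + name


class SafeFrame(Frame):
    """Same class, but schema converts AttributeError to a plain Exception."""

    @property
    def schema(self):
        try:
            return Frame.schema.fget(self)
        except AttributeError as e:
            raise Exception("Unable to parse datatype from schema. %s" % e)


def outcome(frame):
    """Report what happens when frame.schema is accessed."""
    try:
        frame.schema
        return "ok"
    except RecursionError:
        return "RecursionError"
    except Exception as e:
        return "%s: %s" % (type(e).__name__, e)
```

With `Frame(True)`, accessing `schema` falls through to `__getattr__("schema")`, which reads `columns`, which reads `schema` again, and so on until the interpreter's recursion limit is hit. With `SafeFrame(True)` the error is no longer an AttributeError, so it propagates to the caller immediately.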
[GitHub] spark pull request: [SPARK-6390] [SQL] [MLlib] Port MatrixUDT to P...
Github user davies commented on the pull request: https://github.com/apache/spark/pull/6354#issuecomment-104790085 @MechCoder pyUDT should be defined in MatrixUDT
[GitHub] spark pull request: [SPARK-6390] [SQL] [MLlib] Port MatrixUDT to P...
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/6354#discussion_r30934342

--- Diff: python/pyspark/mllib/linalg.py ---
@@ -163,6 +163,59 @@ def simpleString(self):
         return "vector"
 
 
+class MatrixUDT(UserDefinedType):
+    """
+    SQL user-defined type (UDT) for Matrix.
+    """
+
+    @classmethod
+    def sqlType(cls):
+        return StructType([
+            StructField("type", ByteType(), False),
+            StructField("numRows", IntegerType(), False),
+            StructField("numCols", IntegerType(), False),
+            StructField("colPtrs", ArrayType(IntegerType(), False), True),
+            StructField("rowIndices", ArrayType(IntegerType(), False), True),
+            StructField("values", ArrayType(DoubleType(), False), True),
+            StructField("isTransposed", BooleanType(), False)])
+
+    @classmethod
+    def module(cls):
+        return "pyspark.mllib.linalg"
+
+    @classmethod
+    def scalaUDT(cls):
+        return "org.apache.spark.mllib.linalg.MatrixUDT"
+
+    def serialize(self, obj):
+        if isinstance(obj, SparseMatrix):
+            colPtrs = [int(i) for i in obj.colPtrs]
+            rowIndices = [int(i) for i in obj.rowIndices]
+            values = [float(v) for v in obj.values]
+            return (0, obj.numRows, obj.numCols, colPtrs,
+                    rowIndices, values, bool(obj.isTransposed))
+        elif isinstance(obj, DenseMatrix):
+            values = [float(v) for v in obj.values]
+            return (1, obj.numRows, obj.numCols, None, None, values,
+                    bool(obj.isTransposed))
+        else:
+            raise TypeError("cannot serialize %r of type %r" % (obj, type(obj)))
--- End diff --

The `repr(obj)` could be very long; I think just having `type(obj)` is fine.
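The tuple layout in the diff, together with the shorter error message suggested in this comment, can be tried without a Spark installation. This is a rough sketch under assumptions: the two namedtuples are hypothetical stand-ins for the `pyspark.mllib.linalg` matrix classes (they only carry the attributes that `serialize` reads), and `serialize` is written as a free function rather than a `MatrixUDT` method.

```python
from collections import namedtuple

# Hypothetical stand-ins for the real matrix classes; attributes only.
DenseMatrix = namedtuple("DenseMatrix", "numRows numCols values isTransposed")
SparseMatrix = namedtuple(
    "SparseMatrix", "numRows numCols colPtrs rowIndices values isTransposed")


def serialize(obj):
    """Flatten a matrix into the 7-field tuple described by sqlType()."""
    if isinstance(obj, SparseMatrix):
        col_ptrs = [int(i) for i in obj.colPtrs]
        row_indices = [int(i) for i in obj.rowIndices]
        values = [float(v) for v in obj.values]
        # Type tag 0 marks a sparse matrix.
        return (0, obj.numRows, obj.numCols, col_ptrs,
                row_indices, values, bool(obj.isTransposed))
    elif isinstance(obj, DenseMatrix):
        values = [float(v) for v in obj.values]
        # Type tag 1 marks a dense matrix; index arrays are not used.
        return (1, obj.numRows, obj.numCols, None, None, values,
                bool(obj.isTransposed))
    else:
        # Report only the type, so a large object is never printed in full.
        raise TypeError("cannot serialize type %r" % (type(obj),))
```

Passing an unsupported object, such as a long list, now produces a message that names the type only, which keeps the error short no matter how large the object is.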
[GitHub] spark pull request: [SPARK-6390] [SQL] [MLlib] Port MatrixUDT to P...
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/6354#discussion_r30934294

--- Diff: python/pyspark/mllib/linalg.py ---
@@ -163,6 +163,59 @@ def simpleString(self):
         return "vector"
 
 
+class MatrixUDT(UserDefinedType):
+    """
+    SQL user-defined type (UDT) for Matrix.
+    """
+
+    @classmethod
+    def sqlType(cls):
+        return StructType([
+            StructField("type", ByteType(), False),
+            StructField("numRows", IntegerType(), False),
+            StructField("numCols", IntegerType(), False),
+            StructField("colPtrs", ArrayType(IntegerType(), False), True),
+            StructField("rowIndices", ArrayType(IntegerType(), False), True),
+            StructField("values", ArrayType(DoubleType(), False), True),
+            StructField("isTransposed", BooleanType(), False)])
+
+    @classmethod
+    def module(cls):
+        return "pyspark.mllib.linalg"
+
+    @classmethod
+    def scalaUDT(cls):
+        return "org.apache.spark.mllib.linalg.MatrixUDT"
+
+    def serialize(self, obj):
+        if isinstance(obj, SparseMatrix):
+            colPtrs = [int(i) for i in obj.colPtrs]
+            rowIndices = [int(i) for i in obj.rowIndices]
+            values = [float(v) for v in obj.values]
--- End diff --

Do we still need `values` for SparseMatrix? That could be HUGE.
[GitHub] spark pull request: [SPARK-6390] [SQL] [MLlib] Port MatrixUDT to P...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6354#issuecomment-104655938 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/33341/ Test PASSed.
[GitHub] spark pull request: [SPARK-6390] [SQL] [MLlib] Port MatrixUDT to P...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6354#issuecomment-104655924

[Test build #33341 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/33341/consoleFull) for PR 6354 at commit [`62a2a7d`](https://github.com/apache/spark/commit/62a2a7d06aaba477b999e854c27b624123e7a006).
* This patch **passes all tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `class MatrixUDT(UserDefinedType):`
[GitHub] spark pull request: [SPARK-6390] [SQL] [MLlib] Port MatrixUDT to P...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6354#issuecomment-104655937 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-6390] [SQL] [MLlib] Port MatrixUDT to P...
Github user MechCoder commented on the pull request: https://github.com/apache/spark/pull/6354#issuecomment-104635990

The serialization and deserialization work, but trying to create a DataFrame from a Matrix gives me this error:

    RuntimeError: maximum recursion depth exceeded in __instancecheck__

Code to replicate:

    from pyspark.mllib.linalg import DenseMatrix, SparseMatrix, MatrixUDT
    dm1 = DenseMatrix(3, 2, [0, 1, 4, 5, 9, 10])
    sm1 = SparseMatrix(1, 1, [0, 1], [0], [2.0])
    rdd = sc.parallelize([("dense", dm1)])
    rdd.toDF()

This fails with the above-mentioned error.

cc @davies @rxin Any thoughts would be appreciated.
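A plain-Python sketch of the kind of round trip described above, without Spark. It mirrors the dense-matrix tuple layout from the `serialize` method in the diff; the `DenseStandIn` class and the `deserialize_dense` helper are hypothetical, since the PR's `deserialize` is not shown in this thread. It does not reproduce the recursion error, which was reported only for `toDF()`.

```python
from collections import namedtuple

# Hypothetical stand-in for pyspark's DenseMatrix, for illustration only.
DenseStandIn = namedtuple(
    "DenseStandIn", ["numRows", "numCols", "values", "isTransposed"])


def serialize_dense(m):
    # Same field order as the dense branch in the diff:
    # (type, numRows, numCols, colPtrs, rowIndices, values, isTransposed)
    return (1, m.numRows, m.numCols, None, None,
            [float(v) for v in m.values], bool(m.isTransposed))


def deserialize_dense(datum):
    # Assumed inverse of serialize_dense; type tag 1 marks a dense matrix.
    tag, num_rows, num_cols, _, _, values, is_transposed = datum
    if tag != 1:
        raise ValueError("expected dense type tag 1, got %r" % (tag,))
    return DenseStandIn(num_rows, num_cols, values, is_transposed)


dm = DenseStandIn(3, 2, [0.0, 1.0, 4.0, 5.0, 9.0, 10.0], False)
round_tripped = deserialize_dense(serialize_dense(dm))
```

A check such as `round_tripped == dm` exercises only the tuple layout; it says nothing about schema inference, which is where the reported failure occurs.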
[GitHub] spark pull request: [SPARK-6390] [SQL] [MLlib] Port MatrixUDT to P...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6354#issuecomment-104624457 [Test build #33341 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/33341/consoleFull) for PR 6354 at commit [`62a2a7d`](https://github.com/apache/spark/commit/62a2a7d06aaba477b999e854c27b624123e7a006).
[GitHub] spark pull request: [SPARK-6390] [SQL] [MLlib] Port MatrixUDT to P...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6354#issuecomment-104623765 Merged build triggered.
[GitHub] spark pull request: [SPARK-6390] [SQL] [MLlib] Port MatrixUDT to P...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6354#issuecomment-104623787 Merged build started.