[GitHub] spark issue #17732: Branch 2.0
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/17732 ping @tangchun
[GitHub] spark issue #17737: [SPARK-20442][PYTHON][DOCS] Fill up documentations for f...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17737 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76095/ Test PASSed.
[GitHub] spark issue #17737: [SPARK-20442][PYTHON][DOCS] Fill up documentations for f...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17737 Merged build finished. Test PASSed.
[GitHub] spark issue #17737: [SPARK-20442][PYTHON][DOCS] Fill up documentations for f...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17737 **[Test build #76095 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76095/testReport)** for PR 17737 at commit [`2815ff1`](https://github.com/apache/spark/commit/2815ff167b0ce9f6e0d2d6ae9f3d4fb0f3ce94d2).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #17736: [SPARK-20399][SQL] Can't use same regex pattern between ...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/17736 cc @hvanhovell for review ...
[GitHub] spark issue #17739: [SPARK-20443][MLLIB][ML] set ALS blockify size
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17739 **[Test build #76096 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76096/testReport)** for PR 17739 at commit [`78e060e`](https://github.com/apache/spark/commit/78e060e3455ecdc95fdedb6adccc0a375188e2d5).
[GitHub] spark issue #17222: [SPARK-19439][PYSPARK][SQL] PySpark's registerJavaFuncti...
Github user holdenk commented on the issue: https://github.com/apache/spark/pull/17222 @zjffdu - how would you feel about putting the return back, and just plumbing it through as required? It seems like it would be useful for users to be able to do this programmatically (I find myself effectively doing this in some of my own personal notebooks).
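For context, a minimal sketch of the registration under discussion (the Java class name `com.example.StringLengthUDF` is a hypothetical placeholder assumed to be on the classpath); today the call returns `None`, which is what makes a programmatic handle attractive:

```python
# Sketch only: com.example.StringLengthUDF is a hypothetical Java UDF class,
# assumed to be on the driver/executor classpath.
from pyspark.sql import SparkSession, SQLContext

spark = SparkSession.builder.appName("register-java-udf").getOrCreate()
sqlContext = SQLContext(spark.sparkContext)

# Registers the Java UDF under the name "strlen" for use from SQL text.
# The call currently returns None; the return value discussed above would
# give callers a function object to use programmatically.
sqlContext.registerJavaFunction("strlen", "com.example.StringLengthUDF")

spark.sql("SELECT strlen('Spark') AS n").show()
```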
[GitHub] spark pull request #17739: [SPARK-20443][MLLIB][ML] set ALS blockify size
GitHub user mpjlu opened a pull request: https://github.com/apache/spark/pull/17739 [SPARK-20443][MLLIB][ML] set ALS blockify size

## What changes were proposed in this pull request?

The blockSize of MLlib ALS is very important for ALS performance. In our test, a blockSize of 128 is about 4X faster than the default blockSize of 4096. Our test results, as blockSize (recommendationForAll time): 128 (124s), 256 (160s), 512 (184s), 1024 (244s), 2048 (332s), 4096 (488s), 8192 (OOM).

Test environment: 3 workers; each worker has 10 cores, 30 GB memory, and 1 executor. Data: 480,000 users and 17,000 items.

## How was this patch tested?

The existing unit tests.

You can merge this pull request into a Git repository by running:
$ git pull https://github.com/mpjlu/spark setAlsBlockSize
Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/17739.patch
To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #17739

commit 78e060e3455ecdc95fdedb6adccc0a375188e2d5
Author: Peng
Date: 2017-04-24T05:01:13Z
set ALS blockify size
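For readers unfamiliar with the API being benchmarked, here is a minimal sketch (an editor's illustration, not part of the patch) of the MLlib call path the timings above exercise; note the blockify size under discussion is internal to the recommend-for-all computation, not a parameter of this public API:

```python
# Sketch of the benchmarked call path in pyspark.mllib. The blockify size
# discussed in this PR is internal to recommendProductsForUsers and is not
# exposed as a parameter here.
from pyspark import SparkContext
from pyspark.mllib.recommendation import ALS, Rating

sc = SparkContext(appName="als-recommend-for-all")
ratings = sc.parallelize(
    [Rating(0, 0, 4.0), Rating(0, 1, 2.0), Rating(1, 1, 3.0)])

# Train a small factorization model.
model = ALS.train(ratings, rank=10, iterations=10)

# The "recommendationForAll" step timed above: top-3 products per user.
top3 = model.recommendProductsForUsers(3)
print(top3.collect())
```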
[GitHub] spark issue #17736: [SPARK-20399][SQL][WIP] Can't use same regex pattern bet...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17736 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76094/ Test PASSed.
[GitHub] spark issue #17736: [SPARK-20399][SQL][WIP] Can't use same regex pattern bet...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17736 Merged build finished. Test PASSed.
[GitHub] spark issue #17736: [SPARK-20399][SQL][WIP] Can't use same regex pattern bet...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17736 **[Test build #76094 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76094/testReport)** for PR 17736 at commit [`f295782`](https://github.com/apache/spark/commit/f29578219d6eebc9913c359a360ff9eafcb513fc).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #17737: [SPARK-20442][PYTHON][DOCS] Fill up documentations for f...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17737 Merged build finished. Test PASSed.
[GitHub] spark issue #17737: [SPARK-20442][PYTHON][DOCS] Fill up documentations for f...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17737 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76093/ Test PASSed.
[GitHub] spark issue #17737: [SPARK-20442][PYTHON][DOCS] Fill up documentations for f...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17737 **[Test build #76093 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76093/testReport)** for PR 17737 at commit [`af8ac74`](https://github.com/apache/spark/commit/af8ac74b624d54b16339083319e33e8af098655e).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #17737: [SPARK-20442][PYTHON][DOCS] Fill up documentations for f...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17737 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76092/ Test PASSed.
[GitHub] spark issue #17737: [SPARK-20442][PYTHON][DOCS] Fill up documentations for f...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17737 Merged build finished. Test PASSed.
[GitHub] spark issue #17737: [SPARK-20442][PYTHON][DOCS] Fill up documentations for f...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17737 **[Test build #76092 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76092/testReport)** for PR 17737 at commit [`bb5de1f`](https://github.com/apache/spark/commit/bb5de1f2ef66a4775c8d8bc4f632535d45b3f0b4).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #17736: [SPARK-20399][SQL][WIP] Can't use same regex pattern bet...
Github user dbtsai commented on the issue: https://github.com/apache/spark/pull/17736 LGTM. Thanks. @cloud-fan @rxin this fixes our production jobs that broke when we ported our applications from 1.6 to 2.0. I think it's an important bug fix. Thanks.
[GitHub] spark issue #17736: [SPARK-20399][SQL][WIP] Can't use same regex pattern bet...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17736 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76091/ Test PASSed.
[GitHub] spark issue #17736: [SPARK-20399][SQL][WIP] Can't use same regex pattern bet...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17736 Merged build finished. Test PASSed.
[GitHub] spark issue #17736: [SPARK-20399][SQL][WIP] Can't use same regex pattern bet...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17736 **[Test build #76091 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76091/testReport)** for PR 17736 at commit [`a0f4a13`](https://github.com/apache/spark/commit/a0f4a13763c077e57c2dcb5fff12d81f3bb2ceb9).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #17556: [SPARK-16957][MLlib] Use weighted midpoints for split va...
Github user facaiy commented on the issue: https://github.com/apache/spark/pull/17556

I scanned the split criteria of sklearn and xgboost.

1. sklearn considers all continuous values and splits at the mean of adjacent values (commit 5147fd09c6a063188efde444f47bd006fa5f95f0, sklearn/tree/_splitter.pyx:484):
```python
current.threshold = (Xf[p - 1] + Xf[p]) / 2.0
```
2. xgboost (commit 49bdb5c97fccd81b1fdf032eab4599a065c6c4f6):
   + If all continuous values are used as candidates, it uses the mean value (src/tree/updater_colmaker.cc:555):
```c++
e.best.Update(loss_chg, fid, (fvalue + e.last_fvalue) * 0.5f, d_step == -1);
```
   + If the continuous feature is quantized, it uses `cut`. I'm not familiar with C++ and updater_histmaker.cc is a little complicated, so I don't know exactly what `cut` is; however, I guess it is the same as Spark's current split criterion (src/tree/updater_histmaker.cc:194):
```c++
if (best->Update(static_cast(loss_chg), fid, hist.cut[i], false)) {
```

Anyway, in my opinion a weighted mean is more reasonable than the mean or the cut value. This PR is a trivial enhancement to the tree module, and it's not worth spending much time on since the conclusion is obvious. However, we will be more confident if we collect more feedback from experts.
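As a concrete illustration of the two threshold choices above, here is a small self-contained sketch (an editor's example, not code from the PR or either library); the weighted variant is a count-weighted mean of the two adjacent values, in the spirit of the weighted midpoint advocated above:

```python
# Two ways to place a split threshold between adjacent sorted feature values:
# the plain midpoint (as in sklearn's exact splitter) versus a mean weighted
# by how many samples carry each value.

def mean_midpoint(left_value, right_value):
    # sklearn-style: (Xf[p - 1] + Xf[p]) / 2.0
    return (left_value + right_value) / 2.0

def weighted_midpoint(left_value, left_count, right_value, right_count):
    # Count-weighted mean: the threshold is pulled toward the value
    # observed more often.
    total = left_count + right_count
    return (left_value * left_count + right_value * right_count) / total

# 90 samples at feature value 1.0 and 10 samples at 5.0:
print(mean_midpoint(1.0, 5.0))              # 3.0
print(weighted_midpoint(1.0, 90, 5.0, 10))  # 1.4
```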
[GitHub] spark issue #17737: [SPARK-20442][PYTHON][DOCS] Fill up documentations for f...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17737 **[Test build #76095 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76095/testReport)** for PR 17737 at commit [`2815ff1`](https://github.com/apache/spark/commit/2815ff167b0ce9f6e0d2d6ae9f3d4fb0f3ce94d2).
[GitHub] spark issue #17738: [SPARK-20422][Spark Core] Worker registration retries sh...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17738 Can one of the admins verify this patch?
[GitHub] spark pull request #17738: [SPARK-20422][Spark Core] Worker registration ret...
GitHub user unsleepy22 opened a pull request: https://github.com/apache/spark/pull/17738 [SPARK-20422][Spark Core] Worker registration retries should be configurable

## What changes were proposed in this pull request?

Make prolonged registration retries configurable.

## How was this patch tested?

Unit tests, integration tests.

You can merge this pull request into a Git repository by running:
$ git pull https://github.com/unsleepy22/spark SPARK-20422
Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/17738.patch
To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #17738

commit 8bb2d4a37d4db8d8e9c78c41de3328ada30ea693
Author: Cody
Date: 2017-04-24T04:02:43Z
make prolonged registration retries configurable
[GitHub] spark issue #17737: [SPARK-20442][PYTHON][DOCS] Fill up documentations for f...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/17737 cc @srowen, @holdenk, @felixcheung, @map222 and @zero323 who were in related PRs.
[GitHub] spark pull request #17737: [SPARK-20442][PYTHON][DOCS] Fill up documentation...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/17737#discussion_r112861995

--- Diff: python/pyspark/sql/column.py ---
@@ -337,26 +381,39 @@ def isin(self, *cols):
         return Column(jc)

     # order
-    asc = _unary_op("asc", "Returns a sort expression based on the"
-                           " ascending order of the given column name.")
-    desc = _unary_op("desc", "Returns a sort expression based on the"
-                             " descending order of the given column name.")
+    _asc_doc = """
+    Returns a sort expression based on the ascending order of the given column name
+
+    >>> from pyspark.sql import Row
+    >>> df2 = spark.createDataFrame([Row(name=u'Tom', height=80), Row(name=u'Alice', height=None)])
+    >>> df2.select(df2.name).orderBy(df2.name.asc()).collect()
+    [Row(name=u'Alice'), Row(name=u'Tom')]
+    """
+    _desc_doc = """
+    Returns a sort expression based on the descending order of the given column name.
+
+    >>> from pyspark.sql import Row
+    >>> df2 = spark.createDataFrame([Row(name=u'Tom', height=80), Row(name=u'Alice', height=None)])
+    >>> df2.select(df2.name).orderBy(df2.name.desc()).collect()
+    [Row(name=u'Tom'), Row(name=u'Alice')]
+    """
+
+    asc = ignore_unicode_prefix(_unary_op("asc", _asc_doc))
+    desc = ignore_unicode_prefix(_unary_op("desc", _desc_doc))

     _isNull_doc = """
-    True if the current expression is null. Often combined with
-    :func:`DataFrame.filter` to select rows with null values.
--- End diff --

`Often combined with :func:`DataFrame.filter` to select rows with null values.` was removed because it would apply to many other APIs as well and felt like too much. It now simply follows the Scala doc.
[GitHub] spark pull request #17737: [SPARK-20442][PYTHON][DOCS] Fill up documentation...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/17737#discussion_r112861905

--- Diff: python/pyspark/sql/column.py ---
@@ -185,17 +185,52 @@ def __contains__(self, item):
         "in a string column or 'array_contains' function for an array column.")

     # bitwise operators
-    bitwiseOR = _bin_op("bitwiseOR")
-    bitwiseAND = _bin_op("bitwiseAND")
-    bitwiseXOR = _bin_op("bitwiseXOR")
+    _bitwiseOR_doc = """
+    Compute bitwise OR of this expression with another expression.
+
+    :param other: a value or :class:`Column` to calculate bitwise or(|) against
+                  this :class:`Column`.
+
+    >>> from pyspark.sql import Row
+    >>> df3 = spark.createDataFrame([Row(a=170, b=75)])
+    >>> df3.select(df3.a.bitwiseOR(df3.b)).collect()
+    [Row((a | b)=235)]
+    """
+
+    _bitwiseAND_doc = """
+    Compute bitwise AND of this expression with another expression.
+
+    :param other: a value or :class:`Column` to calculate bitwise and(&) against
+                  this :class:`Column`.
+
+    >>> from pyspark.sql import Row
+    >>> df3 = spark.createDataFrame([Row(a=170, b=75)])
+    >>> df3.select(df3.a.bitwiseAND(df3.b)).collect()
+    [Row((a & b)=10)]
+    """
+
+    _bitwiseXOR_doc = """
+    Compute bitwise XOR of this expression with another expression.
+
+    :param other: a value or :class:`Column` to calculate bitwise xor(^) against
+                  this :class:`Column`.
+
+    >>> from pyspark.sql import Row
+    >>> df3 = spark.createDataFrame([Row(a=170, b=75)])
+    >>> df3.select(df3.a.bitwiseXOR(df3.b)).collect()
+    [Row((a ^ b)=225)]
+    """
--- End diff --

This matches the Scala one:

> Compute bitwise XOR of this expression with another expression
[GitHub] spark pull request #17737: [SPARK-20442][PYTHON][DOCS] Fill up documentation...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/17737#discussion_r112861876

--- Diff: python/pyspark/sql/column.py ---
@@ -185,17 +185,52 @@ def __contains__(self, item):
         "in a string column or 'array_contains' function for an array column.")

     # bitwise operators
-    bitwiseOR = _bin_op("bitwiseOR")
-    bitwiseAND = _bin_op("bitwiseAND")
-    bitwiseXOR = _bin_op("bitwiseXOR")
+    _bitwiseOR_doc = """
+    Compute bitwise OR of this expression with another expression.
+
+    :param other: a value or :class:`Column` to calculate bitwise or(|) against
+                  this :class:`Column`.
+
+    >>> from pyspark.sql import Row
+    >>> df3 = spark.createDataFrame([Row(a=170, b=75)])
+    >>> df3.select(df3.a.bitwiseOR(df3.b)).collect()
+    [Row((a | b)=235)]
+    """
--- End diff --

This matches the Scala one:

> Compute bitwise OR of this expression with another expression
[GitHub] spark issue #17736: [SPARK-20399][SQL][WIP] Can't use same regex pattern bet...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17736 **[Test build #76094 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76094/testReport)** for PR 17736 at commit [`f295782`](https://github.com/apache/spark/commit/f29578219d6eebc9913c359a360ff9eafcb513fc).
[GitHub] spark pull request #17737: [SPARK-20442][PYTHON][DOCS] Fill up documentation...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/17737#discussion_r112861887

--- Diff: python/pyspark/sql/column.py ---
@@ -185,17 +185,52 @@ def __contains__(self, item):
         "in a string column or 'array_contains' function for an array column.")

     # bitwise operators
-    bitwiseOR = _bin_op("bitwiseOR")
-    bitwiseAND = _bin_op("bitwiseAND")
-    bitwiseXOR = _bin_op("bitwiseXOR")
+    _bitwiseOR_doc = """
+    Compute bitwise OR of this expression with another expression.
+
+    :param other: a value or :class:`Column` to calculate bitwise or(|) against
+                  this :class:`Column`.
+
+    >>> from pyspark.sql import Row
+    >>> df3 = spark.createDataFrame([Row(a=170, b=75)])
+    >>> df3.select(df3.a.bitwiseOR(df3.b)).collect()
+    [Row((a | b)=235)]
+    """
+
+    _bitwiseAND_doc = """
+    Compute bitwise AND of this expression with another expression.
+
+    :param other: a value or :class:`Column` to calculate bitwise and(&) against
+                  this :class:`Column`.
+
+    >>> from pyspark.sql import Row
+    >>> df3 = spark.createDataFrame([Row(a=170, b=75)])
+    >>> df3.select(df3.a.bitwiseAND(df3.b)).collect()
+    [Row((a & b)=10)]
+    """
--- End diff --

This matches the Scala one:

> Compute bitwise AND of this expression with another expression
[GitHub] spark pull request #17737: [SPARK-20442][PYTHON][DOCS] Fill up documentation...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/17737#discussion_r112860925

--- Diff: python/pyspark/sql/column.py ---
@@ -185,17 +185,52 @@ def __contains__(self, item):
         "in a string column or 'array_contains' function for an array column.")

     # bitwise operators
-    bitwiseOR = _bin_op("bitwiseOR")
-    bitwiseAND = _bin_op("bitwiseAND")
-    bitwiseXOR = _bin_op("bitwiseXOR")
+    _bitwiseOR_doc = """
+    Compute bitwise OR of this expression with another expression.
+
+    :param other: a value or :class:`Column` to calculate bitwise or(|) against
+                  this :class:`Column`.
+
+    >>> from pyspark.sql import Row
+    >>> df3 = spark.createDataFrame([Row(a=170, b=75)])
+    >>> df3.select(df3.a.bitwiseOR(df3.b)).collect()
+    [Row((a | b)=235)]
+    """
+
+    _bitwiseAND_doc = """
+    Compute bitwise AND of this expression with another expression.
+
+    :param other: a value or :class:`Column` to calculate bitwise and(&) against
+                  this :class:`Column`.
+
+    >>> from pyspark.sql import Row
+    >>> df3 = spark.createDataFrame([Row(a=170, b=75)])
+    >>> df3.select(df3.a.bitwiseAND(df3.b)).collect()
+    [Row((a & b)=10)]
+    """
--- End diff --

![2017-04-24 12 43 26](https://cloud.githubusercontent.com/assets/6477701/25321715/b64d798a-28eb-11e7-9e0f-96563c9717b4.png)
[GitHub] spark pull request #17737: [SPARK-20442][PYTHON][DOCS] Fill up documentation...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/17737#discussion_r112861613

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/bitwiseExpressions.scala ---
@@ -86,7 +86,7 @@ case class BitwiseOr(left: Expression, right: Expression) extends BinaryArithmet
 }

 /**
- * A function that calculates bitwise xor of two numbers.
+ * A function that calculates bitwise xor({@literal ^}) of two numbers.
--- End diff --

Matching it up with `BitwiseAnd` and `BitwiseOr` where

> A function that calculates bitwise and(&) of two numbers.
> A function that calculates bitwise or(|) of two numbers.
[GitHub] spark pull request #17737: [SPARK-20442][PYTHON][DOCS] Fill up documentation...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/17737#discussion_r112860980

--- Diff: python/pyspark/sql/column.py ---
@@ -251,15 +286,16 @@ def __iter__(self):

     # string methods
     _rlike_doc = """
-    Return a Boolean :class:`Column` based on a regex match.
+    SQL RLIKE expression (LIKE with Regex). Returns a boolean :class:`Column` based on a regex
+    match.

     :param other: an extended regex expression

     >>> df.filter(df.name.rlike('ice$')).collect()
     [Row(age=2, name=u'Alice')]
     """
--- End diff --

![2017-04-24 12 44 44](https://cloud.githubusercontent.com/assets/6477701/25321726/ce6f8f62-28eb-11e7-9fbe-dc6321e00e77.png)
[GitHub] spark pull request #17737: [SPARK-20442][PYTHON][DOCS] Fill up documentation...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/17737#discussion_r112861744

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Column.scala ---
@@ -1008,7 +1009,7 @@ class Column(val expr: Expression) extends Logging {
   def cast(to: String): Column = cast(CatalystSqlParser.parseDataType(to))

   /**
-   * Returns an ordering used in sorting.
+   * Returns a sort expression based on the descending order of the column.
--- End diff --

This and the similar instances below are matched with `functions.scala`; they appear to call the same functions.

> Returns a sort expression based on the descending order of the column.
> Returns a sort expression based on the descending order of the column,
> and null values appear before non-null values.
> Returns a sort expression based on the descending order of the column,
> and null values appear after non-null values.
[GitHub] spark pull request #17737: [SPARK-20442][PYTHON][DOCS] Fill up documentation...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/17737#discussion_r112861531

--- Diff: python/pyspark/sql/column.py ---
@@ -337,26 +381,39 @@ def isin(self, *cols):
         return Column(jc)

     # order
-    asc = _unary_op("asc", "Returns a sort expression based on the"
-                           " ascending order of the given column name.")
-    desc = _unary_op("desc", "Returns a sort expression based on the"
-                             " descending order of the given column name.")
+    _asc_doc = """
+    Returns a sort expression based on the ascending order of the given column name
+
+    >>> from pyspark.sql import Row
+    >>> df2 = spark.createDataFrame([Row(name=u'Tom', height=80), Row(name=u'Alice', height=None)])
+    >>> df2.select(df2.name).orderBy(df2.name.asc()).collect()
+    [Row(name=u'Alice'), Row(name=u'Tom')]
+    """
--- End diff --

![2017-04-24 12 54 55](https://cloud.githubusercontent.com/assets/6477701/25321941/5903bdbe-28ed-11e7-8d08-5fbab1411f02.png)
[GitHub] spark pull request #17737: [SPARK-20442][PYTHON][DOCS] Fill up documentation...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/17737#discussion_r112861035

--- Diff: python/pyspark/sql/column.py ---
@@ -288,8 +324,16 @@ def __iter__(self):

     >>> df.filter(df.name.endswith('ice$')).collect()
     []
     """
+    _contains_doc = """
+    Contains the other element. Returns a boolean :class:`Column` based on a string match.
+
+    :param other: string in line
+
+    >>> df.filter(df.name.contains('o')).collect()
+    [Row(age=5, name=u'Bob')]
+    """
--- End diff --

![2017-04-24 12 45 57](https://cloud.githubusercontent.com/assets/6477701/25321748/fba744ca-28eb-11e7-9e40-534cda541f90.png)
[GitHub] spark pull request #17737: [SPARK-20442][PYTHON][DOCS] Fill up documentation...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/17737#discussion_r112861532

--- Diff: python/pyspark/sql/column.py ---
@@ -337,26 +381,39 @@ def isin(self, *cols):
         return Column(jc)

     # order
-    asc = _unary_op("asc", "Returns a sort expression based on the"
-                           " ascending order of the given column name.")
-    desc = _unary_op("desc", "Returns a sort expression based on the"
-                             " descending order of the given column name.")
+    _asc_doc = """
+    Returns a sort expression based on the ascending order of the given column name
+
+    >>> from pyspark.sql import Row
+    >>> df2 = spark.createDataFrame([Row(name=u'Tom', height=80), Row(name=u'Alice', height=None)])
+    >>> df2.select(df2.name).orderBy(df2.name.asc()).collect()
+    [Row(name=u'Alice'), Row(name=u'Tom')]
+    """
+    _desc_doc = """
+    Returns a sort expression based on the descending order of the given column name.
+
+    >>> from pyspark.sql import Row
+    >>> df2 = spark.createDataFrame([Row(name=u'Tom', height=80), Row(name=u'Alice', height=None)])
+    >>> df2.select(df2.name).orderBy(df2.name.desc()).collect()
+    [Row(name=u'Tom'), Row(name=u'Alice')]
+    """
--- End diff --

![2017-04-24 12 55 17](https://cloud.githubusercontent.com/assets/6477701/25321944/5d1dfa4a-28ed-11e7-9fa3-e8741e492b36.png)
[GitHub] spark pull request #17737: [SPARK-20442][PYTHON][DOCS] Fill up documentation...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/17737#discussion_r112861399

--- Diff: python/pyspark/sql/column.py ---
@@ -269,17 +305,17 @@ def __iter__(self):
     [Row(age=2, name=u'Alice')]
     """
     _startswith_doc = """
-    Return a Boolean :class:`Column` based on a string match.
+    String starts with. Returns a boolean :class:`Column` based on a string match.

-    :param other: string at end of line (do not use a regex `^`)
+    :param other: string at start of line (do not use a regex `^`)

     >>> df.filter(df.name.startswith('Al')).collect()
     [Row(age=2, name=u'Alice')]
     >>> df.filter(df.name.startswith('^Al')).collect()
     []
     """
     _endswith_doc = """
-    Return a Boolean :class:`Column` based on matching end of string.
+    String ends with. Returns a boolean :class:`Column` based on a string match.
--- End diff --

![2017-04-24 12 45 36](https://cloud.githubusercontent.com/assets/6477701/25321740/edfb521c-28eb-11e7-833d-975bf59091bf.png)
[GitHub] spark pull request #17737: [SPARK-20442][PYTHON][DOCS] Fill up documentation...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/17737#discussion_r112861566

--- Diff: python/pyspark/sql/column.py ---
@@ -527,7 +584,7 @@ def _test():
         .appName("sql.column tests")\
         .getOrCreate()
     sc = spark.sparkContext
-    globs['sc'] = sc
+    globs['spark'] = spark
--- End diff --

I removed `sc` and replaced it with `spark`, since that is the usage we promote, to the best of my knowledge.
[GitHub] spark pull request #17737: [SPARK-20442][PYTHON][DOCS] Fill up documentation...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/17737#discussion_r112860981

--- Diff: python/pyspark/sql/column.py ---
@@ -269,17 +305,17 @@ def __iter__(self):
     [Row(age=2, name=u'Alice')]
     """
--- End diff --

![2017-04-24 12 45 10](https://cloud.githubusercontent.com/assets/6477701/25321732/de5784ca-28eb-11e7-8084-ac5a26b6a5a6.png)
[GitHub] spark pull request #17737: [SPARK-20442][PYTHON][DOCS] Fill up documentation...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/17737#discussion_r112860906

--- Diff: python/pyspark/sql/column.py ---
@@ -185,17 +185,52 @@ def __contains__(self, item):
         "in a string column or 'array_contains' function for an array column.")

     # bitwise operators
-    bitwiseOR = _bin_op("bitwiseOR")
-    bitwiseAND = _bin_op("bitwiseAND")
-    bitwiseXOR = _bin_op("bitwiseXOR")
+    _bitwiseOR_doc = """
+    Compute bitwise OR of this expression with another expression.
+
+    :param other: a value or :class:`Column` to calculate bitwise or(|) against
+                  this :class:`Column`.
+
+    >>> from pyspark.sql import Row
+    >>> df3 = spark.createDataFrame([Row(a=170, b=75)])
+    >>> df3.select(df3.a.bitwiseOR(df3.b)).collect()
+    [Row((a | b)=235)]
+    """
--- End diff --

![2017-04-24 12 43 22](https://cloud.githubusercontent.com/assets/6477701/25321711/abaf659c-28eb-11e7-9289-e548489e0b27.png)
[GitHub] spark pull request #17737: [SPARK-20442][PYTHON][DOCS] Fill up documentation...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/17737#discussion_r112860927

--- Diff: python/pyspark/sql/column.py ---
@@ -185,17 +185,52 @@ def __contains__(self, item):
         "in a string column or 'array_contains' function for an array column.")

     # bitwise operators
-    bitwiseOR = _bin_op("bitwiseOR")
-    bitwiseAND = _bin_op("bitwiseAND")
-    bitwiseXOR = _bin_op("bitwiseXOR")
+    _bitwiseOR_doc = """
+    Compute bitwise OR of this expression with another expression.
+
+    :param other: a value or :class:`Column` to calculate bitwise or(|) against
+                  this :class:`Column`.
+
+    >>> from pyspark.sql import Row
+    >>> df3 = spark.createDataFrame([Row(a=170, b=75)])
+    >>> df3.select(df3.a.bitwiseOR(df3.b)).collect()
+    [Row((a | b)=235)]
+    """
+
+    _bitwiseAND_doc = """
+    Compute bitwise AND of this expression with another expression.
+
+    :param other: a value or :class:`Column` to calculate bitwise and(&) against
+                  this :class:`Column`.
+
+    >>> from pyspark.sql import Row
+    >>> df3 = spark.createDataFrame([Row(a=170, b=75)])
+    >>> df3.select(df3.a.bitwiseAND(df3.b)).collect()
+    [Row((a & b)=10)]
+    """
+
+    _bitwiseXOR_doc = """
+    Compute bitwise XOR of this expression with another expression.
+
+    :param other: a value or :class:`Column` to calculate bitwise xor(^) against
+                  this :class:`Column`.
+
+    >>> from pyspark.sql import Row
+    >>> df3 = spark.createDataFrame([Row(a=170, b=75)])
+    >>> df3.select(df3.a.bitwiseXOR(df3.b)).collect()
+    [Row((a ^ b)=225)]
+    """
--- End diff --

![2017-04-24 12 43 31](https://cloud.githubusercontent.com/assets/6477701/25321719/bac73726-28eb-11e7-829a-1675f51dd6b6.png)
[GitHub] spark issue #17737: [SPARK-20442][PYTHON][DOCS] Fill up documentations for f...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17737 **[Test build #76093 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76093/testReport)** for PR 17737 at commit [`af8ac74`](https://github.com/apache/spark/commit/af8ac74b624d54b16339083319e33e8af098655e).
[GitHub] spark issue #17649: [SPARK-20380][SQL] Output table comment for DESC FORMATT...
Github user wzhfy commented on the issue: https://github.com/apache/spark/pull/17649 @gatorsmile Hive treats comment simply as a key in the string-string parameter map, while Spark extracts comment from the map as a field in `CatalogTable`. So the question is, should Spark consider both `comment` and `COMMENT` as the table comment? Here are the results from Hive:
```
0: jdbc:hive2://.../> create table src (key int, value string) comment "initial comment";
No rows affected (1.055 seconds)
0: jdbc:hive2://.../> desc formatted src;
+--------------------------------+--------------------------------------------------------+-------------------+
| col_name                       | data_type                                              | comment           |
+--------------------------------+--------------------------------------------------------+-------------------+
| # col_name                     | data_type                                              | comment           |
|                                | NULL                                                   | NULL              |
| key                            | int                                                    |                   |
| value                          | string                                                 |                   |
|                                | NULL                                                   | NULL              |
| # Detailed Table Information   | NULL                                                   | NULL              |
| Database:                      | wzh                                                    | NULL              |
| Owner:                         | spark                                                  | NULL              |
| CreateTime:                    | Mon Apr 24 11:43:40 CST 2017                           | NULL              |
| LastAccessTime:                | UNKNOWN                                                | NULL              |
| Protect Mode:                  | None                                                   | NULL              |
| Retention:                     | 0                                                      | NULL              |
| Location:                      | hdfs://hacluster/user/hive/warehouse/wzh.db/src        | NULL              |
| Table Type:                    | MANAGED_TABLE                                          | NULL              |
| Table Parameters:              | NULL                                                   | NULL              |
|                                | comment                                                | initial comment   |
|                                | transient_lastDdlTime                                  | 1493005420        |
|                                | NULL                                                   | NULL              |
| # Storage Information          | NULL                                                   | NULL              |
| SerDe Library:                 | org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe   | NULL              |
| InputFormat:                   | org.apache.hadoop.hive.ql.io.RCFileInputFormat         | NULL              |
| OutputFormat:                  | org.apache.hadoop.hive.ql.io.RCFileOutputFormat        | NULL              |
| Compressed:                    | No                                                     | NULL              |
| Num Buckets:                   | -1                                                     | NULL              |
| Bucket Columns:                | []                                                     | NULL              |
| Sort Columns:                  | []                                                     | NULL              |
| Storage Desc Params:           | NULL                                                   | NULL              |
|                                | serialization.format                                   | 1                 |
+--------------------------------+--------------------------------------------------------+-------------------+
28 rows selected (0.525 seconds)
0: jdbc:hive2://.../> alter table src set tblproperties("comment"="new comment", "COMMENT"="NEW COMMENT");
No rows affected (0.62 seconds)
0: jdbc:hive2://.../> desc formatted src;
+--------------------------------+--------------------------------------------------------+-------------------+
| col_name                       | data_type                                              | comment           |
+--------------------------------+--------------------------------------------------------+-------------------+
| # col_name
```
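On the Spark side, the same question can be reproduced with a short PySpark session; this is a hedged sketch (an editor's illustration, assuming a Hive-enabled build, with the table name mirroring the Hive session above):

```python
# Sketch: reproduce the "comment" vs "COMMENT" question from Spark SQL.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("table-comment-check")
         .enableHiveSupport()
         .getOrCreate())

spark.sql("CREATE TABLE src (key INT, value STRING) COMMENT 'initial comment'")
spark.sql("ALTER TABLE src SET TBLPROPERTIES "
          "('comment'='new comment', 'COMMENT'='NEW COMMENT')")

# Does Spark surface the lowercase 'comment' key as the table comment, the
# uppercase one as an ordinary property, or both as properties?
spark.sql("DESC FORMATTED src").show(100, truncate=False)
```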
[GitHub] spark issue #17737: [SPARK-20442][PYTHON][DOCS] Fill up documentations for f...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17737 **[Test build #76092 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76092/testReport)** for PR 17737 at commit [`bb5de1f`](https://github.com/apache/spark/commit/bb5de1f2ef66a4775c8d8bc4f632535d45b3f0b4).
[GitHub] spark pull request #17737: [SPARK-20442][PYTHON][DOCS] Fill up documentation...
GitHub user HyukjinKwon opened a pull request: https://github.com/apache/spark/pull/17737 [SPARK-20442][PYTHON][DOCS] Fill up documentations for functions in Column API in PySpark

## What changes were proposed in this pull request?

This PR proposes to fill up the documentation with examples for `bitwiseOR`, `bitwiseAND`, `bitwiseXOR`, `contains`, `asc`, and `desc` in the `Column` API. Also, this PR fixes minor typos in the documentation and matches some of the contents between the Scala doc and the Python doc. Lastly, this PR suggests using `spark` rather than `sc` in doc tests.

## How was this patch tested?

Doc tests were added and manually tested with the commands below:

`./python/run-tests.py --module pyspark-sql`
`./dev/lint-python`

Output was checked via `make html` under `./python/docs`. The snapshots will be left on the code with comments.

You can merge this pull request into a Git repository by running:
$ git pull https://github.com/HyukjinKwon/spark SPARK-20442
Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/17737.patch
To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #17737

commit bb5de1f2ef66a4775c8d8bc4f632535d45b3f0b4
Author: hyukjinkwon
Date: 2017-04-24T01:48:06Z
Fill up documentations for functions in Column API in PySpark
[GitHub] spark issue #17480: [SPARK-20079][Core][yarn] Re registration of AM hangs sp...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17480 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76089/ Test PASSed.
[GitHub] spark issue #17480: [SPARK-20079][Core][yarn] Re registration of AM hangs sp...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17480 Merged build finished. Test PASSed.
[GitHub] spark issue #17480: [SPARK-20079][Core][yarn] Re registration of AM hangs sp...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17480 **[Test build #76089 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76089/testReport)** for PR 17480 at commit [`d3e69cf`](https://github.com/apache/spark/commit/d3e69cf66d77ba02cfa13e8e27273e59248885f1).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #15125: [SPARK-5484][GraphX] Periodically do checkpoint in Prege...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15125 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76088/ Test PASSed.
[GitHub] spark issue #15125: [SPARK-5484][GraphX] Periodically do checkpoint in Prege...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15125 Merged build finished. Test PASSed.
[GitHub] spark issue #15125: [SPARK-5484][GraphX] Periodically do checkpoint in Prege...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15125 **[Test build #76088 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76088/testReport)** for PR 15125 at commit [`ec62659`](https://github.com/apache/spark/commit/ec6265986cb91585c0a6fdbc0c9675ec9fbba613). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #17736: [SPARK-20399][SQL][WIP] Can't use same regex pattern bet...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17736 **[Test build #76091 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76091/testReport)** for PR 17736 at commit [`a0f4a13`](https://github.com/apache/spark/commit/a0f4a13763c077e57c2dcb5fff12d81f3bb2ceb9).
[GitHub] spark issue #17736: [SPARK-20399][SQL][WIP] Can't use same regex pattern bet...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/17736 Let's see if it breaks any existing tests.
[GitHub] spark issue #17736: [SPARK-20399][SQL][WIP] Can't use same regex pattern bet...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/17736 cc @dbtsai @hvanhovell
[GitHub] spark pull request #17736: [SPARK-20399][SQL][WIP] Can't use same regex patt...
GitHub user viirya opened a pull request: https://github.com/apache/spark/pull/17736 [SPARK-20399][SQL][WIP] Can't use same regex pattern between 1.6 and 2.x due to unescaped sql string in parser ## What changes were proposed in this pull request? The new SQL parser was introduced in Spark 2.0. It seems to bring an issue with regex pattern strings. The following code reproduces it:

```scala
val data = Seq("\u0020\u0021\u0023", "abc")
val df = data.toDF()

// 1st usage: let the parser parse the pattern string. Works in 1.6.
val rlike1 = df.filter("value rlike '^\\x20[\\x20-\\x23]+$'")

// 2nd usage: call Column.rlike, so the pattern string is a literal that
// doesn't go through the parser. Works in both 1.6 and 2.x.
val rlike2 = df.filter($"value".rlike("^\\x20[\\x20-\\x23]+$"))

// To make the 1st usage work in 2.x, we have to double the backslashes:
val rlike3 = df.filter("value rlike '^\\\\x20[\\\\x20-\\\\x23]+$'")
```

Because the parser unescapes the SQL string, the first usage, which works in 1.6, doesn't work in 2.0; to make it work, additional backslashes have to be added. It is quite confusing that the same regex pattern string can't be used in both usages. We should not unescape regex pattern strings. ## How was this patch tested? Jenkins tests. Please review http://spark.apache.org/contributing.html before opening a pull request. You can merge this pull request into a Git repository by running: $ git pull https://github.com/viirya/spark-1 rlike-regex Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/17736.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #17736 commit a0f4a13763c077e57c2dcb5fff12d81f3bb2ceb9 Author: Liang-Chi Hsieh Date: 2017-04-19T01:49:47Z Don't unescape regex pattern string.
[GitHub] spark issue #17708: [SPARK-20413] Add new query hint NO_COLLAPSE.
Github user viirya commented on the issue: https://github.com/apache/spark/pull/17708 I have the same question as Reynold asked in the mailing list. Doesn't common subexpression elimination already address this issue?
[GitHub] spark issue #17540: [SPARK-20213][SQL][UI] Fix DataFrameWriter operations in...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/17540 This PR will change the Spark UI. For a simple query `Seq(1 -> "a").toDF("i", "j").write.parquet("/tmp/a")`, the SQL tab of the Spark UI previously showed: https://cloud.githubusercontent.com/assets/3182036/25320581/fd74467e-28da-11e7-80ec-efb4af8a2cdb.png After this PR it shows: https://cloud.githubusercontent.com/assets/3182036/25320591/116864a8-28db-11e7-9115-cf0bac552fdf.png I'm not sure which one is better; it depends on how users expect the Spark SQL UI to present a write operation. cc @zsxwing
[GitHub] spark issue #17733: [SPARK-20425][SQL] Support an extended display mode for ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17733 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76087/ Test PASSed.
[GitHub] spark issue #17733: [SPARK-20425][SQL] Support an extended display mode for ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17733 Merged build finished. Test PASSed.
[GitHub] spark issue #17733: [SPARK-20425][SQL] Support an extended display mode for ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17733 **[Test build #76087 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76087/testReport)** for PR 17733 at commit [`ca8bfbd`](https://github.com/apache/spark/commit/ca8bfbd4f55962773b037c804f827d4f06d95cdd). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #17623: [SPARK-20292][SQL] Clean up string representation...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/17623#discussion_r112856025 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeExtractors.scala --- @@ -111,6 +111,11 @@ case class GetStructField(child: Expression, ordinal: Int, name: Option[String] override def dataType: DataType = childSchema(ordinal).dataType override def nullable: Boolean = child.nullable || childSchema(ordinal).nullable + override def verboseString: String = { --- End diff -- We rarely call `Expression.verboseString` directly. It is mostly called by `treeString` to show the individual nodes in the tree.
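For readers following the thread, one quick way to see where `verboseString` surfaces, sketched under the assumption of a Spark 2.x session named `spark` (the query itself is illustrative):

```scala
// treeString(verbose = true) renders each plan node via verboseString,
// which is where GetStructField's overridden output would appear.
val df = spark.range(3)
  .selectExpr("struct(id, id * 2 AS twice) AS s")
  .select("s.twice")

println(df.queryExecution.optimizedPlan.treeString(verbose = true))
```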
[GitHub] spark pull request #17623: [SPARK-20292][SQL] Clean up string representation...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/17623#discussion_r112854488 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeExtractors.scala --- @@ -111,6 +111,11 @@ case class GetStructField(child: Expression, ordinal: Int, name: Option[String] override def dataType: DataType = childSchema(ordinal).dataType override def nullable: Boolean = child.nullable || childSchema(ordinal).nullable + override def verboseString: String = { --- End diff -- I don't think the `verboseString` here provides a better string representation than `toString`. When would we call `verboseString`?
[GitHub] spark issue #17728: [SPARK-20437][R] R wrappers for rollup and cube
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17728 Merged build finished. Test PASSed.
[GitHub] spark issue #17728: [SPARK-20437][R] R wrappers for rollup and cube
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17728 **[Test build #76090 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76090/testReport)** for PR 17728 at commit [`a320327`](https://github.com/apache/spark/commit/a3203272c5ce9dc1a9f923180dcfe00e6665d102). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #17728: [SPARK-20437][R] R wrappers for rollup and cube
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17728 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76090/ Test PASSed.
[GitHub] spark pull request #17730: [SPARK-20439] [SQL] Fix Catalog API listTables an...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/17730#discussion_r112854341 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/internal/CatalogImpl.scala --- @@ -197,7 +211,11 @@ class CatalogImpl(sparkSession: SparkSession) extends Catalog { * `AnalysisException` when no `Table` can be found. */ override def getTable(dbName: String, tableName: String): Table = { -makeTable(TableIdentifier(tableName, Option(dbName))) +if (tableExists(dbName, tableName)) { + makeTable(TableIdentifier(tableName, Option(dbName))) +} else { + throw new AnalysisException(s"Table or view '$tableName' not found in database '$dbName'") --- End diff -- The doc says `This throws an AnalysisException when no Table can be found.` — I think we should not change this behavior.
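The documented contract, as a caller would observe it (a sketch; the table names are illustrative):

```scala
import org.apache.spark.sql.AnalysisException

// getTable should keep throwing AnalysisException for a missing table,
// rather than returning a sentinel value.
try {
  spark.catalog.getTable("default", "no_such_table")
} catch {
  case e: AnalysisException => println(s"as documented: ${e.getMessage}")
}
```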
[GitHub] spark pull request #17729: [SPARK-20438][R] SparkR wrappers for split and re...
Github user zero323 commented on a diff in the pull request: https://github.com/apache/spark/pull/17729#discussion_r112853834 --- Diff: R/pkg/inst/tests/testthat/test_sparkSQL.R --- @@ -1546,6 +1546,40 @@ test_that("string operators", { expect_equal(collect(select(df3, substring_index(df3$a, ".", 2)))[1, 1], "a.b") expect_equal(collect(select(df3, substring_index(df3$a, ".", -3)))[1, 1], "b.c.d") expect_equal(collect(select(df3, translate(df3$a, "bc", "12")))[1, 1], "a.1.2.d") + + l4 <- list(list(a = "a.b@c.d 1\\b")) + df4 <- createDataFrame(l4) + expect_equal( +collect(select(df4, split_string(df4$a, "\\s+")))[1, 1], +list(list("a.b@c.d", "1\\b")) + ) + expect_equal( +collect(select(df4, split_string(df4$a, "\\.")))[1, 1], +list(list("a", "b@c", "d 1\\b")) + ) + expect_equal( +collect(select(df4, split_string(df4$a, "@")))[1, 1], +list(list("a.b", "c.d 1\\b")) + ) + expect_equal( +collect(select(df4, split_string(df4$a, "")))[1, 1], +list(list("a.b@c.d 1", "b")) + ) + + l5 <- list(list(a = "abc")) + df5 <- createDataFrame(l5) + expect_equal( +collect(select(df5, repeat_string(df5$a, 1L)))[1, 1], +"abc" + ) + expect_equal( +collect(select(df5, repeat_string(df5$a, 3)))[1, 1], +"abcabcabc" + ) + expect_equal( +collect(select(df5, repeat_string(df5$a, -1)))[1, 1], --- End diff -- Right? I think we should keep it this way to avoid any confusion when users switch between SQL and DSL. If anything changes it will cause a test failure and then we can add R-side checks.
[GitHub] spark pull request #17729: [SPARK-20438][R] SparkR wrappers for split and re...
Github user zero323 commented on a diff in the pull request: https://github.com/apache/spark/pull/17729#discussion_r112853719 --- Diff: R/pkg/R/functions.R --- @@ -3745,3 +3745,55 @@ setMethod("collect_set", jc <- callJStatic("org.apache.spark.sql.functions", "collect_set", x@jc) column(jc) }) + +#' split_string +#' +#' Splits string on regular expression. +#' +#' @param x Column to compute on +#' @param pattern Java regular expression +#' +#' @rdname split_string +#' @family string_funcs +#' @aliases split_string,Column-method +#' @export +#' @examples \dontrun{ +#' df <- read.text("README.md") +#' +#' head(select(split_string(df$value, "\\s+"))) +#' } +#' @note split_string 2.3.0 +#' @note equivalent to \code{split} SQL function +setMethod("split_string", + signature(x = "Column", pattern = "character"), + function(x, pattern) { +jc <- callJStatic("org.apache.spark.sql.functions", "split", x@jc, pattern) +column(jc) + }) + +#' repeat_string +#' +#' Repeats string n times. +#' +#' @param x Column to compute on +#' @param n Number of repetitions +#' +#' @rdname repeat_string +#' @family string_funcs +#' @aliases repeat_string,Column-method +#' @export +#' @examples \dontrun{ +#' df <- createDataFame(data.frame( +#' text = c("foo", "bar") +#' )) +#' +#' head(select(repeat_string(df$text, 3))) +#' } +#' @note repeat_string 2.3.0 +#' @note equivalent to \code{repeat} SQL function +setMethod("repeat_string", + signature(x = "Column", n = "numeric"), + function(x, n) { +jc <- callJStatic("org.apache.spark.sql.functions", "repeat", x@jc, as.integer(n)) --- End diff -- That's useful.
[GitHub] spark pull request #17729: [SPARK-20438][R] SparkR wrappers for split and re...
Github user zero323 commented on a diff in the pull request: https://github.com/apache/spark/pull/17729#discussion_r112853686 --- Diff: R/pkg/R/functions.R --- @@ -3745,3 +3745,55 @@ setMethod("collect_set", jc <- callJStatic("org.apache.spark.sql.functions", "collect_set", x@jc) column(jc) }) + +#' split_string +#' +#' Splits string on regular expression. +#' +#' @param x Column to compute on +#' @param pattern Java regular expression +#' +#' @rdname split_string +#' @family string_funcs +#' @aliases split_string,Column-method +#' @export +#' @examples \dontrun{ +#' df <- read.text("README.md") +#' +#' head(select(split_string(df$value, "\\s+"))) +#' } +#' @note split_string 2.3.0 +#' @note equivalent to \code{split} SQL function --- End diff -- That's cool :) I'm not convinced about the linking though; the Scala docs are not very useful. I considered adding an `expr` or `selectExpr` version to the examples:

```r
selectExpr(df, "split(value, '@')")
```
[GitHub] spark pull request #17729: [SPARK-20438][R] SparkR wrappers for split and re...
Github user zero323 commented on a diff in the pull request: https://github.com/apache/spark/pull/17729#discussion_r112853256 --- Diff: R/pkg/R/functions.R --- @@ -3745,3 +3745,55 @@ setMethod("collect_set", jc <- callJStatic("org.apache.spark.sql.functions", "collect_set", x@jc) column(jc) }) + +#' split_string +#' +#' Splits string on regular expression. +#' +#' @param x Column to compute on +#' @param pattern Java regular expression +#' +#' @rdname split_string +#' @family string_funcs +#' @aliases split_string,Column-method +#' @export +#' @examples \dontrun{ +#' df <- read.text("README.md") +#' +#' head(select(split_string(df$value, "\\s+"))) +#' } +#' @note split_string 2.3.0 +#' @note equivalent to \code{split} SQL function +setMethod("split_string", + signature(x = "Column", pattern = "character"), + function(x, pattern) { +jc <- callJStatic("org.apache.spark.sql.functions", "split", x@jc, pattern) +column(jc) + }) + +#' repeat_string +#' +#' Repeats string n times. +#' +#' @param x Column to compute on +#' @param n Number of repetitions +#' +#' @rdname repeat_string +#' @family string_funcs +#' @aliases repeat_string,Column-method +#' @export +#' @examples \dontrun{ +#' df <- createDataFame(data.frame( +#' text = c("foo", "bar") +#' )) --- End diff -- I thought about this but it is hard to find a good source at hand. We could use `data/streaming/AFINN-111.txt`, which has nice and short lines, or `README.md` and just take `head(., 1)` (the rest is empty or longish).
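For reference, these SparkR wrappers delegate via `callJStatic` to JVM functions that behave as below (a sketch with illustrative data; `spark.implicits._` is assumed in scope):

```scala
import org.apache.spark.sql.functions.{repeat, split}

val df = Seq("a.b@c.d 1\\b").toDF("value")

// Split on a Java regular expression, as in split_string(df$value, "\\s+").
df.select(split($"value", "\\s+")).show(false)  // [a.b@c.d, 1\b]

// Repeat the string n times, as in repeat_string(df$text, 3).
df.select(repeat($"value", 3)).show(false)
```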
[GitHub] spark issue #17728: [SPARK-20437][R] R wrappers for rollup and cube
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17728 **[Test build #76090 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76090/testReport)** for PR 17728 at commit [`a320327`](https://github.com/apache/spark/commit/a3203272c5ce9dc1a9f923180dcfe00e6665d102).
[GitHub] spark pull request #17463: [SPARK-20131][DStream][Test] Flaky Test: org.apac...
Github user uncleGen closed the pull request at: https://github.com/apache/spark/pull/17463
[GitHub] spark issue #17480: [SPARK-20079][Core][yarn] Re registration of AM hangs sp...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17480 **[Test build #76089 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76089/testReport)** for PR 17480 at commit [`d3e69cf`](https://github.com/apache/spark/commit/d3e69cf66d77ba02cfa13e8e27273e59248885f1).
[GitHub] spark issue #15125: [SPARK-5484][GraphX] Periodically do checkpoint in Prege...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15125 **[Test build #76088 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76088/testReport)** for PR 15125 at commit [`ec62659`](https://github.com/apache/spark/commit/ec6265986cb91585c0a6fdbc0c9675ec9fbba613).
[GitHub] spark pull request #17467: [SPARK-20140][DStream] Remove hardcoded kinesis r...
Github user brkyvz commented on a diff in the pull request: https://github.com/apache/spark/pull/17467#discussion_r112849184 --- Diff: external/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisBackedBlockRDD.scala --- @@ -135,7 +139,8 @@ class KinesisSequenceRangeIterator( endpointUrl: String, regionId: String, range: SequenceNumberRange, -retryTimeoutMs: Int) extends NextIterator[Record] with Logging { +retryTimeoutMs: Int, +sparkConf: SparkConf) extends NextIterator[Record] with Logging { --- End diff -- I prefer the latter. Create it in `KinesisInputDStream` and pass it down to `KinesisBackedBlockRDD`.
[GitHub] spark issue #17733: [SPARK-20425][SQL] Support an extended display mode for ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17733 **[Test build #76087 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76087/testReport)** for PR 17733 at commit [`ca8bfbd`](https://github.com/apache/spark/commit/ca8bfbd4f55962773b037c804f827d4f06d95cdd).
[GitHub] spark pull request #17467: [SPARK-20140][DStream] Remove hardcoded kinesis r...
Github user yssharma commented on a diff in the pull request: https://github.com/apache/spark/pull/17467#discussion_r112848855 --- Diff: external/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisBackedBlockRDD.scala --- @@ -135,7 +139,8 @@ class KinesisSequenceRangeIterator( endpointUrl: String, regionId: String, range: SequenceNumberRange, -retryTimeoutMs: Int) extends NextIterator[Record] with Logging { +retryTimeoutMs: Int, +sparkConf: SparkConf) extends NextIterator[Record] with Logging { --- End diff -- Or we can pass them via the SparkConf and construct the KinesisReadConfigurations object in `KinesisInputDStream`, then pass it down to `KinesisBackedBlockRDD`.
[GitHub] spark pull request #17467: [SPARK-20140][DStream] Remove hardcoded kinesis r...
Github user yssharma commented on a diff in the pull request: https://github.com/apache/spark/pull/17467#discussion_r112848762 --- Diff: external/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisBackedBlockRDD.scala --- @@ -135,7 +139,8 @@ class KinesisSequenceRangeIterator( endpointUrl: String, regionId: String, range: SequenceNumberRange, -retryTimeoutMs: Int) extends NextIterator[Record] with Logging { +retryTimeoutMs: Int, +sparkConf: SparkConf) extends NextIterator[Record] with Logging { --- End diff -- And would you expect it to be passed directly to the `KinesisInputDStream`?
[GitHub] spark pull request #17467: [SPARK-20140][DStream] Remove hardcoded kinesis r...
Github user brkyvz commented on a diff in the pull request: https://github.com/apache/spark/pull/17467#discussion_r112848595 --- Diff: external/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisBackedBlockRDD.scala --- @@ -135,7 +139,8 @@ class KinesisSequenceRangeIterator( endpointUrl: String, regionId: String, range: SequenceNumberRange, -retryTimeoutMs: Int) extends NextIterator[Record] with Logging { +retryTimeoutMs: Int, +sparkConf: SparkConf) extends NextIterator[Record] with Logging { --- End diff -- I would prefer a specialized case class, something like:

```scala
case class KinesisReadConfigurations(
    maxRetries: Int,
    retryWaitTimeMs: Long,
    retryTimeoutMs: Long)
```
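One way that case class might be wired up, as a hypothetical sketch rather than code from the PR — the `maxAttempts` and `timeout` keys are assumed here; only `spark.streaming.kinesis.retry.waitTime` appears in the PR's docs:

```scala
import org.apache.spark.SparkConf

case class KinesisReadConfigurations(
    maxRetries: Int,
    retryWaitTimeMs: Long,
    retryTimeoutMs: Long)

object KinesisReadConfigurations {
  // Evaluate the SparkConf once, one level above KinesisBackedBlockRDD,
  // and pass the resulting value object down the chain.
  def apply(conf: SparkConf): KinesisReadConfigurations = KinesisReadConfigurations(
    maxRetries = conf.getInt("spark.streaming.kinesis.retry.maxAttempts", 3), // assumed key
    retryWaitTimeMs = conf.getTimeAsMs("spark.streaming.kinesis.retry.waitTime", "100ms"),
    retryTimeoutMs = conf.getTimeAsMs("spark.streaming.kinesis.retry.timeout", "10000ms")) // assumed key
}
```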
[GitHub] spark pull request #17467: [SPARK-20140][DStream] Remove hardcoded kinesis r...
Github user yssharma commented on a diff in the pull request: https://github.com/apache/spark/pull/17467#discussion_r112848401 --- Diff: external/kinesis-asl/src/test/scala/org/apache/spark/streaming/kinesis/KinesisBackedBlockRDDSuite.scala --- @@ -101,6 +103,36 @@ abstract class KinesisBackedBlockRDDTests(aggregateTestData: Boolean) } } + testIfEnabled("Basic reading from Kinesis with modified configurations") { --- End diff -- I wasn't able to test the actual waiting of Kinesis. I haven't looked at `PrivateMethodTester` yet to check how it can help us test that the vars are picked up. I used this testcase to debug and verify that all the values are passed correctly.
[GitHub] spark pull request #17467: [SPARK-20140][DStream] Remove hardcoded kinesis r...
Github user yssharma commented on a diff in the pull request: https://github.com/apache/spark/pull/17467#discussion_r112848373 --- Diff: external/kinesis-asl/src/test/scala/org/apache/spark/streaming/kinesis/KinesisBackedBlockRDDSuite.scala --- @@ -101,6 +103,36 @@ abstract class KinesisBackedBlockRDDTests(aggregateTestData: Boolean) } } + testIfEnabled("Basic reading from Kinesis with modified configurations") { +// Add Kinesis retry configurations +sc.conf.set(RETRY_WAIT_TIME_KEY, "1000ms") --- End diff -- +1
[GitHub] spark pull request #17467: [SPARK-20140][DStream] Remove hardcoded kinesis r...
Github user yssharma commented on a diff in the pull request: https://github.com/apache/spark/pull/17467#discussion_r112848363 --- Diff: external/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisBackedBlockRDD.scala --- @@ -135,7 +139,8 @@ class KinesisSequenceRangeIterator( endpointUrl: String, regionId: String, range: SequenceNumberRange, -retryTimeoutMs: Int) extends NextIterator[Record] with Logging { +retryTimeoutMs: Int, +sparkConf: SparkConf) extends NextIterator[Record] with Logging { --- End diff -- @brkyvz - I was thinking not to pass individual configs to the constructor because that would just cause the list to grow. Using SparkConf or a Map would enable us to add new configs without any code changes. I was using a Map earlier for this so that it's easy to pass more configs. What are your thoughts on a Map vs a case class?
[GitHub] spark issue #17695: [SPARK-20400][DOCS] Remove References to 3rd Party Vendo...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17695 Merged build finished. Test PASSed.
[GitHub] spark issue #17695: [SPARK-20400][DOCS] Remove References to 3rd Party Vendo...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17695 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76086/ Test PASSed.
[GitHub] spark issue #17695: [SPARK-20400][DOCS] Remove References to 3rd Party Vendo...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17695 **[Test build #76086 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76086/testReport)** for PR 17695 at commit [`e74c2d6`](https://github.com/apache/spark/commit/e74c2d6bcb2f8a2dc841b8b79d9200710f0dbd4c). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #17695: [SPARK-20400][DOCS] Remove References to 3rd Party Vendo...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17695 **[Test build #76086 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76086/testReport)** for PR 17695 at commit [`e74c2d6`](https://github.com/apache/spark/commit/e74c2d6bcb2f8a2dc841b8b79d9200710f0dbd4c).
[GitHub] spark issue #17695: [SPARK-20400][DOCS] Remove References to 3rd Party Vendo...
Github user anabranch commented on the issue: https://github.com/apache/spark/pull/17695 Thanks for the info @srowen - this should be better now.
[GitHub] spark pull request #17728: [SPARK-20437][R] R wrappers for rollup and cube
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/17728#discussion_r112845286 --- Diff: R/pkg/R/DataFrame.R --- @@ -3642,3 +3642,58 @@ setMethod("checkpoint", df <- callJMethod(x@sdf, "checkpoint", as.logical(eager)) dataFrame(df) }) + + +#' cube +#' +#' Create a multi-dimensional cube for the SparkDataFrame using the specified columns. +#' +#' @param x a SparkDataFrame. +#' @param ... variable(s) (character names(s) or Column(s)) to group on. +#' @return A GroupedData. +#' @family SparkDataFrame functions +#' @aliases cube,SparkDataFrame-method +#' @rdname cube +#' @name cube +#' @export +#' @examples +#' \dontrun{ +#' df <- createDataFrame(mtcars) +#' mean(cube(df, "cyl", "gear", "am"), "mpg") +#' } +#' @note cube since 2.3.0 +setMethod("cube", + signature(x = "SparkDataFrame"), + function(x, ...) { +cols <- list(...) --- End diff -- hmm, it's a bit odd to call rollup or cube that way but ok if other languages leave that open too. but I'd say we should add a line to explain "rollup or cube without column is the same as group_by" (or something better)
[GitHub] spark pull request #17728: [SPARK-20437][R] R wrappers for rollup and cube
Github user zero323 commented on a diff in the pull request: https://github.com/apache/spark/pull/17728#discussion_r112844527 --- Diff: R/pkg/vignettes/sparkr-vignettes.Rmd --- @@ -308,6 +308,21 @@ numCyl <- summarize(groupBy(carsDF, carsDF$cyl), count = n(carsDF$cyl)) head(numCyl) ``` +`groupBy` can be replaced with `cube` or `rollup` to compute subtotals across multiple dimensions. --- End diff -- I keep forgetting there is one. I think we can add a few lines. This is actually a pretty neat feature.
[GitHub] spark pull request #17728: [SPARK-20437][R] R wrappers for rollup and cube
Github user zero323 commented on a diff in the pull request: https://github.com/apache/spark/pull/17728#discussion_r112844471 --- Diff: R/pkg/R/DataFrame.R --- @@ -3642,3 +3642,58 @@ setMethod("checkpoint", df <- callJMethod(x@sdf, "checkpoint", as.logical(eager)) dataFrame(df) }) + + +#' cube +#' +#' Create a multi-dimensional cube for the SparkDataFrame using the specified columns. +#' +#' @param x a SparkDataFrame. +#' @param ... variable(s) (character names(s) or Column(s)) to group on. +#' @return A GroupedData. +#' @family SparkDataFrame functions +#' @aliases cube,SparkDataFrame-method +#' @rdname cube +#' @name cube +#' @export +#' @examples +#' \dontrun{ +#' df <- createDataFrame(mtcars) +#' mean(cube(df, "cyl", "gear", "am"), "mpg") +#' } +#' @note cube since 2.3.0 +setMethod("cube", + signature(x = "SparkDataFrame"), + function(x, ...) { +cols <- list(...) +jcol <- lapply(cols, function(x) if (is.character(x)) column(x)@jc else x@jc) +sgd <- callJMethod(x@sdf, "cube", jcol) +groupedData(sgd) + }) + +#' rollup +#' +#' Create a multi-dimensional rollup for the SparkDataFrame using the specified columns. +#' +#' @param x a SparkDataFrame. +#' @param ... variable(s) (character names(s) or Column(s)) to group on. --- End diff -- Sounds good.
[GitHub] spark pull request #17728: [SPARK-20437][R] R wrappers for rollup and cube
Github user zero323 commented on a diff in the pull request: https://github.com/apache/spark/pull/17728#discussion_r112844452 --- Diff: R/pkg/R/DataFrame.R --- @@ -3642,3 +3642,58 @@ setMethod("checkpoint", df <- callJMethod(x@sdf, "checkpoint", as.logical(eager)) dataFrame(df) }) + + +#' cube +#' +#' Create a multi-dimensional cube for the SparkDataFrame using the specified columns. +#' +#' @param x a SparkDataFrame. +#' @param ... variable(s) (character names(s) or Column(s)) to group on. +#' @return A GroupedData. +#' @family SparkDataFrame functions +#' @aliases cube,SparkDataFrame-method +#' @rdname cube +#' @name cube +#' @export +#' @examples +#' \dontrun{ +#' df <- createDataFrame(mtcars) +#' mean(cube(df, "cyl", "gear", "am"), "mpg") +#' } +#' @note cube since 2.3.0 +setMethod("cube", + signature(x = "SparkDataFrame"), + function(x, ...) { +cols <- list(...) --- End diff -- I think we can skip that. `rollup(df)` and `cube(df)` are valid function calls, equivalent to `group_by(df)`, and arguably can be useful in some cases (like aggregations based on user input).
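The equivalence under discussion is easy to see in the Scala API (a sketch with illustrative mtcars-like data):

```scala
import org.apache.spark.sql.functions.avg

val df = Seq((6, 4, 1, 21.0), (4, 4, 0, 26.0), (8, 3, 0, 14.3))
  .toDF("cyl", "gear", "am", "mpg")

df.cube("cyl", "gear", "am").agg(avg("mpg")).show() // subtotals over all column combinations
df.rollup("cyl", "gear").agg(avg("mpg")).show()     // hierarchical subtotals
df.cube().agg(avg("mpg")).show()                    // no columns: same result as groupBy()
```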
[GitHub] spark issue #17649: [SPARK-20380][SQL] Output table comment for DESC FORMATT...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17649 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76085/ Test PASSed.
[GitHub] spark issue #17649: [SPARK-20380][SQL] Output table comment for DESC FORMATT...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17649 Merged build finished. Test PASSed.
[GitHub] spark issue #17649: [SPARK-20380][SQL] Output table comment for DESC FORMATT...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17649 **[Test build #76085 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76085/testReport)** for PR 17649 at commit [`50deed9`](https://github.com/apache/spark/commit/50deed9959da1ae5d4f7ce647248e2f8c813e125). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #17467: [SPARK-20140][DStream] Remove hardcoded kinesis r...
Github user brkyvz commented on a diff in the pull request: https://github.com/apache/spark/pull/17467#discussion_r112841794 --- Diff: external/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisBackedBlockRDD.scala --- @@ -135,7 +139,8 @@ class KinesisSequenceRangeIterator( endpointUrl: String, regionId: String, range: SequenceNumberRange, -retryTimeoutMs: Int) extends NextIterator[Record] with Logging { +retryTimeoutMs: Int, +sparkConf: SparkConf) extends NextIterator[Record] with Logging { --- End diff -- I wouldn't pass the `SparkConf` all the way in here. See how `retryTimeoutMs` has been passed in specifically above. You can do one of two things: 1. Pass each of them one by one 2. Evaluate all the configurations in `KinesisBackedBlockRDD` or one level higher and use a `case class` such as `KinesisReadConfigurations`
[GitHub] spark pull request #17467: [SPARK-20140][DStream] Remove hardcoded kinesis r...
Github user brkyvz commented on a diff in the pull request: https://github.com/apache/spark/pull/17467#discussion_r112841862 --- Diff: external/kinesis-asl/src/test/scala/org/apache/spark/streaming/kinesis/KinesisBackedBlockRDDSuite.scala --- @@ -101,6 +103,36 @@ abstract class KinesisBackedBlockRDDTests(aggregateTestData: Boolean) } } + testIfEnabled("Basic reading from Kinesis with modified configurations") { --- End diff -- I don't see how this test actually tests the configuration setting. It just tests that things work, not that the configurations are actually picked up.
[GitHub] spark pull request #17467: [SPARK-20140][DStream] Remove hardcoded kinesis r...
Github user brkyvz commented on a diff in the pull request: https://github.com/apache/spark/pull/17467#discussion_r112841727 --- Diff: docs/streaming-kinesis-integration.md --- @@ -216,3 +216,7 @@ de-aggregate records during consumption. - If no Kinesis checkpoint info exists when the input DStream starts, it will start either from the oldest record available (`InitialPositionInStream.TRIM_HORIZON`) or from the latest tip (`InitialPositionInStream.LATEST`). This is configurable. - `InitialPositionInStream.LATEST` could lead to missed records if data is added to the stream while no input DStreams are running (and no checkpoint info is being stored). - `InitialPositionInStream.TRIM_HORIZON` may lead to duplicate processing of records where the impact is dependent on checkpoint frequency and processing idempotency. + + Kinesis retry configurations + - `spark.streaming.kinesis.retry.waitTime` : SparkConf for wait time between Kinesis retries (in milliseconds). Default is "100ms". --- End diff -- Example: `Wait time between Kinesis retries as a duration string. When reading from Amazon Kinesis, users may hit 'ThroughputExceededExceptions' when consuming faster than 2 mb/s. This configuration can be tweaked to increase the sleep between fetches when a fetch fails, to reduce these exceptions.`
[GitHub] spark pull request #17467: [SPARK-20140][DStream] Remove hardcoded kinesis r...
Github user brkyvz commented on a diff in the pull request: https://github.com/apache/spark/pull/17467#discussion_r112841869 --- Diff: external/kinesis-asl/src/test/scala/org/apache/spark/streaming/kinesis/KinesisBackedBlockRDDSuite.scala --- @@ -101,6 +103,36 @@ abstract class KinesisBackedBlockRDDTests(aggregateTestData: Boolean) } } + testIfEnabled("Basic reading from Kinesis with modified configurations") { +// Add Kinesis retry configurations +sc.conf.set(RETRY_WAIT_TIME_KEY, "1000ms") --- End diff -- we need to clean these up after the test
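A minimal pattern for that cleanup, sketched under the assumption that `RETRY_WAIT_TIME_KEY` is the constant used in the PR's test suite:

```scala
// Save, set, and restore the key so later tests see an untouched conf.
val saved = sc.conf.getOption(RETRY_WAIT_TIME_KEY)
try {
  sc.conf.set(RETRY_WAIT_TIME_KEY, "1000ms")
  // ... exercise the Kinesis read path ...
} finally {
  saved match {
    case Some(v) => sc.conf.set(RETRY_WAIT_TIME_KEY, v)
    case None    => sc.conf.remove(RETRY_WAIT_TIME_KEY)
  }
}
```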