[GitHub] spark issue #16028: [SPARK-18518][ML] HasSolver supports override
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16028 **[Test build #78960 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78960/testReport)** for PR 16028 at commit [`6bb1daf`](https://github.com/apache/spark/commit/6bb1daff8e16943f5ba5cac87bd60dde21cf03f3). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18424: [SPARK-17091] Add rule to convert IN predicate to equiva...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18424

Have you done actual benchmarks to validate that this is a perf improvement?
[GitHub] spark issue #18479: WIP - logical plan stat propagation using mixin and visi...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18479

**[Test build #78959 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78959/testReport)** for PR 18479 at commit [`9c32d25`](https://github.com/apache/spark/commit/9c32d2507d3f4f269e17e841a4a4e4920b35a5e9).
[GitHub] spark issue #18469: [SPARK-21256] [SQL] Add withSQLConf to Catalyst Test
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18469

Can we minimize the change by just adding this method to PlanTest? It's not that many lines of code.
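For context, the helper being requested is small enough to inline. Below is a minimal, self-contained sketch of what a `withSQLConf`-style test helper typically does: set some config entries, run the test body, then restore the previous values. The mutable `conf` map is a stand-in for Spark's `SQLConf`; all names here are illustrative, not Spark's actual API.

```scala
object WithConfSketch {
  // Stand-in for Spark's SQLConf: a simple mutable key/value store.
  val conf = scala.collection.mutable.Map[String, String]()

  // Set the given entries, run `body`, then restore the previous state,
  // removing keys that were unset before the call.
  def withSQLConf[T](pairs: (String, String)*)(body: => T): T = {
    val originals = pairs.map { case (k, _) => k -> conf.get(k) }
    pairs.foreach { case (k, v) => conf(k) = v }
    try body finally {
      originals.foreach {
        case (k, Some(v)) => conf(k) = v
        case (k, None)    => conf.remove(k)
      }
    }
  }
}
```

Keeping this as a single method on the shared test base (rather than a new trait) is exactly the minimization being asked for here.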
[GitHub] spark issue #16028: [SPARK-18518][ML] HasSolver supports override
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16028

Merged build finished. Test FAILed.
[GitHub] spark issue #16028: [SPARK-18518][ML] HasSolver supports override
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16028

Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/78956/ Test FAILed.
[GitHub] spark issue #16028: [SPARK-18518][ML] HasSolver supports override
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16028

**[Test build #78956 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78956/testReport)** for PR 16028 at commit [`39ea8e1`](https://github.com/apache/spark/commit/39ea8e15f427d2c27758d7ae058c440b6f76aec5).

* This patch **fails to build**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #18481: [SPARK-20889][SparkR] Grouped documentation for WINDOW c...
Github user actuaryzhang commented on the issue: https://github.com/apache/spark/pull/18481

Ahh, forgot about the window functions. This is actually the last set... @felixcheung @HyukjinKwon

![image](https://user-images.githubusercontent.com/11082368/27724147-55154b52-5d25-11e7-9a9c-2fa1e8ad120c.png)
![image](https://user-images.githubusercontent.com/11082368/27724151-571935bc-5d25-11e7-8379-e45418b27ffa.png)
![image](https://user-images.githubusercontent.com/11082368/27724152-58986d22-5d25-11e7-9910-ca4bd7d98cc5.png)
[GitHub] spark issue #18481: [SPARK-20889][SparkR] Grouped documentation for WINDOW c...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18481

**[Test build #78957 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78957/testReport)** for PR 18481 at commit [`e7d19a3`](https://github.com/apache/spark/commit/e7d19a3da6c580575734e696d7be76bd08d4bae1).
[GitHub] spark issue #18480: [SPARK-21052][SQL][Follow-up] Add hash map metrics to jo...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18480

**[Test build #78958 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78958/testReport)** for PR 18480 at commit [`d8df5e0`](https://github.com/apache/spark/commit/d8df5e08f20af44066e7cb08be5cb6dc15ac0427).
[GitHub] spark issue #18470: [SPARK-21258][SQL] Fix WindowExec complex object aggrega...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/18470

ah the spilling logic is not in 2.1, let me revert it, sorry for the trouble.
[GitHub] spark pull request #18481: [SPARK-20889][SparkR] Grouped documentation for W...
GitHub user actuaryzhang opened a pull request: https://github.com/apache/spark/pull/18481

[SPARK-20889][SparkR] Grouped documentation for WINDOW column methods

## What changes were proposed in this pull request?

Grouped documentation for column window methods.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/actuaryzhang/spark sparkRDocWindow

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/18481.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #18481

commit e7d19a3da6c580575734e696d7be76bd08d4bae1
Author: actuaryzhang
Date: 2017-06-30T06:44:44Z

    update doc for window functions
[GitHub] spark issue #16028: [SPARK-18518][ML] HasSolver supports override
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16028

**[Test build #78953 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78953/testReport)** for PR 16028 at commit [`b907314`](https://github.com/apache/spark/commit/b907314a759696a8b9cb100acc5b8bd05b6c0ecf).

* This patch **fails to build**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #18480: [SPARK-21052][SQL][Follow-up] Add hash map metrics to jo...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/18480

cc @gatorsmile
[GitHub] spark issue #17758: [SPARK-20460][SPARK-21144][SQL] Make it more consistent ...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/17758

It's a little messy where we need to add this check. @maropu, can you give a summary of it? Thanks!
[GitHub] spark pull request #18480: [SPARK-21052][SQL][Follow-up] Add hash map metric...
GitHub user viirya opened a pull request: https://github.com/apache/spark/pull/18480

[SPARK-21052][SQL][Follow-up] Add hash map metrics to join

## What changes were proposed in this pull request?

Remove `numHashCollisions` in `BytesToBytesMap`. And change `getAverageProbesPerLookup()` to `getAverageProbesPerLookup` as suggested.

## How was this patch tested?

Existing tests.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/viirya/spark-1 SPARK-21052-followup

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/18480.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #18480

commit d8df5e08f20af44066e7cb08be5cb6dc15ac0427
Author: Liang-Chi Hsieh
Date: 2017-06-30T06:40:19Z

    Remove numHashCollisions.
[GitHub] spark issue #18470: [SPARK-21258][SQL] Fix WindowExec complex object aggrega...
Github user zzcclp commented on the issue: https://github.com/apache/spark/pull/18470

Hi, @cloud-fan, @hvanhovell, after merging this PR into **branch-2.1**, there are some errors:

1. `value WINDOW_EXEC_BUFFER_SPILL_THRESHOLD is not a member of object org.apache.spark.sql.internal.SQLConf`
2. `overloaded method value json with alternatives: (jsonRDD: org.apache.spark.rdd.RDD[String])org.apache.spark.sql.DataFrame (jsonRDD: org.apache.spark.api.java.JavaRDD[String])org.apache.spark.sql.DataFrame (paths: String*)org.apache.spark.sql.DataFrame (path: String)org.apache.spark.sql.DataFrame cannot be applied to (org.apache.spark.sql.Dataset[String])`
[GitHub] spark issue #16028: [SPARK-18518][ML] HasSolver supports override
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16028

**[Test build #78956 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78956/testReport)** for PR 16028 at commit [`39ea8e1`](https://github.com/apache/spark/commit/39ea8e15f427d2c27758d7ae058c440b6f76aec5).
[GitHub] spark pull request #17758: [SPARK-20460][SPARK-21144][SQL] Make it more cons...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/17758#discussion_r124973610

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala ---
@@ -353,7 +341,10 @@ class SessionCatalog(
     val tableIdentifier = TableIdentifier(table, Some(db))
     requireDbExists(db)
     requireTableExists(tableIdentifier)
-    checkDuplication(newSchema)
+
+    SchemaUtils.checkSchemaColumnNameDuplication(
--- End diff --

ok, I'll update
[GitHub] spark issue #18023: [SPARK-12139] [SQL] REGEX Column Specification
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18023

**[Test build #78955 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78955/testReport)** for PR 18023 at commit [`4e36ed9`](https://github.com/apache/spark/commit/4e36ed903973dcf637348825b5726892f2c13f77).
[GitHub] spark issue #16028: [SPARK-18518][ML] HasSolver supports override
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16028

Merged build finished. Test FAILed.
[GitHub] spark issue #16028: [SPARK-18518][ML] HasSolver supports override
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16028

Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/78953/ Test FAILed.
[GitHub] spark issue #18479: WIP - stat propagation code using mixin and visitor patt...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18479

**[Test build #78954 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78954/testReport)** for PR 18479 at commit [`94fb669`](https://github.com/apache/spark/commit/94fb6694461a4c8a144e751f2ad1a59cd8c860e0).
[GitHub] spark pull request #18023: [SPARK-12139] [SQL] REGEX Column Specification
Github user janewangfb commented on a diff in the pull request: https://github.com/apache/spark/pull/18023#discussion_r124973097

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/unresolved.scala ---
@@ -307,6 +311,28 @@ case class UnresolvedStar(target: Option[Seq[String]]) extends Star with Unevalu
 }

 /**
+ * Represents all of the input attributes to a given relational operator, for example in
+ * "SELECT `(id)?+.+` FROM ...".
+ *
+ * @param table an optional table that should be the target of the expansion. If omitted all
+ *              tables' columns are produced.
+ */
+case class UnresolvedRegex(regexPattern: String, table: Option[String])
+  extends Star with Unevaluable {
+  override def expand(input: LogicalPlan, resolver: Resolver): Seq[NamedExpression] = {
+    table match {
+      // If there is no table specified, use all input attributes that match expr
+      case None => input.output.filter(_.name.matches(s"(?i)$regexPattern"))
--- End diff --

Updated the code with conf caseSensitiveAnalysis
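The case-sensitivity point in this review can be sketched independently of Catalyst. The helper below uses illustrative names (not Spark's actual API): it expands a regex column specification against a list of column names, and when analysis is case-insensitive it wraps the pattern with Java's `(?i)` inline flag, which is the behavior that was hard-coded in the diff above and was asked to follow the case-sensitivity config instead.

```scala
// Illustrative sketch only: expand a regex "star" against an operator's
// output column names. When `caseSensitive` is false, prefix the pattern
// with the inline case-insensitivity flag `(?i)`.
def expandRegexColumns(
    columnNames: Seq[String],
    regexPattern: String,
    caseSensitive: Boolean): Seq[String] = {
  val pattern = if (caseSensitive) regexPattern else s"(?i)$regexPattern"
  columnNames.filter(_.matches(pattern))
}
```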
[GitHub] spark pull request #17758: [SPARK-20460][SPARK-21144][SQL] Make it more cons...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/17758#discussion_r124972759

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala ---
@@ -353,7 +341,10 @@ class SessionCatalog(
     val tableIdentifier = TableIdentifier(table, Some(db))
     requireDbExists(db)
     requireTableExists(tableIdentifier)
-    checkDuplication(newSchema)
+
+    SchemaUtils.checkSchemaColumnNameDuplication(
--- End diff --

shall we do this check in `SessionCatalog.createTable`?
[GitHub] spark pull request #18479: WIP - stat propagation code using mixin
GitHub user rxin opened a pull request: https://github.com/apache/spark/pull/18479

WIP - stat propagation code using mixin

## What changes were proposed in this pull request?

TBD

## How was this patch tested?

Should be covered by existing test cases.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/rxin/spark stats-trait

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/18479.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #18479

commit 94fb6694461a4c8a144e751f2ad1a59cd8c860e0
Author: Reynold Xin
Date: 2017-06-30T06:32:04Z

    WIP - stat propagation code using mixin
[GitHub] spark issue #16028: [SPARK-18518][ML] HasSolver supports override
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16028

**[Test build #78953 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78953/testReport)** for PR 16028 at commit [`b907314`](https://github.com/apache/spark/commit/b907314a759696a8b9cb100acc5b8bd05b6c0ecf).
[GitHub] spark pull request #18334: [SPARK-21127] [SQL] Update statistics after data ...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/18334#discussion_r124971615

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -774,6 +774,12 @@ object SQLConf {
       .doubleConf
       .createWithDefault(0.05)

+  val AUTO_UPDATE_SIZE =
+    buildConf("spark.sql.statistics.autoUpdate.size")
+      .doc("Enables automatic update for table size once table's data is changed.")
--- End diff --

This flag could slow down all the data change commands. We need to clearly explain the potential performance regression; based on the current description, all users would be willing to turn it on. Normally, users expect us to do incremental updates, instead of recalculating the size.
[GitHub] spark pull request #18334: [SPARK-21127] [SQL] Update statistics after data ...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/18334#discussion_r124971431

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala ---
@@ -437,7 +437,20 @@ case class AlterTableAddPartitionCommand(
     }
     catalog.createPartitions(table.identifier, parts, ignoreIfExists = ifNotExists)
-    CommandUtils.updateTableStats(sparkSession, table)
+    if (table.stats.nonEmpty) {
+      if (sparkSession.sessionState.conf.autoUpdateSize) {
+        val addedSize = parts.map { part =>
+          CommandUtils.calculateLocationSize(sparkSession.sessionState, table.identifier,
--- End diff --

In the function `calculateLocationSize`, please add log messages when starting/finishing the statistics collection and
[GitHub] spark issue #14431: [SPARK-16258][SparkR] Automatically append the grouping ...
Github user falaki commented on the issue: https://github.com/apache/spark/pull/14431

@NarineK how about adding this as a new API e.g., `gapplyWithKeys()`. I am extremely worried about the semantic change. It can break existing SparkR applications and will be confusing for users.
[GitHub] spark pull request #18445: [Spark-19726][SQL] Faild to insert null timestamp...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/18445#discussion_r124970124

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala ---
@@ -266,10 +266,14 @@ object JdbcUtils extends Logging {
   /**
    * Takes a [[ResultSet]] and returns its Catalyst schema.
    *
+   * @param alwaysNullable If true, all the columns are nullable.
    * @return A [[StructType]] giving the Catalyst schema.
    * @throws SQLException if the schema contains an unsupported type.
    */
-  def getSchema(resultSet: ResultSet, dialect: JdbcDialect): StructType = {
+  def getSchema(
+      resultSet: ResultSet,
+      dialect: JdbcDialect,
+      alwaysNullable: Boolean = true): StructType = {
--- End diff --

You should change the value to `true` in the caller side.
[GitHub] spark pull request #18445: [Spark-19726][SQL] Faild to insert null timestamp...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/18445#discussion_r124970067

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala ---
@@ -266,10 +266,14 @@ object JdbcUtils extends Logging {
   /**
    * Takes a [[ResultSet]] and returns its Catalyst schema.
    *
+   * @param alwaysNullable If true, all the columns are nullable.
    * @return A [[StructType]] giving the Catalyst schema.
    * @throws SQLException if the schema contains an unsupported type.
    */
-  def getSchema(resultSet: ResultSet, dialect: JdbcDialect): StructType = {
+  def getSchema(
+      resultSet: ResultSet,
+      dialect: JdbcDialect,
+      alwaysNullable: Boolean = true): StructType = {
--- End diff --

the default value should be `false`
[GitHub] spark pull request #18445: [Spark-19726][SQL] Faild to insert null timestamp...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/18445#discussion_r124970022

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala ---
@@ -290,7 +294,11 @@ object JdbcUtils extends Logging {
           rsmd.getClass.getName == "org.apache.hive.jdbc.HiveResultSetMetaData" => true
         }
       }
-      val nullable = rsmd.isNullable(i + 1) != ResultSetMetaData.columnNoNulls
+      val nullable = if (alwaysNullable) {
+        alwaysNullable
--- End diff --

Nit: -> `true`. Conceptually, they are different.
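Taken together, the three review comments above amount to a small decision rule, sketched below with illustrative names (the real code lives in Spark's `JdbcUtils.getSchema`): an explicit `alwaysNullable` override reports every column as nullable, otherwise the JDBC driver's metadata decides, and the override branch is spelled as a literal `true` as the reviewer suggests.

```scala
import java.sql.ResultSetMetaData

// Illustrative sketch of the nullability rule under review: an explicit
// `alwaysNullable` override wins; otherwise trust the driver's metadata,
// where any value other than columnNoNulls is treated as nullable.
def isColumnNullable(alwaysNullable: Boolean, metadataNullability: Int): Boolean =
  if (alwaysNullable) {
    true // literal `true`, not `alwaysNullable`: conceptually different values
  } else {
    metadataNullability != ResultSetMetaData.columnNoNulls
  }
```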
[GitHub] spark issue #18465: [SPARK-21093][R] Terminate R's worker processes in the p...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/18465

> isn't being "terminated" enough? that we have to pskill it again?

Indeed, this is the point. I tested this many times, but `exit` does not terminate the process (I think I might have to change the term used in the comments in my PR, maybe to 'kill'). This can be tested as below:

```
vi tmp.R
```

```r
for (i in 0:50) {
  p <- parallel:::mcfork()
  if (inherits(p, "masterProcess")) {
    # Send no data
    parallel:::mcexit(0L)
  }
}
Sys.sleep(10)
```

```
Rscript tmp.R
```

and `ps -fe | grep /exec/R` shows the child processes.

> the original issue with leaking was thought to be related to child process getting stuck and not terminated properly?

I think yes, it is. It looked like they were killed, but somehow pipes (file descriptors) were left open. This hit the ulimit on open files and then produced an error like `Resource temporarily unavailable fork`. This seemed to be because the children killed themselves before or while calling `exit()`, leaving the cleanup incomplete.

> would that manifest again under this new behavior? in other words, would it get into a state where integer is never returned to the master?

In the original behaviour, I think the problem is that `exit` is not being properly called, or the process is terminated during `exit`. I think the new change here makes sure the children call `exit` properly.
I think this test makes sure what I said:

```r
read.pids <- function(children) {
  lapply(children, function(child) {
    print(parallel:::readChild(child))
  })
}

kill <- function(children) {
  lapply(children, function(child) {
    tools::pskill(child, tools::SIGUSR1)
  })
}

p <- parallel:::mcfork()
if (inherits(p, "masterProcess")) {
  parallel:::mcexit(0L)
}
Sys.sleep(3)
print("Children exited - new behaviour here")
children <- parallel:::selectChildren(timeout = 0)
if (is.integer(children)) {
  read.pids(children)
  kill(children)
}

p <- parallel:::mcfork()
if (inherits(p, "masterProcess")) {
  tools::pskill(Sys.getpid(), tools::SIGUSR1)
  parallel:::mcexit(0L)
}
Sys.sleep(3)
print("Children killed itself - old behaviour here")
children <- parallel:::selectChildren(timeout = 0)
if (is.integer(children)) {
  read.pids(children)
}

p <- parallel:::mcfork()
if (inherits(p, "masterProcess")) {
  Sys.sleep(1)
}
Sys.sleep(3)
print("Children killed by parent without exiting")
children <- parallel:::selectChildren(timeout = 0)
if (is.integer(children)) {
  kill(children)
  # wait for kill
  Sys.sleep(1)
  read.pids(children)
}
```

In my local test, this prints the pid when `mcexit` is called properly:

```
[1] "Children exited - new behaviour here"
[1] 12992
[[1]]
[1] FALSE
[1] "Children killed itself - old behaviour here"
[1] "Children killed by parent without exiting"
```

And here we kill it.
[GitHub] spark pull request #18458: [SPARK-20889][SparkR] Grouped documentation for C...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/18458
[GitHub] spark pull request #18464: [SPARK-21250][WEB-UI]Add a url in the table of 'R...
Github user guoxiaolongzte commented on a diff in the pull request: https://github.com/apache/spark/pull/18464#discussion_r124969031

--- Diff: core/src/main/scala/org/apache/spark/deploy/worker/ui/WorkerPage.scala ---
@@ -112,7 +112,15 @@ private[ui] class WorkerPage(parent: WorkerWebUI) extends WebUIPage("") {
       ID: {executor.appId}
-      Name: {executor.appDesc.name}
+      Name:
+      {
+        if ({executor.state == ExecutorState.RUNNING}) {
+          {executor.appDesc.name}
--- End diff --

Yes, I also think so. @dongjoon-hyun Is it necessary to add a non-empty check?

{
  if ({executor.state == ExecutorState.RUNNING && **executor.appDesc.appUiUrl.nonEmpty**}) {
    {executor.appDesc.name}
  } else {
    {executor.appDesc.name}
  }
}
[GitHub] spark issue #18458: [SPARK-20889][SparkR] Grouped documentation for COLLECTI...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/18458 merged to master. thanks!
[GitHub] spark issue #18458: [SPARK-20889][SparkR] Grouped documentation for COLLECTI...
Github user actuaryzhang commented on the issue: https://github.com/apache/spark/pull/18458 @felixcheung This is the last set of this doc update. Once it gets in, I will do another pass to fix any style or consistency issues.
[GitHub] spark issue #18478: [SPARK-21253][Core][HOTFIX]Fix Scala 2.10 build
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18478 Merged build finished. Test FAILed.
[GitHub] spark issue #18478: [SPARK-21253][Core][HOTFIX]Fix Scala 2.10 build
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18478 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/78947/ Test FAILed.
[GitHub] spark issue #18430: [SPARK-21223]:Thread-safety issue in FsHistoryProvider
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/18430 Let's do this in another PR; it seems to be a different threading issue. Also, would you please change the PR title to something like: "Change fileToAppInfo in FsHistoryProvider to fix concurrent issue".
[GitHub] spark issue #18478: [SPARK-21253][Core][HOTFIX]Fix Scala 2.10 build
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18478

**[Test build #78947 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78947/testReport)** for PR 18478 at commit [`8295b3c`](https://github.com/apache/spark/commit/8295b3cb5cbce43f53d4e0c48dfc1eba6049c4ba).

* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #18436: [SPARK-20073][SQL] Prints an explicit warning message in...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/18436 LGTM
[GitHub] spark issue #18430: [SPARK-21223]:Thread-safety issue in FsHistoryProvider
Github user zenglinxi0615 commented on the issue: https://github.com/apache/spark/pull/18430 Sorry, it's a typo; I meant the related JIRA: SPARK-21078.
[GitHub] spark pull request #17899: [SPARK-20636] Add new optimization rule to transp...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/17899#discussion_r124966795

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
@@ -610,6 +611,25 @@ object CollapseWindow extends Rule[LogicalPlan] {
 }

 /**
+ * Transpose Adjacent Window Expressions.
+ * - If the partition spec of the parent Window expression is compatible with the partition spec
+ *   of the child window expression, transpose them.
+ */
+object TransposeWindow extends Rule[LogicalPlan] {
+  private def compatibleParititions(ps1 : Seq[Expression], ps2: Seq[Expression]): Boolean = {
+    ps1.length < ps2.length && ps2.take(ps1.length).permutations.exists(ps1.zip(_).forall {
+      case (l, r) => l.semanticEquals(r)
+    })
+  }
+
+  def apply(plan: LogicalPlan): LogicalPlan = plan transformUp {
+    case w1 @ Window(we1, ps1, os1, w2 @ Window(we2, ps2, os2, grandChild))
--- End diff --

The expressions in both `we1` and `we2` must be deterministic.
[GitHub] spark issue #17935: [SPARK-20690][SQL] Subqueries in FROM should have alias ...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/17935 The reason I found out about this is that one of the widely circulated TPC-DS benchmark harnesses online uses this.
[GitHub] spark pull request #17899: [SPARK-20636] Add new optimization rule to transp...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/17899#discussion_r124966455

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
@@ -610,6 +611,25 @@ object CollapseWindow extends Rule[LogicalPlan] {
 }

 /**
+ * Transpose Adjacent Window Expressions.
+ * - If the partition spec of the parent Window expression is compatible with the partition spec
+ *   of the child window expression, transpose them.
+ */
+object TransposeWindow extends Rule[LogicalPlan] {
+  private def compatibleParititions(ps1 : Seq[Expression], ps2: Seq[Expression]): Boolean = {
+    ps1.length < ps2.length && ps2.take(ps1.length).permutations.exists(ps1.zip(_).forall {
+      case (l, r) => l.semanticEquals(r)
+    })
+  }
+
+  def apply(plan: LogicalPlan): LogicalPlan = plan transformUp {
+    case w1 @ Window(we1, ps1, os1, w2 @ Window(we2, ps2, os2, grandChild))
+      if w1.references.intersect(w2.windowOutputSet).isEmpty && compatibleParititions(ps1, ps2) =>
--- End diff --

No test case covers the condition `w1.references.intersect(w2.windowOutputSet).isEmpty`
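The partition-spec compatibility test at the heart of this rule can be illustrated with a simplified standalone sketch, using strings in place of Catalyst `Expression`s and plain equality in place of `semanticEquals` (names hypothetical): `ps1` must be strictly shorter than `ps2`, and some permutation of `ps2`'s first `ps1.length` elements must match `ps1` element-by-element.

```scala
// Simplified sketch of TransposeWindow's partition-spec compatibility check.
object CompatibilitySketch {
  def compatiblePartitions(ps1: Seq[String], ps2: Seq[String]): Boolean =
    ps1.length < ps2.length &&
      ps2.take(ps1.length).permutations.exists { perm =>
        ps1.zip(perm).forall { case (l, r) => l == r }
      }
}
```

For instance, `Seq("b", "a")` is compatible with `Seq("a", "b", "c")`, because the permutation `("b", "a")` of the first two elements matches, so the child window's partitioning subsumes the parent's.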
[GitHub] spark issue #17935: [SPARK-20690][SQL] Subqueries in FROM should have alias ...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/17935 Sorry, to be accurate: regarding the derived-table syntax in SQL, the databases I listed above are common in the market, and they don't support it without an alias name. The SQL:2003 grammar doesn't support it either. Based on that, I'd tend to think that a derived table without an alias name is not widely used syntax among database users. But we know there are exceptions such as CockroachDB, as @JoshRosen pointed out. This isn't an argument against reverting, just a reference for you to decide on this issue.
[GitHub] spark issue #18458: [SPARK-20889][SparkR] Grouped documentation for COLLECTI...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18458

**[Test build #78949 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78949/testReport)** for PR 18458 at commit [`8be3e49`](https://github.com/apache/spark/commit/8be3e49a2cd8dc1c5f5f524cd58728fcd23e0327).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #18458: [SPARK-20889][SparkR] Grouped documentation for COLLECTI...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18458 Merged build finished. Test PASSed.
[GitHub] spark issue #18458: [SPARK-20889][SparkR] Grouped documentation for COLLECTI...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18458 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/78949/ Test PASSed.
[GitHub] spark issue #17681: [SPARK-20383][SQL] Supporting Create [temporary] Functio...
Github user ouyangxiaochen commented on the issue: https://github.com/apache/spark/pull/17681 @felixcheung Thank you very much!
[GitHub] spark issue #18436: [SPARK-20073][SQL] Prints an explicit warning message in...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18436 **[Test build #78952 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78952/testReport)** for PR 18436 at commit [`12347e6`](https://github.com/apache/spark/commit/12347e66c1193fe2fb0af439de08d43919baed60).
[GitHub] spark issue #18430: [SPARK-21223]:Thread-safety issue in FsHistoryProvider
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/18430 What's the issue of SPARK-13988?
[GitHub] spark pull request #18436: [SPARK-20073][SQL] Prints an explicit warning mes...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/18436#discussion_r124965300

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Column.scala ---
@@ -464,7 +464,15 @@ class Column(val expr: Expression) extends Logging {
    * @group expr_ops
    * @since 1.3.0
    */
-  def <=> (other: Any): Column = withExpr { EqualNullSafe(expr, lit(other).expr) }
+  def <=> (other: Any): Column = withExpr {
+    val right = lit(other).expr
+    if (this.expr == right) {
+      logWarning(
+        s"Constructing trivially true equals predicate, '${this.expr} = $right'. " +
--- End diff --

ok
[GitHub] spark issue #17681: [SPARK-20383][SQL] Supporting Create [temporary] Functio...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17681 Merged build finished. Test PASSed.
[GitHub] spark issue #17681: [SPARK-20383][SQL] Supporting Create [temporary] Functio...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17681 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/78946/ Test PASSed.
[GitHub] spark issue #18471: [SPARK-21259] More rules for scalastyle
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18471 **[Test build #78951 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78951/testReport)** for PR 18471 at commit [`fd5bde5`](https://github.com/apache/spark/commit/fd5bde5e1c79c68bebd667828d773f63165117cb).
[GitHub] spark pull request #18471: [SPARK-21259] More rules for scalastyle
Github user gengliangwang commented on a diff in the pull request: https://github.com/apache/spark/pull/18471#discussion_r124964773

--- Diff: scalastyle-config.xml ---
@@ -245,7 +245,13 @@ This file is divided into 3 sections:
-      COMMA
+      COLON, COMMA, RPAREN
+
+
+
+
--- End diff --

Thanks, I have updated the xml file.
[GitHub] spark issue #17681: [SPARK-20383][SQL] Supporting Create [temporary] Functio...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17681

**[Test build #78946 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78946/testReport)** for PR 17681 at commit [`b49280a`](https://github.com/apache/spark/commit/b49280aad51b1fbb44b992c03beb6c18325baa84).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #17935: [SPARK-20690][SQL] Subqueries in FROM should have alias ...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/17935 I don't think that argument is useful at all. For example, none of the other databases support the DataFrame API. Does that mean few users will write DataFrame code?
[GitHub] spark pull request #18023: [SPARK-12139] [SQL] REGEX Column Specification
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/18023#discussion_r124964462

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/unresolved.scala ---
@@ -307,6 +311,28 @@ case class UnresolvedStar(target: Option[Seq[String]]) extends Star with Unevalu
 }

 /**
+ * Represents all of the input attributes to a given relational operator, for example in
+ * "SELECT `(id)?+.+` FROM ...".
+ *
+ * @param table an optional table that should be the target of the expansion. If omitted all
+ *              tables' columns are produced.
+ */
+case class UnresolvedRegex(regexPattern: String, table: Option[String])
+  extends Star with Unevaluable {
+  override def expand(input: LogicalPlan, resolver: Resolver): Seq[NamedExpression] = {
+    table match {
+      // If there is no table specified, use all input attributes that match expr
+      case None => input.output.filter(_.name.matches(s"(?i)$regexPattern"))
--- End diff --

You need to check the conf `sparkSession.sessionState.conf.caseSensitiveAnalysis`
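The reviewer's objection is that the diff hard-codes the case-insensitive flag `(?i)` instead of consulting the analyzer's case-sensitivity setting. A minimal sketch of the suggested behavior, with a hypothetical helper where `caseSensitive` stands in for `sparkSession.sessionState.conf.caseSensitiveAnalysis`:

```scala
// Sketch: only prepend (?i) when the analyzer is case-insensitive,
// rather than unconditionally as in the quoted diff.
object RegexColumnSketch {
  def matchingColumns(
      names: Seq[String],
      regexPattern: String,
      caseSensitive: Boolean): Seq[String] = {
    val pattern = if (caseSensitive) regexPattern else s"(?i)$regexPattern"
    names.filter(_.matches(pattern))
  }
}
```

With case-insensitive analysis, the pattern `id` would match a column named `ID`; with case-sensitive analysis, it would not.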
[GitHub] spark pull request #18436: [SPARK-20073][SQL] Prints an explicit warning mes...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/18436#discussion_r124963887

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Column.scala ---
@@ -464,7 +464,15 @@ class Column(val expr: Expression) extends Logging {
    * @group expr_ops
    * @since 1.3.0
    */
-  def <=> (other: Any): Column = withExpr { EqualNullSafe(expr, lit(other).expr) }
+  def <=> (other: Any): Column = withExpr {
+    val right = lit(other).expr
+    if (this.expr == right) {
+      logWarning(
+        s"Constructing trivially true equals predicate, '${this.expr} = $right'. " +
--- End diff --

`=` => `<=>`
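The review nit is that the warning message should spell the operator as `<=>` (null-safe equality), not `=`. The check itself reduces to detecting a self-comparison before building the predicate; a minimal sketch with a hypothetical helper using strings in place of Catalyst expressions:

```scala
// Sketch: emit a warning when both sides of a null-safe equality are the
// same expression, which yields a trivially true predicate.
object TrivialPredicateSketch {
  def warning(left: String, right: String): Option[String] =
    if (left == right) {
      Some(s"Constructing trivially true equals predicate, '$left <=> $right'.")
    } else {
      None
    }
}
```

The operator matters in the message: `a = a` can evaluate to NULL when `a` is NULL, whereas `a <=> a` is always true, which is exactly why the predicate is trivial here.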
[GitHub] spark issue #18436: [SPARK-20073][SQL] Prints an explicit warning message in...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18436 **[Test build #78950 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78950/testReport)** for PR 18436 at commit [`be56898`](https://github.com/apache/spark/commit/be568984247988ae50784cb7e4550c656aab6513).
[GitHub] spark issue #18436: [SPARK-20073][SQL] Prints an explicit warning message in...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/18436 retest this please
[GitHub] spark pull request #18464: [SPARK-21250][WEB-UI]Add a url in the table of 'R...
Github user jiangxb1987 commented on a diff in the pull request: https://github.com/apache/spark/pull/18464#discussion_r124963695

--- Diff: core/src/main/scala/org/apache/spark/deploy/worker/ui/WorkerPage.scala ---
@@ -112,7 +112,15 @@ private[ui] class WorkerPage(parent: WorkerWebUI) extends WebUIPage("") {
       ID: {executor.appId}
-      Name: {executor.appDesc.name}
+      Name:
+      {
+        if ({executor.state == ExecutorState.RUNNING}) {
+          {executor.appDesc.name}
--- End diff --

This was originally set by:
```
val webUrl = sc.ui.map(_.webUrl).getOrElse("")
```
The webUrl is only valid after bind(). But I think normally this should not be empty.
[GitHub] spark pull request #18388: [SPARK-21175] Reject OpenBlocks when memory short...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/18388#discussion_r124963673 --- Diff: docs/configuration.md --- @@ -614,6 +614,34 @@ Apart from these, the following properties are also available, and may be useful + spark.network.netty.memCostWaterMark --- End diff -- cc @rxin @JoshRosen @zsxwing any suggestion for the config name?
[GitHub] spark pull request #16056: [SPARK-18623][SQL] Add `returnNullable` to `Stati...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/16056#discussion_r124963500 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/encoders/RowEncoder.scala --- @@ -96,28 +96,32 @@ object RowEncoder { DateTimeUtils.getClass, TimestampType, "fromJavaTimestamp", -inputObject :: Nil) +inputObject :: Nil, +returnNullable = false) case DateType => StaticInvoke( DateTimeUtils.getClass, DateType, "fromJavaDate", -inputObject :: Nil) +inputObject :: Nil, +returnNullable = false) case d: DecimalType => StaticInvoke( Decimal.getClass, d, "fromDecimal", -inputObject :: Nil) +inputObject :: Nil, +returnNullable = false) case StringType => StaticInvoke( classOf[UTF8String], StringType, "fromString", -inputObject :: Nil) +inputObject :: Nil, +returnNullable = true) --- End diff -- `UTF8String.fromString` only returns null if the input string is null, and here `propagateNull` is true, which means `returnNullable` can be false, because when we invoke this method the input string must be non-null.
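The nullability reasoning in the comment above can be sketched as a small model (a conceptual Python illustration of the `propagateNull`/`returnNullable` interaction, not Catalyst's actual implementation; the function name is hypothetical):

```python
def static_invoke_nullable(input_nullable: bool,
                           propagate_null: bool,
                           return_nullable: bool) -> bool:
    """When can a static method invocation produce null?

    - If propagate_null is set and the input can be null, the result can be
      null without the method being called at all.
    - Otherwise the result can only be null if the method itself may return
      null for a non-null input (return_nullable).
    """
    if propagate_null and input_nullable:
        return True
    return return_nullable

# The UTF8String.fromString case from the review: with propagateNull = true,
# the method is only ever invoked on non-null input, so returnNullable could
# be false; overall nullability is still driven by the input's nullability.
```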
[GitHub] spark issue #18266: [SPARK-20427][SQL] Read JDBC table use custom schema
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/18266 I am fine with supporting a customized schema for the read path of the JDBC relation. However, we need to check whether the user-specified schema matches the underlying table schema. If not matched, we need to catch it earlier and issue a proper error message.
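The fail-fast check suggested above could look roughly like the following (a hedged Python sketch, not Spark's actual implementation; the function name and dict-based schema representation are hypothetical):

```python
def check_custom_schema(user_schema, table_schema):
    """Fail fast when a user-specified schema names columns that the
    underlying JDBC table does not have, instead of surfacing an obscure
    error later at read time.

    Both schemas are modeled as {column_name: type_string} dicts.
    """
    missing = set(user_schema) - set(table_schema)
    if missing:
        raise ValueError(
            "Columns %s in the user-specified schema do not exist in the "
            "table schema (available: %s)"
            % (sorted(missing), sorted(table_schema)))
```

A user-specified schema that is a subset of the table's columns passes; an unknown column raises immediately with a message naming the offending columns.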
[GitHub] spark issue #18458: [SPARK-20889][SparkR] Grouped documentation for COLLECTI...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18458 **[Test build #78949 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78949/testReport)** for PR 18458 at commit [`8be3e49`](https://github.com/apache/spark/commit/8be3e49a2cd8dc1c5f5f524cd58728fcd23e0327).
[GitHub] spark pull request #16056: [SPARK-18623][SQL] Add `returnNullable` to `Stati...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/16056#discussion_r124963200 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/JavaTypeInference.scala --- @@ -216,23 +216,26 @@ object JavaTypeInference { ObjectType(c), "valueOf", getPath :: Nil, - propagateNull = true) + propagateNull = true, --- End diff -- nit: we can remove `propagateNull = true` as it's true by default
[GitHub] spark pull request #18266: [SPARK-20427][SQL] Read JDBC table use custom sch...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/18266#discussion_r124963184 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCOptions.scala --- @@ -120,6 +120,7 @@ class JDBCOptions( // TODO: to reuse the existing partition parameters for those partition specific options val createTableOptions = parameters.getOrElse(JDBC_CREATE_TABLE_OPTIONS, "") val createTableColumnTypes = parameters.get(JDBC_CREATE_TABLE_COLUMN_TYPES) + val customSchema = parameters.get(JDBC_CUSTOM_SCHEMA) --- End diff -- convert it to `StructType` here.
[GitHub] spark issue #18474: [SPARK-21235][SPARKR] UTest should clear temp results wh...
Github user jiangxb1987 commented on the issue: https://github.com/apache/spark/pull/18474 We should not need this, because in `afterEach()` we will stop each store, which should clear both the memoryStore and the disk blocks.
[GitHub] spark pull request #18266: [SPARK-20427][SQL] Read JDBC table use custom sch...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/18266#discussion_r124962985 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCSuite.scala --- @@ -907,7 +907,7 @@ class JDBCSuite extends SparkFunSuite assert(new JDBCOptions(CaseInsensitiveMap(parameters)).asConnectionProperties.isEmpty) } - test("SPARK-16848: jdbc API throws an exception for user specified schema") { + ignore("SPARK-16848: jdbc API throws an exception for user specified schema") { --- End diff -- Then, we should remove this test case.
[GitHub] spark pull request #18448: [SPARK-20889][SparkR] Grouped documentation for M...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/18448
[GitHub] spark pull request #18470: [SPARK-21258][SQL] Fix WindowExec complex object ...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/18470
[GitHub] spark issue #18470: [SPARK-21258][SQL] Fix WindowExec complex object aggrega...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/18470 thanks, merging to master/2.2/2.1!
[GitHub] spark issue #18448: [SPARK-20889][SparkR] Grouped documentation for MISC col...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/18448 merged to master.
[GitHub] spark pull request #18464: [SPARK-21250][WEB-UI]Add a url in the table of 'R...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/18464#discussion_r124962728 --- Diff: core/src/main/scala/org/apache/spark/deploy/worker/ui/WorkerPage.scala --- @@ -112,7 +112,15 @@ private[ui] class WorkerPage(parent: WorkerWebUI) extends WebUIPage("") { ID: {executor.appId} - Name: {executor.appDesc.name} + Name: + { +if ({executor.state == ExecutorState.RUNNING}) { + {executor.appDesc.name} --- End diff -- When will `executor.appDesc.appUiUrl` be empty?
[GitHub] spark issue #18468: [SPARK-20873][SQL] Enhance ColumnVector to support compr...
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/18468 @cloud-fan Could you review this? As we discussed at Spark Summit, I prepared a new ColumnVector for compressed columns using the current schemes. Any comments are appreciated.
[GitHub] spark issue #18468: [SPARK-20873][SQL] Enhance ColumnVector to support compr...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18468 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/78944/ Test PASSed.
[GitHub] spark issue #18468: [SPARK-20873][SQL] Enhance ColumnVector to support compr...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18468 Merged build finished. Test PASSed.
[GitHub] spark issue #18465: [SPARK-21093][R] Terminate R's worker processes in the p...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/18465 hmm, this is a fairly small but crucial change. One point we were discussing earlier: we said that we don't know whether the child is going away or merely slow to start, so we explicitly send and look for an exit code. I guess this condition is covered in the new change. According to the doc ``` readChild and readChildren return a raw vector with a "pid" attribute if data were available, integer vector of length one with the process ID if a child terminated ``` So I guess it is safer to act only if we get an integer (which, according to the above, means the child terminated), but to clarify: - isn't being "terminated" enough? Why do we have to pskill it again? - the original leak was thought to be related to a child process getting stuck and *not* terminating properly. Would that manifest again under this new behavior? In other words, could it get into a state where an integer is never returned to the master?
[GitHub] spark issue #18468: [SPARK-20873][SQL] Enhance ColumnVector to support compr...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18468 **[Test build #78944 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78944/testReport)** for PR 18468 at commit [`514400c`](https://github.com/apache/spark/commit/514400c68ac048e73ada1bdc3473123fb0acea74). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #18307: [SPARK-21100][SQL] describe should give quartiles...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/18307#discussion_r124961333 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala --- @@ -2205,37 +2205,170 @@ class Dataset[T] private[sql]( * // max 92.0 192.0 * }}} * + * See also [[describeExtended]] and [[describeAdvanced]] + * + * @param cols Columns to compute statistics on. + * * @group action * @since 1.6.0 */ @scala.annotation.varargs - def describe(cols: String*): DataFrame = withPlan { + def describe(cols: String*): DataFrame = +describeAdvanced(Array("count", "mean", "stddev", "min", "max"), cols: _*) + + /** + * Computes statistics for numeric and string columns, including count, mean, stddev, min, + * approximate quartiles, and max. If no columns are given, this function computes + * statistics for all numerical or string columns. + * + * This function is meant for exploratory data analysis, as we make no guarantee about the + * backward compatibility of the schema of the resulting Dataset. If you want to + * programmatically compute summary statistics, use the `agg` function instead. + * + * {{{ + * ds.describeExtended("age", "height").show() + * + * // output: + * // summary age height + * // count 10.0 10.0 + * // mean53.3 178.05 + * // stddev 11.6 15.7 + * // min 18.0 163.0 + * // 25% 24.0 176.0 + * // 50% 24.0 176.0 + * // 75% 32.0 180.0 + * // max 92.0 192.0 + * }}} + * + * To specify which statistics or percentiles are desired see [[describeAdvanced]] + * + * @param cols Columns to compute statistics on. + * + * @group action + * @since 2.3.0 + */ + @scala.annotation.varargs + def describeExtended(cols: String*): DataFrame = +describeAdvanced(Array("count", "mean", "stddev", "min", "25%", "50%", "75%", "max"), cols: _*) + + /** + * Computes specified statistics for numeric and string columns. Available statistics are: + * + * - count + * - mean + * - stddev + * - min + * - max + * - arbitrary approximate percentiles specified as a percentage (e.g., 75%) + * + * If no columns are given, this function computes statistics for all numerical or string + * columns. + * + * This function is meant for exploratory data analysis, as we make no guarantee about the + * backward compatibility of the schema of the resulting Dataset. If you want to + * programmatically compute summary statistics, use the `agg` function instead. + * + * {{{ + * ds.describeAdvanced(Array("count", "min", "25%", "75%", "max"), "age", "height").show() + * + * // output: + * // summary age height + * // count 10.0 10.0 + * // min 18.0 163.0 + * // 25% 24.0 176.0 + * // 75% 32.0 180.0 + * // max 92.0 192.0 + * }}} + * + * @param statistics Statistics from above list to be computed. + * @param cols Columns to compute statistics on. + * + * @group action + * @since 2.3.0 + */ + @scala.annotation.varargs + def describeAdvanced(statistics: Array[String], cols: String*): DataFrame = withPlan { --- End diff -- though R has summarize http://spark.apache.org/docs/latest/api/R/summarize.html and a very popular R package (which we model on) has one as well: https://cran.r-project.org/web/packages/dplyr/dplyr.pdf
[GitHub] spark issue #14431: [SPARK-16258][SparkR] Automatically append the grouping ...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/14431 https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/RelationalGroupedDataset.scala#L51 You just need to change the above line ``` val groupingExprs: Seq[Expression], ``` You can access `groupingExprs` in `SQLUtils.scala`.
[GitHub] spark issue #18473: [SPARK-21260][SQL][MINOR] Remove the unused OutputFakerE...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18473 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/78945/ Test PASSed.
[GitHub] spark issue #18473: [SPARK-21260][SQL][MINOR] Remove the unused OutputFakerE...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18473 Merged build finished. Test PASSed.
[GitHub] spark issue #17681: [SPARK-20383][SQL] Supporting Create [temporary] Functio...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/17681 there's an issue causing `fails due to an unknown error code` - if you haven't rebased, rebasing to master should eliminate it
[GitHub] spark issue #18473: [SPARK-21260][SQL][MINOR] Remove the unused OutputFakerE...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18473 **[Test build #78945 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78945/testReport)** for PR 18473 at commit [`c5084a3`](https://github.com/apache/spark/commit/c5084a3876d9d18073a74c19d4a11f9fa828f451). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #18474: [SPARK-21235][SPARKR] UTest should clear temp results wh...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/18474 hi - I don't think `[SPARKR]` in the title is right. Sounds like this should be `[core]`
[GitHub] spark issue #18477: [SPARK-21261][DOCS]SQL Regex document fix
Github user gf53520 commented on the issue: https://github.com/apache/spark/pull/18477 test this please
[GitHub] spark issue #18471: [SPARK-21259] More rules for scalastyle
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18471 **[Test build #78948 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78948/testReport)** for PR 18471 at commit [`05084a9`](https://github.com/apache/spark/commit/05084a9706b2c3a9cf6dc1c733d7d16b8bac0f41).
[GitHub] spark issue #18430: [SPARK-21223]:Thread-safety issue in FsHistoryProvider
Github user zenglinxi0615 commented on the issue: https://github.com/apache/spark/pull/18430 @srowen Thanks for your suggestions again! And should I address the problem of SPARK-13988 in this PR?
[GitHub] spark pull request #18478: [SPARK-21253][Core][HOTFIX]Fix Scala 2.10 build
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/18478
[GitHub] spark issue #18478: [SPARK-21253][Core][HOTFIX]Fix Scala 2.10 build
Github user zsxwing commented on the issue: https://github.com/apache/spark/pull/18478 Verified both Scala 2.10 and 2.11 build locally. Since Jenkins PR build doesn't use Scala 2.10, I'm going to merge directly.
[GitHub] spark issue #18467: [SPARK-19659][Core]Disable spark.reducer.maxReqSizeShuff...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18467 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/78943/ Test PASSed.
[GitHub] spark issue #18467: [SPARK-19659][Core]Disable spark.reducer.maxReqSizeShuff...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18467 Merged build finished. Test PASSed.
[GitHub] spark issue #18467: [SPARK-19659][Core]Disable spark.reducer.maxReqSizeShuff...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18467 **[Test build #78943 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78943/testReport)** for PR 18467 at commit [`b8f022e`](https://github.com/apache/spark/commit/b8f022e82653f756b199fe6c88d362a9861e3825). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.