[GitHub] spark issue #14165: [SPARK-16503] SparkSession should provide Spark version
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14165

**[Test build #62216 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62216/consoleFull)** for PR 14165 at commit [`1ea0247`](https://github.com/apache/spark/commit/1ea0247cfd68823ce6175cec42e2027334d31451).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.

---

If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA.

---

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #14106: [SPARK-16448] RemoveAliasOnlyProject should not r...
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/14106#discussion_r70585442

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---

```diff
@@ -165,36 +165,48 @@ object PushProjectThroughSample extends Rule[LogicalPlan] {
  * but can also benefit other operators.
  */
 object RemoveAliasOnlyProject extends Rule[LogicalPlan] {
-  // Check if projectList in the Project node has the same attribute names and ordering
-  // as its child node.
+  /**
+   * Returns true if the project list is semantically same with child output, after strip alias on
+   * attribute.
+   */
   private def isAliasOnly(
       projectList: Seq[NamedExpression],
       childOutput: Seq[Attribute]): Boolean = {
-    if (!projectList.forall(_.isInstanceOf[Alias]) || projectList.length != childOutput.length) {
+    if (projectList.length != childOutput.length) {
       false
     } else {
-      projectList.map(_.asInstanceOf[Alias]).zip(childOutput).forall { case (a, o) =>
-        a.child match {
-          case attr: Attribute if a.name == attr.name && attr.semanticEquals(o) => true
-          case _ => false
-        }
+      stripAliasOnAttribute(projectList).zip(childOutput).forall {
+        case (a: Attribute, o) if a semanticEquals o => true
+        case _ => false
       }
     }
   }

+  private def stripAliasOnAttribute(projectList: Seq[NamedExpression]) = {
+    projectList.map {
+      // Alias with metadata can not be striped, or the metadata will be lost.
+      // If the alias name is different from attribute name, we can't strip it either, or we may
+      // accidentally change the output schema name of the root plan.
+      case a @ Alias(attr: Attribute, name) if a.metadata == Metadata.empty && name == attr.name =>
+        attr
+      case other => other
+    }
+  }
+
   def apply(plan: LogicalPlan): LogicalPlan = {
-    val aliasOnlyProject = plan.find {
-      case Project(pList, child) if isAliasOnly(pList, child.output) => true
-      case _ => false
+    val aliasOnlyProject = plan.collectFirst {
+      case p @ Project(pList, child) if isAliasOnly(pList, child.output) => p
     }

-    aliasOnlyProject.map { case p: Project =>
-      val aliases = p.projectList.map(_.asInstanceOf[Alias])
-      val attrMap = AttributeMap(aliases.map(a => (a.toAttribute, a.child)))
+    aliasOnlyProject.map { case proj =>
+      val attributesToReplace = proj.output.zip(proj.child.output).filterNot {
+        case (a1, a2) => a1 semanticEquals a2
+      }
+      val attrMap = AttributeMap(attributesToReplace)
       plan.transformAllExpressions {
         case a: Attribute if attrMap.contains(a) => attrMap(a)
       }.transform {
-        case op: Project if op.eq(p) => op.child
+        case plan: Project if plan eq proj => plan.child
       }
     }.getOrElse(plan)
   }
```

--- End diff --

Can we use a `plan.transform` to implement this rule?
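The suggestion above, implementing the rule with a single `plan.transform`, relies on Catalyst's bottom-up `transform` traversal, which rewrites children first and then tries a partial-function rule at every node. As a minimal, self-contained sketch of that pattern on a toy tree (all class and method names here are illustrative, not Catalyst's real API):

```scala
// Toy expression tree illustrating the bottom-up transform pattern:
// one pass that rewrites children first, then applies the rule at each node.
sealed trait Expr {
  def children: Seq[Expr]
  def withChildren(cs: Seq[Expr]): Expr
  // Bottom-up traversal: transform children, then try the rule on this node.
  def transform(rule: PartialFunction[Expr, Expr]): Expr = {
    val newSelf = withChildren(children.map(_.transform(rule)))
    rule.applyOrElse(newSelf, identity[Expr])
  }
}
case class Attr(name: String) extends Expr {
  def children: Seq[Expr] = Nil
  def withChildren(cs: Seq[Expr]): Expr = this
}
case class Alias(child: Expr, name: String) extends Expr {
  def children: Seq[Expr] = Seq(child)
  def withChildren(cs: Seq[Expr]): Expr = copy(child = cs.head)
}

// A rule that strips an alias which merely renames an attribute to itself,
// loosely analogous to RemoveAliasOnlyProject removing alias-only Projects.
val stripSelfAlias: PartialFunction[Expr, Expr] = {
  case Alias(a @ Attr(n), m) if n == m => a
}
```

With this shape, the whole rewrite is a single traversal; there is no need to first `find` a matching node and then run a second pass to replace it.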
[GitHub] spark pull request #14106: [SPARK-16448] RemoveAliasOnlyProject should not r...
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/14106#discussion_r70584787

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---

```diff
@@ -165,36 +165,48 @@ object PushProjectThroughSample extends Rule[LogicalPlan] {
  * but can also benefit other operators.
  */
 object RemoveAliasOnlyProject extends Rule[LogicalPlan] {
-  // Check if projectList in the Project node has the same attribute names and ordering
-  // as its child node.
+  /**
+   * Returns true if the project list is semantically same with child output, after strip alias on
+   * attribute.
+   */
   private def isAliasOnly(
       projectList: Seq[NamedExpression],
       childOutput: Seq[Attribute]): Boolean = {
-    if (!projectList.forall(_.isInstanceOf[Alias]) || projectList.length != childOutput.length) {
+    if (projectList.length != childOutput.length) {
       false
     } else {
-      projectList.map(_.asInstanceOf[Alias]).zip(childOutput).forall { case (a, o) =>
-        a.child match {
-          case attr: Attribute if a.name == attr.name && attr.semanticEquals(o) => true
-          case _ => false
-        }
+      stripAliasOnAttribute(projectList).zip(childOutput).forall {
+        case (a: Attribute, o) if a semanticEquals o => true
+        case _ => false
       }
     }
   }

+  private def stripAliasOnAttribute(projectList: Seq[NamedExpression]) = {
+    projectList.map {
+      // Alias with metadata can not be striped, or the metadata will be lost.
```

--- End diff --

Nit: striped => stripped
[GitHub] spark pull request #14106: [SPARK-16448] RemoveAliasOnlyProject should not r...
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/14106#discussion_r70584778

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---

```diff
@@ -165,36 +165,48 @@ object PushProjectThroughSample extends Rule[LogicalPlan] {
  * but can also benefit other operators.
  */
 object RemoveAliasOnlyProject extends Rule[LogicalPlan] {
-  // Check if projectList in the Project node has the same attribute names and ordering
-  // as its child node.
+  /**
+   * Returns true if the project list is semantically same with child output, after strip alias on
```

--- End diff --

Nit: "... same with ..." => "... same as ..."
[GitHub] spark pull request #14173: [SPARKR][SPARK-16507] Add a CRAN checker, fix Rd ...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/14173#discussion_r70583775

--- Diff: R/pkg/R/column.R ---

```diff
@@ -235,20 +248,16 @@ setMethod("cast",
           function(x, dataType) {
             if (is.character(dataType)) {
               column(callJMethod(x@jc, "cast", dataType))
-            } else if (is.list(dataType)) {
```

--- End diff --

breaking change? if intended, remove example on L243?
[GitHub] spark pull request #14173: [SPARKR][SPARK-16507] Add a CRAN checker, fix Rd ...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/14173#discussion_r70583496

--- Diff: R/pkg/R/column.R ---

```diff
@@ -44,6 +44,9 @@ setMethod("initialize", "Column", function(.Object, jc) {
   .Object
 })

+#' @rdname column
+#' @name column
+#' @aliases column,jobj-method
```

--- End diff --

I thought we don't export this?
[GitHub] spark pull request #14173: [SPARKR][SPARK-16507] Add a CRAN checker, fix Rd ...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/14173#discussion_r70583340

--- Diff: R/pkg/R/SQLContext.R ---

```diff
@@ -267,6 +267,10 @@ as.DataFrame.default <- function(data, schema = NULL, samplingRatio = 1.0) {
   createDataFrame(data, schema, samplingRatio)
 }

+#' @rdname createDataFrame
+#' @aliases createDataFrame
```

--- End diff --

should the aliases here be as.DataFrame so it can be found via `?`?
[GitHub] spark issue #14178: [SPARKR][DOCS][MINOR] R programming guide to include csv...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14178 **[Test build #62228 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62228/consoleFull)** for PR 14178 at commit [`30c7c81`](https://github.com/apache/spark/commit/30c7c81de962e8cc577b1c9786939521fe1c6899).
[GitHub] spark pull request #14173: [SPARKR][SPARK-16507] Add a CRAN checker, fix Rd ...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/14173#discussion_r70583224

--- Diff: R/pkg/R/DataFrame.R ---

```diff
@@ -2950,6 +3038,10 @@ setMethod("drop",
           })

 # Expose base::drop
+#' @name drop
+#' @rdname drop
```

--- End diff --

this would add a fairly empty Rd page for drop... I wonder if there is a way to avoid that? Perhaps add a link to base::drop?
[GitHub] spark issue #14176: [SPARK-16525][SQL] Enable Row Based HashMap in HashAggre...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14176 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62227/ Test FAILed.
[GitHub] spark issue #14176: [SPARK-16525][SQL] Enable Row Based HashMap in HashAggre...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14176

**[Test build #62227 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62227/consoleFull)** for PR 14176 at commit [`a3360e0`](https://github.com/apache/spark/commit/a3360e0ab1223dd43f891e755e648680a402b7df).

* This patch **fails Scala style tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #14176: [SPARK-16525][SQL] Enable Row Based HashMap in HashAggre...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14176 Merged build finished. Test FAILed.
[GitHub] spark pull request #14178: [SPARKR][DOCS][MINOR] R programming guide to incl...
GitHub user felixcheung opened a pull request: https://github.com/apache/spark/pull/14178

[SPARKR][DOCS][MINOR] R programming guide to include csv data source example

## What changes were proposed in this pull request?

Minor documentation update for code example, code style, and missed reference to "sparkR.init"

## How was this patch tested?

manual

@shivaram

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/felixcheung/spark rcsvprogrammingguide

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/14178.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #14178

commit 30c7c81de962e8cc577b1c9786939521fe1c6899
Author: Felix Cheung
Date: 2016-07-13T06:42:26Z

    update
[GitHub] spark issue #14177: [SPARK-16027][SPARKR] Fix R tests SparkSession init/stop
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14177 **[Test build #62226 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62226/consoleFull)** for PR 14177 at commit [`1a86e85`](https://github.com/apache/spark/commit/1a86e857ab954620fb33dde8667f3a2a7d5138dc).
[GitHub] spark issue #14176: [SPARK-16525][SQL] Enable Row Based HashMap in HashAggre...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14176 **[Test build #62227 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62227/consoleFull)** for PR 14176 at commit [`a3360e0`](https://github.com/apache/spark/commit/a3360e0ab1223dd43f891e755e648680a402b7df).
[GitHub] spark issue #14174: [SPARK-16524][SQL] Add RowBatch and RowBasedHashMapGener...
Github user ooq commented on the issue: https://github.com/apache/spark/pull/14174 cc @sameeragarwal @davies @rxin
[GitHub] spark issue #14176: [SPARK-16525][SQL] Enable Row Based HashMap in HashAggre...
Github user ooq commented on the issue: https://github.com/apache/spark/pull/14176 cc @sameeragarwal @davies @rxin
[GitHub] spark pull request #14177: [SPARK-16027][SPARKR] Fix R tests SparkSession in...
GitHub user felixcheung opened a pull request: https://github.com/apache/spark/pull/14177

[SPARK-16027][SPARKR] Fix R tests SparkSession init/stop

## What changes were proposed in this pull request?

Fix R SparkSession init/stop, and warnings of reusing existing Spark Context

## How was this patch tested?

unit tests

@shivaram

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/felixcheung/spark rsessiontest

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/14177.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #14177

commit 72fffbb593de289fb4434c730c592e04b50fb13f
Author: Felix Cheung
Date: 2016-07-13T05:42:01Z

    fix session start/stop in tests

commit 614a63e091a8164696a4316564bdae53257953de
Author: Felix Cheung
Date: 2016-07-13T06:56:56Z

    fix test

commit 1a86e857ab954620fb33dde8667f3a2a7d5138dc
Author: Felix Cheung
Date: 2016-07-13T07:56:09Z

    fix style
[GitHub] spark pull request #14176: [SPARK-16525][SQL] Enable Row Based HashMap in Ha...
GitHub user ooq opened a pull request: https://github.com/apache/spark/pull/14176

[SPARK-16525][SQL] Enable Row Based HashMap in HashAggregateExec

## What changes were proposed in this pull request?

This PR is the second step for the following feature: for hash aggregation in Spark SQL, we use a fast aggregation hashmap to act as a "cache" in order to boost aggregation performance. Previously, the hashmap was backed by a `ColumnarBatch`. This has performance issues when we have a wide schema for the aggregation table (a large number of key fields or value fields). In this JIRA, we support another implementation of the fast hashmap, which is backed by a `RowBatch`. We then automatically pick between the two implementations based on certain knobs.

In this second-step PR, we enable `RowBasedHashMapGenerator` in `HashAggregateExec`.

## How was this patch tested?

Tests and benchmarks will be added in a separate PR in the series.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ooq/spark rowbasedfastaggmap-pr2

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/14176.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #14176

commit c87f26b318b5d673ac95454df5c1cb9a56c677eb
Author: Qifan Pu
Date: 2016-07-13T07:35:06Z

    add RowBatch and RowBasedHashMapGenerator

commit a3360e0ab1223dd43f891e755e648680a402b7df
Author: Qifan Pu
Date: 2016-07-13T08:08:35Z

    enable row based hashmap
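The "automatically pick between the two implementations based on certain knobs" step in the PR description could, as a rough sketch, come down to a schema-width check. All names and the threshold below are illustrative assumptions, not Spark's actual configuration:

```scala
// Hypothetical selector between a columnar-backed and a row-backed fast
// hashmap, keyed on aggregation schema width. Wide schemas (many key or
// value fields) hurt the columnar layout, so they fall back to rows.
sealed trait FastHashMapImpl
case object ColumnarBatchMap extends FastHashMapImpl
case object RowBatchMap extends FastHashMapImpl

def chooseFastMap(
    numKeyFields: Int,
    numValueFields: Int,
    wideSchemaThreshold: Int = 8): FastHashMapImpl = {
  // Illustrative knob: total field count decides the backing layout.
  if (numKeyFields + numValueFields > wideSchemaThreshold) RowBatchMap
  else ColumnarBatchMap
}
```

A selector like this keeps the decision in one place, so benchmarks in the follow-up PR could tune the threshold without touching either generator.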
[GitHub] spark pull request #14119: [SPARK-16303][DOCS][EXAMPLES][WIP] Updated SQL pr...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/14119
[GitHub] spark issue #14175: [SPARK-16522][MESOS] Spark application throws exception ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14175 **[Test build #62225 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62225/consoleFull)** for PR 14175 at commit [`6fe96e5`](https://github.com/apache/spark/commit/6fe96e5879fd97aa630839e670e3d8b17de785be).
[GitHub] spark issue #14119: [SPARK-16303][DOCS][EXAMPLES][WIP] Updated SQL programmi...
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/14119

LGTM, I've merged this to master and branch-2.0. Thanks for working on this!

I only observed one weird rendering issue, caused by the blank lines before `{% include_example %}`; maybe my local Jekyll version is too low. I think it's fine to leave the other lines as is. The lines that exceed the length limit should be OK.

Could you please remove the WIP tag from the PR title? (I've removed it manually while merging this PR.)
[GitHub] spark pull request #14175: [SPARK-16522][MESOS] Spark application throws exc...
GitHub user sun-rui opened a pull request: https://github.com/apache/spark/pull/14175

[SPARK-16522][MESOS] Spark application throws exception on exit.

## What changes were proposed in this pull request?

Spark applications running on Mesos throw exception upon exit. For details, refer to https://issues.apache.org/jira/browse/SPARK-16522. I am not sure if there is any better fix, so wait for review comments.

## How was this patch tested?

Manual test. Observed that the exception is gone upon application exit.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/sun-rui/spark SPARK-16522

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/14175.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #14175

commit 6fe96e5879fd97aa630839e670e3d8b17de785be
Author: Sun Rui
Date: 2016-07-13T07:43:38Z

    [SPARK-16522][MESOS] Spark application throws exception on exit.
[GitHub] spark issue #14174: [SPARK-16524][SQL] Add RowBatch and RowBasedHashMapGener...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14174

**[Test build #6 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/6/consoleFull)** for PR 14174 at commit [`c87f26b`](https://github.com/apache/spark/commit/c87f26b318b5d673ac95454df5c1cb9a56c677eb).

* This patch **fails Scala style tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `public final class RowBatch extends MemoryConsumer`
  * `class RowBasedHashMapGenerator(`
  * `  case class Buffer(dataType: DataType, name: String)`
  * `  |public class $generatedClassName extends org.apache.spark.memory.MemoryConsumer`
[GitHub] spark issue #14165: [SPARK-16503] SparkSession should provide Spark version
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14165 Merged build finished. Test FAILed.
[GitHub] spark issue #14165: [SPARK-16503] SparkSession should provide Spark version
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14165 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62223/ Test FAILed.
[GitHub] spark issue #14165: [SPARK-16503] SparkSession should provide Spark version
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14165

**[Test build #62223 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62223/consoleFull)** for PR 14165 at commit [`b4372f7`](https://github.com/apache/spark/commit/b4372f75dea7d486c03a4d35b48d65779c316831).

* This patch **fails Python style tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #14174: [SPARK-16524][SQL] Add RowBatch and RowBasedHashMapGener...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14174 Merged build finished. Test FAILed.
[GitHub] spark issue #14174: [SPARK-16524][SQL] Add RowBatch and RowBasedHashMapGener...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14174 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/6/ Test FAILed.
[GitHub] spark issue #14174: [SPARK-16524][SQL] Add RowBatch and RowBasedHashMapGener...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14174 **[Test build #6 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/6/consoleFull)** for PR 14174 at commit [`c87f26b`](https://github.com/apache/spark/commit/c87f26b318b5d673ac95454df5c1cb9a56c677eb).
[GitHub] spark issue #14036: [SPARK-16323] [SQL] Add IntegerDivide to avoid unnecessa...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14036 **[Test build #62224 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62224/consoleFull)** for PR 14036 at commit [`16eff20`](https://github.com/apache/spark/commit/16eff2071a1ce2f532000e61f6990eb9d77c173f).
[GitHub] spark issue #14165: [SPARK-16503] SparkSession should provide Spark version
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14165 **[Test build #62223 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62223/consoleFull)** for PR 14165 at commit [`b4372f7`](https://github.com/apache/spark/commit/b4372f75dea7d486c03a4d35b48d65779c316831).
[GitHub] spark pull request #14174: [SPARK-16524][SQL] Add RowBatch and RowBasedHashM...
GitHub user ooq opened a pull request: https://github.com/apache/spark/pull/14174 [SPARK-16524][SQL] Add RowBatch and RowBasedHashMapGenerator

## What changes were proposed in this pull request?

This PR is the first step for the following feature: for hash aggregation in Spark SQL, we use a fast aggregation hashmap to act as a "cache" in order to boost aggregation performance. Previously, the hashmap was backed by a `ColumnarBatch`. This has performance issues when we have a wide schema for the aggregation table (a large number of key fields or value fields). In this JIRA, we support another implementation of the fast hashmap, backed by a `RowBatch`. We then automatically pick between the two implementations based on certain knobs. In this first-step PR, implementations for `RowBatch` and `RowBasedHashMapGenerator` are added.

## How was this patch tested?

`RowBatch` can be tested through unit tests (added later). Otherwise, tests and benchmarks will be added in a separate PR in the series.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ooq/spark rowbasedfastaggmap-pr1

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/14174.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #14174

commit c87f26b318b5d673ac95454df5c1cb9a56c677eb
Author: Qifan Pu
Date: 2016-07-13T07:35:06Z

    add RowBatch and RowBasedHashMapGenerator
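The wide-schema trade-off described in the PR summary above can be sketched as follows. This is a hypothetical simplification, not Spark's actual `ColumnarBatch` or `RowBatch` API: it assumes fixed-width `Long` fields and shows only the layout difference, one array per column versus one flat array with all fields of a record adjacent.

```scala
// Hypothetical layout sketch (NOT Spark's real classes): contrasts a
// column-oriented batch with a row-oriented one for fixed-width fields.
object BatchLayoutSketch {
  val numFields = 3

  // Columnar layout: each field lives in its own array, so reading all
  // fields of one record touches `numFields` different arrays.
  final class ColumnarBatchSketch(capacity: Int) {
    private val columns = Array.fill(numFields)(new Array[Long](capacity))
    def put(row: Int, field: Int, v: Long): Unit = columns(field)(row) = v
    def get(row: Int, field: Int): Long = columns(field)(row)
  }

  // Row-based layout: all fields of a record are contiguous, which can be
  // friendlier to wide schemas (many key/value fields per record).
  final class RowBatchSketch(capacity: Int) {
    private val data = new Array[Long](capacity * numFields)
    def put(row: Int, field: Int, v: Long): Unit = data(row * numFields + field) = v
    def get(row: Int, field: Int): Long = data(row * numFields + field)
  }
}
```

With a wide schema, the row-based layout keeps each record's key and value fields in one contiguous stretch of memory, which is the intuition behind preferring a row-backed hashmap for wide aggregation tables.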
[GitHub] spark issue #14036: [SPARK-16323] [SQL] Add IntegerDivide to avoid unnecessa...
Github user techaddict commented on the issue: https://github.com/apache/spark/pull/14036 @cloud-fan Done 👍
[GitHub] spark issue #14165: [SPARK-16503] SparkSession should provide Spark version
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14165 **[Test build #62221 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62221/consoleFull)** for PR 14165 at commit [`1ea0247`](https://github.com/apache/spark/commit/1ea0247cfd68823ce6175cec42e2027334d31451).
[GitHub] spark issue #14165: [SPARK-16503] SparkSession should provide Spark version
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14165 **[Test build #62220 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62220/consoleFull)** for PR 14165 at commit [`b0a724e`](https://github.com/apache/spark/commit/b0a724e65a947a0becda3fd17370acd0e695e42a).
[GitHub] spark issue #14036: [SPARK-16323] [SQL] Add IntegerDivide to avoid unnecessa...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/14036 LGTM except 2 naming comments, thanks for working on it!
[GitHub] spark pull request #14036: [SPARK-16323] [SQL] Add IntegerDivide to avoid un...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/14036#discussion_r70580188

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala ---

    @@ -207,20 +207,12 @@ case class Multiply(left: Expression, right: Expression)
       protected override def nullSafeEval(input1: Any, input2: Any): Any = numeric.times(input1, input2)
     }

    -@ExpressionDescription(
    -  usage = "a _FUNC_ b - Divides a by b.",
    -  extended = "> SELECT 3 _FUNC_ 2;\n 1.5")
    -case class Divide(left: Expression, right: Expression)
    -  extends BinaryArithmetic with NullIntolerant {
    -
    -  override def inputType: AbstractDataType = TypeCollection(DoubleType, DecimalType)
    -
    -  override def symbol: String = "/"
    -  override def decimalMethod: String = "$div"
    +abstract class DivisionArithmetic extends BinaryArithmetic with NullIntolerant {

--- End diff --

how about `DivideBase`?
[GitHub] spark pull request #14036: [SPARK-16323] [SQL] Add IntegerDivide to avoid un...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/14036#discussion_r70580079

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala ---

    @@ -285,6 +278,28 @@ case class Divide(left: Expression, right: Expression)
     }

     @ExpressionDescription(
    +  usage = "a _FUNC_ b - Fraction Division a by b.",
    +  extended = "> SELECT 3 _FUNC_ 2;\n 1.5")
    +case class Divide(left: Expression, right: Expression)
    +  extends DivisionArithmetic {
    +
    +  override def inputType: AbstractDataType = TypeCollection(DoubleType, DecimalType)
    +
    +  override def symbol: String = "/"
    +}
    +
    +@ExpressionDescription(
    +  usage = "a _FUNC_ b - Divides a by b.",
    +  extended = "> SELECT 3 _FUNC_ 2;\n 1")
    +case class IntegerDivide(left: Expression, right: Expression)

--- End diff --

`IntegralDivide`?
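The two expressions under review differ only in result semantics. A minimal sketch of that difference in plain Scala — hypothetical helpers, not the Catalyst `Divide`/`IntegerDivide` expressions themselves:

```scala
// Sketch of the two division semantics discussed in the diff above:
// "/" yields a fractional result, while integral division truncates
// toward zero (SQL's `div`).
object DivideSemantics {
  def fractionalDivide(a: Int, b: Int): Double = a.toDouble / b  // SELECT 3 / 2 -> 1.5
  def integralDivide(a: Int, b: Int): Long = (a / b).toLong      // truncating: 3 div 2 -> 1
}
```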
[GitHub] spark issue #14111: [SPARK-16456][SQL] Reuse the uncorrelated scalar subquer...
Github user lianhuiwang commented on the issue: https://github.com/apache/spark/pull/14111

@cloud-fan At first I implemented it as you said. But the following situation with a broadcast join raises the error 'ScalarSubquery has not finished', for example (from SPARK-14791):

    val df = (1 to 3).map(i => (i, i)).toDF("key", "value")
    df.createOrReplaceTempView("t1")
    df.createOrReplaceTempView("t2")
    df.createOrReplaceTempView("t3")
    val q = sql("select * from t1 join (select key, value from t2 " +
      " where key > (select avg (key) from t3))t on (t1.key = t.key)")

Before:

    *BroadcastHashJoin [key#5], [key#26], Inner, BuildRight
    :- *Project [_1#2 AS key#5, _2#3 AS value#6]
    :  +- *Filter (cast(_1#2 as double) > subquery#13)
    :     :  +- Subquery subquery#13
    :     :     +- *HashAggregate(keys=[], functions=[avg(cast(key#5 as bigint))], output=[avg(key)#25])
    :     :        +- Exchange SinglePartition
    :     :           +- *HashAggregate(keys=[], functions=[partial_avg(cast(key#5 as bigint))], output=[sum#30, count#31L])
    :     :              +- LocalTableScan [key#5]
    :     +- LocalTableScan [_1#2, _2#3]
    +- BroadcastExchange HashedRelationBroadcastMode(List(cast(input[0, int, false] as bigint)))
       +- *Project [_1#2 AS key#26, _2#3 AS value#27]
          +- *Filter (cast(_1#2 as double) > subquery#13)
             :  +- Subquery subquery#13
             :     +- *HashAggregate(keys=[], functions=[avg(cast(key#5 as bigint))], output=[avg(key)#25])
             :        +- Exchange SinglePartition
             :           +- *HashAggregate(keys=[], functions=[partial_avg(cast(key#5 as bigint))], output=[sum#30, count#31L])
             :              +- LocalTableScan [key#5]
             +- LocalTableScan [_1#2, _2#3]

The steps are as follows:

1. BroadcastHashJoin.prepare()
2. t1.Filter.prepareSubqueries, which prepares the subquery.
3. BroadcastExchange.prepare()
4. t2.Filter.prepareSubqueries, which prepares the subquery.
5. BroadcastExchange.doPrepare()
6. t2.Filter.execute()
7. t2.Filter.waitForSubqueries(), which waits for the subquery.
8. BroadcastHashJoin.doExecute()
9. BroadcastExchange.executeBroadcast()
10. t1.Filter.execute()
11. t1.Filter.waitForSubqueries(), which waits for the subquery.

Because before this PR there are two different subqueries, they cannot use each other's results. But after this PR, they are the same subquery, and the steps are as follows:

1. t1.Filter.prepareSubqueries, which prepares the subquery.
2. t2.Filter.prepareSubqueries, which does not submit the subquery's execute() again.
3. t2.Filter.waitForSubqueries(), which can wait on the subquery that step 1 already submitted.
4. t1.Filter.waitForSubqueries(), which does not need to await the subquery's result because step 3 has already updated it.

So I added some logic to ScalarSubquery in order to deal with this.
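The behavior described in the steps above — duplicate uncorrelated scalar subqueries collapsing into one shared computation — can be sketched like this. The class below is a hypothetical simplification, not Spark's `ScalarSubquery`: the first `prepare()` submits the work, later calls are no-ops, and every `waitFor()` blocks on the same future.

```scala
import scala.concurrent.{Await, Future, Promise}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

// Hypothetical sketch of the fix discussed above: plan fragments that
// contain the same scalar subquery share one result slot. prepare() only
// submits the computation the first time; waitFor() blocks on the shared
// future, so the order of prepare/wait across fragments no longer matters.
final class SharedScalarSubquery(compute: () => Any) {
  private val promise = Promise[Any]()
  private var submitted = false

  def prepare(): Unit = synchronized {
    if (!submitted) {
      submitted = true
      promise.completeWith(Future(compute()))
    }
  }

  def waitFor(): Any = Await.result(promise.future, 10.seconds)
}
```

Calling `prepare()` from both t1's and t2's filter then submits the computation once, and either side can call `waitFor()` first without deadlocking.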
[GitHub] spark issue #14036: [SPARK-16323] [SQL] Add IntegerDivide to avoid unnecessa...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14036 Merged build finished. Test PASSed.
[GitHub] spark issue #14036: [SPARK-16323] [SQL] Add IntegerDivide to avoid unnecessa...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14036 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62213/ Test PASSed.
[GitHub] spark issue #14148: [SPARK-16482] [SQL] Describe Table Command for Tables Re...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/14148 It's easy to infer the schema once when we create the table and store it in the external catalog. However, it's a breaking change, which means users can't change the underlying data file schema after the table is created. It's a bad design we need to fix, but we also need to go through the code path to make sure we don't break other things.
[GitHub] spark issue #14036: [SPARK-16323] [SQL] Add IntegerDivide to avoid unnecessa...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14036 **[Test build #62213 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62213/consoleFull)** for PR 14036 at commit [`8d9a04d`](https://github.com/apache/spark/commit/8d9a04d61a155f5bc131cc7a06a1f9378ceb1cbe).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark pull request #14148: [SPARK-16482] [SQL] Describe Table Command for Ta...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/14148#discussion_r70578153

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala ---

    @@ -413,38 +413,36 @@ case class DescribeTableCommand(table: TableIdentifier, isExtended: Boolean, isF
       } else {
         val metadata = catalog.getTableMetadata(table)
    +    if (DDLUtils.isDatasourceTable(metadata)) {
    +      DDLUtils.getSchemaFromTableProperties(metadata) match {
    +        case Some(userSpecifiedSchema) => describeSchema(userSpecifiedSchema, result)
    +        case None => describeSchema(catalog.lookupRelation(table).schema, result)
    +      }
    +    } else {
    +      describeSchema(metadata.schema, result)
    +    }

--- End diff --

@yhuai I just did a try. We have to pass `CatalogTable` to avoid another call of `getTableMetadata`. We also need to pass `SessionCatalog` for calling `lookupRelation`. Do you like this function, or keep the existing one? Thanks!

```Scala
private def describeSchema(
    tableDesc: CatalogTable,
    catalog: SessionCatalog,
    buffer: ArrayBuffer[Row]): Unit = {
  if (DDLUtils.isDatasourceTable(tableDesc)) {
    DDLUtils.getSchemaFromTableProperties(tableDesc) match {
      case Some(userSpecifiedSchema) => describeSchema(userSpecifiedSchema, buffer)
      case None => describeSchema(catalog.lookupRelation(table).schema, buffer)
    }
  } else {
    describeSchema(tableDesc.schema, buffer)
  }
}
```
[GitHub] spark issue #14036: [SPARK-16323] [SQL] Add IntegerDivide to avoid unnecessa...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14036 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62212/ Test PASSed.
[GitHub] spark issue #14036: [SPARK-16323] [SQL] Add IntegerDivide to avoid unnecessa...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14036 Merged build finished. Test PASSed.
[GitHub] spark issue #13701: [SPARK-15639][SQL] Try to push down filter at RowGroups ...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/13701 @yhuai OK. Thanks for letting me know that.
[GitHub] spark issue #14036: [SPARK-16323] [SQL] Add IntegerDivide to avoid unnecessa...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14036 **[Test build #62212 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62212/consoleFull)** for PR 14036 at commit [`ab6858c`](https://github.com/apache/spark/commit/ab6858cac3f8f53a3437038b7cd767e73d170eaa).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #13658: [SPARK-15937] [yarn] Improving the logic to wait for an ...
Github user subrotosanyal commented on the issue: https://github.com/apache/spark/pull/13658

hi @vanzin I am surprised too that notify was not triggered somehow.

> Is your code perhaps setting "spark.master" to "local" or something that is not "yarn-cluster" before you create the SparkContext?

I would say we don't set it to local. Further, the issue was happening once in a while even though the client code remained the same. For the time being I have applied the patch and built a custom Spark distribution to get rid of this random failure, but in the long run I would prefer not to use a custom distribution.
[GitHub] spark pull request #14165: [SPARK-16503] SparkSession should provide Spark v...
Github user lins05 commented on a diff in the pull request: https://github.com/apache/spark/pull/14165#discussion_r70575753

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala ---

    @@ -79,6 +79,9 @@ class SparkSession private(
       sparkContext.assertNotStopped()

    +  /** The version of Spark on which this application is running. */
    +  def version: String = SPARK_VERSION

--- End diff --

@rxin May I ask when we should use the `@Since` java annotation and when the `@since` javadoc tag?
[GitHub] spark issue #14172: [SPARK-16516][SQL] Support for pushing down filters for ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14172 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62214/ Test PASSed.
[GitHub] spark issue #14172: [SPARK-16516][SQL] Support for pushing down filters for ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14172 Merged build finished. Test PASSed.
[GitHub] spark issue #14172: [SPARK-16516][SQL] Support for pushing down filters for ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14172 **[Test build #62214 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62214/consoleFull)** for PR 14172 at commit [`ade0ad2`](https://github.com/apache/spark/commit/ade0ad27459248d3db1c7e453cbf724596a50a2a).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark pull request #14152: [SPARK-16395] [STREAMING] Fail if too many Checkp...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/14152#discussion_r70575075

--- Diff: streaming/src/main/scala/org/apache/spark/streaming/Checkpoint.scala ---

    @@ -18,8 +18,8 @@ package org.apache.spark.streaming
     import java.io._
    -import java.util.concurrent.Executors
    -import java.util.concurrent.RejectedExecutionException
    +import java.util.concurrent.{ArrayBlockingQueue, RejectedExecutionException,

--- End diff --

Yeah, it's the style I see elsewhere, like https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/SparkContext.scala#L39. Arguably it's time for a `_` import at this stage; I'm indifferent here.
[GitHub] spark issue #14148: [SPARK-16482] [SQL] Describe Table Command for Tables Re...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/14148 Tomorrow, I will try to dig it deeper and check whether schema evolution could be an issue if the schema is fixed when creating tables.
[GitHub] spark issue #14148: [SPARK-16482] [SQL] Describe Table Command for Tables Re...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/14148 uh... I see what you mean. Agree.
[GitHub] spark issue #14148: [SPARK-16482] [SQL] Describe Table Command for Tables Re...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/14148 I was not talking about caching here. Caching is transient. I want the behavior to be the same regardless of how many times I'm restarting Spark ... And this has nothing to do with refresh. For tables in the catalog, NEVER change the schema implicitly, only do it when it is specified by the user.
[GitHub] spark pull request #14148: [SPARK-16482] [SQL] Describe Table Command for Ta...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/14148#discussion_r70573373

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala ---

    @@ -413,38 +413,36 @@ case class DescribeTableCommand(table: TableIdentifier, isExtended: Boolean, isF
       } else {
         val metadata = catalog.getTableMetadata(table)
    +    if (DDLUtils.isDatasourceTable(metadata)) {
    +      DDLUtils.getSchemaFromTableProperties(metadata) match {
    +        case Some(userSpecifiedSchema) => describeSchema(userSpecifiedSchema, result)
    +        case None => describeSchema(catalog.lookupRelation(table).schema, result)
    +      }
    +    } else {
    +      describeSchema(metadata.schema, result)
    +    }

--- End diff --

Sure. Let me do it now. BTW, previously, `describeExtended` and `describeFormatted` also contained the schema. Both call the original function `describe`.
[GitHub] spark issue #14148: [SPARK-16482] [SQL] Describe Table Command for Tables Re...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/14148 @rxin Currently, we do not run schema inference every time the metadata cache contains the plan. Based on my understanding, that is the major reason why we introduced the metadata cache at the very beginning. I think it is not hard to store the schema of data source tables in the external catalog (Hive metastore). However, `Refresh Table` only refreshes the metadata cache and the data cache. It does not update the schema stored in the external catalog. If we do not store the schema in the external catalog, it works well. Otherwise, we have to refresh the schema info in the external catalog. To implement your idea, I can submit a PR for the release 2.1 tomorrow. We can discuss it in a separate PR.