[GitHub] spark pull request: [Minor] Alter description of some configuratio...
GitHub user ArcherShao opened a pull request: https://github.com/apache/spark/pull/5519 [Minor] Alter description of some configuration in yarn and mesos The values of these configurations are calculated by `math.max(a, b)`, but the description reads 'a with minimum of b'; alter it to 'a with maximum of b'. You can merge this pull request into a Git repository by running: $ git pull https://github.com/ArcherShao/spark conf-des Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/5519.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #5519 commit 7d23311dffb919a44bb8e0559159fb616771b59c Author: ArcherShao Date: 2015-04-15T06:50:17Z [Minor] Alter description of some configuration in yarn and mesos --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
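For context, `math.max(a, b)` returns the larger of its two arguments, so the computed value can never drop below `b`. A minimal sketch of the pattern the PR description refers to (the helper name is illustrative, not the actual Spark YARN/Mesos code):

```scala
// Illustrative sketch only -- memoryWithBound is a hypothetical helper,
// not a function from the Spark codebase.
// math.max(a, b) yields whichever argument is larger, so the result is
// bounded below by b.
def memoryWithBound(requested: Int, bound: Int): Int = math.max(requested, bound)
```

Whichever wording the documentation settles on, it has to agree with this behavior: the result equals `requested` only when `requested` exceeds `bound`.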
[GitHub] spark pull request: [SPARK-2213] [SQL] sort merge join for spark s...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5208#issuecomment-93224019 [Test build #30304 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30304/consoleFull) for PR 5208 at commit [`ec8061b`](https://github.com/apache/spark/commit/ec8061b7f36b87c883af111438ac9ff0304050d7). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `case class Exchange(` * `case class SortMergeJoin(` * This patch does not change any dependencies.
[GitHub] spark pull request: [SPARK-2213] [SQL] sort merge join for spark s...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5208#issuecomment-93224051 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30304/ Test PASSed.
[GitHub] spark pull request: [SPARK-6638] [SQL] Improve performance of Stri...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5350#issuecomment-93223685 [Test build #30303 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30303/consoleFull) for PR 5350 at commit [`3b7bfa8`](https://github.com/apache/spark/commit/3b7bfa8f37e7f2b9aefdfd0e5e57d7b5c6b516ce). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `trait CaseConversionExpression ` * `final class UTF8String extends Ordered[UTF8String] with Serializable ` * This patch does not change any dependencies.
[GitHub] spark pull request: [SPARK-6638] [SQL] Improve performance of Stri...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5350#issuecomment-93223712 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30303/ Test PASSed.
[GitHub] spark pull request: [SQL] SPARK-6489: Optimize lateral view with e...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5358#issuecomment-93222352 [Test build #30322 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30322/consoleFull) for PR 5358 at commit [`6014acc`](https://github.com/apache/spark/commit/6014acc11e570c880657238dc4a444ba6335bc13).
[GitHub] spark pull request: [YARN] SPARK-6470. Add support for YARN node l...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5242#issuecomment-93222375 [Test build #30323 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30323/consoleFull) for PR 5242 at commit [`e377ed6`](https://github.com/apache/spark/commit/e377ed61e398dbbbda976ba2e61eb0c8488f4c7f).
[GitHub] spark pull request: [SQL][Minor] Fix foreachUp of treenode
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/5518#issuecomment-93222096 Please add a test in [TreeNodeSuite](https://github.com/apache/spark/blob/master/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/trees/TreeNodeSuite.scala).
[GitHub] spark pull request: [SPARK-2973] [SQL] Avoid spark job for take on...
Github user scwf commented on a diff in the pull request: https://github.com/apache/spark/pull/5247#discussion_r28397087

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrame.scala ---
@@ -139,11 +139,13 @@ class DataFrame private[sql](
   @transient protected[sql] val logicalPlan: LogicalPlan = queryExecution.logical match {
     // For various commands (like DDL) and queries with side effects, we force query optimization to
     // happen right away to let these side effects take place eagerly.
-    case _: Command |
-         _: InsertIntoTable |
-         _: CreateTableAsSelect[_] |
-         _: CreateTableUsingAsSelect |
-         _: WriteToFile =>
+    case _: Command =>
+      queryExecution.sparkPlan.executeCollect()
+      queryExecution.analyzed
--- End diff --

This will lead to the command being executed twice when we run an action operator on the DataFrame, such as `sql(s"CREATE DATABASE xxx").count()`: the first execution happens while constructing the DataFrame, and the second when executing `count`. So maybe we still need to construct a LocalRelation here?
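The double-execution concern can be reproduced with a tiny model: if a side-effecting command plan is executed eagerly at DataFrame construction and then again by a later action, the side effect fires twice. A hypothetical sketch (not Spark's actual classes):

```scala
// Hypothetical model of a side-effecting command plan; not Spark's API.
var executed = 0
class CommandPlan {
  def executeCollect(): Unit = executed += 1 // stands in for running the DDL
}

val plan = new CommandPlan
plan.executeCollect() // eager execution while the DataFrame is constructed
plan.executeCollect() // re-execution triggered by the action, e.g. count()
// `executed` is now 2: the DDL ran twice, which is the behavior being flagged.
```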
[GitHub] spark pull request: [SQL][Minor] Fix foreachUp of treenode
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5518#issuecomment-93221829 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30318/ Test FAILed.
[GitHub] spark pull request: [SQL][Minor] Fix foreachUp of treenode
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5518#issuecomment-93221818 [Test build #30318 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30318/consoleFull) for PR 5518 at commit [`1ccbfa8`](https://github.com/apache/spark/commit/1ccbfa8ef27b284ace64e605b21f0e4915b53393). * This patch **fails MiMa tests**. * This patch merges cleanly. * This patch adds no public classes. * This patch does not change any dependencies.
[GitHub] spark pull request: [SPARK-6871][SQL] WITH clause in CTE can not f...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/5480
[GitHub] spark pull request: [SPARK-2213] [SQL] sort merge join for spark s...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5208#issuecomment-93220011 [Test build #30321 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30321/consoleFull) for PR 5208 at commit [`2493b9f`](https://github.com/apache/spark/commit/2493b9f9548c4a63a3d31dc600588ac65968b611).
[GitHub] spark pull request: [SPARK-2213] [SQL] sort merge join for spark s...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5208#issuecomment-93219531 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30319/ Test FAILed.
[GitHub] spark pull request: [SPARK-2213] [SQL] sort merge join for spark s...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5208#issuecomment-93219524 [Test build #30319 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30319/consoleFull) for PR 5208 at commit [`5049d88`](https://github.com/apache/spark/commit/5049d882fbfcf9b7c63e95ec20d3a15310068752). * This patch **fails Scala style tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `case class Exchange(` * `case class SortMergeJoin(` * This patch does not change any dependencies.
[GitHub] spark pull request: [SPARK-5946][Streaming] Add Python API for dir...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4723#issuecomment-93219403 [Test build #30320 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30320/consoleFull) for PR 4723 at commit [`aaf4c5a`](https://github.com/apache/spark/commit/aaf4c5a4a06cd3fe9cf44e48dbfa6d209a4e75f1).
[GitHub] spark pull request: [SQL][Minor] Fix foreachUp of treenode
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5518#issuecomment-93219321 [Test build #30318 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30318/consoleFull) for PR 5518 at commit [`1ccbfa8`](https://github.com/apache/spark/commit/1ccbfa8ef27b284ace64e605b21f0e4915b53393).
[GitHub] spark pull request: [SPARK-2213] [SQL] sort merge join for spark s...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5208#issuecomment-93219080 [Test build #30319 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30319/consoleFull) for PR 5208 at commit [`5049d88`](https://github.com/apache/spark/commit/5049d882fbfcf9b7c63e95ec20d3a15310068752).
[GitHub] spark pull request: [SPARK-2213] [SQL] sort merge join for spark s...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5208#issuecomment-93218845 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30315/ Test FAILed.
[GitHub] spark pull request: [SPARK-2213] [SQL] sort merge join for spark s...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5208#issuecomment-93218835 [Test build #30315 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30315/consoleFull) for PR 5208 at commit [`f91a2ae`](https://github.com/apache/spark/commit/f91a2aecf795b2a2b2b834bf69b21875ef6f0b6f). * This patch **fails Scala style tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `case class Exchange(` * `case class SortMergeJoin(` * This patch **adds the following new dependencies:** * `commons-math3-3.4.1.jar` * `snappy-java-1.1.1.7.jar` * This patch **removes the following dependencies:** * `commons-math3-3.1.1.jar` * `snappy-java-1.1.1.6.jar`
[GitHub] spark pull request: [SQL][Minor] Fix foreachUp of treenode
GitHub user scwf opened a pull request: https://github.com/apache/spark/pull/5518 [SQL][Minor] Fix foreachUp of treenode `foreachUp` should run the given function recursively on [[children]] and then on this node (just like `transformUp`). The current implementation does not follow this, which leads to checkAnalysis not checking from the bottom of the logical tree. You can merge this pull request into a Git repository by running: $ git pull https://github.com/scwf/spark patch-1 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/5518.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #5518 commit 1ccbfa8ef27b284ace64e605b21f0e4915b53393 Author: Fei Wang Date: 2015-04-15T06:31:21Z fix foreachUp
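The intended traversal order can be sketched on a minimal tree type (hypothetical, mirroring TreeNode's documented contract rather than its actual code):

```scala
// Hypothetical minimal tree, illustrating the foreachUp contract:
// visit the children (recursively) first, then the node itself.
case class Node(value: Int, children: Seq[Node] = Nil) {
  def foreachUp(f: Node => Unit): Unit = {
    children.foreach(_.foreachUp(f)) // bottom of the tree first
    f(this)                          // then this node
  }
}

val tree = Node(1, Seq(Node(2), Node(3, Seq(Node(4)))))
val visited = scala.collection.mutable.ArrayBuffer.empty[Int]
tree.foreachUp(n => visited += n.value)
// children are visited before their parents: 2, 4, 3, then the root 1
```

With this order, a bottom-up check such as checkAnalysis sees leaves before the nodes that reference them.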
[GitHub] spark pull request: [SPARK-2973] [SQL] Avoid spark job for take on...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5247#issuecomment-93218343 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30316/ Test FAILed.
[GitHub] spark pull request: [SPARK-2213] [SQL] sort merge join for spark s...
Github user adrian-wang commented on a diff in the pull request: https://github.com/apache/spark/pull/5208#discussion_r28396290

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/Exchange.scala ---
@@ -157,28 +205,61 @@ private[sql] case class AddExchange(sqlContext: SQLContext) extends Rule[SparkPl
        case Seq(a,b) => a compatibleWith b
      }.exists(!_)

-     // Check if the partitioning we want to ensure is the same as the child's output
-     // partitioning. If so, we do not need to add the Exchange operator.
-     def addExchangeIfNecessary(partitioning: Partitioning, child: SparkPlan): SparkPlan =
-       if (child.outputPartitioning != partitioning) Exchange(partitioning, child) else child
+     // Adds Exchange or Sort operators as required
+     def addOperatorsIfNecessary(
+         partitioning: Partitioning,
+         rowOrdering: Seq[SortOrder],
+         child: SparkPlan): SparkPlan = {
+       val needSort = rowOrdering.nonEmpty && child.outputOrdering != rowOrdering
+       val needsShuffle = child.outputPartitioning != partitioning
+       val canSortWithShuffle = Exchange.canSortWithShuffle(partitioning, rowOrdering)
+
+       if (needSort && needsShuffle && canSortWithShuffle) {
+         Exchange(partitioning, rowOrdering, child)
+       } else {
+         val withShuffle = if (needsShuffle) {
+           Exchange(partitioning, Nil, child)
+         } else {
+           child
+         }

-     if (meetsRequirements && compatible) {
+         val withSort = if (needSort) {
+           Sort(rowOrdering, global = false, withShuffle)
+         } else {
+           withShuffle
+         }
+
+         withSort
+       }
+     }
+
+     if (meetsRequirements && compatible && !needsAnySort) {
        operator
      } else {
        // At least one child does not satisfies its required data distribution or
        // at least one child's outputPartitioning is not compatible with another child's
        // outputPartitioning. In this case, we need to add Exchange operators.
-       val repartitionedChildren = operator.requiredChildDistribution.zip(operator.children).map {
-         case (AllTuples, child) =>
-           addExchangeIfNecessary(SinglePartition, child)
-         case (ClusteredDistribution(clustering), child) =>
-           addExchangeIfNecessary(HashPartitioning(clustering, numPartitions), child)
-         case (OrderedDistribution(ordering), child) =>
-           addExchangeIfNecessary(RangePartitioning(ordering, numPartitions), child)
-         case (UnspecifiedDistribution, child) => child
-         case (dist, _) => sys.error(s"Don't know how to ensure $dist")
+       val requirements =
+         (operator.requiredChildDistribution, operator.requiredChildOrdering, operator.children)
+
+       val fixedChildren = requirements.zipped.map {
+         case (AllTuples, rowOrdering, child) =>
+           addOperatorsIfNecessary(SinglePartition, rowOrdering, child)
+         case (ClusteredDistribution(clustering), rowOrdering, child) =>
+           addOperatorsIfNecessary(HashPartitioning(clustering, numPartitions), rowOrdering, child)
+         case (OrderedDistribution(ordering), rowOrdering, child) =>
+           addOperatorsIfNecessary(RangePartitioning(ordering, numPartitions), Nil, child)
--- End diff --

OK, let's add it then.
[GitHub] spark pull request: [SPARK-2973] [SQL] Avoid spark job for take on...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5247#issuecomment-93217874 [Test build #30317 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30317/consoleFull) for PR 5247 at commit [`7f51f7e`](https://github.com/apache/spark/commit/7f51f7e7c3406611b20b5570e71872cea44f93e8). * This patch **fails to build**. * This patch merges cleanly. * This patch adds no public classes. * This patch does not change any dependencies.
[GitHub] spark pull request: [SPARK-2973] [SQL] Avoid spark job for take on...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5247#issuecomment-93217876 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30317/ Test FAILed.
[GitHub] spark pull request: [SPARK-2973] [SQL] Avoid spark job for take on...
Github user scwf commented on the pull request: https://github.com/apache/spark/pull/5247#issuecomment-93217820 Here is a problem that needs to be fixed: the DDL command will be executed twice.
[GitHub] spark pull request: [SPARK-3454] [WIP] separate json endpoints for...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4435#issuecomment-93217736 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30301/ Test FAILed.
[GitHub] spark pull request: [SPARK-3454] [WIP] separate json endpoints for...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4435#issuecomment-93217730 **[Test build #30301 timed out](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30301/consoleFull)** for PR 4435 at commit [`c22b11f`](https://github.com/apache/spark/commit/c22b11f0a808135e492cb50c5b5bdebcfd73b1a5) after a configured wait of `120m`.
[GitHub] spark pull request: [SPARK-2213] [SQL] sort merge join for spark s...
Github user adrian-wang commented on a diff in the pull request: https://github.com/apache/spark/pull/5208#discussion_r28396071 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/Exchange.scala --- @@ -87,7 +126,12 @@ case class Exchange(newPartitioning: Partitioning, child: SparkPlan) extends Una implicit val ordering = new RowOrdering(sortingExpressions, child.output) --- End diff -- oh, I see... For RangePartitioner..
[GitHub] spark pull request: [SPARK-6898][SQL] completely support special c...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5511#issuecomment-93217496 [Test build #30313 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30313/consoleFull) for PR 5511 at commit [`48e3e57`](https://github.com/apache/spark/commit/48e3e57e2dd7ac11002515bcb8906eb1215ab0cf). * This patch **fails MiMa tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `case class UnresolvedAttribute(nameParts: Seq[String])` * This patch does not change any dependencies.
[GitHub] spark pull request: [SPARK-6898][SQL] completely support special c...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5511#issuecomment-93217501 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30313/ Test FAILed.
[GitHub] spark pull request: [SPARK-2973] [SQL] Avoid spark job for take on...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5247#issuecomment-93217483 [Test build #30317 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30317/consoleFull) for PR 5247 at commit [`7f51f7e`](https://github.com/apache/spark/commit/7f51f7e7c3406611b20b5570e71872cea44f93e8).
[GitHub] spark pull request: [SPARK-2213] [SQL] sort merge join for spark s...
Github user adrian-wang commented on a diff in the pull request: https://github.com/apache/spark/pull/5208#discussion_r28395998
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/Exchange.scala ---
@@ -87,7 +126,12 @@ case class Exchange(newPartitioning: Partitioning, child: SparkPlan) extends Una
     implicit val ordering = new RowOrdering(sortingExpressions, child.output)
--- End diff --
maybe this line is redundant?
[GitHub] spark pull request: [SPARK-2213] [SQL] sort merge join for spark s...
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/5208#discussion_r28395922
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/Exchange.scala ---
@@ -157,28 +205,61 @@ private[sql] case class AddExchange(sqlContext: SQLContext) extends Rule[SparkPl
       case Seq(a,b) => a compatibleWith b
     }.exists(!_)

-    // Check if the partitioning we want to ensure is the same as the child's output
-    // partitioning. If so, we do not need to add the Exchange operator.
-    def addExchangeIfNecessary(partitioning: Partitioning, child: SparkPlan): SparkPlan =
-      if (child.outputPartitioning != partitioning) Exchange(partitioning, child) else child
+    // Adds Exchange or Sort operators as required
+    def addOperatorsIfNecessary(
+        partitioning: Partitioning,
+        rowOrdering: Seq[SortOrder],
+        child: SparkPlan): SparkPlan = {
+      val needSort = rowOrdering.nonEmpty && child.outputOrdering != rowOrdering
+      val needsShuffle = child.outputPartitioning != partitioning
+      val canSortWithShuffle = Exchange.canSortWithShuffle(partitioning, rowOrdering)
+
+      if (needSort && needsShuffle && canSortWithShuffle) {
+        Exchange(partitioning, rowOrdering, child)
+      } else {
+        val withShuffle = if (needsShuffle) {
+          Exchange(partitioning, Nil, child)
+        } else {
+          child
+        }

-    if (meetsRequirements && compatible) {
+        val withSort = if (needSort) {
+          Sort(rowOrdering, global = false, withShuffle)
+        } else {
+          withShuffle
+        }
+
+        withSort
+      }
+    }
+
+    if (meetsRequirements && compatible && !needsAnySort) {
       operator
     } else {
       // At least one child does not satisfies its required data distribution or
       // at least one child's outputPartitioning is not compatible with another child's
       // outputPartitioning. In this case, we need to add Exchange operators.
-      val repartitionedChildren = operator.requiredChildDistribution.zip(operator.children).map {
-        case (AllTuples, child) =>
-          addExchangeIfNecessary(SinglePartition, child)
-        case (ClusteredDistribution(clustering), child) =>
-          addExchangeIfNecessary(HashPartitioning(clustering, numPartitions), child)
-        case (OrderedDistribution(ordering), child) =>
-          addExchangeIfNecessary(RangePartitioning(ordering, numPartitions), child)
-        case (UnspecifiedDistribution, child) => child
-        case (dist, _) => sys.error(s"Don't know how to ensure $dist")
+      val requirements =
+        (operator.requiredChildDistribution, operator.requiredChildOrdering, operator.children)
+
+      val fixedChildren = requirements.zipped.map {
+        case (AllTuples, rowOrdering, child) =>
+          addOperatorsIfNecessary(SinglePartition, rowOrdering, child)
+        case (ClusteredDistribution(clustering), rowOrdering, child) =>
+          addOperatorsIfNecessary(HashPartitioning(clustering, numPartitions), rowOrdering, child)
+        case (OrderedDistribution(ordering), rowOrdering, child) =>
+          addOperatorsIfNecessary(RangePartitioning(ordering, numPartitions), Nil, child)
--- End diff --
@yhuai Good catch. @adrian-wang I already modified `addOperatorsIfNecessary` and `Exchange` so that they can handle ordering for `RangePartitioning`. We just need to pass the information from the `match` into the function call. The problem with not propagating the information here is that we will silently fail to order correctly, instead of throwing an error.
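The branching inside `addOperatorsIfNecessary` that the diff above introduces can be sketched with a toy plan model. The case classes below are simplified stand-ins for illustration only, not Spark's actual operators:

```scala
// Toy model of the planning decision: shuffle-with-sort when both are
// needed and the shuffle can sort; otherwise add each operator separately.
sealed trait Plan
case class Leaf(name: String) extends Plan
case class Exchange(sortWhileShuffling: Boolean, child: Plan) extends Plan
case class Sort(child: Plan) extends Plan

def addOperatorsIfNecessary(
    needsShuffle: Boolean,
    needsSort: Boolean,
    canSortWithShuffle: Boolean,
    child: Plan): Plan =
  if (needsSort && needsShuffle && canSortWithShuffle) {
    // One operator does both: sort the data while shuffling it.
    Exchange(sortWhileShuffling = true, child)
  } else {
    // Otherwise add a plain shuffle and/or a separate sort, as needed.
    val withShuffle =
      if (needsShuffle) Exchange(sortWhileShuffling = false, child) else child
    if (needsSort) Sort(withShuffle) else withShuffle
  }

val t = Leaf("table")
println(addOperatorsIfNecessary(needsShuffle = true, needsSort = true, canSortWithShuffle = true, t))
println(addOperatorsIfNecessary(needsShuffle = true, needsSort = true, canSortWithShuffle = false, t))
```

The first call yields a single sorting `Exchange`; the second falls back to a `Sort` stacked on top of a plain `Exchange`.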
[GitHub] spark pull request: [SPARK-6443][Spark Submit]Could not submit app...
Github user WangTaoTheTonic commented on the pull request: https://github.com/apache/spark/pull/5116#issuecomment-93216337 ping? @andrewor14
[GitHub] spark pull request: [SPARK-2213] [SQL] sort merge join for spark s...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/5208#discussion_r28395812
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/Exchange.scala ---
+        case (OrderedDistribution(ordering), rowOrdering, child) =>
+          addOperatorsIfNecessary(RangePartitioning(ordering, numPartitions), Nil, child)
--- End diff --
Since we already have all of the needed functions, why not put the `rowOrdering` back (if it is indeed wrong to ignore it)? If we leave it as is, a new `SparkPlan` may in the future require both an `OrderedDistribution` and some kind of row ordering (for example, using a range partitioner to handle data skew), and then the physical plan will be wrong.
[GitHub] spark pull request: [SPARK-6879][HistoryServer]check if app is com...
Github user WangTaoTheTonic commented on the pull request: https://github.com/apache/spark/pull/5491#issuecomment-93216107 @vanzin Now I use an extra global ListBuffer to store the apps to clean, updating its contents and deleting their dirs/files in every clean round. I know the elements in this ListBuffer could be of type `Path` or `String` to occupy less space, but for simpler logic I just left them as `FsApplicationHistoryInfo`.
[GitHub] spark pull request: [SPARK-2213] [SQL] sort merge join for spark s...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5208#issuecomment-93215769 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30309/ Test FAILed.
[GitHub] spark pull request: [SPARK-2213] [SQL] sort merge join for spark s...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5208#issuecomment-93215758 [Test build #30309 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30309/consoleFull) for PR 5208 at commit [`f515cd2`](https://github.com/apache/spark/commit/f515cd29bbe7765eefbb185ad26b5dbb9e2d7380).
* This patch **fails Scala style tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `case class Exchange(`
  * `case class SortMergeJoin(`
* This patch **adds the following new dependencies:**
  * `snappy-java-1.1.1.7.jar`
* This patch **removes the following dependencies:**
  * `snappy-java-1.1.1.6.jar`
[GitHub] spark pull request: [SPARK-6352] [SQL] Add DirectParquetOutputComm...
Github user ypcat commented on the pull request: https://github.com/apache/spark/pull/5042#issuecomment-93214702 I cannot find a way to unset a config value in the hadoop 1.x API. The closest thing is to set it to a default value, which I think should be fine in test code. Also, I found I cannot add more commits to this PR since it is closed. Should we reopen it or open a new PR?
[GitHub] spark pull request: [SPARK-6898][SQL] completely support special c...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5511#issuecomment-93214000 [Test build #30313 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30313/consoleFull) for PR 5511 at commit [`48e3e57`](https://github.com/apache/spark/commit/48e3e57e2dd7ac11002515bcb8906eb1215ab0cf).
[GitHub] spark pull request: [SPARK-6879][HistoryServer]check if app is com...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5491#issuecomment-93213684 [Test build #30314 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30314/consoleFull) for PR 5491 at commit [`d7455d8`](https://github.com/apache/spark/commit/d7455d8df310d690d8104663dc39508011726d12).
[GitHub] spark pull request: [SPARK-2213] [SQL] sort merge join for spark s...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5208#issuecomment-93213692 [Test build #30315 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30315/consoleFull) for PR 5208 at commit [`f91a2ae`](https://github.com/apache/spark/commit/f91a2aecf795b2a2b2b834bf69b21875ef6f0b6f).
[GitHub] spark pull request: [SPARK-2213] [SQL] sort merge join for spark s...
Github user adrian-wang commented on a diff in the pull request: https://github.com/apache/spark/pull/5208#discussion_r28395453
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/Exchange.scala ---
+        case (OrderedDistribution(ordering), rowOrdering, child) =>
+          addOperatorsIfNecessary(RangePartitioning(ordering, numPartitions), Nil, child)
--- End diff --
Currently only sort merge join requires a `childOrdering`, and in that case it cannot be `RangePartitioning`, so it doesn't matter that we don't handle `rowOrdering` for now.
[GitHub] spark pull request: [SPARK-6879][HistoryServer]check if app is com...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5491#issuecomment-93210950 [Test build #30312 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30312/consoleFull) for PR 5491 at commit [`b0abca5`](https://github.com/apache/spark/commit/b0abca54d693399ec2ebd966309b0aded735dd06).
[GitHub] spark pull request: [SPARK-6692][YARN] Add an option for client to...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5343#issuecomment-93210723 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30302/ Test PASSed.
[GitHub] spark pull request: [SPARK-6692][YARN] Add an option for client to...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5343#issuecomment-93210645 [Test build #30302 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30302/consoleFull) for PR 5343 at commit [`2a3fa38`](https://github.com/apache/spark/commit/2a3fa381708ce5319ca3786a079c866b70467e81).
* This patch **passes all tests**.
* This patch merges cleanly.
* This patch adds no public classes.
* This patch **adds the following new dependencies:**
  * `commons-math3-3.4.1.jar`
  * `snappy-java-1.1.1.7.jar`
* This patch **removes the following dependencies:**
  * `commons-math3-3.1.1.jar`
  * `snappy-java-1.1.1.6.jar`
[GitHub] spark pull request: [SQL] SPARK-6489: Optimize lateral view with e...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5358#issuecomment-93208166 [Test build #30311 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30311/consoleFull) for PR 5358 at commit [`8909a5d`](https://github.com/apache/spark/commit/8909a5d14dccc1933a451261bc0e56a2cc876897).
[GitHub] spark pull request: [SPARK-2213] [SQL] sort merge join for spark s...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/5208#discussion_r28395272
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/basicOperators.scala ---
@@ -102,6 +106,8 @@ case class Limit(limit: Int, child: SparkPlan)
   override def output: Seq[Attribute] = child.output
   override def outputPartitioning: Partitioning = SinglePartition
+  override def outputOrdering: Seq[SortOrder] = child.outputOrdering
--- End diff --
I am not sure this is correct. We are merging rows from multiple partitions into a single partition, and `outputOrdering` only guarantees the row ordering within a single partition. Without a merge sort, it seems we cannot use `child.outputOrdering`. How about we just remove it for now?
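The concern can be demonstrated with plain Scala collections standing in for partitions (hypothetical data, not Spark code):

```scala
// Three partitions, each sorted within itself.
val partitions = Seq(Seq(1, 4, 7), Seq(2, 3, 9), Seq(0, 5, 6))

// Collapsing to a single partition by simple concatenation loses
// global order, so the child's per-partition outputOrdering no
// longer describes the merged result.
val concatenated = partitions.flatten
assert(concatenated != concatenated.sorted)

// Only a merge across partitions (modeled here by a full sort;
// a real merge would use a priority queue over partition heads)
// restores a valid ordering for the single partition.
val merged = partitions.flatten.sorted
assert(merged == merged.sorted)
```

So claiming `child.outputOrdering` for the merged partition would only be safe if the collapse itself were order-preserving.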
[GitHub] spark pull request: [SPARK-2213] [SQL] sort merge join for spark s...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/5208#discussion_r28395119
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/Exchange.scala ---
+        case (OrderedDistribution(ordering), rowOrdering, child) =>
+          addOperatorsIfNecessary(RangePartitioning(ordering, numPartitions), Nil, child)
--- End diff --
@marmbrus It seems we should not ignore the `rowOrdering`?
[GitHub] spark pull request: [SPARK-6350][Mesos] Make mesosExecutorCores co...
Github user jongyoul commented on the pull request: https://github.com/apache/spark/pull/5063#issuecomment-93206397 @andrewor14 Thanks for the overall review. I'll address the issues you raised.
[GitHub] spark pull request: SPARK-6735:[YARN] Adding properties to disable...
Github user twinkle-sachdeva commented on a diff in the pull request: https://github.com/apache/spark/pull/5449#discussion_r28394987
--- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala ---
@@ -94,6 +98,14 @@ private[yarn] class YarnAllocator(
   // Additional memory overhead.
   protected val memoryOverhead: Int = sparkConf.getInt("spark.yarn.executor.memoryOverhead",
     math.max((MEMORY_OVERHEAD_FACTOR * executorMemory).toInt, MEMORY_OVERHEAD_MIN))
+
+  // Make the maximum executor failure check to be relative with respect to duration
+  private val relativeMaxExecutorFailureCheck =
--- End diff --
Sounds reasonable. Added the property as spark.yarn.max.executor.failuresPerMinute
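For reference, the quoted `memoryOverhead` default takes the larger of a fraction of the executor memory and a fixed floor. The constants below (0.10 and 384 MB) are assumptions matching the Spark source of this era; treat them as illustrative:

```scala
// Sketch of the quoted default: overhead is at least MEMORY_OVERHEAD_MIN,
// and scales with executor memory above that floor. Constants are
// assumptions, not pulled from this diff.
val MEMORY_OVERHEAD_FACTOR = 0.10
val MEMORY_OVERHEAD_MIN = 384

def defaultMemoryOverhead(executorMemoryMb: Int): Int =
  math.max((MEMORY_OVERHEAD_FACTOR * executorMemoryMb).toInt, MEMORY_OVERHEAD_MIN)

// Small executors hit the 384 MB floor; larger ones scale with the factor.
println(defaultMemoryOverhead(1024)) // 384 (floor applies: 102 < 384)
println(defaultMemoryOverhead(8192)) // 819 (factor applies: 819 > 384)
```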
[GitHub] spark pull request: [SPARK-2213] [SQL] sort merge join for spark s...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5208#issuecomment-93205970 [Test build #30309 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30309/consoleFull) for PR 5208 at commit [`f515cd2`](https://github.com/apache/spark/commit/f515cd29bbe7765eefbb185ad26b5dbb9e2d7380).
[GitHub] spark pull request: [SPARK-2213] [SQL] sort merge join for spark s...
Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5208#discussion_r28394994

    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/Exchange.scala ---
    @@ -157,28 +205,61 @@ private[sql] case class AddExchange(sqlContext: SQLContext) extends Rule[SparkPl
           case Seq(a,b) => a compatibleWith b
         }.exists(!_)

    -    // Check if the partitioning we want to ensure is the same as the child's output
    -    // partitioning. If so, we do not need to add the Exchange operator.
    -    def addExchangeIfNecessary(partitioning: Partitioning, child: SparkPlan): SparkPlan =
    -      if (child.outputPartitioning != partitioning) Exchange(partitioning, child) else child
    +    // Adds Exchange or Sort operators as required
    +    def addOperatorsIfNecessary(
    +        partitioning: Partitioning,
    +        rowOrdering: Seq[SortOrder],
    +        child: SparkPlan): SparkPlan = {
    +      val needSort = rowOrdering.nonEmpty && child.outputOrdering != rowOrdering
    +      val needsShuffle = child.outputPartitioning != partitioning
    +      val canSortWithShuffle = Exchange.canSortWithShuffle(partitioning, rowOrdering)
    +
    +      if (needSort && needsShuffle && canSortWithShuffle) {
    +        Exchange(partitioning, rowOrdering, child)
    +      } else {
    +        val withShuffle = if (needsShuffle) {
    +          Exchange(partitioning, Nil, child)
    +        } else {
    +          child
    +        }

    -    if (meetsRequirements && compatible) {
    +        val withSort = if (needSort) {
    +          Sort(rowOrdering, global = false, withShuffle)
    +        } else {
    +          withShuffle
    +        }
    +
    +        withSort
    +      }
    +    }
    +
    +    if (meetsRequirements && compatible && !needsAnySort) {
           operator
         } else {
           // At least one child does not satisfies its required data distribution or
           // at least one child's outputPartitioning is not compatible with another child's
           // outputPartitioning. In this case, we need to add Exchange operators.
    -      val repartitionedChildren = operator.requiredChildDistribution.zip(operator.children).map {
    -        case (AllTuples, child) =>
    -          addExchangeIfNecessary(SinglePartition, child)
    -        case (ClusteredDistribution(clustering), child) =>
    -          addExchangeIfNecessary(HashPartitioning(clustering, numPartitions), child)
    -        case (OrderedDistribution(ordering), child) =>
    -          addExchangeIfNecessary(RangePartitioning(ordering, numPartitions), child)
    -        case (UnspecifiedDistribution, child) => child
    -        case (dist, _) => sys.error(s"Don't know how to ensure $dist")
    +      val requirements =
    +        (operator.requiredChildDistribution, operator.requiredChildOrdering, operator.children)
    +
    +      val fixedChildren = requirements.zipped.map {
    +        case (AllTuples, rowOrdering, child) =>
    +          addOperatorsIfNecessary(SinglePartition, rowOrdering, child)
    +        case (ClusteredDistribution(clustering), rowOrdering, child) =>
    +          addOperatorsIfNecessary(HashPartitioning(clustering, numPartitions), rowOrdering, child)
    +        case (OrderedDistribution(ordering), rowOrdering, child) =>
    +          addOperatorsIfNecessary(RangePartitioning(ordering, numPartitions), Nil, child)
    +
    +        case (UnspecifiedDistribution, Seq(), child) =>
    +          child
    +        case (UnspecifiedDistribution, rowOrdering, child) =>
    +          Sort(rowOrdering, global = false, child)
    --- End diff --

    Use `execution.ExternalSort` when `sqlContext.conf.externalSortEnabled` is true.
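The diff above folds a shuffle and a sort into a single `Exchange` only when both are needed and the ordering can ride along with the shuffle. The decision logic can be modeled in a standalone sketch; `Plan`, `Leaf`, `ExchangeOp`, and `SortOp` below are toy stand-ins for Spark's physical operators, not the real classes:

```scala
// Toy physical-plan model: partitioning and ordering are plain strings here.
sealed trait Plan { def partitioning: String; def ordering: Seq[String] }
case class Leaf(partitioning: String, ordering: Seq[String]) extends Plan
case class ExchangeOp(partitioning: String, ordering: Seq[String], child: Plan) extends Plan
case class SortOp(ordering: Seq[String], child: Plan) extends Plan {
  val partitioning: String = child.partitioning
}

def addOperatorsIfNecessary(
    partitioning: String,
    rowOrdering: Seq[String],
    canSortWithShuffle: Boolean,
    child: Plan): Plan = {
  val needSort = rowOrdering.nonEmpty && child.ordering != rowOrdering
  val needsShuffle = child.partitioning != partitioning
  if (needSort && needsShuffle && canSortWithShuffle) {
    // One Exchange does both jobs: repartition and sort within each partition.
    ExchangeOp(partitioning, rowOrdering, child)
  } else {
    // Otherwise shuffle first (if needed), then sort on top (if needed).
    val withShuffle = if (needsShuffle) ExchangeOp(partitioning, Nil, child) else child
    if (needSort) SortOp(rowOrdering, withShuffle) else withShuffle
  }
}
```

When both operations are needed but the ordering cannot piggyback on the shuffle, the sketch produces a `SortOp` stacked on an `ExchangeOp`, mirroring the `withShuffle`/`withSort` branches in the patch.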
[GitHub] spark pull request: [SPARK-5946][Streaming] Add Python API for dir...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4723#issuecomment-93205694 [Test build #30310 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30310/consoleFull) for PR 4723 at commit [`dc0cf6f`](https://github.com/apache/spark/commit/dc0cf6ffdd6f4c4c58a47f69ecef3f9103caef4f).
[GitHub] spark pull request: [SPARK-2213] [SQL] sort merge join for spark s...
Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5208#discussion_r28394950

    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/Exchange.scala ---
    @@ -157,28 +205,61 @@ private[sql] case class AddExchange(sqlContext: SQLContext) extends Rule[SparkPl
           case Seq(a,b) => a compatibleWith b
         }.exists(!_)

    -    // Check if the partitioning we want to ensure is the same as the child's output
    -    // partitioning. If so, we do not need to add the Exchange operator.
    -    def addExchangeIfNecessary(partitioning: Partitioning, child: SparkPlan): SparkPlan =
    -      if (child.outputPartitioning != partitioning) Exchange(partitioning, child) else child
    +    // Adds Exchange or Sort operators as required
    +    def addOperatorsIfNecessary(
    +        partitioning: Partitioning,
    +        rowOrdering: Seq[SortOrder],
    +        child: SparkPlan): SparkPlan = {
    +      val needSort = rowOrdering.nonEmpty && child.outputOrdering != rowOrdering
    +      val needsShuffle = child.outputPartitioning != partitioning
    +      val canSortWithShuffle = Exchange.canSortWithShuffle(partitioning, rowOrdering)
    +
    +      if (needSort && needsShuffle && canSortWithShuffle) {
    +        Exchange(partitioning, rowOrdering, child)
    +      } else {
    +        val withShuffle = if (needsShuffle) {
    +          Exchange(partitioning, Nil, child)
    +        } else {
    +          child
    +        }

    -    if (meetsRequirements && compatible) {
    +        val withSort = if (needSort) {
    +          Sort(rowOrdering, global = false, withShuffle)
    --- End diff --

    Like what we do in `SparkStrategies`, use `execution.ExternalSort` when `sqlContext.conf.externalSortEnabled` is `true`.
[GitHub] spark pull request: [SPARK-2213] [SQL] sort merge join for spark s...
Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5208#discussion_r28394813

    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/Exchange.scala ---
    @@ -19,24 +19,39 @@ package org.apache.spark.sql.execution
     import org.apache.spark.annotation.DeveloperApi
     import org.apache.spark.shuffle.sort.SortShuffleManager
    -import org.apache.spark.sql.catalyst.expressions
     import org.apache.spark.{SparkEnv, HashPartitioner, RangePartitioner, SparkConf}
     import org.apache.spark.rdd.{RDD, ShuffledRDD}
     import org.apache.spark.sql.{SQLContext, Row}
     import org.apache.spark.sql.catalyst.errors.attachTree
    -import org.apache.spark.sql.catalyst.expressions.{Attribute, RowOrdering}
    +import org.apache.spark.sql.catalyst.expressions._
     import org.apache.spark.sql.catalyst.plans.physical._
     import org.apache.spark.sql.catalyst.rules.Rule
     import org.apache.spark.util.MutablePair

    +object Exchange {
    +  /** Returns true when the ordering expressions are a subset of the key. */
    +  def canSortWithShuffle(partitioning: Partitioning, desiredOrdering: Seq[SortOrder]): Boolean = {
    --- End diff --

    It will be good to also explain that we need the ordering expressions to be a subset of the key because we are taking advantage of `ShuffledRDD`'s `KeyOrdering` for sorting.
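The point of yhuai's comment is that a sort can only ride along with the shuffle if every ordering expression is also part of the shuffle key, since `ShuffledRDD` can only order rows by that key. A minimal sketch of such a subset test, with expressions modeled as plain strings rather than Catalyst expressions:

```scala
// Hedged sketch: returns true only when every desired ordering expression is
// among the partitioning (shuffle key) expressions, so the shuffle's key
// ordering can produce the desired sort for free.
def canSortWithShuffle(partitionKey: Seq[String], desiredOrdering: Seq[String]): Boolean =
  desiredOrdering.nonEmpty && desiredOrdering.forall(partitionKey.contains)
```

If any ordering expression falls outside the key, a separate Sort operator has to be placed on top of the Exchange instead.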
[GitHub] spark pull request: [SPARK-6800][SQL] Update doc for JDBCRelation'...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5488#issuecomment-93203795 [Test build #30307 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30307/consoleFull) for PR 5488 at commit [`1dcc929`](https://github.com/apache/spark/commit/1dcc9294d0a5a6e9ac58536c0b39ccb433b89b1c).
[GitHub] spark pull request: [SPARK-6898][SQL] completely support special c...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5511#issuecomment-93203778 [Test build #30306 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30306/consoleFull) for PR 5511 at commit [`820dc45`](https://github.com/apache/spark/commit/820dc4515f968fbbee01dc073fc3813a4fc9d9d0).
[GitHub] spark pull request: [SPARK-5946][Streaming] Add Python API for dir...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4723#issuecomment-93203535 [Test build #30308 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30308/consoleFull) for PR 4723 at commit [`9da49be`](https://github.com/apache/spark/commit/9da49be0cf2e569a9c871dd7bbb3aee7820f9e0e).
[GitHub] spark pull request: [SPARK-2213] [SQL] sort merge join for spark s...
Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5208#discussion_r28394599

    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/Exchange.scala ---
    @@ -120,27 +161,34 @@ case class Exchange(newPartitioning: Partitioning, child: SparkPlan) extends Una
      * Ensures that the [[org.apache.spark.sql.catalyst.plans.physical.Partitioning Partitioning]]
      * of input data meets the
      * [[org.apache.spark.sql.catalyst.plans.physical.Distribution Distribution]] requirements for
    - * each operator by inserting [[Exchange]] Operators where required.
    + * each operator by inserting [[Exchange]] Operators where required. Also ensure that the
    + * required input partition ordering requirements are met.
      */
    -private[sql] case class AddExchange(sqlContext: SQLContext) extends Rule[SparkPlan] {
    +private[sql] case class EnsureRequirements(sqlContext: SQLContext) extends Rule[SparkPlan] {
       // TODO: Determine the number of partitions.
       def numPartitions: Int = sqlContext.conf.numShufflePartitions

       def apply(plan: SparkPlan): SparkPlan = plan.transformUp {
         case operator: SparkPlan =>
    -      // Check if every child's outputPartitioning satisfies the corresponding
    +      // True iff every child's outputPartitioning satisfies the corresponding
           // required data distribution.
           def meetsRequirements: Boolean =
    -        !operator.requiredChildDistribution.zip(operator.children).map {
    +        operator.requiredChildDistribution.zip(operator.children).forall {
               case (required, child) =>
                 val valid = child.outputPartitioning.satisfies(required)
                 logDebug(
                   s"${if (valid) "Valid" else "Invalid"} distribution," +
                     s"required: $required current: ${child.outputPartitioning}")
                 valid
    -        }.exists(!_)
    +        }

    -      // Check if outputPartitionings of children are compatible with each other.
    +      // True iff any of the children are incorrectly sorted.
    +      def needsAnySort: Boolean =
    +        operator.requiredChildOrdering.zip(operator.children).exists {
    +          case (required, child) => required.nonEmpty && required != child
    --- End diff --

    Seems you want `case (required, child) => required.nonEmpty && required != child.outputOrdering`?
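The bug yhuai points out is that `required != child` compares an ordering against a plan node, which is always true. The corrected check compares against the child's `outputOrdering`. A standalone sketch with a toy plan-node type:

```scala
// Toy stand-in for a physical plan node exposing its output ordering.
case class Node(outputOrdering: Seq[String])

// Corrected check: a sort is needed only when a non-empty required ordering
// differs from what the child already produces.
def needsAnySort(requiredChildOrdering: Seq[Seq[String]], children: Seq[Node]): Boolean =
  requiredChildOrdering.zip(children).exists {
    case (required, child) => required.nonEmpty && required != child.outputOrdering
  }
```

With the original `required != child` comparison, a child that already produces the required ordering would still be flagged as incorrectly sorted.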
[GitHub] spark pull request: [SPARK-6899][SQL] Fix type mismatch when using...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5517#issuecomment-93202277 [Test build #30305 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30305/consoleFull) for PR 5517 at commit [`8ae5f65`](https://github.com/apache/spark/commit/8ae5f6505a68f6ef0bed2cd3fb3bd72a61156e22).
[GitHub] spark pull request: [SPARK-6899][SQL] Fix type mismatch when using...
GitHub user viirya opened a pull request:

    https://github.com/apache/spark/pull/5517

    [SPARK-6899][SQL] Fix type mismatch when using codegen with Average

    JIRA https://issues.apache.org/jira/browse/SPARK-6899

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/viirya/spark-1 fix_codegen_average

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/5517.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #5517

commit 8ae5f6505a68f6ef0bed2cd3fb3bd72a61156e22
Author: Liang-Chi Hsieh
Date: 2015-04-15T05:31:04Z

    Add the case of DecimalType.Unlimited to Average.
[GitHub] spark pull request: [SPARK-2213] [SQL] sort merge join for spark s...
Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5208#discussion_r28394315

    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/JoinSuite.scala ---
    @@ -91,6 +94,16 @@ class JoinSuite extends QueryTest with BeforeAndAfterEach {
           ("SELECT * FROM testData full JOIN testData2 ON (key * a != key + a)",
            classOf[BroadcastNestedLoopJoin])
         ).foreach { case (query, joinClass) => assertJoin(query, joinClass) }
    +    try {
    +      conf.setConf("spark.sql.planner.sortMergeJoin", "true")
    +      Seq(
    +        ("SELECT * FROM testData JOIN testData2 ON key = a", classOf[SortMergeJoin]),
    +        ("SELECT * FROM testData JOIN testData2 ON key = a and key = 2", classOf[SortMergeJoin]),
    +        ("SELECT * FROM testData JOIN testData2 ON key = a where key = 2", classOf[SortMergeJoin])
    +      ).foreach { case (query, joinClass) => assertJoin(query, joinClass) }
    +    } finally {
    +      conf.setConf("spark.sql.planner.sortMergeJoin", SORTMERGEJOIN_ENABLED.toString)
    +    }
       }

       test("broadcasted hash join operator selection") {
    --- End diff --

    Let's also add a test in this one to make sure broadcast join will be selected when sort merge join is on.
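The test in the diff above uses a try/finally block so that the `spark.sql.planner.sortMergeJoin` flag is restored even when an assertion fails. That save-and-restore pattern can be factored into a small loan-style helper; the sketch below uses a plain mutable map in place of Spark's SQLConf, so `withConf` is an illustrative name, not a Spark API:

```scala
// Stand-in for a configuration store; real code would use SQLConf.
val conf = scala.collection.mutable.Map("spark.sql.planner.sortMergeJoin" -> "false")

// Run `body` with `key` temporarily set to `value`, restoring the old value
// afterwards even if `body` throws (the finally block always runs).
def withConf[A](key: String, value: String)(body: => A): A = {
  val saved = conf(key)
  conf(key) = value
  try body finally conf(key) = saved
}
```

Wrapping each flag-dependent assertion group in such a helper avoids leaking a modified flag into later tests in the suite.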
[GitHub] spark pull request: [SPARK-2213] [SQL] sort merge join for spark s...
Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5208#discussion_r28394224

    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/JoinSuite.scala ---
    @@ -51,6 +51,8 @@ class JoinSuite extends QueryTest with BeforeAndAfterEach {
           case j: CartesianProduct => j
           case j: BroadcastNestedLoopJoin => j
           case j: BroadcastLeftSemiJoinHash => j
    +      case j: ShuffledHashJoin => j
    --- End diff --

    Seems it is the first `case`.
[GitHub] spark pull request: [SPARK-6638] [SQL] Improve performance of Stri...
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/5350#issuecomment-93195996 LGTM
[GitHub] spark pull request: [SPARK-6800][SQL] Update doc for JDBCRelation'...
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5488#discussion_r28394137

    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/jdbc/JDBCRelation.scala ---
    @@ -50,9 +50,11 @@ private[sql] object JDBCRelation {
        * Given a partitioning schematic (a column of integral type, a number of
        * partitions, and upper and lower bounds on the column's value), generate
        * WHERE clauses for each partition so that each row in the table appears
    -   * exactly once.  The parameters minValue and maxValue are advisory in that
    +   * exactly once. The parameters minValue and maxValue are advisory in that
        * incorrect values may cause the partitioning to be poor, but no data
    -   * will fail to be represented.
    +   * will fail to be represented. Note: the upper and lower bounds are just
    +   * used to decide partition stride, not for filtering. So all the rows in
    +   * table will be partitioned.
    --- End diff --

    > The parameters minValue and maxValue are advisory in that incorrect values may cause the partitioning to be poor, but no data will fail to be represented.

    The sentence above already explains that the filters are only used for partitioning and that all data will always be returned. I think the best place to update would be in the [SQL programming guide](https://github.com/apache/spark/blob/master/docs/sql-programming-guide.md), in the table under the section "JDBC To Other Databases".
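The reason every row appears exactly once, even with bad bounds, is that the first and last generated predicates are open-ended: rows below the lower bound or above the upper bound still fall into the first or last partition. A hedged sketch of this stride scheme (illustrative only, not `JDBCRelation`'s exact code):

```scala
// Generate one WHERE predicate per partition. The bounds only set the stride;
// the open-ended first and last predicates guarantee full coverage of the table.
def columnPartition(column: String, lowerBound: Long, upperBound: Long,
                    numPartitions: Int): Seq[String] = {
  val stride = (upperBound - lowerBound) / numPartitions
  (0 until numPartitions).map { i =>
    val lo = if (i == 0) None else Some(s"$column >= ${lowerBound + i * stride}")
    val hi = if (i == numPartitions - 1) None else Some(s"$column < ${lowerBound + (i + 1) * stride}")
    (lo, hi) match {
      case (None, Some(h))    => h               // first partition: no lower filter
      case (Some(l), None)    => l               // last partition: no upper filter
      case (Some(l), Some(h)) => s"$l AND $h"
      case (None, None)       => "1=1"           // single partition: no filter
    }
  }
}
```

With bounds 0 and 100 over 4 partitions the stride is 25, and a row with value -5 or 200 is still matched by the first or last predicate respectively.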
[GitHub] spark pull request: [SPARK-6638] [SQL] Improve performance of Stri...
Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5350#discussion_r28394119

    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/commands.scala ---
    @@ -76,6 +76,12 @@ case class DropTable(
     private[hive] case class AddJar(path: String) extends RunnableCommand {

    +  override val output: Seq[Attribute] = {
    +    val schema = StructType(
    +      StructField("result", IntegerType, false) :: Nil)
    +    schema.toAttributes
    +  }
    --- End diff --

    OK, the reason is to match the behavior of Hive... This change looks good.
[GitHub] spark pull request: [SPARK-2213] [SQL] sort merge join for spark s...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5208#issuecomment-93194500 [Test build #30304 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30304/consoleFull) for PR 5208 at commit [`ec8061b`](https://github.com/apache/spark/commit/ec8061b7f36b87c883af111438ac9ff0304050d7).
[GitHub] spark pull request: [SPARK-5794] [SQL] fix add jar
Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4586#discussion_r28394095

    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/commands.scala ---
    @@ -79,7 +79,7 @@ case class AddJar(path: String) extends RunnableCommand {
         val hiveContext = sqlContext.asInstanceOf[HiveContext]
         hiveContext.runSqlHive(s"ADD JAR $path")
         hiveContext.sparkContext.addJar(path)
    -    Seq.empty[Row]
    +    Seq(Row(0))
    --- End diff --

    (I thought it may be better to comment at the original pr). OK, I see. In future, let's make sure we also update the `output` if the result of a command is not an empty Seq (#5350 will change the schema for `AddJar`).
[GitHub] spark pull request: [SPARK-6898][SQL] completely support special c...
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5511#discussion_r28394069

    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/unresolved.scala ---
    @@ -59,7 +64,7 @@ case class UnresolvedAttribute(name: String) extends Attribute with trees.LeafNo
       override def newInstance(): UnresolvedAttribute = this
       override def withNullability(newNullability: Boolean): UnresolvedAttribute = this
       override def withQualifiers(newQualifiers: Seq[String]): UnresolvedAttribute = this
    -  override def withName(newName: String): UnresolvedAttribute = UnresolvedAttribute(name)
    --- End diff --

    No, that seems wrong to me.
[GitHub] spark pull request: [SPARK-6898][SQL] completely support special c...
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5511#discussion_r28394051

    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/unresolved.scala ---
    @@ -59,7 +64,7 @@ case class UnresolvedAttribute(name: String) extends Attribute with trees.LeafNo
       override def newInstance(): UnresolvedAttribute = this
       override def withNullability(newNullability: Boolean): UnresolvedAttribute = this
       override def withQualifiers(newQualifiers: Seq[String]): UnresolvedAttribute = this
    -  override def withName(newName: String): UnresolvedAttribute = UnresolvedAttribute(name)
    --- End diff --

    The original code ignores `newName`. Is this intended?
[GitHub] spark pull request: [SPARK-5794] [SQL] fix add jar
Github user adrian-wang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4586#discussion_r28394029

    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/commands.scala ---
    @@ -79,7 +79,7 @@ case class AddJar(path: String) extends RunnableCommand {
         val hiveContext = sqlContext.asInstanceOf[HiveContext]
         hiveContext.runSqlHive(s"ADD JAR $path")
         hiveContext.sparkContext.addJar(path)
    -    Seq.empty[Row]
    +    Seq(Row(0))
    --- End diff --

    Hive would return a `0` for the add jar command.
[GitHub] spark pull request: [SPARK-6638] [SQL] Improve performance of Stri...
Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5350#discussion_r28393963

    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/commands.scala ---
    @@ -76,6 +76,12 @@ case class DropTable(
     private[hive] case class AddJar(path: String) extends RunnableCommand {

    +  override val output: Seq[Attribute] = {
    +    val schema = StructType(
    +      StructField("result", IntegerType, false) :: Nil)
    +    schema.toAttributes
    +  }
    --- End diff --

    I do not really know the reason that the result of AddJar is a `Row(0)` (see a few lines below). But, we can figure it out after we merge it.
[GitHub] spark pull request: [SPARK-5794] [SQL] fix add jar
Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4586#discussion_r28393926

    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/commands.scala ---
    @@ -79,7 +79,7 @@ case class AddJar(path: String) extends RunnableCommand {
         val hiveContext = sqlContext.asInstanceOf[HiveContext]
         hiveContext.runSqlHive(s"ADD JAR $path")
         hiveContext.sparkContext.addJar(path)
    -    Seq.empty[Row]
    +    Seq(Row(0))
    --- End diff --

    @adrian-wang Why do we need a Row with a value of 0 here?
[GitHub] spark pull request: [SPARK-6898][SQL] completely support special c...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5511#issuecomment-93192357 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30298/ Test PASSed.
[GitHub] spark pull request: [SPARK-6898][SQL] completely support special c...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5511#issuecomment-93192335 [Test build #30298 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30298/consoleFull) for PR 5511 at commit [`d81ad43`](https://github.com/apache/spark/commit/d81ad43e5e07fe2227db7bb383c98c6d2c0fb875). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes. * This patch **removes the following dependencies:** * `RoaringBitmap-0.4.5.jar` * `akka-actor_2.10-2.3.4-spark.jar` * `akka-remote_2.10-2.3.4-spark.jar` * `akka-slf4j_2.10-2.3.4-spark.jar` * `arpack_combined_all-0.1.jar` * `breeze-macros_2.10-0.3.1.jar` * `breeze_2.10-0.10.jar` * `chill-java-0.5.0.jar` * `chill_2.10-0.5.0.jar` * `commons-beanutils-1.7.0.jar` * `commons-beanutils-core-1.8.0.jar` * `commons-codec-1.5.jar` * `commons-collections-3.2.1.jar` * `commons-configuration-1.6.jar` * `commons-digester-1.8.jar` * `commons-el-1.0.jar` * `commons-httpclient-3.1.jar` * `commons-io-2.4.jar` * `commons-lang-2.4.jar` * `commons-lang3-3.3.2.jar` * `commons-math-2.1.jar` * `commons-math3-3.1.1.jar` * `commons-net-2.2.jar` * `compress-lzf-1.0.0.jar` * `config-1.2.1.jar` * `core-1.1.2.jar` * `curator-client-2.4.0.jar` * `curator-framework-2.4.0.jar` * `curator-recipes-2.4.0.jar` * `groovy-all-2.3.7.jar` * `guava-14.0.1.jar` * `hadoop-client-1.0.4.jar` * `hadoop-core-1.0.4.jar` * `hsqldb-1.8.0.10.jar` * `ivy-2.4.0.jar` * `jackson-annotations-2.3.0.jar` * `jackson-core-2.3.0.jar` * `jackson-core-asl-1.8.8.jar` * `jackson-databind-2.3.0.jar` * `jackson-mapper-asl-1.8.8.jar` * `jansi-1.4.jar` * `javax.servlet-3.0.0.v201112011016.jar` * `jblas-1.2.3.jar` * `jcl-over-slf4j-1.7.10.jar` * `jets3t-0.7.1.jar` * `jline-0.9.94.jar` * `jline-2.10.4.jar` * `jodd-core-3.6.3.jar` * `json4s-ast_2.10-3.2.10.jar` * `json4s-core_2.10-3.2.10.jar` * `json4s-jackson_2.10-3.2.10.jar` * `jsr305-1.3.9.jar` * `jtransforms-2.4.0.jar` * 
`jul-to-slf4j-1.7.10.jar` * `kryo-2.21.jar` * `log4j-1.2.17.jar` * `lz4-1.2.0.jar` * `mesos-0.21.0-shaded-protobuf.jar` * `metrics-core-3.1.0.jar` * `metrics-graphite-3.1.0.jar` * `metrics-json-3.1.0.jar` * `metrics-jvm-3.1.0.jar` * `minlog-1.2.jar` * `netty-3.8.0.Final.jar` * `netty-all-4.0.23.Final.jar` * `objenesis-1.2.jar` * `opencsv-2.3.jar` * `oro-2.0.8.jar` * `paranamer-2.6.jar` * `parquet-column-1.6.0rc3.jar` * `parquet-common-1.6.0rc3.jar` * `parquet-encoding-1.6.0rc3.jar` * `parquet-format-2.2.0-rc1.jar` * `parquet-generator-1.6.0rc3.jar` * `parquet-hadoop-1.6.0rc3.jar` * `parquet-jackson-1.6.0rc3.jar` * `protobuf-java-2.5.0-spark.jar` * `py4j-0.8.2.1.jar` * `pyrolite-2.0.1.jar` * `quasiquotes_2.10-2.0.1.jar` * `reflectasm-1.07-shaded.jar` * `scala-compiler-2.10.4.jar` * `scala-library-2.10.4.jar` * `scala-reflect-2.10.4.jar` * `scalap-2.10.4.jar` * `scalatest_2.10-2.2.1.jar` * `slf4j-api-1.7.10.jar` * `slf4j-log4j12-1.7.10.jar` * `snappy-java-1.1.1.6.jar` * `spark-bagel_2.10-1.3.0-SNAPSHOT.jar` * `spark-catalyst_2.10-1.3.0-SNAPSHOT.jar` * `spark-core_2.10-1.3.0-SNAPSHOT.jar` * `spark-graphx_2.10-1.3.0-SNAPSHOT.jar` * `spark-mllib_2.10-1.3.0-SNAPSHOT.jar` * `spark-network-common_2.10-1.3.0-SNAPSHOT.jar` * `spark-network-shuffle_2.10-1.3.0-SNAPSHOT.jar` * `spark-repl_2.10-1.3.0-SNAPSHOT.jar` * `spark-sql_2.10-1.3.0-SNAPSHOT.jar` * `spark-streaming_2.10-1.3.0-SNAPSHOT.jar` * `spire-macros_2.10-0.7.4.jar` * `spire_2.10-0.7.4.jar` * `stream-2.7.0.jar` * `tachyon-0.5.0.jar` * `tachyon-client-0.5.0.jar` * `uncommons-maths-1.2.2a.jar` * `unused-1.0.0.jar` * `xmlenc-0.52.jar` * `zookeeper-3.4.5.jar`
[GitHub] spark pull request: [SPARK-2213] [SQL] sort merge join for spark s...
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/5208#issuecomment-93190389 @yhuai can you do another pass over `Exchange.scala`? I made several changes.
[GitHub] spark pull request: [SPARK-2213] [SQL] sort merge join for spark s...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5208#issuecomment-93189576 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30299/ Test PASSed.
[GitHub] spark pull request: [SPARK-2213] [SQL] sort merge join for spark s...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5208#issuecomment-93189565 [Test build #30299 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30299/consoleFull) for PR 5208 at commit [`413fd24`](https://github.com/apache/spark/commit/413fd24a53d3b86eed7a57c130973da4417e8393).
* This patch **passes all tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `case class Exchange(`
  * `case class SortMergeJoin(`
* This patch does not change any dependencies.
[GitHub] spark pull request: [SPARK-6638] [SQL] Improve performance of Stri...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5350#issuecomment-93188608 [Test build #30303 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30303/consoleFull) for PR 5350 at commit [`3b7bfa8`](https://github.com/apache/spark/commit/3b7bfa8f37e7f2b9aefdfd0e5e57d7b5c6b516ce).
[GitHub] spark pull request: [SPARK-6638] [SQL] Improve performance of Stri...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5350#issuecomment-93188052 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30297/ Test FAILed.
[GitHub] spark pull request: [SPARK-6638] [SQL] Improve performance of Stri...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5350#issuecomment-93188027 [Test build #30297 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30297/consoleFull) for PR 5350 at commit [`2772f0d`](https://github.com/apache/spark/commit/2772f0d8face2f9c634718fb8719fe56c5d8d676).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `trait CaseConversionExpression `
  * `final class UTF8String extends Ordered[UTF8String] with Serializable `
* This patch does not change any dependencies.
[GitHub] spark pull request: [SPARK-6638] [SQL] Improve performance of Stri...
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/5350#discussion_r28392750

Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/CatalystTypeConverters.scala
```diff
@@ -284,9 +321,9 @@ object CatalystTypeConverters {
       row: Row,
       schema: StructType,
       converters: Array[Any => Any]): Row = {
-    val ar = new Array[Any](row.size)
+    val ar = new Array[Any](converters.size)
     var idx = 0
-    while (idx < row.size) {
+    while (idx < converters.size && idx < row.size) {
```
End diff

It's a new test case in master; I had to merge with master to debug it.
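[Editor's note] The loop-bound fix in the diff above sizes the output by the converter array and bounds the loop by both lengths. A minimal Python sketch of that idea (the function and converter names here are illustrative, not Spark's API):

```python
def convert_row(row, converters):
    """Convert each field of `row` with its positional converter.

    The output is sized by `converters` (not by the row), and the loop is
    bounded by both lengths, so a length mismatch in either direction
    cannot raise an index error -- mirroring the bounded while-loop above.
    """
    out = [None] * len(converters)
    idx = 0
    while idx < len(converters) and idx < len(row):
        out[idx] = converters[idx](row[idx])
        idx += 1
    return out

# Converters for a two-field (string, int) schema; the row has an extra field,
# which is simply ignored instead of overrunning the converter array.
converters = [str.upper, lambda x: x + 1]
print(convert_row(["a", 1, "extra"], converters))  # ['A', 2]
```

A row shorter than the converter array leaves the trailing slots as `None` rather than failing.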
[GitHub] spark pull request: [SPARK-6638] [SQL] Improve performance of Stri...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/5350#issuecomment-93184082 BTW - since this changes so many files, it'd be great to merge this as soon as possible. We can fix minor problems later in follow-up PRs.
[GitHub] spark pull request: [SPARK-4897] [PySpark] Python 3 support
Github user davies commented on the pull request: https://github.com/apache/spark/pull/5173#issuecomment-93184039 @shaananc It works fine here:
```
Using Python version 3.4.1 (v3.4.1:c0e311e010fc, May 18 2014 00:54:21)
SparkContext available as sc, SQLContext available as sqlContext.
>>> data = (1, 2)
>>> sc.parallelize(data).reduce(lambda a, b: a + b)
3
```
What is your environment?
[GitHub] spark pull request: [SPARK-6638] [SQL] Improve performance of Stri...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/5350#discussion_r28392622

Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/CatalystTypeConverters.scala
```diff
@@ -284,9 +321,9 @@ object CatalystTypeConverters {
       row: Row,
       schema: StructType,
       converters: Array[Any => Any]): Row = {
-    val ar = new Array[Any](row.size)
+    val ar = new Array[Any](converters.size)
     var idx = 0
-    while (idx < row.size) {
+    while (idx < converters.size && idx < row.size) {
```
End diff

btw, where is "ADD JAR command 2"? I could not find it...
[GitHub] spark pull request: [SPARK-6871][SQL] WITH clause in CTE can not f...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5480#issuecomment-93183523 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30295/ Test PASSed.
[GitHub] spark pull request: [SPARK-6871][SQL] WITH clause in CTE can not f...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5480#issuecomment-93183518 [Test build #30295 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30295/consoleFull) for PR 5480 at commit [`4da3712`](https://github.com/apache/spark/commit/4da3712f34b4e8672bf143f90fe1279cd114daab).
* This patch **passes all tests**.
* This patch merges cleanly.
* This patch adds no public classes.
* This patch **adds the following new dependencies:**
  * `snappy-java-1.1.1.7.jar`
* This patch **removes the following dependencies:**
  * `snappy-java-1.1.1.6.jar`
[GitHub] spark pull request: [SPARK-6800][SQL] Update doc for JDBCRelation'...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5488#issuecomment-93183475 [Test build #30296 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30296/consoleFull) for PR 5488 at commit [`3eb74d6`](https://github.com/apache/spark/commit/3eb74d614a05d33a3071d586fedc20bc4f2e88d6).
* This patch **passes all tests**.
* This patch merges cleanly.
* This patch adds no public classes.
* This patch does not change any dependencies.
[GitHub] spark pull request: [SPARK-5818][SQL] unable to use "add jar" in h...
Github user gvramana commented on a diff in the pull request: https://github.com/apache/spark/pull/5393#discussion_r28392460

Diff: repl/pom.xml (XML tags reconstructed from the stripped archive; the standard Maven profile layout is assumed)
```diff
@@ -150,6 +150,16 @@
+    <profile>
+      <id>hive</id>
+      <dependencies>
+        <dependency>
+          <groupId>org.apache.spark</groupId>
+          <artifactId>spark-hive_${scala.binary.version}</artifactId>
+          <version>${project.version}</version>
+        </dependency>
+      </dependencies>
+    </profile>
```
End diff

No, the dependency is added in the hive profile only. If the assembly is built with the -Phive option, the hive dependency is added to repl so that Hive is available on the repl classpath; if the assembly is built without -Phive, the dependency is not added and the test case is ignored. The test case also checks at runtime whether the HiveContext class is available and is ignored if it is not. I have manually tested both cases, building with and without -Phive. It will not impact assembly creation.
[GitHub] spark pull request: [SPARK-6800][SQL] Update doc for JDBCRelation'...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5488#issuecomment-93183480 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30296/ Test PASSed.
[GitHub] spark pull request: [SQL] SPARK-6489: Optimize lateral view with e...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5358#issuecomment-93183316 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30300/ Test FAILed.
[GitHub] spark pull request: [SQL] SPARK-6489: Optimize lateral view with e...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5358#issuecomment-93183311 [Test build #30300 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30300/consoleFull) for PR 5358 at commit [`9e7aaec`](https://github.com/apache/spark/commit/9e7aaecc5d28914a86a4d8b8da47504efd68bde6).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
* This patch does not change any dependencies.
[GitHub] spark pull request: [SPARK-6865][SQL] DataFrame column names shoul...
Github user rxin closed the pull request at: https://github.com/apache/spark/pull/5505
[GitHub] spark pull request: [SPARK-6865][SQL] DataFrame column names shoul...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/5505#issuecomment-93182797 I discussed with michael offline -- given this would break self-join, we've decided to treat [dot] (i.e. ".") as a special case.
[GitHub] spark pull request: SPARK-6919 Add asDict method to StatCounter
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5516#issuecomment-93181762 [Test build #30292 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30292/consoleFull) for PR 5516 at commit [`c933af7`](https://github.com/apache/spark/commit/c933af75aaeac641e71fead81f5d6804f882eff8).
* This patch **passes all tests**.
* This patch merges cleanly.
* This patch adds no public classes.
* This patch **adds the following new dependencies:**
  * `snappy-java-1.1.1.7.jar`
* This patch **removes the following dependencies:**
  * `snappy-java-1.1.1.6.jar`
[GitHub] spark pull request: SPARK-6919 Add asDict method to StatCounter
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5516#issuecomment-93181791 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30292/ Test PASSed.
[GitHub] spark pull request: [SPARK-6692][YARN] Add an option for client to...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5343#issuecomment-93181658 [Test build #30302 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30302/consoleFull) for PR 5343 at commit [`2a3fa38`](https://github.com/apache/spark/commit/2a3fa381708ce5319ca3786a079c866b70467e81).
[GitHub] spark pull request: [SPARK-6911] [SQL] improve accessor for nested...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/5513#discussion_r28392080

Diff: sql/core/src/main/scala/org/apache/spark/sql/Column.scala
```diff
@@ -515,14 +515,15 @@ class Column(protected[sql] val expr: Expression) extends Logging {
   def rlike(literal: String): Column = RLike(expr, lit(literal).expr)

   /**
-   * An expression that gets an item at position `ordinal` out of an array.
+   * An expression that gets an item at position `ordinal` out of an array,
+   * or gets a value by key `key` in a [[MapType]].
    *
    * @group expr_ops
    */
-  def getItem(ordinal: Int): Column = GetItem(expr, Literal(ordinal))
+  def getItem(key: Any): Column = GetItem(expr, Literal(key))
```
End diff

That makes sense. @davies can you add a unit test in Scala, in ColumnExpressionSuite or DataFrameSuite?
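[Editor's note] The diff above generalizes `getItem` from an integer array ordinal to an arbitrary key, so the same accessor works for both array and map columns. A plain-Python sketch of that dispatch (mimicking the behavior, not Spark's implementation; the function name is illustrative):

```python
def get_item(container, key):
    """Fetch an element by position (list) or by key (dict), mirroring the
    generalized getItem accessor: one entry point, dispatching on the
    container type. Out-of-range ordinals and absent keys yield None,
    in the spirit of SQL null semantics."""
    if isinstance(container, list):
        return container[key] if 0 <= key < len(container) else None
    if isinstance(container, dict):
        return container.get(key)
    raise TypeError("get_item expects an array (list) or a map (dict)")

print(get_item(["a", "b", "c"], 1))       # b
print(get_item({"k1": "v1"}, "k1"))       # v1
print(get_item({"k1": "v1"}, "missing"))  # None
```

With this shape, the old `getItem(ordinal: Int)` call sites keep working unchanged while map columns gain the same accessor, which is why the PR could widen the parameter type rather than add a second method.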