[GitHub] spark pull request: [SPARK-13671] [SQL] Use different physical pla...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11514#issuecomment-194568150 Merged build finished. Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-13671] [SQL] Use different physical pla...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11514#issuecomment-194568120
**[Test build #52775 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52775/consoleFull)** for PR 11514 at commit [`6cfa545`](https://github.com/apache/spark/commit/6cfa5450156ae0ad1ca4d5872dfdd2ef56a2e17b).
 * This patch **fails MiMa tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
   * `case class In(attribute: String, values: Seq[Any]) extends Filter`
[GitHub] spark pull request: [SPARK-13671] [SQL] Use different physical pla...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11514#issuecomment-194568153 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52775/ Test FAILed.
[GitHub] spark pull request: [SPARK-13671] [SQL] Use different physical pla...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11514#issuecomment-194561800 **[Test build #52775 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52775/consoleFull)** for PR 11514 at commit [`6cfa545`](https://github.com/apache/spark/commit/6cfa5450156ae0ad1ca4d5872dfdd2ef56a2e17b).
[GitHub] spark pull request: [SPARK-13671] [SQL] Use different physical pla...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11514#issuecomment-193651401
**[Test build #52644 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52644/consoleFull)** for PR 11514 at commit [`0278fd9`](https://github.com/apache/spark/commit/0278fd94a230108c37e1e9c17365bd37b30a5288).
 * This patch **fails to build**.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-13671] [SQL] Use different physical pla...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11514#issuecomment-193651407 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52644/ Test FAILed.
[GitHub] spark pull request: [SPARK-13671] [SQL] Use different physical pla...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11514#issuecomment-193651406 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-13671] [SQL] Use different physical pla...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11514#issuecomment-193650798
**[Test build #2619 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2619/consoleFull)** for PR 11514 at commit [`d2d2062`](https://github.com/apache/spark/commit/d2d206249b16c1a019a294a42c3118e400a21da6).
 * This patch **fails to build**.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-13671] [SQL] Use different physical pla...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11514#issuecomment-193650362 **[Test build #52644 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52644/consoleFull)** for PR 11514 at commit [`0278fd9`](https://github.com/apache/spark/commit/0278fd94a230108c37e1e9c17365bd37b30a5288).
[GitHub] spark pull request: [SPARK-13671] [SQL] Use different physical pla...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11514#issuecomment-193648651 **[Test build #2619 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2619/consoleFull)** for PR 11514 at commit [`d2d2062`](https://github.com/apache/spark/commit/d2d206249b16c1a019a294a42c3118e400a21da6).
[GitHub] spark pull request: [SPARK-13671] [SQL] Use different physical pla...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11514#issuecomment-193647377
**[Test build #2618 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2618/consoleFull)** for PR 11514 at commit [`d2d2062`](https://github.com/apache/spark/commit/d2d206249b16c1a019a294a42c3118e400a21da6).
 * This patch **fails to build**.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-13671] [SQL] Use different physical pla...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11514#issuecomment-193642699 **[Test build #2618 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2618/consoleFull)** for PR 11514 at commit [`d2d2062`](https://github.com/apache/spark/commit/d2d206249b16c1a019a294a42c3118e400a21da6).
[GitHub] spark pull request: [SPARK-13671] [SQL] Use different physical pla...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11514#issuecomment-193511803 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-13671] [SQL] Use different physical pla...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11514#issuecomment-193511757
**[Test build #52610 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52610/consoleFull)** for PR 11514 at commit [`d2d2062`](https://github.com/apache/spark/commit/d2d206249b16c1a019a294a42c3118e400a21da6).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-13671] [SQL] Use different physical pla...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11514#issuecomment-193511806 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52610/ Test FAILed.
[GitHub] spark pull request: [SPARK-13671] [SQL] Use different physical pla...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11514#issuecomment-193506060 **[Test build #52610 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52610/consoleFull)** for PR 11514 at commit [`d2d2062`](https://github.com/apache/spark/commit/d2d206249b16c1a019a294a42c3118e400a21da6).
[GitHub] spark pull request: [SPARK-13671] [SQL] Use different physical pla...
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/11514#discussion_r55117715

    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/ExistingRDD.scala ---
    @@ -101,15 +101,69 @@ private[sql] case class LogicalRDD(
     private[sql] case class PhysicalRDD(
         output: Seq[Attribute],
         rdd: RDD[InternalRow],
    -    override val nodeName: String,
    -    override val metadata: Map[String, String] = Map.empty,
    -    isUnsafeRow: Boolean = false,
    -    override val outputPartitioning: Partitioning = UnknownPartitioning(0))
    +    override val nodeName: String) extends LeafNode {
    +
    +  private[sql] override lazy val metrics = Map(
    +    "numOutputRows" -> SQLMetrics.createLongMetric(sparkContext, "number of output rows"))
    +
    +  protected override def doExecute(): RDD[InternalRow] = {
    +    val numOutputRows = longMetric("numOutputRows")
    +    rdd.mapPartitionsInternal { iter =>
    +      val proj = UnsafeProjection.create(schema)
    +      iter.map { r =>
    +        numOutputRows += 1
    +        proj(r)
    +      }
    +    }
    +  }
    +
    +  override def simpleString: String = {
    +    s"RDD $nodeName${output.mkString("[", ",", "]")}"
    +  }
    +}
    --- End diff --

If the partitioning is `UnknownPartitioning`, I think the number of partitions is meaningless.
[GitHub] spark pull request: [SPARK-13671] [SQL] Use different physical pla...
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/11514#discussion_r55017663

    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/ExistingRDD.scala ---
    @@ -101,15 +101,69 @@ private[sql] case class LogicalRDD(
     private[sql] case class PhysicalRDD(
         output: Seq[Attribute],
         rdd: RDD[InternalRow],
    -    override val nodeName: String,
    -    override val metadata: Map[String, String] = Map.empty,
    -    isUnsafeRow: Boolean = false,
    -    override val outputPartitioning: Partitioning = UnknownPartitioning(0))
    +    override val nodeName: String) extends LeafNode {
    +
    +  private[sql] override lazy val metrics = Map(
    +    "numOutputRows" -> SQLMetrics.createLongMetric(sparkContext, "number of output rows"))
    +
    +  protected override def doExecute(): RDD[InternalRow] = {
    +    val numOutputRows = longMetric("numOutputRows")
    +    rdd.mapPartitionsInternal { iter =>
    +      val proj = UnsafeProjection.create(schema)
    +      iter.map { r =>
    +        numOutputRows += 1
    +        proj(r)
    +      }
    +    }
    +  }
    +
    +  override def simpleString: String = {
    +    s"RDD $nodeName${output.mkString("[", ",", "]")}"
    +  }
    +}
    --- End diff --

Should we override `outputPartitioning` and set it to `UnknownPartitioning(rdd.partitions.length)`?
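One way to read this exchange is through a toy model of how a planner might consult `outputPartitioning`. The sketch below uses simplified stand-ins that only borrow the names `UnknownPartitioning`, `HashPartitioning`, and the distribution types; they are assumptions for illustration, not Spark's actual classes or signatures. It shows that, in such a model, an unknown partitioning gives the same answer to every requirement regardless of the partition count it carries.

```scala
// Simplified stand-ins for illustration only; not Spark's real classes.
sealed trait Distribution
case object UnspecifiedDistribution extends Distribution
final case class ClusteredDistribution(keys: Seq[String]) extends Distribution

sealed trait Partitioning {
  def numPartitions: Int
  def satisfies(required: Distribution): Boolean
}

// Nothing is known about where rows live, so only the "no requirement"
// distribution is satisfied. numPartitions never enters the decision.
final case class UnknownPartitioning(numPartitions: Int) extends Partitioning {
  def satisfies(required: Distribution): Boolean = required match {
    case UnspecifiedDistribution  => true
    case _: ClusteredDistribution => false
  }
}

// Rows are hashed on `keys`, so a clustering requirement that covers
// all of those keys is already met.
final case class HashPartitioning(keys: Seq[String], numPartitions: Int)
    extends Partitioning {
  def satisfies(required: Distribution): Boolean = required match {
    case UnspecifiedDistribution => true
    case ClusteredDistribution(clusterKeys) =>
      keys.nonEmpty && keys.forall(k => clusterKeys.contains(k))
  }
}
```

Under these assumptions, `UnknownPartitioning(0)` and `UnknownPartitioning(8)` are indistinguishable to any caller of `satisfies`, which is one reading of why the count might be considered meaningless here.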
[GitHub] spark pull request: [SPARK-13671] [SQL] Use different physical pla...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11514#issuecomment-192209562 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-13671] [SQL] Use different physical pla...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11514#issuecomment-192209563 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52453/ Test FAILed.
[GitHub] spark pull request: [SPARK-13671] [SQL] Use different physical pla...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11514#issuecomment-192209289
**[Test build #52453 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52453/consoleFull)** for PR 11514 at commit [`0e78b3a`](https://github.com/apache/spark/commit/0e78b3afde48f1a4d2d470ee715efe1031b89882).
 * This patch **fails PySpark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-13671] [SQL] Use different physical pla...
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/11514#discussion_r55002119

    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/ExistingRDD.scala ---
    @@ -101,15 +101,69 @@ private[sql] case class LogicalRDD(
     private[sql] case class PhysicalRDD(
         output: Seq[Attribute],
         rdd: RDD[InternalRow],
    -    override val nodeName: String,
    -    override val metadata: Map[String, String] = Map.empty,
    -    isUnsafeRow: Boolean = false,
    -    override val outputPartitioning: Partitioning = UnknownPartitioning(0))
    +    override val nodeName: String) extends LeafNode {
    +
    +  private[sql] override lazy val metrics = Map(
    +    "numOutputRows" -> SQLMetrics.createLongMetric(sparkContext, "number of output rows"))
    +
    +  protected override def doExecute(): RDD[InternalRow] = {
    +    val numOutputRows = longMetric("numOutputRows")
    +    rdd.mapPartitionsInternal { iter =>
    +      val proj = UnsafeProjection.create(schema)
    +      iter.map { r =>
    +        numOutputRows += 1
    +        proj(r)
    +      }
    +    }
    +  }
    +
    +  override def simpleString: String = {
    +    s"RDD $nodeName${output.mkString("[", ",", "]")}"
    +  }
    +}
    +
    +/** Physical plan node for scanning data from a relation. */
    +private[sql] case class PhysicalScan(
    --- End diff --

`DataSourceScan`?
[GitHub] spark pull request: [SPARK-13671] [SQL] Use different physical pla...
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/11514#issuecomment-192182579 will conflict with #11509
[GitHub] spark pull request: [SPARK-13671] [SQL] Use different physical pla...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11514#issuecomment-192174611 **[Test build #52453 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52453/consoleFull)** for PR 11514 at commit [`0e78b3a`](https://github.com/apache/spark/commit/0e78b3afde48f1a4d2d470ee715efe1031b89882).
[GitHub] spark pull request: [SPARK-13671] [SQL] Use different physical pla...
GitHub user davies opened a pull request: https://github.com/apache/spark/pull/11514

[SPARK-13671] [SQL] Use different physical plans for RDD and data sources

## What changes were proposed in this pull request?

This PR splits PhysicalRDD into two classes, PhysicalRDD and PhysicalScan. PhysicalRDD is used for DataFrames that are created from an existing RDD. PhysicalScan is used for DataFrames that are created from data sources. This enables us to apply different optimizations to each of them.

## How was this patch tested?

Existing tests.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/davies/spark existing_rdd

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/11514.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #11514

commit 0e78b3afde48f1a4d2d470ee715efe1031b89882
Author: Davies Liu
Date: 2016-03-04T07:58:11Z

    separate physical RDD and scan
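As a rough illustration of why separating the two node types can help, the sketch below models them as distinct case classes so that a rule can pattern match on one and leave the other alone. Everything in it is a simplified assumption made for this example: the field lists, the `PlannerSketch` object, the `pushFilter` rule, and the `"PushedFilters"` metadata key are hypothetical and are not taken from the patch. Only the `RDD $nodeName[...]` string format follows the `simpleString` shown in the diff under review.

```scala
// Simplified stand-ins for illustration only; not the classes from the patch.
sealed trait PhysicalNode { def output: Seq[String] }

// Node for a DataFrame built from an existing RDD.
final case class PhysicalRDD(output: Seq[String], nodeName: String)
    extends PhysicalNode

// Node for a DataFrame built from a data source (fields are hypothetical).
final case class PhysicalScan(
    output: Seq[String],
    relation: String,
    metadata: Map[String, String] = Map.empty)
    extends PhysicalNode

object PlannerSketch {
  // A hypothetical rule that applies only to data source scans:
  // record a pushed-down filter on the scan, leave RDD nodes untouched.
  def pushFilter(node: PhysicalNode, filter: String): PhysicalNode = node match {
    case scan: PhysicalScan =>
      scan.copy(metadata = scan.metadata + ("PushedFilters" -> filter))
    case rdd: PhysicalRDD => rdd
  }

  // Separate types can also render differently in a plan description.
  def simpleString(node: PhysicalNode): String = node match {
    case PhysicalRDD(output, nodeName) =>
      s"RDD $nodeName${output.mkString("[", ",", "]")}"
    case PhysicalScan(output, relation, _) =>
      s"Scan $relation${output.mkString("[", ",", "]")}"
  }
}
```

With a single shared class, a rule like `pushFilter` would need a flag or a metadata check to tell the two cases apart; with two classes the distinction is carried by the type itself.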