[GitHub] spark pull request: [SPARK-12379][ML][MLLIB] Copy GBT implementati...
Github user MLnick commented on the pull request: https://github.com/apache/spark/pull/10607#issuecomment-192170689 @sethah This looks fine to me though there are merge conflicts that need to be resolved. It would be good to get this in ASAP so the work (and clean up that can happen) in [SPARK-12381](https://issues.apache.org/jira/browse/SPARK-12381) and [SPARK-12382](https://issues.apache.org/jira/browse/SPARK-12382) can begin. @jkbradley can you take a quick pass?
[GitHub] spark pull request: [SPARK-13523] [SQL] WIP: reuse exchanges in a ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11403#issuecomment-192164174 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52452/ Test FAILed.
[GitHub] spark pull request: [SPARK-13523] [SQL] WIP: reuse exchanges in a ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11403#issuecomment-192164158 **[Test build #52452 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52452/consoleFull)** for PR 11403 at commit [`e2b9987`](https://github.com/apache/spark/commit/e2b998702a7f82bc5fdf41ab689efa56631af910). * This patch **fails Scala style tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `abstract class QueryPlan[PlanType <: QueryPlan[PlanType]] extends TreeNode[PlanType] ` * `abstract class Exchange extends UnaryNode ` * `case class ReusedExchange(override val output: Seq[Attribute], child: Exchange) extends LeafNode `
[GitHub] spark pull request: [SPARK-13523] [SQL] WIP: reuse exchanges in a ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11403#issuecomment-192164166 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-13523] [SQL] WIP: reuse exchanges in a ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11403#issuecomment-192163462 **[Test build #52452 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52452/consoleFull)** for PR 11403 at commit [`e2b9987`](https://github.com/apache/spark/commit/e2b998702a7f82bc5fdf41ab689efa56631af910).
[GitHub] spark pull request: [SPARK-12379][ML][MLLIB] Copy GBT implementati...
Github user MLnick commented on a diff in the pull request: https://github.com/apache/spark/pull/10607#discussion_r54998395 --- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/impl/GradientBoostedTrees.scala --- @@ -0,0 +1,272 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.ml.tree.impl + +import org.apache.spark.Logging +import org.apache.spark.mllib.impl.PeriodicRDDCheckpointer +import org.apache.spark.mllib.regression.LabeledPoint +import org.apache.spark.ml.regression.{DecisionTreeRegressionModel, DecisionTreeRegressor} +import org.apache.spark.mllib.tree.configuration.Algo._ +import org.apache.spark.mllib.tree.configuration.BoostingStrategy +import org.apache.spark.mllib.tree.impl.TimeTracker +import org.apache.spark.mllib.tree.impurity.Variance +import org.apache.spark.mllib.tree.loss.Loss +import org.apache.spark.rdd.RDD +import org.apache.spark.storage.StorageLevel + +private[ml] object GradientBoostedTrees extends Logging { + + /** + * Method to train a gradient boosting model + * @param input Training dataset: RDD of [[org.apache.spark.mllib.regression.LabeledPoint]]. + * @return a gradient boosted trees model that can be used for prediction + */ + def run(input: RDD[LabeledPoint], + boostingStrategy: BoostingStrategy): (Array[DecisionTreeRegressionModel], Array[Double]) = { +val algo = boostingStrategy.treeStrategy.algo +algo match { + case Regression => +GradientBoostedTrees.boost(input, input, boostingStrategy, validate = false) + case Classification => +// Map labels to -1, +1 so binary classification can be treated as regression. +val remappedInput = input.map(x => new LabeledPoint((x.label * 2) - 1, x.features)) +GradientBoostedTrees.boost(remappedInput, remappedInput, boostingStrategy, validate = false) + case _ => +throw new IllegalArgumentException(s"$algo is not supported by gradient boosting.") +} + } + + /** + * Method to validate a gradient boosting model + * @param input Training dataset: RDD of [[org.apache.spark.mllib.regression.LabeledPoint]]. + * @param validationInput Validation dataset. + *This dataset should be different from the training dataset, + *but it should follow the same distribution. 
+ *E.g., these two datasets could be created from an original dataset + *by using [[org.apache.spark.rdd.RDD.randomSplit()]] + * @return a gradient boosted trees model that can be used for prediction + */ + def runWithValidation( + input: RDD[LabeledPoint], + validationInput: RDD[LabeledPoint], + boostingStrategy: BoostingStrategy): (Array[DecisionTreeRegressionModel], Array[Double]) = { +val algo = boostingStrategy.treeStrategy.algo +algo match { + case Regression => +GradientBoostedTrees.boost(input, validationInput, boostingStrategy, validate = true) + case Classification => +// Map labels to -1, +1 so binary classification can be treated as regression. +val remappedInput = input.map( + x => new LabeledPoint((x.label * 2) - 1, x.features)) +val remappedValidationInput = validationInput.map( + x => new LabeledPoint((x.label * 2) - 1, x.features)) +GradientBoostedTrees.boost(remappedInput, remappedValidationInput, boostingStrategy, + validate = true) + case _ => +throw new IllegalArgumentException(s"$algo is not supported by the gradient boosting.") +} + } + + /** + * Compute the initial predictions and errors for a dataset for the first + * iteration of gradient boosting. + * @param data: training data. + * @param initTreeWeight: learning rate assigned to the first tree. + * @param initT
[GitHub] spark pull request: [SPARK-13523] [SQL] WIP: reuse exchanges in a ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11403#issuecomment-192161609 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-13523] [SQL] WIP: reuse exchanges in a ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11403#issuecomment-192161600 **[Test build #52451 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52451/consoleFull)** for PR 11403 at commit [`42096c8`](https://github.com/apache/spark/commit/42096c8f1707fd9a66a5dcc4a0df4f8d9d8f046e). * This patch **fails Scala style tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `abstract class QueryPlan[PlanType <: QueryPlan[PlanType]] extends TreeNode[PlanType] ` * `abstract class Exchange extends UnaryNode ` * `case class ReusedExchange(override val output: Seq[Attribute], child: Exchange) extends LeafNode `
[GitHub] spark pull request: [SPARK-13523] [SQL] WIP: reuse exchanges in a ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11403#issuecomment-192161616 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52451/ Test FAILed.
[GitHub] spark pull request: [SPARK-13523] [SQL] WIP: reuse exchanges in a ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11403#issuecomment-192161109 **[Test build #52451 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52451/consoleFull)** for PR 11403 at commit [`42096c8`](https://github.com/apache/spark/commit/42096c8f1707fd9a66a5dcc4a0df4f8d9d8f046e).
[GitHub] spark pull request: [SPARK-13636][SQL] Directly consume UnsafeRow ...
Github user viirya commented on the pull request: https://github.com/apache/spark/pull/11484#issuecomment-192155333 @kiszk this is not just for the Sort operator.
[GitHub] spark pull request: [SPARK-13636][SQL] Directly consume UnsafeRow ...
Github user kiszk commented on the pull request: https://github.com/apache/spark/pull/11484#issuecomment-192154924 Would it be better to add "in sort" to the title of this PR?
[GitHub] spark pull request: [ML] testEstimatorAndModelReadWrite should cal...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11513#issuecomment-192152376 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52449/ Test FAILed.
[GitHub] spark pull request: [ML] testEstimatorAndModelReadWrite should cal...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11513#issuecomment-192152366 Merged build finished. Test FAILed.
[GitHub] spark pull request: [ML] testEstimatorAndModelReadWrite should cal...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11513#issuecomment-192152149 **[Test build #52449 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52449/consoleFull)** for PR 11513 at commit [`8f95454`](https://github.com/apache/spark/commit/8f95454176b289c886f3eeaa82af0401541663d1). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-13636][SQL] Directly consume UnsafeRow ...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/11484#discussion_r54997367 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegen.scala --- @@ -109,7 +114,10 @@ trait CodegenSupport extends SparkPlan { * Consume the columns generated from current SparkPlan, call it's parent. */ final def consume(ctx: CodegenContext, input: Seq[ExprCode], row: String = null): String = { -if (input != null) { +// We check if input expressions has same length as output when: +// 1. parent can't consume UnsafeRow and input is not null. +// 2. parent consumes UnsafeRow and row is null. +if ((input != null && !parent.consumeUnsafeRow) || (parent.consumeUnsafeRow && row == null)) { --- End diff -- When the child knows its parent can consume UnsafeRow, it can choose to pass an UnsafeRow and empty input. If so, we don't need to check if `input.length == output.length` here.
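A minimal, self-contained paraphrase of the guard discussed above (the names are illustrative stand-ins, not the actual CodegenSupport members), showing the two cases in which the input/output length check still applies:

```scala
object ConsumeGuardSketch {
  // Paraphrase of the condition in the diff above: check input.length == output.length
  // only when the parent will actually consume unpacked column variables rather than
  // a whole UnsafeRow.
  def shouldCheckInputLength(
      inputIsDefined: Boolean,            // stand-in for `input != null`
      rowIsDefined: Boolean,              // stand-in for `row != null`
      parentConsumesUnsafeRow: Boolean): Boolean = {
    // Case 1: parent can't take an UnsafeRow and column variables were passed.
    // Case 2: parent could take an UnsafeRow, but none was passed, so it falls back to columns.
    (inputIsDefined && !parentConsumesUnsafeRow) || (parentConsumesUnsafeRow && !rowIsDefined)
  }
}
```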
[GitHub] spark pull request: [SPARK-7478][SQL] Added SQLContext.getOrCreate
Github user mwws commented on the pull request: https://github.com/apache/spark/pull/6006#issuecomment-192150357 @jelez you can create a HiveContextSingleton to work around it. Refer to the example "SqlNetworkWordCount". @tdas Why did you remove HiveContext.getOrCreate? I can't find an obvious reason in the conversation.
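A sketch of the suggested workaround, modeled on the lazily-initialized SQLContextSingleton from the SqlNetworkWordCount streaming example (the object name and structure are illustrative, not an official API):

```scala
import org.apache.spark.SparkContext
import org.apache.spark.sql.hive.HiveContext

// Lazily instantiated singleton HiveContext (Spark 1.x), usable from streaming
// batches in place of the removed HiveContext.getOrCreate.
object HiveContextSingleton {
  @transient private var instance: HiveContext = _

  def getOrCreate(sc: SparkContext): HiveContext = synchronized {
    if (instance == null) {
      instance = new HiveContext(sc)
    }
    instance
  }
}
```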
[GitHub] spark pull request: [SPARK-13636][SQL] Directly consume UnsafeRow ...
Github user viirya commented on the pull request: https://github.com/apache/spark/pull/11484#issuecomment-192148712 @davies Yea. That will be good.
[GitHub] spark pull request: [SPARK-13625][PYSPARK][ML] Added a check to se...
Github user MLnick commented on the pull request: https://github.com/apache/spark/pull/11476#issuecomment-192148581 @BryanCutler does it perhaps make sense to add a little test case?
[GitHub] spark pull request: SPARK-12925. Improve HiveInspectors.unwrap for...
Github user rajeshbalamohan commented on the pull request: https://github.com/apache/spark/pull/11477#issuecomment-192146953 Thanks @srowen. Incorporated the changes. This was tested with HiveCompatibilitySuite and HiveQuerySuite. These tests ran fine on the master branch without the changes as well. However, when tried on the 1.6 branch, these test suites failed with the copy issues. Hence the explicit bytes copy in master, so that this does not fail in the future.
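An illustrative sketch (not the actual HiveInspectors change) of why the explicit copy matters: Hadoop reuses Writable instances, and BytesWritable.getBytes returns a backing array padded to capacity, so unwrap should copy only the valid range rather than keep a reference to the shared buffer:

```scala
import org.apache.hadoop.io.BytesWritable

object BytesCopySketch {
  // Copy only the valid bytes instead of keeping the shared, reused buffer.
  def unwrapBytes(bw: BytesWritable): Array[Byte] =
    java.util.Arrays.copyOfRange(bw.getBytes, 0, bw.getLength)
}
```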
[GitHub] spark pull request: [SPARK-13636][SQL] Directly consume UnsafeRow ...
Github user davies commented on the pull request: https://github.com/apache/spark/pull/11484#issuecomment-192146205 @viirya Can we wait for #11274? Then we could avoid some complexity.
[GitHub] spark pull request: [SPARK-13636][SQL] Directly consume UnsafeRow ...
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/11484#discussion_r54996591 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegen.scala --- @@ -125,14 +133,27 @@ trait CodegenSupport extends SparkPlan { row: String = null): String = { ctx.freshNamePrefix = variablePrefix if (row != null) { - ctx.currentVars = null - ctx.INPUT_ROW = row - val evals = child.output.zipWithIndex.map { case (attr, i) => -BoundReference(i, attr.dataType, attr.nullable).gen(ctx) + val evals: Seq[ExprCode] = if (!consumeUnsafeRow) { +// If this SparkPlan can't consume UnsafeRow and there is an UnsafeRow, +// we extract the columns from the row and call doConsume. +ctx.currentVars = null +ctx.INPUT_ROW = row +child.output.zipWithIndex.map { case (attr, i) => + BoundReference(i, attr.dataType, attr.nullable).gen(ctx) +} + } else { +// If this SparkPlan consumes UnsafeRow and there is an UnsafeRow, +// we don't need to unpack variables from the row. +Seq.empty + } + val evalCode = if (evals.isEmpty) { +"" + } else { +s"${evals.map(_.code).mkString("\n")}" } s""" - | ${evals.map(_.code).mkString("\n")} - | ${doConsume(ctx, evals)} + | $evalCode + | ${doConsume(ctx, evals, row)} """.stripMargin } else { doConsume(ctx, input) --- End diff -- pass `null` here.
[GitHub] spark pull request: [SPARK-13636][SQL] Directly consume UnsafeRow ...
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/11484#discussion_r54996609 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegen.scala --- @@ -125,14 +133,27 @@ trait CodegenSupport extends SparkPlan { row: String = null): String = { ctx.freshNamePrefix = variablePrefix if (row != null) { - ctx.currentVars = null - ctx.INPUT_ROW = row - val evals = child.output.zipWithIndex.map { case (attr, i) => -BoundReference(i, attr.dataType, attr.nullable).gen(ctx) + val evals: Seq[ExprCode] = if (!consumeUnsafeRow) { +// If this SparkPlan can't consume UnsafeRow and there is an UnsafeRow, +// we extract the columns from the row and call doConsume. +ctx.currentVars = null +ctx.INPUT_ROW = row +child.output.zipWithIndex.map { case (attr, i) => + BoundReference(i, attr.dataType, attr.nullable).gen(ctx) +} + } else { +// If this SparkPlan consumes UnsafeRow and there is an UnsafeRow, +// we don't need to unpack variables from the row. +Seq.empty + } + val evalCode = if (evals.isEmpty) { +"" + } else { +s"${evals.map(_.code).mkString("\n")}" } s""" - | ${evals.map(_.code).mkString("\n")} - | ${doConsume(ctx, evals)} + | $evalCode + | ${doConsume(ctx, evals, row)} """.stripMargin } else { doConsume(ctx, input) --- End diff -- pass null here.
[GitHub] spark pull request: [SPARK-13636][SQL] Directly consume UnsafeRow ...
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/11484#discussion_r54996552 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/Expand.scala --- @@ -93,7 +93,7 @@ case class Expand( child.asInstanceOf[CodegenSupport].produce(ctx, this) } - override def doConsume(ctx: CodegenContext, input: Seq[ExprCode]): String = { --- End diff -- I think we can remove the default value for row here.
[GitHub] spark pull request: [SPARK-13255][SQL] Update vectorized reader to...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11435#issuecomment-192145369 **[Test build #52450 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52450/consoleFull)** for PR 11435 at commit [`ed79eee`](https://github.com/apache/spark/commit/ed79eee5daeab177c4350f6f111898f0e7339309).
[GitHub] spark pull request: [SPARK-13636][SQL] Directly consume UnsafeRow ...
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/11484#discussion_r54996447 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegen.scala --- @@ -109,7 +114,10 @@ trait CodegenSupport extends SparkPlan { * Consume the columns generated from current SparkPlan, call it's parent. */ final def consume(ctx: CodegenContext, input: Seq[ExprCode], row: String = null): String = { -if (input != null) { +// We check if input expressions has same length as output when: +// 1. parent can't consume UnsafeRow and input is not null. +// 2. parent consumes UnsafeRow and row is null. +if ((input != null && !parent.consumeUnsafeRow) || (parent.consumeUnsafeRow && row == null)) { --- End diff -- Why do we need to change this?
[GitHub] spark pull request: [SPARK-13636][SQL] Directly consume UnsafeRow ...
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/11484#discussion_r54996174 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegen.scala --- @@ -67,7 +67,12 @@ trait CodegenSupport extends SparkPlan { /** * Which SparkPlan is calling produce() of this one. It's itself for the first SparkPlan. */ - private var parent: CodegenSupport = null + protected var parent: CodegenSupport = null + + /** +* Whether this SparkPlan accepts UnsafeRow as input in consumeChild. --- End diff -- consumeChild -> doConsume
[GitHub] spark pull request: [SPARK-13652][Core]Copy ByteBuffer in sendRpcS...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/11499
[GitHub] spark pull request: [SPARK-13652][Core]Copy ByteBuffer in sendRpcS...
Github user zsxwing commented on the pull request: https://github.com/apache/spark/pull/11499#issuecomment-192142269 Merging to master and 1.6
[GitHub] spark pull request: [SPARK-13652][Core]Copy ByteBuffer in sendRpcS...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11499#issuecomment-192141632 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52445/ Test PASSed.
[GitHub] spark pull request: [SPARK-13652][Core]Copy ByteBuffer in sendRpcS...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11499#issuecomment-192141626 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-13668][SQL] Reorder filter/join predica...
Github user sameeragarwal commented on the pull request: https://github.com/apache/spark/pull/11511#issuecomment-192141195 cc @nongli @yhuai
[GitHub] spark pull request: [SPARK-13652][Core]Copy ByteBuffer in sendRpcS...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11499#issuecomment-192140867 **[Test build #52445 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52445/consoleFull)** for PR 11499 at commit [`7199237`](https://github.com/apache/spark/commit/71992375d2d3ad6e1b2db2769e21facb6c7cfe8c). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [ML] testEstimatorAndModelReadWrite should cal...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11513#issuecomment-192139327 **[Test build #52449 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52449/consoleFull)** for PR 11513 at commit [`8f95454`](https://github.com/apache/spark/commit/8f95454176b289c886f3eeaa82af0401541663d1).
[GitHub] spark pull request: [SPARK-13668][SQL] Reorder filter/join predica...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11511#issuecomment-192138366 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-13668][SQL] Reorder filter/join predica...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11511#issuecomment-192138372 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52447/ Test PASSed.
[GitHub] spark pull request: [SPARK-13668][SQL] Reorder filter/join predica...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11511#issuecomment-192137331 **[Test build #52447 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52447/consoleFull)** for PR 11511 at commit [`dfb33ec`](https://github.com/apache/spark/commit/dfb33ecd27bb65903dd4a0a2cd6bfcd0d8d912c3). * This patch passes all tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class ReorderedPredicateSuite extends QueryTest with SharedSQLContext with PredicateHelper `
[GitHub] spark pull request: [ML] testEstimatorAndModelReadWrite should cal...
GitHub user yanboliang opened a pull request: https://github.com/apache/spark/pull/11513 [ML] testEstimatorAndModelReadWrite should call checkModelData ## What changes were proposed in this pull request? Although we define ```checkModelData``` in the ```read/write``` tests of ML estimators/models and pass it to ```testEstimatorAndModelReadWrite```, ```testEstimatorAndModelReadWrite``` never calls ```checkModelData``` to verify the equality of model data. So we currently do not run the model-data equality check for any test case; we should fix it. cc @jkbradley @mengxr ## How was this patch tested? No new unit test; it should pass the existing ones. You can merge this pull request into a Git repository by running: $ git pull https://github.com/yanboliang/spark ml-check-model-data Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/11513.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #11513 commit 8f95454176b289c886f3eeaa82af0401541663d1 Author: Yanbo Liang Date: 2016-03-04T06:33:04Z testEstimatorAndModelReadWrite should call checkModelData
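An illustrative skeleton of the fix being described (the signature is simplified and hypothetical, not the actual DefaultReadWriteTest helper); the essential change is that the helper must invoke the caller-supplied checkModelData after reloading the model:

```scala
object ReadWriteTestSketch {
  // Hypothetical helper: save a fitted model, load it back, and run the
  // caller-supplied equality check that was previously never invoked.
  def testModelReadWrite[M](
      fitAndSave: () => (M, String),      // hypothetical: returns (model, save path)
      load: String => M,                  // hypothetical: reloads the model from the path
      checkModelData: (M, M) => Unit): Unit = {
    val (model, path) = fitAndSave()
    val loaded = load(path)
    checkModelData(model, loaded)         // the call this PR adds
  }
}
```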
[GitHub] spark pull request: [SPARK-13631] [CORE] Thread-safe getLocationsW...
Github user a1k0n commented on the pull request: https://github.com/apache/spark/pull/11505#issuecomment-192136179 Jenkins, retest this please
[GitHub] spark pull request: [SPARK-13631] [CORE] Thread-safe getLocationsW...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11505#issuecomment-192131987 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52444/ Test FAILed.
[GitHub] spark pull request: [SPARK-13631] [CORE] Thread-safe getLocationsW...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11505#issuecomment-192131985 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-13642][Yarn] Properly handle signal kil...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11512#issuecomment-192131877 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-13642][Yarn] Properly handle signal kil...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11512#issuecomment-192131879 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52448/ Test PASSed.
[GitHub] spark pull request: [SPARK-13631] [CORE] Thread-safe getLocationsW...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11505#issuecomment-192131731 **[Test build #52444 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52444/consoleFull)** for PR 11505 at commit [`4f78803`](https://github.com/apache/spark/commit/4f7880340f9c05e54b0758a308493b3d8dced83d). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-13642][Yarn] Properly handle signal kil...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11512#issuecomment-192131732 **[Test build #52448 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52448/consoleFull)** for PR 11512 at commit [`b461b71`](https://github.com/apache/spark/commit/b461b717ed51b532f823615bcb79f66b17635c4d). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-13603] [SQL] support SQL generation for...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11453#issuecomment-192128896 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52443/ Test PASSed.
[GitHub] spark pull request: [SPARK-13603] [SQL] support SQL generation for...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11453#issuecomment-192128892 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-13603] [SQL] support SQL generation for...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11453#issuecomment-192128187 **[Test build #52443 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52443/consoleFull)** for PR 11453 at commit [`5fbc714`](https://github.com/apache/spark/commit/5fbc714e3273ff5aadd347b53cc3af2d693db153). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-13636][SQL] Directly consume UnsafeRow ...
Github user viirya commented on the pull request: https://github.com/apache/spark/pull/11484#issuecomment-192128562 cc @davies @rxin @nongli
[GitHub] spark pull request: [SPARK-13642][Yarn] Properly handle signal kil...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11512#issuecomment-192125457 **[Test build #52448 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52448/consoleFull)** for PR 11512 at commit [`b461b71`](https://github.com/apache/spark/commit/b461b717ed51b532f823615bcb79f66b17635c4d).
[GitHub] spark pull request: [SPARK-13642][Yarn] Properly handle signal kil...
GitHub user jerryshao opened a pull request: https://github.com/apache/spark/pull/11512 [SPARK-13642][Yarn] Properly handle signal kill in ApplicationMaster ## What changes were proposed in this pull request? This patch fixes the race condition in ApplicationMaster when receiving a signal. In the current implementation, if a signal is received and no exception is thrown, the application finishes with a successful state in Yarn, and there is no further attempt. Since the application was actually killed by a signal at runtime, another attempt is expected. This patch adds a signal handler so that, when a signal is received, the application is marked as finished with failure rather than success. ## How was this patch tested? This patch was tested in the following situations: 1. The application finishes normally. 2. The application finishes by calling `System.exit(n)`. 3. The application is killed by a yarn command. 4. ApplicationMaster is killed by "SIGTERM" sent by the `kill pid` command. 5. ApplicationMaster is killed by the NM with "SIGTERM" in case of NM failure. All scenarios return the expected states. CC @tgravescs, please help to review this fix, thanks a lot. You can merge this pull request into a Git repository by running: $ git pull https://github.com/jerryshao/apache-spark SPARK-13642 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/11512.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #11512 commit b461b717ed51b532f823615bcb79f66b17635c4d Author: jerryshao Date: 2016-03-04T05:52:18Z Properly handle signal kill in ApplicationMaster
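A minimal sketch of the approach described above (illustrative only; the actual ApplicationMaster change may register its handlers differently): install handlers for termination signals so the attempt is marked failed instead of exiting with a success state:

```scala
import sun.misc.{Signal, SignalHandler}

object SignalKillSketch {
  /** Register handlers for the given signals; on receipt, mark the attempt
    * failed via the supplied callback and exit with the conventional code. */
  def install(markFailed: String => Unit,
              signals: Seq[String] = Seq("TERM", "INT", "HUP")): Unit = {
    signals.foreach { name =>
      Signal.handle(new Signal(name), new SignalHandler {
        override def handle(sig: Signal): Unit = {
          markFailed(s"Received SIG${sig.getName}")
          System.exit(128 + sig.getNumber)
        }
      })
    }
  }
}
```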
[GitHub] spark pull request: [SPARK-13255][SQL] Update vectorized reader to...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11435#issuecomment-192122554 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-13255][SQL] Update vectorized reader to...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11435#issuecomment-192122559 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52446/ Test FAILed.
[GitHub] spark pull request: [SPARK-13255][SQL] Update vectorized reader to...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11435#issuecomment-192122057 **[Test build #52446 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52446/consoleFull)** for PR 11435 at commit [`f5f1e2b`](https://github.com/apache/spark/commit/f5f1e2be578ad40daafe25c6cc1b09bb4f8bb71a). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-13626] [core] Avoid duplicate config de...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11510#issuecomment-192121421 Build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-13626] [core] Avoid duplicate config de...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11510#issuecomment-192121425 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52442/ Test FAILed.
[GitHub] spark pull request: [SPARK-13626] [core] Avoid duplicate config de...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11510#issuecomment-192120956 **[Test build #52442 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52442/consoleFull)** for PR 11510 at commit [`c5338f6`](https://github.com/apache/spark/commit/c5338f6561d62ac4a869012f369df9339b1437cb). * This patch **fails Spark unit tests**. * This patch **does not merge cleanly**. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-13117] [Web UI] WebUI should use the lo...
Github user devaraj-kavali commented on the pull request: https://github.com/apache/spark/pull/11490#issuecomment-192118319 I agree @srowen, I see that SPARK_PUBLIC_DNS is not meant for binding purposes. I have changed the env var to SPARK_LOCAL_IP.
[GitHub] spark pull request: [SPARK-13633] [SQL] Move things into catalyst....
Github user hvanhovell commented on the pull request: https://github.com/apache/spark/pull/11506#issuecomment-192116711 LGTM --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-13640][SQL] Synchronize ScalaReflection...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11487#issuecomment-192109035 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52441/ Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-13640][SQL] Synchronize ScalaReflection...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11487#issuecomment-192109034 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-13640][SQL] Synchronize ScalaReflection...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11487#issuecomment-192108763 **[Test build #52441 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52441/consoleFull)** for PR 11487 at commit [`cee5896`](https://github.com/apache/spark/commit/cee58960dee030116ab5b027aaadc8203828d8cb). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-13659] Refactor BlockStore put*() APIs ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11502#issuecomment-192107904 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52438/ Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-12720] [SQL] SQL Generation Support for...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11283#issuecomment-192107849 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52440/ Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-12720] [SQL] SQL Generation Support for...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11283#issuecomment-192107847 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-13659] Refactor BlockStore put*() APIs ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11502#issuecomment-192107901 Merged build finished. Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-13659] Refactor BlockStore put*() APIs ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11502#issuecomment-192107709 **[Test build #52438 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52438/consoleFull)** for PR 11502 at commit [`6381b00`](https://github.com/apache/spark/commit/6381b00a94c7bf4ea0693fc4ae6868ef0f866dc4). * This patch **fails PySpark unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-12720] [SQL] SQL Generation Support for...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11283#issuecomment-192107596 **[Test build #52440 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52440/consoleFull)** for PR 11283 at commit [`9eaca51`](https://github.com/apache/spark/commit/9eaca515a3a86f07ed4ca85ba6da080ad605d1c0). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-8000][SQL] Support for auto-detecting d...
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/11270#issuecomment-192106315 @rxin If you think we should not list the files even once, should we instead detect the source from the given paths alone, without listing, and otherwise just fall back to the `sqlContext.conf.defaultDataSourceName` option? In other words:

```bash
├── iamjson.json        # Detected by the extension of `iamjson.json`
│   ├── part-001
│   └── part-002
├── iamjson             # Try `sqlContext.conf.defaultDataSourceName` and then
│   │                   # throw an exception on the Parquet side.
│   ├── part-001
│   └── part-002
├── iamparquet.parquet  # Detected by the extension of `iamparquet.parquet`
│   ├── part-001.parquet
│   └── part-002.parquet
└── iamparquet          # Just use `sqlContext.conf.defaultDataSourceName`
    ├── part-001.parquet
    └── part-002.parquet
```

--- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
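The extension-based fallback described in the comment above can be illustrated with a small helper. This is only a sketch of the idea, not code from the PR: the function name and the extension table are made up, and the real detection would go through Spark's data source resolution path.

```scala
// Illustrative sketch only (not the PR's implementation): pick a data source
// name from a path's extension, falling back to the configured default
// (e.g. sqlContext.conf.defaultDataSourceName) when nothing matches.
def inferSourceFromPath(path: String, defaultSource: String): String = {
  val knownExtensions = Map("json" -> "json", "parquet" -> "parquet", "csv" -> "csv")
  val dot = path.lastIndexOf('.')
  if (dot < 0) defaultSource
  else knownExtensions.getOrElse(path.substring(dot + 1).toLowerCase, defaultSource)
}

// inferSourceFromPath("iamjson.json", "parquet")   // "json"    (detected by extension)
// inferSourceFromPath("iamparquet", "parquet")     // "parquet" (falls back to the default)
```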
[GitHub] spark pull request: [SPARK-13495][SQL] Add Null Filters in the que...
Github user nongli commented on a diff in the pull request: https://github.com/apache/spark/pull/11372#discussion_r54990462

--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/NullFilteringSuite.scala ---
@@ -0,0 +1,68 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.optimizer
+
+import org.apache.spark.sql.catalyst.dsl.expressions._
+import org.apache.spark.sql.catalyst.dsl.plans._
+import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.catalyst.plans._
+import org.apache.spark.sql.catalyst.plans.logical._
+import org.apache.spark.sql.catalyst.rules._
+
+class NullFilteringSuite extends PlanTest {
+
+  object Optimize extends RuleExecutor[LogicalPlan] {
+    val batches = Batch("NullFiltering", Once, NullFiltering) ::
+      Batch("CombineFilters", Once, CombineFilters) :: Nil
+  }
+
+  val testRelation = LocalRelation('a.int, 'b.int, 'c.int)
+
+  test("filter: filter out nulls in condition") {
+    val originalQuery = testRelation.where('a === 1)
+    val correctAnswer = testRelation.where(IsNotNull('a) && 'a === 1).analyze
--- End diff --

We can do that in a follow up pr

--- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-13495][SQL] Add Null Filters in the que...
Github user nongli commented on a diff in the pull request: https://github.com/apache/spark/pull/11372#discussion_r54990555

--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/NullFilteringSuite.scala ---
@@ -0,0 +1,68 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.optimizer
+
+import org.apache.spark.sql.catalyst.dsl.expressions._
+import org.apache.spark.sql.catalyst.dsl.plans._
+import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.catalyst.plans._
+import org.apache.spark.sql.catalyst.plans.logical._
+import org.apache.spark.sql.catalyst.rules._
+
+class NullFilteringSuite extends PlanTest {
+
+  object Optimize extends RuleExecutor[LogicalPlan] {
+    val batches = Batch("NullFiltering", Once, NullFiltering) ::
+      Batch("CombineFilters", Once, CombineFilters) :: Nil
+  }
+
+  val testRelation = LocalRelation('a.int, 'b.int, 'c.int)
+
+  test("filter: filter out nulls in condition") {
+    val originalQuery = testRelation.where('a === 1)
+    val correctAnswer = testRelation.where(IsNotNull('a) && 'a === 1).analyze
+    val optimized = Optimize.execute(originalQuery.analyze)
+    comparePlans(optimized, correctAnswer)
+  }
+
+  test("join: filter out nulls on either side") {
+    val x = testRelation.subquery('x)
+    val y = testRelation.subquery('y)
+    val originalQuery = x.join(y,
+      condition = Some("x.a".attr === "y.a".attr && "x.b".attr === 1 && "y.c".attr > 5))
+    val left = x.where(IsNotNull('a) && IsNotNull('b))
+    val right = y.where(IsNotNull('a) && IsNotNull('c))
+    val correctAnswer = left.join(right,
+      condition = Some("x.a".attr === "y.a".attr && "x.b".attr === 1 && "y.c".attr > 5)).analyze
+    val optimized = Optimize.execute(originalQuery.analyze)
+    comparePlans(optimized, correctAnswer)
+  }
+
+  test("join with pre-existing filters: filter out nulls on either side") {
+    val x = testRelation.subquery('x)
+    val y = testRelation.subquery('y)
+    val originalQuery = x.where('b > 5).join(y.where('c === 10),
+      condition = Some("x.a".attr === "y.a".attr))
+    val left = x.where(IsNotNull('a) && IsNotNull('b) && 'b > 5)
+    val right = y.where(IsNotNull('a) && IsNotNull('c) && 'c === 10)
+    val correctAnswer = left.join(right,
+      condition = Some("x.a".attr === "y.a".attr)).analyze
+    val optimized = Optimize.execute(originalQuery.analyze)
+    comparePlans(optimized, correctAnswer)
+  }
--- End diff --

I had a few more test cases when I tried this. Can you see if any of them should be added? https://github.com/nongli/spark/commit/ea0edd46e080cd0a1c6a1d41374563c149a030f7

We should also have outer-join tests to make sure they don't add the IsNotNull filter.

--- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well.
If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
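To make the outer-join suggestion in the message above concrete, a test along the following lines could be added to the quoted suite. This is a hypothetical sketch, not part of the patch; it assumes the same DSL and helpers as the suite (`testRelation`, `Optimize`, `comparePlans`), and it encodes the behaviour the reviewer asks for, namely that no `IsNotNull` filters are inferred for a full outer join.

```scala
// Hypothetical test sketch: a full outer join should come out of the
// NullFiltering batch unchanged, i.e. with no inferred IsNotNull filters
// added on either side.
test("full outer join: no null filters are inferred") {
  val x = testRelation.subquery('x)
  val y = testRelation.subquery('y)
  val originalQuery = x.join(y, FullOuter, Some("x.a".attr === "y.a".attr))
  val optimized = Optimize.execute(originalQuery.analyze)
  comparePlans(optimized, originalQuery.analyze)
}
```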
[GitHub] spark pull request: [SPARK-13495][SQL] Add Null Filters in the que...
Github user nongli commented on a diff in the pull request: https://github.com/apache/spark/pull/11372#discussion_r54990320

--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/NullFilteringSuite.scala ---
@@ -0,0 +1,68 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.optimizer
+
+import org.apache.spark.sql.catalyst.dsl.expressions._
+import org.apache.spark.sql.catalyst.dsl.plans._
+import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.catalyst.plans._
+import org.apache.spark.sql.catalyst.plans.logical._
+import org.apache.spark.sql.catalyst.rules._
+
+class NullFilteringSuite extends PlanTest {
+
+  object Optimize extends RuleExecutor[LogicalPlan] {
+    val batches = Batch("NullFiltering", Once, NullFiltering) ::
+      Batch("CombineFilters", Once, CombineFilters) :: Nil
+  }
+
+  val testRelation = LocalRelation('a.int, 'b.int, 'c.int)
+
+  test("filter: filter out nulls in condition") {
+    val originalQuery = testRelation.where('a === 1)
+    val correctAnswer = testRelation.where(IsNotNull('a) && 'a === 1).analyze
--- End diff --

You haven't done anything with `a === 1`, right? There's still no logic inferring that `a === 1` implies `a` is not nullable.

--- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-13668][SQL] Reorder filter/join predica...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11511#issuecomment-192105598 **[Test build #52447 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52447/consoleFull)** for PR 11511 at commit [`dfb33ec`](https://github.com/apache/spark/commit/dfb33ecd27bb65903dd4a0a2cd6bfcd0d8d912c3). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-13668][SQL] Reorder filter/join predica...
GitHub user sameeragarwal opened a pull request: https://github.com/apache/spark/pull/11511

[SPARK-13668][SQL] Reorder filter/join predicates to short-circuit isNotNull checks

## What changes were proposed in this pull request?

If a filter predicate or a join condition includes `IsNotNull` checks, we should reorder these checks so that the non-nullability checks are evaluated before the rest of the predicates. For example, if a filter predicate is of the form `a > 5 && isNotNull(b)`, we should rewrite this as `isNotNull(b) && a > 5` during physical plan generation.

## How was this patch tested?

New unit tests in `ReorderedPredicateSuite` that verify the physical plan for both filters and joins.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/sameeragarwal/spark reorder-isnotnull

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/11511.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #11511

commit 9341da6bc45c868d14c4f5d4c020e40a6b5ba593
Author: Sameer Agarwal
Date: 2016-03-02T23:57:57Z
Reorder conditions in join and filters

commit dfb33ecd27bb65903dd4a0a2cd6bfcd0d8d912c3
Author: Sameer Agarwal
Date: 2016-03-03T01:20:18Z
unit tests: ReorderedPredicateSuite

--- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
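The core of the rewrite described in this PR is essentially a stable partition of the conjuncts. A minimal sketch of the idea follows; it is not the PR's actual rule, the object name is made up, and it assumes Catalyst's `PredicateHelper` and expression classes.

```scala
import org.apache.spark.sql.catalyst.expressions.{And, Expression, IsNotNull, PredicateHelper}

// Minimal sketch of the reordering idea: split the conjunction and move the
// IsNotNull checks to the front so they short-circuit the remaining predicates.
object ReorderIsNotNullSketch extends PredicateHelper {
  def reorder(condition: Expression): Expression = {
    val (notNullChecks, others) =
      splitConjunctivePredicates(condition).partition(_.isInstanceOf[IsNotNull])
    // e.g. (a > 5 && isNotNull(b)) becomes (isNotNull(b) && a > 5)
    (notNullChecks ++ others).reduce(And)
  }
}
```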
[GitHub] spark pull request: [SPARK-13495][SQL] Add Null Filters in the que...
Github user nongli commented on a diff in the pull request: https://github.com/apache/spark/pull/11372#discussion_r54990183

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
@@ -586,6 +587,52 @@ object NullPropagation extends Rule[LogicalPlan] {
 }
 /**
+ * Attempts to eliminate reading (unnecessary) NULL values if they are not required for correctness
+ * by inserting isNotNull filters is the query plan. These filters are currently inserted beneath
+ * existing Filters and Join operators and are inferred based on their data constraints.
+ *
+ * Note: While this optimization is applicable to all types of join, it primarily benefits Inner and
+ * LeftSemi joins.
+ */
+object NullFiltering extends Rule[LogicalPlan] with PredicateHelper {
+  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+    case filter @ Filter(condition, child: LogicalPlan) =>
+      // We generate a list of additional isNotNull filters from the operator's existing constraints
+      // but remove those that are either already part of the filter condition or are part of the
+      // operator's child constraints.
+      val newIsNotNullConstraints = filter.constraints.filter(_.isInstanceOf[IsNotNull]) --
+        (child.constraints ++ splitConjunctivePredicates(condition))
+      val newCondition = if (newIsNotNullConstraints.nonEmpty) {
+        And(newIsNotNullConstraints.reduce(And), condition)
+      } else {
+        condition
+      }
+      Filter(newCondition, child)
+
+    case join @ Join(left: LogicalPlan, right: LogicalPlan, joinType: JoinType,
+        condition: Option[Expression]) =>
+      val leftIsNotNullConstraints = join.constraints
+        .filter(_.isInstanceOf[IsNotNull])
+        .filter(_.references.subsetOf(left.outputSet)) -- left.constraints
+      val rightIsNotNullConstraints =
+        join.constraints
+          .filter(_.isInstanceOf[IsNotNull])
+          .filter(_.references.subsetOf(right.outputSet)) -- right.constraints
+      val newLeftChild = if (leftIsNotNullConstraints.nonEmpty) {
+        Filter(leftIsNotNullConstraints.reduce(And), left)
+      } else {
+        left
+      }
+      val newRightChild = if (rightIsNotNullConstraints.nonEmpty) {
+        Filter(rightIsNotNullConstraints.reduce(And), right)
+      } else {
+        right
+      }
+      Join(newLeftChild, newRightChild, joinType, condition)
--- End diff --

same here, would be nice to reuse `join` if it is not changed

--- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-13495][SQL] Add Null Filters in the que...
Github user nongli commented on a diff in the pull request: https://github.com/apache/spark/pull/11372#discussion_r54990168

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
@@ -586,6 +587,52 @@ object NullPropagation extends Rule[LogicalPlan] {
 }
 /**
+ * Attempts to eliminate reading (unnecessary) NULL values if they are not required for correctness
+ * by inserting isNotNull filters is the query plan. These filters are currently inserted beneath
+ * existing Filters and Join operators and are inferred based on their data constraints.
+ *
+ * Note: While this optimization is applicable to all types of join, it primarily benefits Inner and
+ * LeftSemi joins.
+ */
+object NullFiltering extends Rule[LogicalPlan] with PredicateHelper {
+  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+    case filter @ Filter(condition, child: LogicalPlan) =>
+      // We generate a list of additional isNotNull filters from the operator's existing constraints
+      // but remove those that are either already part of the filter condition or are part of the
+      // operator's child constraints.
+      val newIsNotNullConstraints = filter.constraints.filter(_.isInstanceOf[IsNotNull]) --
+        (child.constraints ++ splitConjunctivePredicates(condition))
+      val newCondition = if (newIsNotNullConstraints.nonEmpty) {
+        And(newIsNotNullConstraints.reduce(And), condition)
+      } else {
+        condition
+      }
+      Filter(newCondition, child)
+
+    case join @ Join(left: LogicalPlan, right: LogicalPlan, joinType: JoinType,
+        condition: Option[Expression]) =>
+      val leftIsNotNullConstraints = join.constraints
--- End diff --

indenting

--- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-13495][SQL] Add Null Filters in the que...
Github user nongli commented on a diff in the pull request: https://github.com/apache/spark/pull/11372#discussion_r54989945

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
@@ -586,6 +587,52 @@ object NullPropagation extends Rule[LogicalPlan] {
 }
 /**
+ * Attempts to eliminate reading (unnecessary) NULL values if they are not required for correctness
+ * by inserting isNotNull filters is the query plan. These filters are currently inserted beneath
+ * existing Filters and Join operators and are inferred based on their data constraints.
+ *
+ * Note: While this optimization is applicable to all types of join, it primarily benefits Inner and
+ * LeftSemi joins.
+ */
+object NullFiltering extends Rule[LogicalPlan] with PredicateHelper {
+  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+    case filter @ Filter(condition, child: LogicalPlan) =>
+      // We generate a list of additional isNotNull filters from the operator's existing constraints
+      // but remove those that are either already part of the filter condition or are part of the
+      // operator's child constraints.
+      val newIsNotNullConstraints = filter.constraints.filter(_.isInstanceOf[IsNotNull]) --
+        (child.constraints ++ splitConjunctivePredicates(condition))
+      val newCondition = if (newIsNotNullConstraints.nonEmpty) {
--- End diff --

Remove newCondition and just return `filter` if this doesn't do anything, so we can reuse that filter subplan.

--- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
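The suggestion above (return the original `Filter` when no new constraints are found, so the existing subplan is reused) could look roughly like the sketch below. This is a sketch of the reviewer's proposal, not the actual patch; it assumes the `constraints` framework the patch relies on, and the object name is made up.

```scala
import org.apache.spark.sql.catalyst.expressions.{And, IsNotNull, PredicateHelper}
import org.apache.spark.sql.catalyst.plans.logical.{Filter, LogicalPlan}
import org.apache.spark.sql.catalyst.rules.Rule

// Sketch of the reviewer's suggestion: when no new IsNotNull constraints are
// found, return the original Filter node unchanged so its subplan is reused.
object NullFilteringFilterCaseSketch extends Rule[LogicalPlan] with PredicateHelper {
  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
    case filter @ Filter(condition, child: LogicalPlan) =>
      val newIsNotNullConstraints = filter.constraints.filter(_.isInstanceOf[IsNotNull]) --
        (child.constraints ++ splitConjunctivePredicates(condition))
      if (newIsNotNullConstraints.isEmpty) {
        filter // nothing changed: reuse the existing node
      } else {
        Filter(And(newIsNotNullConstraints.reduce(And), condition), child)
      }
  }
}
```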
[GitHub] spark pull request: [SPARK-13495][SQL] Add Null Filters in the que...
Github user nongli commented on a diff in the pull request: https://github.com/apache/spark/pull/11372#discussion_r54989882

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
@@ -586,6 +587,52 @@ object NullPropagation extends Rule[LogicalPlan] {
 }
 /**
+ * Attempts to eliminate reading (unnecessary) NULL values if they are not required for correctness
+ * by inserting isNotNull filters is the query plan. These filters are currently inserted beneath
--- End diff --

"in the query plan"

--- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-13255][SQL] Update vectorized reader to...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11435#issuecomment-192099892 **[Test build #52446 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52446/consoleFull)** for PR 11435 at commit [`f5f1e2b`](https://github.com/apache/spark/commit/f5f1e2be578ad40daafe25c6cc1b09bb4f8bb71a). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-12941][SQL][MASTER] Spark-SQL JDBC Orac...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/11489 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-12941][SQL][MASTER] Spark-SQL JDBC Orac...
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/11489#issuecomment-192096905 thanks. Merging to master. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-13652][Core]Copy ByteBuffer in sendRpcS...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11499#issuecomment-192096385 **[Test build #52445 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52445/consoleFull)** for PR 11499 at commit [`7199237`](https://github.com/apache/spark/commit/71992375d2d3ad6e1b2db2769e21facb6c7cfe8c). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-13652][Core]Copy ByteBuffer in sendRpcS...
Github user zsxwing commented on the pull request: https://github.com/apache/spark/pull/11499#issuecomment-192096096 retest this please --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-13631] [CORE] Thread-safe getLocationsW...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11505#issuecomment-192095362 **[Test build #52444 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52444/consoleFull)** for PR 11505 at commit [`4f78803`](https://github.com/apache/spark/commit/4f7880340f9c05e54b0758a308493b3d8dced83d). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-13603] [SQL] support SQL generation for...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11453#issuecomment-192094176 **[Test build #52443 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52443/consoleFull)** for PR 11453 at commit [`5fbc714`](https://github.com/apache/spark/commit/5fbc714e3273ff5aadd347b53cc3af2d693db153). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-13631] [CORE] Thread-safe getLocationsW...
Github user a1k0n commented on the pull request: https://github.com/apache/spark/pull/11505#issuecomment-192094056 rebasing to pick up flaky test fix --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-13647][SQL] also check if numeric value...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/11492 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-13637][SQL] use more information to sim...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11485#issuecomment-192092754 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52435/ Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-13637][SQL] use more information to sim...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11485#issuecomment-192092752 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-13637][SQL] use more information to sim...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11485#issuecomment-192092226 **[Test build #52435 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52435/consoleFull)** for PR 11485 at commit [`4f31c5c`](https://github.com/apache/spark/commit/4f31c5c8e1461a63a6e4ce9f74712b746ad098f4). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-13647][SQL] also check if numeric value...
Github user davies commented on the pull request: https://github.com/apache/spark/pull/11492#issuecomment-192092156 Merging this into master, thanks! --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-13647][SQL] also check if numeric value...
Github user davies commented on the pull request: https://github.com/apache/spark/pull/11492#issuecomment-192091999 LGTM --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-12941][SQL][MASTER] Spark-SQL JDBC Orac...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11489#issuecomment-192091588 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52434/ Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-12941][SQL][MASTER] Spark-SQL JDBC Orac...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11489#issuecomment-192091584 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-12941][SQL][MASTER] Spark-SQL JDBC Orac...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11489#issuecomment-192091121 **[Test build #52434 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52434/consoleFull)** for PR 11489 at commit [`3ba7dc5`](https://github.com/apache/spark/commit/3ba7dc52e1980eef320faea07cc12eef7863a621). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-12720] [SQL] SQL Generation Support for...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11283#issuecomment-192090338 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52433/ Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-12720] [SQL] SQL Generation Support for...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11283#issuecomment-192090337 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-12720] [SQL] SQL Generation Support for...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11283#issuecomment-192090048 **[Test build #52433 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52433/consoleFull)** for PR 11283 at commit [`6f609fb`](https://github.com/apache/spark/commit/6f609fb2d844e2aaf4c809ef8c0fcd9e6eca38bb). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-13449] Naive Bayes wrapper in SparkR
Github user yanboliang commented on the pull request: https://github.com/apache/spark/pull/11486#issuecomment-192089154 It's a good question! It's possible that the labels of the input dataset are not zero-based or not contiguous, so we should use ```StringIndexer``` to index the labels into [0, numLabels), and after training use ```IndexToString``` to map the indexed labels back to the original ones. We already store the label map in the metadata of the label column. All models under the ML package follow this rule. For example, training ```LogisticRegression``` with input labels ```"-1, +1"``` will produce erroneous results; you should first use ```StringIndexer``` to transform the labels to ```"0, 1"```. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
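For readers less familiar with that convention, a typical pipeline looks like the sketch below. The DataFrame `training` and its column names are placeholders; the point is only that `StringIndexer` maps arbitrary labels into [0, numLabels) before training and `IndexToString` maps predictions back to the original labels afterwards.

```scala
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.classification.NaiveBayes
import org.apache.spark.ml.feature.{IndexToString, StringIndexer}

// Index the (possibly non-zero-based, non-contiguous) labels into [0, numLabels).
val labelIndexer = new StringIndexer()
  .setInputCol("label")
  .setOutputCol("indexedLabel")
  .fit(training)

// Train on the indexed label column.
val nb = new NaiveBayes()
  .setLabelCol("indexedLabel")
  .setFeaturesCol("features")

// Map predicted indices back to the original label values.
val labelConverter = new IndexToString()
  .setInputCol("prediction")
  .setOutputCol("predictedLabel")
  .setLabels(labelIndexer.labels)

val model = new Pipeline()
  .setStages(Array(labelIndexer, nb, labelConverter))
  .fit(training)
```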
[GitHub] spark pull request: [SPARK-13174][SparkR] Add read.csv and write.c...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/11457#discussion_r54987993

--- Diff: R/pkg/inst/tests/testthat/test_context.R ---
@@ -26,7 +26,7 @@ test_that("Check masked functions", {
   maskedBySparkR <- masked[funcSparkROrEmpty]
   namesOfMasked <- c("describe", "cov", "filter", "lag", "na.omit", "predict", "sd", "var",
                      "colnames", "colnames<-", "intersect", "rank", "rbind", "sample", "subset",
-                     "summary", "transform", "drop")
+                     "summary", "transform", "drop", "read.csv", "write.csv")
--- End diff --

@felixcheung Thanks.

--- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org