[GitHub] spark issue #13634: [SPARK-15913][CORE] Dispatcher.stopped should be enclose...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/13634 Hi, @vanzin . Could you review this when you have some time? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #13634: [SPARK-15913][CORE] Dispatcher.stopped should be ...
GitHub user dongjoon-hyun opened a pull request: https://github.com/apache/spark/pull/13634 [SPARK-15913][CORE] Dispatcher.stopped should be enclosed by synchronized block. ## What changes were proposed in this pull request? `Dispatcher.stopped` is guarded by `this`, but it is used without synchronization in the `postMessage` function. This PR fixes that and also makes the exception message more accurate. ## How was this patch tested? Pass the existing Jenkins tests. You can merge this pull request into a Git repository by running: $ git pull https://github.com/dongjoon-hyun/spark SPARK-15913 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/13634.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #13634 commit 75a5254371374faf66f166e1b2683d3f9803cb8e Author: Dongjoon Hyun Date: 2016-06-13T05:53:47Z [SPARK-15913][CORE] Dispatcher.stopped should be enclosed by synchronized block.
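The race described in this PR can be sketched outside of Spark. The following Python sketch (all names hypothetical, not Spark's actual `Dispatcher`) shows why a stop flag that is written under a lock must also be read under that same lock:

```python
import threading

class Dispatcher:
    """Minimal sketch of a stop flag guarded by a lock (hypothetical names)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._stopped = False
        self._inbox = []

    def stop(self):
        with self._lock:
            self._stopped = True

    def post_message(self, msg):
        # Reading the flag under the same lock that guards writes avoids the
        # race where a message is accepted after stop() has completed.
        with self._lock:
            if self._stopped:
                raise RuntimeError("dispatcher already stopped")
            self._inbox.append(msg)
```

Without the lock around the read, a thread could observe a stale `False` and enqueue a message into a dispatcher that has already shut down.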
[GitHub] spark issue #13623: [SPARK-15895][SQL] Filters out metadata files while doin...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/13623 In this file, PartitioningAwareFileCatalog.scala, we have multiple places where we filter out files that start with an underscore. Should we also filter out dot-prefixed files in those places?
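A minimal sketch of the filter under discussion (a hypothetical helper, not the actual PartitioningAwareFileCatalog code), extended to skip both underscore- and dot-prefixed names:

```python
def is_data_file(name):
    """Sketch of the metadata filter discussed above: skip files whose names
    start with '_' (e.g. _SUCCESS, _metadata) or '.' (hidden/checksum files)."""
    return not name.startswith(("_", "."))

files = ["_SUCCESS", "_metadata", ".part-0.crc", "part-00000.parquet"]
data_files = [f for f in files if is_data_file(f)]
```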
[GitHub] spark pull request #13585: [SPARK-15859][SQL] Optimize the partition pruning...
Github user yangw1234 commented on a diff in the pull request: https://github.com/apache/spark/pull/13585#discussion_r66742485 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/planning/patterns.scala --- @@ -92,6 +92,36 @@ object PhysicalOperation extends PredicateHelper { .map(Alias(_, a.name)(a.exprId, a.qualifier, isGenerated = a.isGenerated)).getOrElse(a) } } + + /** + * Drop the non-partition key expression in the disjunctions, to optimize the partition pruning. + * For instances: (We assume part1 & part2 are the partition keys) + * (part1 == 1 and a > 3) or (part2 == 2 and a < 5) ==> (part1 == 1 or part1 == 2) + * (part1 == 1 and a > 3) or (a < 100) => None + * (a > 100 && b < 100) or (part1 = 10) => None + * (a > 100 && b < 100 and part1 = 10) or (part1 == 2) => (part1 = 10 or part1 == 2) + * @param predicate disjunctions + * @param partitionKeyIds partition keys in attribute set + * @return + */ + def partitionPrunningFromDisjunction( +predicate: Expression, partitionKeyIds: AttributeSet): Option[Expression] = { +// ignore the pure non-partition key expression in conjunction of the expression tree +val additionalPartPredicate = predicate transformUp { + case a @ And(left, right) if a.deterministic && +left.references.intersect(partitionKeyIds).isEmpty => right + case a @ And(left, right) if a.deterministic && +right.references.intersect(partitionKeyIds).isEmpty => left --- End diff -- The problem is here. Imagine a record `a = 2` in `partition = 1`. Such a record satisfies the above expression (`!(partition = 1 && a > 3)`), but if we simply drop `a > 3` and push `!(partition = 1)` down to the table scan, partition =1 will be discarded and the record won't appear in the result. 
The test case passed because the `BooleanSimplification` optimizer rule transforms `!(partition = 1 && a > 3)` into `(!(partition = 1) || (a <= 3))`; such an expression is dropped entirely by your `partitionPrunningFromDisjunction`, in which case `partition = 1` will not be discarded.
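The reviewer's counterexample can be checked directly. The sketch below (plain Python, hypothetical values) evaluates both the original predicate and the predicate obtained by naively dropping the non-partition conjunct under the negation:

```python
# Hypothetical record: a = 2, stored in partition = 1.
partition, a = 1, 2

# Original filter: the record satisfies !(partition = 1 && a > 3).
original = not (partition == 1 and a > 3)

# Naively dropping the non-partition conjunct (a > 3) inside the NOT
# yields !(partition = 1), which discards partition 1 at scan time.
pruned = not (partition == 1)

# original is True but pruned is False: the record would vanish from the result.
```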
[GitHub] spark issue #13633: [SPARK-15912] [SQL] Replace getPartitionsByFilter by get...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13633 **[Test build #60382 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60382/consoleFull)** for PR 13633 at commit [`77c8808`](https://github.com/apache/spark/commit/77c8808e5325b04c54d5f7d7c043ad34a8b09477).
[GitHub] spark pull request #13633: [SPARK-15912] [SQL] Replace getPartitionsByFilter...
GitHub user gatorsmile opened a pull request: https://github.com/apache/spark/pull/13633 [SPARK-15912] [SQL] Replace getPartitionsByFilter by getPartitions in inputFiles of MetastoreRelation What changes were proposed in this pull request? `inputFiles` always returns the files of all the partitions. Thus, the implementation of `inputFiles` in `MetastoreRelation` does not need to call `getPartitionsByFilter`. Instead, we should call `getPartitions`. No test case is available for the `inputFiles` API in `MetastoreRelation`. This PR also adds the missing test cases. How was this patch tested? See above. You can merge this pull request into a Git repository by running: $ git pull https://github.com/gatorsmile/spark testCase4InputFiles Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/13633.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #13633 commit 77c8808e5325b04c54d5f7d7c043ad34a8b09477 Author: gatorsmile Date: 2016-06-13T05:43:51Z fix.
[GitHub] spark issue #13631: [SPARK-15911][SQL] Remove the additional Project to be c...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/13631 @cloud-fan There is a test ("Detect table partitioning with correct partition order") in `InsertIntoHiveTableSuite` which is dedicated to testing `insertInto` with this column re-ordering. What do you think we should do about it? Remove it?
[GitHub] spark pull request #13585: [SPARK-15859][SQL] Optimize the partition pruning...
Github user chenghao-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/13585#discussion_r66742485 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveStrategies.scala --- @@ -65,15 +65,20 @@ private[hive] trait HiveStrategies { // hive table scan operator to be used for partition pruning. val partitionKeyIds = AttributeSet(relation.partitionKeys) val (pruningPredicates, otherPredicates) = predicates.partition { predicate => - !predicate.references.isEmpty && + predicate.references.nonEmpty && predicate.references.subsetOf(partitionKeyIds) } +val additionalPartPredicates = + PhysicalOperation.partitionPrunningFromDisjunction( +otherPredicates.foldLeft[Expression](Literal(true))(And(_, _)), partitionKeyIds) pruneFilterProject( projectList, otherPredicates, identity[Seq[Expression]], - HiveTableScanExec(_, relation, pruningPredicates)(sparkSession)) :: Nil +HiveTableScanExec(_, +relation, +pruningPredicates ++ additionalPartPredicates)(sparkSession)) :: Nil --- End diff -- Sorry, @clockfly, I am not sure what you mean; this PR is not designed to depend on the optimizer (CNF). Can you please give a more concrete example if there is a bug?
[GitHub] spark pull request #13394: [SPARK-15490][R][DOC] SparkR 2.0 QA: New R APIs a...
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/13394#discussion_r66741569 --- Diff: R/pkg/R/mllib.R --- @@ -218,9 +222,9 @@ setMethod("predict", signature(object = "GeneralizedLinearRegressionModel"), return(dataFrame(callJMethod(object@jobj, "transform", newData@sdf))) }) -#' Make predictions from a naive Bayes model +#' predict --- End diff -- Same here: keep longer title
[GitHub] spark pull request #13394: [SPARK-15490][R][DOC] SparkR 2.0 QA: New R APIs a...
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/13394#discussion_r66741577 --- Diff: R/pkg/R/mllib.R --- @@ -582,9 +586,9 @@ setMethod("summary", signature(object = "AFTSurvivalRegressionModel"), return(list(coefficients = coefficients)) }) -#' Make predictions from an AFT survival regression model +#' predict --- End diff -- ditto: keep long title
[GitHub] spark pull request #13394: [SPARK-15490][R][DOC] SparkR 2.0 QA: New R APIs a...
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/13394#discussion_r66741575 --- Diff: R/pkg/R/mllib.R --- @@ -357,9 +361,9 @@ setMethod("summary", signature(object = "KMeansModel"), cluster = cluster, is.loaded = is.loaded)) }) -#' Make predictions from a k-means model +#' predict --- End diff -- ditto: keep long title
[GitHub] spark pull request #13394: [SPARK-15490][R][DOC] SparkR 2.0 QA: New R APIs a...
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/13394#discussion_r66741568 --- Diff: R/pkg/R/mllib.R --- @@ -197,7 +201,7 @@ print.summary.GeneralizedLinearRegressionModel <- function(x, ...) { invisible(x) } -#' Make predictions from a generalized linear model +#' predict --- End diff -- No need for this change. We can keep the longer title.
[GitHub] spark pull request #13394: [SPARK-15490][R][DOC] SparkR 2.0 QA: New R APIs a...
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/13394#discussion_r66741560 --- Diff: R/pkg/R/column.R --- @@ -170,6 +172,8 @@ setMethod("between", signature(x = "Column"), } }) +#' cast +#' #' Casts the column to a different data type. --- End diff -- This can remain the title, right?
[GitHub] spark pull request #13394: [SPARK-15490][R][DOC] SparkR 2.0 QA: New R APIs a...
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/13394#discussion_r66741566 --- Diff: R/pkg/R/functions.R --- @@ -249,10 +249,7 @@ col <- function(x) { #' #' Returns a Column based on the given column name. #' -#' @rdname col -#' @name column #' @family normal_funcs -#' @export --- End diff -- This function is exported, right? It's ```col``` which is not exported.
[GitHub] spark issue #13413: [SPARK-15663][SQL] SparkSession.catalog.listFunctions sh...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13413 Merged build finished. Test PASSed.
[GitHub] spark issue #13413: [SPARK-15663][SQL] SparkSession.catalog.listFunctions sh...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13413 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60379/ Test PASSed.
[GitHub] spark issue #13413: [SPARK-15663][SQL] SparkSession.catalog.listFunctions sh...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13413 **[Test build #60379 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60379/consoleFull)** for PR 13413 at commit [`535d27c`](https://github.com/apache/spark/commit/535d27c45a1dd62ccb35616bf25e8363b625). * This patch passes all tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class CollectionAccumulator[T] extends AccumulatorV2[T, java.util.List[T]] ` * `class LibSVMFileFormat extends TextBasedFileFormat with DataSourceRegister ` * `public static class Prefix ` * `abstract class ForeachWriter[T] extends Serializable ` * `abstract class SparkStrategy extends GenericStrategy[SparkPlan] ` * `class CSVFileFormat extends TextBasedFileFormat with DataSourceRegister ` * `case class RefreshResource(path: String)` * `abstract class TextBasedFileFormat extends FileFormat ` * `class JsonFileFormat extends TextBasedFileFormat with DataSourceRegister ` * `class TextFileFormat extends TextBasedFileFormat with DataSourceRegister ` * `class ForeachSink[T : Encoder](writer: ForeachWriter[T]) extends Sink with Serializable `
[GitHub] spark pull request #13585: [SPARK-15859][SQL] Optimize the partition pruning...
Github user clockfly commented on a diff in the pull request: https://github.com/apache/spark/pull/13585#discussion_r66741254 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveStrategies.scala --- @@ -65,15 +65,20 @@ private[hive] trait HiveStrategies { // hive table scan operator to be used for partition pruning. val partitionKeyIds = AttributeSet(relation.partitionKeys) val (pruningPredicates, otherPredicates) = predicates.partition { predicate => - !predicate.references.isEmpty && + predicate.references.nonEmpty && predicate.references.subsetOf(partitionKeyIds) } +val additionalPartPredicates = + PhysicalOperation.partitionPrunningFromDisjunction( +otherPredicates.foldLeft[Expression](Literal(true))(And(_, _)), partitionKeyIds) pruneFilterProject( projectList, otherPredicates, identity[Seq[Expression]], - HiveTableScanExec(_, relation, pruningPredicates)(sparkSession)) :: Nil +HiveTableScanExec(_, +relation, +pruningPredicates ++ additionalPartPredicates)(sparkSession)) :: Nil --- End diff -- Sure, we understand that `additionalPartPredicates` is the partition filter. But we may not be able to guarantee that `BooleanSimplification` will push every NOT operator down to the leaf expressions, since `BooleanSimplification` is an "optimizer" rule, which can be skipped if optimization exceeds the maximum number of iterations.
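The rewrite the comment above relies on is De Morgan's law: `Not(And(l, r))` becomes `Or(Not(l), Not(r))` (and dually for `Or`). A toy expression ADT (hypothetical, not Catalyst's) makes the "push NOT to the leaves" step concrete:

```python
from dataclasses import dataclass

# Tiny expression tree (hypothetical, not Catalyst) to illustrate the
# De Morgan rewrite that BooleanSimplification performs.
@dataclass
class Leaf:
    name: str

@dataclass
class And_:
    left: object
    right: object

@dataclass
class Or_:
    left: object
    right: object

@dataclass
class Not_:
    child: object

def push_not_down(e):
    """Push every Not below And/Or: Not(And(l, r)) -> Or(Not(l), Not(r)),
    Not(Or(l, r)) -> And(Not(l), Not(r)), Not(Not(x)) -> x."""
    if isinstance(e, Not_):
        c = e.child
        if isinstance(c, And_):
            return Or_(push_not_down(Not_(c.left)), push_not_down(Not_(c.right)))
        if isinstance(c, Or_):
            return And_(push_not_down(Not_(c.left)), push_not_down(Not_(c.right)))
        if isinstance(c, Not_):
            return push_not_down(c.child)
        return e
    if isinstance(e, And_):
        return And_(push_not_down(e.left), push_not_down(e.right))
    if isinstance(e, Or_):
        return Or_(push_not_down(e.left), push_not_down(e.right))
    return e
```

The safety concern raised here is precisely that a conjunct-dropping rule is only sound on trees in this "NOT at the leaves" form, and an optimizer rule that produces that form is not guaranteed to have run.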
[GitHub] spark issue #13631: [SPARK-15911][SQL] Remove the additional Project to be c...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13631 Merged build finished. Test FAILed.
[GitHub] spark issue #13631: [SPARK-15911][SQL] Remove the additional Project to be c...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13631 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60380/ Test FAILed.
[GitHub] spark issue #13631: [SPARK-15911][SQL] Remove the additional Project to be c...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13631 **[Test build #60380 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60380/consoleFull)** for PR 13631 at commit [`5f4455a`](https://github.com/apache/spark/commit/5f4455ae3400302c4f3cb019419dbdada4edf5c9). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #13623: [SPARK-15895][SQL] Filters out metadata files while doin...
Github user clockfly commented on the issue: https://github.com/apache/spark/pull/13623 Looks good! +1
[GitHub] spark pull request #13604: [SPARK-15898][SQL] DataFrameReader.text should re...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/13604
[GitHub] spark issue #13604: [SPARK-15898][SQL] DataFrameReader.text should return Da...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/13604 Merging in master/2.0.
[GitHub] spark pull request #13629: [SPARK-15370][SQL] Fix count bug
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/13629
[GitHub] spark issue #13629: [SPARK-15370][SQL] Fix count bug
Github user rxin commented on the issue: https://github.com/apache/spark/pull/13629 Thanks - merging in master/2.0.
[GitHub] spark issue #13632: [SPARK-15910][SQL] Check schema consistency when using K...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13632 **[Test build #60381 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60381/consoleFull)** for PR 13632 at commit [`50bb48d`](https://github.com/apache/spark/commit/50bb48d6f2a59e2e88fe68699fceac308153e08a).
[GitHub] spark pull request #13632: [SPARK-15910][SQL] Check schema consistency when ...
GitHub user clockfly opened a pull request: https://github.com/apache/spark/pull/13632 [SPARK-15910][SQL] Check schema consistency when using Kryo encoder to convert DataFrame to Dataset ## What changes were proposed in this pull request? This PR enforces a schema check when converting a DataFrame to a Dataset using the Kryo encoder. For example: **Before the change:** The schema is NOT checked when converting a DataFrame to a Dataset using the Kryo encoder. ``` scala> case class B(b: Int) scala> implicit val encoder = Encoders.kryo[B] scala> val df = Seq((1)).toDF("b") scala> val ds = df.as[B] // Schema compatibility is NOT checked ``` **After the change:** An AnalysisException is reported since the schema is NOT compatible. ``` scala> val ds = Seq((1)).toDF("b").as[B] org.apache.spark.sql.AnalysisException: cannot resolve 'CAST(`b` AS BINARY)' due to data type mismatch: cannot cast IntegerType to BinaryType; ... ``` ## How was this patch tested? Unit test. You can merge this pull request into a Git repository by running: $ git pull https://github.com/clockfly/spark spark-15910 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/13632.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #13632 commit 50bb48d6f2a59e2e88fe68699fceac308153e08a Author: Sean Zhong Date: 2016-06-13T04:01:57Z SPARK-15910: Check schema
[GitHub] spark issue #13631: [SPARK-15911][SQL] Remove the additional Project to be c...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13631 **[Test build #60380 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60380/consoleFull)** for PR 13631 at commit [`5f4455a`](https://github.com/apache/spark/commit/5f4455ae3400302c4f3cb019419dbdada4edf5c9).
[GitHub] spark pull request #13631: [SPARK-15911][SQL] Remove the additional Project ...
GitHub user viirya opened a pull request: https://github.com/apache/spark/pull/13631 [SPARK-15911][SQL] Remove the additional Project to be consistent with SQL ## What changes were proposed in this pull request? Currently, in `DataFrameWriter`'s `insertInto` and in `ResolveRelations` of `Analyzer`, we add an additional Project to adjust column ordering. However, this resolution should use column ordering, not column names; this is how Hive handles dynamic partitions. ## How was this patch tested? Existing tests. You can merge this pull request into a Git repository by running: $ git pull https://github.com/viirya/spark-1 inserttable Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/13631.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #13631 commit 5f4455ae3400302c4f3cb019419dbdada4edf5c9 Author: Liang-Chi Hsieh Date: 2016-06-13T04:00:40Z Remove the additional Project to be consistent with SQL.
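The by-position vs by-name distinction this PR is about can be sketched with two tiny helper functions (hypothetical, not the actual Spark code). By-position semantics assign the i-th input value to the i-th table column regardless of names; by-name semantics insert an implicit reordering Project:

```python
# Hypothetical table schema: columns in declared order (c1, c2, part).
table_columns = ["c1", "c2", "part"]

def insert_by_position(table_cols, row):
    """By-position semantics (SQL INSERT INTO / Hive dynamic partitions):
    the i-th value goes to the i-th table column; names are ignored."""
    return dict(zip(table_cols, row))

def insert_by_name(table_cols, named_row):
    """By-name semantics: values are matched to columns by name,
    i.e. an implicit reordering Project is inserted."""
    return {c: named_row[c] for c in table_cols}

# An input whose column order differs from the table's declared order:
named_row = {"part": 10, "c1": 1, "c2": 2}
by_position = insert_by_position(table_columns, list(named_row.values()))
by_name = insert_by_name(table_columns, named_row)
```

Here `by_position` maps `c1 -> 10, c2 -> 1, part -> 2`, while `by_name` preserves the writer's intent; the PR argues the former is the behavior consistent with SQL and Hive.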
[GitHub] spark issue #13585: [SPARK-15859][SQL] Optimize the partition pruning within...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13585 Merged build finished. Test PASSed.
[GitHub] spark issue #13585: [SPARK-15859][SQL] Optimize the partition pruning within...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13585 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60376/
[GitHub] spark issue #13585: [SPARK-15859][SQL] Optimize the partition pruning within...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13585 **[Test build #60376 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60376/consoleFull)** for PR 13585 at commit [`79f7acb`](https://github.com/apache/spark/commit/79f7acbb660c2c398e21a36a7b92f316b7e5037f). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #13413: [SPARK-15663][SQL] SparkSession.catalog.listFunctions sh...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13413 **[Test build #60379 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60379/consoleFull)** for PR 13413 at commit [`535d27c`](https://github.com/apache/spark/commit/535d27c45a1dd62ccb35616bf25e8363b625).
[GitHub] spark issue #10706: [SPARK-12543] [SPARK-4226] [SQL] Subquery in expression
Github user hvanhovell commented on the issue: https://github.com/apache/spark/pull/10706 @kamalcoursera you could use a predicate scalar subquery here, i.e.:
```sql
select runon as runon,
       case when (select max(true) from sqltesttable b
                  where b.key = a.key and b.group = 'vowels')
            then 'vowels' else 'consonants' end as group,
       key as key,
       someint as someint
from sqltesttable a;
```
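The rewrite above works because `max(true)` over the correlated rows is non-null exactly when a matching row exists, which is what the original `IN` predicate tests. A minimal language-neutral sketch of that equivalence (plain Python standing in for the SQL; the table contents are invented for illustration):

```python
# Simulate "(select max(true) from b where b.key = a.key and b.group = 'vowels')":
# the scalar subquery yields True when a matching row exists, NULL (None) otherwise,
# so the CASE branch fires exactly when the IN/EXISTS test would.
table = [
    {"runon": "r1", "key": "a", "group": "vowels",     "someint": 1},
    {"runon": "r2", "key": "b", "group": "consonants", "someint": 2},
    {"runon": "r3", "key": "a", "group": "consonants", "someint": 3},
]

def classify(rows):
    out = []
    for a in rows:
        # scalar subquery: max(true) over matching rows, None if there are no rows
        matches = [True for b in rows if b["key"] == a["key"] and b["group"] == "vowels"]
        exists = max(matches) if matches else None
        out.append("vowels" if exists else "consonants")
    return out

print(classify(table))  # ['vowels', 'consonants', 'vowels']
```

Any key that ever appears with `group = 'vowels'` is classified as `'vowels'`, matching the intent of the original `IN`-subquery query.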
[GitHub] spark pull request #13630: [SPARK-15892][ML] Change spark to sqlContext in t...
Github user HyukjinKwon closed the pull request at: https://github.com/apache/spark/pull/13630
[GitHub] spark issue #13630: [SPARK-15892][ML] Change spark to sqlContext in the test...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/13630 It seems this should be reverted first. Closing this (see https://github.com/apache/spark/pull/13619).
[GitHub] spark issue #13619: [SPARK-15892][ML] Incorrectly merged AFTAggregator with ...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/13619 Yes, it seems it is still failing. If he is busy for now, I can change this PR to revert it; otherwise, I will close mine.
[GitHub] spark issue #13619: [SPARK-15892][ML] Incorrectly merged AFTAggregator with ...
Github user zzcclp commented on the issue: https://github.com/apache/spark/pull/13619 @HyukjinKwon OK, but this PR should be reverted first. @jkbradley could you revert this PR for branch-1.6?
[GitHub] spark issue #13558: [SPARK-15820][PySpark][SQL]Add Catalog.refreshTable into...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13558 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60378/
[GitHub] spark issue #13558: [SPARK-15820][PySpark][SQL]Add Catalog.refreshTable into...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13558 **[Test build #60378 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60378/consoleFull)** for PR 13558 at commit [`8d87c0f`](https://github.com/apache/spark/commit/8d87c0f2bd9140928915f835fd7d21b178422c69). * This patch **fails PySpark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #13558: [SPARK-15820][PySpark][SQL]Add Catalog.refreshTable into...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13558 Merged build finished. Test FAILed.
[GitHub] spark issue #13630: [SPARK-15892][ML] Change spark to sqlContext in the test...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13630 Merged build finished. Test FAILed.
[GitHub] spark issue #13630: [SPARK-15892][ML] Change spark to sqlContext in the test...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13630 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60377/
[GitHub] spark issue #13630: [SPARK-15892][ML] Change spark to sqlContext in the test...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13630 **[Test build #60377 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60377/consoleFull)** for PR 13630 at commit [`022`](https://github.com/apache/spark/commit/022d48d52127fb3cab804a78eed9ff253b76). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #13558: [SPARK-15820][PySpark][SQL]Add Catalog.refreshTable into...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13558 **[Test build #60378 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60378/consoleFull)** for PR 13558 at commit [`8d87c0f`](https://github.com/apache/spark/commit/8d87c0f2bd9140928915f835fd7d21b178422c69).
[GitHub] spark issue #13558: [SPARK-15820][PySpark][SQL]Add Catalog.refreshTable into...
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/13558 Jenkins, test this please.
[GitHub] spark issue #13619: [SPARK-15892][ML] Incorrectly merged AFTAggregator with ...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/13619 @zzcclp I made a PR against branch-1.6 here, https://github.com/apache/spark/pull/13630. Thank you for pointing this out quickly.
[GitHub] spark issue #13630: [SPARK-15892][ML] Change spark to sqlContext in the test...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13630 **[Test build #60377 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60377/consoleFull)** for PR 13630 at commit [`022`](https://github.com/apache/spark/commit/022d48d52127fb3cab804a78eed9ff253b76).
[GitHub] spark issue #13630: [SPARK-15892][ML] Change spark to sqlContext in the test...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/13630 cc @jkbradley and @zzcclp
[GitHub] spark pull request #13630: [SPARK-15892][ML] Change spark to sqlContext in t...
GitHub user HyukjinKwon opened a pull request: https://github.com/apache/spark/pull/13630 [SPARK-15892][ML] Change spark to sqlContext in the test in AFTSurvivalRegressionSuite for 1.6 ## What changes were proposed in this pull request? https://github.com/apache/spark/pull/13619 was merged into master, branch-2.0, and branch-1.6 as well, but the unit test uses `spark`. So, this PR changes `spark` to `sqlContext` in the unit tests. It seems builds are failing because of this. ## How was this patch tested? Jenkins tests. You can merge this pull request into a Git repository by running: $ git pull https://github.com/HyukjinKwon/spark SPARK-15892-1.6 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/13630.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #13630 commit 022d48d52127fb3cab804a78eed9ff253b76 Author: hyukjinkwon Date: 2016-06-13T02:04:39Z Change spark to sqlContext for 1.6
[GitHub] spark issue #10706: [SPARK-12543] [SPARK-4226] [SQL] Subquery in expression
Github user kamalcoursera commented on the issue: https://github.com/apache/spark/pull/10706 Thank you! Are there any alternative options to use instead of a predicate subquery? Is an amendment planned for 2.0?
```sql
select runon as runon,
       case when key in (select key from sqltesttable where group = 'vowels')
            then 'vowels' else 'consonants' end as group,
       key as key,
       someint as someint
from sqltesttable;
```
[GitHub] spark pull request #13585: [SPARK-15859][SQL] Optimize the partition pruning...
Github user chenghao-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/13585#discussion_r66733226

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveStrategies.scala ---
@@ -65,15 +65,20 @@ private[hive] trait HiveStrategies {
         // hive table scan operator to be used for partition pruning.
         val partitionKeyIds = AttributeSet(relation.partitionKeys)
         val (pruningPredicates, otherPredicates) = predicates.partition { predicate =>
-          !predicate.references.isEmpty &&
+          predicate.references.nonEmpty &&
             predicate.references.subsetOf(partitionKeyIds)
         }
+        val additionalPartPredicates =
+          PhysicalOperation.partitionPrunningFromDisjunction(
+            otherPredicates.foldLeft[Expression](Literal(true))(And(_, _)), partitionKeyIds)
         pruneFilterProject(
           projectList,
           otherPredicates,
           identity[Seq[Expression]],
-          HiveTableScanExec(_, relation, pruningPredicates)(sparkSession)) :: Nil
+          HiveTableScanExec(_,
+            relation,
+            pruningPredicates ++ additionalPartPredicates)(sparkSession)) :: Nil

--- End diff -- @yangw1234 @liancheng @clockfly `pruningPredicates ++ additionalPartPredicates` is the partition filter, and the original filter still needs to be applied after the partitions are pruned.
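The idea behind deriving extra partition predicates from a disjunction (what the `partitionPrunningFromDisjunction` call above does) is that for a filter like `(pt = 1 AND a > 0) OR (pt = 2 AND b < 5)`, the partition-only parts of each branch can be OR-ed into a safe partition filter, `(pt = 1) OR (pt = 2)`; if any branch has no partition-only conjunct, no pruning is possible. A language-neutral sketch of that extraction (Python with a toy expression tree; the names and tree encoding are illustrative, not Spark's API):

```python
# Toy predicate tree: ("and", l, r), ("or", l, r), or a leaf ("pred", cols, text).
# Each disjunct must contribute at least one partition-only conjunct,
# otherwise the disjunction cannot prune partitions safely.
PART_KEYS = {"pt"}  # illustrative partition column

def partition_only(leaf):
    return set(leaf[1]) <= PART_KEYS

def conjuncts(expr):
    # Flatten nested ANDs into a list of conjuncts.
    if expr[0] == "and":
        return conjuncts(expr[1]) + conjuncts(expr[2])
    return [expr]

def prune_from_disjunction(expr):
    """Per-disjunct partition predicates (to be OR-ed), or None if unsafe."""
    if expr[0] == "or":
        left = prune_from_disjunction(expr[1])
        right = prune_from_disjunction(expr[2])
        if left is None or right is None:
            return None          # one branch has no partition predicate
        return left + right      # OR of the per-branch partition filters
    parts = [c for c in conjuncts(expr) if c[0] == "pred" and partition_only(c)]
    return parts or None

# (pt = 1 AND a > 0) OR (pt = 2 AND b < 5)  ->  (pt = 1) OR (pt = 2)
e = ("or",
     ("and", ("pred", ["pt"], "pt=1"), ("pred", ["a"], "a>0")),
     ("and", ("pred", ["pt"], "pt=2"), ("pred", ["b"], "b<5")))
print(prune_from_disjunction(e))
```

Note the original non-partition filter must still run after pruning, exactly as the review comment says: the derived predicate only narrows which partitions are scanned.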
[GitHub] spark issue #13585: [SPARK-15859][SQL] Optimize the partition pruning within...
Github user chenghao-intel commented on the issue: https://github.com/apache/spark/pull/13585 Updated with a more meaningful function name and added more unit tests.
[GitHub] spark issue #13585: [SPARK-15859][SQL] Optimize the partition pruning within...
Github user chenghao-intel commented on the issue: https://github.com/apache/spark/pull/13585 cc @liancheng
[GitHub] spark pull request #12836: [SPARK-12922][SparkR][WIP] Implement gapply() on ...
Github user sun-rui commented on a diff in the pull request: https://github.com/apache/spark/pull/12836#discussion_r66733080

--- Diff: core/src/main/scala/org/apache/spark/api/r/RRunner.scala ---
@@ -40,7 +40,8 @@ private[spark] class RRunner[U](
     broadcastVars: Array[Broadcast[Object]],
     numPartitions: Int = -1,
     isDataFrame: Boolean = false,
-    colNames: Array[String] = null)
+    colNames: Array[String] = null,
+    mode: Int = 0)

--- End diff -- It is better to define an enumeration for mode instead of hard-coding, for example:

    private[sql] object RRunnerModes {
      val RDD = 0
      val DATAFRAME_DAPPLY = 1
      val DATAFRAME_GAPPLY = 2
    }
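The suggestion above, named constants instead of magic mode integers, keeps call sites self-documenting while staying wire-compatible with the existing `mode: Int` parameter. A minimal language-neutral sketch (Python's `IntEnum` standing in for the proposed Scala object; the names mirror the suggestion but are otherwise illustrative):

```python
# Named constants for the runner mode, interchangeable with the raw ints
# already passed over the wire (IntEnum compares equal to its int value).
from enum import IntEnum

class RRunnerMode(IntEnum):
    RDD = 0
    DATAFRAME_DAPPLY = 1
    DATAFRAME_GAPPLY = 2

def describe(mode: int) -> str:
    # Call sites compare against named constants instead of literals 0/1/2.
    return RRunnerMode(mode).name

print(describe(2))  # DATAFRAME_GAPPLY
```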
[GitHub] spark pull request #13619: [SPARK-15892][ML] Incorrectly merged AFTAggregato...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/13619#discussion_r66732958

--- Diff: mllib/src/test/scala/org/apache/spark/ml/regression/AFTSurvivalRegressionSuite.scala ---
@@ -390,6 +390,18 @@ class AFTSurvivalRegressionSuite
     testEstimatorAndModelReadWrite(aft, datasetMultivariate,
       AFTSurvivalRegressionSuite.allParamSettings, checkModelData)
   }
+
+  test("SPARK-15892: Incorrectly merged AFTAggregator with zero total count") {
+    // This `dataset` will contain an empty partition because it has two rows but
+    // the parallelism is bigger than that. Because the issue was about `AFTAggregator`s
+    // being merged incorrectly when it has an empty partition, running the codes below
+    // should not throw an exception.
+    val dataset = spark.createDataFrame(

--- End diff -- Oh, it seems this is merged into branch-1.6 too. Yes, it should be `sqlContext` for branch-1.6.
[GitHub] spark issue #13585: [SPARK-15859][SQL] Optimize the partition pruning within...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13585 **[Test build #60376 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60376/consoleFull)** for PR 13585 at commit [`79f7acb`](https://github.com/apache/spark/commit/79f7acbb660c2c398e21a36a7b92f316b7e5037f).
[GitHub] spark pull request #13619: [SPARK-15892][ML] Incorrectly merged AFTAggregato...
Github user zzcclp commented on a diff in the pull request: https://github.com/apache/spark/pull/13619#discussion_r66732911

--- Diff: mllib/src/test/scala/org/apache/spark/ml/regression/AFTSurvivalRegressionSuite.scala ---
@@ -390,6 +390,18 @@ class AFTSurvivalRegressionSuite
     testEstimatorAndModelReadWrite(aft, datasetMultivariate,
       AFTSurvivalRegressionSuite.allParamSettings, checkModelData)
   }
+
+  test("SPARK-15892: Incorrectly merged AFTAggregator with zero total count") {
+    // This `dataset` will contain an empty partition because it has two rows but
+    // the parallelism is bigger than that. Because the issue was about `AFTAggregator`s
+    // being merged incorrectly when it has an empty partition, running the codes below
+    // should not throw an exception.
+    val dataset = spark.createDataFrame(

--- End diff -- With branch-2.0 it is OK; I think this PR should not be merged into branch-1.6 directly.
[GitHub] spark pull request #13619: [SPARK-15892][ML] Incorrectly merged AFTAggregato...
Github user zzcclp commented on a diff in the pull request: https://github.com/apache/spark/pull/13619#discussion_r66732825

--- Diff: mllib/src/test/scala/org/apache/spark/ml/regression/AFTSurvivalRegressionSuite.scala ---
@@ -390,6 +390,18 @@ class AFTSurvivalRegressionSuite
     testEstimatorAndModelReadWrite(aft, datasetMultivariate,
       AFTSurvivalRegressionSuite.allParamSettings, checkModelData)
   }
+
+  test("SPARK-15892: Incorrectly merged AFTAggregator with zero total count") {
+    // This `dataset` will contain an empty partition because it has two rows but
+    // the parallelism is bigger than that. Because the issue was about `AFTAggregator`s
+    // being merged incorrectly when it has an empty partition, running the codes below
+    // should not throw an exception.
+    val dataset = spark.createDataFrame(

--- End diff -- I compiled it on branch-1.6.
[GitHub] spark issue #13629: [SPARK-15370][SQL] Fix count bug
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13629 Merged build finished. Test PASSed.
[GitHub] spark issue #13629: [SPARK-15370][SQL] Fix count bug
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13629 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60375/
[GitHub] spark pull request #6983: [SPARK-6785][SQL] fix DateTimeUtils for dates befo...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/6983#discussion_r66732804

--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/DateTimeUtilsSuite.scala ---
@@ -48,4 +49,41 @@ class DateTimeUtilsSuite extends SparkFunSuite {
     val t2 = DateTimeUtils.toJavaTimestamp(DateTimeUtils.fromJulianDay(d1, ns1))
     assert(t.equals(t2))
   }
+
+  test("SPARK-6785: java date conversion before and after epoch") {
+    def checkFromToJavaDate(d1: Date): Unit = {
+      val d2 = DateTimeUtils.toJavaDate(DateTimeUtils.fromJavaDate(d1))
+      assert(d2.toString === d1.toString)

--- End diff -- Shouldn't it also be the case that `d1 === d2`?
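The round-trip property being tested, converting a date to a days-since-epoch integer and back, is where pre-1970 dates break naive implementations, because the day count goes negative. A small sketch of the same property (Python's `datetime.date` standing in for `java.sql.Date` / `DateTimeUtils`; the function names are illustrative):

```python
from datetime import date, timedelta

EPOCH = date(1970, 1, 1)

def from_java_date(d: date) -> int:
    # Days since the epoch; negative for dates before 1970 -- the SPARK-6785 case.
    return (d - EPOCH).days

def to_java_date(days: int) -> date:
    return EPOCH + timedelta(days=days)

# Round-trip dates both before and after the epoch.
for d1 in (date(1600, 2, 29), date(1969, 12, 31), date(1970, 1, 1), date(2016, 6, 13)):
    d2 = to_java_date(from_java_date(d1))
    assert d2 == d1  # the stronger d1 == d2 check the review comment asks about

print(from_java_date(date(1969, 12, 31)))  # -1
```

Comparing the values directly (not just their string forms) is the stronger check suggested in the comment, since two unequal values can share a string representation.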
[GitHub] spark pull request #13619: [SPARK-15892][ML] Incorrectly merged AFTAggregato...
Github user zzcclp commented on a diff in the pull request: https://github.com/apache/spark/pull/13619#discussion_r66732808

--- Diff: mllib/src/test/scala/org/apache/spark/ml/regression/AFTSurvivalRegressionSuite.scala ---
@@ -390,6 +390,18 @@ class AFTSurvivalRegressionSuite
     testEstimatorAndModelReadWrite(aft, datasetMultivariate,
       AFTSurvivalRegressionSuite.allParamSettings, checkModelData)
   }
+
+  test("SPARK-15892: Incorrectly merged AFTAggregator with zero total count") {
+    // This `dataset` will contain an empty partition because it has two rows but
+    // the parallelism is bigger than that. Because the issue was about `AFTAggregator`s
+    // being merged incorrectly when it has an empty partition, running the codes below
+    // should not throw an exception.
+    val dataset = spark.createDataFrame(

--- End diff -- @HyukjinKwon value `spark` is not found here; it should be `sqlContext`, right?
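The bug the quoted test guards against, merging a per-partition aggregator that saw zero rows, can be sketched generically (Python; this is not Spark's actual `AFTAggregator` code, just an illustration of why an empty-partition partner must be skipped in a treeAggregate-style merge):

```python
# Sketch of the SPARK-15892 pattern: two rows spread over three partitions
# leave one partition empty, and its zero-count aggregator must not corrupt
# the merge (e.g. via division by a zero count). Illustrative, not Spark code.
class Agg:
    def __init__(self):
        self.count = 0
        self.total = 0.0

    def add(self, x: float) -> "Agg":
        self.count += 1
        self.total += x
        return self

    def merge(self, other: "Agg") -> "Agg":
        if other.count == 0:   # skip empty partitions -- the essence of the fix
            return self
        self.count += other.count
        self.total += other.total
        return self

    def mean(self) -> float:
        return self.total / self.count

# two rows over three partitions -> the third partition's aggregator is empty
parts = [Agg().add(1.0), Agg().add(3.0), Agg()]
result = parts[0].merge(parts[1]).merge(parts[2])
print(result.mean())  # 2.0
```

Without the `other.count == 0` guard, a merge that recomputes weighted statistics from the partner's state would divide by zero or skew the running mean; the quoted regression test forces exactly this empty-partition merge path.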
[GitHub] spark issue #13629: [SPARK-15370][SQL] Fix count bug
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13629 **[Test build #60375 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60375/consoleFull)** for PR 13629 at commit [`30dd0bd`](https://github.com/apache/spark/commit/30dd0bd7d560151085e53667fcc4f6a8895844ed). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #13571: [SPARK-15369][WIP][RFC][PySpark][SQL] Expose potential t...
Github user holdenk commented on the issue: https://github.com/apache/spark/pull/13571 So this is a WIP of what this could look like, but I'd really like your thoughts on the draft @davies - do you think this is heading in the right direction given the performance #s from the benchmark?
[GitHub] spark pull request #12836: [SPARK-12922][SparkR][WIP] Implement gapply() on ...
Github user NarineK commented on a diff in the pull request: https://github.com/apache/spark/pull/12836#discussion_r66732763 --- Diff: R/pkg/R/group.R --- @@ -142,3 +142,58 @@ createMethods <- function() { } createMethods() + +#' gapply +#' +#' Applies a R function to each group in the input GroupedData +#' +#' @param x a GroupedData +#' @param func A function to be applied to each group partition specified by GroupedData. +#' The function `func` takes as argument a key - grouping columns and +#' a data frame - a local R data.frame. +#' The output of `func` is a local R data.frame. +#' @param schema The schema of the resulting SparkDataFrame after the function is applied. +#' It must match the output of func. +#' @return a SparkDataFrame +#' @rdname gapply +#' @name gapply +#' @examples +#' \dontrun{ +#' Computes the arithmetic mean of the second column by grouping +#' on the first and third columns. Output the grouping values and the average. +#' +#' df <- createDataFrame ( +#' list(list(1L, 1, "1", 0.1), list(1L, 2, "1", 0.2), list(3L, 3, "3", 0.3)), +#' c("a", "b", "c", "d")) +#' +#' schema <- structType(structField("a", "integer"), structField("c", "string"), +#' structField("avg", "double")) +#' df1 <- gapply( +#' df, +#' list("a", "c"), +#' function(key, x) { +#' y <- data.frame(key, mean(x$b), stringsAsFactors = FALSE) +#' }, +#' schema) +#' collect(df1) +#' +#' Result +#' -- +#' a c avg +#' 3 3 3.0 +#' 1 1 1.5 +#' } +setMethod("gapply", + signature(x = "GroupedData"), + function(x, func, schema) { +packageNamesArr <- serialize(.sparkREnv[[".packages"]], + connection = NULL) +broadcastArr <- lapply(ls(.broadcastNames), + function(name) { get(name, .broadcastNames) }) +sdf <- callJMethod(x@sgd, "flatMapGroupsInR", + serialize(cleanClosure(func), connection = NULL), + packageNamesArr, + broadcastArr, + if (is.null(schema)) { schema } else { schema$jobj }) --- End diff -- Thnx, I set an assertion. 
We cannot force the schema exactly the way dapply does, because gapply on GroupedData is slightly different from gapply on a DataFrame.
[GitHub] spark issue #13604: [SPARK-15898][SQL] DataFrameReader.text should return Da...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13604 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60374/ Test PASSed.
[GitHub] spark issue #13604: [SPARK-15898][SQL] DataFrameReader.text should return Da...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13604 Merged build finished. Test PASSed.
[GitHub] spark issue #13604: [SPARK-15898][SQL] DataFrameReader.text should return Da...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13604 **[Test build #60374 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60374/consoleFull)** for PR 13604 at commit [`50538b7`](https://github.com/apache/spark/commit/50538b7c0952f6954a19402423e12349037b130c). * This patch passes all tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class CollectionAccumulator[T] extends AccumulatorV2[T, java.util.List[T]] ` * `class LibSVMFileFormat extends TextBasedFileFormat with DataSourceRegister ` * `public static class Prefix ` * `abstract class SparkStrategy extends GenericStrategy[SparkPlan] ` * `class CSVFileFormat extends TextBasedFileFormat with DataSourceRegister ` * `case class RefreshResource(path: String)` * `abstract class TextBasedFileFormat extends FileFormat ` * `class JsonFileFormat extends TextBasedFileFormat with DataSourceRegister ` * `class TextFileFormat extends TextBasedFileFormat with DataSourceRegister `
[GitHub] spark issue #13623: [SPARK-15895][SQL] Filters out metadata files while doin...
Github user ssimeonov commented on the issue: https://github.com/apache/spark/pull/13623 👍
[GitHub] spark issue #13628: [SPARK-15907] [SQL] Issue Exceptions when Not Enough Inp...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13628 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60373/ Test PASSed.
[GitHub] spark issue #13628: [SPARK-15907] [SQL] Issue Exceptions when Not Enough Inp...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13628 Merged build finished. Test PASSed.
[GitHub] spark issue #13628: [SPARK-15907] [SQL] Issue Exceptions when Not Enough Inp...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13628 **[Test build #60373 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60373/consoleFull)** for PR 13628 at commit [`4c49112`](https://github.com/apache/spark/commit/4c4911226136bd797ed17955e795615f9c145de8). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #13496: [SPARK-15753][SQL] Move Analyzer stuff to Analyze...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/13496#discussion_r66731642 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -452,6 +452,17 @@ class Analyzer( def apply(plan: LogicalPlan): LogicalPlan = plan resolveOperators { case i @ InsertIntoTable(u: UnresolvedRelation, parts, child, _, _) if child.resolved => +// A partitioned relation's schema can be different from the input logicalPlan, since +// partition columns are all moved after data columns. We Project to adjust the ordering. +val input = if (parts.nonEmpty) { + val (inputPartCols, inputDataCols) = child.output.partition { attr => +parts.contains(attr.name) + } + Project(inputDataCols ++ inputPartCols, child) +} else { + child +} --- End diff -- @cloud-fan ok. I will do it.
[GitHub] spark pull request #13542: [SPARK-15730][SQL][WIP] Respect the --hiveconf in...
Github user chenghao-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/13542#discussion_r66731077 --- Diff: sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/CliSuite.scala --- @@ -91,6 +91,8 @@ class CliSuite extends SparkFunSuite with BeforeAndAfterAll with Logging { | --hiveconf ${ConfVars.METASTORECONNECTURLKEY}=$jdbcUrl | --hiveconf ${ConfVars.METASTOREWAREHOUSE}=$warehousePath | --hiveconf ${ConfVars.SCRATCHDIR}=$scratchDirPath + | --hiveconf conf1=conftest + | --hiveconf conf2=1 --- End diff -- Yes, it works; that's the intention, right? But it seems the code below in `SparkSQLCliDriver` will not work as we expected. ```scala if (key != "javax.jdo.option.ConnectionURL") { conf.set(key, value) sessionState.getOverriddenConfigurations.put(key, value) } ``` Why do we have to ignore the connection URL?
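The override behavior being questioned above can be sketched in plain Python. This is a hypothetical stand-in for the quoted `SparkSQLCliDriver` logic, not Spark's code; the `conf` dict and helper name are my own:

```python
# Hypothetical sketch of applying --hiveconf overrides while skipping a
# reserved key, mirroring the shape of the quoted SparkSQLCliDriver logic.
RESERVED = {"javax.jdo.option.ConnectionURL"}

def apply_overrides(conf, overrides):
    """Apply each override unless its key is reserved; return what was applied."""
    applied = {}
    for key, value in overrides.items():
        if key not in RESERVED:
            conf[key] = value
            applied[key] = value
    return applied

conf = {}
applied = apply_overrides(conf, {
    "conf1": "conftest",
    "javax.jdo.option.ConnectionURL": "jdbc:example",  # silently dropped
})
print(sorted(conf))  # ['conf1']
```

The question in the thread is precisely whether silently dropping the connection URL key, as this pattern does, is the intended behavior.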
[GitHub] spark issue #13629: [SPARK-15370][SQL] Fix count bug
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13629 **[Test build #60375 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60375/consoleFull)** for PR 13629 at commit [`30dd0bd`](https://github.com/apache/spark/commit/30dd0bd7d560151085e53667fcc4f6a8895844ed).
[GitHub] spark pull request #13629: [SPARK-15370][SQL] Fix count bug
GitHub user hvanhovell opened a pull request: https://github.com/apache/spark/pull/13629 [SPARK-15370][SQL] Fix count bug # What changes were proposed in this pull request? This pull request fixes the COUNT bug in the `RewriteCorrelatedScalarSubquery` rule. After this change, the rule tests the expression at the root of the correlated subquery to determine whether the expression returns `NULL` on empty input. If the expression does not return `NULL`, the rule generates additional logic in the `Project` operator above the rewritten subquery. This additional logic intercepts `NULL` values coming from the outer join and replaces them with the value that the subquery's expression would return on empty input. This PR takes over https://github.com/apache/spark/pull/13155, and it only fixes an issue with `Literal` construction and some style. All credit should go to @frreiss. # How was this patch tested? Added regression tests to cover all branches of the updated rule (see changes to `SubquerySuite`). Ran all existing automated regression tests after merging with latest trunk. You can merge this pull request into a Git repository by running: $ git pull https://github.com/hvanhovell/spark SPARK-15370-cleanup Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/13629.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #13629 commit 3b1649105869c72ccb16f86732e04829aaae0e93 Author: frreiss Date: 2016-05-16T17:58:00Z Commit before merge.
commit 58df60d5468e53c4b6fc41a1d7c896abfb01cdd1 Author: frreiss Date: 2016-05-16T17:58:21Z Merge branch 'master' of https://github.com/apache/spark commit 910cbf54e2300a57640e017610c204da2d462964 Author: frreiss Date: 2016-05-16T20:46:55Z Merge branch 'master' of https://github.com/apache/spark commit 76d9f4528b8536d1e5680279ab76b9e26dd3a873 Author: frreiss Date: 2016-05-17T14:52:46Z Merge branch 'master' of https://github.com/apache/spark commit 1615d560310a59b08a4c03677dd53eb3b9b49e06 Author: frreiss Date: 2016-05-20T02:01:33Z Second version of the updated rewrite commit 1b4ba5ed629d9b1e72d919d89b3592f7b29f3f3c Author: frreiss Date: 2016-05-20T14:57:24Z Merge branch 'master' of https://github.com/apache/spark commit fb7cb4304ba02815a79278d1d5d6d194fe8db25c Author: frreiss Date: 2016-05-24T18:11:54Z Merge branch 'master' of https://github.com/apache/spark commit 8cd2877179dded4557c8da92e5b16011637289b0 Author: frreiss Date: 2016-06-10T05:02:47Z Addressing additional corner cases and review comments. commit e5c592032b5604a8f8f10326ecd10ade22b5dc43 Author: Herman van Hovell Date: 2016-06-12T23:43:30Z Style fixes commit 39f7e043c0abbe27823499699877e986f6fa2eb7 Author: Herman van Hovell Date: 2016-06-12T23:43:32Z Merge remote-tracking branch 'apache-github/master' into SPARK-15370-cleanup commit 30dd0bd7d560151085e53667fcc4f6a8895844ed Author: Herman van Hovell Date: 2016-06-12T23:57:18Z Some simplification
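The COUNT bug this PR fixes can be demonstrated outside Spark. The sketch below uses SQLite via Python's `sqlite3` module with made-up tables `t1`/`t2` (not from the PR): a correlated scalar `COUNT(*)` subquery returns 0 for rows with no match, but a naive rewrite into an aggregate plus outer join produces `NULL` for those rows, which is exactly what the extra `Project` logic described above corrects:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE t1(k INTEGER);
    CREATE TABLE t2(k INTEGER);
    INSERT INTO t1 VALUES (1), (2);
    INSERT INTO t2 VALUES (1);
""")

# Correct semantics: COUNT over empty input is 0, never NULL.
correct = con.execute(
    "SELECT k, (SELECT COUNT(*) FROM t2 WHERE t2.k = t1.k) FROM t1 ORDER BY k"
).fetchall()

# Naive rewrite into aggregate + outer join: the unmatched row of t1
# comes back as NULL instead of 0. This is the COUNT bug.
naive = con.execute(
    "SELECT t1.k, cnt FROM t1 LEFT JOIN "
    "(SELECT k, COUNT(*) AS cnt FROM t2 GROUP BY k) s ON t1.k = s.k "
    "ORDER BY t1.k"
).fetchall()

# The fix described in the PR, expressed in SQL: replace NULLs from the
# outer join with the expression's value on empty input (0 for COUNT).
fixed = con.execute(
    "SELECT t1.k, COALESCE(cnt, 0) FROM t1 LEFT JOIN "
    "(SELECT k, COUNT(*) AS cnt FROM t2 GROUP BY k) s ON t1.k = s.k "
    "ORDER BY t1.k"
).fetchall()

print(correct)  # [(1, 1), (2, 0)]
print(naive)    # [(1, 1), (2, None)]
print(fixed)    # [(1, 1), (2, 0)]
```

In Spark's rule the `COALESCE`-like guard is generated only when the subquery's root expression is non-NULL on empty input; expressions that already return `NULL` need no extra logic.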
[GitHub] spark issue #12313: [SPARK-14543] [SQL] Improve InsertIntoTable column resol...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/12313 If this is too large for merging to 2.0, could @rdblue deliver a small fix for capturing the illegal user inputs? Thanks!
[GitHub] spark issue #13628: [SPARK-15907] [SQL] Issue Exceptions when Not Enough Inp...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/13628 There is a better fix in https://github.com/apache/spark/pull/12313. Let me close it.
[GitHub] spark pull request #13628: [SPARK-15907] [SQL] Issue Exceptions when Not Eno...
Github user gatorsmile closed the pull request at: https://github.com/apache/spark/pull/13628
[GitHub] spark pull request #13496: [SPARK-15753][SQL] Move Analyzer stuff to Analyze...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/13496#discussion_r66730126 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -452,6 +452,17 @@ class Analyzer( def apply(plan: LogicalPlan): LogicalPlan = plan resolveOperators { case i @ InsertIntoTable(u: UnresolvedRelation, parts, child, _, _) if child.resolved => +// A partitioned relation's schema can be different from the input logicalPlan, since +// partition columns are all moved after data columns. We Project to adjust the ordering. +val input = if (parts.nonEmpty) { + val (inputPartCols, inputDataCols) = child.output.partition { attr => +parts.contains(attr.name) + } + Project(inputDataCols ++ inputPartCols, child) +} else { + child +} --- End diff -- @gatorsmile good catch! The reason we have `insertInto` is to have a SQL INSERT INTO version in `DataFrameWriter`. We should use `saveAsTable` if we need by-name resolution. I have reverted this PR. @viirya, do you mind opening a new PR to also remove this logic in `insertInto` to make it consistent with the SQL version? Thanks!
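The column reordering in the quoted Scala diff can be sketched in plain Python. The function name and inputs below are my own illustration of the same logic, not Spark's API:

```python
# Sketch of the quoted Analyzer logic: in a partitioned relation's schema,
# partition columns are moved after the data columns, so the input plan's
# columns are split and re-concatenated (data columns first) to match.
def reorder_for_insert(columns, partition_cols):
    part = [c for c in columns if c in partition_cols]
    data = [c for c in columns if c not in partition_cols]
    return data + part

print(reorder_for_insert(["p1", "c1", "p2"], {"p1", "p2"}))
# ['c1', 'p1', 'p2']
```

This mirrors Scala's `Seq.partition`, which splits a sequence into the elements satisfying a predicate and the rest while preserving relative order within each group.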
[GitHub] spark issue #13626: [SPARK-15370][SQL] Revert PR "Update RewriteCorrelatedSu...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13626 Merged build finished. Test PASSed.
[GitHub] spark issue #13626: [SPARK-15370][SQL] Revert PR "Update RewriteCorrelatedSu...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13626 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60372/ Test PASSed.
[GitHub] spark issue #13626: [SPARK-15370][SQL] Revert PR "Update RewriteCorrelatedSu...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13626 **[Test build #60372 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60372/consoleFull)** for PR 13626 at commit [`ebef12a`](https://github.com/apache/spark/commit/ebef12ad77084ff40db8601cd269f67778de293a). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #13604: [SPARK-15898][SQL] DataFrameReader.text should return Da...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13604 **[Test build #60374 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60374/consoleFull)** for PR 13604 at commit [`50538b7`](https://github.com/apache/spark/commit/50538b7c0952f6954a19402423e12349037b130c).
[GitHub] spark issue #13628: [SPARK-15907] [SQL] Issue Exceptions when Not Enough Inp...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13628 **[Test build #60373 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60373/consoleFull)** for PR 13628 at commit [`4c49112`](https://github.com/apache/spark/commit/4c4911226136bd797ed17955e795615f9c145de8).
[GitHub] spark issue #13614: Support Stata-like tabulation of values in a single colu...
Github user shafiquejamal commented on the issue: https://github.com/apache/spark/pull/13614 Ok, I'll close this and open a new PR with the title done correctly. Thanks.
[GitHub] spark pull request #13628: [SPARK-15907] [SQL] Issue Exceptions when Not Eno...
GitHub user gatorsmile opened a pull request: https://github.com/apache/spark/pull/13628 [SPARK-15907] [SQL] Issue Exceptions when Not Enough Input Columns for Dynamic Partitioning What changes were proposed in this pull request? ```SQL CREATE TABLE table_with_partition(c1 string) PARTITIONED by (p1 string,p2 string) INSERT OVERWRITE TABLE table_with_partition partition (p1='a',p2) IF NOT EXISTS SELECT 'blarr3' ``` In the above example, we do not have enough input columns for dynamic partitioning: the single input column is already taken as the data column, leaving nothing for the dynamic partition column `p2`. This PR issues an exception in this scenario. How was this patch tested? Added a test case and fixed an existing test case. You can merge this pull request into a Git repository by running: $ git pull https://github.com/gatorsmile/spark dynamicPartitioningException Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/13628.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #13628 commit 4c4911226136bd797ed17955e795615f9c145de8 Author: gatorsmile Date: 2016-06-12T23:35:07Z bug fix.
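The check this PR proposes can be sketched in plain Python. The function below is a hypothetical illustration (my own names and shapes, not Spark's code): with dynamic partitioning, the query must supply one column per data column plus one per dynamic partition column:

```python
# Hypothetical sketch of the validation proposed in SPARK-15907: a dynamic
# partition column whose value is None must be fed by an extra input column.
def check_input_columns(n_input, data_cols, partition_spec):
    dynamic = [p for p, v in partition_spec.items() if v is None]
    expected = len(data_cols) + len(dynamic)
    if n_input != expected:
        raise ValueError(f"expected {expected} input columns, got {n_input}")

# The PR's example: partition (p1='a', p2) with SELECT 'blarr3' supplies
# only 1 column, but data column c1 plus dynamic partition p2 need 2.
try:
    check_input_columns(1, ["c1"], {"p1": "a", "p2": None})
except ValueError as e:
    print(e)  # expected 2 input columns, got 1
```

Static partition values (like `p1='a'` here) are supplied in the partition spec itself, so they do not consume an input column.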
[GitHub] spark issue #13596: [SPARK-15870][SQL] DataFrame can't execute after uncache...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/13596 thanks, merging to master and 2.0!
[GitHub] spark pull request #13596: [SPARK-15870][SQL] DataFrame can't execute after ...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/13596
[GitHub] spark pull request #13496: [SPARK-15753][SQL] Move Analyzer stuff to Analyze...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/13496#discussion_r66729871 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -452,6 +452,17 @@ class Analyzer( def apply(plan: LogicalPlan): LogicalPlan = plan resolveOperators { case i @ InsertIntoTable(u: UnresolvedRelation, parts, child, _, _) if child.resolved => +// A partitioned relation's schema can be different from the input logicalPlan, since +// partition columns are all moved after data columns. We Project to adjust the ordering. +val input = if (parts.nonEmpty) { + val (inputPartCols, inputDataCols) = child.output.partition { attr => +parts.contains(attr.name) + } + Project(inputDataCols ++ inputPartCols, child) +} else { + child +} --- End diff -- If we use name-based resolution, we also need to check whether all the input columns have the expected partitioning names; otherwise, the result will not be predictable.
[GitHub] spark pull request #13413: [SPARK-15663][SQL] SparkSession.catalog.listFunct...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/13413#discussion_r66729858 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala --- @@ -89,6 +89,10 @@ class SimpleFunctionRegistry extends FunctionRegistry { functionBuilders.iterator.map(_._1).toList.sorted } + private[catalyst] def functionSet(): Set[String] = synchronized { +functionBuilders.iterator.map(_._1).toSet --- End diff -- It seems we are still creating the set every time we call `FunctionRegistry.builtin.functionSet`. Can you create this set once in the `FunctionRegistry` object instead?
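The suggestion above, sketched in plain Python (a hypothetical stand-in for the Scala registry, not Spark's code): the per-call `functionSet()` rebuilds the set on every invocation, whereas the built-in registry never changes after startup, so its name set can be computed once and reused:

```python
# Hypothetical sketch of the review suggestion: cache the built-in
# function-name set instead of materializing it on every call.
class SimpleFunctionRegistry:
    def __init__(self):
        self._builders = {}

    def register(self, name, builder):
        self._builders[name] = builder

    def function_set(self):
        # Rebuilt on every call -- fine for mutable session registries,
        # wasteful for the immutable built-in registry.
        return set(self._builders)

builtin = SimpleFunctionRegistry()
builtin.register("abs", object())
builtin.register("concat", object())

# The suggested shape: compute the set once, alongside the shared registry.
BUILTIN_FUNCTION_SET = frozenset(builtin.function_set())
print(sorted(BUILTIN_FUNCTION_SET))  # ['abs', 'concat']
```

A `frozenset` also documents that the cached set is never meant to change, matching the immutability of the built-in registry.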
[GitHub] spark pull request #13496: [SPARK-15753][SQL] Move Analyzer stuff to Analyze...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/13496#discussion_r66729833 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -452,6 +452,17 @@ class Analyzer( def apply(plan: LogicalPlan): LogicalPlan = plan resolveOperators { case i @ InsertIntoTable(u: UnresolvedRelation, parts, child, _, _) if child.resolved => +// A partitioned relation's schema can be different from the input logicalPlan, since +// partition columns are all moved after data columns. We Project to adjust the ordering. +val input = if (parts.nonEmpty) { + val (inputPartCols, inputDataCols) = child.output.partition { attr => +parts.contains(attr.name) + } + Project(inputDataCols ++ inputPartCols, child) +} else { + child +} --- End diff -- It looks like Hive uses ordering, not names, to pick dynamic partition columns. I am not sure we want to completely follow this Hive behavior. DataFrameWriter's insertInto doesn't follow it, and the rule in the Analyzer does not completely follow it either. @liancheng @rxin @cloud-fan What do you think? Should we change the current behavior to follow Hive?
[GitHub] spark issue #13627: [SPARK-15906][MLlib][WIP] Add complementary naive bayes ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13627

Can one of the admins verify this patch?
[GitHub] spark pull request #13627: [SPARK-15906][MLlib][WIP] Add complementary naive...
GitHub user tilumi opened a pull request: https://github.com/apache/spark/pull/13627

[SPARK-15906][MLlib][WIP] Add complementary naive bayes algorithm

## What changes were proposed in this pull request?
Add `ComplementaryNaiveBayes.scala` in package `org.apache.spark.mllib.classification` in the MLlib module.

## How was this patch tested?
(Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests)
(If this patch involves UI changes, please attach a screenshot; otherwise, remove this)

You can merge this pull request into a Git repository by running:
    $ git pull https://github.com/tilumi/spark add_complementary_navie_bayes
Alternatively you can review and apply these changes as the patch at:
    https://github.com/apache/spark/pull/13627.patch
To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:
    This closes #13627

commit 4f67cd12364a830fa579e443c716dd09a9f13f8a
Author: Lucas Yang
Date: 2016-06-11T04:37:55Z
    extract data aggregattion part in run as method

commit d9f6191676c5c2253dcc6983e4418bcb67cf02b9
Author: Lucas Yang
Date: 2016-06-11T04:38:18Z
    add complementary naive bayes algorithm

commit 0f02643db606944e2f919ebeaae427efb45515b7
Author: Lucas Yang
Date: 2016-06-12T23:03:09Z
    add Since annotation
[GitHub] spark pull request #13496: [SPARK-15753][SQL] Move Analyzer stuff to Analyze...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/13496#discussion_r66729456

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -452,6 +452,17 @@ class Analyzer(
     def apply(plan: LogicalPlan): LogicalPlan = plan resolveOperators {
       case i @ InsertIntoTable(u: UnresolvedRelation, parts, child, _, _) if child.resolved =>
+        // A partitioned relation's schema can be different from the input logicalPlan, since
+        // partition columns are all moved after data columns. We Project to adjust the ordering.
+        val input = if (parts.nonEmpty) {
+          val (inputPartCols, inputDataCols) = child.output.partition { attr =>
+            parts.contains(attr.name)
+          }
+          Project(inputDataCols ++ inputPartCols, child)
+        } else {
+          child
+        }
--- End diff --

If there are multiple dynamic partitioning columns, the name-based reordering becomes risky. Some partitioning columns might not have names/aliases. Right?
[GitHub] spark pull request #13496: [SPARK-15753][SQL] Move Analyzer stuff to Analyze...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/13496#discussion_r66729419

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -452,6 +452,17 @@ class Analyzer(
     def apply(plan: LogicalPlan): LogicalPlan = plan resolveOperators {
       case i @ InsertIntoTable(u: UnresolvedRelation, parts, child, _, _) if child.resolved =>
+        // A partitioned relation's schema can be different from the input logicalPlan, since
+        // partition columns are all moved after data columns. We Project to adjust the ordering.
+        val input = if (parts.nonEmpty) {
+          val (inputPartCols, inputDataCols) = child.output.partition { attr =>
+            parts.contains(attr.name)
+          }
+          Project(inputDataCols ++ inputPartCols, child)
+        } else {
+          child
+        }
--- End diff --

The names/aliases of the input columns are not used to determine whether they are partitioning columns or data columns.
[GitHub] spark pull request #13496: [SPARK-15753][SQL] Move Analyzer stuff to Analyze...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/13496#discussion_r66729375

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -452,6 +452,17 @@ class Analyzer(
     def apply(plan: LogicalPlan): LogicalPlan = plan resolveOperators {
       case i @ InsertIntoTable(u: UnresolvedRelation, parts, child, _, _) if child.resolved =>
+        // A partitioned relation's schema can be different from the input logicalPlan, since
+        // partition columns are all moved after data columns. We Project to adjust the ordering.
+        val input = if (parts.nonEmpty) {
+          val (inputPartCols, inputDataCols) = child.output.partition { attr =>
+            parts.contains(attr.name)
+          }
+          Project(inputDataCols ++ inputPartCols, child)
+        } else {
+          child
+        }
--- End diff --

@gatorsmile Your example confuses me. According to the spec you cited, the dynamic partition columns should come last, but you put them first?
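The thread above hinges on how the patch splits the input columns. A minimal sketch of the name-based split being discussed, with toy types standing in for Spark's `Attribute`/`Project` (these are illustrative, not the actual Analyzer code):

```scala
// Toy stand-in for a resolved output attribute.
case class Attr(name: String)

// Reorder input columns so partition columns come last, matching them to
// the partition spec BY NAME. Hive, by contrast, takes the trailing input
// columns BY POSITION -- the mismatch the reviewers are debating.
def reorderForInsert(output: Seq[Attr],
                     partSpec: Map[String, Option[String]]): Seq[Attr] = {
  val (partCols, dataCols) = output.partition(a => partSpec.contains(a.name))
  dataCols ++ partCols
}
```

Under name matching, an input column named `p` is treated as the partition column wherever it appears; under Hive's positional rule the last columns of the SELECT would be used regardless of their names, which is why unnamed or aliased expressions make the name-based approach risky.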
[GitHub] spark issue #13570: [SPARK-15832][SQL] Embedded IN/EXISTS predicate subquery...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/13570

@hvanhovell @jkbradley Could you add @ioana-delaney to the whitelist? Thanks!