[GitHub] spark issue #22227: [SPARK-25202] [SQL] Implements split with limit sql func...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/7 retest this please --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21596: [SPARK-24601] Bump Jackson version
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21596 retest this please
[GitHub] spark issue #22337: [SPARK-25338][Test] Ensure to call super.beforeAll() and...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22337 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95909/ Test FAILed.
[GitHub] spark issue #22337: [SPARK-25338][Test] Ensure to call super.beforeAll() and...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22337 Merged build finished. Test FAILed.
[GitHub] spark issue #22337: [SPARK-25338][Test] Ensure to call super.beforeAll() and...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22337 **[Test build #95909 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95909/testReport)** for PR 22337 at commit [`2d9e34a`](https://github.com/apache/spark/commit/2d9e34abdae13efa1fac9e906f331cdd04105e82).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `class FlatMapGroupsWithStateSuite extends StateStoreMetricsTest`
  * `class StreamingAggregationSuite extends StateStoreMetricsTest with Assertions`
  * `class StreamingDeduplicationSuite extends StateStoreMetricsTest`
[GitHub] spark issue #22384: [SPARK-25398][CORE][MESOS] Minor bugs from comparing unr...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22384 **[Test build #4334 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4334/testReport)** for PR 22384 at commit [`9e70b62`](https://github.com/apache/spark/commit/9e70b625992310a44880a9e42f6fead6c2068dc7).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #22343: [SPARK-25391][SQL] Make behaviors consistent when conver...
Github user seancxmao commented on the issue: https://github.com/apache/spark/pull/22343 @dongjoon-hyun It is a little complicated. There has been a discussion about this in #22184. Below are some key comments from @cloud-fan and @gatorsmile, just FYI.
* https://github.com/apache/spark/pull/22184#discussion_r212834477
* https://github.com/apache/spark/pull/22184#issuecomment-416885728
[GitHub] spark issue #21860: [SPARK-24901][SQL]Merge the codegen of RegularHashMap an...
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/21860 LGTM cc @cloud-fan @hvanhovell
[GitHub] spark issue #22355: [SPARK-25358][SQL] MutableProjection supports fallback t...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/22355 Thanks for your review, kirs! I'll update in a day.
[GitHub] spark pull request #22357: [SPARK-25363][SQL] Fix schema pruning in where cl...
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/22357#discussion_r216545091

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaPruning.scala ---
@@ -110,7 +110,17 @@ private[sql] object ParquetSchemaPruning extends Rule[LogicalPlan] {
     val projectionRootFields = projects.flatMap(getRootFields)
     val filterRootFields = filters.flatMap(getRootFields)
-    (projectionRootFields ++ filterRootFields).distinct
+    // Kind of expressions don't need to access any fields of a root fields, e.g., `IsNotNull`.
+    // For them, if there are any nested fields accessed in the query, we don't need to add root
+    // field access of above expressions.
+    // For example, for a query `SELECT name.first FROM contacts WHERE name IS NOT NULL`,
+    // we don't need to read nested fields of `name` struct other than `first` field.
--- End diff --

I'm having trouble accepting this, but perhaps I'm reading too much into it (or not enough). Let me illustrate with a couple of queries and their physical plans. Assuming the data model in `ParquetSchemaPruningSuite.scala`, the physical plan for the query

    select employer.id from contacts where employer is not null

is
```
== Physical Plan ==
*(1) Project [employer#36.id AS id#46]
+- *(1) Filter isnotnull(employer#36)
   +- *(1) FileScan parquet [employer#36,p#37] Batched: false, Format: Parquet, PartitionCount: 2, PartitionFilters: [], PushedFilters: [IsNotNull(employer)], ReadSchema: struct>
```
The physical plan for the query

    select employer.id from contacts where employer.id is not null

is
```
== Physical Plan ==
*(1) Project [employer#36.id AS id#47]
+- *(1) Filter (isnotnull(employer#36) && isnotnull(employer#36.id))
   +- *(1) FileScan parquet [employer#36,p#37] Batched: false, Format: Parquet, PartitionCount: 2, PartitionFilters: [], PushedFilters: [IsNotNull(employer)], ReadSchema: struct>
```
The read schemata are the same, but the query filters are not. The file scan for the second query looks as I would expect, but the scan for the first query appears to only read `employer.id` even though it needs to check `employer is not null`. If it only reads `employer.id`, how does it check that `employer.company` is not null? Perhaps `employer.id` is null but `employer.company` is not null for some row...

I have run some tests to validate that this PR is returning the correct results for both queries, and it is. But I don't understand why.
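The rule described in the diff comment can be sketched with a toy model. This is only an illustration under stated assumptions: `RootFieldPruningSketch`, `Path`, and `requiredPaths` are hypothetical names, field accesses are modeled as plain string paths, and none of this is the PR's actual implementation.

```scala
// Toy model, not Spark's ParquetSchemaPruning: every name here is hypothetical.
object RootFieldPruningSketch {
  // A field access is a path from a root column, e.g. Seq("employer", "id").
  // A path of length 1 means the whole root struct is requested.
  type Path = Seq[String]

  def requiredPaths(projections: Seq[Path], nullChecks: Seq[Path]): Seq[Path] = {
    val all = (projections ++ nullChecks).distinct
    // Roots for which at least one nested field is accessed somewhere.
    val rootsWithNestedAccess = all.filter(_.length > 1).map(_.head).toSet
    all.filterNot { p =>
      // Drop a bare root that appears only in a null check when a nested
      // field of the same root is already being read.
      p.length == 1 &&
        nullChecks.contains(p) &&
        !projections.contains(p) &&
        rootsWithNestedAccess.contains(p.head)
    }
  }
}
```

For the first query in the comment, `requiredPaths(Seq(Seq("employer", "id")), Seq(Seq("employer")))` keeps only the `employer.id` path, which corresponds to the narrower read the comment is asking about. If no nested field of `employer` were accessed, the bare root would be kept.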
[GitHub] spark issue #22343: [SPARK-25391][SQL] Make behaviors consistent when conver...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/22343 What I asked about was the following, wasn't it?
> In case-insensitive mode, when converting hive parquet table to parquet data source, we switch the duplicated fields resolution mode to ask parquet data source to pick the first matched field - the same behavior as hive parquet table - to keep behaviors consistent.

Spark should not pick up the first matched field in any case, because that is considered a correctness issue in the previous PR, which is backported to `branch-2.3`: https://github.com/apache/spark/pull/22183. I don't think we need to follow incorrect Hive behavior.
[GitHub] spark issue #22381: [SPARK-25394][CORE] Add an application status metrics so...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22381 Merged build finished. Test PASSed.
[GitHub] spark issue #22381: [SPARK-25394][CORE] Add an application status metrics so...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22381 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95905/ Test PASSed.
[GitHub] spark issue #22381: [SPARK-25394][CORE] Add an application status metrics so...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22381 **[Test build #95905 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95905/testReport)** for PR 22381 at commit [`a8fc89d`](https://github.com/apache/spark/commit/a8fc89d51971e16c37783cf336daa07d18e6d3c1).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #21688: [SPARK-21809] : Change Stage Page to use datatables to s...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21688 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95902/ Test FAILed.
[GitHub] spark issue #21688: [SPARK-21809] : Change Stage Page to use datatables to s...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21688 Merged build finished. Test FAILed.
[GitHub] spark issue #21688: [SPARK-21809] : Change Stage Page to use datatables to s...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21688 **[Test build #95902 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95902/testReport)** for PR 21688 at commit [`573390d`](https://github.com/apache/spark/commit/573390d0a933e7a2641f944046442a136bba6cd8).
* This patch **fails from timeout after a configured wait of `400m`**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #22387: [SPARK-25313][SQL][FOLLOW-UP][BACKPORT-2.3] Fix InsertIn...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22387 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2998/ Test PASSed.
[GitHub] spark issue #22387: [SPARK-25313][SQL][FOLLOW-UP][BACKPORT-2.3] Fix InsertIn...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22387 **[Test build #95916 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95916/testReport)** for PR 22387 at commit [`a7b857c`](https://github.com/apache/spark/commit/a7b857c69fa20615108413d6f17a87978ca44ae2).
[GitHub] spark issue #22387: [SPARK-25313][SQL][FOLLOW-UP][BACKPORT-2.3] Fix InsertIn...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22387 Merged build finished. Test PASSed.
[GitHub] spark issue #22387: [SPARK-25313][SQL][FOLLOW-UP][BACKPORT-2.3] Fix InsertIn...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/22387 Retest this please
[GitHub] spark issue #22384: [SPARK-25398][CORE][MESOS] Minor bugs from comparing unr...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/22384 Thanks, @srowen . If you ran `inspection` for all modules, what about removing all tags `[CORE][MESOS]`? Otherwise, could you put `[SQL]` on the title because four `sql` module fixes are included, too.
[GitHub] spark issue #22376: [SPARK-25021][K8S][BACKPORT] Add spark.executor.pyspark....
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22376 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2996/ Test FAILed.
[GitHub] spark issue #22385: [SPARK-25400][CORE] Increase test timeouts
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22385 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2997/ Test PASSed.
[GitHub] spark issue #22385: [SPARK-25400][CORE] Increase test timeouts
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22385 Merged build finished. Test PASSed.
[GitHub] spark issue #22376: [SPARK-25021][K8S][BACKPORT] Add spark.executor.pyspark....
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22376 Merged build finished. Test FAILed.
[GitHub] spark issue #22376: [SPARK-25021][K8S][BACKPORT] Add spark.executor.pyspark....
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22376 Kubernetes integration test status failure URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-make-spark-distribution-unified/2996/
[GitHub] spark issue #22387: [SPARK-25313][SQL][FOLLOW-UP][BACKPORT-2.3] Fix InsertIn...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22387 Merged build finished. Test FAILed.
[GitHub] spark issue #22387: [SPARK-25313][SQL][FOLLOW-UP][BACKPORT-2.3] Fix InsertIn...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22387 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95910/ Test FAILed.
[GitHub] spark issue #22387: [SPARK-25313][SQL][FOLLOW-UP][BACKPORT-2.3] Fix InsertIn...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22387 **[Test build #95910 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95910/testReport)** for PR 22387 at commit [`a7b857c`](https://github.com/apache/spark/commit/a7b857c69fa20615108413d6f17a87978ca44ae2).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #22192: [SPARK-24918][Core] Executor Plugin API
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22192 **[Test build #95915 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95915/testReport)** for PR 22192 at commit [`447c5e5`](https://github.com/apache/spark/commit/447c5e5974ca2a176026e63518a7a6cf29b78008).
[GitHub] spark pull request #22384: [SPARK-25398][CORE][MESOS] Minor bugs from compar...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/22384#discussion_r216540786

--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/PropagateEmptyRelationSuite.scala ---
@@ -147,7 +147,7 @@ class PropagateEmptyRelationSuite extends PlanTest {
       .where(false)
       .select('a)
       .where('a > 1)
-      .where('a != 200)
+      .where('a =!= 200)
--- End diff --

Oh, thank you for fixing this.
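A plausible reading of why this one-character change matters: in Scala, `!=` is defined on every object and returns a plain `Boolean`, whereas `=!=` is an ordinary method that an expression DSL can define to build an expression node. The sketch below shows that difference with a tiny stand-in DSL. `InequalitySketch`, `Attr`, and `NotEqual` are hypothetical names and are not Catalyst's classes.

```scala
// Minimal stand-in for an expression DSL; these classes are hypothetical
// and are not the ones used by Spark's Catalyst.
object InequalitySketch {
  sealed trait Expr
  final case class Attr(name: String) extends Expr {
    // DSL operator: builds an expression tree instead of comparing objects.
    def =!=(value: Int): Expr = NotEqual(this, value)
  }
  final case class NotEqual(attr: Attr, value: Int) extends Expr

  val a: Attr = Attr("a")

  // `!=` compares the Attr object with the Int 200, so the result is a Boolean
  // that is fixed when this line runs; no expression is built.
  val plainComparison: Boolean = a != 200

  // `=!=` calls the DSL method, so the result is an expression node that an
  // engine could later evaluate per row.
  val dslComparison: Expr = a =!= 200
}
```

Here `plainComparison` is simply `true`, because an `Attr` is never equal to an `Int`, while `dslComparison` is `NotEqual(Attr("a"), 200)`. The Scala compiler may warn about the first comparison, which is a useful hint that it is not doing what a filter condition needs.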
[GitHub] spark issue #21710: [SPARK-24207][R]add R API for PrefixSpan
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/21710 I think we missed the window before the branch. I'll review in a few days.
[GitHub] spark issue #22192: [SPARK-24918][Core] Executor Plugin API
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/22192 Jenkins, retest this please
[GitHub] spark issue #22385: [SPARK-25400][CORE] Increase test timeouts
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22385 **[Test build #95914 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95914/testReport)** for PR 22385 at commit [`daf76ed`](https://github.com/apache/spark/commit/daf76ed592ed82aa4b390b444c4669ae65c9b355).
[GitHub] spark issue #22385: [SPARK-25400][CORE] Increase test timeouts
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/22385 Retest this please.
[GitHub] spark issue #22382: [SPARK-23243] [SPARK-20715][CORE][2.2] Fix RDD.repartiti...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22382 thanks, merging to 2.2!
[GitHub] spark issue #22376: [SPARK-25021][K8S][BACKPORT] Add spark.executor.pyspark....
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22376 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-make-spark-distribution-unified/2996/
[GitHub] spark pull request #21649: [SPARK-23648][R][SQL]Adds more types for hint in ...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/21649#discussion_r216539767

--- Diff: R/pkg/R/DataFrame.R ---
@@ -3939,7 +3929,15 @@ setMethod("hint",
           signature(x = "SparkDataFrame", name = "character"),
           function(x, name, ...) {
             parameters <- list(...)
-            stopifnot(all(sapply(parameters, isTypeAllowedForSqlHint)))
+            stopifnot(all(sapply(parameters, function(x) {
--- End diff --

If I recall, let's not have an inner scope that reuses the variable name `x` from the outer scope?
[GitHub] spark pull request #22370: don't link to deprecated function
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/22370#discussion_r216539411

--- Diff: R/pkg/R/catalog.R ---
@@ -69,7 +69,6 @@ createExternalTable <- function(x, ...) {
 #' @param ... additional named parameters as options for the data source.
 #' @return A SparkDataFrame.
 #' @rdname createTable
-#' @seealso \link{createExternalTable}
--- End diff --

`registerTempTable` is because of the `@family` tag, so it's a bit different.
[GitHub] spark issue #22357: [SPARK-25363][SQL] Fix schema pruning in where clause by...
Github user mallman commented on the issue: https://github.com/apache/spark/pull/22357
> FYI, per further checking code and discussion with @dbtsai regarding predicate pushdown, we know that predicate push down only works for primitive types on Parquet datasource. So both `IsNotNull(employer)` and `IsNotNull(employer.id)` are not actually pushed down to work at Parquet reader

I would expect `IsNotNull(employer.id)` to be pushed down. In any case, I misunderstood what that `PushedFilters` metadata item means in the `FileScan` part of the physical plan. I thought that was a Parquet filter, but sometimes it is not.

In any case, I'm not concerned about supporting filter push down at this point. My concern is around its side effects, but that has been allayed.
[GitHub] spark pull request #22379: [SPARK-25393][SQL] Adding new function from_csv()
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/22379#discussion_r216538924

--- Diff: R/pkg/R/functions.R ---
@@ -3720,3 +3720,22 @@ setMethod("current_timestamp",
             jc <- callJStatic("org.apache.spark.sql.functions", "current_timestamp")
             column(jc)
           })
+
+#' @details
+#' \code{from_csv}: Parses a column containing a CSV string into a Column of \code{structType}
+#' with the specified \code{schema}.
+#' If the string is unparseable, the Column will contain the value NA.
+#'
+#' @rdname column_collection_functions
+#' @param schema a DDL-formatted string
+#' @aliases from_csv from_csv,Column,character-method
+#'
+#' @note from_csv since 3.0.0
+setMethod("from_csv", signature(x = "Column", schema = "character"),
+          function(x, schema, ...) {
--- End diff --

Can you add to the doc for `...` (in column_collection_functions) to indicate its use for options in this function, if there is anything new?
[GitHub] spark issue #22376: [SPARK-25021][K8S][BACKPORT] Add spark.executor.pyspark....
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22376 **[Test build #95913 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95913/testReport)** for PR 22376 at commit [`4a0cffb`](https://github.com/apache/spark/commit/4a0cffb3ce9e1bede43e6a89fdd7a7b912bf93d2).
[GitHub] spark issue #22357: [SPARK-25363][SQL] Fix schema pruning in where clause by...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/22357 If I recall, the Parquet reader can do filter pushdown? Just not in the Spark Parquet data source?
[GitHub] spark issue #22387: [SPARK-25313][SQL][FOLLOW-UP][BACKPORT-2.3] Fix InsertIn...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22387 LGTM
[GitHub] spark issue #22376: [SPARK-25021][K8S][BACKPORT] Add spark.executor.pyspark....
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/22376 Jenkins, retest this please
[GitHub] spark issue #22282: [SPARK-23539][SS] Add support for Kafka headers in Struc...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22282 Merged build finished. Test PASSed.
[GitHub] spark issue #22282: [SPARK-23539][SS] Add support for Kafka headers in Struc...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22282 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95906/ Test PASSed.
[GitHub] spark issue #22282: [SPARK-23539][SS] Add support for Kafka headers in Struc...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22282 **[Test build #95906 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95906/testReport)** for PR 22282 at commit [`220bd0a`](https://github.com/apache/spark/commit/220bd0a90b0c606b8f74c227218bec7bb6614782).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #22343: [SPARK-25391][SQL] Make behaviors consistent when conver...
Github user seancxmao commented on the issue: https://github.com/apache/spark/pull/22343 Hi, @dongjoon-hyun When we find duplicated field names in the case of convertMetastoreXXX, we have 2 options:

(1) Raise an exception, as the parquet data source does. Most end users do not know the difference between a hive parquet table and the parquet data source. If the conversion leads to different behaviors, they may be confused, and in some cases it may even lead to tricky data issues silently.
(2) Adjust the behavior of the parquet data source to keep behaviors consistent. This seems more friendly to end users, and avoids any potential issues introduced by the conversion.

BTW, for a parquet data source which is not converted from a hive parquet table, we raise an exception when there is ambiguity, since this is more intuitive and reasonable.
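The two options can be made concrete with a small sketch of case-insensitive field lookup. This is a minimal illustration, assuming field names are plain strings. `DuplicateFieldResolutionSketch`, `Policy`, and `resolve` are hypothetical names and do not correspond to Spark's actual Parquet read path.

```scala
// Hypothetical helper that only illustrates the two policies under discussion;
// it is not Spark's Parquet read support.
object DuplicateFieldResolutionSketch {
  sealed trait Policy
  case object FailOnAmbiguity extends Policy // option (1): raise an exception
  case object PickFirstMatch extends Policy  // option (2): follow the Hive behavior

  // Returns Left(error) on ambiguity, Right(None) when nothing matches,
  // and Right(Some(field)) when a field is selected.
  def resolve(
      fileFields: Seq[String],
      requested: String,
      policy: Policy): Either[String, Option[String]] = {
    val matches = fileFields.filter(_.equalsIgnoreCase(requested))
    matches match {
      case Seq() => Right(None)
      case Seq(only) => Right(Some(only))
      case many =>
        policy match {
          case PickFirstMatch => Right(Some(many.head))
          case FailOnAmbiguity =>
            val listed = many.mkString(", ")
            Left(s"Ambiguous field '$requested' in case-insensitive mode: $listed")
        }
    }
  }
}
```

With file fields `id` and `ID`, `PickFirstMatch` silently returns `id`, while `FailOnAmbiguity` returns an error. That is the trade-off in the thread: consistency with Hive versus surfacing the ambiguity to the user.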
[GitHub] spark issue #22386: [SPARK-25399] Continuous processing state should not aff...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22386 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95908/ Test PASSed.
[GitHub] spark issue #22386: [SPARK-25399] Continuous processing state should not aff...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22386 Merged build finished. Test PASSed.
[GitHub] spark issue #22386: [SPARK-25399] Continuous processing state should not aff...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22386 **[Test build #95908 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95908/testReport)** for PR 22386 at commit [`c2f813b`](https://github.com/apache/spark/commit/c2f813bb46bd08ee808ef35ad9569fb9dc7194a6).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #22371: [SPARK-25386][CORE] Don't need to synchronize the IndexS...
Github user ConeyLiu commented on the issue: https://github.com/apache/spark/pull/22371 @squito, thanks for the review. I previously intended to use `ConcurrentHashMap[Int, AtomicReferenceArray]`. After rethinking the code, I understand that the lock here is used to prevent different attempts of the same task from committing the shuffle writer result at the same time. Different attempts of a task can be caused by the following:

1. A failed task or stage. In this case, the previous task attempt should already have finished (failed or killed), or its result is not used anymore.
2. A `Speculative task`. In this case, the speculative task can't be scheduled to the same executor as other attempts.

So, what's the real value of the lock? Maybe I'm wrong; I'm hoping for some answers.
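The guarantee being discussed, that only one attempt of a given task gets to commit its output, can be sketched as follows. Note that this sketch deliberately swaps the lock for an atomic `putIfAbsent` on a `ConcurrentHashMap`, purely to show the invariant; it is not what Spark's shuffle code does, and `CommitOnceSketch` and `tryCommit` are hypothetical names.

```scala
import java.util.concurrent.ConcurrentHashMap

// Hypothetical illustration of "first attempt to commit wins".
// Real shuffle commits also involve writing and renaming files, which an
// in-memory map does not model.
object CommitOnceSketch {
  // Key: (shuffleId, mapId). Value: the attempt whose output was kept.
  private val committed = new ConcurrentHashMap[(Int, Int), Integer]()

  /** Returns true if this attempt's output became the committed one. */
  def tryCommit(shuffleId: Int, mapId: Int, attemptId: Int): Boolean =
    committed.putIfAbsent((shuffleId, mapId), Integer.valueOf(attemptId)) == null
}
```

If the two scenarios listed in the comment are the only ways to get concurrent attempts, a second call to `tryCommit` for the same `(shuffleId, mapId)` on one executor would be rare, which is the point behind the question about the lock's value.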
[GitHub] spark issue #22192: [SPARK-24918][Core] Executor Plugin API
Github user bersprockets commented on the issue: https://github.com/apache/spark/pull/22192 retest this please. It's that old "java.lang.reflect.InvocationTargetException: null" error we've seen many times.
[GitHub] spark issue #22357: [SPARK-25363][SQL] Fix schema pruning in where clause by...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/22357 FYI, per further checking code and discussion with @dbtsai regarding predicate pushdown, we know that predicate push down only works for primitive types on Parquet datasource. So both `IsNotNull(employer)` and `IsNotNull(employer.id)` are not actually pushed down to work at Parquet reader.
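The claim in this comment can be modeled with a small sketch that decides which `IsNotNull` filters are eligible for pushdown. It encodes only the rule as stated above (top-level primitive columns qualify; struct columns and nested paths do not) and is an assumption-laden toy: `PushdownSketch`, `ColumnType`, and `pushable` are hypothetical names, not Spark's Parquet filter code.

```scala
// Toy model of the rule stated in the comment; not Spark's implementation.
object PushdownSketch {
  sealed trait ColumnType
  case object IntType extends ColumnType
  case object StringType extends ColumnType
  final case class StructType(fields: Map[String, ColumnType]) extends ColumnType

  final case class IsNotNull(column: String)

  // Keeps only filters on top-level columns of primitive type. A nested path
  // such as "employer.id" is not a key of the top-level schema map, so it is
  // treated as not pushable, matching the comment's description.
  def pushable(schema: Map[String, ColumnType], filters: Seq[IsNotNull]): Seq[IsNotNull] =
    filters.filter { f =>
      schema.get(f.column) match {
        case Some(_: StructType) | None => false
        case Some(_) => true
      }
    }
}
```

For a schema with a struct column `employer` and an integer column `p`, only `IsNotNull("p")` survives; both `IsNotNull("employer")` and `IsNotNull("employer.id")` are left for Spark to evaluate after the scan.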
[GitHub] spark issue #22374: [SPARK-25387][SQL] Fix for NPE caused by bad CSV input
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22374 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95904/ Test PASSed.
[GitHub] spark issue #22374: [SPARK-25387][SQL] Fix for NPE caused by bad CSV input
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22374 Merged build finished. Test PASSed.
[GitHub] spark issue #22388: Revert [SPARK-24882][SQL] improve data source v2 API fro...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22388 **[Test build #95912 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95912/testReport)** for PR 22388 at commit [`e31ecfa`](https://github.com/apache/spark/commit/e31ecfa574393971586fa04d93766343f7661399). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22374: [SPARK-25387][SQL] Fix for NPE caused by bad CSV input
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22374 **[Test build #95904 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95904/testReport)** for PR 22374 at commit [`bd4ebe4`](https://github.com/apache/spark/commit/bd4ebe44c3268311e1a3569f9c32b9875ccbb292). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22388: Revert [SPARK-24882][SQL] improve data source v2 API fro...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22388 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2995/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22388: Revert [SPARK-24882][SQL] improve data source v2 API fro...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22388 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22388: Revert [SPARK-24882][SQL] improve data source v2 ...
GitHub user cloud-fan opened a pull request: https://github.com/apache/spark/pull/22388
Revert [SPARK-24882][SQL] improve data source v2 API from branch 2.4
## What changes were proposed in this pull request?
As discussed on the dev list, we don't want to include this change in Spark 2.4, as it requires data source v2 users to change their implementations extensively, and they would need to change them again in the next release.
## How was this patch tested?
existing tests
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/cloud-fan/spark revert
Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/22388.patch
To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #22388
commit b4cf7701146675c682d51a72279c9c98a62e21c9
Author: Wenchen Fan
Date: 2018-09-11T00:59:16Z
Revert "[SPARK-24882][SQL] improve data source v2 API"
This reverts commit e754887182304ad0d622754e33192ebcdd515965.
commit e31ecfa574393971586fa04d93766343f7661399
Author: Wenchen Fan
Date: 2018-09-11T02:24:36Z
fix import
--- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22388: Revert [SPARK-24882][SQL] improve data source v2 API fro...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22388 cc @rxin @rdblue --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22376: [SPARK-25021][K8S][BACKPORT] Add spark.executor.pyspark....
Github user ifilonenko commented on the issue: https://github.com/apache/spark/pull/22376 @holdenk @felixcheung can this be merged, since the error isn't related to the features in this PR? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22375: [WIP][SPARK-25388][Test][SQL] Detect incorrect nullable ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22375 **[Test build #95911 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95911/testReport)** for PR 22375 at commit [`51aa9d5`](https://github.com/apache/spark/commit/51aa9d58999a546678a8dff1660e3e9f6d73ec8b). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22375: [WIP][SPARK-25388][Test][SQL] Detect incorrect nullable ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22375 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2994/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22375: [WIP][SPARK-25388][Test][SQL] Detect incorrect nullable ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22375 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22375: [WIP][SPARK-25388][Test][SQL] Detect incorrect nullable ...
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/22375 retest this please --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22387: [SPARK-25313][SQL][FOLLOW-UP][BACKPORT-2.3] Fix InsertIn...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22387 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2993/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22387: [SPARK-25313][SQL][FOLLOW-UP][BACKPORT-2.3] Fix InsertIn...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22387 **[Test build #95910 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95910/testReport)** for PR 22387 at commit [`a7b857c`](https://github.com/apache/spark/commit/a7b857c69fa20615108413d6f17a87978ca44ae2). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22387: [SPARK-25313][SQL][FOLLOW-UP][BACKPORT-2.3] Fix InsertIn...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22387 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22385: [SPARK-25400][CORE] Increase test timeouts
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22385 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95903/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22385: [SPARK-25400][CORE] Increase test timeouts
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22385 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22385: [SPARK-25400][CORE] Increase test timeouts
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22385 **[Test build #95903 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95903/testReport)** for PR 22385 at commit [`daf76ed`](https://github.com/apache/spark/commit/daf76ed592ed82aa4b390b444c4669ae65c9b355). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22387: [SPARK-25313][SQL][FOLLOW-UP][BACKPORT-2.3] Fix I...
GitHub user wangyum opened a pull request: https://github.com/apache/spark/pull/22387
[SPARK-25313][SQL][FOLLOW-UP][BACKPORT-2.3] Fix InsertIntoHiveDirCommand output schema in Parquet issue
## What changes were proposed in this pull request?
Backport https://github.com/apache/spark/pull/22359 to branch-2.3.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/wangyum/spark SPARK-25313-FOLLOW-UP-branch-2.3
Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/22387.patch
To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #22387
commit a7b857c69fa20615108413d6f17a87978ca44ae2
Author: Yuming Wang
Date: 2018-09-11T02:02:55Z
[SPARK-25313][SQL][FOLLOW-UP] Fix InsertIntoHiveDirCommand output schema in Parquet issue
--- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22355: [SPARK-25358][SQL] MutableProjection supports fal...
Github user rednaxelafx commented on a diff in the pull request: https://github.com/apache/spark/pull/22355#discussion_r216525084
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/InterpretedMutableProjection.scala ---
@@ -0,0 +1,83 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.expressions
+
+import org.apache.spark.sql.catalyst.InternalRow
+
+
+/**
+ * A [[MutableProjection]] that is calculated by calling `eval` on each of the specified
+ * expressions.
+ *
+ * @param expressions a sequence of expressions that determine the value of each column of the
+ *                    output row.
+ */
+class InterpretedMutableProjection(expressions: Seq[Expression]) extends MutableProjection {
+  def this(expressions: Seq[Expression], inputSchema: Seq[Attribute]) =
+    this(expressions.map(BindReferences.bindReference(_, inputSchema)))
--- End diff --
use `toBoundExpr`?
--- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22355: [SPARK-25358][SQL] MutableProjection supports fal...
Github user rednaxelafx commented on a diff in the pull request: https://github.com/apache/spark/pull/22355#discussion_r216524666
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/package.scala ---
@@ -86,24 +86,12 @@ package object expressions {
   }
   /**
-   * Converts a [[InternalRow]] to another Row given a sequence of expression that define each
-   * column of the new row. If the schema of the input row is specified, then the given expression
-   * will be bound to that schema.
-   *
-   * In contrast to a normal projection, a MutableProjection reuses the same underlying row object
-   * each time an input row is added. This significantly reduces the cost of calculating the
-   * projection, but means that it is not safe to hold on to a reference to a [[InternalRow]] after
-   * `next()` has been called on the [[Iterator]] that produced it. Instead, the user must call
-   * `InternalRow.copy()` and hold on to the returned [[InternalRow]] before calling `next()`.
+   * A helper function to bound given expressions to an input schema.
--- End diff --
Spelling nitpick: s/bound/bind/
--- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22355: [SPARK-25358][SQL] MutableProjection supports fal...
Github user rednaxelafx commented on a diff in the pull request: https://github.com/apache/spark/pull/22355#discussion_r216526434
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/InterpretedMutableProjection.scala ---
@@ -0,0 +1,83 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.expressions
+
+import org.apache.spark.sql.catalyst.InternalRow
+
+
+/**
+ * A [[MutableProjection]] that is calculated by calling `eval` on each of the specified
+ * expressions.
+ *
+ * @param expressions a sequence of expressions that determine the value of each column of the
+ *                    output row.
+ */
+class InterpretedMutableProjection(expressions: Seq[Expression]) extends MutableProjection {
+  def this(expressions: Seq[Expression], inputSchema: Seq[Attribute]) =
+    this(expressions.map(BindReferences.bindReference(_, inputSchema)))
+
+  private[this] val buffer = new Array[Any](expressions.size)
+
+  override def initialize(partitionIndex: Int): Unit = {
+    expressions.foreach(_.foreach {
+      case n: Nondeterministic => n.initialize(partitionIndex)
+      case _ =>
+    })
+  }
+
+  private[this] val exprArray = expressions.toArray
+  private[this] var mutableRow: InternalRow = new GenericInternalRow(exprArray.length)
+  def currentValue: InternalRow = mutableRow
+
+  override def target(row: InternalRow): MutableProjection = {
+    mutableRow = row
+    this
+  }
+
+  override def apply(input: InternalRow): InternalRow = {
+    var i = 0
+    while (i < exprArray.length) {
+      // Store the result into buffer first, to make the projection atomic (needed by aggregation)
+      buffer(i) = exprArray(i).eval(input)
+      i += 1
+    }
+    i = 0
+    while (i < exprArray.length) {
+      mutableRow(i) = buffer(i)
+      i += 1
+    }
+    mutableRow
+  }
+}
--- End diff --
+1 on the check for `NoOp`s.
--- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
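The code comment in the quoted `apply` says results go into `buffer` first "to make the projection atomic". A minimal sketch of why that matters when the target row is also the input row follows. It uses plain arrays and hypothetical names (`AtomicProjectionSketch`, `Expr`, `projectInPlace`, `projectBuffered`) rather than Catalyst's `Expression` and `InternalRow`.

```scala
object AtomicProjectionSketch {
  // An "expression" is modelled as a function of the input row; not Catalyst's Expression.
  type Expr = Array[Long] => Long

  /** Writes each result straight into `target`; later expressions may read updated slots. */
  def projectInPlace(exprs: Array[Expr], input: Array[Long], target: Array[Long]): Unit = {
    var i = 0
    while (i < exprs.length) {
      target(i) = exprs(i)(input)
      i += 1
    }
  }

  /** Evaluates everything into a scratch buffer first, then copies, as the quoted `apply` does. */
  def projectBuffered(exprs: Array[Expr], input: Array[Long], target: Array[Long]): Unit = {
    val buffer = new Array[Long](exprs.length)
    var i = 0
    while (i < exprs.length) {
      buffer(i) = exprs(i)(input)
      i += 1
    }
    i = 0
    while (i < exprs.length) {
      target(i) = buffer(i)
      i += 1
    }
  }

  /** Runs both variants with the same array used as input and target. */
  def demo(): (List[Long], List[Long]) = {
    val exprs = Array[Expr](
      (r: Array[Long]) => r(0) + 1,  // slot 0: increment the old slot 0
      (r: Array[Long]) => r(0) * 10  // slot 1: intended to read the OLD slot 0
    )
    val inPlace = Array(1L, 0L)
    projectInPlace(exprs, inPlace, inPlace)
    val buffered = Array(1L, 0L)
    projectBuffered(exprs, buffered, buffered)
    (inPlace.toList, buffered.toList)
  }
}
```

In `demo`, the in-place variant computes slot 1 from the already-updated slot 0, while the buffered variant evaluates every expression against the original row. The second behaviour is the one the quoted comment asks for.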
[GitHub] spark pull request #20999: [SPARK-14922][SPARK-17732][SPARK-23866][SQL] Supp...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/20999#discussion_r216525719
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala ---
@@ -293,6 +293,28 @@ class AstBuilder(conf: SQLConf) extends SqlBaseBaseVisitor[AnyRef] with Logging
     }
   }
+  /**
+   * Create a partition specification map with filters.
+   */
+  override def visitDropPartitionSpec(
+      ctx: DropPartitionSpecContext): Seq[Expression] = {
+    withOrigin(ctx) {
+      ctx.dropPartitionVal().asScala.map { pFilter =>
+        if (pFilter.identifier() == null || pFilter.constant() == null ||
+            pFilter.comparisonOperator() == null) {
+          throw new ParseException(s"Invalid partition spec: ${pFilter.getText}", ctx)
+        }
+        // We cannot use UnresolvedAttribute because resolution is performed after Analysis, when
+        // running the command. The type is not relevant, it is replaced during the real resolution
+        val partition =
+          AttributeReference(pFilter.identifier().getText, StringType)()
--- End diff --
Yeah, looks good to me. But I'm not sure which one is the right approach, so we'd better wait for other reviewers' comments here, too. cc: @gatorsmile @viirya
--- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
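The null checks in the quoted `visitDropPartitionSpec` require every partition filter to carry an identifier, a comparison operator, and a constant. The same validation can be sketched without the ANTLR context types. Everything here (`PartitionSpecSketch`, `RawFilter`, `PartitionFilter`, `validate`) is a hypothetical illustration, not code from the PR.

```scala
object PartitionSpecSketch {
  // Hypothetical parsed form of one partition filter; any of the three parts may be missing.
  final case class RawFilter(
      text: String,
      identifier: Option[String],
      operator: Option[String],
      constant: Option[String])

  // A validated filter. The value stays a string because its real type is resolved later.
  final case class PartitionFilter(column: String, operator: String, value: String)

  /** Accepts a filter only when identifier, operator, and constant are all present. */
  def validate(raw: RawFilter): Either[String, PartitionFilter] =
    (raw.identifier, raw.operator, raw.constant) match {
      case (Some(column), Some(op), Some(value)) => Right(PartitionFilter(column, op, value))
      case _ => Left(s"Invalid partition spec: ${raw.text}")
    }
}
```

Returning `Either` instead of throwing is a choice made for this sketch only; the quoted diff throws a `ParseException` with the same message.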
[GitHub] spark issue #22379: [SPARK-25393][SQL] Adding new function from_csv()
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22379 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22379: [SPARK-25393][SQL] Adding new function from_csv()
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22379 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95901/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22364: [SPARK-25379][SQL] Improve AttributeSet and Colum...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/22364#discussion_r216525223
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/AttributeSet.scala ---
@@ -39,10 +41,15 @@ object AttributeSet {
   /** Constructs a new [[AttributeSet]] given a sequence of [[Expression Expressions]]. */
   def apply(baseSet: Iterable[Expression]): AttributeSet = {
-    new AttributeSet(
-      baseSet
-        .flatMap(_.references)
-        .map(new AttributeEquals(_)).toSet)
--- End diff --
Thanks!
--- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22379: [SPARK-25393][SQL] Adding new function from_csv()
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22379 **[Test build #95901 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95901/testReport)** for PR 22379 at commit [`d2bfd94`](https://github.com/apache/spark/commit/d2bfd9430f05d006accdecb6a62ed659fbd6a2f8). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22337: [SPARK-25338][Test] Ensure to call super.beforeAll() and...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22337 **[Test build #95909 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95909/testReport)** for PR 22337 at commit [`2d9e34a`](https://github.com/apache/spark/commit/2d9e34abdae13efa1fac9e906f331cdd04105e82). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22337: [SPARK-25338][Test] Ensure to call super.beforeAll() and...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22337 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22337: [SPARK-25338][Test] Ensure to call super.beforeAll() and...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22337 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2992/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22192: [SPARK-24918][Core] Executor Plugin API
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22192 **[Test build #4333 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4333/testReport)** for PR 22192 at commit [`447c5e5`](https://github.com/apache/spark/commit/447c5e5974ca2a176026e63518a7a6cf29b78008). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22337: [SPARK-25338][Test] Ensure to call super.beforeAll() and...
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/22337 retest this please --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22365: [SPARK-25381][SQL] Stratified sampling by Column argumen...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22365 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95900/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22365: [SPARK-25381][SQL] Stratified sampling by Column argumen...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22365 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22365: [SPARK-25381][SQL] Stratified sampling by Column argumen...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22365 **[Test build #95900 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95900/testReport)** for PR 22365 at commit [`e85175e`](https://github.com/apache/spark/commit/e85175e18e95d7751748d4615792579375859786). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22341: [SPARK-24889][Core] Update block info when unpers...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/22341#discussion_r216523989
--- Diff: core/src/main/scala/org/apache/spark/status/AppStatusListener.scala ---
@@ -646,7 +647,47 @@ private[spark] class AppStatusListener(
   }
   override def onUnpersistRDD(event: SparkListenerUnpersistRDD): Unit = {
-    liveRDDs.remove(event.rddId)
+    liveRDDs.remove(event.rddId).foreach { liveRDD =>
+      val executorsToUpdate = new HashSet[LiveExecutor]()
--- End diff --
Right. But it would be nice to avoid the hash set if possible. The less stuff listeners have to do, the better.
--- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22341: [SPARK-24889][Core] Update block info when unpers...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/22341#discussion_r216523824
--- Diff: core/src/main/scala/org/apache/spark/status/LiveEntity.scala ---
@@ -538,6 +538,14 @@ private class LiveRDD(val info: RDDInfo) extends LiveEntity {
     distributions.get(exec.executorId)
   }
+  def getPartitions(): Map[String, LiveRDDPartition] = {
+    partitions.toMap
--- End diff --
Sure, it's an internal API. Listener code needs to avoid doing unnecessary things like copying stuff to avoid issues with dropping events.
--- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
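The concern about copying in listener code can be illustrated by contrasting a `toMap` copy with a read-only view over the same mutable map. This is a sketch with hypothetical names (`NoCopySketch`, `LiveRddSketch`, `record`, `partitionsCopy`, `partitionsView`). It assumes the map is only read and written from a single listener thread, which is what makes handing out the live map reasonable.

```scala
import scala.collection.mutable

object NoCopySketch {
  // Hypothetical stand-in for a live entity that tracks block sizes per partition name.
  final class LiveRddSketch {
    private val partitions = mutable.HashMap.empty[String, Long]

    def record(block: String, size: Long): Unit = {
      partitions.update(block, size)
    }

    /** Copies every entry on each call, so the cost grows with the number of partitions. */
    def partitionsCopy: Map[String, Long] = partitions.toMap

    /** Exposes the live map through a read-only interface without copying. */
    def partitionsView: scala.collection.Map[String, Long] = partitions
  }
}
```

A caller holding `partitionsView` sees later updates, while a `partitionsCopy` result is a snapshot. The view avoids per-event allocation at the cost of not being safe to share across threads.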
[GitHub] spark issue #22384: [SPARK-25398][CORE][MESOS] Minor bugs from comparing unr...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22384 **[Test build #4334 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4334/testReport)** for PR 22384 at commit [`9e70b62`](https://github.com/apache/spark/commit/9e70b625992310a44880a9e42f6fead6c2068dc7). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22337: [SPARK-25338][Test] Ensure to call super.beforeAll() and...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22337 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95890/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22337: [SPARK-25338][Test] Ensure to call super.beforeAll() and...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22337 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22337: [SPARK-25338][Test] Ensure to call super.beforeAll() and...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22337 **[Test build #95890 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95890/testReport)** for PR 22337 at commit [`2d9e34a`](https://github.com/apache/spark/commit/2d9e34abdae13efa1fac9e906f331cdd04105e82). * This patch **fails from timeout after a configured wait of `400m`**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class FlatMapGroupsWithStateSuite extends StateStoreMetricsTest ` * `class StreamingAggregationSuite extends StateStoreMetricsTest with Assertions ` * `class StreamingDeduplicationSuite extends StateStoreMetricsTest ` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22367: [SPARK-17916][SPARK-25241][SQL][FOLLOWUP] Fix empty stri...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22367 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95899/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22367: [SPARK-17916][SPARK-25241][SQL][FOLLOWUP] Fix empty stri...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22367 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org