[GitHub] [spark] AmplabJenkins commented on issue #24575: [SPARK-27670][SQL]Add HA for HiveThriftServer2 based on HiveServer2.
AmplabJenkins commented on issue #24575: [SPARK-27670][SQL]Add HA for HiveThriftServer2 based on HiveServer2. URL: https://github.com/apache/spark/pull/24575#issuecomment-532727584 Merged build finished. Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA removed a comment on issue #24575: [SPARK-27670][SQL]Add HA for HiveThriftServer2 based on HiveServer2.
SparkQA removed a comment on issue #24575: [SPARK-27670][SQL]Add HA for HiveThriftServer2 based on HiveServer2. URL: https://github.com/apache/spark/pull/24575#issuecomment-532715522 **[Test build #110917 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/110917/testReport)** for PR 24575 at commit [`e98c844`](https://github.com/apache/spark/commit/e98c844aa3851d65fac9ea5a5a0581f52bf14077).
[GitHub] [spark] SparkQA commented on issue #24575: [SPARK-27670][SQL]Add HA for HiveThriftServer2 based on HiveServer2.
SparkQA commented on issue #24575: [SPARK-27670][SQL]Add HA for HiveThriftServer2 based on HiveServer2. URL: https://github.com/apache/spark/pull/24575#issuecomment-532727430 **[Test build #110917 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/110917/testReport)** for PR 24575 at commit [`e98c844`](https://github.com/apache/spark/commit/e98c844aa3851d65fac9ea5a5a0581f52bf14077).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] cloud-fan commented on a change in pull request #25651: [SPARK-28948][SQL] Support passing all Table metadata in TableProvider
cloud-fan commented on a change in pull request #25651: [SPARK-28948][SQL] Support passing all Table metadata in TableProvider
URL: https://github.com/apache/spark/pull/25651#discussion_r325731051

## File path: sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/TableProvider.java ##
@@ -36,26 +40,21 @@ public interface TableProvider {

   /**
-   * Return a {@link Table} instance to do read/write with user-specified options.
+   * Return a {@link Table} instance to do read/write with the given table metadata. The returned
+   * table must report the same schema and partitioning with the given table metadata.
    *
-   * @param options the user-specified options that can identify a table, e.g. file path, Kafka
-   *                topic name, etc. It's an immutable case-insensitive string-to-string map.
-   */
-  Table getTable(CaseInsensitiveStringMap options);
-
-  /**
-   * Return a {@link Table} instance to do read/write with user-specified schema and options.
-   *
-   * By default this method throws {@link UnsupportedOperationException}, implementations should
-   * override this method to handle user-specified schema.
-   *
-   * @param options the user-specified options that can identify a table, e.g. file path, Kafka
-   *                topic name, etc. It's an immutable case-insensitive string-to-string map.
-   * @param schema the user-specified schema.
-   * @throws UnsupportedOperationException
+   * @param schema The schema of the table to load. If it's empty, implementations should infer it.
+   * @param partitions The data partitioning of the table to load. If it's empty, implementations
+   *                   should infer it.
+   * @param properties The properties of the table to load. It should be sufficient to define and
+   *                   access a table. The properties map may be {@link CaseInsensitiveStringMap}.
+   *
+   * @throws IllegalArgumentException if the implementation can't infer schema/partitioning, or
+   *                                  the given schema/partitioning doesn't match the actual data
+   *                                  schema/partitioning.
    */
-  default Table getTable(CaseInsensitiveStringMap options, StructType schema) {
-    throw new UnsupportedOperationException(
-      this.getClass().getSimpleName() + " source does not support user-specified schema");
-  }
+  Table getTable(
+      Optional schema,
+      Optional partitions,
+      Map properties);

Review comment:
I'd like to discuss how the API should look like. The current use cases include
1. users only specify options, implementation needs to infer schema/partitioning
2. users specify options and schema, implementation needs to infer partitioning
3. users specify all the things.

Shall we create 3 methods or just create one single method like this?
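One way to picture the trade-off raised in the review comment above is a single abstract method plus thin convenience overloads for the first two use cases. The sketch below is only an illustration under stated assumptions: `SketchTable`, `SketchTableProvider`, and `FixedSchemaProvider` are hypothetical names, schemas and partitionings are modelled as plain lists of column names, and none of this is the API that the pull request proposes or that Spark ships.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical stand-in for a table; not Spark's Table interface.
final class SketchTable {
    final List<String> schema;      // column names only, for brevity
    final List<String> partitions;  // partition column names

    SketchTable(List<String> schema, List<String> partitions) {
        this.schema = schema;
        this.partitions = partitions;
    }
}

// Hypothetical provider: one abstract method, two convenience overloads.
interface SketchTableProvider {
    // Use case 3 (everything may be specified); an empty Optional means "infer it".
    SketchTable getTable(
        Optional<List<String>> schema,
        Optional<List<String>> partitions,
        Map<String, String> properties);

    // Use case 1: only options are given, so schema and partitioning are inferred.
    default SketchTable getTable(Map<String, String> properties) {
        return getTable(Optional.empty(), Optional.empty(), properties);
    }

    // Use case 2: options and schema are given, so only partitioning is inferred.
    default SketchTable getTable(List<String> schema, Map<String, String> properties) {
        return getTable(Optional.of(schema), Optional.empty(), properties);
    }
}

// Toy implementation whose "inference" is hard-coded.
final class FixedSchemaProvider implements SketchTableProvider {
    private static final List<String> ACTUAL_SCHEMA = Arrays.asList("id", "val");

    @Override
    public SketchTable getTable(
            Optional<List<String>> schema,
            Optional<List<String>> partitions,
            Map<String, String> properties) {
        // Fall back to the "inferred" schema when none is supplied.
        List<String> resolved = schema.orElse(ACTUAL_SCHEMA);
        if (!resolved.equals(ACTUAL_SCHEMA)) {
            throw new IllegalArgumentException(
                "given schema does not match the actual data schema");
        }
        return new SketchTable(resolved, partitions.orElse(Collections.emptyList()));
    }
}
```

With this shape, implementers override one method, while callers in use cases 1 and 2 avoid constructing empty `Optional` values themselves. Whether that is preferable to three independent abstract methods is exactly the open question in the review.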
[GitHub] [spark] AmplabJenkins removed a comment on issue #25651: [SPARK-28948][SQL] Support passing all Table metadata in TableProvider
AmplabJenkins removed a comment on issue #25651: [SPARK-28948][SQL] Support passing all Table metadata in TableProvider URL: https://github.com/apache/spark/pull/25651#issuecomment-532726418 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #25651: [SPARK-28948][SQL] Support passing all Table metadata in TableProvider
AmplabJenkins removed a comment on issue #25651: [SPARK-28948][SQL] Support passing all Table metadata in TableProvider URL: https://github.com/apache/spark/pull/25651#issuecomment-532726429 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/16034/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #25812: [SPARK-22796][PYTHON][ML] Add multiple columns support to PySpark QuantileDiscretizer
AmplabJenkins removed a comment on issue #25812: [SPARK-22796][PYTHON][ML] Add multiple columns support to PySpark QuantileDiscretizer URL: https://github.com/apache/spark/pull/25812#issuecomment-532726838 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #25812: [SPARK-22796][PYTHON][ML] Add multiple columns support to PySpark QuantileDiscretizer
AmplabJenkins removed a comment on issue #25812: [SPARK-22796][PYTHON][ML] Add multiple columns support to PySpark QuantileDiscretizer URL: https://github.com/apache/spark/pull/25812#issuecomment-532726845 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/110916/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #25812: [SPARK-22796][PYTHON][ML] Add multiple columns support to PySpark QuantileDiscretizer
AmplabJenkins commented on issue #25812: [SPARK-22796][PYTHON][ML] Add multiple columns support to PySpark QuantileDiscretizer URL: https://github.com/apache/spark/pull/25812#issuecomment-532726845 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/110916/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #25812: [SPARK-22796][PYTHON][ML] Add multiple columns support to PySpark QuantileDiscretizer
AmplabJenkins commented on issue #25812: [SPARK-22796][PYTHON][ML] Add multiple columns support to PySpark QuantileDiscretizer URL: https://github.com/apache/spark/pull/25812#issuecomment-532726838 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #25651: [SPARK-28948][SQL] Support passing all Table metadata in TableProvider
AmplabJenkins commented on issue #25651: [SPARK-28948][SQL] Support passing all Table metadata in TableProvider URL: https://github.com/apache/spark/pull/25651#issuecomment-532726429 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/16034/ Test PASSed.
[GitHub] [spark] SparkQA removed a comment on issue #25812: [SPARK-22796][PYTHON][ML] Add multiple columns support to PySpark QuantileDiscretizer
SparkQA removed a comment on issue #25812: [SPARK-22796][PYTHON][ML] Add multiple columns support to PySpark QuantileDiscretizer URL: https://github.com/apache/spark/pull/25812#issuecomment-532715432 **[Test build #110916 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/110916/testReport)** for PR 25812 at commit [`69ea569`](https://github.com/apache/spark/commit/69ea56900c389a5e0046050b0777e0d20284deb6).
[GitHub] [spark] AmplabJenkins commented on issue #25651: [SPARK-28948][SQL] Support passing all Table metadata in TableProvider
AmplabJenkins commented on issue #25651: [SPARK-28948][SQL] Support passing all Table metadata in TableProvider URL: https://github.com/apache/spark/pull/25651#issuecomment-532726418 Merged build finished. Test PASSed.
[GitHub] [spark] SparkQA commented on issue #25812: [SPARK-22796][PYTHON][ML] Add multiple columns support to PySpark QuantileDiscretizer
SparkQA commented on issue #25812: [SPARK-22796][PYTHON][ML] Add multiple columns support to PySpark QuantileDiscretizer URL: https://github.com/apache/spark/pull/25812#issuecomment-532726296 **[Test build #110916 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/110916/testReport)** for PR 25812 at commit [`69ea569`](https://github.com/apache/spark/commit/69ea56900c389a5e0046050b0777e0d20284deb6).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] phpisciuneri commented on a change in pull request #25818: [SPARK-29121][ML][MLLIB] Support for dot product operation on Vector(s)
phpisciuneri commented on a change in pull request #25818: [SPARK-29121][ML][MLLIB] Support for dot product operation on Vector(s)
URL: https://github.com/apache/spark/pull/25818#discussion_r325728399

## File path: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Vectors.scala ##
@@ -178,6 +178,13 @@ sealed trait Vector extends Serializable {
    */
   @Since("2.0.0")
   def argmax: Int
+
+  /**
+   * Calculate the dot product of this vector with another.
+   *
+   * If `size` does not match an [IllegalArgumentException] is thrown.
+   */
+  def dot(v: Vector): Double = BLAS.dot(this, v)

Review comment:
@srowen thanks. I added the annotation to each of the dot functions in ml and mllib. As a Scala coder, I am not very familiar with the PySpark and SparkR portions of the code base... Let me have a look.
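The contract stated in the Scaladoc above (sum of element-wise products, with an `IllegalArgumentException` on a size mismatch) can be sketched over plain dense arrays. This is a minimal illustration only: `DotSketch` is a hypothetical name, it works on `double[]` and not on Spark's `Vector`, and it does not reproduce what `BLAS.dot` does internally (for example, any handling of sparse vectors).

```java
// Hypothetical helper; a dense-array illustration of the documented contract.
final class DotSketch {
    private DotSketch() {}

    // Returns the sum of x[i] * y[i]; rejects inputs of different sizes.
    static double dot(double[] x, double[] y) {
        if (x.length != y.length) {
            throw new IllegalArgumentException(
                "vector sizes differ: " + x.length + " vs " + y.length);
        }
        double sum = 0.0;
        for (int i = 0; i < x.length; i++) {
            sum += x[i] * y[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        // 1*4 + 2*5 + 3*6 = 32
        System.out.println(dot(new double[] {1, 2, 3}, new double[] {4, 5, 6}));
    }
}
```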
[GitHub] [spark] gaborgsomogyi edited a comment on issue #25760: [SPARK-29054][SS] Invalidate Kafka consumer when new delegation token available
gaborgsomogyi edited a comment on issue #25760: [SPARK-29054][SS] Invalidate Kafka consumer when new delegation token available URL: https://github.com/apache/spark/pull/25760#issuecomment-532674237 Thanks guys, valid comments. Let me think it through and update it soon...
[GitHub] [spark] AmplabJenkins removed a comment on issue #25831: [SPARK-29122][SQL] Propagate all the SQL conf to executors in SQLQueryTestSuite
AmplabJenkins removed a comment on issue #25831: [SPARK-29122][SQL] Propagate all the SQL conf to executors in SQLQueryTestSuite URL: https://github.com/apache/spark/pull/25831#issuecomment-532719435 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #25831: [SPARK-29122][SQL] Propagate all the SQL conf to executors in SQLQueryTestSuite
AmplabJenkins removed a comment on issue #25831: [SPARK-29122][SQL] Propagate all the SQL conf to executors in SQLQueryTestSuite URL: https://github.com/apache/spark/pull/25831#issuecomment-532719441 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/16033/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #25831: [SPARK-29122][SQL] Propagate all the SQL conf to executors in SQLQueryTestSuite
AmplabJenkins commented on issue #25831: [SPARK-29122][SQL] Propagate all the SQL conf to executors in SQLQueryTestSuite URL: https://github.com/apache/spark/pull/25831#issuecomment-532719441 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/16033/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #25831: [SPARK-29122][SQL] Propagate all the SQL conf to executors in SQLQueryTestSuite
AmplabJenkins commented on issue #25831: [SPARK-29122][SQL] Propagate all the SQL conf to executors in SQLQueryTestSuite URL: https://github.com/apache/spark/pull/25831#issuecomment-532719435 Merged build finished. Test PASSed.
[GitHub] [spark] SparkQA commented on issue #25831: [SPARK-29122][SQL] Propagate all the SQL conf to executors in SQLQueryTestSuite
SparkQA commented on issue #25831: [SPARK-29122][SQL] Propagate all the SQL conf to executors in SQLQueryTestSuite URL: https://github.com/apache/spark/pull/25831#issuecomment-532718855 **[Test build #110918 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/110918/testReport)** for PR 25831 at commit [`e0d8a5e`](https://github.com/apache/spark/commit/e0d8a5e119b9319cefd5f6d93e51ed7f5e4bd60f).
[GitHub] [spark] maropu opened a new pull request #25831: [SPARK-29122][SQL] Propagate all the SQL conf to executors in SQLQueryTestSuite
maropu opened a new pull request #25831: [SPARK-29122][SQL] Propagate all the SQL conf to executors in SQLQueryTestSuite
URL: https://github.com/apache/spark/pull/25831

### What changes were proposed in this pull request?
This PR propagates all the SQL configurations to executors in `SQLQueryTestSuite`. When the propagation is enabled in the tests, the potential bug below becomes apparent:
```
CREATE TABLE num_data (id int, val decimal(38,10)) USING parquet;

select sum(udf(CAST(null AS Decimal(38,0 from range(1,4):
QueryOutput(select sum(udf(CAST(null AS Decimal(38,0 from range(1,4),struct<>,java.lang.IllegalArgumentException
[info] requirement failed: MutableProjection cannot use UnsafeRow for output data types: decimal(38,0)) (SQLQueryTestSuite.scala:380)
```
The root culprit is that `InterpretedMutableProjection` has incorrect validation in the interpreter mode: `validExprs.forall { case (e, _) => UnsafeRow.isFixedLength(e.dataType) }`. This validation should be the same as the condition (`isMutable`) in `HashAggregate.supportsAggregate`: https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/HashAggregateExec.scala#L1126

### Why are the changes needed?
Bug fixes.

### Does this PR introduce any user-facing change?
No

### How was this patch tested?
Added tests in `AggregationQuerySuite`
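The mismatch described in the pull request above can be pictured with a toy model of the two checks. The sketch rests on an assumption that the description implies but does not spell out: that the `isFixedLength`-style check rejects decimals whose precision does not fit in a long (modelled here as more than 18 digits), while the `isMutable`-style condition accepts every decimal. `ProjectionCheckSketch`, `SketchType`, and the type-name set are hypothetical; this is not Spark's `UnsafeRow` code.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

final class ProjectionCheckSketch {
    // Assumption: largest decimal precision that can be stored in a long.
    static final int MAX_LONG_DIGITS = 18;

    // Hypothetical set of fixed-width primitive type names.
    static final Set<String> FIXED_WIDTH =
        new HashSet<>(Arrays.asList("boolean", "int", "long", "double"));

    // Hypothetical type descriptor; precision only matters for decimals.
    static final class SketchType {
        final String name;
        final int precision;

        SketchType(String name, int precision) {
            this.name = name;
            this.precision = precision;
        }

        boolean isDecimal() {
            return name.equals("decimal");
        }
    }

    // Stricter check: decimals too wide for a long are rejected.
    static boolean isFixedLength(SketchType t) {
        if (t.isDecimal()) {
            return t.precision <= MAX_LONG_DIGITS;
        }
        return FIXED_WIDTH.contains(t.name);
    }

    // Looser check: every decimal is accepted, whatever its precision.
    static boolean isMutable(SketchType t) {
        return FIXED_WIDTH.contains(t.name) || t.isDecimal();
    }

    public static void main(String[] args) {
        SketchType wide = new SketchType("decimal", 38);
        System.out.println("isFixedLength: " + isFixedLength(wide));
        System.out.println("isMutable: " + isMutable(wide));
    }
}
```

Under these assumptions, a `decimal(38,0)` column passes the aggregate-side condition but fails the projection-side validation, which matches the `requirement failed` message quoted in the description.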
[GitHub] [spark] AmplabJenkins removed a comment on issue #25812: [SPARK-22796][PYTHON][ML] Add multiple columns support to PySpark QuantileDiscretizer
AmplabJenkins removed a comment on issue #25812: [SPARK-22796][PYTHON][ML] Add multiple columns support to PySpark QuantileDiscretizer URL: https://github.com/apache/spark/pull/25812#issuecomment-532716055 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #25826: [SPARK-29042][Core][BRANCH-2.4] Sampling-based RDD with unordered input should be INDETERMINATE
AmplabJenkins removed a comment on issue #25826: [SPARK-29042][Core][BRANCH-2.4] Sampling-based RDD with unordered input should be INDETERMINATE URL: https://github.com/apache/spark/pull/25826#issuecomment-532716076 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #25812: [SPARK-22796][PYTHON][ML] Add multiple columns support to PySpark QuantileDiscretizer
AmplabJenkins removed a comment on issue #25812: [SPARK-22796][PYTHON][ML] Add multiple columns support to PySpark QuantileDiscretizer URL: https://github.com/apache/spark/pull/25812#issuecomment-532716066 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/16032/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #25826: [SPARK-29042][Core][BRANCH-2.4] Sampling-based RDD with unordered input should be INDETERMINATE
AmplabJenkins removed a comment on issue #25826: [SPARK-29042][Core][BRANCH-2.4] Sampling-based RDD with unordered input should be INDETERMINATE URL: https://github.com/apache/spark/pull/25826#issuecomment-532716083 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/16031/ Test PASSed.
[GitHub] [spark] srowen closed pull request #25815: [SPARK-29118][ML] Avoid redundant computation in transform of GMM & GLR
srowen closed pull request #25815: [SPARK-29118][ML] Avoid redundant computation in transform of GMM & GLR URL: https://github.com/apache/spark/pull/25815
[GitHub] [spark] srowen commented on issue #25815: [SPARK-29118][ML] Avoid redundant computation in transform of GMM & GLR
srowen commented on issue #25815: [SPARK-29118][ML] Avoid redundant computation in transform of GMM & GLR URL: https://github.com/apache/spark/pull/25815#issuecomment-532716287 Merged to master
[GitHub] [spark] AmplabJenkins commented on issue #25826: [SPARK-29042][Core][BRANCH-2.4] Sampling-based RDD with unordered input should be INDETERMINATE
AmplabJenkins commented on issue #25826: [SPARK-29042][Core][BRANCH-2.4] Sampling-based RDD with unordered input should be INDETERMINATE URL: https://github.com/apache/spark/pull/25826#issuecomment-532716076 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #25812: [SPARK-22796][PYTHON][ML] Add multiple columns support to PySpark QuantileDiscretizer
AmplabJenkins commented on issue #25812: [SPARK-22796][PYTHON][ML] Add multiple columns support to PySpark QuantileDiscretizer URL: https://github.com/apache/spark/pull/25812#issuecomment-532716055 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #25812: [SPARK-22796][PYTHON][ML] Add multiple columns support to PySpark QuantileDiscretizer
AmplabJenkins commented on issue #25812: [SPARK-22796][PYTHON][ML] Add multiple columns support to PySpark QuantileDiscretizer URL: https://github.com/apache/spark/pull/25812#issuecomment-532716066 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/16032/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #25826: [SPARK-29042][Core][BRANCH-2.4] Sampling-based RDD with unordered input should be INDETERMINATE
AmplabJenkins commented on issue #25826: [SPARK-29042][Core][BRANCH-2.4] Sampling-based RDD with unordered input should be INDETERMINATE URL: https://github.com/apache/spark/pull/25826#issuecomment-532716083 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/16031/ Test PASSed.
[GitHub] [spark] srowen commented on a change in pull request #25818: [SPARK-29121][ML][MLLIB] Support for dot product operation on Vector(s)
srowen commented on a change in pull request #25818: [SPARK-29121][ML][MLLIB] Support for dot product operation on Vector(s)
URL: https://github.com/apache/spark/pull/25818#discussion_r325715899

## File path: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Vectors.scala ##
@@ -178,6 +178,13 @@ sealed trait Vector extends Serializable {
    */
   @Since("2.0.0")
   def argmax: Int
+
+  /**
+   * Calculate the dot product of this vector with another.
+   *
+   * If `size` does not match an [IllegalArgumentException] is thrown.
+   */
+  def dot(v: Vector): Double = BLAS.dot(this, v)

Review comment:
Add `@Since("3.0.0")`. I think this needs to be exposed in Pyspark and/or SparkR too?
[GitHub] [spark] SparkQA commented on issue #25812: [SPARK-22796][PYTHON][ML] Add multiple columns support to PySpark QuantileDiscretizer
SparkQA commented on issue #25812: [SPARK-22796][PYTHON][ML] Add multiple columns support to PySpark QuantileDiscretizer URL: https://github.com/apache/spark/pull/25812#issuecomment-532715432 **[Test build #110916 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/110916/testReport)** for PR 25812 at commit [`69ea569`](https://github.com/apache/spark/commit/69ea56900c389a5e0046050b0777e0d20284deb6).
[GitHub] [spark] SparkQA commented on issue #24575: [SPARK-27670][SQL]Add HA for HiveThriftServer2 based on HiveServer2.
SparkQA commented on issue #24575: [SPARK-27670][SQL]Add HA for HiveThriftServer2 based on HiveServer2. URL: https://github.com/apache/spark/pull/24575#issuecomment-532715522 **[Test build #110917 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/110917/testReport)** for PR 24575 at commit [`e98c844`](https://github.com/apache/spark/commit/e98c844aa3851d65fac9ea5a5a0581f52bf14077).
[GitHub] [spark] SparkQA commented on issue #25826: [SPARK-29042][Core][BRANCH-2.4] Sampling-based RDD with unordered input should be INDETERMINATE
SparkQA commented on issue #25826: [SPARK-29042][Core][BRANCH-2.4] Sampling-based RDD with unordered input should be INDETERMINATE URL: https://github.com/apache/spark/pull/25826#issuecomment-532715437 **[Test build #110915 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/110915/testReport)** for PR 25826 at commit [`e17dd66`](https://github.com/apache/spark/commit/e17dd6610b13883b446c0db7840171974bd9aef3).
[GitHub] [spark] srowen commented on issue #25829: [SPARK-29144][ML] Binarizer handle sparse vectors incorrectly with negative threshold
srowen commented on issue #25829: [SPARK-29144][ML] Binarizer handle sparse vectors incorrectly with negative threshold URL: https://github.com/apache/spark/pull/25829#issuecomment-532715181 I think the right answer is to return 1 for all of the implicit 0 entries when the threshold is < 0. Yes it makes it dense, but it's the right answer.
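The behavior suggested in the comment above can be sketched in plain Scala. The `Sparse` case class and `binarize` function are hypothetical illustrations, not the Spark `Binarizer` code; the sketch assumes the usual rule that values strictly greater than the threshold map to 1.0 and all others to 0.0.

```scala
object BinarizeSketch {
  /** Hypothetical sparse vector: `size` entries; indices not listed are implicit 0.0. */
  final case class Sparse(size: Int, indices: Array[Int], values: Array[Double])

  /** Returns a dense array holding 1.0 where the entry is > threshold, else 0.0. */
  def binarize(v: Sparse, threshold: Double): Array[Double] = {
    // An implicit zero exceeds a negative threshold, so it must become 1.0.
    val fill = if (0.0 > threshold) 1.0 else 0.0
    val out = Array.fill(v.size)(fill)
    var k = 0
    while (k < v.indices.length) {
      // Explicitly stored entries are compared against the threshold directly.
      out(v.indices(k)) = if (v.values(k) > threshold) 1.0 else 0.0
      k += 1
    }
    out
  }
}
```

With a negative threshold the result is mostly ones, which is why the output is dense, as the comment notes.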
[GitHub] [spark] HyukjinKwon commented on issue #25826: [SPARK-29042][Core][BRANCH-2.4] Sampling-based RDD with unordered input should be INDETERMINATE
HyukjinKwon commented on issue #25826: [SPARK-29042][Core][BRANCH-2.4] Sampling-based RDD with unordered input should be INDETERMINATE URL: https://github.com/apache/spark/pull/25826#issuecomment-532713979 retest this please
[GitHub] [spark] HyukjinKwon closed pull request #25820: [SPARK-29101][SQL] Fix count API for csv file when DROPMALFORMED mode is selected
HyukjinKwon closed pull request #25820: [SPARK-29101][SQL] Fix count API for csv file when DROPMALFORMED mode is selected URL: https://github.com/apache/spark/pull/25820
[GitHub] [spark] HyukjinKwon commented on issue #25820: [SPARK-29101][SQL] Fix count API for csv file when DROPMALFORMED mode is selected
HyukjinKwon commented on issue #25820: [SPARK-29101][SQL] Fix count API for csv file when DROPMALFORMED mode is selected URL: https://github.com/apache/spark/pull/25820#issuecomment-532712466 Merged to master.
[GitHub] [spark] HyukjinKwon closed pull request #25814: [SPARK-19926][PYSPARK] make captured exception from JVM side user friendly
HyukjinKwon closed pull request #25814: [SPARK-19926][PYSPARK] make captured exception from JVM side user friendly URL: https://github.com/apache/spark/pull/25814
[GitHub] [spark] HyukjinKwon commented on issue #25814: [SPARK-19926][PYSPARK] make captured exception from JVM side user friendly
HyukjinKwon commented on issue #25814: [SPARK-19926][PYSPARK] make captured exception from JVM side user friendly URL: https://github.com/apache/spark/pull/25814#issuecomment-532711956 Merged to master.
[GitHub] [spark] HyukjinKwon closed pull request #25716: [SPARK-29012][SQL] Support special timestamp values
HyukjinKwon closed pull request #25716: [SPARK-29012][SQL] Support special timestamp values URL: https://github.com/apache/spark/pull/25716
[GitHub] [spark] HyukjinKwon commented on issue #25716: [SPARK-29012][SQL] Support special timestamp values
HyukjinKwon commented on issue #25716: [SPARK-29012][SQL] Support special timestamp values URL: https://github.com/apache/spark/pull/25716#issuecomment-532711283 Merged to master.
[GitHub] [spark] srowen commented on issue #25789: [SPARK-28927][ML] Rethrow block mismatch exception in ALS when input data is nondeterministic
srowen commented on issue #25789: [SPARK-28927][ML] Rethrow block mismatch exception in ALS when input data is nondeterministic URL: https://github.com/apache/spark/pull/25789#issuecomment-532707660 Merged to master
[GitHub] [spark] srowen closed pull request #25789: [SPARK-28927][ML] Rethrow block mismatch exception in ALS when input data is nondeterministic
srowen closed pull request #25789: [SPARK-28927][ML] Rethrow block mismatch exception in ALS when input data is nondeterministic URL: https://github.com/apache/spark/pull/25789
[GitHub] [spark] srowen commented on issue #25821: [SPARK-29124][CORE] Use MurmurHash3 `bytesHash(data, seed)` instead of `bytesHash(data)`
srowen commented on issue #25821: [SPARK-29124][CORE] Use MurmurHash3 `bytesHash(data, seed)` instead of `bytesHash(data)` URL: https://github.com/apache/spark/pull/25821#issuecomment-532706603 See comment on other PR - yep I get the idea now, makes sense.
[GitHub] [spark] advancedxy commented on issue #25825: [SPARK-26713][CORE][2.4] Interrupt pipe IO threads in PipedRDD when task is finished
advancedxy commented on issue #25825: [SPARK-26713][CORE][2.4] Interrupt pipe IO threads in PipedRDD when task is finished URL: https://github.com/apache/spark/pull/25825#issuecomment-532706176 > Let's also mention the original PR in the description. Edited the description. And the tests passed, let's merge this then @cloud-fan ?
[GitHub] [spark] srowen commented on a change in pull request #25802: [SPARK-29095][ML] add extractInstances
srowen commented on a change in pull request #25802: [SPARK-29095][ML] add extractInstances
URL: https://github.com/apache/spark/pull/25802#discussion_r325702663

## File path: mllib/src/main/scala/org/apache/spark/ml/Predictor.scala ##

@@ -62,6 +62,40 @@ private[ml] trait PredictorParams extends Params
     }
     SchemaUtils.appendColumn(schema, $(predictionCol), DoubleType)
   }
+
+  /**
+   * Extract [[labelCol]], weightCol(if any) and [[featuresCol]] from the given dataset,
+   * and put it in an RDD with strong types.
+   */
+  protected def extractInstances(dataset: Dataset[_]): RDD[Instance] = {
+    val w = this match {
+      case p: HasWeightCol =>
+        if (isDefined(p.weightCol) && $(p.weightCol).nonEmpty) {
+          col($(p.weightCol)).cast(DoubleType)
+        } else {
+          lit(1.0)
+        }
+      case _ => lit(1.0)

Review comment:
If it doesn't have a weight column, does it mean there's no point in selecting lit(1.0) as a weight column as it will be unused? or do some algorithms not have a weight column but nevertheless have ways of using a weight?
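The fallback being questioned can be reduced to a type match on an optional trait. The sketch below uses hypothetical stand-ins (`Estimator`, `HasWeight`, `weightSource`) rather than Spark ML's `Params` machinery; it only shows which cases end up with the constant weight of 1.0.

```scala
object WeightSketch {
  // Hypothetical stand-ins for Spark ML's param traits; only the shape of the match matters.
  trait Estimator
  trait HasWeight { def weightCol: Option[String] }

  /** Name of the weight column to read, or None to fall back to a constant weight of 1.0. */
  def weightSource(e: Estimator): Option[String] = e match {
    // The estimator supports weights: use the column only if it is set and non-empty.
    case p: HasWeight => p.weightCol.filter(_.nonEmpty)
    // The estimator has no weight param at all: this is the branch the review asks about.
    case _ => None
  }
}
```

Both the "param unset or empty" case and the "no weight param" case return `None` here, which mirrors the two `lit(1.0)` branches in the diff.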
[GitHub] [spark] turboFei commented on issue #25795: [WIP][SPARK-29037][Core] Spark gives duplicate result when an application was killed
turboFei commented on issue #25795: [WIP][SPARK-29037][Core] Spark gives duplicate result when an application was killed URL: https://github.com/apache/spark/pull/25795#issuecomment-532704768 I have discussed with advancedxy offline, and I am clearly for the solution now. Thanks @advancedxy . I will complete this PR after https://github.com/apache/spark/pull/25739 is merged. Thanks again.
[GitHub] [spark] srowen commented on a change in pull request #25759: [SPARK-19147][CORE] Gracefully handle error in task after executor is stopped
srowen commented on a change in pull request #25759: [SPARK-19147][CORE] Gracefully handle error in task after executor is stopped
URL: https://github.com/apache/spark/pull/25759#discussion_r325700132

## File path: core/src/main/scala/org/apache/spark/executor/Executor.scala ##

@@ -624,37 +624,43 @@ private[spark] class Executor(
           execBackend.statusUpdate(taskId, TaskState.KILLED, ser.serialize(reason))

         case t: Throwable =>
-          // Attempt to exit cleanly by informing the driver of our failure.
-          // If anything goes wrong (or this was a fatal exception), we will delegate to
-          // the default uncaught exception handler, which will terminate the Executor.
-          logError(s"Exception in $taskName (TID $taskId)", t)
-
-          // SPARK-20904: Do not report failure to driver if if happened during shut down. Because
-          // libraries may set up shutdown hooks that race with running tasks during shutdown,
-          // spurious failures may occur and can result in improper accounting in the driver (e.g.
-          // the task failure would not be ignored if the shutdown happened because of premption,
-          // instead of an app issue).
-          if (!ShutdownHookManager.inShutdown()) {
-            val (accums, accUpdates) = collectAccumulatorsAndResetStatusOnFailure(taskStartTimeNs)
-            val metricPeaks = WrappedArray.make(metricsPoller.getTaskMetricPeaks(taskId))
-
-            val serializedTaskEndReason = {
-              try {
-                val ef = new ExceptionFailure(t, accUpdates).withAccums(accums)
-                  .withMetricPeaks(metricPeaks)
-                ser.serialize(ef)
-              } catch {
-                case _: NotSerializableException =>
-                  // t is not serializable so just send the stacktrace
-                  val ef = new ExceptionFailure(t, accUpdates, false).withAccums(accums)
+          if (env.isStopped) {

Review comment:
This is looking OK overall, to me. You might be able to avoid most of the diff due to indentation by only adding a single case:

```
case t: Throwable if env.isStopped =>
  logError(...)
case t: Throwable =>
  // unchanged
```
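The guarded-case suggestion in the review comment above can be tried in isolation. In this sketch the `stopped` flag stands in for `env.isStopped` and the returned strings stand in for logging and driver reporting; all names are hypothetical and none of this is the Executor code.

```scala
object GuardSketch {
  /** Runs `body` and returns a label for the branch that handled any failure. */
  def handle(stopped: Boolean)(body: => Unit): String =
    try {
      body
      "ok"
    } catch {
      // Guarded case first: taken only when the environment is already stopped.
      case t: Throwable if stopped => s"logged only: ${t.getMessage}"
      // Existing handling keeps its own case, so its body needs no extra indentation.
      case t: Throwable => s"reported: ${t.getMessage}"
    }
}
```

Because cases are tried in order, the guard must precede the unguarded `case t: Throwable`; otherwise the general case would match first and the guarded one would never run.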
[GitHub] [spark] srowen commented on issue #25689: [SPARK-28972][DOCS] Updating unit description in configurations, to maintain consistency
srowen commented on issue #25689: [SPARK-28972][DOCS] Updating unit description in configurations, to maintain consistency URL: https://github.com/apache/spark/pull/25689#issuecomment-532702739 Merged to master
[GitHub] [spark] srowen closed pull request #25689: [SPARK-28972][DOCS] Updating unit description in configurations, to maintain consistency
srowen closed pull request #25689: [SPARK-28972][DOCS] Updating unit description in configurations, to maintain consistency URL: https://github.com/apache/spark/pull/25689
[GitHub] [spark] AmplabJenkins removed a comment on issue #25820: [SPARK-29101][SQL] Fix count API for csv file when DROPMALFORMED mode is selected
AmplabJenkins removed a comment on issue #25820: [SPARK-29101][SQL] Fix count API for csv file when DROPMALFORMED mode is selected URL: https://github.com/apache/spark/pull/25820#issuecomment-532700198 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/110898/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #25820: [SPARK-29101][SQL] Fix count API for csv file when DROPMALFORMED mode is selected
AmplabJenkins removed a comment on issue #25820: [SPARK-29101][SQL] Fix count API for csv file when DROPMALFORMED mode is selected URL: https://github.com/apache/spark/pull/25820#issuecomment-532700190 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #25820: [SPARK-29101][SQL] Fix count API for csv file when DROPMALFORMED mode is selected
AmplabJenkins commented on issue #25820: [SPARK-29101][SQL] Fix count API for csv file when DROPMALFORMED mode is selected URL: https://github.com/apache/spark/pull/25820#issuecomment-532700190 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #25825: [SPARK-26713][CORE][2.4] Interrupt pipe IO threads in PipedRDD when task is finished
AmplabJenkins removed a comment on issue #25825: [SPARK-26713][CORE][2.4] Interrupt pipe IO threads in PipedRDD when task is finished URL: https://github.com/apache/spark/pull/25825#issuecomment-532699672 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/110896/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #25825: [SPARK-26713][CORE][2.4] Interrupt pipe IO threads in PipedRDD when task is finished
AmplabJenkins removed a comment on issue #25825: [SPARK-26713][CORE][2.4] Interrupt pipe IO threads in PipedRDD when task is finished URL: https://github.com/apache/spark/pull/25825#issuecomment-532699661 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #25820: [SPARK-29101][SQL] Fix count API for csv file when DROPMALFORMED mode is selected
AmplabJenkins commented on issue #25820: [SPARK-29101][SQL] Fix count API for csv file when DROPMALFORMED mode is selected URL: https://github.com/apache/spark/pull/25820#issuecomment-532700198 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/110898/ Test PASSed.
[GitHub] [spark] maropu commented on issue #25830: [SPARK-29140][SQL] Handle BinaryType of parameter properly in HashAggregateExec
maropu commented on issue #25830: [SPARK-29140][SQL] Handle BinaryType of parameter properly in HashAggregateExec URL: https://github.com/apache/spark/pull/25830#issuecomment-532700223 In the PR title, array types is more obvious than binary types?
[GitHub] [spark] AmplabJenkins commented on issue #25825: [SPARK-26713][CORE][2.4] Interrupt pipe IO threads in PipedRDD when task is finished
AmplabJenkins commented on issue #25825: [SPARK-26713][CORE][2.4] Interrupt pipe IO threads in PipedRDD when task is finished URL: https://github.com/apache/spark/pull/25825#issuecomment-532699661 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #25825: [SPARK-26713][CORE][2.4] Interrupt pipe IO threads in PipedRDD when task is finished
AmplabJenkins commented on issue #25825: [SPARK-26713][CORE][2.4] Interrupt pipe IO threads in PipedRDD when task is finished URL: https://github.com/apache/spark/pull/25825#issuecomment-532699672 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/110896/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #25690: [SPARK-27831][FOLLOW-UP][SQL][TEST][test-maven] Move Hive test jars to local file
AmplabJenkins removed a comment on issue #25690: [SPARK-27831][FOLLOW-UP][SQL][TEST][test-maven] Move Hive test jars to local file URL: https://github.com/apache/spark/pull/25690#issuecomment-532699223 Merged build finished. Test PASSed.
[GitHub] [spark] SparkQA removed a comment on issue #25825: [SPARK-26713][CORE][2.4] Interrupt pipe IO threads in PipedRDD when task is finished
SparkQA removed a comment on issue #25825: [SPARK-26713][CORE][2.4] Interrupt pipe IO threads in PipedRDD when task is finished URL: https://github.com/apache/spark/pull/25825#issuecomment-532584868 **[Test build #110896 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/110896/testReport)** for PR 25825 at commit [`6ee8d0d`](https://github.com/apache/spark/commit/6ee8d0d6aaddb8185122b9389155b64c102623d0).
[GitHub] [spark] AmplabJenkins removed a comment on issue #25690: [SPARK-27831][FOLLOW-UP][SQL][TEST][test-maven] Move Hive test jars to local file
AmplabJenkins removed a comment on issue #25690: [SPARK-27831][FOLLOW-UP][SQL][TEST][test-maven] Move Hive test jars to local file URL: https://github.com/apache/spark/pull/25690#issuecomment-532699238 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/16030/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #25728: [SPARK-29020][WIP][SQL] Improving array_sort behaviour
AmplabJenkins removed a comment on issue #25728: [SPARK-29020][WIP][SQL] Improving array_sort behaviour URL: https://github.com/apache/spark/pull/25728#issuecomment-532699304 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/16029/ Test PASSed.
[GitHub] [spark] SparkQA removed a comment on issue #25820: [SPARK-29101][SQL] Fix count API for csv file when DROPMALFORMED mode is selected
SparkQA removed a comment on issue #25820: [SPARK-29101][SQL] Fix count API for csv file when DROPMALFORMED mode is selected URL: https://github.com/apache/spark/pull/25820#issuecomment-532587821 **[Test build #110898 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/110898/testReport)** for PR 25820 at commit [`f2c25f0`](https://github.com/apache/spark/commit/f2c25f068b29e2a71a7e4eacaa075e67001a2652).
[GitHub] [spark] AmplabJenkins removed a comment on issue #25728: [SPARK-29020][WIP][SQL] Improving array_sort behaviour
AmplabJenkins removed a comment on issue #25728: [SPARK-29020][WIP][SQL] Improving array_sort behaviour URL: https://github.com/apache/spark/pull/25728#issuecomment-532699294 Merged build finished. Test PASSed.
[GitHub] [spark] SparkQA commented on issue #25820: [SPARK-29101][SQL] Fix count API for csv file when DROPMALFORMED mode is selected
SparkQA commented on issue #25820: [SPARK-29101][SQL] Fix count API for csv file when DROPMALFORMED mode is selected URL: https://github.com/apache/spark/pull/25820#issuecomment-532699395 **[Test build #110898 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/110898/testReport)** for PR 25820 at commit [`f2c25f0`](https://github.com/apache/spark/commit/f2c25f068b29e2a71a7e4eacaa075e67001a2652). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] AmplabJenkins commented on issue #25728: [SPARK-29020][WIP][SQL] Improving array_sort behaviour
AmplabJenkins commented on issue #25728: [SPARK-29020][WIP][SQL] Improving array_sort behaviour URL: https://github.com/apache/spark/pull/25728#issuecomment-532699304 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/16029/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #25728: [SPARK-29020][WIP][SQL] Improving array_sort behaviour
AmplabJenkins commented on issue #25728: [SPARK-29020][WIP][SQL] Improving array_sort behaviour URL: https://github.com/apache/spark/pull/25728#issuecomment-532699294 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #25690: [SPARK-27831][FOLLOW-UP][SQL][TEST][test-maven] Move Hive test jars to local file
AmplabJenkins commented on issue #25690: [SPARK-27831][FOLLOW-UP][SQL][TEST][test-maven] Move Hive test jars to local file URL: https://github.com/apache/spark/pull/25690#issuecomment-532699223 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #25690: [SPARK-27831][FOLLOW-UP][SQL][TEST][test-maven] Move Hive test jars to local file
AmplabJenkins commented on issue #25690: [SPARK-27831][FOLLOW-UP][SQL][TEST][test-maven] Move Hive test jars to local file URL: https://github.com/apache/spark/pull/25690#issuecomment-532699238 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/16030/ Test PASSed.
[GitHub] [spark] maropu commented on a change in pull request #25830: [SPARK-29140][SQL] Handle BinaryType of parameter properly in HashAggregateExec
maropu commented on a change in pull request #25830: [SPARK-29140][SQL] Handle BinaryType of parameter properly in HashAggregateExec
URL: https://github.com/apache/spark/pull/25830#discussion_r325694088

## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/HashAggregateExec.scala ##
@@ -392,6 +394,14 @@ case class HashAggregateExec(
      """.stripMargin
   }

+  private def typeNameForCodegen(clazz: Class[_]): String = {

Review comment: It might be better to move this helper function to `CodeGenerator`.
[GitHub] [spark] SparkQA commented on issue #25825: [SPARK-26713][CORE][2.4] Interrupt pipe IO threads in PipedRDD when task is finished
SparkQA commented on issue #25825: [SPARK-26713][CORE][2.4] Interrupt pipe IO threads in PipedRDD when task is finished URL: https://github.com/apache/spark/pull/25825#issuecomment-532699049 **[Test build #110896 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/110896/testReport)** for PR 25825 at commit [`6ee8d0d`](https://github.com/apache/spark/commit/6ee8d0d6aaddb8185122b9389155b64c102623d0). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] SparkQA commented on issue #25811: [SPARK-29111][CORE] Support snapshot/restore on KVStore
SparkQA commented on issue #25811: [SPARK-29111][CORE] Support snapshot/restore on KVStore URL: https://github.com/apache/spark/pull/25811#issuecomment-532698564 **[Test build #110912 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/110912/testReport)** for PR 25811 at commit [`9b63b05`](https://github.com/apache/spark/commit/9b63b054d7a5d49f635c28c55f4d7da97c8bffba).
[GitHub] [spark] SparkQA commented on issue #25728: [SPARK-29020][WIP][SQL] Improving array_sort behaviour
SparkQA commented on issue #25728: [SPARK-29020][WIP][SQL] Improving array_sort behaviour URL: https://github.com/apache/spark/pull/25728#issuecomment-532698559 **[Test build #110913 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/110913/testReport)** for PR 25728 at commit [`7ad574a`](https://github.com/apache/spark/commit/7ad574a4df899be1f66be70ef955af83b8440ac0).
[GitHub] [spark] SparkQA commented on issue #25690: [SPARK-27831][FOLLOW-UP][SQL][TEST][test-maven] Move Hive test jars to local file
SparkQA commented on issue #25690: [SPARK-27831][FOLLOW-UP][SQL][TEST][test-maven] Move Hive test jars to local file URL: https://github.com/apache/spark/pull/25690#issuecomment-532698517 **[Test build #110914 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/110914/testReport)** for PR 25690 at commit [`36c394c`](https://github.com/apache/spark/commit/36c394ce08e6cac1e32176c684eac0c9d1615831).
[GitHub] [spark] maropu commented on a change in pull request #25830: [SPARK-29140][SQL] Handle BinaryType of parameter properly in HashAggregateExec
maropu commented on a change in pull request #25830: [SPARK-29140][SQL] Handle BinaryType of parameter properly in HashAggregateExec
URL: https://github.com/apache/spark/pull/25830#discussion_r325693305

## File path: sql/core/src/test/scala/org/apache/spark/sql/execution/aggregate/HashAggregateSuite.scala ##
@@ -0,0 +1,47 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.aggregate
+
+import org.apache.spark.sql.Row
+import org.apache.spark.sql.functions._
+import org.apache.spark.sql.internal.SQLConf
+import org.apache.spark.sql.test.SharedSparkSession
+import org.apache.spark.sql.types._
+
+class HashAggregateSuite extends SharedSparkSession {
+
+  import testImplicits._
+
+  test("SPARK-29140 HashAggregateExec aggregating binary type doesn't break codegen compilation") {
+    val withDistinct = countDistinct($"c1")
+
+    val schema = new StructType().add("c1", BinaryType, nullable = true)
+    val schemaWithId = StructType(StructField("id", IntegerType, nullable = false) +: schema.fields)
+
+    withSQLConf(
+        SQLConf.CODEGEN_SPLIT_AGGREGATE_FUNC.key -> "true",
+        SQLConf.CODEGEN_METHOD_SPLIT_THRESHOLD.key -> "1") {
+      val emptyRows = spark.sparkContext.parallelize(Seq.empty[Row], 1)
+      val aggDf = spark.createDataFrame(emptyRows, schemaWithId)
+        .groupBy($"id" % 10 as "group")
+        .agg(withDistinct)
+        .orderBy("group")
+      aggDf.collect().toSeq

Review comment: Please check the result.
[GitHub] [spark] maropu commented on a change in pull request #25830: [SPARK-29140][SQL] Handle BinaryType of parameter properly in HashAggregateExec
maropu commented on a change in pull request #25830: [SPARK-29140][SQL] Handle BinaryType of parameter properly in HashAggregateExec
URL: https://github.com/apache/spark/pull/25830#discussion_r325692826

## File path: sql/core/src/test/scala/org/apache/spark/sql/execution/aggregate/HashAggregateSuite.scala ##
@@ -0,0 +1,47 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.aggregate
+
+import org.apache.spark.sql.Row
+import org.apache.spark.sql.functions._
+import org.apache.spark.sql.internal.SQLConf
+import org.apache.spark.sql.test.SharedSparkSession
+import org.apache.spark.sql.types._
+
+class HashAggregateSuite extends SharedSparkSession {
+
+  import testImplicits._
+
+  test("SPARK-29140 HashAggregateExec aggregating binary type doesn't break codegen compilation") {
+    val withDistinct = countDistinct($"c1")

Review comment: Move to `AggregationQuerySuite`?
[GitHub] [spark] wangyum commented on issue #25690: [SPARK-27831][FOLLOW-UP][SQL][TEST][test-maven] Move Hive test jars to local file
wangyum commented on issue #25690: [SPARK-27831][FOLLOW-UP][SQL][TEST][test-maven] Move Hive test jars to local file URL: https://github.com/apache/spark/pull/25690#issuecomment-532698008 retest this please
[GitHub] [spark] HeartSaVioR commented on issue #25811: [SPARK-29111][CORE] Support snapshot/restore on KVStore
HeartSaVioR commented on issue #25811: [SPARK-29111][CORE] Support snapshot/restore on KVStore URL: https://github.com/apache/spark/pull/25811#issuecomment-532697915 retest this, please
[GitHub] [spark] HeartSaVioR commented on issue #25811: [SPARK-29111][CORE] Support snapshot/restore on KVStore
HeartSaVioR commented on issue #25811: [SPARK-29111][CORE] Support snapshot/restore on KVStore URL: https://github.com/apache/spark/pull/25811#issuecomment-532697859 Known flaky test: SPARK-23197. Not relevant to this patch.
[GitHub] [spark] AmplabJenkins commented on issue #25814: [SPARK-19926][PYSPARK] make captured exception from JVM side user friendly
AmplabJenkins commented on issue #25814: [SPARK-19926][PYSPARK] make captured exception from JVM side user friendly URL: https://github.com/apache/spark/pull/25814#issuecomment-532697186 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #25814: [SPARK-19926][PYSPARK] make captured exception from JVM side user friendly
AmplabJenkins removed a comment on issue #25814: [SPARK-19926][PYSPARK] make captured exception from JVM side user friendly URL: https://github.com/apache/spark/pull/25814#issuecomment-532697186 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #25814: [SPARK-19926][PYSPARK] make captured exception from JVM side user friendly
AmplabJenkins removed a comment on issue #25814: [SPARK-19926][PYSPARK] make captured exception from JVM side user friendly URL: https://github.com/apache/spark/pull/25814#issuecomment-532697198 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/110910/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #25814: [SPARK-19926][PYSPARK] make captured exception from JVM side user friendly
AmplabJenkins commented on issue #25814: [SPARK-19926][PYSPARK] make captured exception from JVM side user friendly URL: https://github.com/apache/spark/pull/25814#issuecomment-532697198 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/110910/ Test PASSed.
[GitHub] [spark] SparkQA removed a comment on issue #25814: [SPARK-19926][PYSPARK] make captured exception from JVM side user friendly
SparkQA removed a comment on issue #25814: [SPARK-19926][PYSPARK] make captured exception from JVM side user friendly URL: https://github.com/apache/spark/pull/25814#issuecomment-532685386 **[Test build #110910 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/110910/testReport)** for PR 25814 at commit [`9e4e7e9`](https://github.com/apache/spark/commit/9e4e7e98cf4cca4248afb81c34168eef63fbd5cf).
[GitHub] [spark] SparkQA commented on issue #25814: [SPARK-19926][PYSPARK] make captured exception from JVM side user friendly
SparkQA commented on issue #25814: [SPARK-19926][PYSPARK] make captured exception from JVM side user friendly URL: https://github.com/apache/spark/pull/25814#issuecomment-532696727 **[Test build #110910 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/110910/testReport)** for PR 25814 at commit [`9e4e7e9`](https://github.com/apache/spark/commit/9e4e7e98cf4cca4248afb81c34168eef63fbd5cf). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] AmplabJenkins removed a comment on issue #25690: [SPARK-27831][FOLLOW-UP][SQL][TEST][test-maven] Move Hive test jars to local file
AmplabJenkins removed a comment on issue #25690: [SPARK-27831][FOLLOW-UP][SQL][TEST][test-maven] Move Hive test jars to local file URL: https://github.com/apache/spark/pull/25690#issuecomment-532695137 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/110874/ Test FAILed.
[GitHub] [spark] xuanyuanking commented on a change in pull request #25768: [SPARK-29063][SQL] Modify fillValue approach to support joined dataframe
xuanyuanking commented on a change in pull request #25768: [SPARK-29063][SQL] Modify fillValue approach to support joined dataframe
URL: https://github.com/apache/spark/pull/25768#discussion_r325689461

## File path: sql/core/src/main/scala/org/apache/spark/sql/DataFrameNaFunctions.scala ##
@@ -497,12 +497,10 @@ final class DataFrameNaFunctions private[sql](df: DataFrame) {
           throw new IllegalArgumentException(s"$targetType is not matched at fillValue")
       }
       // Only fill if the column is part of the cols list.
-      if (typeMatches && cols.exists(col => columnEquals(f.name, col))) {
-        fillCol[T](f, value)
-      } else {
-        df.col(f.name)
-      }
+      typeMatches && cols.exists(col => columnEquals(f.name, col))
+    }.map { col =>
+      (col.name, fillCol[T](col, value))
     }
-    df.select(projections : _*)
+    df.withColumns(fillColumnsInfo.map(_._1), fillColumnsInfo.map(_._2))

Review comment: Yes. In the new approach, we only pass in the columns found among the existing fields, and `withColumns` replaces those existing columns in place, preserving the original column order.
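The ordering behaviour described in the review comment above can be modelled without Spark. The following is a minimal sketch using plain Scala collections, not the code from the pull request: `FillSketch`, its `Row` alias, and both helper methods are hypothetical names introduced only for illustration, and the local `withColumns` is a stand-in that assumes the "replace in place, keep position" semantics the comment states.

```scala
// Hypothetical model of one row: an ordered list of (column name, nullable value).
object FillSketch {
  type Row = Vector[(String, Option[Int])]

  // Stand-in for the behaviour described in the comment: replace only the
  // named columns and leave every column at its original position.
  def withColumns(row: Row, replacements: Map[String, Option[Int]]): Row =
    row.map { case (name, value) => name -> replacements.getOrElse(name, value) }

  // Fill missing values (None) only in requested columns that actually exist.
  // Requested names that are not present in the row are silently ignored.
  def fill(row: Row, cols: Seq[String], value: Int): Row = {
    val fillColumnsInfo = row.collect {
      case (name, v) if cols.contains(name) => name -> v.orElse(Some(value))
    }.toMap
    withColumns(row, fillColumnsInfo)
  }
}
```

Because only matching columns are collected and then substituted by name, the result keeps the same column names in the same order as the input, which is the property the comment relies on.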
[GitHub] [spark] AmplabJenkins removed a comment on issue #25690: [SPARK-27831][FOLLOW-UP][SQL][TEST][test-maven] Move Hive test jars to local file
AmplabJenkins removed a comment on issue #25690: [SPARK-27831][FOLLOW-UP][SQL][TEST][test-maven] Move Hive test jars to local file URL: https://github.com/apache/spark/pull/25690#issuecomment-532695131 Merged build finished. Test FAILed.
[GitHub] [spark] AmplabJenkins commented on issue #25690: [SPARK-27831][FOLLOW-UP][SQL][TEST][test-maven] Move Hive test jars to local file
AmplabJenkins commented on issue #25690: [SPARK-27831][FOLLOW-UP][SQL][TEST][test-maven] Move Hive test jars to local file URL: https://github.com/apache/spark/pull/25690#issuecomment-532695137 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/110874/ Test FAILed.
[GitHub] [spark] AmplabJenkins commented on issue #25690: [SPARK-27831][FOLLOW-UP][SQL][TEST][test-maven] Move Hive test jars to local file
AmplabJenkins commented on issue #25690: [SPARK-27831][FOLLOW-UP][SQL][TEST][test-maven] Move Hive test jars to local file URL: https://github.com/apache/spark/pull/25690#issuecomment-532695131 Merged build finished. Test FAILed.
[GitHub] [spark] SparkQA removed a comment on issue #25690: [SPARK-27831][FOLLOW-UP][SQL][TEST][test-maven] Move Hive test jars to local file
SparkQA removed a comment on issue #25690: [SPARK-27831][FOLLOW-UP][SQL][TEST][test-maven] Move Hive test jars to local file URL: https://github.com/apache/spark/pull/25690#issuecomment-532552916 **[Test build #110874 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/110874/testReport)** for PR 25690 at commit [`36c394c`](https://github.com/apache/spark/commit/36c394ce08e6cac1e32176c684eac0c9d1615831).
[GitHub] [spark] colinmjj commented on a change in pull request #25759: [SPARK-19147][CORE] Gracefully handle error in task after executor is stopped
colinmjj commented on a change in pull request #25759: [SPARK-19147][CORE] Gracefully handle error in task after executor is stopped
URL: https://github.com/apache/spark/pull/25759#discussion_r325688306

## File path: core/src/test/scala/org/apache/spark/executor/ExecutorSuite.scala ##
@@ -246,6 +246,46 @@ class ExecutorSuite extends SparkFunSuite
     heartbeatZeroAccumulatorUpdateTest(false)
   }

+  test("SPARK-19147: Gracefully handle error in task after executor is stopped") {

Review comment: I removed this test case after working out how metrics work. Thanks for the review.
[GitHub] [spark] SparkQA commented on issue #25690: [SPARK-27831][FOLLOW-UP][SQL][TEST][test-maven] Move Hive test jars to local file
SparkQA commented on issue #25690: [SPARK-27831][FOLLOW-UP][SQL][TEST][test-maven] Move Hive test jars to local file URL: https://github.com/apache/spark/pull/25690#issuecomment-532694695 **[Test build #110874 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/110874/testReport)** for PR 25690 at commit [`36c394c`](https://github.com/apache/spark/commit/36c394ce08e6cac1e32176c684eac0c9d1615831). * This patch **fails from timeout after a configured wait of `400m`**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] colinmjj commented on a change in pull request #25759: [SPARK-19147][CORE] Gracefully handle error in task after executor is stopped
colinmjj commented on a change in pull request #25759: [SPARK-19147][CORE] Gracefully handle error in task after executor is stopped
URL: https://github.com/apache/spark/pull/25759#discussion_r325687758

## File path: core/src/main/scala/org/apache/spark/executor/Executor.scala ##
@@ -604,6 +604,21 @@ private[spark] class Executor(
           val serializedTK = ser.serialize(TaskKilled(killReason, accUpdates, accums, metricPeaks))
           execBackend.statusUpdate(taskId, TaskState.KILLED, serializedTK)

+        // When put the task in the pool, executor.stop may be called before task.run.
+        // The exception will be thrown from the task becauseof the unexpected status,
+        // see: SPARK-19147, here is to process the exception after executor.stop
+        // as the excepted exception.
+        case t: Throwable if !isLocal && env.isStopped =>

Review comment: @srowen @squito, thanks for the comments. I checked the code again and worked out how metrics and heartbeats work. You're right: reporting metrics is meaningless after executor.close(), because the heartbeat no longer works. I updated the PR so that the exception is handled in the "case t: Throwable =>" branch, with logging only.
[GitHub] [spark] PavithraRamachandran commented on issue #25689: [SPARK-28972][DOCS] Updating unit description in configurations, to maintain consistency
PavithraRamachandran commented on issue #25689: [SPARK-28972][DOCS] Updating unit description in configurations, to maintain consistency URL: https://github.com/apache/spark/pull/25689#issuecomment-532693849 @srowen @kiszk @dongjoon-hyun I have addressed the review comments. Could you take another look?
[GitHub] [spark] AmplabJenkins removed a comment on issue #25404: [SPARK-28683][BUILD][test-hadoop3.2][test-maven] Upgrade Scala to 2.12.10
AmplabJenkins removed a comment on issue #25404: [SPARK-28683][BUILD][test-hadoop3.2][test-maven] Upgrade Scala to 2.12.10 URL: https://github.com/apache/spark/pull/25404#issuecomment-532692535 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/16028/ Test PASSed.