[GitHub] [spark] SaurabhChawla100 commented on pull request #29045: [SPARK-32234][SQL] Spark sql commands are failing on selecting the orc tables
SaurabhChawla100 commented on pull request #29045: URL: https://github.com/apache/spark/pull/29045#issuecomment-658561486 Retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] dongjoon-hyun edited a comment on pull request #29118: [SPARK-32318][SQL][TESTS] Add a test case to EliminateSortsSuite for ORDER BY in DISTRIBUTE BY
dongjoon-hyun edited a comment on pull request #29118: URL: https://github.com/apache/spark/pull/29118#issuecomment-658560629 Also, cc @cloud-fan , @HyukjinKwon , @maropu This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] dongjoon-hyun edited a comment on pull request #29118: [SPARK-32318][SQL][TESTS] Add a test case to EliminateSortsSuite for ORDER BY in DISTRIBUTE BY
dongjoon-hyun edited a comment on pull request #29118: URL: https://github.com/apache/spark/pull/29118#issuecomment-658560339 Could you review this, @viirya ? This will protect us from the future regression. This part is tricky. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] dongjoon-hyun commented on pull request #29118: [SPARK-32318][SQL][TESTS] Add a test case to EliminateSortsSuite for ORDER BY in DISTRIBUTE BY
dongjoon-hyun commented on pull request #29118: URL: https://github.com/apache/spark/pull/29118#issuecomment-658560629 Also, cc @cloud-fan and @HyukjinKwon . This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] dongjoon-hyun commented on pull request #29118: [SPARK-32318][SQL][TESTS] Add a test case to EliminateSortsSuite for ORDER BY in DISTRIBUTE BY
dongjoon-hyun commented on pull request #29118: URL: https://github.com/apache/spark/pull/29118#issuecomment-658560339 Could you review this, @viirya ? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] dongjoon-hyun edited a comment on pull request #29089: [SPARK-32276][SQL] Remove redundant sorts before repartition nodes
dongjoon-hyun edited a comment on pull request #29089: URL: https://github.com/apache/spark/pull/29089#issuecomment-658559706 The most big factor is file formats instead of Spark side. For example, in the above example, ORC files are small because it supports a special encoding when the input data is sorted with a fixed increment. For Parquet files, the result will be different. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] dongjoon-hyun edited a comment on pull request #29089: [SPARK-32276][SQL] Remove redundant sorts before repartition nodes
dongjoon-hyun edited a comment on pull request #29089: URL: https://github.com/apache/spark/pull/29089#issuecomment-658559706 No~ It depends on file formats instead of Spark side. For example, in the above example, ORC files are small because it supports a special encoding when the input data is sorted with a fixed increment. For Parquet files, the result will be different. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] dongjoon-hyun commented on pull request #29089: [SPARK-32276][SQL] Remove redundant sorts before repartition nodes
dongjoon-hyun commented on pull request #29089: URL: https://github.com/apache/spark/pull/29089#issuecomment-658559706 No~ It depends on file formats instead of Spark side. For example, in the above example, ORC files are small because it supports a special encoding when the data is sorted with a fixed increment. For Parquet files, the result will be different. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] dongjoon-hyun commented on pull request #29089: [SPARK-32276][SQL] Remove redundant sorts before repartition nodes
dongjoon-hyun commented on pull request #29089: URL: https://github.com/apache/spark/pull/29089#issuecomment-658558813 I made a PR to add a test coverage for the above case. - https://github.com/apache/spark/pull/29118 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] viirya commented on pull request #29089: [SPARK-32276][SQL] Remove redundant sorts before repartition nodes
viirya commented on pull request #29089: URL: https://github.com/apache/spark/pull/29089#issuecomment-658558946 Oh, this is interesting. I know removing `Sort` before `Repartition` will result in different data distribution because `Repartition` uses `RoundRobinPartitioning`. Because I think repartition doesn't guarantee shuffled data distribution, so I thought it is okay. Now seems different data distribution causes difference storage output size. I think it is because to repartition sorted data using `RoundRobinPartitioning` can generate more compact output. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] dongjoon-hyun opened a new pull request #29118: [SPARK-32318][SQL][TESTS] Add a test case to EliminateSortsSuite for ORDER BY in DISTRIBUTE BY
dongjoon-hyun opened a new pull request #29118: URL: https://github.com/apache/spark/pull/29118 ### What changes were proposed in this pull request? This PR aims to add a test case to EliminateSortsSuite to protect a valid use case which is using ORDER BY in DISTRIBUTE BY statement. ### Why are the changes needed? ``` scala> scala.util.Random.shuffle((1 to 10).map(x => (x % 2, x))).toDF("a", "b").repartition(2).createOrReplaceTempView("t") scala> sql("select * from (select * from t order by b) distribute by a").write.orc("/tmp/master") $ ls -al /tmp/master/ total 56 drwxr-xr-x 10 dongjoon wheel 320 Jul 14 22:12 ./ drwxrwxrwt 15 root wheel 480 Jul 14 22:12 ../ -rw-r--r-- 1 dongjoon wheel8 Jul 14 22:12 ._SUCCESS.crc -rw-r--r-- 1 dongjoon wheel 12 Jul 14 22:12 .part-0-2cd3a50e-eded-49a4-b7cf-94e3f090b8c1-c000.snappy.orc.crc -rw-r--r-- 1 dongjoon wheel 16 Jul 14 22:12 .part-00043-2cd3a50e-eded-49a4-b7cf-94e3f090b8c1-c000.snappy.orc.crc -rw-r--r-- 1 dongjoon wheel 16 Jul 14 22:12 .part-00191-2cd3a50e-eded-49a4-b7cf-94e3f090b8c1-c000.snappy.orc.crc -rw-r--r-- 1 dongjoon wheel0 Jul 14 22:12 _SUCCESS -rw-r--r-- 1 dongjoon wheel 119 Jul 14 22:12 part-0-2cd3a50e-eded-49a4-b7cf-94e3f090b8c1-c000.snappy.orc -rw-r--r-- 1 dongjoon wheel 932 Jul 14 22:12 part-00043-2cd3a50e-eded-49a4-b7cf-94e3f090b8c1-c000.snappy.orc -rw-r--r-- 1 dongjoon wheel 939 Jul 14 22:12 part-00191-2cd3a50e-eded-49a4-b7cf-94e3f090b8c1-c000.snappy.orc ``` If we remove the inner `ORDER BY`, the file size increases. ``` scala> scala.util.Random.shuffle((1 to 10).map(x => (x % 2, x))).toDF("a", "b").repartition(2).createOrReplaceTempView("t") scala> sql("select * from (select * from t order by b) distribute by a").write.orc("/tmp/SPARK-32276") $ ls -al /tmp/SPARK-32276/ total 632 drwxr-xr-x 10 dongjoon wheel 320 Jul 14 22:08 ./ drwxrwxrwt 14 root wheel 448 Jul 14 22:08 ../ -rw-r--r-- 1 dongjoon wheel 8 Jul 14 22:08 ._SUCCESS.crc -rw-r--r-- 1 dongjoon wheel 12 Jul 14 22:08 .part-0-ba5049f9-b835-49b7-9fdb-bdd11b9891cb-c000.snappy.orc.crc -rw-r--r-- 1 dongjoon wheel1188 Jul 14 22:08 .part-00043-ba5049f9-b835-49b7-9fdb-bdd11b9891cb-c000.snappy.orc.crc -rw-r--r-- 1 dongjoon wheel1188 Jul 14 22:08 .part-00191-ba5049f9-b835-49b7-9fdb-bdd11b9891cb-c000.snappy.orc.crc -rw-r--r-- 1 dongjoon wheel 0 Jul 14 22:08 _SUCCESS -rw-r--r-- 1 dongjoon wheel 119 Jul 14 22:08 part-0-ba5049f9-b835-49b7-9fdb-bdd11b9891cb-c000.snappy.orc -rw-r--r-- 1 dongjoon wheel 150735 Jul 14 22:08 part-00043-ba5049f9-b835-49b7-9fdb-bdd11b9891cb-c000.snappy.orc -rw-r--r-- 1 dongjoon wheel 150741 Jul 14 22:08 part-00191-ba5049f9-b835-49b7-9fdb-bdd11b9891cb-c000.snappy.orc ``` ### Does this PR introduce _any_ user-facing change? No. This only improves the test coverage. ### How was this patch tested? Pass the GitHub Action or Jenkins. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] viirya commented on pull request #29080: [SPARK-32271][ML] Update CrossValidator to train folds in parallel
viirya commented on pull request #29080: URL: https://github.com/apache/spark/pull/29080#issuecomment-658556814 Do you read the above too links? The current approach is repeated random sub-sampling validation, this PR changes to k-fold cross-validation. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] viirya edited a comment on pull request #29080: [SPARK-32271][ML] Update CrossValidator to train folds in parallel
viirya edited a comment on pull request #29080: URL: https://github.com/apache/spark/pull/29080#issuecomment-658556814 Do you read the above two links? The current approach is repeated random sub-sampling validation, this PR changes to k-fold cross-validation. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #27694: [SPARK-30946][SS] Serde entry via DataInputStream/DataOutputStream with LZ4 compression on FileStream(Source/Sink)Log
SparkQA commented on pull request #27694: URL: https://github.com/apache/spark/pull/27694#issuecomment-658555806 **[Test build #125876 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125876/testReport)** for PR 27694 at commit [`86131af`](https://github.com/apache/spark/commit/86131afcf995fee64a629a7a440f03df8cabdd48). * This patch **fails PySpark pip packaging tests**. * This patch merges cleanly. * This patch adds no public classes. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA removed a comment on pull request #27694: [SPARK-30946][SS] Serde entry via DataInputStream/DataOutputStream with LZ4 compression on FileStream(Source/Sink)Log
SparkQA removed a comment on pull request #27694: URL: https://github.com/apache/spark/pull/27694#issuecomment-658519508 **[Test build #125876 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125876/testReport)** for PR 27694 at commit [`86131af`](https://github.com/apache/spark/commit/86131afcf995fee64a629a7a440f03df8cabdd48). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] dongjoon-hyun edited a comment on pull request #28931: [SPARK-32103][CORE] Support IPv6 host/port in core module
dongjoon-hyun edited a comment on pull request #28931: URL: https://github.com/apache/spark/pull/28931#issuecomment-658553220 Hi, @gatorsmile . Technically, this only handles `host/port` parsing inside `core` module. I'm sure that this is a meaningful step inside Spark. However, we didn't test anything on IPv6. Like what we did for JDK11, I expect lots of hurdle both inside and outside Spark. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] dongjoon-hyun edited a comment on pull request #28931: [SPARK-32103][CORE] Support IPv6 host/port in core module
dongjoon-hyun edited a comment on pull request #28931: URL: https://github.com/apache/spark/pull/28931#issuecomment-658553220 Hi, @gatorsmile . Technically, this only handles `host/port` parsing inside `core` module. I'm sure that this is a meaningful step inside Spark. However, we didn't test anything on IPv6. Like JDK11, I expects lots of hurdle both inside and outside Spark. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] dongjoon-hyun commented on pull request #28931: [SPARK-32103][CORE] Support IPv6 host/port in core module
dongjoon-hyun commented on pull request #28931: URL: https://github.com/apache/spark/pull/28931#issuecomment-658553220 Hi, @gatorsmile . Technically, this only handles `host/port` parsing inside `core` module only. I'm sure that this is a meaningful step inside Spark. However, we didn't test anything on IPv6. Like JDK11, I expects lots of hurdle both inside and outside Spark. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] adjordan edited a comment on pull request #29080: [SPARK-32271][ML] Update CrossValidator to train folds in parallel
adjordan edited a comment on pull request #29080: URL: https://github.com/apache/spark/pull/29080#issuecomment-658547236 @viirya Sorry, can you explain? I don't see how it changes the technique, it just allows models from multiple folds to be run in parallel. `MLUtils.kFold` is doing k-fold cross validation, not repeated random sub-sampling validation, right? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] dongjoon-hyun commented on pull request #29089: [SPARK-32276][SQL] Remove redundant sorts before repartition nodes
dongjoon-hyun commented on pull request #29089: URL: https://github.com/apache/spark/pull/29089#issuecomment-658550248 Very sorry, guys. Due to the above regression, I'll revert this commit urgently. We can rethink about this PR. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] maropu commented on a change in pull request #29085: [SPARK-32106][SQL]Implement SparkScriptTransformationExec in sql/core
maropu commented on a change in pull request #29085: URL: https://github.com/apache/spark/pull/29085#discussion_r454795948 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkScriptTransformationExec.scala ## @@ -0,0 +1,187 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution + +import java.io._ +import java.nio.charset.StandardCharsets + +import scala.collection.JavaConverters._ +import scala.util.control.NonFatal + +import org.apache.hadoop.conf.Configuration + +import org.apache.spark.TaskContext +import org.apache.spark.sql.catalyst.{CatalystTypeConverters, InternalRow} +import org.apache.spark.sql.catalyst.expressions._ +import org.apache.spark.sql.catalyst.plans.logical.ScriptInputOutputSchema +import org.apache.spark.sql.types._ +import org.apache.spark.util.{CircularBuffer, RedirectThread} + +/** + * Transforms the input by forking and running the specified script. + * + * @param input the set of expression that should be passed to the script. + * @param script the command that should be executed. + * @param output the attributes that are produced by the script. + */ +case class SparkScriptTransformationExec( +input: Seq[Expression], +script: String, +output: Seq[Attribute], +child: SparkPlan, +ioschema: SparkScriptIOSchema) + extends BaseScriptTransformationExec { + + override def processIterator(inputIterator: Iterator[InternalRow], hadoopConf: Configuration) + : Iterator[InternalRow] = { +val cmd = List("/bin/bash", "-c", script) Review comment: Seems like the implementation of `processIterator` is pretty similar to the Hive one. Could we share the code between them more? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] dongjoon-hyun commented on pull request #29089: [SPARK-32276][SQL] Remove redundant sorts before repartition nodes
dongjoon-hyun commented on pull request #29089: URL: https://github.com/apache/spark/pull/29089#issuecomment-658549984 **AFTER SPARK-32276** ``` scala> scala.util.Random.shuffle((1 to 10).map(x => (x % 2, x))).toDF("a", "b").repartition(2).createOrReplaceTempView("t") scala> sql("select * from (select * from t order by b) distribute by a").write.orc("/tmp/SPARK-32276") ``` ``` $ ls -al /tmp/SPARK-32276/ total 632 drwxr-xr-x 10 dongjoon wheel 320 Jul 14 22:08 ./ drwxrwxrwt 14 root wheel 448 Jul 14 22:08 ../ -rw-r--r-- 1 dongjoon wheel 8 Jul 14 22:08 ._SUCCESS.crc -rw-r--r-- 1 dongjoon wheel 12 Jul 14 22:08 .part-0-ba5049f9-b835-49b7-9fdb-bdd11b9891cb-c000.snappy.orc.crc -rw-r--r-- 1 dongjoon wheel1188 Jul 14 22:08 .part-00043-ba5049f9-b835-49b7-9fdb-bdd11b9891cb-c000.snappy.orc.crc -rw-r--r-- 1 dongjoon wheel1188 Jul 14 22:08 .part-00191-ba5049f9-b835-49b7-9fdb-bdd11b9891cb-c000.snappy.orc.crc -rw-r--r-- 1 dongjoon wheel 0 Jul 14 22:08 _SUCCESS -rw-r--r-- 1 dongjoon wheel 119 Jul 14 22:08 part-0-ba5049f9-b835-49b7-9fdb-bdd11b9891cb-c000.snappy.orc -rw-r--r-- 1 dongjoon wheel 150735 Jul 14 22:08 part-00043-ba5049f9-b835-49b7-9fdb-bdd11b9891cb-c000.snappy.orc -rw-r--r-- 1 dongjoon wheel 150741 Jul 14 22:08 part-00191-ba5049f9-b835-49b7-9fdb-bdd11b9891cb-c000.snappy.orc ``` **BEFORE** ``` scala> scala.util.Random.shuffle((1 to 10).map(x => (x % 2, x))).toDF("a", "b").repartition(2).createOrReplaceTempView("t") scala> sql("select * from (select * from t order by b) distribute by a").write.orc("/tmp/master") ``` ``` $ ls -al /tmp/master/ total 56 drwxr-xr-x 10 dongjoon wheel 320 Jul 14 22:12 ./ drwxrwxrwt 15 root wheel 480 Jul 14 22:12 ../ -rw-r--r-- 1 dongjoon wheel8 Jul 14 22:12 ._SUCCESS.crc -rw-r--r-- 1 dongjoon wheel 12 Jul 14 22:12 .part-0-2cd3a50e-eded-49a4-b7cf-94e3f090b8c1-c000.snappy.orc.crc -rw-r--r-- 1 dongjoon wheel 16 Jul 14 22:12 .part-00043-2cd3a50e-eded-49a4-b7cf-94e3f090b8c1-c000.snappy.orc.crc -rw-r--r-- 1 dongjoon wheel 16 Jul 14 22:12 .part-00191-2cd3a50e-eded-49a4-b7cf-94e3f090b8c1-c000.snappy.orc.crc -rw-r--r-- 1 dongjoon wheel0 Jul 14 22:12 _SUCCESS -rw-r--r-- 1 dongjoon wheel 119 Jul 14 22:12 part-0-2cd3a50e-eded-49a4-b7cf-94e3f090b8c1-c000.snappy.orc -rw-r--r-- 1 dongjoon wheel 932 Jul 14 22:12 part-00043-2cd3a50e-eded-49a4-b7cf-94e3f090b8c1-c000.snappy.orc -rw-r--r-- 1 dongjoon wheel 939 Jul 14 22:12 part-00191-2cd3a50e-eded-49a4-b7cf-94e3f090b8c1-c000.snappy.orc ``` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] maropu commented on a change in pull request #29085: [SPARK-32106][SQL]Implement SparkScriptTransformationExec in sql/core
maropu commented on a change in pull request #29085: URL: https://github.com/apache/spark/pull/29085#discussion_r454780673 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/BaseScriptTransformationExec.scala ## @@ -87,17 +90,60 @@ trait BaseScriptTransformationExec extends UnaryExecNode { } } } + + def wrapper(data: String, dt: DataType): Any = { +dt match { + case StringType => data + case ByteType => JavaUtils.stringToBytes(data) + case IntegerType => data.toInt + case ShortType => data.toShort + case LongType => data.toLong + case FloatType => data.toFloat + case DoubleType => data.toDouble + case dt: DecimalType => BigDecimal(data) + case DateType if conf.datetimeJava8ApiEnabled => +DateTimeUtils.stringToDate( + UTF8String.fromString(data), + DateTimeUtils.getZoneId(conf.sessionLocalTimeZone)) + .map(DateTimeUtils.daysToLocalDate).orNull + case DateType => +DateTimeUtils.stringToDate( + UTF8String.fromString(data), + DateTimeUtils.getZoneId(conf.sessionLocalTimeZone)) + .map(DateTimeUtils.toJavaDate).orNull + case TimestampType if conf.datetimeJava8ApiEnabled => +DateTimeUtils.stringToTimestamp( + UTF8String.fromString(data), + DateTimeUtils.getZoneId(conf.sessionLocalTimeZone)) + .map(DateTimeUtils.microsToInstant).orNull + case TimestampType => +DateTimeUtils.stringToTimestamp( + UTF8String.fromString(data), + DateTimeUtils.getZoneId(conf.sessionLocalTimeZone)) + .map(DateTimeUtils.toJavaTimestamp).orNull + case CalendarIntervalType => IntervalUtils.stringToInterval(UTF8String.fromString(data)) + case dataType: DataType => data +} + } } -abstract class BaseScriptTransformationWriterThread( -iter: Iterator[InternalRow], -inputSchema: Seq[DataType], -ioSchema: BaseScriptTransformIOSchema, -outputStream: OutputStream, -proc: Process, -stderrBuffer: CircularBuffer, -taskContext: TaskContext, -conf: Configuration) extends Thread with Logging { +abstract class BaseScriptTransformationWriterThread extends Thread with Logging { + + def iter: Iterator[InternalRow] + + def inputSchema: Seq[DataType] + + def ioSchema: BaseScriptTransformIOSchema + + def outputStream: OutputStream + + def proc: Process + + def stderrBuffer: CircularBuffer + + def taskContext: TaskContext + + def conf: Configuration Review comment: nit: we don't need line breaks? ``` def inputRowFormat: Seq[(String, String)] def outputRowFormat: Seq[(String, String)] def inputSerdeClass: Option[String] def outputSerdeClass: Option[String] def inputSerdeProps: Seq[(String, String)] def outputSerdeProps: Seq[(String, String)] def recordReaderClass: Option[String] def recordWriterClass: Option[String] def schemaLess: Boolean ``` ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/BaseScriptTransformationExec.scala ## @@ -87,17 +90,60 @@ trait BaseScriptTransformationExec extends UnaryExecNode { } } } + + def wrapper(data: String, dt: DataType): Any = { Review comment: `protected` ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkScriptTransformationExec.scala ## @@ -0,0 +1,187 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution + +import java.io._ +import java.nio.charset.StandardCharsets + +import scala.collection.JavaConverters._ +import scala.util.control.NonFatal + +import org.apache.hadoop.conf.Configuration + +import org.apache.spark.TaskContext +import org.apache.spark.sql.catalyst.{CatalystTypeConverters, InternalRow} +import org.apache.spark.sql.catalyst.expressions._ +import org.apache.spark.sql.catalyst.plans.logical.ScriptInputOutputSchema +import org.apache.spark.sql.types._ +import org.apache.spark.util.{CircularBuffer, RedirectThread} + +/** + * Transforms the input by forking and running the specified script. + * + * @param input the set of expression that should be
[GitHub] [spark] adjordan edited a comment on pull request #29080: [SPARK-32271][ML] Update CrossValidator to train folds in parallel
adjordan edited a comment on pull request #29080: URL: https://github.com/apache/spark/pull/29080#issuecomment-658547236 @viirya Sorry, can you explain? I don't see how it changes the technique, it just allows models from multiple folds to be run in parallel. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] adjordan commented on pull request #29080: [SPARK-32271][ML] Update CrossValidator to train folds in parallel
adjordan commented on pull request #29080: URL: https://github.com/apache/spark/pull/29080#issuecomment-658547236 @viirya Sorry, can you explain? I don't see how it changes anything, it just allows models from multiple folds to be run in parallel. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] srowen commented on a change in pull request #29111: [SPARK-29292][SQL][ML] Update rest of default modules (Hive, ML, etc) for Scala 2.13 compilation
srowen commented on a change in pull request #29111: URL: https://github.com/apache/spark/pull/29111#discussion_r454792607 ## File path: mllib/src/main/scala/org/apache/spark/ml/Estimator.scala ## @@ -76,7 +76,7 @@ abstract class Estimator[M <: Model[M]] extends PipelineStage { * @return fitted models, matching the input parameter maps */ @Since("2.0.0") - def fit(dataset: Dataset[_], paramMaps: Array[ParamMap]): Seq[M] = { + def fit(dataset: Dataset[_], paramMaps: Seq[ParamMap]): Seq[M] = { Review comment: Yeah, this fixes the weird compile error (Arrays + generic types are stricter in Scala 2.13) though I don't directly see what it has to do with type M. Still, this is an API change I think MiMa will fail and I think I need another workaround for _that_. This is an obscure method that isn't even called by tests, AFAICT, so not sure it even has coverage. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] srowen commented on pull request #29111: [SPARK-29292][SQL][ML] Update rest of default modules (Hive, ML, etc) for Scala 2.13 compilation
srowen commented on pull request #29111: URL: https://github.com/apache/spark/pull/29111#issuecomment-658546568 I think I understand the last test failures, will fix too. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] MaxGekk commented on pull request #27366: [SPARK-30648][SQL] Support filters pushdown in JSON datasource
MaxGekk commented on pull request #27366: URL: https://github.com/apache/spark/pull/27366#issuecomment-658546141 @cloud-fan Anything else should I do in the PR to be merged? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] stczwd commented on a change in pull request #29088: [SPARK-32289][SQL] Some characters are garbled when opening csv files with Excel
stczwd commented on a change in pull request #29088: URL: https://github.com/apache/spark/pull/29088#discussion_r454791986 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CsvOutputWriter.scala ## @@ -39,6 +39,10 @@ class CsvOutputWriter( private val gen = new UnivocityGenerator(dataSchema, writer, params) + if (params.bom) { +writer.write(0xFEFF) Review comment: Excel. It will change the actual value if we add `0xFEFF` in the front. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] dongjoon-hyun edited a comment on pull request #29089: [SPARK-32276][SQL] Remove redundant sorts before repartition nodes
dongjoon-hyun edited a comment on pull request #29089: URL: https://github.com/apache/spark/pull/29089#issuecomment-658544475 To generate small final Parquet/ORC files, we do the above tricks, don't we? This may cause a regression on the size of output storage. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] dongjoon-hyun edited a comment on pull request #29089: [SPARK-32276][SQL] Remove redundant sorts before repartition nodes
dongjoon-hyun edited a comment on pull request #29089: URL: https://github.com/apache/spark/pull/29089#issuecomment-658544475 To generate small final Parquet/ORC files, we do the above tricks, don't we? This PR may cause a regression on the size of output storage. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] dongjoon-hyun edited a comment on pull request #29089: [SPARK-32276][SQL] Remove redundant sorts before repartition nodes
dongjoon-hyun edited a comment on pull request #29089: URL: https://github.com/apache/spark/pull/29089#issuecomment-658544475 To generate small final Parquet/ORC files, we do the above tricks, don't we? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] dongjoon-hyun commented on pull request #29089: [SPARK-32276][SQL] Remove redundant sorts before repartition nodes
dongjoon-hyun commented on pull request #29089: URL: https://github.com/apache/spark/pull/29089#issuecomment-658544475 To generate small Parquet/ORC files, we do the above tricks, don't we? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] warrenzhu25 edited a comment on pull request #29044: [WIP][SPARK-32227] Fix regression bug in load-spark-env.cmd with Spark 3.0.0
warrenzhu25 edited a comment on pull request #29044: URL: https://github.com/apache/spark/pull/29044#issuecomment-656771107 > It's directly relevant to this PR because your patch is changing `environment` variable. > > * Please see this for the detail (https://github.com/cdarlint/winutils) > * You can run AppVeyor in your Spark fork, too. winutils only impacted by PATH and HADOOP_HOME, and I don't touch both. Also, my change is just reverting into the version as 2.4.4. Could you help rerun the tests? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] dongjoon-hyun commented on pull request #29089: [SPARK-32276][SQL] Remove redundant sorts before repartition nodes
dongjoon-hyun commented on pull request #29089: URL: https://github.com/apache/spark/pull/29089#issuecomment-658543717 Oops. Sorry, guys. It seems that I missed something during testing. For the following case, we should not remove `Sort`. **BEFORE THIS PR** ```scala scala> Seq((1,10),(1,20),(2,30),(2,40)).toDF("a", "b").repartition(2).createOrReplaceTempView("t") scala> sql("select * from (select * from t order by b desc) distribute by a").show() +---+---+ | a| b| +---+---+ | 1| 20| | 1| 10| | 2| 40| | 2| 30| +---+---+ ``` **AFTER THIS PR** ```scala scala> Seq((1,10),(1,20),(2,30),(2,40)).toDF("a", "b").repartition(2).createOrReplaceTempView("t") scala> sql("select * from (select * from t order by b desc) distribute by a").show() +---+---+ | a| b| +---+---+ | 1| 10| | 1| 20| | 2| 30| | 2| 40| +---+---+ ``` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] warrenzhu25 commented on pull request #28942: [SPARK-32125][UI] Support get taskList by status in Web UI and SHS Rest API
warrenzhu25 commented on pull request #28942: URL: https://github.com/apache/spark/pull/28942#issuecomment-658543670 @gengliangwang Tests passed, could you help merge this? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] HyukjinKwon opened a new pull request #29117: [WIP] Debug flaky pip installation test failure
HyukjinKwon opened a new pull request #29117: URL: https://github.com/apache/spark/pull/29117 ### What changes were proposed in this pull request? TBD ### Why are the changes needed? TBD ### Does this PR introduce _any_ user-facing change? TBD ### How was this patch tested? TBD This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] HeartSaVioR closed pull request #29077: [SPARK-31985][SS] Remove incomplete/undocumented stateful aggregation in continuous mode
HeartSaVioR closed pull request #29077: URL: https://github.com/apache/spark/pull/29077 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] HeartSaVioR commented on pull request #29077: [SPARK-31985][SS] Remove incomplete/undocumented stateful aggregation in continuous mode
HeartSaVioR commented on pull request #29077: URL: https://github.com/apache/spark/pull/29077#issuecomment-658539797 Thanks for the reviewing and kind words :) I'll deal with merging. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #29111: [SPARK-29292][SQL][ML] Update rest of default modules (Hive, ML, etc) for Scala 2.13 compilation
dongjoon-hyun commented on a change in pull request #29111: URL: https://github.com/apache/spark/pull/29111#discussion_r454784921 ## File path: mllib/src/main/scala/org/apache/spark/ml/Estimator.scala ## @@ -76,7 +76,7 @@ abstract class Estimator[M <: Model[M]] extends PipelineStage { * @return fitted models, matching the input parameter maps */ @Since("2.0.0") - def fit(dataset: Dataset[_], paramMaps: Array[ParamMap]): Seq[M] = { + def fit(dataset: Dataset[_], paramMaps: Seq[ParamMap]): Seq[M] = { Review comment: cc @mengxr and @gatorsmile This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #29111: [SPARK-29292][SQL][ML] Update rest of default modules (Hive, ML, etc) for Scala 2.13 compilation
dongjoon-hyun commented on a change in pull request #29111: URL: https://github.com/apache/spark/pull/29111#discussion_r454784282 ## File path: examples/src/main/scala/org/apache/spark/examples/SparkKMeans.scala ## @@ -102,5 +102,10 @@ object SparkKMeans { kPoints.foreach(println) spark.stop() } + + private def mergeResults(a: (Vector[Double], Int), + b: (Vector[Double], Int)): (Vector[Double], Int) = { Review comment: nit. Indentation? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] aokolnychyi commented on pull request #29089: [SPARK-32276][SQL] Remove redundant sorts before repartition nodes
aokolnychyi commented on pull request #29089: URL: https://github.com/apache/spark/pull/29089#issuecomment-658538432 Thanks, everyone! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] dongjoon-hyun commented on pull request #29089: [SPARK-32276][SQL] Remove redundant sorts before repartition nodes
dongjoon-hyun commented on pull request #29089: URL: https://github.com/apache/spark/pull/29089#issuecomment-658538140 Also, cc @gatorsmile and @cloud-fan This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA removed a comment on pull request #29080: [SPARK-32271][ML] Update CrossValidator to train folds in parallel
SparkQA removed a comment on pull request #29080: URL: https://github.com/apache/spark/pull/29080#issuecomment-658519469 **[Test build #125874 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125874/testReport)** for PR 29080 at commit [`6dd0a4d`](https://github.com/apache/spark/commit/6dd0a4d9a2157086ef33bd810f9e250114b33c7d). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29114: [SPARK-32094][PYTHON] Update cloudpickle to v1.5.0
AmplabJenkins removed a comment on pull request #29114: URL: https://github.com/apache/spark/pull/29114#issuecomment-658536762 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/125866/ Test FAILed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on pull request #29080: [SPARK-32271][ML] Update CrossValidator to train folds in parallel
AmplabJenkins commented on pull request #29080: URL: https://github.com/apache/spark/pull/29080#issuecomment-658537135 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29080: [SPARK-32271][ML] Update CrossValidator to train folds in parallel
AmplabJenkins removed a comment on pull request #29080: URL: https://github.com/apache/spark/pull/29080#issuecomment-658537135 Merged build finished. Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29080: [SPARK-32271][ML] Update CrossValidator to train folds in parallel
AmplabJenkins removed a comment on pull request #29080: URL: https://github.com/apache/spark/pull/29080#issuecomment-658537137 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/125874/ Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29114: [SPARK-32094][PYTHON] Update cloudpickle to v1.5.0
AmplabJenkins removed a comment on pull request #29114: URL: https://github.com/apache/spark/pull/29114#issuecomment-658536619 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #29080: [SPARK-32271][ML] Update CrossValidator to train folds in parallel
SparkQA commented on pull request #29080: URL: https://github.com/apache/spark/pull/29080#issuecomment-658536994 **[Test build #125874 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125874/testReport)** for PR 29080 at commit [`6dd0a4d`](https://github.com/apache/spark/commit/6dd0a4d9a2157086ef33bd810f9e250114b33c7d). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on pull request #29114: [SPARK-32094][PYTHON] Update cloudpickle to v1.5.0
AmplabJenkins commented on pull request #29114: URL: https://github.com/apache/spark/pull/29114#issuecomment-658536613 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on pull request #29114: [SPARK-32094][PYTHON] Update cloudpickle to v1.5.0
AmplabJenkins commented on pull request #29114: URL: https://github.com/apache/spark/pull/29114#issuecomment-658536758 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA removed a comment on pull request #29114: [SPARK-32094][PYTHON] Update cloudpickle to v1.5.0
SparkQA removed a comment on pull request #29114: URL: https://github.com/apache/spark/pull/29114#issuecomment-658491516 **[Test build #125865 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125865/testReport)** for PR 29114 at commit [`5630999`](https://github.com/apache/spark/commit/5630999689a555f5e026cabe5f7c200ff8b24256). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on pull request #29114: [SPARK-32094][PYTHON] Update cloudpickle to v1.5.0
AmplabJenkins commented on pull request #29114: URL: https://github.com/apache/spark/pull/29114#issuecomment-658536691 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29114: [SPARK-32094][PYTHON] Update cloudpickle to v1.5.0
AmplabJenkins removed a comment on pull request #29114: URL: https://github.com/apache/spark/pull/29114#issuecomment-658536613 Merged build finished. Test FAILed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #29114: [SPARK-32094][PYTHON] Update cloudpickle to v1.5.0
SparkQA commented on pull request #29114: URL: https://github.com/apache/spark/pull/29114#issuecomment-658536417 **[Test build #125865 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125865/testReport)** for PR 29114 at commit [`5630999`](https://github.com/apache/spark/commit/5630999689a555f5e026cabe5f7c200ff8b24256). * This patch **fails PySpark pip packaging tests**. * This patch merges cleanly. * This patch adds no public classes. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] dongjoon-hyun closed pull request #29089: [SPARK-32276][SQL] Remove redundant sorts before repartition nodes
dongjoon-hyun closed pull request #29089: URL: https://github.com/apache/spark/pull/29089 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #29114: [SPARK-32094][PYTHON] Update cloudpickle to v1.5.0
SparkQA commented on pull request #29114: URL: https://github.com/apache/spark/pull/29114#issuecomment-658535423 **[Test build #125878 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125878/testReport)** for PR 29114 at commit [`465fd8a`](https://github.com/apache/spark/commit/465fd8a5f4773c3fee69df9c5cf8d3ad57160d03). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28708: [SPARK-20629][CORE][K8S] Copy shuffle data when nodes are being shutdown
AmplabJenkins removed a comment on pull request #28708: URL: https://github.com/apache/spark/pull/28708#issuecomment-658534819 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/125867/ Test FAILed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on pull request #28708: [SPARK-20629][CORE][K8S] Copy shuffle data when nodes are being shutdown
AmplabJenkins commented on pull request #28708: URL: https://github.com/apache/spark/pull/28708#issuecomment-658534813 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28708: [SPARK-20629][CORE][K8S] Copy shuffle data when nodes are being shutdown
AmplabJenkins removed a comment on pull request #28708: URL: https://github.com/apache/spark/pull/28708#issuecomment-658534813 Merged build finished. Test FAILed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA removed a comment on pull request #28708: [SPARK-20629][CORE][K8S] Copy shuffle data when nodes are being shutdown
SparkQA removed a comment on pull request #28708: URL: https://github.com/apache/spark/pull/28708#issuecomment-658493500 **[Test build #125867 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125867/testReport)** for PR 28708 at commit [`fe5ba7b`](https://github.com/apache/spark/commit/fe5ba7befc243a30377b0d3057ec3862726db2d3). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28708: [SPARK-20629][CORE][K8S] Copy shuffle data when nodes are being shutdown
AmplabJenkins removed a comment on pull request #28708: URL: https://github.com/apache/spark/pull/28708#issuecomment-658503907 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/30475/ Test FAILed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #28708: [SPARK-20629][CORE][K8S] Copy shuffle data when nodes are being shutdown
SparkQA commented on pull request #28708: URL: https://github.com/apache/spark/pull/28708#issuecomment-658534225 **[Test build #125867 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125867/testReport)** for PR 28708 at commit [`fe5ba7b`](https://github.com/apache/spark/commit/fe5ba7befc243a30377b0d3057ec3862726db2d3). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29114: [SPARK-32094][PYTHON] Update cloudpickle to v1.5.0
AmplabJenkins removed a comment on pull request #29114: URL: https://github.com/apache/spark/pull/29114#issuecomment-658533895 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on pull request #29114: [SPARK-32094][PYTHON] Update cloudpickle to v1.5.0
AmplabJenkins commented on pull request #29114: URL: https://github.com/apache/spark/pull/29114#issuecomment-658533895 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28848: [SPARK-32003][CORE] When external shuffle service is used, unregister outputs for executor on fetch failure after executor is l
AmplabJenkins removed a comment on pull request #28848: URL: https://github.com/apache/spark/pull/28848#issuecomment-658533186 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/125863/ Test FAILed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] HyukjinKwon commented on pull request #29116: [SPARK-32316][TESTS][INFRA] Test PySpark with Python 3.8 in Github Actions
HyukjinKwon commented on pull request #29116: URL: https://github.com/apache/spark/pull/29116#issuecomment-658533425 Thanks, @dongjoon-hyun This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28848: [SPARK-32003][CORE] When external shuffle service is used, unregister outputs for executor on fetch failure after executor is l
AmplabJenkins removed a comment on pull request #28848: URL: https://github.com/apache/spark/pull/28848#issuecomment-658533182 Merged build finished. Test FAILed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA removed a comment on pull request #28848: [SPARK-32003][CORE] When external shuffle service is used, unregister outputs for executor on fetch failure after executor is lost
SparkQA removed a comment on pull request #28848: URL: https://github.com/apache/spark/pull/28848#issuecomment-658485359 **[Test build #125863 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125863/testReport)** for PR 28848 at commit [`0e00862`](https://github.com/apache/spark/commit/0e0086288f6279569e8a11cef9d928b87c40469b). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on pull request #28848: [SPARK-32003][CORE] When external shuffle service is used, unregister outputs for executor on fetch failure after executor is lost
AmplabJenkins commented on pull request #28848: URL: https://github.com/apache/spark/pull/28848#issuecomment-658533182 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #28848: [SPARK-32003][CORE] When external shuffle service is used, unregister outputs for executor on fetch failure after executor is lost
SparkQA commented on pull request #28848: URL: https://github.com/apache/spark/pull/28848#issuecomment-658532861 **[Test build #125863 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125863/testReport)** for PR 28848 at commit [`0e00862`](https://github.com/apache/spark/commit/0e0086288f6279569e8a11cef9d928b87c40469b). * This patch **fails PySpark pip packaging tests**. * This patch merges cleanly. * This patch adds no public classes. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29114: [SPARK-32094][PYTHON] Update cloudpickle to v1.5.0
AmplabJenkins removed a comment on pull request #29114: URL: https://github.com/apache/spark/pull/29114#issuecomment-658529664 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on pull request #29114: [SPARK-32094][PYTHON] Update cloudpickle to v1.5.0
AmplabJenkins commented on pull request #29114: URL: https://github.com/apache/spark/pull/29114#issuecomment-658529664 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #29114: [SPARK-32094][PYTHON] Update cloudpickle to v1.5.0
SparkQA commented on pull request #29114: URL: https://github.com/apache/spark/pull/29114#issuecomment-658529122 **[Test build #125877 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125877/testReport)** for PR 29114 at commit [`bdf31a8`](https://github.com/apache/spark/commit/bdf31a8035ae15c4fb496df173e408453c0ec2a4). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] dongjoon-hyun closed pull request #29116: [SPARK-32316][TESTS][INFRA] Test PySpark with Python 3.8 in Github Actions
dongjoon-hyun closed pull request #29116: URL: https://github.com/apache/spark/pull/29116 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] maropu commented on pull request #29101: [WIP][SPARK-32302][SQL] Partially push down disjunctive predicates through Join/Partitions
maropu commented on pull request #29101: URL: https://github.com/apache/spark/pull/29101#issuecomment-658527647 Just a question; if this proposal works well, we don't need the fix, https://github.com/apache/spark/pull/29075 ? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] jose-torres commented on pull request #29077: [SPARK-31985][SS] Remove incomplete/undocumented stateful aggregation in continuous mode
jose-torres commented on pull request #29077: URL: https://github.com/apache/spark/pull/29077#issuecomment-658526205 LGTM. I don't have the repo fully set up on my new computer, so I'll try to find time to set it up and merge tomorrow. (Or you can do it if you want to try out your new committer powers; congrats btw!) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29111: [SPARK-29292][SQL][ML] Update rest of default modules (Hive, ML, etc) for Scala 2.13 compilation
AmplabJenkins removed a comment on pull request #29111: URL: https://github.com/apache/spark/pull/29111#issuecomment-658524127 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/125864/ Test FAILed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29111: [SPARK-29292][SQL][ML] Update rest of default modules (Hive, ML, etc) for Scala 2.13 compilation
AmplabJenkins removed a comment on pull request #29111: URL: https://github.com/apache/spark/pull/29111#issuecomment-658524120 Merged build finished. Test FAILed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on pull request #29111: [SPARK-29292][SQL][ML] Update rest of default modules (Hive, ML, etc) for Scala 2.13 compilation
AmplabJenkins commented on pull request #29111: URL: https://github.com/apache/spark/pull/29111#issuecomment-658524120 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #29111: [SPARK-29292][SQL][ML] Update rest of default modules (Hive, ML, etc) for Scala 2.13 compilation
SparkQA commented on pull request #29111: URL: https://github.com/apache/spark/pull/29111#issuecomment-658523924 **[Test build #125864 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125864/testReport)** for PR 29111 at commit [`bc74297`](https://github.com/apache/spark/commit/bc74297f72cf51c773b6abfe6dcd19c691f3dfac). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `abstract class Estimator[M <: Model[M]] extends PipelineStage ` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA removed a comment on pull request #29111: [SPARK-29292][SQL][ML] Update rest of default modules (Hive, ML, etc) for Scala 2.13 compilation
SparkQA removed a comment on pull request #29111: URL: https://github.com/apache/spark/pull/29111#issuecomment-658489540 **[Test build #125864 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125864/testReport)** for PR 29111 at commit [`bc74297`](https://github.com/apache/spark/commit/bc74297f72cf51c773b6abfe6dcd19c691f3dfac). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] wangyum commented on a change in pull request #29088: [SPARK-32289][SQL] Some characters are garbled when opening csv files with Excel
wangyum commented on a change in pull request #29088: URL: https://github.com/apache/spark/pull/29088#discussion_r454764920 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CsvOutputWriter.scala ## @@ -39,6 +39,10 @@ class CsvOutputWriter( private val gen = new UnivocityGenerator(dataSchema, writer, params) + if (params.bom) { +writer.write(0xFEFF) Review comment: @stczwd What tool will change the value if we use `0xFEFF`? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29115: [SPARK-32315][ML] Provide an explanation error message when calling require
AmplabJenkins removed a comment on pull request #29115: URL: https://github.com/apache/spark/pull/29115#issuecomment-658519915 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/125873/ Test FAILed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on pull request #27694: [SPARK-30946][SS] Serde entry via DataInputStream/DataOutputStream with LZ4 compression on FileStream(Source/Sink)Log
AmplabJenkins commented on pull request #27694: URL: https://github.com/apache/spark/pull/27694#issuecomment-658519988 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28904: [SPARK-30462][SS] Streamline the logic on file stream source and sink metadata log to avoid memory issue
AmplabJenkins removed a comment on pull request #28904: URL: https://github.com/apache/spark/pull/28904#issuecomment-658519932 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29115: [SPARK-32315][ML] Provide an explanation error message when calling require
AmplabJenkins removed a comment on pull request #29115: URL: https://github.com/apache/spark/pull/29115#issuecomment-658519909 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29080: [SPARK-32271][ML] Update CrossValidator to train folds in parallel
AmplabJenkins removed a comment on pull request #29080: URL: https://github.com/apache/spark/pull/29080#issuecomment-658519974 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #27694: [SPARK-30946][SS] Serde entry via DataInputStream/DataOutputStream with LZ4 compression on FileStream(Source/Sink)Log
AmplabJenkins removed a comment on pull request #27694: URL: https://github.com/apache/spark/pull/27694#issuecomment-658519988 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA removed a comment on pull request #29115: [SPARK-32315][ML] Provide an explanation error message when calling require
SparkQA removed a comment on pull request #29115: URL: https://github.com/apache/spark/pull/29115#issuecomment-658519446 **[Test build #125873 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125873/testReport)** for PR 29115 at commit [`96d65f4`](https://github.com/apache/spark/commit/96d65f4890e312bc4446b008bf6fadcb2d011779). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on pull request #29080: [SPARK-32271][ML] Update CrossValidator to train folds in parallel
AmplabJenkins commented on pull request #29080: URL: https://github.com/apache/spark/pull/29080#issuecomment-658519974 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #29115: [SPARK-32315][ML] Provide an explanation error message when calling require
SparkQA commented on pull request #29115: URL: https://github.com/apache/spark/pull/29115#issuecomment-658519894 **[Test build #125873 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125873/testReport)** for PR 29115 at commit [`96d65f4`](https://github.com/apache/spark/commit/96d65f4890e312bc4446b008bf6fadcb2d011779). * This patch **fails Scala style tests**. * This patch merges cleanly. * This patch adds no public classes. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on pull request #29115: [SPARK-32315][ML] Provide an explanation error message when calling require
AmplabJenkins commented on pull request #29115: URL: https://github.com/apache/spark/pull/29115#issuecomment-658519909 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on pull request #28904: [SPARK-30462][SS] Streamline the logic on file stream source and sink metadata log to avoid memory issue
AmplabJenkins commented on pull request #28904: URL: https://github.com/apache/spark/pull/28904#issuecomment-658519932 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #27694: [SPARK-30946][SS] Serde entry via DataInputStream/DataOutputStream with LZ4 compression on FileStream(Source/Sink)Log
SparkQA commented on pull request #27694: URL: https://github.com/apache/spark/pull/27694#issuecomment-658519508 **[Test build #125876 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125876/testReport)** for PR 27694 at commit [`86131af`](https://github.com/apache/spark/commit/86131afcf995fee64a629a7a440f03df8cabdd48). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #28904: [SPARK-30462][SS] Streamline the logic on file stream source and sink metadata log to avoid memory issue
SparkQA commented on pull request #28904: URL: https://github.com/apache/spark/pull/28904#issuecomment-658519488 **[Test build #125875 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125875/testReport)** for PR 28904 at commit [`006c028`](https://github.com/apache/spark/commit/006c028cf917bcfa4e78955a280e815418cbc2be). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #29080: [SPARK-32271][ML] Update CrossValidator to train folds in parallel
SparkQA commented on pull request #29080: URL: https://github.com/apache/spark/pull/29080#issuecomment-658519469 **[Test build #125874 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125874/testReport)** for PR 29080 at commit [`6dd0a4d`](https://github.com/apache/spark/commit/6dd0a4d9a2157086ef33bd810f9e250114b33c7d). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #29115: [SPARK-32315][ML] Provide an explanation error message when calling require
SparkQA commented on pull request #29115: URL: https://github.com/apache/spark/pull/29115#issuecomment-658519446 **[Test build #125873 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125873/testReport)** for PR 29115 at commit [`96d65f4`](https://github.com/apache/spark/commit/96d65f4890e312bc4446b008bf6fadcb2d011779). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29080: [SPARK-32271][ML] Update CrossValidator to train folds in parallel
AmplabJenkins removed a comment on pull request #29080: URL: https://github.com/apache/spark/pull/29080#issuecomment-657302373 Can one of the admins verify this patch? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org