date:20200714

[GitHub] [spark] SaurabhChawla100 commented on pull request #29045: [SPARK-32234][SQL] Spark sql commands are failing on selecting the orc tables

2020-07-14 Thread GitBox



SaurabhChawla100 commented on pull request #29045:
URL: https://github.com/apache/spark/pull/29045#issuecomment-658561486


   Retest this please



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] dongjoon-hyun edited a comment on pull request #29118: [SPARK-32318][SQL][TESTS] Add a test case to EliminateSortsSuite for ORDER BY in DISTRIBUTE BY

2020-07-14 Thread GitBox



dongjoon-hyun edited a comment on pull request #29118:
URL: https://github.com/apache/spark/pull/29118#issuecomment-658560629


   Also, cc @cloud-fan , @HyukjinKwon , @maropu 




[GitHub] [spark] dongjoon-hyun edited a comment on pull request #29118: [SPARK-32318][SQL][TESTS] Add a test case to EliminateSortsSuite for ORDER BY in DISTRIBUTE BY

2020-07-14 Thread GitBox



dongjoon-hyun edited a comment on pull request #29118:
URL: https://github.com/apache/spark/pull/29118#issuecomment-658560339


   Could you review this, @viirya? This will protect us from future 
regressions. This part is tricky.




[GitHub] [spark] dongjoon-hyun commented on pull request #29118: [SPARK-32318][SQL][TESTS] Add a test case to EliminateSortsSuite for ORDER BY in DISTRIBUTE BY

2020-07-14 Thread GitBox



dongjoon-hyun commented on pull request #29118:
URL: https://github.com/apache/spark/pull/29118#issuecomment-658560629


   Also, cc @cloud-fan and @HyukjinKwon .




[GitHub] [spark] dongjoon-hyun commented on pull request #29118: [SPARK-32318][SQL][TESTS] Add a test case to EliminateSortsSuite for ORDER BY in DISTRIBUTE BY

2020-07-14 Thread GitBox



dongjoon-hyun commented on pull request #29118:
URL: https://github.com/apache/spark/pull/29118#issuecomment-658560339


   Could you review this, @viirya ?




[GitHub] [spark] dongjoon-hyun edited a comment on pull request #29089: [SPARK-32276][SQL] Remove redundant sorts before repartition nodes

2020-07-14 Thread GitBox



dongjoon-hyun edited a comment on pull request #29089:
URL: https://github.com/apache/spark/pull/29089#issuecomment-658559706


   The biggest factor is the file format rather than the Spark side.
   For example, in the above example, the ORC files are small because ORC 
supports a special encoding when the input data is sorted with a fixed 
increment. For Parquet files, the result will be different.
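
To illustrate why sorted input with a fixed increment can be stored compactly, here is a toy sketch of delta-style run encoding in plain Scala, meant to be pasted into the REPL. It is not ORC's actual encoder; `Run`, `encode`, and the sample values are hypothetical names and data chosen only for illustration.

```scala
// Toy model of delta-style run encoding (NOT ORC's real encoder).
// A run stores a start value, a constant step, and a length.
final case class Run(start: Long, step: Long, length: Int)

// Greedily group values into maximal runs with a constant step.
def encode(values: Seq[Long]): List[Run] =
  values.foldLeft(List.empty[Run]) {
    case (Nil, v) => List(Run(v, 0L, 1))
    // A single-element run can adopt whatever step the next value implies.
    case (Run(s, _, 1) :: rest, v) => Run(s, v - s, 2) :: rest
    // Extend the current run when the next value continues the same step.
    case (Run(s, d, n) :: rest, v) if s + d * n == v => Run(s, d, n + 1) :: rest
    // Otherwise start a new run.
    case (runs, v) => Run(v, 0L, 1) :: runs
  }.reverse

val sortedValues = (1L to 99L by 2L).toList                     // 1, 3, 5, ..., 99
val scrambledValues = sortedValues.sortBy(v => (v * 37) % 101)  // same values, deterministic scramble

println(encode(sortedValues).length)     // the whole sorted column collapses into one run
println(encode(scrambledValues).length)  // the scrambled column needs more than one run
```

In this simplified model the sorted column needs a single (start, step, length) triple, while the same values in a different order need several. How much a real file shrinks depends on the format's encoder, which is the point made above about ORC versus Parquet.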




[GitHub] [spark] dongjoon-hyun edited a comment on pull request #29089: [SPARK-32276][SQL] Remove redundant sorts before repartition nodes

2020-07-14 Thread GitBox



dongjoon-hyun edited a comment on pull request #29089:
URL: https://github.com/apache/spark/pull/29089#issuecomment-658559706


   No. It depends on the file format rather than the Spark side.
   For example, in the above example, the ORC files are small because ORC 
supports a special encoding when the input data is sorted with a fixed 
increment. For Parquet files, the result will be different.




[GitHub] [spark] dongjoon-hyun commented on pull request #29089: [SPARK-32276][SQL] Remove redundant sorts before repartition nodes

2020-07-14 Thread GitBox



dongjoon-hyun commented on pull request #29089:
URL: https://github.com/apache/spark/pull/29089#issuecomment-658559706


   No. It depends on the file format rather than the Spark side.
   For example, in the above example, the ORC files are small because ORC 
supports a special encoding when the data is sorted with a fixed increment. 
For Parquet files, the result will be different.




[GitHub] [spark] dongjoon-hyun commented on pull request #29089: [SPARK-32276][SQL] Remove redundant sorts before repartition nodes

2020-07-14 Thread GitBox



dongjoon-hyun commented on pull request #29089:
URL: https://github.com/apache/spark/pull/29089#issuecomment-658558813


   I made a PR to add test coverage for the above case.
   - https://github.com/apache/spark/pull/29118




[GitHub] [spark] viirya commented on pull request #29089: [SPARK-32276][SQL] Remove redundant sorts before repartition nodes

2020-07-14 Thread GitBox



viirya commented on pull request #29089:
URL: https://github.com/apache/spark/pull/29089#issuecomment-658558946


   Oh, this is interesting. I know that removing `Sort` before `Repartition` 
will result in a different data distribution because `Repartition` uses 
`RoundRobinPartitioning`. Since repartition doesn't guarantee the distribution 
of shuffled data, I thought it was okay.
   
   Now it seems that a different data distribution causes a different storage 
output size. I think this is because repartitioning sorted data with 
`RoundRobinPartitioning` can generate more compact output.
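
A simplified model of that effect, in plain Scala rather than Spark: if the row at position `i` of a sorted input goes to partition `i % n`, each partition receives values with a fixed stride. The helper `roundRobin` below is hypothetical, and Spark's actual `RoundRobinPartitioning` differs in detail, so treat this only as a sketch of the idea.

```scala
// Simplified model: the element at position i goes to partition i % numPartitions.
// This is an illustration only, not Spark's RoundRobinPartitioning implementation.
def roundRobin[A](rows: Seq[A], numPartitions: Int): Map[Int, Seq[A]] =
  rows.zipWithIndex
    .groupBy { case (_, i) => i % numPartitions }
    .map { case (p, xs) => p -> xs.map(_._1) }

val sortedRows = (1 to 10).toList
val parts = roundRobin(sortedRows, 2)

println(parts(0))  // values 1, 3, 5, 7, 9: still sorted, fixed increment of 2
println(parts(1))  // values 2, 4, 6, 8, 10: still sorted, fixed increment of 2
```

Each partition of the sorted input is itself sorted with a constant step, which is the kind of data an encoder can store compactly. Without the preceding sort, no such regularity is guaranteed.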




[GitHub] [spark] dongjoon-hyun opened a new pull request #29118: [SPARK-32318][SQL][TESTS] Add a test case to EliminateSortsSuite for ORDER BY in DISTRIBUTE BY

2020-07-14 Thread GitBox



dongjoon-hyun opened a new pull request #29118:
URL: https://github.com/apache/spark/pull/29118


   ### What changes were proposed in this pull request?
   
   This PR aims to add a test case to `EliminateSortsSuite` to protect a valid 
use case: using `ORDER BY` inside a `DISTRIBUTE BY` statement.
   
   ### Why are the changes needed?
   
   ```
   scala> scala.util.Random.shuffle((1 to 10).map(x => (x % 2, x))).toDF("a", "b").repartition(2).createOrReplaceTempView("t")
   
   scala> sql("select * from (select * from t order by b) distribute by a").write.orc("/tmp/master")
   
   $ ls -al /tmp/master/
   total 56
   drwxr-xr-x  10 dongjoon  wheel  320 Jul 14 22:12 ./
   drwxrwxrwt  15 root      wheel  480 Jul 14 22:12 ../
   -rw-r--r--   1 dongjoon  wheel    8 Jul 14 22:12 ._SUCCESS.crc
   -rw-r--r--   1 dongjoon  wheel   12 Jul 14 22:12 .part-0-2cd3a50e-eded-49a4-b7cf-94e3f090b8c1-c000.snappy.orc.crc
   -rw-r--r--   1 dongjoon  wheel   16 Jul 14 22:12 .part-00043-2cd3a50e-eded-49a4-b7cf-94e3f090b8c1-c000.snappy.orc.crc
   -rw-r--r--   1 dongjoon  wheel   16 Jul 14 22:12 .part-00191-2cd3a50e-eded-49a4-b7cf-94e3f090b8c1-c000.snappy.orc.crc
   -rw-r--r--   1 dongjoon  wheel    0 Jul 14 22:12 _SUCCESS
   -rw-r--r--   1 dongjoon  wheel  119 Jul 14 22:12 part-0-2cd3a50e-eded-49a4-b7cf-94e3f090b8c1-c000.snappy.orc
   -rw-r--r--   1 dongjoon  wheel  932 Jul 14 22:12 part-00043-2cd3a50e-eded-49a4-b7cf-94e3f090b8c1-c000.snappy.orc
   -rw-r--r--   1 dongjoon  wheel  939 Jul 14 22:12 part-00191-2cd3a50e-eded-49a4-b7cf-94e3f090b8c1-c000.snappy.orc
   ```
   
   If the inner `ORDER BY` is removed (as happened after SPARK-32276), the 
file size increases for the same query.
   ```
   scala> scala.util.Random.shuffle((1 to 10).map(x => (x % 2, x))).toDF("a", "b").repartition(2).createOrReplaceTempView("t")
   
   scala> sql("select * from (select * from t order by b) distribute by a").write.orc("/tmp/SPARK-32276")
   
   $ ls -al /tmp/SPARK-32276/
   total 632
   drwxr-xr-x  10 dongjoon  wheel     320 Jul 14 22:08 ./
   drwxrwxrwt  14 root      wheel     448 Jul 14 22:08 ../
   -rw-r--r--   1 dongjoon  wheel       8 Jul 14 22:08 ._SUCCESS.crc
   -rw-r--r--   1 dongjoon  wheel      12 Jul 14 22:08 .part-0-ba5049f9-b835-49b7-9fdb-bdd11b9891cb-c000.snappy.orc.crc
   -rw-r--r--   1 dongjoon  wheel    1188 Jul 14 22:08 .part-00043-ba5049f9-b835-49b7-9fdb-bdd11b9891cb-c000.snappy.orc.crc
   -rw-r--r--   1 dongjoon  wheel    1188 Jul 14 22:08 .part-00191-ba5049f9-b835-49b7-9fdb-bdd11b9891cb-c000.snappy.orc.crc
   -rw-r--r--   1 dongjoon  wheel       0 Jul 14 22:08 _SUCCESS
   -rw-r--r--   1 dongjoon  wheel     119 Jul 14 22:08 part-0-ba5049f9-b835-49b7-9fdb-bdd11b9891cb-c000.snappy.orc
   -rw-r--r--   1 dongjoon  wheel  150735 Jul 14 22:08 part-00043-ba5049f9-b835-49b7-9fdb-bdd11b9891cb-c000.snappy.orc
   -rw-r--r--   1 dongjoon  wheel  150741 Jul 14 22:08 part-00191-ba5049f9-b835-49b7-9fdb-bdd11b9891cb-c000.snappy.orc
   ```
   
   ### Does this PR introduce _any_ user-facing change?
   
   No. This only improves the test coverage.
   
   ### How was this patch tested?
   
   Pass the GitHub Action or Jenkins.




[GitHub] [spark] viirya commented on pull request #29080: [SPARK-32271][ML] Update CrossValidator to train folds in parallel

2020-07-14 Thread GitBox



viirya commented on pull request #29080:
URL: https://github.com/apache/spark/pull/29080#issuecomment-658556814


   Did you read the two links above? The current approach is repeated random 
sub-sampling validation; this PR changes it to k-fold cross-validation.




[GitHub] [spark] viirya edited a comment on pull request #29080: [SPARK-32271][ML] Update CrossValidator to train folds in parallel

2020-07-14 Thread GitBox



viirya edited a comment on pull request #29080:
URL: https://github.com/apache/spark/pull/29080#issuecomment-658556814


   Did you read the two links above? The current approach is repeated random 
sub-sampling validation; this PR changes it to k-fold cross-validation.




[GitHub] [spark] SparkQA commented on pull request #27694: [SPARK-30946][SS] Serde entry via DataInputStream/DataOutputStream with LZ4 compression on FileStream(Source/Sink)Log

2020-07-14 Thread GitBox



SparkQA commented on pull request #27694:
URL: https://github.com/apache/spark/pull/27694#issuecomment-658555806


   **[Test build #125876 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125876/testReport)** for PR 27694 at commit [`86131af`](https://github.com/apache/spark/commit/86131afcf995fee64a629a7a440f03df8cabdd48).
* This patch **fails PySpark pip packaging tests**.
* This patch merges cleanly.
* This patch adds no public classes.




[GitHub] [spark] SparkQA removed a comment on pull request #27694: [SPARK-30946][SS] Serde entry via DataInputStream/DataOutputStream with LZ4 compression on FileStream(Source/Sink)Log

2020-07-14 Thread GitBox



SparkQA removed a comment on pull request #27694:
URL: https://github.com/apache/spark/pull/27694#issuecomment-658519508


   **[Test build #125876 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125876/testReport)** for PR 27694 at commit [`86131af`](https://github.com/apache/spark/commit/86131afcf995fee64a629a7a440f03df8cabdd48).




[GitHub] [spark] dongjoon-hyun edited a comment on pull request #28931: [SPARK-32103][CORE] Support IPv6 host/port in core module

2020-07-14 Thread GitBox



dongjoon-hyun edited a comment on pull request #28931:
URL: https://github.com/apache/spark/pull/28931#issuecomment-658553220


   Hi, @gatorsmile. Technically, this only handles `host/port` parsing inside 
the `core` module. I'm sure that this is a meaningful step inside Spark. 
However, we didn't test anything on IPv6. Like what we did for JDK11, I expect 
lots of hurdles both inside and outside Spark.
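
For readers unfamiliar with why IPv6 complicates `host:port` parsing: an IPv6 literal contains colons itself, so it is conventionally wrapped in brackets (for example `[::1]:7077`). The helper below is a hypothetical plain-Scala sketch of that idea, not the code from this PR and not a Spark API.

```scala
import scala.util.Try

// Hypothetical helper (not Spark's API): split "host:port" on the last colon.
// An IPv6 literal must be bracketed, e.g. "[::1]:7077"; an unbracketed host
// that contains ':' is ambiguous and is rejected.
def parseHostPort(s: String): Option[(String, Int)] = {
  val idx = s.lastIndexOf(':')
  if (idx <= 0) None
  else {
    val host = s.substring(0, idx)
    val bracketed = host.startsWith("[") && host.endsWith("]")
    if (host.contains(":") && !bracketed) None
    else Try(s.substring(idx + 1).toInt).toOption
      .filter(p => p >= 0 && p <= 65535)
      .map(p => (host, p))
  }
}

println(parseHostPort("localhost:7077"))  // host "localhost", port 7077
println(parseHostPort("[::1]:7077"))      // host "[::1]", port 7077
println(parseHostPort("::1:7077"))        // rejected: unbracketed IPv6 literal
```

Splitting on the first colon, which works for IPv4 addresses and host names, would break on the second and third inputs. This sketch deliberately returns `None` instead of guessing.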




[GitHub] [spark] dongjoon-hyun edited a comment on pull request #28931: [SPARK-32103][CORE] Support IPv6 host/port in core module

2020-07-14 Thread GitBox



dongjoon-hyun edited a comment on pull request #28931:
URL: https://github.com/apache/spark/pull/28931#issuecomment-658553220


   Hi, @gatorsmile. Technically, this only handles `host/port` parsing inside 
the `core` module. I'm sure that this is a meaningful step inside Spark. 
However, we didn't test anything on IPv6. Like JDK11, I expect lots of hurdles 
both inside and outside Spark.




[GitHub] [spark] dongjoon-hyun commented on pull request #28931: [SPARK-32103][CORE] Support IPv6 host/port in core module

2020-07-14 Thread GitBox



dongjoon-hyun commented on pull request #28931:
URL: https://github.com/apache/spark/pull/28931#issuecomment-658553220


   Hi, @gatorsmile. Technically, this only handles `host/port` parsing inside 
the `core` module. I'm sure that this is a meaningful step inside Spark. 
However, we didn't test anything on IPv6. Like JDK11, I expect lots of hurdles 
both inside and outside Spark.




[GitHub] [spark] adjordan edited a comment on pull request #29080: [SPARK-32271][ML] Update CrossValidator to train folds in parallel

2020-07-14 Thread GitBox



adjordan edited a comment on pull request #29080:
URL: https://github.com/apache/spark/pull/29080#issuecomment-658547236


   @viirya Sorry, can you explain? I don't see how it changes the technique; 
it just allows models from multiple folds to be run in parallel. 
`MLUtils.kFold` is doing k-fold cross-validation, not repeated random 
sub-sampling validation, right?
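
For reference, the property that distinguishes k-fold cross-validation can be sketched in plain Scala: every element lands in exactly one validation fold, whereas repeated random sub-sampling draws each validation set independently, so sets may overlap or miss elements. The function `kFoldIndices` is a hypothetical illustration and is not `MLUtils.kFold`.

```scala
// Plain-Scala sketch of k-fold splitting over indices 0 until n.
// Hypothetical illustration only; MLUtils.kFold works on RDDs and samples randomly.
def kFoldIndices(n: Int, k: Int): Seq[(Seq[Int], Seq[Int])] = {
  val indices = 0 until n
  (0 until k).map { fold =>
    // Indices whose remainder equals the fold number form the validation set.
    val (validation, training) = indices.partition(_ % k == fold)
    (training, validation)
  }
}

val folds = kFoldIndices(n = 6, k = 3)
folds.foreach { case (train, valid) => println(s"train=$train valid=$valid") }
```

Because the k training/validation pairs are independent of one another once the split is fixed, fitting them concurrently does not by itself change which validation technique is being used.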




[GitHub] [spark] dongjoon-hyun commented on pull request #29089: [SPARK-32276][SQL] Remove redundant sorts before repartition nodes

2020-07-14 Thread GitBox



dongjoon-hyun commented on pull request #29089:
URL: https://github.com/apache/spark/pull/29089#issuecomment-658550248


   Very sorry, guys. Due to the above regression, I'll revert this commit 
urgently. We can rethink this PR.




[GitHub] [spark] maropu commented on a change in pull request #29085: [SPARK-32106][SQL]Implement SparkScriptTransformationExec in sql/core

2020-07-14 Thread GitBox



maropu commented on a change in pull request #29085:
URL: https://github.com/apache/spark/pull/29085#discussion_r454795948



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/SparkScriptTransformationExec.scala
##
@@ -0,0 +1,187 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution
+
+import java.io._
+import java.nio.charset.StandardCharsets
+
+import scala.collection.JavaConverters._
+import scala.util.control.NonFatal
+
+import org.apache.hadoop.conf.Configuration
+
+import org.apache.spark.TaskContext
+import org.apache.spark.sql.catalyst.{CatalystTypeConverters, InternalRow}
+import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.catalyst.plans.logical.ScriptInputOutputSchema
+import org.apache.spark.sql.types._
+import org.apache.spark.util.{CircularBuffer, RedirectThread}
+
+/**
+ * Transforms the input by forking and running the specified script.
+ *
+ * @param input the set of expression that should be passed to the script.
+ * @param script the command that should be executed.
+ * @param output the attributes that are produced by the script.
+ */
+case class SparkScriptTransformationExec(
+input: Seq[Expression],
+script: String,
+output: Seq[Attribute],
+child: SparkPlan,
+ioschema: SparkScriptIOSchema)
+  extends BaseScriptTransformationExec {
+
+  override def processIterator(inputIterator: Iterator[InternalRow], hadoopConf: Configuration)
+  : Iterator[InternalRow] = {
+val cmd = List("/bin/bash", "-c", script)

Review comment:
   Seems like the implementation of `processIterator` is pretty similar to 
the Hive one. Could we share the code between them more?






[GitHub] [spark] dongjoon-hyun commented on pull request #29089: [SPARK-32276][SQL] Remove redundant sorts before repartition nodes

2020-07-14 Thread GitBox



dongjoon-hyun commented on pull request #29089:
URL: https://github.com/apache/spark/pull/29089#issuecomment-658549984


   **AFTER SPARK-32276**
   ```
   scala> scala.util.Random.shuffle((1 to 10).map(x => (x % 2, x))).toDF("a", "b").repartition(2).createOrReplaceTempView("t")
   
   scala> sql("select * from (select * from t order by b) distribute by a").write.orc("/tmp/SPARK-32276")
   ```
   
   ```
   $ ls -al /tmp/SPARK-32276/
   total 632
   drwxr-xr-x  10 dongjoon  wheel     320 Jul 14 22:08 ./
   drwxrwxrwt  14 root      wheel     448 Jul 14 22:08 ../
   -rw-r--r--   1 dongjoon  wheel       8 Jul 14 22:08 ._SUCCESS.crc
   -rw-r--r--   1 dongjoon  wheel      12 Jul 14 22:08 .part-0-ba5049f9-b835-49b7-9fdb-bdd11b9891cb-c000.snappy.orc.crc
   -rw-r--r--   1 dongjoon  wheel    1188 Jul 14 22:08 .part-00043-ba5049f9-b835-49b7-9fdb-bdd11b9891cb-c000.snappy.orc.crc
   -rw-r--r--   1 dongjoon  wheel    1188 Jul 14 22:08 .part-00191-ba5049f9-b835-49b7-9fdb-bdd11b9891cb-c000.snappy.orc.crc
   -rw-r--r--   1 dongjoon  wheel       0 Jul 14 22:08 _SUCCESS
   -rw-r--r--   1 dongjoon  wheel     119 Jul 14 22:08 part-0-ba5049f9-b835-49b7-9fdb-bdd11b9891cb-c000.snappy.orc
   -rw-r--r--   1 dongjoon  wheel  150735 Jul 14 22:08 part-00043-ba5049f9-b835-49b7-9fdb-bdd11b9891cb-c000.snappy.orc
   -rw-r--r--   1 dongjoon  wheel  150741 Jul 14 22:08 part-00191-ba5049f9-b835-49b7-9fdb-bdd11b9891cb-c000.snappy.orc
   ```
   
   **BEFORE**
   ```
   scala> scala.util.Random.shuffle((1 to 10).map(x => (x % 2, x))).toDF("a", "b").repartition(2).createOrReplaceTempView("t")
   
   scala> sql("select * from (select * from t order by b) distribute by a").write.orc("/tmp/master")
   ```
   
   ```
   $ ls -al /tmp/master/
   total 56
   drwxr-xr-x  10 dongjoon  wheel  320 Jul 14 22:12 ./
   drwxrwxrwt  15 root      wheel  480 Jul 14 22:12 ../
   -rw-r--r--   1 dongjoon  wheel    8 Jul 14 22:12 ._SUCCESS.crc
   -rw-r--r--   1 dongjoon  wheel   12 Jul 14 22:12 .part-0-2cd3a50e-eded-49a4-b7cf-94e3f090b8c1-c000.snappy.orc.crc
   -rw-r--r--   1 dongjoon  wheel   16 Jul 14 22:12 .part-00043-2cd3a50e-eded-49a4-b7cf-94e3f090b8c1-c000.snappy.orc.crc
   -rw-r--r--   1 dongjoon  wheel   16 Jul 14 22:12 .part-00191-2cd3a50e-eded-49a4-b7cf-94e3f090b8c1-c000.snappy.orc.crc
   -rw-r--r--   1 dongjoon  wheel    0 Jul 14 22:12 _SUCCESS
   -rw-r--r--   1 dongjoon  wheel  119 Jul 14 22:12 part-0-2cd3a50e-eded-49a4-b7cf-94e3f090b8c1-c000.snappy.orc
   -rw-r--r--   1 dongjoon  wheel  932 Jul 14 22:12 part-00043-2cd3a50e-eded-49a4-b7cf-94e3f090b8c1-c000.snappy.orc
   -rw-r--r--   1 dongjoon  wheel  939 Jul 14 22:12 part-00191-2cd3a50e-eded-49a4-b7cf-94e3f090b8c1-c000.snappy.orc
   ```




[GitHub] [spark] maropu commented on a change in pull request #29085: [SPARK-32106][SQL]Implement SparkScriptTransformationExec in sql/core

2020-07-14 Thread GitBox



maropu commented on a change in pull request #29085:
URL: https://github.com/apache/spark/pull/29085#discussion_r454780673



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/BaseScriptTransformationExec.scala
##
@@ -87,17 +90,60 @@ trait BaseScriptTransformationExec extends UnaryExecNode {
   }
 }
   }
+
+  def wrapper(data: String, dt: DataType): Any = {
+dt match {
+  case StringType => data
+  case ByteType => JavaUtils.stringToBytes(data)
+  case IntegerType => data.toInt
+  case ShortType => data.toShort
+  case LongType => data.toLong
+  case FloatType => data.toFloat
+  case DoubleType => data.toDouble
+  case dt: DecimalType => BigDecimal(data)
+  case DateType if conf.datetimeJava8ApiEnabled =>
+DateTimeUtils.stringToDate(
+  UTF8String.fromString(data),
+  DateTimeUtils.getZoneId(conf.sessionLocalTimeZone))
+  .map(DateTimeUtils.daysToLocalDate).orNull
+  case DateType =>
+DateTimeUtils.stringToDate(
+  UTF8String.fromString(data),
+  DateTimeUtils.getZoneId(conf.sessionLocalTimeZone))
+  .map(DateTimeUtils.toJavaDate).orNull
+  case TimestampType if conf.datetimeJava8ApiEnabled =>
+DateTimeUtils.stringToTimestamp(
+  UTF8String.fromString(data),
+  DateTimeUtils.getZoneId(conf.sessionLocalTimeZone))
+  .map(DateTimeUtils.microsToInstant).orNull
+  case TimestampType =>
+DateTimeUtils.stringToTimestamp(
+  UTF8String.fromString(data),
+  DateTimeUtils.getZoneId(conf.sessionLocalTimeZone))
+  .map(DateTimeUtils.toJavaTimestamp).orNull
+  case CalendarIntervalType => 
IntervalUtils.stringToInterval(UTF8String.fromString(data))
+  case dataType: DataType => data
+}
+  }
 }
 
-abstract class BaseScriptTransformationWriterThread(
-iter: Iterator[InternalRow],
-inputSchema: Seq[DataType],
-ioSchema: BaseScriptTransformIOSchema,
-outputStream: OutputStream,
-proc: Process,
-stderrBuffer: CircularBuffer,
-taskContext: TaskContext,
-conf: Configuration) extends Thread with Logging {
+abstract class BaseScriptTransformationWriterThread extends Thread with 
Logging {
+
+  def iter: Iterator[InternalRow]
+
+  def inputSchema: Seq[DataType]
+
+  def ioSchema: BaseScriptTransformIOSchema
+
+  def outputStream: OutputStream
+
+  def proc: Process
+
+  def stderrBuffer: CircularBuffer
+
+  def taskContext: TaskContext
+
+  def conf: Configuration

Review comment:
   nit: we don't need line breaks?
   ```
 def inputRowFormat: Seq[(String, String)]
 def outputRowFormat: Seq[(String, String)]
 def inputSerdeClass: Option[String]
 def outputSerdeClass: Option[String]
 def inputSerdeProps: Seq[(String, String)]
 def outputSerdeProps: Seq[(String, String)]
 def recordReaderClass: Option[String]
 def recordWriterClass: Option[String]
 def schemaLess: Boolean
   ```

##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/BaseScriptTransformationExec.scala
##
@@ -87,17 +90,60 @@ trait BaseScriptTransformationExec extends UnaryExecNode {
   }
 }
   }
+
+  def wrapper(data: String, dt: DataType): Any = {

Review comment:
   `protected`

##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/SparkScriptTransformationExec.scala
##
@@ -0,0 +1,187 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution
+
+import java.io._
+import java.nio.charset.StandardCharsets
+
+import scala.collection.JavaConverters._
+import scala.util.control.NonFatal
+
+import org.apache.hadoop.conf.Configuration
+
+import org.apache.spark.TaskContext
+import org.apache.spark.sql.catalyst.{CatalystTypeConverters, InternalRow}
+import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.catalyst.plans.logical.ScriptInputOutputSchema
+import org.apache.spark.sql.types._
+import org.apache.spark.util.{CircularBuffer, RedirectThread}
+
+/**
+ * Transforms the input by forking and running the specified script.
+ *
+ * @param input the set of expression that should be

[GitHub] [spark] adjordan edited a comment on pull request #29080: [SPARK-32271][ML] Update CrossValidator to train folds in parallel

2020-07-14 Thread GitBox



adjordan edited a comment on pull request #29080:
URL: https://github.com/apache/spark/pull/29080#issuecomment-658547236


   @viirya Sorry, can you explain? I don't see how it changes the technique; 
it just allows models from multiple folds to be run in parallel.




[GitHub] [spark] adjordan commented on pull request #29080: [SPARK-32271][ML] Update CrossValidator to train folds in parallel

2020-07-14 Thread GitBox



adjordan commented on pull request #29080:
URL: https://github.com/apache/spark/pull/29080#issuecomment-658547236


   @viirya Sorry, can you explain? I don't see how it changes anything; it just allows models from multiple folds to be run in parallel.




[GitHub] [spark] srowen commented on a change in pull request #29111: [SPARK-29292][SQL][ML] Update rest of default modules (Hive, ML, etc) for Scala 2.13 compilation

2020-07-14 Thread GitBox



srowen commented on a change in pull request #29111:
URL: https://github.com/apache/spark/pull/29111#discussion_r454792607



##
File path: mllib/src/main/scala/org/apache/spark/ml/Estimator.scala
##
@@ -76,7 +76,7 @@ abstract class Estimator[M <: Model[M]] extends PipelineStage {
    * @return fitted models, matching the input parameter maps
    */
   @Since("2.0.0")
-  def fit(dataset: Dataset[_], paramMaps: Array[ParamMap]): Seq[M] = {
+  def fit(dataset: Dataset[_], paramMaps: Seq[ParamMap]): Seq[M] = {

Review comment:
   Yeah, this fixes the weird compile error (Arrays + generic types are stricter in Scala 2.13), though I don't directly see what it has to do with type M. Still, this is an API change that I think MiMa will fail on, and I think I need another workaround for _that_. This is an obscure method that isn't even called by tests, AFAICT, so I'm not sure it even has coverage.
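
For readers following the Array-versus-Seq point above, here is a minimal sketch of one possible way to keep an `Array`-based signature compiling on both Scala 2.12 and 2.13. It is an illustration only, not the workaround the PR adopted. `ToyParams`, `ToyEstimator`, and `SumEstimator` are hypothetical stand-ins for Spark's `ParamMap`, `Estimator`, and a concrete estimator. The assumption is that the strictness comes from `Array.map` needing a `ClassTag[M]` on 2.13 to build an `Array[M]`, which converting to a `Seq` first avoids.

```scala
// Hypothetical stand-ins; these are NOT Spark's real ParamMap/Estimator types.
final case class ToyParams(scale: Int)

abstract class ToyEstimator[M] {
  // Fits a single model for one parameter set.
  def fit(data: Seq[Int], params: ToyParams): M

  // The Array-based signature stays unchanged. Converting to Seq before
  // mapping means no ClassTag[M] is required to build the result.
  def fit(data: Seq[Int], paramMaps: Array[ToyParams]): Seq[M] =
    paramMaps.toSeq.map(p => fit(data, p))
}

// A toy estimator whose "model" is just a scaled sum of the data.
object SumEstimator extends ToyEstimator[Int] {
  override def fit(data: Seq[Int], params: ToyParams): Int =
    data.sum * params.scale
}
```

Because the public parameter type is still `Array`, a binary-compatibility checker would presumably see no signature change under this approach; whether that fits the real `Estimator.fit` is left open here.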






[GitHub] [spark] srowen commented on pull request #29111: [SPARK-29292][SQL][ML] Update rest of default modules (Hive, ML, etc) for Scala 2.13 compilation

2020-07-14 Thread GitBox



srowen commented on pull request #29111:
URL: https://github.com/apache/spark/pull/29111#issuecomment-658546568


   I think I understand the last test failures, will fix too.




[GitHub] [spark] MaxGekk commented on pull request #27366: [SPARK-30648][SQL] Support filters pushdown in JSON datasource

2020-07-14 Thread GitBox



MaxGekk commented on pull request #27366:
URL: https://github.com/apache/spark/pull/27366#issuecomment-658546141


   @cloud-fan Is there anything else I should do for this PR to be merged?




[GitHub] [spark] stczwd commented on a change in pull request #29088: [SPARK-32289][SQL] Some characters are garbled when opening csv files with Excel

2020-07-14 Thread GitBox



stczwd commented on a change in pull request #29088:
URL: https://github.com/apache/spark/pull/29088#discussion_r454791986



##
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CsvOutputWriter.scala
##
@@ -39,6 +39,10 @@ class CsvOutputWriter(
 
   private val gen = new UnivocityGenerator(dataSchema, writer, params)
 
+  if (params.bom) {
+    writer.write(0xFEFF)

Review comment:
   Excel. It will change the actual value if we add `0xFEFF` in the front.
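
As background for the `0xFEFF` discussion, the sketch below shows what writing that code point through a UTF-8 `Writer` produces on the wire. It assumes UTF-8 output; `BomSketch` and `encode` are hypothetical names used only for illustration and are not part of `CsvOutputWriter`.

```scala
import java.io.{ByteArrayOutputStream, OutputStreamWriter}
import java.nio.charset.StandardCharsets

object BomSketch {
  // Encodes `text` as UTF-8, optionally preceded by U+FEFF (the byte order mark).
  def encode(text: String, bom: Boolean): Array[Byte] = {
    val bytes = new ByteArrayOutputStream()
    val writer = new OutputStreamWriter(bytes, StandardCharsets.UTF_8)
    if (bom) writer.write(0xFEFF) // Writer.write(Int) writes a single character
    writer.write(text)
    writer.close() // flushes the encoder into `bytes`
    bytes.toByteArray
  }
}
```

With `bom = true` the output starts with the three bytes `EF BB BF`. A reader that does not strip the mark would see U+FEFF as part of the first value, which appears to be the concern raised in the comment above.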






[GitHub] [spark] dongjoon-hyun edited a comment on pull request #29089: [SPARK-32276][SQL] Remove redundant sorts before repartition nodes

2020-07-14 Thread GitBox



dongjoon-hyun edited a comment on pull request #29089:
URL: https://github.com/apache/spark/pull/29089#issuecomment-658544475


   To generate small final Parquet/ORC files, we do the above tricks, don't we? 
This may cause a regression on the size of output storage.




[GitHub] [spark] dongjoon-hyun edited a comment on pull request #29089: [SPARK-32276][SQL] Remove redundant sorts before repartition nodes

2020-07-14 Thread GitBox



dongjoon-hyun edited a comment on pull request #29089:
URL: https://github.com/apache/spark/pull/29089#issuecomment-658544475


   To generate small final Parquet/ORC files, we do the above tricks, don't we? 
This PR may cause a regression on the size of output storage.




[GitHub] [spark] dongjoon-hyun edited a comment on pull request #29089: [SPARK-32276][SQL] Remove redundant sorts before repartition nodes

2020-07-14 Thread GitBox



dongjoon-hyun edited a comment on pull request #29089:
URL: https://github.com/apache/spark/pull/29089#issuecomment-658544475


   To generate small final Parquet/ORC files, we do the above tricks, don't we?




[GitHub] [spark] dongjoon-hyun commented on pull request #29089: [SPARK-32276][SQL] Remove redundant sorts before repartition nodes

2020-07-14 Thread GitBox



dongjoon-hyun commented on pull request #29089:
URL: https://github.com/apache/spark/pull/29089#issuecomment-658544475


   To generate small Parquet/ORC files, we do the above tricks, don't we?




[GitHub] [spark] warrenzhu25 edited a comment on pull request #29044: [WIP][SPARK-32227] Fix regression bug in load-spark-env.cmd with Spark 3.0.0

2020-07-14 Thread GitBox



warrenzhu25 edited a comment on pull request #29044:
URL: https://github.com/apache/spark/pull/29044#issuecomment-656771107


   > It's directly relevant to this PR because your patch is changing 
`environment` variable.
   > 
   > * Please see this for the detail (https://github.com/cdarlint/winutils)
   > * You can run AppVeyor in your Spark fork, too.
   
   winutils is only affected by `PATH` and `HADOOP_HOME`, and I don't touch either. Also, my change just reverts to the 2.4.4 version. Could you help rerun the tests?




[GitHub] [spark] dongjoon-hyun commented on pull request #29089: [SPARK-32276][SQL] Remove redundant sorts before repartition nodes

2020-07-14 Thread GitBox



dongjoon-hyun commented on pull request #29089:
URL: https://github.com/apache/spark/pull/29089#issuecomment-658543717


   Oops. Sorry, guys. It seems that I missed something during testing. For the 
following case, we should not remove `Sort`.
   
   **BEFORE THIS PR**
   ```scala
   scala> Seq((1,10),(1,20),(2,30),(2,40)).toDF("a", "b").repartition(2).createOrReplaceTempView("t")
   
   scala> sql("select * from (select * from t order by b desc) distribute by a").show()
   +---+---+
   |  a|  b|
   +---+---+
   |  1| 20|
   |  1| 10|
   |  2| 40|
   |  2| 30|
   +---+---+
   ```
   
   **AFTER THIS PR**
   ```scala
   scala> Seq((1,10),(1,20),(2,30),(2,40)).toDF("a", "b").repartition(2).createOrReplaceTempView("t")
   
   scala> sql("select * from (select * from t order by b desc) distribute by a").show()
   +---+---+
   |  a|  b|
   +---+---+
   |  1| 10|
   |  1| 20|
   |  2| 30|
   |  2| 40|
   +---+---+
   ```
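
The difference between the two outputs can be mimicked with plain Scala collections. This is only an analogy under stated assumptions: `distributeBy` is a hypothetical helper that groups rows by `a` while keeping their incoming order, and it does not model Spark's optimizer or shuffle.

```scala
object SortThenDistributeSketch {
  type Row = (Int, Int) // (a, b)

  // Groups rows by key `a`; within each group, rows keep their input order.
  def distributeBy(rows: Seq[Row]): Map[Int, Seq[Row]] = rows.groupBy(_._1)

  val rows: Seq[Row] = Seq((1, 10), (1, 20), (2, 30), (2, 40))

  // Mirrors "order by b desc" followed by "distribute by a".
  val withSort: Map[Int, Seq[Row]] = distributeBy(rows.sortBy(r => -r._2))

  // Mirrors the plan after the inner sort has been removed.
  val withoutSort: Map[Int, Seq[Row]] = distributeBy(rows)
}
```

In this analogy `withSort(1)` holds `(1, 20)` before `(1, 10)`, while `withoutSort(1)` holds them in the opposite order, so dropping the sort changes what each group looks like even though the set of rows is the same.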
   




[GitHub] [spark] warrenzhu25 commented on pull request #28942: [SPARK-32125][UI] Support get taskList by status in Web UI and SHS Rest API

2020-07-14 Thread GitBox



warrenzhu25 commented on pull request #28942:
URL: https://github.com/apache/spark/pull/28942#issuecomment-658543670


   @gengliangwang Tests passed, could you help merge this?




[GitHub] [spark] HyukjinKwon opened a new pull request #29117: [WIP] Debug flaky pip installation test failure

2020-07-14 Thread GitBox



HyukjinKwon opened a new pull request #29117:
URL: https://github.com/apache/spark/pull/29117


   ### What changes were proposed in this pull request?
   
   TBD
   
   ### Why are the changes needed?
   
   TBD
   
   ### Does this PR introduce _any_ user-facing change?
   
   TBD
   
   ### How was this patch tested?
   
   TBD
   




[GitHub] [spark] HeartSaVioR closed pull request #29077: [SPARK-31985][SS] Remove incomplete/undocumented stateful aggregation in continuous mode

2020-07-14 Thread GitBox



HeartSaVioR closed pull request #29077:
URL: https://github.com/apache/spark/pull/29077


   




[GitHub] [spark] HeartSaVioR commented on pull request #29077: [SPARK-31985][SS] Remove incomplete/undocumented stateful aggregation in continuous mode

2020-07-14 Thread GitBox



HeartSaVioR commented on pull request #29077:
URL: https://github.com/apache/spark/pull/29077#issuecomment-658539797


   Thanks for the review and kind words :) I'll deal with merging.




[GitHub] [spark] dongjoon-hyun commented on a change in pull request #29111: [SPARK-29292][SQL][ML] Update rest of default modules (Hive, ML, etc) for Scala 2.13 compilation

2020-07-14 Thread GitBox



dongjoon-hyun commented on a change in pull request #29111:
URL: https://github.com/apache/spark/pull/29111#discussion_r454784921



##
File path: mllib/src/main/scala/org/apache/spark/ml/Estimator.scala
##
@@ -76,7 +76,7 @@ abstract class Estimator[M <: Model[M]] extends PipelineStage {
    * @return fitted models, matching the input parameter maps
    */
   @Since("2.0.0")
-  def fit(dataset: Dataset[_], paramMaps: Array[ParamMap]): Seq[M] = {
+  def fit(dataset: Dataset[_], paramMaps: Seq[ParamMap]): Seq[M] = {

Review comment:
   cc @mengxr and @gatorsmile






[GitHub] [spark] dongjoon-hyun commented on a change in pull request #29111: [SPARK-29292][SQL][ML] Update rest of default modules (Hive, ML, etc) for Scala 2.13 compilation

2020-07-14 Thread GitBox



dongjoon-hyun commented on a change in pull request #29111:
URL: https://github.com/apache/spark/pull/29111#discussion_r454784282



##
File path: examples/src/main/scala/org/apache/spark/examples/SparkKMeans.scala
##
@@ -102,5 +102,10 @@ object SparkKMeans {
 kPoints.foreach(println)
 spark.stop()
   }
+
+  private def mergeResults(a: (Vector[Double], Int),
+   b: (Vector[Double], Int)): (Vector[Double], Int) = {

Review comment:
   nit. Indentation?






[GitHub] [spark] aokolnychyi commented on pull request #29089: [SPARK-32276][SQL] Remove redundant sorts before repartition nodes

2020-07-14 Thread GitBox



aokolnychyi commented on pull request #29089:
URL: https://github.com/apache/spark/pull/29089#issuecomment-658538432


   Thanks, everyone!




[GitHub] [spark] dongjoon-hyun commented on pull request #29089: [SPARK-32276][SQL] Remove redundant sorts before repartition nodes

2020-07-14 Thread GitBox



dongjoon-hyun commented on pull request #29089:
URL: https://github.com/apache/spark/pull/29089#issuecomment-658538140


   Also, cc @gatorsmile and @cloud-fan 




[GitHub] [spark] SparkQA removed a comment on pull request #29080: [SPARK-32271][ML] Update CrossValidator to train folds in parallel

2020-07-14 Thread GitBox



SparkQA removed a comment on pull request #29080:
URL: https://github.com/apache/spark/pull/29080#issuecomment-658519469


   **[Test build #125874 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125874/testReport)** for PR 29080 at commit [`6dd0a4d`](https://github.com/apache/spark/commit/6dd0a4d9a2157086ef33bd810f9e250114b33c7d).




[GitHub] [spark] AmplabJenkins removed a comment on pull request #29114: [SPARK-32094][PYTHON] Update cloudpickle to v1.5.0

2020-07-14 Thread GitBox



AmplabJenkins removed a comment on pull request #29114:
URL: https://github.com/apache/spark/pull/29114#issuecomment-658536762


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/125866/
   Test FAILed.




[GitHub] [spark] AmplabJenkins commented on pull request #29080: [SPARK-32271][ML] Update CrossValidator to train folds in parallel

2020-07-14 Thread GitBox



AmplabJenkins commented on pull request #29080:
URL: https://github.com/apache/spark/pull/29080#issuecomment-658537135








[GitHub] [spark] AmplabJenkins removed a comment on pull request #29080: [SPARK-32271][ML] Update CrossValidator to train folds in parallel

2020-07-14 Thread GitBox



AmplabJenkins removed a comment on pull request #29080:
URL: https://github.com/apache/spark/pull/29080#issuecomment-658537135


   Merged build finished. Test PASSed.




[GitHub] [spark] AmplabJenkins removed a comment on pull request #29080: [SPARK-32271][ML] Update CrossValidator to train folds in parallel

2020-07-14 Thread GitBox



AmplabJenkins removed a comment on pull request #29080:
URL: https://github.com/apache/spark/pull/29080#issuecomment-658537137


   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/125874/
   Test PASSed.




[GitHub] [spark] AmplabJenkins removed a comment on pull request #29114: [SPARK-32094][PYTHON] Update cloudpickle to v1.5.0

2020-07-14 Thread GitBox



AmplabJenkins removed a comment on pull request #29114:
URL: https://github.com/apache/spark/pull/29114#issuecomment-658536619








[GitHub] [spark] SparkQA commented on pull request #29080: [SPARK-32271][ML] Update CrossValidator to train folds in parallel

2020-07-14 Thread GitBox



SparkQA commented on pull request #29080:
URL: https://github.com/apache/spark/pull/29080#issuecomment-658536994


   **[Test build #125874 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125874/testReport)** for PR 29080 at commit [`6dd0a4d`](https://github.com/apache/spark/commit/6dd0a4d9a2157086ef33bd810f9e250114b33c7d).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.




[GitHub] [spark] AmplabJenkins commented on pull request #29114: [SPARK-32094][PYTHON] Update cloudpickle to v1.5.0

2020-07-14 Thread GitBox



AmplabJenkins commented on pull request #29114:
URL: https://github.com/apache/spark/pull/29114#issuecomment-658536613








[GitHub] [spark] AmplabJenkins commented on pull request #29114: [SPARK-32094][PYTHON] Update cloudpickle to v1.5.0

2020-07-14 Thread GitBox



AmplabJenkins commented on pull request #29114:
URL: https://github.com/apache/spark/pull/29114#issuecomment-658536758








[GitHub] [spark] SparkQA removed a comment on pull request #29114: [SPARK-32094][PYTHON] Update cloudpickle to v1.5.0

2020-07-14 Thread GitBox



SparkQA removed a comment on pull request #29114:
URL: https://github.com/apache/spark/pull/29114#issuecomment-658491516


   **[Test build #125865 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125865/testReport)** for PR 29114 at commit [`5630999`](https://github.com/apache/spark/commit/5630999689a555f5e026cabe5f7c200ff8b24256).




[GitHub] [spark] AmplabJenkins commented on pull request #29114: [SPARK-32094][PYTHON] Update cloudpickle to v1.5.0

2020-07-14 Thread GitBox



AmplabJenkins commented on pull request #29114:
URL: https://github.com/apache/spark/pull/29114#issuecomment-658536691








[GitHub] [spark] AmplabJenkins removed a comment on pull request #29114: [SPARK-32094][PYTHON] Update cloudpickle to v1.5.0

2020-07-14 Thread GitBox



AmplabJenkins removed a comment on pull request #29114:
URL: https://github.com/apache/spark/pull/29114#issuecomment-658536613


   Merged build finished. Test FAILed.




[GitHub] [spark] SparkQA commented on pull request #29114: [SPARK-32094][PYTHON] Update cloudpickle to v1.5.0

2020-07-14 Thread GitBox



SparkQA commented on pull request #29114:
URL: https://github.com/apache/spark/pull/29114#issuecomment-658536417


   **[Test build #125865 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125865/testReport)** for PR 29114 at commit [`5630999`](https://github.com/apache/spark/commit/5630999689a555f5e026cabe5f7c200ff8b24256).
* This patch **fails PySpark pip packaging tests**.
* This patch merges cleanly.
* This patch adds no public classes.




[GitHub] [spark] dongjoon-hyun closed pull request #29089: [SPARK-32276][SQL] Remove redundant sorts before repartition nodes

2020-07-14 Thread GitBox



dongjoon-hyun closed pull request #29089:
URL: https://github.com/apache/spark/pull/29089


   




[GitHub] [spark] SparkQA commented on pull request #29114: [SPARK-32094][PYTHON] Update cloudpickle to v1.5.0

2020-07-14 Thread GitBox



SparkQA commented on pull request #29114:
URL: https://github.com/apache/spark/pull/29114#issuecomment-658535423


   **[Test build #125878 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125878/testReport)** for PR 29114 at commit [`465fd8a`](https://github.com/apache/spark/commit/465fd8a5f4773c3fee69df9c5cf8d3ad57160d03).




[GitHub] [spark] AmplabJenkins removed a comment on pull request #28708: [SPARK-20629][CORE][K8S] Copy shuffle data when nodes are being shutdown

2020-07-14 Thread GitBox



AmplabJenkins removed a comment on pull request #28708:
URL: https://github.com/apache/spark/pull/28708#issuecomment-658534819


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/125867/
   Test FAILed.




[GitHub] [spark] AmplabJenkins commented on pull request #28708: [SPARK-20629][CORE][K8S] Copy shuffle data when nodes are being shutdown

2020-07-14 Thread GitBox



AmplabJenkins commented on pull request #28708:
URL: https://github.com/apache/spark/pull/28708#issuecomment-658534813








[GitHub] [spark] AmplabJenkins removed a comment on pull request #28708: [SPARK-20629][CORE][K8S] Copy shuffle data when nodes are being shutdown

2020-07-14 Thread GitBox



AmplabJenkins removed a comment on pull request #28708:
URL: https://github.com/apache/spark/pull/28708#issuecomment-658534813


   Merged build finished. Test FAILed.




[GitHub] [spark] SparkQA removed a comment on pull request #28708: [SPARK-20629][CORE][K8S] Copy shuffle data when nodes are being shutdown

2020-07-14 Thread GitBox



SparkQA removed a comment on pull request #28708:
URL: https://github.com/apache/spark/pull/28708#issuecomment-658493500


   **[Test build #125867 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125867/testReport)** for PR 28708 at commit [`fe5ba7b`](https://github.com/apache/spark/commit/fe5ba7befc243a30377b0d3057ec3862726db2d3).




[GitHub] [spark] AmplabJenkins removed a comment on pull request #28708: [SPARK-20629][CORE][K8S] Copy shuffle data when nodes are being shutdown

2020-07-14 Thread GitBox



AmplabJenkins removed a comment on pull request #28708:
URL: https://github.com/apache/spark/pull/28708#issuecomment-658503907


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/30475/
   Test FAILed.




[GitHub] [spark] SparkQA commented on pull request #28708: [SPARK-20629][CORE][K8S] Copy shuffle data when nodes are being shutdown

2020-07-14 Thread GitBox



SparkQA commented on pull request #28708:
URL: https://github.com/apache/spark/pull/28708#issuecomment-658534225


   **[Test build #125867 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125867/testReport)** for PR 28708 at commit [`fe5ba7b`](https://github.com/apache/spark/commit/fe5ba7befc243a30377b0d3057ec3862726db2d3).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.




[GitHub] [spark] AmplabJenkins removed a comment on pull request #29114: [SPARK-32094][PYTHON] Update cloudpickle to v1.5.0

2020-07-14 Thread GitBox



AmplabJenkins removed a comment on pull request #29114:
URL: https://github.com/apache/spark/pull/29114#issuecomment-658533895








[GitHub] [spark] AmplabJenkins commented on pull request #29114: [SPARK-32094][PYTHON] Update cloudpickle to v1.5.0

2020-07-14 Thread GitBox



AmplabJenkins commented on pull request #29114:
URL: https://github.com/apache/spark/pull/29114#issuecomment-658533895








[GitHub] [spark] AmplabJenkins removed a comment on pull request #28848: [SPARK-32003][CORE] When external shuffle service is used, unregister outputs for executor on fetch failure after executor is l

2020-07-14 Thread GitBox



AmplabJenkins removed a comment on pull request #28848:
URL: https://github.com/apache/spark/pull/28848#issuecomment-658533186


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/125863/
   Test FAILed.




[GitHub] [spark] HyukjinKwon commented on pull request #29116: [SPARK-32316][TESTS][INFRA] Test PySpark with Python 3.8 in Github Actions

2020-07-14 Thread GitBox



HyukjinKwon commented on pull request #29116:
URL: https://github.com/apache/spark/pull/29116#issuecomment-658533425


   Thanks, @dongjoon-hyun 




[GitHub] [spark] AmplabJenkins removed a comment on pull request #28848: [SPARK-32003][CORE] When external shuffle service is used, unregister outputs for executor on fetch failure after executor is l

2020-07-14 Thread GitBox



AmplabJenkins removed a comment on pull request #28848:
URL: https://github.com/apache/spark/pull/28848#issuecomment-658533182


   Merged build finished. Test FAILed.




[GitHub] [spark] SparkQA removed a comment on pull request #28848: [SPARK-32003][CORE] When external shuffle service is used, unregister outputs for executor on fetch failure after executor is lost

2020-07-14 Thread GitBox



SparkQA removed a comment on pull request #28848:
URL: https://github.com/apache/spark/pull/28848#issuecomment-658485359


   **[Test build #125863 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125863/testReport)** for PR 28848 at commit [`0e00862`](https://github.com/apache/spark/commit/0e0086288f6279569e8a11cef9d928b87c40469b).




[GitHub] [spark] AmplabJenkins commented on pull request #28848: [SPARK-32003][CORE] When external shuffle service is used, unregister outputs for executor on fetch failure after executor is lost

2020-07-14 Thread GitBox



AmplabJenkins commented on pull request #28848:
URL: https://github.com/apache/spark/pull/28848#issuecomment-658533182








[GitHub] [spark] SparkQA commented on pull request #28848: [SPARK-32003][CORE] When external shuffle service is used, unregister outputs for executor on fetch failure after executor is lost

2020-07-14 Thread GitBox



SparkQA commented on pull request #28848:
URL: https://github.com/apache/spark/pull/28848#issuecomment-658532861


   **[Test build #125863 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125863/testReport)** for PR 28848 at commit [`0e00862`](https://github.com/apache/spark/commit/0e0086288f6279569e8a11cef9d928b87c40469b).
* This patch **fails PySpark pip packaging tests**.
* This patch merges cleanly.
* This patch adds no public classes.




[GitHub] [spark] AmplabJenkins removed a comment on pull request #29114: [SPARK-32094][PYTHON] Update cloudpickle to v1.5.0

2020-07-14 Thread GitBox



AmplabJenkins removed a comment on pull request #29114:
URL: https://github.com/apache/spark/pull/29114#issuecomment-658529664








[GitHub] [spark] AmplabJenkins commented on pull request #29114: [SPARK-32094][PYTHON] Update cloudpickle to v1.5.0

2020-07-14 Thread GitBox



AmplabJenkins commented on pull request #29114:
URL: https://github.com/apache/spark/pull/29114#issuecomment-658529664








[GitHub] [spark] SparkQA commented on pull request #29114: [SPARK-32094][PYTHON] Update cloudpickle to v1.5.0

2020-07-14 Thread GitBox



SparkQA commented on pull request #29114:
URL: https://github.com/apache/spark/pull/29114#issuecomment-658529122


   **[Test build #125877 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125877/testReport)** for PR 29114 at commit [`bdf31a8`](https://github.com/apache/spark/commit/bdf31a8035ae15c4fb496df173e408453c0ec2a4).




[GitHub] [spark] dongjoon-hyun closed pull request #29116: [SPARK-32316][TESTS][INFRA] Test PySpark with Python 3.8 in Github Actions

2020-07-14 Thread GitBox



dongjoon-hyun closed pull request #29116:
URL: https://github.com/apache/spark/pull/29116


   




[GitHub] [spark] maropu commented on pull request #29101: [WIP][SPARK-32302][SQL] Partially push down disjunctive predicates through Join/Partitions

2020-07-14 Thread GitBox



maropu commented on pull request #29101:
URL: https://github.com/apache/spark/pull/29101#issuecomment-658527647


   Just a question: if this proposal works well, do we still need the fix in https://github.com/apache/spark/pull/29075?




[GitHub] [spark] jose-torres commented on pull request #29077: [SPARK-31985][SS] Remove incomplete/undocumented stateful aggregation in continuous mode

2020-07-14 Thread GitBox



jose-torres commented on pull request #29077:
URL: https://github.com/apache/spark/pull/29077#issuecomment-658526205


   LGTM. I don't have the repo fully set up on my new computer, so I'll try to 
find time to set it up and merge tomorrow. (Or you can do it if you want to try 
out your new committer powers; congrats btw!)




[GitHub] [spark] AmplabJenkins removed a comment on pull request #29111: [SPARK-29292][SQL][ML] Update rest of default modules (Hive, ML, etc) for Scala 2.13 compilation

2020-07-14 Thread GitBox



AmplabJenkins removed a comment on pull request #29111:
URL: https://github.com/apache/spark/pull/29111#issuecomment-658524127


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/125864/
   Test FAILed.




[GitHub] [spark] AmplabJenkins removed a comment on pull request #29111: [SPARK-29292][SQL][ML] Update rest of default modules (Hive, ML, etc) for Scala 2.13 compilation

2020-07-14 Thread GitBox



AmplabJenkins removed a comment on pull request #29111:
URL: https://github.com/apache/spark/pull/29111#issuecomment-658524120


   Merged build finished. Test FAILed.




[GitHub] [spark] AmplabJenkins commented on pull request #29111: [SPARK-29292][SQL][ML] Update rest of default modules (Hive, ML, etc) for Scala 2.13 compilation

2020-07-14 Thread GitBox



AmplabJenkins commented on pull request #29111:
URL: https://github.com/apache/spark/pull/29111#issuecomment-658524120








[GitHub] [spark] SparkQA commented on pull request #29111: [SPARK-29292][SQL][ML] Update rest of default modules (Hive, ML, etc) for Scala 2.13 compilation

2020-07-14 Thread GitBox



SparkQA commented on pull request #29111:
URL: https://github.com/apache/spark/pull/29111#issuecomment-658523924


   **[Test build #125864 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125864/testReport)** for PR 29111 at commit [`bc74297`](https://github.com/apache/spark/commit/bc74297f72cf51c773b6abfe6dcd19c691f3dfac).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
 * `abstract class Estimator[M <: Model[M]] extends PipelineStage `




[GitHub] [spark] SparkQA removed a comment on pull request #29111: [SPARK-29292][SQL][ML] Update rest of default modules (Hive, ML, etc) for Scala 2.13 compilation

2020-07-14 Thread GitBox



SparkQA removed a comment on pull request #29111:
URL: https://github.com/apache/spark/pull/29111#issuecomment-658489540


   **[Test build #125864 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125864/testReport)** for PR 29111 at commit [`bc74297`](https://github.com/apache/spark/commit/bc74297f72cf51c773b6abfe6dcd19c691f3dfac).




[GitHub] [spark] wangyum commented on a change in pull request #29088: [SPARK-32289][SQL] Some characters are garbled when opening csv files with Excel

2020-07-14 Thread GitBox



wangyum commented on a change in pull request #29088:
URL: https://github.com/apache/spark/pull/29088#discussion_r454764920



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CsvOutputWriter.scala
##
@@ -39,6 +39,10 @@ class CsvOutputWriter(
 
   private val gen = new UnivocityGenerator(dataSchema, writer, params)
 
+  if (params.bom) {
+    writer.write(0xFEFF)

Review comment:
   @stczwd What tool will change the value if we use `0xFEFF`?
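
   For context on the question above, here is a minimal sketch of what a call like `writer.write(0xFEFF)` produces on disk. It is not taken from the PR: `BomSketch` and `bomBytes` are hypothetical names, and it assumes the underlying writer encodes as UTF-8, which the quoted diff does not show. `Writer.write(int)` emits a single character, U+FEFF, and the charset decides which bytes follow.

```scala
import java.io.{ByteArrayOutputStream, OutputStreamWriter}
import java.nio.charset.StandardCharsets

// Hypothetical helper, for illustration only; not part of CsvOutputWriter.
object BomSketch {
  // Push U+FEFF through a UTF-8 writer and return the encoded bytes.
  def bomBytes(): Array[Byte] = {
    val out = new ByteArrayOutputStream()
    // Assumption: UTF-8 output; another charset would yield different bytes.
    val writer = new OutputStreamWriter(out, StandardCharsets.UTF_8)
    writer.write(0xFEFF) // Writer.write(int) writes one char (low 16 bits)
    writer.flush()
    out.toByteArray
  }

  def main(args: Array[String]): Unit = {
    val hex = bomBytes().map(b => f"${b & 0xFF}%02X").mkString(" ")
    println(hex) // prints "EF BB BF", the UTF-8 byte order mark
  }
}
```

   Under that assumption the file starts with the three-byte UTF-8 BOM, which is the marker spreadsheet tools commonly use to detect UTF-8.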






[GitHub] [spark] AmplabJenkins removed a comment on pull request #29115: [SPARK-32315][ML] Provide an explanation error message when calling require

2020-07-14 Thread GitBox



AmplabJenkins removed a comment on pull request #29115:
URL: https://github.com/apache/spark/pull/29115#issuecomment-658519915


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/125873/
   Test FAILed.




[GitHub] [spark] AmplabJenkins commented on pull request #27694: [SPARK-30946][SS] Serde entry via DataInputStream/DataOutputStream with LZ4 compression on FileStream(Source/Sink)Log

2020-07-14 Thread GitBox



AmplabJenkins commented on pull request #27694:
URL: https://github.com/apache/spark/pull/27694#issuecomment-658519988








[GitHub] [spark] AmplabJenkins removed a comment on pull request #28904: [SPARK-30462][SS] Streamline the logic on file stream source and sink metadata log to avoid memory issue

2020-07-14 Thread GitBox



AmplabJenkins removed a comment on pull request #28904:
URL: https://github.com/apache/spark/pull/28904#issuecomment-658519932








[GitHub] [spark] AmplabJenkins removed a comment on pull request #29115: [SPARK-32315][ML] Provide an explanation error message when calling require

2020-07-14 Thread GitBox



AmplabJenkins removed a comment on pull request #29115:
URL: https://github.com/apache/spark/pull/29115#issuecomment-658519909








[GitHub] [spark] AmplabJenkins removed a comment on pull request #29080: [SPARK-32271][ML] Update CrossValidator to train folds in parallel

2020-07-14 Thread GitBox



AmplabJenkins removed a comment on pull request #29080:
URL: https://github.com/apache/spark/pull/29080#issuecomment-658519974








[GitHub] [spark] AmplabJenkins removed a comment on pull request #27694: [SPARK-30946][SS] Serde entry via DataInputStream/DataOutputStream with LZ4 compression on FileStream(Source/Sink)Log

2020-07-14 Thread GitBox



AmplabJenkins removed a comment on pull request #27694:
URL: https://github.com/apache/spark/pull/27694#issuecomment-658519988








[GitHub] [spark] SparkQA removed a comment on pull request #29115: [SPARK-32315][ML] Provide an explanation error message when calling require

2020-07-14 Thread GitBox



SparkQA removed a comment on pull request #29115:
URL: https://github.com/apache/spark/pull/29115#issuecomment-658519446


   **[Test build #125873 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125873/testReport)** for PR 29115 at commit [`96d65f4`](https://github.com/apache/spark/commit/96d65f4890e312bc4446b008bf6fadcb2d011779).




[GitHub] [spark] AmplabJenkins commented on pull request #29080: [SPARK-32271][ML] Update CrossValidator to train folds in parallel

2020-07-14 Thread GitBox



AmplabJenkins commented on pull request #29080:
URL: https://github.com/apache/spark/pull/29080#issuecomment-658519974








[GitHub] [spark] SparkQA commented on pull request #29115: [SPARK-32315][ML] Provide an explanation error message when calling require

2020-07-14 Thread GitBox



SparkQA commented on pull request #29115:
URL: https://github.com/apache/spark/pull/29115#issuecomment-658519894


   **[Test build #125873 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125873/testReport)** for PR 29115 at commit [`96d65f4`](https://github.com/apache/spark/commit/96d65f4890e312bc4446b008bf6fadcb2d011779).
* This patch **fails Scala style tests**.
* This patch merges cleanly.
* This patch adds no public classes.




[GitHub] [spark] AmplabJenkins commented on pull request #29115: [SPARK-32315][ML] Provide an explanation error message when calling require

2020-07-14 Thread GitBox



AmplabJenkins commented on pull request #29115:
URL: https://github.com/apache/spark/pull/29115#issuecomment-658519909








[GitHub] [spark] AmplabJenkins commented on pull request #28904: [SPARK-30462][SS] Streamline the logic on file stream source and sink metadata log to avoid memory issue

2020-07-14 Thread GitBox



AmplabJenkins commented on pull request #28904:
URL: https://github.com/apache/spark/pull/28904#issuecomment-658519932








[GitHub] [spark] SparkQA commented on pull request #27694: [SPARK-30946][SS] Serde entry via DataInputStream/DataOutputStream with LZ4 compression on FileStream(Source/Sink)Log

2020-07-14 Thread GitBox



SparkQA commented on pull request #27694:
URL: https://github.com/apache/spark/pull/27694#issuecomment-658519508


   **[Test build #125876 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125876/testReport)** for PR 27694 at commit [`86131af`](https://github.com/apache/spark/commit/86131afcf995fee64a629a7a440f03df8cabdd48).




[GitHub] [spark] SparkQA commented on pull request #28904: [SPARK-30462][SS] Streamline the logic on file stream source and sink metadata log to avoid memory issue

2020-07-14 Thread GitBox



SparkQA commented on pull request #28904:
URL: https://github.com/apache/spark/pull/28904#issuecomment-658519488


   **[Test build #125875 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125875/testReport)** for PR 28904 at commit [`006c028`](https://github.com/apache/spark/commit/006c028cf917bcfa4e78955a280e815418cbc2be).




[GitHub] [spark] SparkQA commented on pull request #29080: [SPARK-32271][ML] Update CrossValidator to train folds in parallel

2020-07-14 Thread GitBox



SparkQA commented on pull request #29080:
URL: https://github.com/apache/spark/pull/29080#issuecomment-658519469


   **[Test build #125874 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125874/testReport)** for PR 29080 at commit [`6dd0a4d`](https://github.com/apache/spark/commit/6dd0a4d9a2157086ef33bd810f9e250114b33c7d).




[GitHub] [spark] SparkQA commented on pull request #29115: [SPARK-32315][ML] Provide an explanation error message when calling require

2020-07-14 Thread GitBox



SparkQA commented on pull request #29115:
URL: https://github.com/apache/spark/pull/29115#issuecomment-658519446


   **[Test build #125873 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125873/testReport)** for PR 29115 at commit [`96d65f4`](https://github.com/apache/spark/commit/96d65f4890e312bc4446b008bf6fadcb2d011779).




[GitHub] [spark] AmplabJenkins removed a comment on pull request #29080: [SPARK-32271][ML] Update CrossValidator to train folds in parallel

2020-07-14 Thread GitBox



AmplabJenkins removed a comment on pull request #29080:
URL: https://github.com/apache/spark/pull/29080#issuecomment-657302373


   Can one of the admins verify this patch?



