[GitHub] [spark] SparkQA commented on pull request #33912: [SPARK-36670][SPARK-36669][CORE][SQL] Add LZ4 hadoop wrapper and FileSourceCodecSuite
SparkQA commented on pull request #33912: URL: https://github.com/apache/spark/pull/33912#issuecomment-914010830

**[Test build #143036 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143036/testReport)** for PR 33912 at commit [`b6f20cf`](https://github.com/apache/spark/commit/b6f20cf3380a3295efbcc53e1394a96ccce9f013).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] HyukjinKwon commented on a change in pull request #33844: [SPARK-36506][PYTHON] Improve test coverage for series.py and indexes/*.py.
HyukjinKwon commented on a change in pull request #33844: URL: https://github.com/apache/spark/pull/33844#discussion_r703196115

## File path: python/pyspark/pandas/indexes/base.py
## @@ -2601,6 +2599,12 @@
         def __iter__(self) -> Iterator:
             return MissingPandasLikeIndex.__iter__(self)

     def __xor__(self, other: "Index") -> "Index":
+        warnings.warn(
+            "Index.__xor__ operating as a set operation is deprecated, "
+            "in the future this will be a logical operation matching Series.__xor__. "

Review comment: @itholic is there any update on this?
[GitHub] [spark] AmplabJenkins commented on pull request #33922: [SPARK-35803][SQL] Support DataSource V2 CreateTempViewUsing
AmplabJenkins commented on pull request #33922: URL: https://github.com/apache/spark/pull/33922#issuecomment-914009115 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143031/
[GitHub] [spark] SparkQA commented on pull request #33922: [SPARK-35803][SQL] Support DataSource V2 CreateTempViewUsing
SparkQA commented on pull request #33922: URL: https://github.com/apache/spark/pull/33922#issuecomment-914008361

**[Test build #143031 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143031/testReport)** for PR 33922 at commit [`181c5d1`](https://github.com/apache/spark/commit/181c5d19d819debef1ebe50a078acbb4bfe512a8).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #33912: [SPARK-36670][SPARK-36669][CORE][SQL] Add LZ4 hadoop wrapper and FileSourceCodecSuite
AmplabJenkins removed a comment on pull request #33912: URL: https://github.com/apache/spark/pull/33912#issuecomment-91488
[GitHub] [spark] AmplabJenkins removed a comment on pull request #33877: [SPARK-36625][SPARK-36661][PYTHON] Support TimestampNTZ in pandas API on Spark
AmplabJenkins removed a comment on pull request #33877: URL: https://github.com/apache/spark/pull/33877#issuecomment-91487 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47540/
[GitHub] [spark] AmplabJenkins commented on pull request #33877: [SPARK-36625][SPARK-36661][PYTHON] Support TimestampNTZ in pandas API on Spark
AmplabJenkins commented on pull request #33877: URL: https://github.com/apache/spark/pull/33877#issuecomment-91487 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47540/
[GitHub] [spark] AmplabJenkins commented on pull request #33912: [SPARK-36670][SPARK-36669][CORE][SQL] Add LZ4 hadoop wrapper and FileSourceCodecSuite
AmplabJenkins commented on pull request #33912: URL: https://github.com/apache/spark/pull/33912#issuecomment-91488
[GitHub] [spark] SparkQA removed a comment on pull request #33912: [SPARK-36670][SPARK-36669][CORE][SQL] Add LZ4 hadoop wrapper and FileSourceCodecSuite
SparkQA removed a comment on pull request #33912: URL: https://github.com/apache/spark/pull/33912#issuecomment-913944602 **[Test build #143033 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143033/testReport)** for PR 33912 at commit [`0029f33`](https://github.com/apache/spark/commit/0029f332b4b91efbaed1fee0a7fb9dcac7c183cf).
[GitHub] [spark] SparkQA commented on pull request #33877: [SPARK-36625][SPARK-36661][PYTHON] Support TimestampNTZ in pandas API on Spark
SparkQA commented on pull request #33877: URL: https://github.com/apache/spark/pull/33877#issuecomment-913997050 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47541/
[GitHub] [spark] dbtsai commented on pull request #33912: [SPARK-36670][SPARK-36669][CORE][SQL] Add LZ4 hadoop wrapper and FileSourceCodecSuite
dbtsai commented on pull request #33912: URL: https://github.com/apache/spark/pull/33912#issuecomment-913995000

Could we add a test for Hadoop sequence files using `sc.sequenceFile(...)`? There are still many legacy applications using Hadoop sequence files, and we want to ensure they work. We might also want to exclude the relocation of snappy in Hadoop. The relocation only relocates the Java classes; the native JNI interfaces are not relocated. So suppose we include two different versions of `snappy-java`: the relocated one provided by Hadoop and the one provided by Spark. If `snappy-java` decides to change its native C interfaces, those native methods cannot be relocated, so loading them will cause an incompatibility. If both copies are non-relocated, dependency resolution will ensure we include only one version of `snappy-java`, avoiding the potential incompatibility from the native interface, which technically cannot be relocated. I remember @dongjoon-hyun saw this issue when he worked on `zstd-jni` in Spark and Iceberg.
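(The relocation point above can be illustrated with a toy sketch. This is not snappy-java or Hadoop code, and the shaded package prefix is hypothetical; the naming rule shown is the standard JNI convention, simplified to ignore JNI's escaping of underscores and overload suffixes.)

```python
# Toy illustration of why class relocation breaks native (JNI) bindings.
# The JVM links a Java `native` method to a C symbol of the form
#   Java_<fully.qualified.ClassName with '.' replaced by '_'>_<methodName>
# Shading rewrites the Java class name, but not the symbol compiled into
# the native library, so the relocated class looks up a symbol that the
# library never exported and fails with UnsatisfiedLinkError.

def jni_symbol(fqcn: str, method: str) -> str:
    """Symbol name the JVM looks for when linking a native method
    (simplified: real JNI mangling also escapes '_' and Unicode)."""
    return "Java_" + fqcn.replace(".", "_") + "_" + method

# What the native library actually exports (real snappy-java class name):
exported = jni_symbol("org.xerial.snappy.Snappy", "rawCompress")

# What a relocated copy would look for (shaded prefix is hypothetical):
looked_up = jni_symbol(
    "org.apache.hadoop.shaded.org.xerial.snappy.Snappy", "rawCompress")

print(exported)   # Java_org_xerial_snappy_Snappy_rawCompress
print(looked_up)  # relocated name -- not exported, so linking would fail
```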
[GitHub] [spark] Ngone51 commented on a change in pull request #33872: [SPARK-36575][CORE] Should ignore task finished event if its task set is gone in TaskSchedulerImpl.handleSuccessfulTask
Ngone51 commented on a change in pull request #33872: URL: https://github.com/apache/spark/pull/33872#discussion_r703178285

## File path: core/src/test/scala/org/apache/spark/scheduler/TaskSchedulerImplSuite.scala
## @@ -1995,6 +2000,61 @@ class TaskSchedulerImplSuite extends SparkFunSuite with LocalSparkContext with B
     assert(!normalTSM.runningTasksSet.contains(taskId))
   }

+  test("SPARK-36575: Executor lost cause task hang") {
+    val taskScheduler = setupScheduler()
+
+    val resultGetter = new TaskResultGetter(sc.env, taskScheduler) {
+      override protected val getTaskResultExecutor: ExecutorService =
+        ThreadUtils.newDaemonFixedThreadPool(1, "task-result-getter")
+      def taskResultExecutor(): ExecutorService = getTaskResultExecutor
+    }
+    taskScheduler.taskResultGetter = resultGetter
+
+    val workerOffers = IndexedSeq(new WorkerOffer("executor0", "host0", 1),
+      new WorkerOffer("executor1", "host1", 1))
+    val task1 = new ShuffleMapTask(1, 0, null, new Partition {
+      override def index: Int = 0
+    }, Seq(TaskLocation("host0", "executor0")), new Properties, null)
+
+    val task2 = new ShuffleMapTask(1, 0, null, new Partition {
+      override def index: Int = 0
+    }, Seq(TaskLocation("host1", "executor1")), new Properties, null)
+
+    val taskSet = new TaskSet(Array(task1, task2), 0, 0, 0, null, 0)
+
+    taskScheduler.submitTasks(taskSet)
+    val taskDescriptions = taskScheduler.resourceOffers(workerOffers).flatten
+    assert(2 === taskDescriptions.length)
+
+    val ser = sc.env.serializer.newInstance()
+    val directResult = new DirectTaskResult[Int](ser.serialize(1), Seq(), Array.empty)
+    val resultBytes = ser.serialize(directResult)
+
+    // make getTaskResultExecutor busy
+    import scala.language.reflectiveCalls
+    resultGetter.taskResultExecutor().submit(new Runnable {
+      override def run(): Unit = Thread.sleep(100)
+    })
+
+    // task1 finished
+    taskScheduler.statusUpdate(
+      tid = taskDescriptions(0).taskId,
+      state = TaskState.FINISHED,
+      serializedData = resultBytes
+    )
+
+    // mark executor heartbeat timed out
+    taskScheduler.executorLost(taskDescriptions(0).executorId, ExecutorProcessLost("Executor " +
+      "heartbeat timed out"))
+
+    // Wait a while until all events are processed
+    Thread.sleep(100)

Review comment: Wouldn't the second `executorLost` reset `successful(index)` to false?
[GitHub] [spark] SparkQA commented on pull request #33877: [SPARK-36625][SPARK-36661][PYTHON] Support TimestampNTZ in pandas API on Spark
SparkQA commented on pull request #33877: URL: https://github.com/apache/spark/pull/33877#issuecomment-913990550 Kubernetes integration test status failure URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47540/
[GitHub] [spark] SparkQA commented on pull request #33912: [SPARK-36670][SPARK-36669][CORE][SQL] Add LZ4 hadoop wrapper and FileSourceCodecSuite
SparkQA commented on pull request #33912: URL: https://github.com/apache/spark/pull/33912#issuecomment-913989846

**[Test build #143033 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143033/testReport)** for PR 33912 at commit [`0029f33`](https://github.com/apache/spark/commit/0029f332b4b91efbaed1fee0a7fb9dcac7c183cf).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `class OrcCodecSuite extends FileSourceCodecSuite with SharedSparkSession`
[GitHub] [spark] SparkQA commented on pull request #33912: [SPARK-36670][SPARK-36669][CORE][SQL] Add LZ4 hadoop wrapper and FileSourceCodecSuite
SparkQA commented on pull request #33912: URL: https://github.com/apache/spark/pull/33912#issuecomment-913989299 Kubernetes integration test status failure URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47539/
[GitHub] [spark] HyukjinKwon commented on pull request #33877: [SPARK-36625][SPARK-36661][PYTHON] Support TimestampNTZ in pandas API on Spark
HyukjinKwon commented on pull request #33877: URL: https://github.com/apache/spark/pull/33877#issuecomment-913986456 I cleaned up a bit more.
[GitHub] [spark] HyukjinKwon removed a comment on pull request #33877: [SPARK-36625][SPARK-36661][PYTHON] Support TimestampNTZ in pandas API on Spark
HyukjinKwon removed a comment on pull request #33877: URL: https://github.com/apache/spark/pull/33877#issuecomment-913964232 I cleaned up a bit more.
[GitHub] [spark] SparkQA commented on pull request #33877: [SPARK-36625][SPARK-36661][PYTHON] Support TimestampNTZ in pandas API on Spark
SparkQA commented on pull request #33877: URL: https://github.com/apache/spark/pull/33877#issuecomment-913983644 **[Test build #143038 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143038/testReport)** for PR 33877 at commit [`f83511b`](https://github.com/apache/spark/commit/f83511b4729f5d4906205f400cba57f5ab0dcd3b).
[GitHub] [spark] AmplabJenkins commented on pull request #33922: [SPARK-35803][SQL] Support DataSource V2 CreateTempViewUsing
AmplabJenkins commented on pull request #33922: URL: https://github.com/apache/spark/pull/33922#issuecomment-913980993 Can one of the admins verify this patch?
[GitHub] [spark] AmplabJenkins removed a comment on pull request #33912: [SPARK-36670][SPARK-36669][CORE][SQL] Add LZ4 hadoop wrapper and FileSourceCodecSuite
AmplabJenkins removed a comment on pull request #33912: URL: https://github.com/apache/spark/pull/33912#issuecomment-913980223 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47535/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #33923: [SPARK-36153][SQL][DOCS][FOLLOWUP] Fix the description about the possible values of `spark.sql.catalogImplementation` property
AmplabJenkins removed a comment on pull request #33923: URL: https://github.com/apache/spark/pull/33923#issuecomment-913980222 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47537/
[GitHub] [spark] AmplabJenkins commented on pull request #33923: [SPARK-36153][SQL][DOCS][FOLLOWUP] Fix the description about the possible values of `spark.sql.catalogImplementation` property
AmplabJenkins commented on pull request #33923: URL: https://github.com/apache/spark/pull/33923#issuecomment-913980222 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47537/
[GitHub] [spark] AmplabJenkins commented on pull request #33912: [SPARK-36670][SPARK-36669][CORE][SQL] Add LZ4 hadoop wrapper and FileSourceCodecSuite
AmplabJenkins commented on pull request #33912: URL: https://github.com/apache/spark/pull/33912#issuecomment-913980223 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47535/
[GitHub] [spark] SparkQA commented on pull request #33877: [SPARK-36625][SPARK-36661][PYTHON] Support TimestampNTZ in pandas API on Spark
SparkQA commented on pull request #33877: URL: https://github.com/apache/spark/pull/33877#issuecomment-913977412 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47540/
[GitHub] [spark] SparkQA commented on pull request #33912: [SPARK-36670][SPARK-36669][CORE][SQL] Add LZ4 hadoop wrapper and FileSourceCodecSuite
SparkQA commented on pull request #33912: URL: https://github.com/apache/spark/pull/33912#issuecomment-913976289 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47539/
[GitHub] [spark] SparkQA commented on pull request #33923: [SPARK-36153][SQL][DOCS][FOLLOWUP] Fix the description about the possible values of `spark.sql.catalogImplementation` property
SparkQA commented on pull request #33923: URL: https://github.com/apache/spark/pull/33923#issuecomment-913975805 Kubernetes integration test status failure URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47537/
[GitHub] [spark] SparkQA commented on pull request #33912: [SPARK-36670][SPARK-36669][CORE][SQL] Add LZ4 hadoop wrapper and FileSourceCodecSuite
SparkQA commented on pull request #33912: URL: https://github.com/apache/spark/pull/33912#issuecomment-913973852 Kubernetes integration test status failure URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47535/
[GitHub] [spark] sleep1661 commented on a change in pull request #33872: [SPARK-36575][CORE] Should ignore task finished event if its task set is gone in TaskSchedulerImpl.handleSuccessfulTask
sleep1661 commented on a change in pull request #33872: URL: https://github.com/apache/spark/pull/33872#discussion_r703156809

## File path: core/src/test/scala/org/apache/spark/scheduler/TaskSchedulerImplSuite.scala
## @@ -1995,6 +2000,61 @@ class TaskSchedulerImplSuite extends SparkFunSuite with LocalSparkContext with B
     assert(!normalTSM.runningTasksSet.contains(taskId))
   }

+  test("SPARK-36575: Executor lost cause task hang") {
+    val taskScheduler = setupScheduler()
+
+    val resultGetter = new TaskResultGetter(sc.env, taskScheduler) {
+      override protected val getTaskResultExecutor: ExecutorService =
+        ThreadUtils.newDaemonFixedThreadPool(1, "task-result-getter")
+      def taskResultExecutor(): ExecutorService = getTaskResultExecutor
+    }
+    taskScheduler.taskResultGetter = resultGetter
+
+    val workerOffers = IndexedSeq(new WorkerOffer("executor0", "host0", 1),
+      new WorkerOffer("executor1", "host1", 1))
+    val task1 = new ShuffleMapTask(1, 0, null, new Partition {
+      override def index: Int = 0
+    }, Seq(TaskLocation("host0", "executor0")), new Properties, null)
+
+    val task2 = new ShuffleMapTask(1, 0, null, new Partition {
+      override def index: Int = 0
+    }, Seq(TaskLocation("host1", "executor1")), new Properties, null)
+
+    val taskSet = new TaskSet(Array(task1, task2), 0, 0, 0, null, 0)
+
+    taskScheduler.submitTasks(taskSet)
+    val taskDescriptions = taskScheduler.resourceOffers(workerOffers).flatten
+    assert(2 === taskDescriptions.length)
+
+    val ser = sc.env.serializer.newInstance()
+    val directResult = new DirectTaskResult[Int](ser.serialize(1), Seq(), Array.empty)
+    val resultBytes = ser.serialize(directResult)
+
+    // make getTaskResultExecutor busy
+    import scala.language.reflectiveCalls
+    resultGetter.taskResultExecutor().submit(new Runnable {
+      override def run(): Unit = Thread.sleep(100)
+    })
+
+    // task1 finished
+    taskScheduler.statusUpdate(
+      tid = taskDescriptions(0).taskId,
+      state = TaskState.FINISHED,
+      serializedData = resultBytes
+    )
+
+    // mark executor heartbeat timed out
+    taskScheduler.executorLost(taskDescriptions(0).executorId, ExecutorProcessLost("Executor " +
+      "heartbeat timed out"))
+
+    // Wait a while until all events are processed
+    Thread.sleep(100)

Review comment:
> The event, IIUC, is a successful event, which will do `tasksSuccessful += 1` first, right?

Yes. But the second `executorLost` made `tasksSuccessful -= 1`. With `successful(index)` now true, the task would never be scheduled again, which resulted in `tasksSuccessful` always staying less than `numTasks`.
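(The accounting race described in this exchange can be sketched with a toy model. This is hypothetical illustration code, not Spark's `TaskSetManager`; the class, method names, and `reset_successful` flag are invented to show the counter/flag interaction being discussed.)

```python
# Toy model of the hang: a task-success event increments the success
# counter and marks the index successful; a later executorLost for the
# executor that held the result decrements the counter. If the
# successful flag is NOT also reset, the task is never re-offered, so
# the counter stays below the task count forever.
class ToyTaskSetManager:
    def __init__(self, num_tasks: int):
        self.num_tasks = num_tasks
        self.successful = [False] * num_tasks
        self.tasks_successful = 0

    def handle_successful_task(self, index: int) -> None:
        if not self.successful[index]:
            self.successful[index] = True
            self.tasks_successful += 1

    def executor_lost(self, index: int, reset_successful: bool) -> None:
        # Losing the executor invalidates this (map) task's output.
        if self.successful[index]:
            self.tasks_successful -= 1
            if reset_successful:
                self.successful[index] = False  # allows rescheduling

    def is_complete(self) -> bool:
        return self.tasks_successful == self.num_tasks

tsm = ToyTaskSetManager(1)
tsm.handle_successful_task(0)
tsm.executor_lost(0, reset_successful=False)
print(tsm.is_complete(), tsm.successful[0])  # False True -> stuck
```

With `reset_successful=True` the task becomes schedulable again and the task set can eventually complete, which is the behavior the reviewers are debating.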
[GitHub] [spark] HyukjinKwon commented on pull request #33877: [SPARK-36625][SPARK-36661][PYTHON] Support TimestampNTZ in pandas API on Spark
HyukjinKwon commented on pull request #33877: URL: https://github.com/apache/spark/pull/33877#issuecomment-913964232 I cleaned up a bit more.
[GitHub] [spark] SparkQA commented on pull request #33877: [SPARK-36625][SPARK-36661][PYTHON] Support TimestampNTZ in pandas API on Spark
SparkQA commented on pull request #33877: URL: https://github.com/apache/spark/pull/33877#issuecomment-913964045 **[Test build #143037 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143037/testReport)** for PR 33877 at commit [`6620ade`](https://github.com/apache/spark/commit/6620adeb54fe95870c6de1604dfc2cd32028bed2).
[GitHub] [spark] SparkQA commented on pull request #33912: [SPARK-36670][SPARK-36669][CORE][SQL] Add LZ4 hadoop wrapper and FileSourceCodecSuite
SparkQA commented on pull request #33912: URL: https://github.com/apache/spark/pull/33912#issuecomment-913962814 **[Test build #143036 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143036/testReport)** for PR 33912 at commit [`b6f20cf`](https://github.com/apache/spark/commit/b6f20cf3380a3295efbcc53e1394a96ccce9f013).
[GitHub] [spark] AmplabJenkins commented on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin
AmplabJenkins commented on pull request #33893: URL: https://github.com/apache/spark/pull/33893#issuecomment-913962126 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47536/
[GitHub] [spark] SparkQA commented on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin
SparkQA commented on pull request #33893: URL: https://github.com/apache/spark/pull/33893#issuecomment-913960171 Kubernetes integration test status failure URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47536/
[GitHub] [spark] SparkQA commented on pull request #33923: [SPARK-36153][SQL][DOCS][FOLLOWUP] Fix the description about the possible values of `spark.sql.catalogImplementation` property
SparkQA commented on pull request #33923: URL: https://github.com/apache/spark/pull/33923#issuecomment-913959121 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47537/
[GitHub] [spark] SparkQA commented on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin
SparkQA commented on pull request #33893: URL: https://github.com/apache/spark/pull/33893#issuecomment-913956801 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47536/
[GitHub] [spark] SparkQA commented on pull request #33912: [SPARK-36670][SPARK-36669][CORE][SQL] Add LZ4 hadoop wrapper and FileSourceCodecSuite
SparkQA commented on pull request #33912: URL: https://github.com/apache/spark/pull/33912#issuecomment-913956607 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47535/
[GitHub] [spark] Ngone51 commented on a change in pull request #33872: [SPARK-36575][CORE] Should ignore task finished event if its task set is gone in TaskSchedulerImpl.handleSuccessfulTask
Ngone51 commented on a change in pull request #33872: URL: https://github.com/apache/spark/pull/33872#discussion_r703140913

## File path: core/src/test/scala/org/apache/spark/scheduler/TaskSchedulerImplSuite.scala
## @@ -1995,6 +2000,61 @@ class TaskSchedulerImplSuite extends SparkFunSuite with LocalSparkContext with B
     assert(!normalTSM.runningTasksSet.contains(taskId))
   }

+  test("SPARK-36575: Executor lost cause task hang") {
+    val taskScheduler = setupScheduler()
+
+    val resultGetter = new TaskResultGetter(sc.env, taskScheduler) {
+      override protected val getTaskResultExecutor: ExecutorService =
+        ThreadUtils.newDaemonFixedThreadPool(1, "task-result-getter")
+      def taskResultExecutor(): ExecutorService = getTaskResultExecutor
+    }
+    taskScheduler.taskResultGetter = resultGetter
+
+    val workerOffers = IndexedSeq(new WorkerOffer("executor0", "host0", 1),
+      new WorkerOffer("executor1", "host1", 1))
+    val task1 = new ShuffleMapTask(1, 0, null, new Partition {
+      override def index: Int = 0
+    }, Seq(TaskLocation("host0", "executor0")), new Properties, null)
+
+    val task2 = new ShuffleMapTask(1, 0, null, new Partition {
+      override def index: Int = 0
+    }, Seq(TaskLocation("host1", "executor1")), new Properties, null)
+
+    val taskSet = new TaskSet(Array(task1, task2), 0, 0, 0, null, 0)
+
+    taskScheduler.submitTasks(taskSet)
+    val taskDescriptions = taskScheduler.resourceOffers(workerOffers).flatten
+    assert(2 === taskDescriptions.length)
+
+    val ser = sc.env.serializer.newInstance()
+    val directResult = new DirectTaskResult[Int](ser.serialize(1), Seq(), Array.empty)
+    val resultBytes = ser.serialize(directResult)
+
+    // make getTaskResultExecutor busy
+    import scala.language.reflectiveCalls
+    resultGetter.taskResultExecutor().submit(new Runnable {
+      override def run(): Unit = Thread.sleep(100)
+    })
+
+    // task1 finished
+    taskScheduler.statusUpdate(
+      tid = taskDescriptions(0).taskId,
+      state = TaskState.FINISHED,
+      serializedData = resultBytes
+    )
+
+    // mark executor heartbeat timed out
+    taskScheduler.executorLost(taskDescriptions(0).executorId, ExecutorProcessLost("Executor " +
+      "heartbeat timed out"))
+
+    // Wait a while until all events are processed
+    Thread.sleep(100)

Review comment: > Finally, it was found that TaskSetManager.executorLost was executed twice, and the second time resulted in tasksSuccessful -= 1, resulting in tasksSuccessful always less than numTask. Then, how could it result in "tasksSuccessful always less than numTask" if there's a task finish event? The event, IIUC, is a successful event, which will do `tasksSuccessful += 1` first, right?
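The accounting question raised in the thread can be modeled outside Spark. Below is a toy sketch (Python; all names are illustrative, this is not Spark's `TaskSetManager`): a success event increments the counter once, but rolling back the same task's success twice on duplicate executor-lost events leaves the counter permanently below the task count unless the rollback is guarded.

```python
# Toy model of the success-count rollback discussed above.
# All names are illustrative; this is not Spark code.
class ToyTaskSet:
    def __init__(self, num_tasks):
        self.num_tasks = num_tasks
        self.tasks_successful = 0
        self.rolled_back = set()  # tasks whose success was already revoked

    def task_finished(self, task_id):
        # success event: counted once per finished task
        self.tasks_successful += 1

    def executor_lost(self, finished_tasks_on_executor):
        # a finished shuffle-map task on a lost executor must re-run,
        # so its success is revoked -- but only once per task
        for tid in finished_tasks_on_executor:
            if tid in self.rolled_back:
                continue  # guard against a duplicate executor-lost event
            self.rolled_back.add(tid)
            self.tasks_successful -= 1

ts = ToyTaskSet(num_tasks=2)
ts.task_finished(0)
ts.task_finished(1)
ts.executor_lost([0])
ts.executor_lost([0])  # duplicate event: without the guard, this second
                       # rollback would leave the counter stuck at 0
print(ts.tasks_successful)  # 1
```

Without the `rolled_back` guard, the second `executor_lost` call would decrement again and the counter could never reach `num_tasks`, which is the hang the PR describes.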
[GitHub] [spark] viirya commented on a change in pull request #33912: [SPARK-36670][SPARK-36669][CORE][SQL] Add LZ4 hadoop wrapper and FileSourceCodecSuite
viirya commented on a change in pull request #33912: URL: https://github.com/apache/spark/pull/33912#discussion_r703140216

## File path: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/FileSourceCodecSuite.scala
## @@ -0,0 +1,69 @@

+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.datasources
+
+import org.apache.spark.sql.QueryTest
+import org.apache.spark.sql.internal.SQLConf
+import org.apache.spark.sql.test.{SharedSparkSession, SQLTestUtils}
+
+trait FileSourceCodecSuite extends QueryTest with SQLTestUtils {
+
+  protected def dataSourceName: String
+  protected val codecConfigName: String
+  protected def availableCodecs: Seq[String]
+
+  def testWithAllCodecs(name: String)(f: => Unit): Unit = {
+    for (codec <- availableCodecs) {
+      test(s"$name - data source $dataSourceName - codec: $codec") {
+        withSQLConf(codecConfigName -> codec) {
+          f
+        }
+      }
+    }
+  }
+
+  testWithAllCodecs("write and read") {
+    withTempPath { dir =>
+      testData
+        .repartition(5)
+        .write
+        .format(dataSourceName)
+        .save(dir.getCanonicalPath)
+
+      val df = spark.read.format(dataSourceName).load(dir.getCanonicalPath)
+      checkAnswer(df, testData)
+    }
+  }
+}
+
+class ParquetCodecSuite extends FileSourceCodecSuite with SharedSparkSession {
+
+  override def dataSourceName: String = "parquet"
+  override val codecConfigName = SQLConf.PARQUET_COMPRESSION.key
+  // Exclude "lzo" because it is GPL-licenced so not included in Hadoop.
+  override protected def availableCodecs: Seq[String] = Seq("none", "uncompressed", "snappy",

Review comment: Excluded "brotli" codec for non-supported arch.
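The `testWithAllCodecs` helper above registers one named test per codec by looping inside the suite body. The same pattern, sketched in plain Python with stdlib `unittest` subtests (the `roundtrip` function is a hypothetical stand-in for "write with a codec, read back", not Spark code):

```python
import unittest

CODECS = ["none", "uncompressed", "snappy", "gzip"]

def roundtrip(data, codec):
    # hypothetical stand-in for writing data with the given compression
    # codec and reading it back; a real implementation would hit storage
    return list(data)

class CodecRoundtripTest(unittest.TestCase):
    def test_write_and_read_all_codecs(self):
        data = [1, 2, 3]
        for codec in CODECS:
            # subTest gives each codec its own independently reported
            # case, as testWithAllCodecs does with test(s"... $codec")
            with self.subTest(codec=codec):
                self.assertEqual(roundtrip(data, codec), data)

result = unittest.TestResult()
unittest.defaultTestLoader.loadTestsFromTestCase(CodecRoundtripTest).run(result)
print(result.wasSuccessful())  # True
```

The benefit in both versions is the same: a failure for one codec is reported under that codec's name without aborting the remaining codecs.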
[GitHub] [spark] SparkQA commented on pull request #33160: [SPARK-35959][BUILD][test-maven][test-hadoop3.2][test-java11] Add a new Maven profile "no-shaded-hadoop-client" for Hadoop versions older tha
SparkQA commented on pull request #33160: URL: https://github.com/apache/spark/pull/33160#issuecomment-913950541 Kubernetes integration test unable to build dist. exiting with code: 1 URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47538/
[GitHub] [spark] AmplabJenkins commented on pull request #33160: [SPARK-35959][BUILD][test-maven][test-hadoop3.2][test-java11] Add a new Maven profile "no-shaded-hadoop-client" for Hadoop versions old
AmplabJenkins commented on pull request #33160: URL: https://github.com/apache/spark/pull/33160#issuecomment-913950554 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47538/
[GitHub] [spark] AmplabJenkins commented on pull request #33923: [SPARK-36153][SQL][DOCS][FOLLOWUP] Fix the description about the possible values of `spark.sql.catalogImplementation` property
AmplabJenkins commented on pull request #33923: URL: https://github.com/apache/spark/pull/33923#issuecomment-913949321 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143035/
[GitHub] [spark] SparkQA commented on pull request #33923: [SPARK-36153][SQL][DOCS][FOLLOWUP] Fix the description about the possible values of `spark.sql.catalogImplementation` property
SparkQA commented on pull request #33923: URL: https://github.com/apache/spark/pull/33923#issuecomment-913949230 **[Test build #143035 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143035/testReport)** for PR 33923 at commit [`4d1f7ad`](https://github.com/apache/spark/commit/4d1f7adc718fef8d2564201ed90f45d836e248d2).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] [spark] HyukjinKwon closed pull request #33923: [SPARK-36153][SQL][DOCS][FOLLOWUP] Fix the description about the possible values of `spark.sql.catalogImplementation` property
HyukjinKwon closed pull request #33923: URL: https://github.com/apache/spark/pull/33923
[GitHub] [spark] HyukjinKwon commented on pull request #33923: [SPARK-36153][SQL][DOCS][FOLLOWUP] Fix the description about the possible values of `spark.sql.catalogImplementation` property
HyukjinKwon commented on pull request #33923: URL: https://github.com/apache/spark/pull/33923#issuecomment-913947801 I manually built the docs and checked this. Merged to master and branch-3.2.
[GitHub] [spark] itholic commented on a change in pull request #33744: [SPARK-36403][PYTHON] Implement `Index.putmask`
itholic commented on a change in pull request #33744: URL: https://github.com/apache/spark/pull/33744#discussion_r703134281

## File path: python/pyspark/pandas/typedef/typehints.py
## @@ -323,7 +323,7 @@ def infer_pd_series_spark_type(pser: pd.Series, dtype: Dtype) -> types.DataType:

     if dtype == np.dtype("object"):
         if len(pser) == 0 or pser.isnull().all():
             return types.NullType()
-        elif hasattr(pser.iloc[0], "__UDT__"):
+        elif hasattr(pser, "iloc") and hasattr(pser.iloc[0], "__UDT__"):

Review comment: I think we should not pass an `Index` to `infer_pd_series_spark_type`. At least we should change the function name (such as `infer_pd_indexops_spark_type`) and the input and output types of the function, or add a new function and use it, as you mentioned. BTW, actually I think we don't really need to use `pandas_udf` here, though.
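For context on the guard in the diff above: `pandas.Series` exposes positional access through `.iloc`, but `pandas.Index` has no `.iloc` accessor, so `pser.iloc[0]` raises `AttributeError` when an `Index` is passed in. A stdlib-only sketch of the guard pattern (the `Fake*` classes are stand-ins for illustration, not pandas):

```python
# Stand-ins: only the Series-like object carries an .iloc accessor.
class FakeILoc:
    def __init__(self, data):
        self._data = data
    def __getitem__(self, i):
        return self._data[i]

class FakeSeries:
    def __init__(self, data):
        self.iloc = FakeILoc(data)

class FakeIndex:  # no .iloc attribute, like pandas.Index
    def __init__(self, data):
        self._data = data

class FakeUDTValue:
    __UDT__ = "marker"  # mimics a value carrying a user-defined type

def first_has_udt(pser):
    # guarded form from the diff: check .iloc exists before using it,
    # so Index-like inputs short-circuit instead of raising
    return hasattr(pser, "iloc") and hasattr(pser.iloc[0], "__UDT__")

print(first_has_udt(FakeSeries([FakeUDTValue()])))  # True
print(first_has_udt(FakeSeries([object()])))        # False
print(first_has_udt(FakeIndex([object()])))         # False, no AttributeError
```

The unguarded `hasattr(pser.iloc[0], "__UDT__")` would raise on the `FakeIndex` input, which is the failure the patch avoids.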
[GitHub] [spark] SparkQA commented on pull request #33923: [SPARK-36153][SQL][DOCS][FOLLOWUP] Fix the description about the possible values of `spark.sql.catalogImplementation` property
SparkQA commented on pull request #33923: URL: https://github.com/apache/spark/pull/33923#issuecomment-913945885 **[Test build #143035 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143035/testReport)** for PR 33923 at commit [`4d1f7ad`](https://github.com/apache/spark/commit/4d1f7adc718fef8d2564201ed90f45d836e248d2).
[GitHub] [spark] sarutak commented on pull request #33923: [SPARK-36153][SQL][DOCS][FOLLOWUP] Fix the description about the possible values of `spark.sql.catalogImplementation` property
sarutak commented on pull request #33923: URL: https://github.com/apache/spark/pull/33923#issuecomment-913945882 cc: @HyukjinKwon @srowen @AngersZh who are involved in that PR.
[GitHub] [spark] itholic commented on a change in pull request #33744: [SPARK-36403][PYTHON] Implement `Index.putmask`
itholic commented on a change in pull request #33744: URL: https://github.com/apache/spark/pull/33744#discussion_r703134281

## File path: python/pyspark/pandas/typedef/typehints.py
## @@ -323,7 +323,7 @@ def infer_pd_series_spark_type(pser: pd.Series, dtype: Dtype) -> types.DataType:

     if dtype == np.dtype("object"):
         if len(pser) == 0 or pser.isnull().all():
             return types.NullType()
-        elif hasattr(pser.iloc[0], "__UDT__"):
+        elif hasattr(pser, "iloc") and hasattr(pser.iloc[0], "__UDT__"):

Review comment: Got it! Then I think we should create the ticket and handle this separately. (Or, at least we should change the function name and the type of the input `pser`, and mention this fix in the PR description with before & after examples.) BTW, actually I think we don't need to use `pandas_udf` here, though.
[GitHub] [spark] sarutak opened a new pull request #33923: [SPARK-36153][SQL][DOCS] Fix the description about the possible values of `spark.sql.catalogImplementation` property
sarutak opened a new pull request #33923: URL: https://github.com/apache/spark/pull/33923

### What changes were proposed in this pull request?
This PR fixes the description of the possible values of the `spark.sql.catalogImplementation` property. The description was added in SPARK-36153 (#33362), but the possible values are `hive` or `in-memory` rather than `true` or `false`.

### Why are the changes needed?
To fix a wrong description.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
I confirmed that `in-memory` and `hive` are the valid values with the Spark shell.
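For readers who want to verify the valid values themselves, a sketch of the check (assumes a local Spark installation; `hive` additionally needs Hive classes on the classpath):

```shell
# accepted values for this static configuration
spark-shell --conf spark.sql.catalogImplementation=in-memory
spark-shell --conf spark.sql.catalogImplementation=hive

# expected to be rejected at startup: 'true'/'false' are not valid,
# since the property names a catalog implementation, not a boolean
spark-shell --conf spark.sql.catalogImplementation=true
```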
[GitHub] [spark] SparkQA commented on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin
SparkQA commented on pull request #33893: URL: https://github.com/apache/spark/pull/33893#issuecomment-913944664 **[Test build #143034 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143034/testReport)** for PR 33893 at commit [`545bf9b`](https://github.com/apache/spark/commit/545bf9bc0d2463869fdc46833366cb53c6a9e2fa).
[GitHub] [spark] SparkQA commented on pull request #33912: [SPARK-36670][SPARK-36669][CORE][SQL] Add LZ4 hadoop wrapper and FileSourceCodecSuite
SparkQA commented on pull request #33912: URL: https://github.com/apache/spark/pull/33912#issuecomment-913944602 **[Test build #143033 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143033/testReport)** for PR 33912 at commit [`0029f33`](https://github.com/apache/spark/commit/0029f332b4b91efbaed1fee0a7fb9dcac7c183cf).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #33921: [SPARK-36677][SQL] NestedColumnAliasing should not push down aggregate functions into projections
AmplabJenkins removed a comment on pull request #33921: URL: https://github.com/apache/spark/pull/33921#issuecomment-913944244 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47534/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #33803: [SPARK-36556][SQL] Add DSV2 filters
AmplabJenkins removed a comment on pull request #33803: URL: https://github.com/apache/spark/pull/33803#issuecomment-913944140 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143029/
[GitHub] [spark] AmplabJenkins commented on pull request #33921: [SPARK-36677][SQL] NestedColumnAliasing should not push down aggregate functions into projections
AmplabJenkins commented on pull request #33921: URL: https://github.com/apache/spark/pull/33921#issuecomment-913944244 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47534/
[GitHub] [spark] SparkQA commented on pull request #33921: [SPARK-36677][SQL] NestedColumnAliasing should not push down aggregate functions into projections
SparkQA commented on pull request #33921: URL: https://github.com/apache/spark/pull/33921#issuecomment-913944229 Kubernetes integration test status failure URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47534/
[GitHub] [spark] AmplabJenkins commented on pull request #33803: [SPARK-36556][SQL] Add DSV2 filters
AmplabJenkins commented on pull request #33803: URL: https://github.com/apache/spark/pull/33803#issuecomment-913944140 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143029/
[GitHub] [spark] SparkQA removed a comment on pull request #33803: [SPARK-36556][SQL] Add DSV2 filters
SparkQA removed a comment on pull request #33803: URL: https://github.com/apache/spark/pull/33803#issuecomment-913863758 **[Test build #143029 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143029/testReport)** for PR 33803 at commit [`ed0b009`](https://github.com/apache/spark/commit/ed0b009e973789517a9cce7db581617cf577a38d).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #33922: [SPARK-35803][SQL] Support DataSource V2 CreateTempViewUsing
AmplabJenkins removed a comment on pull request #33922: URL: https://github.com/apache/spark/pull/33922#issuecomment-913943448 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47533/
[GitHub] [spark] SparkQA commented on pull request #33803: [SPARK-36556][SQL] Add DSV2 filters
SparkQA commented on pull request #33803: URL: https://github.com/apache/spark/pull/33803#issuecomment-913943512 **[Test build #143029 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143029/testReport)** for PR 33803 at commit [`ed0b009`](https://github.com/apache/spark/commit/ed0b009e973789517a9cce7db581617cf577a38d). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] AmplabJenkins commented on pull request #33922: [SPARK-35803][SQL] Support DataSource V2 CreateTempViewUsing
AmplabJenkins commented on pull request #33922: URL: https://github.com/apache/spark/pull/33922#issuecomment-913943448 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47533/
[GitHub] [spark] SparkQA commented on pull request #33921: [SPARK-36677][SQL] NestedColumnAliasing should not push down aggregate functions into projections
SparkQA commented on pull request #33921: URL: https://github.com/apache/spark/pull/33921#issuecomment-913941593 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47534/
[GitHub] [spark] SparkQA commented on pull request #33922: [SPARK-35803][SQL] Support DataSource V2 CreateTempViewUsing
SparkQA commented on pull request #33922: URL: https://github.com/apache/spark/pull/33922#issuecomment-913941564 Kubernetes integration test status failure URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47533/
[GitHub] [spark] viirya commented on a change in pull request #33912: [SPARK-36670][SPARK-36669][CORE][SQL] Add LZ4 hadoop wrapper and FileSourceCodecSuite
viirya commented on a change in pull request #33912: URL: https://github.com/apache/spark/pull/33912#discussion_r703128884 ## File path: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/FileSourceCodecSuite.scala ## @@ -0,0 +1,69 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.spark.sql.execution.datasources + +import org.apache.spark.sql.QueryTest +import org.apache.spark.sql.internal.SQLConf +import org.apache.spark.sql.test.{SharedSparkSession, SQLTestUtils} + +trait FileSourceCodecSuite extends QueryTest with SQLTestUtils { + + protected def dataSourceName: String + protected val codecConfigName: String + protected def availableCodecs: Seq[String] + + def testWithAllCodecs(name: String)(f: => Unit): Unit = { +for (codec <- availableCodecs) { + test(s"$name - data source $dataSourceName - codec: $codec") { +withSQLConf(codecConfigName -> codec) { + f +} + } +} + } + + testWithAllCodecs("write and read") { +withTempPath { dir => + testData +.repartition(5) +.write +.format(dataSourceName) +.save(dir.getCanonicalPath) + + val df = spark.read.format(dataSourceName).load(dir.getCanonicalPath) + checkAnswer(df, testData) +} + } +} + +class ParquetCodecSuite extends FileSourceCodecSuite with SharedSparkSession { + + override def dataSourceName: String = "parquet" + override val codecConfigName = SQLConf.PARQUET_COMPRESSION.key + // Exclude "lzo" because it is GPL-licenced so not included in Hadoop. + override protected def availableCodecs: Seq[String] = Seq("none", "uncompressed", "snappy", Review comment: Hm, should we skip brotli-codec test for ARM64? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
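The `testWithAllCodecs` helper in the Scala suite above registers the same write/read body once per codec, with the codec name baked into the test name so a failure immediately identifies the codec. A language-neutral sketch of that pattern (codec list and names are illustrative, not Spark's):

```python
import unittest

AVAILABLE_CODECS = ["none", "uncompressed", "snappy", "gzip", "zstd", "lz4"]


def _make_codec_test(codec):
    def test(self):
        # Stand-in for: withSQLConf(codecConfigName -> codec) { write; read; checkAnswer }
        data = bytes(range(10))
        round_tripped = data  # a real test would write and read through the codec
        self.assertEqual(round_tripped, data)
    return test


class WriteReadCodecSuite(unittest.TestCase):
    pass


# Register one distinct, individually-named test case per codec.
for _codec in AVAILABLE_CODECS:
    setattr(WriteReadCodecSuite,
            f"test_write_and_read_codec_{_codec}",
            _make_codec_test(_codec))

names = [n for n in dir(WriteReadCodecSuite)
         if n.startswith("test_write_and_read_codec_")]
assert len(names) == len(AVAILABLE_CODECS)
```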
[GitHub] [spark] SparkQA commented on pull request #33922: [SPARK-35803][SQL] Support DataSource V2 CreateTempViewUsing
SparkQA commented on pull request #33922: URL: https://github.com/apache/spark/pull/33922#issuecomment-913938612 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47533/
[GitHub] [spark] viirya commented on a change in pull request #33912: [SPARK-36670][SPARK-36669][CORE][SQL] Add LZ4 hadoop wrapper and FileSourceCodecSuite
viirya commented on a change in pull request #33912: URL: https://github.com/apache/spark/pull/33912#discussion_r703126645 ## File path: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/FileSourceCodecSuite.scala ## @@ -0,0 +1,69 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.spark.sql.execution.datasources + +import org.apache.spark.sql.QueryTest +import org.apache.spark.sql.internal.SQLConf +import org.apache.spark.sql.test.{SharedSparkSession, SQLTestUtils} + +trait FileSourceCodecSuite extends QueryTest with SQLTestUtils { + + protected def dataSourceName: String + protected val codecConfigName: String + protected def availableCodecs: Seq[String] + + def testWithAllCodecs(name: String)(f: => Unit): Unit = { +for (codec <- availableCodecs) { + test(s"$name - data source $dataSourceName - codec: $codec") { +withSQLConf(codecConfigName -> codec) { + f +} + } +} + } + + testWithAllCodecs("write and read") { +withTempPath { dir => + testData +.repartition(5) +.write +.format(dataSourceName) +.save(dir.getCanonicalPath) + + val df = spark.read.format(dataSourceName).load(dir.getCanonicalPath) + checkAnswer(df, testData) +} + } +} + +class ParquetCodecSuite extends FileSourceCodecSuite with SharedSparkSession { + + override def dataSourceName: String = "parquet" + override val codecConfigName = SQLConf.PARQUET_COMPRESSION.key + // Exclude "lzo" because it is GPL-licenced so not included in Hadoop. + override protected def availableCodecs: Seq[String] = Seq("none", "uncompressed", "snappy", Review comment: For gzip, Parquet will use GzipCompressOutputStream. Hadoop doesn't have GZIP compressor yet, but Parquet still can write gzip compressed output. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
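The comment above notes that Parquet writes gzip output through its own compressor stream rather than a Hadoop codec; the resulting bytes are nonetheless standard gzip, as a quick stdlib round-trip illustrates:

```python
import gzip

# Repetitive byte payload, standing in for a Parquet data page.
payload = b"repetitive parquet page bytes " * 64

compressed = gzip.compress(payload)
assert gzip.decompress(compressed) == payload
assert len(compressed) < len(payload)  # repetitive data compresses well
```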
[GitHub] [spark] sunchao commented on a change in pull request #33912: [SPARK-36670][SPARK-36669][CORE][SQL] Add LZ4 hadoop wrapper and FileSourceCodecSuite
sunchao commented on a change in pull request #33912: URL: https://github.com/apache/spark/pull/33912#discussion_r703125181 ## File path: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/FileSourceCodecSuite.scala ## @@ -0,0 +1,69 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.spark.sql.execution.datasources + +import org.apache.spark.sql.QueryTest +import org.apache.spark.sql.internal.SQLConf +import org.apache.spark.sql.test.{SharedSparkSession, SQLTestUtils} + +trait FileSourceCodecSuite extends QueryTest with SQLTestUtils { + + protected def format: String + protected val codecConfigName: String + protected def availableCodecs: Seq[String] + + def testWithAllCodecs(name: String)(f: => Unit): Unit = { +for (codec <- availableCodecs) { + test(s"$name - file source $format - codec: $codec") { +withSQLConf(codecConfigName -> codec) { + f +} + } +} + } + + testWithAllCodecs("write and read") { +withTempPath { dir => + testData +.repartition(5) +.write +.format(format) +.save(dir.getCanonicalPath) + + val df = spark.read.format(format).load(dir.getCanonicalPath) + checkAnswer(df, testData) +} + } +} + +class ParquetCodecSuite extends FileSourceCodecSuite with SharedSparkSession { + + override def format: String = "parquet" + override val codecConfigName = SQLConf.PARQUET_COMPRESSION.key + // Exclude "lzo" because it is GPL-licenced so not included in Hadoop. + override protected def availableCodecs: Seq[String] = Seq("none", "uncompressed", "snappy", +"gzip", "brotli", "zstd", "lz4") +} + +class OrcCodecSuite extends FileSourceCodecSuite with SharedSparkSession{ + + override def format: String = "orc" + override val codecConfigName = SQLConf.ORC_COMPRESSION.key Review comment: nit: add type annotation for public member? ## File path: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/FileSourceCodecSuite.scala ## @@ -0,0 +1,69 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. 
+ * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution.datasources + +import org.apache.spark.sql.QueryTest +import org.apache.spark.sql.internal.SQLConf +import org.apache.spark.sql.test.{SharedSparkSession, SQLTestUtils} + +trait FileSourceCodecSuite extends QueryTest with SQLTestUtils { + + protected def format: String + protected val codecConfigName: String + protected def availableCodecs: Seq[String] + + def testWithAllCodecs(name: String)(f: => Unit): Unit = { +for (codec <- availableCodecs) { + test(s"$name - file source $format - codec: $codec") { +withSQLConf(codecConfigName -> codec) { + f +} + } +} + } + + testWithAllCodecs("write and read") { +withTempPath { dir => + testData +.repartition(5) +.write +.format(format) +.save(dir.getCanonicalPath) + + val df = spark.read.format(format).load(dir.getCanonicalPath) + checkAnswer(df, testData) +} + } +} + +class ParquetCodecSuite extends FileSourceCodecSuite with SharedSparkSession { + + override def format: String = "parquet" + override val codecConfigName = SQLConf.PARQUET_COMPRESSION.key + // Exclude "lzo" because it is GPL-licenced so not included in Hadoop. + override protected def availableCodecs: Seq[String] = Seq("none", "uncompressed", "snappy", +
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #33912: [SPARK-36670][SPARK-36669][CORE][SQL] Add end-to-end codec test cases for ORC/Parquet datasources and LZ4 hadoop wrapper
dongjoon-hyun commented on a change in pull request #33912: URL: https://github.com/apache/spark/pull/33912#discussion_r703120313 ## File path: core/src/main/java/org/apache/hadoop/shaded/net/jpountz/lz4/LZ4Compressor.java ## @@ -0,0 +1,38 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.hadoop.shaded.net.jpountz.lz4; Review comment: You are right. Thanks! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] HyukjinKwon commented on pull request #33914: [SPARK-32268][SQL] Dynamic bloom filter join pruning
HyukjinKwon commented on pull request #33914: URL: https://github.com/apache/spark/pull/33914#issuecomment-913928239 I think we should probably at least have a design doc to explain this .. from a cursory look the change looks huge.
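For readers unfamiliar with the technique in the PR title: a hedged toy sketch (not Spark's implementation) of the idea behind bloom-filter join pruning. A compact filter is built over the join keys of the small side, and large-side rows whose key cannot possibly match are dropped before the join. A real bloom filter admits false positives but never false negatives; this toy uses two hash functions over a fixed-size bit array.

```python
def make_bloom(keys, num_bits=1024):
    # Set two bits per key, derived from two seeded hashes.
    bits = 0
    for key in keys:
        for seed in (0, 1):
            bits |= 1 << (hash((seed, key)) % num_bits)
    return bits, num_bits


def might_contain(bloom, key):
    bits, num_bits = bloom
    # All of the key's bits must be set; otherwise the key is definitely absent.
    return all((bits >> (hash((seed, key)) % num_bits)) & 1 for seed in (0, 1))


small_side_keys = {"a", "b", "c"}
bloom = make_bloom(small_side_keys)

large_side = [("a", 1), ("x", 2), ("b", 3), ("y", 4)]
pruned = [row for row in large_side if might_contain(bloom, row[0])]

surviving_keys = [r[0] for r in pruned]
assert "a" in surviving_keys and "b" in surviving_keys  # no false negatives
```

Non-matching keys like "x" and "y" are usually pruned, but a bloom filter may let a few through as false positives, which the join itself then discards.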
[GitHub] [spark] HyukjinKwon commented on pull request #33917: [SPARK-36622][CORE] Making spark.history.kerberos.principal _HOST compliant
HyukjinKwon commented on pull request #33917: URL: https://github.com/apache/spark/pull/33917#issuecomment-913927177 I am not sure who's the best person to review .. maybe @gaborgsomogyi and @bersprockets ... ?
[GitHub] [spark] HyukjinKwon commented on a change in pull request #33917: [SPARK-36622][CORE] Making spark.history.kerberos.principal _HOST compliant
HyukjinKwon commented on a change in pull request #33917: URL: https://github.com/apache/spark/pull/33917#discussion_r703118302 ## File path: core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala ## @@ -147,6 +148,15 @@ private[spark] class SparkHadoopUtil extends Logging { } } + /** + * Review comment: Maybe remove this line or add some description
[GitHub] [spark] HyukjinKwon commented on a change in pull request #33917: [SPARK-36622][CORE] Making spark.history.kerberos.principal _HOST compliant
HyukjinKwon commented on a change in pull request #33917: URL: https://github.com/apache/spark/pull/33917#discussion_r703118164 ## File path: core/src/test/scala/org/apache/spark/deploy/SparkHadoopUtilSuite.scala ## @@ -80,6 +82,18 @@ class SparkHadoopUtilSuite extends SparkFunSuite { assertConfigValue(hadoopConf, "fs.s3a.endpoint", null) } + /** + * test for _HOST pattern replacement with Server cannonical address + */ + test("server principal with _HOST pattern") { +assert(SparkHadoopUtil.get.getServerPrincipal("spark/_h...@realm.com") + === "spark/%s...@realm.com".format(InetAddress.getLocalHost.getCanonicalHostName()) + , s"Mismatch in expected value") +assert(SparkHadoopUtil.get.getServerPrincipal("spark/0.0@realm.com") + === "spark/0.0@realm.com".format(InetAddress.getLocalHost.getCanonicalHostName()) + , s"Mismatch in expected value") Review comment: ```suggestion assert(SparkHadoopUtil.get.getServerPrincipal("spark/_h...@realm.com") === "spark/%s...@realm.com".format(InetAddress.getLocalHost.getCanonicalHostName()), "Mismatch in expected value") assert(SparkHadoopUtil.get.getServerPrincipal("spark/0.0@realm.com") === "spark/0.0@realm.com".format(InetAddress.getLocalHost.getCanonicalHostName()), "Mismatch in expected value") ```
[GitHub] [spark] HyukjinKwon commented on a change in pull request #33917: [SPARK-36622][CORE] Making spark.history.kerberos.principal _HOST compliant
HyukjinKwon commented on a change in pull request #33917: URL: https://github.com/apache/spark/pull/33917#discussion_r703118025 ## File path: core/src/test/scala/org/apache/spark/deploy/SparkHadoopUtilSuite.scala ## @@ -80,6 +82,18 @@ class SparkHadoopUtilSuite extends SparkFunSuite { assertConfigValue(hadoopConf, "fs.s3a.endpoint", null) } + /** + * test for _HOST pattern replacement with Server cannonical address + */ + test("server principal with _HOST pattern") { Review comment: Maybe let's add a JIRA prefix in the test title: ```suggestion test("SPARK-36622: server principal with _HOST pattern") { ```
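The substitution being tested in the suite above is the standard Kerberos convention (Hadoop exposes it as `SecurityUtil.getServerPrincipal`): the literal `_HOST` token in the service part of a principal is replaced by the machine's canonical hostname, so one config value works across a fleet. A hypothetical sketch of the behavior, with an illustrative hostname in place of `InetAddress.getLocalHost`:

```python
def get_server_principal(principal: str, canonical_host: str) -> str:
    # Split "service/instance@REALM" at the realm separator, then replace
    # the _HOST token only in the service/instance part.
    service, sep, realm = principal.partition("@")
    return service.replace("_HOST", canonical_host) + sep + realm


assert get_server_principal("spark/_HOST@REALM.COM", "host1.example.com") \
    == "spark/host1.example.com@REALM.COM"

# Principals without the token are returned unchanged.
assert get_server_principal("spark/0.0@REALM.COM", "host1.example.com") \
    == "spark/0.0@REALM.COM"
```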
[GitHub] [spark] sunchao commented on a change in pull request #33910: [SPARK-36666][SQL] Fix regression in AQEShuffleReadExec
sunchao commented on a change in pull request #33910: URL: https://github.com/apache/spark/pull/33910#discussion_r703117054 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AQEShuffleReadExec.scala ## @@ -82,8 +82,11 @@ case class AQEShuffleReadExec private( // `RoundRobinPartitioning` but we don't need to retain the number of partitions. case r: RoundRobinPartitioning => r.copy(numPartitions = partitionSpecs.length) -case other => throw new IllegalStateException( - "Unexpected partitioning for coalesced shuffle read: " + other) +case _ => + // Spark plugins may have custom partitioning and may replace this operator + // during the postStageOptimization phase, so return UnknownPartitioning here + // rather than throw an exception + UnknownPartitioning(partitionSpecs.length) Review comment: ya that'll do
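The design choice in the diff above is a general extensibility pattern: when a component can be extended by plugins, an unrecognized case is often better mapped to a safe "unknown" value than to an exception, since the plugin may replace the operator later anyway. A toy sketch of the dispatch (names are illustrative, not Spark's API):

```python
KNOWN_PARTITIONINGS = {"HashPartitioning", "RangePartitioning", "RoundRobinPartitioning"}


def output_partitioning(parent_name: str, num_specs: int) -> tuple:
    # Known partitionings are preserved with the coalesced partition count;
    # anything else (e.g. a plugin-provided partitioning) degrades to
    # "UnknownPartitioning" instead of raising IllegalStateException.
    if parent_name in KNOWN_PARTITIONINGS:
        return (parent_name, num_specs)
    return ("UnknownPartitioning", num_specs)


assert output_partitioning("HashPartitioning", 8) == ("HashPartitioning", 8)
assert output_partitioning("MyPluginPartitioning", 8) == ("UnknownPartitioning", 8)
```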
[GitHub] [spark] AmplabJenkins removed a comment on pull request #33922: [SPARK-35803][SQL] Support DataSource V2 CreateTempViewUsing
AmplabJenkins removed a comment on pull request #33922: URL: https://github.com/apache/spark/pull/33922#issuecomment-913910489 Can one of the admins verify this patch?
[GitHub] [spark] AmplabJenkins removed a comment on pull request #33921: [SPARK-36677][SQL] NestedColumnAliasing should not push down aggregate functions into projections
AmplabJenkins removed a comment on pull request #33921: URL: https://github.com/apache/spark/pull/33921#issuecomment-913802314 Can one of the admins verify this patch?
[GitHub] [spark] SparkQA commented on pull request #33922: [SPARK-35803][SQL] Support DataSource V2 CreateTempViewUsing
SparkQA commented on pull request #33922: URL: https://github.com/apache/spark/pull/33922#issuecomment-913925523 **[Test build #143031 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143031/testReport)** for PR 33922 at commit [`181c5d1`](https://github.com/apache/spark/commit/181c5d19d819debef1ebe50a078acbb4bfe512a8).
[GitHub] [spark] SparkQA commented on pull request #33921: [SPARK-36677][SQL] NestedColumnAliasing should not push down aggregate functions into projections
SparkQA commented on pull request #33921: URL: https://github.com/apache/spark/pull/33921#issuecomment-913925558 **[Test build #143032 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143032/testReport)** for PR 33921 at commit [`b7dbdc8`](https://github.com/apache/spark/commit/b7dbdc82477c6d6c6ea6c2085294c611b14833af).
[GitHub] [spark] HyukjinKwon commented on pull request #33921: [SPARK-36677][SQL] NestedColumnAliasing should not push down aggregate functions into projections
HyukjinKwon commented on pull request #33921: URL: https://github.com/apache/spark/pull/33921#issuecomment-913923694
[GitHub] [spark] HyukjinKwon commented on a change in pull request #33922: [SPARK-35803][SQL] Support DataSource V2 CreateTempViewUsing
HyukjinKwon commented on a change in pull request #33922: URL: https://github.com/apache/spark/pull/33922#discussion_r703114054

## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Utils.scala

```diff
@@ -83,4 +90,54 @@ private[sql] object DataSourceV2Utils extends Logging {
       options.asCaseSensitiveMap())
     }
   }
+
+  def loadV2Source(sparkSession: SparkSession, provider: TableProvider,
+      userSpecifiedSchema: Option[StructType], extraOptions: CaseInsensitiveMap[String],
+      source: String, paths: String*): Option[DataFrame] = {
+    val catalogManager = sparkSession.sessionState.catalogManager
+    val sessionOptions = DataSourceV2Utils.extractSessionConfigs(
+      source = provider, conf = sparkSession.sessionState.conf)
+
+    val optionsWithPath = getOptionsWithPaths(extraOptions, paths: _*)
+
+    val finalOptions = sessionOptions.filterKeys(!optionsWithPath.contains(_)).toMap ++
+      optionsWithPath.originalMap
+    val dsOptions = new CaseInsensitiveStringMap(finalOptions.asJava)
+    val (table, catalog, ident) = provider match {
+      case _: SupportsCatalogOptions if userSpecifiedSchema.nonEmpty =>
+        throw new IllegalArgumentException(
+          s"$source does not support user specified schema. Please don't specify the schema.")
+      case hasCatalog: SupportsCatalogOptions =>
+        val ident = hasCatalog.extractIdentifier(dsOptions)
+        val catalog = CatalogV2Util.getTableProviderCatalog(
+          hasCatalog,
+          catalogManager,
+          dsOptions)
+        (catalog.loadTable(ident), Some(catalog), Some(ident))
+      case _ =>
+        // TODO: Non-catalog paths for DSV2 are currently not well defined.
```

Review comment: I know this comment already existed before, but I wanted to make a note: this isn't a good example of a comment. There's no JIRA, and we don't know what's not well defined.
[GitHub] [spark] HyukjinKwon commented on a change in pull request #33922: [SPARK-35803][SQL] Support DataSource V2 CreateTempViewUsing
HyukjinKwon commented on a change in pull request #33922: URL: https://github.com/apache/spark/pull/33922#discussion_r703113664

## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Utils.scala

```diff
@@ -83,4 +90,54 @@ private[sql] object DataSourceV2Utils extends Logging {
       options.asCaseSensitiveMap())
     }
   }
+
+  def loadV2Source(sparkSession: SparkSession, provider: TableProvider,
+      userSpecifiedSchema: Option[StructType], extraOptions: CaseInsensitiveMap[String],
+      source: String, paths: String*): Option[DataFrame] = {
```

Review comment:

```suggestion
  def loadV2Source(
      sparkSession: SparkSession,
      provider: TableProvider,
      userSpecifiedSchema: Option[StructType],
      extraOptions: CaseInsensitiveMap[String],
      source: String,
      paths: String*): Option[DataFrame] = {
```
[GitHub] [spark] zhengruifeng commented on a change in pull request #33710: [SPARK-36481][ML] Expose LogisticRegression.setInitialModel, like KMeans et al do
zhengruifeng commented on a change in pull request #33710: URL: https://github.com/apache/spark/pull/33710#discussion_r703113604

## File path: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala

```diff
@@ -486,7 +486,8 @@ class LogisticRegression @Since("1.2.0") (
   private var optInitialModel: Option[LogisticRegressionModel] = None

-  private[spark] def setInitialModel(model: LogisticRegressionModel): this.type = {
+  @Since("3.3.0")
+  def setInitialModel(model: LogisticRegressionModel): this.type = {
```

Review comment: ok
[GitHub] [spark] HeartSaVioR commented on pull request #33916: [SPARK-36667][SS][TEST] Close resources properly in StateStoreSuite/RocksDBStateStoreSuite
HeartSaVioR commented on pull request #33916: URL: https://github.com/apache/spark/pull/33916#issuecomment-913921322 Thanks all for reviewing and merging!
[GitHub] [spark] HyukjinKwon commented on pull request #33922: [SPARK-35803][SQL] Support DataSource V2 CreateTempViewUsing
HyukjinKwon commented on pull request #33922: URL: https://github.com/apache/spark/pull/33922#issuecomment-913921196 ok to test
[GitHub] [spark] HyukjinKwon commented on pull request #33858: [SPARK-36402][PYTHON] Implement Series.combine
HyukjinKwon commented on pull request #33858: URL: https://github.com/apache/spark/pull/33858#issuecomment-913918373 @itholic can you review this one please
[GitHub] [spark] viirya closed pull request #33916: [SPARK-36667][SS][TEST] Close resources properly in StateStoreSuite/RocksDBStateStoreSuite
viirya closed pull request #33916: URL: https://github.com/apache/spark/pull/33916
[GitHub] [spark] viirya commented on pull request #33916: [SPARK-36667][SS][TEST] Close resources properly in StateStoreSuite/RocksDBStateStoreSuite
viirya commented on pull request #33916: URL: https://github.com/apache/spark/pull/33916#issuecomment-913914092 Thanks. Merging to master/3.2.
[GitHub] [spark] AmplabJenkins commented on pull request #33922: [SPARK-35803][SQL] Support DataSource V2 CreateTempViewUsing
AmplabJenkins commented on pull request #33922: URL: https://github.com/apache/spark/pull/33922#issuecomment-913910489 Can one of the admins verify this patch?
[GitHub] [spark] AmplabJenkins removed a comment on pull request #33912: [SPARK-36670][SPARK-36669][CORE][SQL] Add end-to-end codec test cases for ORC/Parquet datasources and LZ4 hadoop wrapper
AmplabJenkins removed a comment on pull request #33912: URL: https://github.com/apache/spark/pull/33912#issuecomment-913909867 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143028/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #33916: [SPARK-36667][SS][TEST] Close resources properly in StateStoreSuite/RocksDBStateStoreSuite
AmplabJenkins removed a comment on pull request #33916: URL: https://github.com/apache/spark/pull/33916#issuecomment-913909866 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143027/
[GitHub] [spark] AmplabJenkins commented on pull request #33912: [SPARK-36670][SPARK-36669][CORE][SQL] Add end-to-end codec test cases for ORC/Parquet datasources and LZ4 hadoop wrapper
AmplabJenkins commented on pull request #33912: URL: https://github.com/apache/spark/pull/33912#issuecomment-913909867 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143028/
[GitHub] [spark] AmplabJenkins commented on pull request #33916: [SPARK-36667][SS][TEST] Close resources properly in StateStoreSuite/RocksDBStateStoreSuite
AmplabJenkins commented on pull request #33916: URL: https://github.com/apache/spark/pull/33916#issuecomment-913909866 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143027/