[GitHub] [spark] SparkQA commented on pull request #33912: [SPARK-36670][SPARK-36669][CORE][SQL] Add LZ4 hadoop wrapper and FileSourceCodecSuite
SparkQA commented on pull request #33912: URL: https://github.com/apache/spark/pull/33912#issuecomment-914010830

**[Test build #143036 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143036/testReport)** for PR 33912 at commit [`b6f20cf`](https://github.com/apache/spark/commit/b6f20cf3380a3295efbcc53e1394a96ccce9f013).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] HyukjinKwon commented on a change in pull request #33844: [SPARK-36506][PYTHON] Improve test coverage for series.py and indexes/*.py.
HyukjinKwon commented on a change in pull request #33844: URL: https://github.com/apache/spark/pull/33844#discussion_r703196115

## File path: python/pyspark/pandas/indexes/base.py
## @@ -2601,6 +2599,12 @@
         def __iter__(self) -> Iterator:
             return MissingPandasLikeIndex.__iter__(self)

     def __xor__(self, other: "Index") -> "Index":
+        warnings.warn(
+            "Index.__xor__ operating as a set operation is deprecated, "
+            "in the future this will be a logical operation matching Series.__xor__. "

Review comment: @itholic is there any update on this?
[GitHub] [spark] AmplabJenkins commented on pull request #33922: [SPARK-35803][SQL] Support DataSource V2 CreateTempViewUsing
AmplabJenkins commented on pull request #33922: URL: https://github.com/apache/spark/pull/33922#issuecomment-914009115 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143031/
[GitHub] [spark] SparkQA commented on pull request #33922: [SPARK-35803][SQL] Support DataSource V2 CreateTempViewUsing
SparkQA commented on pull request #33922: URL: https://github.com/apache/spark/pull/33922#issuecomment-914008361

**[Test build #143031 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143031/testReport)** for PR 33922 at commit [`181c5d1`](https://github.com/apache/spark/commit/181c5d19d819debef1ebe50a078acbb4bfe512a8).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #33912: [SPARK-36670][SPARK-36669][CORE][SQL] Add LZ4 hadoop wrapper and FileSourceCodecSuite
AmplabJenkins removed a comment on pull request #33912: URL: https://github.com/apache/spark/pull/33912#issuecomment-91488
[GitHub] [spark] AmplabJenkins removed a comment on pull request #33877: [SPARK-36625][SPARK-36661][PYTHON] Support TimestampNTZ in pandas API on Spark
AmplabJenkins removed a comment on pull request #33877: URL: https://github.com/apache/spark/pull/33877#issuecomment-91487 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47540/
[GitHub] [spark] AmplabJenkins commented on pull request #33877: [SPARK-36625][SPARK-36661][PYTHON] Support TimestampNTZ in pandas API on Spark
AmplabJenkins commented on pull request #33877: URL: https://github.com/apache/spark/pull/33877#issuecomment-91487 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47540/
[GitHub] [spark] AmplabJenkins commented on pull request #33912: [SPARK-36670][SPARK-36669][CORE][SQL] Add LZ4 hadoop wrapper and FileSourceCodecSuite
AmplabJenkins commented on pull request #33912: URL: https://github.com/apache/spark/pull/33912#issuecomment-91488
[GitHub] [spark] SparkQA removed a comment on pull request #33912: [SPARK-36670][SPARK-36669][CORE][SQL] Add LZ4 hadoop wrapper and FileSourceCodecSuite
SparkQA removed a comment on pull request #33912: URL: https://github.com/apache/spark/pull/33912#issuecomment-913944602 **[Test build #143033 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143033/testReport)** for PR 33912 at commit [`0029f33`](https://github.com/apache/spark/commit/0029f332b4b91efbaed1fee0a7fb9dcac7c183cf).
[GitHub] [spark] SparkQA commented on pull request #33877: [SPARK-36625][SPARK-36661][PYTHON] Support TimestampNTZ in pandas API on Spark
SparkQA commented on pull request #33877: URL: https://github.com/apache/spark/pull/33877#issuecomment-913997050 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47541/
[GitHub] [spark] dbtsai commented on pull request #33912: [SPARK-36670][SPARK-36669][CORE][SQL] Add LZ4 hadoop wrapper and FileSourceCodecSuite
dbtsai commented on pull request #33912: URL: https://github.com/apache/spark/pull/33912#issuecomment-913995000

Could we add a test for Hadoop sequence files using `sc.sequenceFile(...)`? There are still many legacy applications using Hadoop sequence files, and we want to ensure they work. We might also want to exclude the relocation of snappy in Hadoop. The relocation only relocates the Java classes; the native JNI interfaces are not relocated. So suppose we include two different versions of `snappy-java`: the relocated one provided by Hadoop and the one provided by Spark. If `snappy-java` decides to change its native C interfaces, those native methods cannot be relocated, so loading them will cause an incompatibility. If both copies are non-relocated, dependency resolution will ensure we include only one version of `snappy-java`, avoiding the potential incompatibility from the native interface, which technically cannot be relocated. I remember @dongjoon-hyun saw this issue when he worked on `zstd-jni` in Spark and Iceberg.
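(The relocation point above can be illustrated with a toy sketch. This is not snappy-java or Hadoop code, and the shaded package prefix is hypothetical; the naming rule shown is the standard JNI convention, simplified to ignore JNI's escaping of underscores and overload suffixes.)

```python
# Toy illustration of why class relocation breaks native (JNI) bindings.
# The JVM links a Java `native` method to a C symbol of the form
#   Java_<fully.qualified.ClassName with '.' replaced by '_'>_<methodName>
# Shading rewrites the Java class name, but not the symbol compiled into
# the native library, so the relocated class looks up a symbol that the
# library never exported and fails with UnsatisfiedLinkError.

def jni_symbol(fqcn: str, method: str) -> str:
    """Symbol name the JVM looks for when linking a native method
    (simplified: real JNI mangling also escapes '_' and Unicode)."""
    return "Java_" + fqcn.replace(".", "_") + "_" + method

# What the native library actually exports (real snappy-java class name):
exported = jni_symbol("org.xerial.snappy.Snappy", "rawCompress")

# What a relocated copy would look for (shaded prefix is hypothetical):
looked_up = jni_symbol(
    "org.apache.hadoop.shaded.org.xerial.snappy.Snappy", "rawCompress")

print(exported)   # Java_org_xerial_snappy_Snappy_rawCompress
print(looked_up)  # relocated name -- not exported, so linking would fail
```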
[GitHub] [spark] Ngone51 commented on a change in pull request #33872: [SPARK-36575][CORE] Should ignore task finished event if its task set is gone in TaskSchedulerImpl.handleSuccessfulTask
Ngone51 commented on a change in pull request #33872: URL: https://github.com/apache/spark/pull/33872#discussion_r703178285

## File path: core/src/test/scala/org/apache/spark/scheduler/TaskSchedulerImplSuite.scala
## @@ -1995,6 +2000,61 @@ class TaskSchedulerImplSuite extends SparkFunSuite with LocalSparkContext with B
     assert(!normalTSM.runningTasksSet.contains(taskId))
   }

+  test("SPARK-36575: Executor lost cause task hang") {
+    val taskScheduler = setupScheduler()
+
+    val resultGetter = new TaskResultGetter(sc.env, taskScheduler) {
+      override protected val getTaskResultExecutor: ExecutorService =
+        ThreadUtils.newDaemonFixedThreadPool(1, "task-result-getter")
+      def taskResultExecutor(): ExecutorService = getTaskResultExecutor
+    }
+    taskScheduler.taskResultGetter = resultGetter
+
+    val workerOffers = IndexedSeq(new WorkerOffer("executor0", "host0", 1),
+      new WorkerOffer("executor1", "host1", 1))
+    val task1 = new ShuffleMapTask(1, 0, null, new Partition {
+      override def index: Int = 0
+    }, Seq(TaskLocation("host0", "executor0")), new Properties, null)
+
+    val task2 = new ShuffleMapTask(1, 0, null, new Partition {
+      override def index: Int = 0
+    }, Seq(TaskLocation("host1", "executor1")), new Properties, null)
+
+    val taskSet = new TaskSet(Array(task1, task2), 0, 0, 0, null, 0)
+
+    taskScheduler.submitTasks(taskSet)
+    val taskDescriptions = taskScheduler.resourceOffers(workerOffers).flatten
+    assert(2 === taskDescriptions.length)
+
+    val ser = sc.env.serializer.newInstance()
+    val directResult = new DirectTaskResult[Int](ser.serialize(1), Seq(), Array.empty)
+    val resultBytes = ser.serialize(directResult)
+
+    // make getTaskResultExecutor busy
+    import scala.language.reflectiveCalls
+    resultGetter.taskResultExecutor().submit(new Runnable {
+      override def run(): Unit = Thread.sleep(100)
+    })
+
+    // task1 finished
+    taskScheduler.statusUpdate(
+      tid = taskDescriptions(0).taskId,
+      state = TaskState.FINISHED,
+      serializedData = resultBytes
+    )
+
+    // mark executor heartbeat timed out
+    taskScheduler.executorLost(taskDescriptions(0).executorId, ExecutorProcessLost("Executor " +
+      "heartbeat timed out"))
+
+    // Wait a while until all events are processed
+    Thread.sleep(100)

Review comment: Wouldn't the second `executorLost` reset `successful(index)` to false?
[GitHub] [spark] SparkQA commented on pull request #33877: [SPARK-36625][SPARK-36661][PYTHON] Support TimestampNTZ in pandas API on Spark
SparkQA commented on pull request #33877: URL: https://github.com/apache/spark/pull/33877#issuecomment-913990550 Kubernetes integration test status failure URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47540/
[GitHub] [spark] SparkQA commented on pull request #33912: [SPARK-36670][SPARK-36669][CORE][SQL] Add LZ4 hadoop wrapper and FileSourceCodecSuite
SparkQA commented on pull request #33912: URL: https://github.com/apache/spark/pull/33912#issuecomment-913989846

**[Test build #143033 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143033/testReport)** for PR 33912 at commit [`0029f33`](https://github.com/apache/spark/commit/0029f332b4b91efbaed1fee0a7fb9dcac7c183cf).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `class OrcCodecSuite extends FileSourceCodecSuite with SharedSparkSession`
[GitHub] [spark] SparkQA commented on pull request #33912: [SPARK-36670][SPARK-36669][CORE][SQL] Add LZ4 hadoop wrapper and FileSourceCodecSuite
SparkQA commented on pull request #33912: URL: https://github.com/apache/spark/pull/33912#issuecomment-913989299 Kubernetes integration test status failure URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47539/
[GitHub] [spark] HyukjinKwon commented on pull request #33877: [SPARK-36625][SPARK-36661][PYTHON] Support TimestampNTZ in pandas API on Spark
HyukjinKwon commented on pull request #33877: URL: https://github.com/apache/spark/pull/33877#issuecomment-913986456 I cleaned up a bit more.
[GitHub] [spark] HyukjinKwon removed a comment on pull request #33877: [SPARK-36625][SPARK-36661][PYTHON] Support TimestampNTZ in pandas API on Spark
HyukjinKwon removed a comment on pull request #33877: URL: https://github.com/apache/spark/pull/33877#issuecomment-913964232 I cleaned up a bit more.
[GitHub] [spark] SparkQA commented on pull request #33877: [SPARK-36625][SPARK-36661][PYTHON] Support TimestampNTZ in pandas API on Spark
SparkQA commented on pull request #33877: URL: https://github.com/apache/spark/pull/33877#issuecomment-913983644 **[Test build #143038 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143038/testReport)** for PR 33877 at commit [`f83511b`](https://github.com/apache/spark/commit/f83511b4729f5d4906205f400cba57f5ab0dcd3b).
[GitHub] [spark] AmplabJenkins commented on pull request #33922: [SPARK-35803][SQL] Support DataSource V2 CreateTempViewUsing
AmplabJenkins commented on pull request #33922: URL: https://github.com/apache/spark/pull/33922#issuecomment-913980993 Can one of the admins verify this patch?
[GitHub] [spark] AmplabJenkins removed a comment on pull request #33912: [SPARK-36670][SPARK-36669][CORE][SQL] Add LZ4 hadoop wrapper and FileSourceCodecSuite
AmplabJenkins removed a comment on pull request #33912: URL: https://github.com/apache/spark/pull/33912#issuecomment-913980223 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47535/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #33923: [SPARK-36153][SQL][DOCS][FOLLOWUP] Fix the description about the possible values of `spark.sql.catalogImplementation` property
AmplabJenkins removed a comment on pull request #33923: URL: https://github.com/apache/spark/pull/33923#issuecomment-913980222 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47537/
[GitHub] [spark] AmplabJenkins commented on pull request #33923: [SPARK-36153][SQL][DOCS][FOLLOWUP] Fix the description about the possible values of `spark.sql.catalogImplementation` property
AmplabJenkins commented on pull request #33923: URL: https://github.com/apache/spark/pull/33923#issuecomment-913980222 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47537/
[GitHub] [spark] AmplabJenkins commented on pull request #33912: [SPARK-36670][SPARK-36669][CORE][SQL] Add LZ4 hadoop wrapper and FileSourceCodecSuite
AmplabJenkins commented on pull request #33912: URL: https://github.com/apache/spark/pull/33912#issuecomment-913980223 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47535/
[GitHub] [spark] SparkQA commented on pull request #33877: [SPARK-36625][SPARK-36661][PYTHON] Support TimestampNTZ in pandas API on Spark
SparkQA commented on pull request #33877: URL: https://github.com/apache/spark/pull/33877#issuecomment-913977412 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47540/
[GitHub] [spark] SparkQA commented on pull request #33912: [SPARK-36670][SPARK-36669][CORE][SQL] Add LZ4 hadoop wrapper and FileSourceCodecSuite
SparkQA commented on pull request #33912: URL: https://github.com/apache/spark/pull/33912#issuecomment-913976289 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47539/
[GitHub] [spark] SparkQA commented on pull request #33923: [SPARK-36153][SQL][DOCS][FOLLOWUP] Fix the description about the possible values of `spark.sql.catalogImplementation` property
SparkQA commented on pull request #33923: URL: https://github.com/apache/spark/pull/33923#issuecomment-913975805 Kubernetes integration test status failure URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47537/
[GitHub] [spark] SparkQA commented on pull request #33912: [SPARK-36670][SPARK-36669][CORE][SQL] Add LZ4 hadoop wrapper and FileSourceCodecSuite
SparkQA commented on pull request #33912: URL: https://github.com/apache/spark/pull/33912#issuecomment-913973852 Kubernetes integration test status failure URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47535/
[GitHub] [spark] sleep1661 commented on a change in pull request #33872: [SPARK-36575][CORE] Should ignore task finished event if its task set is gone in TaskSchedulerImpl.handleSuccessfulTask
sleep1661 commented on a change in pull request #33872: URL: https://github.com/apache/spark/pull/33872#discussion_r703156809

## File path: core/src/test/scala/org/apache/spark/scheduler/TaskSchedulerImplSuite.scala
## @@ -1995,6 +2000,61 @@ class TaskSchedulerImplSuite extends SparkFunSuite with LocalSparkContext with B
     assert(!normalTSM.runningTasksSet.contains(taskId))
   }

+  test("SPARK-36575: Executor lost cause task hang") {
+    val taskScheduler = setupScheduler()
+
+    val resultGetter = new TaskResultGetter(sc.env, taskScheduler) {
+      override protected val getTaskResultExecutor: ExecutorService =
+        ThreadUtils.newDaemonFixedThreadPool(1, "task-result-getter")
+      def taskResultExecutor(): ExecutorService = getTaskResultExecutor
+    }
+    taskScheduler.taskResultGetter = resultGetter
+
+    val workerOffers = IndexedSeq(new WorkerOffer("executor0", "host0", 1),
+      new WorkerOffer("executor1", "host1", 1))
+    val task1 = new ShuffleMapTask(1, 0, null, new Partition {
+      override def index: Int = 0
+    }, Seq(TaskLocation("host0", "executor0")), new Properties, null)
+
+    val task2 = new ShuffleMapTask(1, 0, null, new Partition {
+      override def index: Int = 0
+    }, Seq(TaskLocation("host1", "executor1")), new Properties, null)
+
+    val taskSet = new TaskSet(Array(task1, task2), 0, 0, 0, null, 0)
+
+    taskScheduler.submitTasks(taskSet)
+    val taskDescriptions = taskScheduler.resourceOffers(workerOffers).flatten
+    assert(2 === taskDescriptions.length)
+
+    val ser = sc.env.serializer.newInstance()
+    val directResult = new DirectTaskResult[Int](ser.serialize(1), Seq(), Array.empty)
+    val resultBytes = ser.serialize(directResult)
+
+    // make getTaskResultExecutor busy
+    import scala.language.reflectiveCalls
+    resultGetter.taskResultExecutor().submit(new Runnable {
+      override def run(): Unit = Thread.sleep(100)
+    })
+
+    // task1 finished
+    taskScheduler.statusUpdate(
+      tid = taskDescriptions(0).taskId,
+      state = TaskState.FINISHED,
+      serializedData = resultBytes
+    )
+
+    // mark executor heartbeat timed out
+    taskScheduler.executorLost(taskDescriptions(0).executorId, ExecutorProcessLost("Executor " +
+      "heartbeat timed out"))
+
+    // Wait a while until all events are processed
+    Thread.sleep(100)

Review comment:
> The event, IIUC, is a successful event, which will do `tasksSuccessful += 1` first, right?

Yes. But the second `executorLost` made `tasksSuccessful -= 1`. With `successful(index)` now true, the task would never be scheduled again, which resulted in `tasksSuccessful` always staying less than `numTasks`.
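(The accounting race described in this exchange can be sketched with a toy model. This is hypothetical illustration code, not Spark's `TaskSetManager`; the class, method names, and `reset_successful` flag are invented to show the counter/flag interaction being discussed.)

```python
# Toy model of the hang: a task-success event increments the success
# counter and marks the index successful; a later executorLost for the
# executor that held the result decrements the counter. If the
# successful flag is NOT also reset, the task is never re-offered, so
# the counter stays below the task count forever.
class ToyTaskSetManager:
    def __init__(self, num_tasks: int):
        self.num_tasks = num_tasks
        self.successful = [False] * num_tasks
        self.tasks_successful = 0

    def handle_successful_task(self, index: int) -> None:
        if not self.successful[index]:
            self.successful[index] = True
            self.tasks_successful += 1

    def executor_lost(self, index: int, reset_successful: bool) -> None:
        # Losing the executor invalidates this (map) task's output.
        if self.successful[index]:
            self.tasks_successful -= 1
            if reset_successful:
                self.successful[index] = False  # allows rescheduling

    def is_complete(self) -> bool:
        return self.tasks_successful == self.num_tasks

tsm = ToyTaskSetManager(1)
tsm.handle_successful_task(0)
tsm.executor_lost(0, reset_successful=False)
print(tsm.is_complete(), tsm.successful[0])  # False True -> stuck
```

With `reset_successful=True` the task becomes schedulable again and the task set can eventually complete, which is the behavior the reviewers are debating.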
[GitHub] [spark] HyukjinKwon commented on pull request #33877: [SPARK-36625][SPARK-36661][PYTHON] Support TimestampNTZ in pandas API on Spark
HyukjinKwon commented on pull request #33877: URL: https://github.com/apache/spark/pull/33877#issuecomment-913964232 I cleaned up a bit more.
[GitHub] [spark] SparkQA commented on pull request #33877: [SPARK-36625][SPARK-36661][PYTHON] Support TimestampNTZ in pandas API on Spark
SparkQA commented on pull request #33877: URL: https://github.com/apache/spark/pull/33877#issuecomment-913964045 **[Test build #143037 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143037/testReport)** for PR 33877 at commit [`6620ade`](https://github.com/apache/spark/commit/6620adeb54fe95870c6de1604dfc2cd32028bed2).
[GitHub] [spark] SparkQA commented on pull request #33912: [SPARK-36670][SPARK-36669][CORE][SQL] Add LZ4 hadoop wrapper and FileSourceCodecSuite
SparkQA commented on pull request #33912: URL: https://github.com/apache/spark/pull/33912#issuecomment-913962814 **[Test build #143036 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143036/testReport)** for PR 33912 at commit [`b6f20cf`](https://github.com/apache/spark/commit/b6f20cf3380a3295efbcc53e1394a96ccce9f013).
[GitHub] [spark] AmplabJenkins commented on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin
AmplabJenkins commented on pull request #33893: URL: https://github.com/apache/spark/pull/33893#issuecomment-913962126 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47536/
[GitHub] [spark] SparkQA commented on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin
SparkQA commented on pull request #33893: URL: https://github.com/apache/spark/pull/33893#issuecomment-913960171 Kubernetes integration test status failure URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47536/
[GitHub] [spark] SparkQA commented on pull request #33923: [SPARK-36153][SQL][DOCS][FOLLOWUP] Fix the description about the possible values of `spark.sql.catalogImplementation` property
SparkQA commented on pull request #33923: URL: https://github.com/apache/spark/pull/33923#issuecomment-913959121 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47537/
[GitHub] [spark] SparkQA commented on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin
SparkQA commented on pull request #33893: URL: https://github.com/apache/spark/pull/33893#issuecomment-913956801 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47536/
[GitHub] [spark] SparkQA commented on pull request #33912: [SPARK-36670][SPARK-36669][CORE][SQL] Add LZ4 hadoop wrapper and FileSourceCodecSuite
SparkQA commented on pull request #33912: URL: https://github.com/apache/spark/pull/33912#issuecomment-913956607 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47535/
[GitHub] [spark] Ngone51 commented on a change in pull request #33872: [SPARK-36575][CORE] Should ignore task finished event if its task set is gone in TaskSchedulerImpl.handleSuccessfulTask
Ngone51 commented on a change in pull request #33872: URL: https://github.com/apache/spark/pull/33872#discussion_r703140913

## File path: core/src/test/scala/org/apache/spark/scheduler/TaskSchedulerImplSuite.scala
## @@ -1995,6 +2000,61 @@ class TaskSchedulerImplSuite extends SparkFunSuite with LocalSparkContext with B
     assert(!normalTSM.runningTasksSet.contains(taskId))
   }

+  test("SPARK-36575: Executor lost cause task hang") {
+    val taskScheduler = setupScheduler()
+
+    val resultGetter = new TaskResultGetter(sc.env, taskScheduler) {
+      override protected val getTaskResultExecutor: ExecutorService =
+        ThreadUtils.newDaemonFixedThreadPool(1, "task-result-getter")
+      def taskResultExecutor(): ExecutorService = getTaskResultExecutor
+    }
+    taskScheduler.taskResultGetter = resultGetter
+
+    val workerOffers = IndexedSeq(new WorkerOffer("executor0", "host0", 1),
+      new WorkerOffer("executor1", "host1", 1))
+    val task1 = new ShuffleMapTask(1, 0, null, new Partition {
+      override def index: Int = 0
+    }, Seq(TaskLocation("host0", "executor0")), new Properties, null)
+
+    val task2 = new ShuffleMapTask(1, 0, null, new Partition {
+      override def index: Int = 0
+    }, Seq(TaskLocation("host1", "executor1")), new Properties, null)
+
+    val taskSet = new TaskSet(Array(task1, task2), 0, 0, 0, null, 0)
+
+    taskScheduler.submitTasks(taskSet)
+    val taskDescriptions = taskScheduler.resourceOffers(workerOffers).flatten
+    assert(2 === taskDescriptions.length)
+
+    val ser = sc.env.serializer.newInstance()
+    val directResult = new DirectTaskResult[Int](ser.serialize(1), Seq(), Array.empty)
+    val resultBytes = ser.serialize(directResult)
+
+    // make getTaskResultExecutor busy
+    import scala.language.reflectiveCalls
+    resultGetter.taskResultExecutor().submit(new Runnable {
+      override def run(): Unit = Thread.sleep(100)
+    })
+
+    // task1 finished
+    taskScheduler.statusUpdate(
+      tid = taskDescriptions(0).taskId,
+      state = TaskState.FINISHED,
+      serializedData = resultBytes
+    )
+
+    // mark executor heartbeat timed out
+    taskScheduler.executorLost(taskDescriptions(0).executorId, ExecutorProcessLost("Executor " +
+      "heartbeat timed out"))
+
+    // Wait a while until all events are processed
+    Thread.sleep(100)

Review comment: > Finally, it was found that TaskSetManager.executorLost was executed twice, and the second time resulted in tasksSuccessful -= 1, resulting in tasksSuccessful always less than numTask. Then, how could it result in "tasksSuccessful always less than numTask" if there's a task finish event? The event, IIUC, is a successful event, which will do `tasksSuccessful += 1` first, right?
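The accounting question raised in the thread can be modeled outside Spark. Below is a toy sketch (Python; all names are illustrative, this is not Spark's `TaskSetManager`): a success event increments the counter once, but rolling back the same task's success twice on duplicate executor-lost events leaves the counter permanently below the task count unless the rollback is guarded.

```python
# Toy model of the success-count rollback discussed above.
# All names are illustrative; this is not Spark code.
class ToyTaskSet:
    def __init__(self, num_tasks):
        self.num_tasks = num_tasks
        self.tasks_successful = 0
        self.rolled_back = set()  # tasks whose success was already revoked

    def task_finished(self, task_id):
        # success event: counted once per finished task
        self.tasks_successful += 1

    def executor_lost(self, finished_tasks_on_executor):
        # a finished shuffle-map task on a lost executor must re-run,
        # so its success is revoked -- but only once per task
        for tid in finished_tasks_on_executor:
            if tid in self.rolled_back:
                continue  # guard against a duplicate executor-lost event
            self.rolled_back.add(tid)
            self.tasks_successful -= 1

ts = ToyTaskSet(num_tasks=2)
ts.task_finished(0)
ts.task_finished(1)
ts.executor_lost([0])
ts.executor_lost([0])  # duplicate event: without the guard, this second
                       # rollback would leave the counter stuck at 0
print(ts.tasks_successful)  # 1
```

Without the `rolled_back` guard, the second `executor_lost` call would decrement again and the counter could never reach `num_tasks`, which is the hang the PR describes.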
[GitHub] [spark] viirya commented on a change in pull request #33912: [SPARK-36670][SPARK-36669][CORE][SQL] Add LZ4 hadoop wrapper and FileSourceCodecSuite
viirya commented on a change in pull request #33912: URL: https://github.com/apache/spark/pull/33912#discussion_r703140216

## File path: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/FileSourceCodecSuite.scala
## @@ -0,0 +1,69 @@

+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.datasources
+
+import org.apache.spark.sql.QueryTest
+import org.apache.spark.sql.internal.SQLConf
+import org.apache.spark.sql.test.{SharedSparkSession, SQLTestUtils}
+
+trait FileSourceCodecSuite extends QueryTest with SQLTestUtils {
+
+  protected def dataSourceName: String
+  protected val codecConfigName: String
+  protected def availableCodecs: Seq[String]
+
+  def testWithAllCodecs(name: String)(f: => Unit): Unit = {
+    for (codec <- availableCodecs) {
+      test(s"$name - data source $dataSourceName - codec: $codec") {
+        withSQLConf(codecConfigName -> codec) {
+          f
+        }
+      }
+    }
+  }
+
+  testWithAllCodecs("write and read") {
+    withTempPath { dir =>
+      testData
+        .repartition(5)
+        .write
+        .format(dataSourceName)
+        .save(dir.getCanonicalPath)
+
+      val df = spark.read.format(dataSourceName).load(dir.getCanonicalPath)
+      checkAnswer(df, testData)
+    }
+  }
+}
+
+class ParquetCodecSuite extends FileSourceCodecSuite with SharedSparkSession {
+
+  override def dataSourceName: String = "parquet"
+  override val codecConfigName = SQLConf.PARQUET_COMPRESSION.key
+  // Exclude "lzo" because it is GPL-licenced so not included in Hadoop.
+  override protected def availableCodecs: Seq[String] = Seq("none", "uncompressed", "snappy",

Review comment: Excluded "brotli" codec for non-supported arch.
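The `testWithAllCodecs` helper above registers one named test per codec by looping inside the suite body. The same pattern, sketched in plain Python with stdlib `unittest` subtests (the `roundtrip` function is a hypothetical stand-in for "write with a codec, read back", not Spark code):

```python
import unittest

CODECS = ["none", "uncompressed", "snappy", "gzip"]

def roundtrip(data, codec):
    # hypothetical stand-in for writing data with the given compression
    # codec and reading it back; a real implementation would hit storage
    return list(data)

class CodecRoundtripTest(unittest.TestCase):
    def test_write_and_read_all_codecs(self):
        data = [1, 2, 3]
        for codec in CODECS:
            # subTest gives each codec its own independently reported
            # case, as testWithAllCodecs does with test(s"... $codec")
            with self.subTest(codec=codec):
                self.assertEqual(roundtrip(data, codec), data)

result = unittest.TestResult()
unittest.defaultTestLoader.loadTestsFromTestCase(CodecRoundtripTest).run(result)
print(result.wasSuccessful())  # True
```

The benefit in both versions is the same: a failure for one codec is reported under that codec's name without aborting the remaining codecs.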
[GitHub] [spark] SparkQA commented on pull request #33160: [SPARK-35959][BUILD][test-maven][test-hadoop3.2][test-java11] Add a new Maven profile "no-shaded-hadoop-client" for Hadoop versions older tha
SparkQA commented on pull request #33160: URL: https://github.com/apache/spark/pull/33160#issuecomment-913950541 Kubernetes integration test unable to build dist. exiting with code: 1 URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47538/
[GitHub] [spark] AmplabJenkins commented on pull request #33160: [SPARK-35959][BUILD][test-maven][test-hadoop3.2][test-java11] Add a new Maven profile "no-shaded-hadoop-client" for Hadoop versions old
AmplabJenkins commented on pull request #33160: URL: https://github.com/apache/spark/pull/33160#issuecomment-913950554 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47538/
[GitHub] [spark] AmplabJenkins commented on pull request #33923: [SPARK-36153][SQL][DOCS][FOLLOWUP] Fix the description about the possible values of `spark.sql.catalogImplementation` property
AmplabJenkins commented on pull request #33923: URL: https://github.com/apache/spark/pull/33923#issuecomment-913949321 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143035/
[GitHub] [spark] SparkQA commented on pull request #33923: [SPARK-36153][SQL][DOCS][FOLLOWUP] Fix the description about the possible values of `spark.sql.catalogImplementation` property
SparkQA commented on pull request #33923: URL: https://github.com/apache/spark/pull/33923#issuecomment-913949230 **[Test build #143035 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143035/testReport)** for PR 33923 at commit [`4d1f7ad`](https://github.com/apache/spark/commit/4d1f7adc718fef8d2564201ed90f45d836e248d2).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] [spark] HyukjinKwon closed pull request #33923: [SPARK-36153][SQL][DOCS][FOLLOWUP] Fix the description about the possible values of `spark.sql.catalogImplementation` property
HyukjinKwon closed pull request #33923: URL: https://github.com/apache/spark/pull/33923
[GitHub] [spark] HyukjinKwon commented on pull request #33923: [SPARK-36153][SQL][DOCS][FOLLOWUP] Fix the description about the possible values of `spark.sql.catalogImplementation` property
HyukjinKwon commented on pull request #33923: URL: https://github.com/apache/spark/pull/33923#issuecomment-913947801 I manually built the docs and checked this. Merged to master and branch-3.2.
[GitHub] [spark] itholic commented on a change in pull request #33744: [SPARK-36403][PYTHON] Implement `Index.putmask`
itholic commented on a change in pull request #33744: URL: https://github.com/apache/spark/pull/33744#discussion_r703134281

## File path: python/pyspark/pandas/typedef/typehints.py
## @@ -323,7 +323,7 @@ def infer_pd_series_spark_type(pser: pd.Series, dtype: Dtype) -> types.DataType:

     if dtype == np.dtype("object"):
         if len(pser) == 0 or pser.isnull().all():
             return types.NullType()
-        elif hasattr(pser.iloc[0], "__UDT__"):
+        elif hasattr(pser, "iloc") and hasattr(pser.iloc[0], "__UDT__"):

Review comment: I think we should not pass an `Index` to `infer_pd_series_spark_type`. At least we should change the function name (such as `infer_pd_indexops_spark_type`) and the input and output types of the function, or add a new function and use it, as you mentioned. BTW, actually I think we don't really need to use `pandas_udf` here, though.
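For context on the guard in the diff above: `pandas.Series` exposes positional access through `.iloc`, but `pandas.Index` has no `.iloc` accessor, so `pser.iloc[0]` raises `AttributeError` when an `Index` is passed in. A stdlib-only sketch of the guard pattern (the `Fake*` classes are stand-ins for illustration, not pandas):

```python
# Stand-ins: only the Series-like object carries an .iloc accessor.
class FakeILoc:
    def __init__(self, data):
        self._data = data
    def __getitem__(self, i):
        return self._data[i]

class FakeSeries:
    def __init__(self, data):
        self.iloc = FakeILoc(data)

class FakeIndex:  # no .iloc attribute, like pandas.Index
    def __init__(self, data):
        self._data = data

class FakeUDTValue:
    __UDT__ = "marker"  # mimics a value carrying a user-defined type

def first_has_udt(pser):
    # guarded form from the diff: check .iloc exists before using it,
    # so Index-like inputs short-circuit instead of raising
    return hasattr(pser, "iloc") and hasattr(pser.iloc[0], "__UDT__")

print(first_has_udt(FakeSeries([FakeUDTValue()])))  # True
print(first_has_udt(FakeSeries([object()])))        # False
print(first_has_udt(FakeIndex([object()])))         # False, no AttributeError
```

The unguarded `hasattr(pser.iloc[0], "__UDT__")` would raise on the `FakeIndex` input, which is the failure the patch avoids.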
[GitHub] [spark] SparkQA commented on pull request #33923: [SPARK-36153][SQL][DOCS][FOLLOWUP] Fix the description about the possible values of `spark.sql.catalogImplementation` property
SparkQA commented on pull request #33923: URL: https://github.com/apache/spark/pull/33923#issuecomment-913945885 **[Test build #143035 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143035/testReport)** for PR 33923 at commit [`4d1f7ad`](https://github.com/apache/spark/commit/4d1f7adc718fef8d2564201ed90f45d836e248d2).
[GitHub] [spark] sarutak commented on pull request #33923: [SPARK-36153][SQL][DOCS][FOLLOWUP] Fix the description about the possible values of `spark.sql.catalogImplementation` property
sarutak commented on pull request #33923: URL: https://github.com/apache/spark/pull/33923#issuecomment-913945882 cc: @HyukjinKwon @srowen @AngersZh who are involved in that PR.
[GitHub] [spark] itholic commented on a change in pull request #33744: [SPARK-36403][PYTHON] Implement `Index.putmask`
itholic commented on a change in pull request #33744: URL: https://github.com/apache/spark/pull/33744#discussion_r703134281

## File path: python/pyspark/pandas/typedef/typehints.py
## @@ -323,7 +323,7 @@ def infer_pd_series_spark_type(pser: pd.Series, dtype: Dtype) -> types.DataType:

     if dtype == np.dtype("object"):
         if len(pser) == 0 or pser.isnull().all():
             return types.NullType()
-        elif hasattr(pser.iloc[0], "__UDT__"):
+        elif hasattr(pser, "iloc") and hasattr(pser.iloc[0], "__UDT__"):

Review comment: Got it! Then I think we should create the ticket and handle this separately. (Or, at least we should change the function name and the type of the input `pser`, and mention this fix in the PR description with before & after examples.) BTW, actually I think we don't need to use `pandas_udf` here, though.
[GitHub] [spark] sarutak opened a new pull request #33923: [SPARK-36153][SQL][DOCS] Fix the description about the possible values of `spark.sql.catalogImplementation` property
sarutak opened a new pull request #33923: URL: https://github.com/apache/spark/pull/33923

### What changes were proposed in this pull request?
This PR fixes the description of the possible values of the `spark.sql.catalogImplementation` property. The description was added in SPARK-36153 (#33362), but the possible values are `hive` or `in-memory` rather than `true` or `false`.

### Why are the changes needed?
To fix a wrong description.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
I confirmed that `in-memory` and `hive` are the valid values with the Spark shell.
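For readers who want to verify the valid values themselves, a sketch of the check (assumes a local Spark installation; `hive` additionally needs Hive classes on the classpath):

```shell
# accepted values for this static configuration
spark-shell --conf spark.sql.catalogImplementation=in-memory
spark-shell --conf spark.sql.catalogImplementation=hive

# expected to be rejected at startup: 'true'/'false' are not valid,
# since the property names a catalog implementation, not a boolean
spark-shell --conf spark.sql.catalogImplementation=true
```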
[GitHub] [spark] SparkQA commented on pull request #33893: [SPARK-36638][SQL] Generalize OptimizeSkewedJoin
SparkQA commented on pull request #33893: URL: https://github.com/apache/spark/pull/33893#issuecomment-913944664 **[Test build #143034 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143034/testReport)** for PR 33893 at commit [`545bf9b`](https://github.com/apache/spark/commit/545bf9bc0d2463869fdc46833366cb53c6a9e2fa).
[GitHub] [spark] SparkQA commented on pull request #33912: [SPARK-36670][SPARK-36669][CORE][SQL] Add LZ4 hadoop wrapper and FileSourceCodecSuite
SparkQA commented on pull request #33912: URL: https://github.com/apache/spark/pull/33912#issuecomment-913944602 **[Test build #143033 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143033/testReport)** for PR 33912 at commit [`0029f33`](https://github.com/apache/spark/commit/0029f332b4b91efbaed1fee0a7fb9dcac7c183cf).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #33921: [SPARK-36677][SQL] NestedColumnAliasing should not push down aggregate functions into projections
AmplabJenkins removed a comment on pull request #33921: URL: https://github.com/apache/spark/pull/33921#issuecomment-913944244 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47534/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #33803: [SPARK-36556][SQL] Add DSV2 filters
AmplabJenkins removed a comment on pull request #33803: URL: https://github.com/apache/spark/pull/33803#issuecomment-913944140 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143029/
[GitHub] [spark] AmplabJenkins commented on pull request #33921: [SPARK-36677][SQL] NestedColumnAliasing should not push down aggregate functions into projections
AmplabJenkins commented on pull request #33921: URL: https://github.com/apache/spark/pull/33921#issuecomment-913944244 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47534/
[GitHub] [spark] SparkQA commented on pull request #33921: [SPARK-36677][SQL] NestedColumnAliasing should not push down aggregate functions into projections
SparkQA commented on pull request #33921: URL: https://github.com/apache/spark/pull/33921#issuecomment-913944229 Kubernetes integration test status failure URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47534/
[GitHub] [spark] AmplabJenkins commented on pull request #33803: [SPARK-36556][SQL] Add DSV2 filters
AmplabJenkins commented on pull request #33803: URL: https://github.com/apache/spark/pull/33803#issuecomment-913944140 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143029/
[GitHub] [spark] SparkQA removed a comment on pull request #33803: [SPARK-36556][SQL] Add DSV2 filters
SparkQA removed a comment on pull request #33803: URL: https://github.com/apache/spark/pull/33803#issuecomment-913863758 **[Test build #143029 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143029/testReport)** for PR 33803 at commit [`ed0b009`](https://github.com/apache/spark/commit/ed0b009e973789517a9cce7db581617cf577a38d).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #33922: [SPARK-35803][SQL] Support DataSource V2 CreateTempViewUsing
AmplabJenkins removed a comment on pull request #33922: URL: https://github.com/apache/spark/pull/33922#issuecomment-913943448 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47533/
[GitHub] [spark] SparkQA commented on pull request #33803: [SPARK-36556][SQL] Add DSV2 filters
SparkQA commented on pull request #33803: URL: https://github.com/apache/spark/pull/33803#issuecomment-913943512 **[Test build #143029 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143029/testReport)** for PR 33803 at commit [`ed0b009`](https://github.com/apache/spark/commit/ed0b009e973789517a9cce7db581617cf577a38d). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] AmplabJenkins commented on pull request #33922: [SPARK-35803][SQL] Support DataSource V2 CreateTempViewUsing
AmplabJenkins commented on pull request #33922: URL: https://github.com/apache/spark/pull/33922#issuecomment-913943448 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47533/
[GitHub] [spark] SparkQA commented on pull request #33921: [SPARK-36677][SQL] NestedColumnAliasing should not push down aggregate functions into projections
SparkQA commented on pull request #33921: URL: https://github.com/apache/spark/pull/33921#issuecomment-913941593 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47534/
[GitHub] [spark] SparkQA commented on pull request #33922: [SPARK-35803][SQL] Support DataSource V2 CreateTempViewUsing
SparkQA commented on pull request #33922: URL: https://github.com/apache/spark/pull/33922#issuecomment-913941564 Kubernetes integration test status failure URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47533/
[GitHub] [spark] viirya commented on a change in pull request #33912: [SPARK-36670][SPARK-36669][CORE][SQL] Add LZ4 hadoop wrapper and FileSourceCodecSuite
viirya commented on a change in pull request #33912: URL: https://github.com/apache/spark/pull/33912#discussion_r703128884 ## File path: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/FileSourceCodecSuite.scala ## @@ -0,0 +1,69 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.spark.sql.execution.datasources + +import org.apache.spark.sql.QueryTest +import org.apache.spark.sql.internal.SQLConf +import org.apache.spark.sql.test.{SharedSparkSession, SQLTestUtils} + +trait FileSourceCodecSuite extends QueryTest with SQLTestUtils { + + protected def dataSourceName: String + protected val codecConfigName: String + protected def availableCodecs: Seq[String] + + def testWithAllCodecs(name: String)(f: => Unit): Unit = { +for (codec <- availableCodecs) { + test(s"$name - data source $dataSourceName - codec: $codec") { +withSQLConf(codecConfigName -> codec) { + f +} + } +} + } + + testWithAllCodecs("write and read") { +withTempPath { dir => + testData +.repartition(5) +.write +.format(dataSourceName) +.save(dir.getCanonicalPath) + + val df = spark.read.format(dataSourceName).load(dir.getCanonicalPath) + checkAnswer(df, testData) +} + } +} + +class ParquetCodecSuite extends FileSourceCodecSuite with SharedSparkSession { + + override def dataSourceName: String = "parquet" + override val codecConfigName = SQLConf.PARQUET_COMPRESSION.key + // Exclude "lzo" because it is GPL-licenced so not included in Hadoop. + override protected def availableCodecs: Seq[String] = Seq("none", "uncompressed", "snappy", Review comment: Hm, should we skip brotli-codec test for ARM64? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
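The `testWithAllCodecs` helper in the Scala suite above registers the same write/read body once per codec, with the codec name baked into the test name so a failure immediately identifies the codec. A language-neutral sketch of that pattern (codec list and names are illustrative, not Spark's):

```python
import unittest

AVAILABLE_CODECS = ["none", "uncompressed", "snappy", "gzip", "zstd", "lz4"]


def _make_codec_test(codec):
    def test(self):
        # Stand-in for: withSQLConf(codecConfigName -> codec) { write; read; checkAnswer }
        data = bytes(range(10))
        round_tripped = data  # a real test would write and read through the codec
        self.assertEqual(round_tripped, data)
    return test


class WriteReadCodecSuite(unittest.TestCase):
    pass


# Register one distinct, individually-named test case per codec.
for _codec in AVAILABLE_CODECS:
    setattr(WriteReadCodecSuite,
            f"test_write_and_read_codec_{_codec}",
            _make_codec_test(_codec))

names = [n for n in dir(WriteReadCodecSuite)
         if n.startswith("test_write_and_read_codec_")]
assert len(names) == len(AVAILABLE_CODECS)
```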
[GitHub] [spark] SparkQA commented on pull request #33922: [SPARK-35803][SQL] Support DataSource V2 CreateTempViewUsing
SparkQA commented on pull request #33922: URL: https://github.com/apache/spark/pull/33922#issuecomment-913938612 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47533/
[GitHub] [spark] viirya commented on a change in pull request #33912: [SPARK-36670][SPARK-36669][CORE][SQL] Add LZ4 hadoop wrapper and FileSourceCodecSuite
viirya commented on a change in pull request #33912: URL: https://github.com/apache/spark/pull/33912#discussion_r703126645 ## File path: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/FileSourceCodecSuite.scala ## @@ -0,0 +1,69 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.spark.sql.execution.datasources + +import org.apache.spark.sql.QueryTest +import org.apache.spark.sql.internal.SQLConf +import org.apache.spark.sql.test.{SharedSparkSession, SQLTestUtils} + +trait FileSourceCodecSuite extends QueryTest with SQLTestUtils { + + protected def dataSourceName: String + protected val codecConfigName: String + protected def availableCodecs: Seq[String] + + def testWithAllCodecs(name: String)(f: => Unit): Unit = { +for (codec <- availableCodecs) { + test(s"$name - data source $dataSourceName - codec: $codec") { +withSQLConf(codecConfigName -> codec) { + f +} + } +} + } + + testWithAllCodecs("write and read") { +withTempPath { dir => + testData +.repartition(5) +.write +.format(dataSourceName) +.save(dir.getCanonicalPath) + + val df = spark.read.format(dataSourceName).load(dir.getCanonicalPath) + checkAnswer(df, testData) +} + } +} + +class ParquetCodecSuite extends FileSourceCodecSuite with SharedSparkSession { + + override def dataSourceName: String = "parquet" + override val codecConfigName = SQLConf.PARQUET_COMPRESSION.key + // Exclude "lzo" because it is GPL-licenced so not included in Hadoop. + override protected def availableCodecs: Seq[String] = Seq("none", "uncompressed", "snappy", Review comment: For gzip, Parquet will use GzipCompressOutputStream. Hadoop doesn't have GZIP compressor yet, but Parquet still can write gzip compressed output. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
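The comment above notes that Parquet writes gzip output through its own compressor stream rather than a Hadoop codec; the resulting bytes are nonetheless standard gzip, as a quick stdlib round-trip illustrates:

```python
import gzip

# Repetitive byte payload, standing in for a Parquet data page.
payload = b"repetitive parquet page bytes " * 64

compressed = gzip.compress(payload)
assert gzip.decompress(compressed) == payload
assert len(compressed) < len(payload)  # repetitive data compresses well
```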
[GitHub] [spark] sunchao commented on a change in pull request #33912: [SPARK-36670][SPARK-36669][CORE][SQL] Add LZ4 hadoop wrapper and FileSourceCodecSuite
sunchao commented on a change in pull request #33912: URL: https://github.com/apache/spark/pull/33912#discussion_r703125181 ## File path: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/FileSourceCodecSuite.scala ## @@ -0,0 +1,69 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.spark.sql.execution.datasources + +import org.apache.spark.sql.QueryTest +import org.apache.spark.sql.internal.SQLConf +import org.apache.spark.sql.test.{SharedSparkSession, SQLTestUtils} + +trait FileSourceCodecSuite extends QueryTest with SQLTestUtils { + + protected def format: String + protected val codecConfigName: String + protected def availableCodecs: Seq[String] + + def testWithAllCodecs(name: String)(f: => Unit): Unit = { +for (codec <- availableCodecs) { + test(s"$name - file source $format - codec: $codec") { +withSQLConf(codecConfigName -> codec) { + f +} + } +} + } + + testWithAllCodecs("write and read") { +withTempPath { dir => + testData +.repartition(5) +.write +.format(format) +.save(dir.getCanonicalPath) + + val df = spark.read.format(format).load(dir.getCanonicalPath) + checkAnswer(df, testData) +} + } +} + +class ParquetCodecSuite extends FileSourceCodecSuite with SharedSparkSession { + + override def format: String = "parquet" + override val codecConfigName = SQLConf.PARQUET_COMPRESSION.key + // Exclude "lzo" because it is GPL-licenced so not included in Hadoop. + override protected def availableCodecs: Seq[String] = Seq("none", "uncompressed", "snappy", +"gzip", "brotli", "zstd", "lz4") +} + +class OrcCodecSuite extends FileSourceCodecSuite with SharedSparkSession{ + + override def format: String = "orc" + override val codecConfigName = SQLConf.ORC_COMPRESSION.key Review comment: nit: add type annotation for public member? ## File path: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/FileSourceCodecSuite.scala ## @@ -0,0 +1,69 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. 
+ * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution.datasources + +import org.apache.spark.sql.QueryTest +import org.apache.spark.sql.internal.SQLConf +import org.apache.spark.sql.test.{SharedSparkSession, SQLTestUtils} + +trait FileSourceCodecSuite extends QueryTest with SQLTestUtils { + + protected def format: String + protected val codecConfigName: String + protected def availableCodecs: Seq[String] + + def testWithAllCodecs(name: String)(f: => Unit): Unit = { +for (codec <- availableCodecs) { + test(s"$name - file source $format - codec: $codec") { +withSQLConf(codecConfigName -> codec) { + f +} + } +} + } + + testWithAllCodecs("write and read") { +withTempPath { dir => + testData +.repartition(5) +.write +.format(format) +.save(dir.getCanonicalPath) + + val df = spark.read.format(format).load(dir.getCanonicalPath) + checkAnswer(df, testData) +} + } +} + +class ParquetCodecSuite extends FileSourceCodecSuite with SharedSparkSession { + + override def format: String = "parquet" + override val codecConfigName = SQLConf.PARQUET_COMPRESSION.key + // Exclude "lzo" because it is GPL-licenced so not included in Hadoop. + override protected def availableCodecs: Seq[String] = Seq("none", "uncompressed", "snappy", +
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #33912: [SPARK-36670][SPARK-36669][CORE][SQL] Add end-to-end codec test cases for ORC/Parquet datasources and LZ4 hadoop wrapper
dongjoon-hyun commented on a change in pull request #33912: URL: https://github.com/apache/spark/pull/33912#discussion_r703120313 ## File path: core/src/main/java/org/apache/hadoop/shaded/net/jpountz/lz4/LZ4Compressor.java ## @@ -0,0 +1,38 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.hadoop.shaded.net.jpountz.lz4; Review comment: You are right. Thanks! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] HyukjinKwon commented on pull request #33914: [SPARK-32268][SQL] Dynamic bloom filter join pruning
HyukjinKwon commented on pull request #33914: URL: https://github.com/apache/spark/pull/33914#issuecomment-913928239 I think we should probably at least have a design doc to explain this .. from a cursory look the change looks huge.
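For readers unfamiliar with the technique in the PR title: a hedged toy sketch (not Spark's implementation) of the idea behind bloom-filter join pruning. A compact filter is built over the join keys of the small side, and large-side rows whose key cannot possibly match are dropped before the join. A real bloom filter admits false positives but never false negatives; this toy uses two hash functions over a fixed-size bit array.

```python
def make_bloom(keys, num_bits=1024):
    # Set two bits per key, derived from two seeded hashes.
    bits = 0
    for key in keys:
        for seed in (0, 1):
            bits |= 1 << (hash((seed, key)) % num_bits)
    return bits, num_bits


def might_contain(bloom, key):
    bits, num_bits = bloom
    # All of the key's bits must be set; otherwise the key is definitely absent.
    return all((bits >> (hash((seed, key)) % num_bits)) & 1 for seed in (0, 1))


small_side_keys = {"a", "b", "c"}
bloom = make_bloom(small_side_keys)

large_side = [("a", 1), ("x", 2), ("b", 3), ("y", 4)]
pruned = [row for row in large_side if might_contain(bloom, row[0])]

surviving_keys = [r[0] for r in pruned]
assert "a" in surviving_keys and "b" in surviving_keys  # no false negatives
```

Non-matching keys like "x" and "y" are usually pruned, but a bloom filter may let a few through as false positives, which the join itself then discards.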
[GitHub] [spark] HyukjinKwon commented on pull request #33917: [SPARK-36622][CORE] Making spark.history.kerberos.principal _HOST compliant
HyukjinKwon commented on pull request #33917: URL: https://github.com/apache/spark/pull/33917#issuecomment-913927177 I am not sure who's the best person to review .. maybe @gaborgsomogyi and @bersprockets ... ?
[GitHub] [spark] HyukjinKwon commented on a change in pull request #33917: [SPARK-36622][CORE] Making spark.history.kerberos.principal _HOST compliant
HyukjinKwon commented on a change in pull request #33917: URL: https://github.com/apache/spark/pull/33917#discussion_r703118302 ## File path: core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala ## @@ -147,6 +148,15 @@ private[spark] class SparkHadoopUtil extends Logging { } } + /** + * Review comment: Maybe remove this line or add some description
[GitHub] [spark] HyukjinKwon commented on a change in pull request #33917: [SPARK-36622][CORE] Making spark.history.kerberos.principal _HOST compliant
HyukjinKwon commented on a change in pull request #33917: URL: https://github.com/apache/spark/pull/33917#discussion_r703118164 ## File path: core/src/test/scala/org/apache/spark/deploy/SparkHadoopUtilSuite.scala ## @@ -80,6 +82,18 @@ class SparkHadoopUtilSuite extends SparkFunSuite { assertConfigValue(hadoopConf, "fs.s3a.endpoint", null) } + /** + * test for _HOST pattern replacement with Server cannonical address + */ + test("server principal with _HOST pattern") { +assert(SparkHadoopUtil.get.getServerPrincipal("spark/_h...@realm.com") + === "spark/%s...@realm.com".format(InetAddress.getLocalHost.getCanonicalHostName()) + , s"Mismatch in expected value") +assert(SparkHadoopUtil.get.getServerPrincipal("spark/0.0@realm.com") + === "spark/0.0@realm.com".format(InetAddress.getLocalHost.getCanonicalHostName()) + , s"Mismatch in expected value") Review comment: ```suggestion assert(SparkHadoopUtil.get.getServerPrincipal("spark/_h...@realm.com") === "spark/%s...@realm.com".format(InetAddress.getLocalHost.getCanonicalHostName()), "Mismatch in expected value") assert(SparkHadoopUtil.get.getServerPrincipal("spark/0.0@realm.com") === "spark/0.0@realm.com".format(InetAddress.getLocalHost.getCanonicalHostName()), "Mismatch in expected value") ```
[GitHub] [spark] HyukjinKwon commented on a change in pull request #33917: [SPARK-36622][CORE] Making spark.history.kerberos.principal _HOST compliant
HyukjinKwon commented on a change in pull request #33917: URL: https://github.com/apache/spark/pull/33917#discussion_r703118025 ## File path: core/src/test/scala/org/apache/spark/deploy/SparkHadoopUtilSuite.scala ## @@ -80,6 +82,18 @@ class SparkHadoopUtilSuite extends SparkFunSuite { assertConfigValue(hadoopConf, "fs.s3a.endpoint", null) } + /** + * test for _HOST pattern replacement with Server cannonical address + */ + test("server principal with _HOST pattern") { Review comment: Maybe let's add a JIRA prefix in the test title: ```suggestion test("SPARK-36622: server principal with _HOST pattern") { ```
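The substitution being tested in the suite above is the standard Kerberos convention (Hadoop exposes it as `SecurityUtil.getServerPrincipal`): the literal `_HOST` token in the service part of a principal is replaced by the machine's canonical hostname, so one config value works across a fleet. A hypothetical sketch of the behavior, with an illustrative hostname in place of `InetAddress.getLocalHost`:

```python
def get_server_principal(principal: str, canonical_host: str) -> str:
    # Split "service/instance@REALM" at the realm separator, then replace
    # the _HOST token only in the service/instance part.
    service, sep, realm = principal.partition("@")
    return service.replace("_HOST", canonical_host) + sep + realm


assert get_server_principal("spark/_HOST@REALM.COM", "host1.example.com") \
    == "spark/host1.example.com@REALM.COM"

# Principals without the token are returned unchanged.
assert get_server_principal("spark/0.0@REALM.COM", "host1.example.com") \
    == "spark/0.0@REALM.COM"
```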
[GitHub] [spark] sunchao commented on a change in pull request #33910: [SPARK-36666][SQL] Fix regression in AQEShuffleReadExec
sunchao commented on a change in pull request #33910: URL: https://github.com/apache/spark/pull/33910#discussion_r703117054 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AQEShuffleReadExec.scala ## @@ -82,8 +82,11 @@ case class AQEShuffleReadExec private( // `RoundRobinPartitioning` but we don't need to retain the number of partitions. case r: RoundRobinPartitioning => r.copy(numPartitions = partitionSpecs.length) -case other => throw new IllegalStateException( - "Unexpected partitioning for coalesced shuffle read: " + other) +case _ => + // Spark plugins may have custom partitioning and may replace this operator + // during the postStageOptimization phase, so return UnknownPartitioning here + // rather than throw an exception + UnknownPartitioning(partitionSpecs.length) Review comment: ya that'll do
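The design choice in the diff above is a general extensibility pattern: when a component can be extended by plugins, an unrecognized case is often better mapped to a safe "unknown" value than to an exception, since the plugin may replace the operator later anyway. A toy sketch of the dispatch (names are illustrative, not Spark's API):

```python
KNOWN_PARTITIONINGS = {"HashPartitioning", "RangePartitioning", "RoundRobinPartitioning"}


def output_partitioning(parent_name: str, num_specs: int) -> tuple:
    # Known partitionings are preserved with the coalesced partition count;
    # anything else (e.g. a plugin-provided partitioning) degrades to
    # "UnknownPartitioning" instead of raising IllegalStateException.
    if parent_name in KNOWN_PARTITIONINGS:
        return (parent_name, num_specs)
    return ("UnknownPartitioning", num_specs)


assert output_partitioning("HashPartitioning", 8) == ("HashPartitioning", 8)
assert output_partitioning("MyPluginPartitioning", 8) == ("UnknownPartitioning", 8)
```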
[GitHub] [spark] AmplabJenkins removed a comment on pull request #33922: [SPARK-35803][SQL] Support DataSource V2 CreateTempViewUsing
AmplabJenkins removed a comment on pull request #33922: URL: https://github.com/apache/spark/pull/33922#issuecomment-913910489 Can one of the admins verify this patch?
[GitHub] [spark] AmplabJenkins removed a comment on pull request #33921: [SPARK-36677][SQL] NestedColumnAliasing should not push down aggregate functions into projections
AmplabJenkins removed a comment on pull request #33921: URL: https://github.com/apache/spark/pull/33921#issuecomment-913802314 Can one of the admins verify this patch?
[GitHub] [spark] SparkQA commented on pull request #33922: [SPARK-35803][SQL] Support DataSource V2 CreateTempViewUsing
SparkQA commented on pull request #33922: URL: https://github.com/apache/spark/pull/33922#issuecomment-913925523 **[Test build #143031 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143031/testReport)** for PR 33922 at commit [`181c5d1`](https://github.com/apache/spark/commit/181c5d19d819debef1ebe50a078acbb4bfe512a8).
[GitHub] [spark] SparkQA commented on pull request #33921: [SPARK-36677][SQL] NestedColumnAliasing should not push down aggregate functions into projections
SparkQA commented on pull request #33921: URL: https://github.com/apache/spark/pull/33921#issuecomment-913925558 **[Test build #143032 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143032/testReport)** for PR 33921 at commit [`b7dbdc8`](https://github.com/apache/spark/commit/b7dbdc82477c6d6c6ea6c2085294c611b14833af).
[GitHub] [spark] HyukjinKwon commented on pull request #33921: [SPARK-36677][SQL] NestedColumnAliasing should not push down aggregate functions into projections
HyukjinKwon commented on pull request #33921: URL: https://github.com/apache/spark/pull/33921#issuecomment-913923694
[GitHub] [spark] HyukjinKwon commented on a change in pull request #33922: [SPARK-35803][SQL] Support DataSource V2 CreateTempViewUsing
HyukjinKwon commented on a change in pull request #33922: URL: https://github.com/apache/spark/pull/33922#discussion_r703114054

## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Utils.scala

```diff
@@ -83,4 +90,54 @@ private[sql] object DataSourceV2Utils extends Logging {
       options.asCaseSensitiveMap())
     }
   }
+
+  def loadV2Source(sparkSession: SparkSession, provider: TableProvider,
+      userSpecifiedSchema: Option[StructType], extraOptions: CaseInsensitiveMap[String],
+      source: String, paths: String*): Option[DataFrame] = {
+    val catalogManager = sparkSession.sessionState.catalogManager
+    val sessionOptions = DataSourceV2Utils.extractSessionConfigs(
+      source = provider, conf = sparkSession.sessionState.conf)
+
+    val optionsWithPath = getOptionsWithPaths(extraOptions, paths: _*)
+
+    val finalOptions = sessionOptions.filterKeys(!optionsWithPath.contains(_)).toMap ++
+      optionsWithPath.originalMap
+    val dsOptions = new CaseInsensitiveStringMap(finalOptions.asJava)
+    val (table, catalog, ident) = provider match {
+      case _: SupportsCatalogOptions if userSpecifiedSchema.nonEmpty =>
+        throw new IllegalArgumentException(
+          s"$source does not support user specified schema. Please don't specify the schema.")
+      case hasCatalog: SupportsCatalogOptions =>
+        val ident = hasCatalog.extractIdentifier(dsOptions)
+        val catalog = CatalogV2Util.getTableProviderCatalog(
+          hasCatalog,
+          catalogManager,
+          dsOptions)
+        (catalog.loadTable(ident), Some(catalog), Some(ident))
+      case _ =>
+        // TODO: Non-catalog paths for DSV2 are currently not well defined.
```

Review comment: I know this comment already existed before, but I wanted to make a note: this isn't a good example of a comment. There's no JIRA, and we don't know what's not well defined.
[GitHub] [spark] HyukjinKwon commented on a change in pull request #33922: [SPARK-35803][SQL] Support DataSource V2 CreateTempViewUsing
HyukjinKwon commented on a change in pull request #33922: URL: https://github.com/apache/spark/pull/33922#discussion_r703113664

## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Utils.scala

```diff
@@ -83,4 +90,54 @@ private[sql] object DataSourceV2Utils extends Logging {
       options.asCaseSensitiveMap())
     }
   }
+
+  def loadV2Source(sparkSession: SparkSession, provider: TableProvider,
+      userSpecifiedSchema: Option[StructType], extraOptions: CaseInsensitiveMap[String],
+      source: String, paths: String*): Option[DataFrame] = {
```

Review comment:

```suggestion
  def loadV2Source(
      sparkSession: SparkSession,
      provider: TableProvider,
      userSpecifiedSchema: Option[StructType],
      extraOptions: CaseInsensitiveMap[String],
      source: String,
      paths: String*): Option[DataFrame] = {
```
[GitHub] [spark] zhengruifeng commented on a change in pull request #33710: [SPARK-36481][ML] Expose LogisticRegression.setInitialModel, like KMeans et al do
zhengruifeng commented on a change in pull request #33710: URL: https://github.com/apache/spark/pull/33710#discussion_r703113604

## File path: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala

```diff
@@ -486,7 +486,8 @@ class LogisticRegression @Since("1.2.0") (
   private var optInitialModel: Option[LogisticRegressionModel] = None

-  private[spark] def setInitialModel(model: LogisticRegressionModel): this.type = {
+  @Since("3.3.0")
+  def setInitialModel(model: LogisticRegressionModel): this.type = {
```

Review comment: ok
[GitHub] [spark] HeartSaVioR commented on pull request #33916: [SPARK-36667][SS][TEST] Close resources properly in StateStoreSuite/RocksDBStateStoreSuite
HeartSaVioR commented on pull request #33916: URL: https://github.com/apache/spark/pull/33916#issuecomment-913921322 Thanks all for reviewing and merging!
[GitHub] [spark] HyukjinKwon commented on pull request #33922: [SPARK-35803][SQL] Support DataSource V2 CreateTempViewUsing
HyukjinKwon commented on pull request #33922: URL: https://github.com/apache/spark/pull/33922#issuecomment-913921196 ok to test
[GitHub] [spark] HyukjinKwon commented on pull request #33858: [SPARK-36402][PYTHON] Implement Series.combine
HyukjinKwon commented on pull request #33858: URL: https://github.com/apache/spark/pull/33858#issuecomment-913918373 @itholic can you review this one please
[GitHub] [spark] viirya closed pull request #33916: [SPARK-36667][SS][TEST] Close resources properly in StateStoreSuite/RocksDBStateStoreSuite
viirya closed pull request #33916: URL: https://github.com/apache/spark/pull/33916
[GitHub] [spark] viirya commented on pull request #33916: [SPARK-36667][SS][TEST] Close resources properly in StateStoreSuite/RocksDBStateStoreSuite
viirya commented on pull request #33916: URL: https://github.com/apache/spark/pull/33916#issuecomment-913914092 Thanks. Merging to master/3.2.
[GitHub] [spark] AmplabJenkins commented on pull request #33922: [SPARK-35803][SQL] Support DataSource V2 CreateTempViewUsing
AmplabJenkins commented on pull request #33922: URL: https://github.com/apache/spark/pull/33922#issuecomment-913910489 Can one of the admins verify this patch?
[GitHub] [spark] AmplabJenkins removed a comment on pull request #33912: [SPARK-36670][SPARK-36669][CORE][SQL] Add end-to-end codec test cases for ORC/Parquet datasources and LZ4 hadoop wrapper
AmplabJenkins removed a comment on pull request #33912: URL: https://github.com/apache/spark/pull/33912#issuecomment-913909867 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143028/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #33916: [SPARK-36667][SS][TEST] Close resources properly in StateStoreSuite/RocksDBStateStoreSuite
AmplabJenkins removed a comment on pull request #33916: URL: https://github.com/apache/spark/pull/33916#issuecomment-913909866 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143027/
[GitHub] [spark] AmplabJenkins commented on pull request #33912: [SPARK-36670][SPARK-36669][CORE][SQL] Add end-to-end codec test cases for ORC/Parquet datasources and LZ4 hadoop wrapper
AmplabJenkins commented on pull request #33912: URL: https://github.com/apache/spark/pull/33912#issuecomment-913909867 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143028/
[GitHub] [spark] AmplabJenkins commented on pull request #33916: [SPARK-36667][SS][TEST] Close resources properly in StateStoreSuite/RocksDBStateStoreSuite
AmplabJenkins commented on pull request #33916: URL: https://github.com/apache/spark/pull/33916#issuecomment-913909866 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143027/