Re: [PR] [SPARK-45934][DOCS] Fix `Spark Standalone` documentation table layout [spark]

2023-11-15 Thread via GitHub


dongjoon-hyun commented on code in PR #43814:
URL: https://github.com/apache/spark/pull/43814#discussion_r1394824967


##
docs/running-on-kubernetes.md:
##
@@ -1203,17 +1203,17 @@ See the [configuration page](configuration.html) for 
information on Spark config
   3.0.0
 
 
-  memoryOverheadFactor
+  spark.kubernetes.memoryOverheadFactor
   0.1
   
-This sets the Memory Overhead Factor that will allocate memory to non-JVM 
memory, which includes off-heap memory allocations, non-JVM tasks, various 
systems processes, and tmpfs-based local directories when 
local.dirs.tmpfs is true. For JVM-based jobs this 
value will default to 0.10 and 0.40 for non-JVM jobs.
+This sets the Memory Overhead Factor that will allocate memory to non-JVM 
memory, which includes off-heap memory allocations, non-JVM tasks, various 
systems processes, and tmpfs-based local directories when 
spark.kubernetes.local.dirs.tmpfs is true. For 
JVM-based jobs this value will default to 0.10 and 0.40 for non-JVM jobs.
 This is done as non-JVM tasks need more non-JVM heap space and such tasks 
commonly fail with "Memory Overhead Exceeded" errors. This preempts this error 
with a higher default.
 This will be overridden by the value set by 
spark.driver.memoryOverheadFactor and 
spark.executor.memoryOverheadFactor explicitly.

Review Comment:
   ~~This is `Spark Standalone` documentation, @bjornjorgensen ~~






Re: [PR] [SPARK-45934][DOCS] Fix `Spark Standalone` documentation table layout [spark]

2023-11-15 Thread via GitHub


dongjoon-hyun commented on code in PR #43814:
URL: https://github.com/apache/spark/pull/43814#discussion_r1394824967


##
docs/running-on-kubernetes.md:
##
@@ -1203,17 +1203,17 @@ See the [configuration page](configuration.html) for 
information on Spark config
   3.0.0
 
 
-  memoryOverheadFactor
+  spark.kubernetes.memoryOverheadFactor
   0.1
   
-This sets the Memory Overhead Factor that will allocate memory to non-JVM 
memory, which includes off-heap memory allocations, non-JVM tasks, various 
systems processes, and tmpfs-based local directories when 
local.dirs.tmpfs is true. For JVM-based jobs this 
value will default to 0.10 and 0.40 for non-JVM jobs.
+This sets the Memory Overhead Factor that will allocate memory to non-JVM 
memory, which includes off-heap memory allocations, non-JVM tasks, various 
systems processes, and tmpfs-based local directories when 
spark.kubernetes.local.dirs.tmpfs is true. For 
JVM-based jobs this value will default to 0.10 and 0.40 for non-JVM jobs.
 This is done as non-JVM tasks need more non-JVM heap space and such tasks 
commonly fail with "Memory Overhead Exceeded" errors. This preempts this error 
with a higher default.
 This will be overridden by the value set by 
spark.driver.memoryOverheadFactor and 
spark.executor.memoryOverheadFactor explicitly.

Review Comment:
   This is `Spark Standalone` documentation, @bjornjorgensen ~  






Re: [PR] [SPARK-45934][DOCS] Fix `Spark Standalone` documentation table layout [spark]

2023-11-15 Thread via GitHub


bjornjorgensen commented on code in PR #43814:
URL: https://github.com/apache/spark/pull/43814#discussion_r1394823713


##
docs/running-on-kubernetes.md:
##
@@ -1203,17 +1203,17 @@ See the [configuration page](configuration.html) for 
information on Spark config
   3.0.0
 
 
-  memoryOverheadFactor
+  spark.kubernetes.memoryOverheadFactor
   0.1
   
-This sets the Memory Overhead Factor that will allocate memory to non-JVM 
memory, which includes off-heap memory allocations, non-JVM tasks, various 
systems processes, and tmpfs-based local directories when 
local.dirs.tmpfs is true. For JVM-based jobs this 
value will default to 0.10 and 0.40 for non-JVM jobs.
+This sets the Memory Overhead Factor that will allocate memory to non-JVM 
memory, which includes off-heap memory allocations, non-JVM tasks, various 
systems processes, and tmpfs-based local directories when 
spark.kubernetes.local.dirs.tmpfs is true. For 
JVM-based jobs this value will default to 0.10 and 0.40 for non-JVM jobs.
 This is done as non-JVM tasks need more non-JVM heap space and such tasks 
commonly fail with "Memory Overhead Exceeded" errors. This preempts this error 
with a higher default.
 This will be overridden by the value set by 
spark.driver.memoryOverheadFactor and 
spark.executor.memoryOverheadFactor explicitly.

Review Comment:
   Should `spark.executor.memoryOverheadFactor` here be `spark.kubernetes.executor.memoryOverheadFactor`?
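
   For readers skimming this thread, a minimal sketch of how the factors described in the diff above interact. This is illustrative only; the precedence comes from the doc text itself, and the 1.6g figure is just the 4g * 0.4 arithmetic.

```scala
// spark-shell / REPL snippet; sketch only, not authoritative.
import org.apache.spark.SparkConf

// Per the doc text above, the engine-level factors (spark.driver.memoryOverheadFactor /
// spark.executor.memoryOverheadFactor), when set explicitly, override the
// K8s-wide spark.kubernetes.memoryOverheadFactor.
val conf = new SparkConf()
  .set("spark.kubernetes.memoryOverheadFactor", "0.2") // K8s-wide factor
  .set("spark.executor.memoryOverheadFactor", "0.4")   // overrides the above for executors
  .set("spark.executor.memory", "4g")                  // overhead on top ~= 4g * 0.4 = 1.6g
```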






Re: [PR] [SPARK-45941][PS] Upgrade `pandas` to version 2.1.3 [spark]

2023-11-15 Thread via GitHub


dongjoon-hyun closed pull request #43822: [SPARK-45941][PS] Upgrade `pandas` to 
version 2.1.3
URL: https://github.com/apache/spark/pull/43822





Re: [PR] [SPARK-45941][PS] Upgrade `pandas` to version 2.1.3 [spark]

2023-11-15 Thread via GitHub


dongjoon-hyun commented on PR #43822:
URL: https://github.com/apache/spark/pull/43822#issuecomment-1813276250

   Ya, it looks like that.





Re: [PR] [SPARK-45941][PS] Upgrade `pandas` to version 2.1.3 [spark]

2023-11-15 Thread via GitHub


bjornjorgensen commented on PR #43822:
URL: https://github.com/apache/spark/pull/43822#issuecomment-1813274288

   https://github.com/bjornjorgensen/spark/actions/runs/6881177899
   
   I pushed this 3 hours ago and waited until all the tests passed before I opened this PR.  





Re: [PR] [SPARK-45941][PS] Upgrade `pandas` to version 2.1.3 [spark]

2023-11-15 Thread via GitHub


dongjoon-hyun commented on PR #43822:
URL: https://github.com/apache/spark/pull/43822#issuecomment-1813270978

   BTW, your image looks like an old one (from 3 hours ago). This PR was created 18 minutes ago, wasn't it?
   https://github.com/apache/spark/assets/9700541/a736d2d5-0885-4d91-8beb-211d97674f55
   





Re: [PR] [SPARK-45941][PS] Upgrade `pandas` to version 2.1.3 [spark]

2023-11-15 Thread via GitHub


dongjoon-hyun commented on PR #43822:
URL: https://github.com/apache/spark/pull/43822#issuecomment-1813269062

   You can provide the link here, @bjornjorgensen ~ That would be enough if the 
CI is running in your side successfully.





Re: [PR] [SPARK-45941][PS] Upgrade `pandas` to version 2.1.3 [spark]

2023-11-15 Thread via GitHub


bjornjorgensen commented on PR #43822:
URL: https://github.com/apache/spark/pull/43822#issuecomment-1813263774

   it seems to be happy 
   
![image](https://github.com/apache/spark/assets/47577197/2b9a1588-2337-402b-84c0-4337078d8b20)
   
   is there anything I can do with this then? 
   
![image](https://github.com/apache/spark/assets/47577197/cd05d991-c212-4508-9e93-59b708397b47)
   





Re: [PR] [SPARK-45934][DOCS] Fix `Spark Standalone` documentation table layout [spark]

2023-11-15 Thread via GitHub


dongjoon-hyun commented on PR #43814:
URL: https://github.com/apache/spark/pull/43814#issuecomment-1813263099

   Could you review this PR, @bjornjorgensen ?





Re: [PR] [SPARK-45527][CORE] Use fraction to do the resource calculation [spark]

2023-11-15 Thread via GitHub


tgravescs commented on code in PR #43494:
URL: https://github.com/apache/spark/pull/43494#discussion_r1384058605


##
core/src/main/scala/org/apache/spark/scheduler/ExecutorResourcesAmounts.scala:
##
@@ -0,0 +1,202 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.scheduler
+
+import scala.collection.mutable.HashMap
+
+import org.apache.spark.SparkException
+import org.apache.spark.resource.ResourceAmountUtils.RESOURCE_TOTAL_AMOUNT
+import org.apache.spark.resource.ResourceProfile
+
+/**
+ * Class to hold information about a series of resources belonging to an 
executor.
+ * A resource could be a GPU, FPGA, etc. And it is used as a temporary
+ * class to calculate the resources amounts when offering resources to
+ * the tasks in the [[TaskSchedulerImpl]]
+ *
+ * One example is GPUs, where the addresses would be the indices of the GPUs
+ *
+ * @param resources The executor available resources and amount. eg,
+ *  Map("gpu" -> Map("0" -> 0.2*RESOURCE_TOTAL_AMOUNT,
+ *   "1" -> 1.0*RESOURCE_TOTAL_AMOUNT),
+ *  "fpga" -> Map("a" -> 0.3*RESOURCE_TOTAL_AMOUNT,
+ *"b" -> 0.9*RESOURCE_TOTAL_AMOUNT)
+ *  )
+ */
+private[spark] class ExecutorResourcesAmounts(
+private val resources: Map[String, Map[String, Long]]) extends 
Serializable {
+
+  /**
+   * convert the resources to be mutable HashMap
+   */
+  private val internalResources: Map[String, HashMap[String, Long]] = {
+resources.map { case (rName, addressAmounts) =>
+  rName -> HashMap(addressAmounts.toSeq: _*)
+}
+  }
+
+  /**
+   * The total address count of each resource. Eg,
+   * Map("gpu" -> Map("0" -> 0.5 * RESOURCE_TOTAL_AMOUNT,
+   *  "1" -> 0.5 * RESOURCE_TOTAL_AMOUNT,
+   *  "2" -> 0.5 * RESOURCE_TOTAL_AMOUNT),
+   * "fpga" -> Map("a" -> 0.5 * RESOURCE_TOTAL_AMOUNT,
+   *   "b" -> 0.5 * RESOURCE_TOTAL_AMOUNT))
+   * the resourceAmount will be Map("gpu" -> 3, "fpga" -> 2)
+   */
+  lazy val resourceAmount: Map[String, Int] = internalResources.map { case 
(rName, addressMap) =>
+rName -> addressMap.size
+  }
+
+  /**
+   * For testing purpose. convert internal resources back to the "fraction" 
resources.
+   */
+  private[spark] def availableResources: Map[String, Map[String, Double]] = {
+internalResources.map { case (rName, addressMap) =>
+  rName -> addressMap.map { case (address, amount) =>
+address -> amount.toDouble / RESOURCE_TOTAL_AMOUNT
+  }.toMap
+}
+  }
+
+  /**
+   * Acquire the resource and update the resource
+   * @param assignedResource the assigned resource information
+   */
+  def acquire(assignedResource: Map[String, Map[String, Long]]): Unit = {
+assignedResource.foreach { case (rName, taskResAmounts) =>
+  val availableResourceAmounts = internalResources.getOrElse(rName,
+throw new SparkException(s"Try to acquire an address from $rName that 
doesn't exist"))
+  taskResAmounts.foreach { case (address, amount) =>
+val prevInternalTotalAmount = 
availableResourceAmounts.getOrElse(address,
+  throw new SparkException(s"Try to acquire an address that doesn't 
exist. $rName " +
+s"address $address doesn't exist."))
+
+val left = prevInternalTotalAmount - amount
+if (left < 0) {
+  throw new SparkException(s"The total amount ${left.toDouble / 
RESOURCE_TOTAL_AMOUNT} " +
+s"after acquiring $rName address $address should be >= 0")
+}
+internalResources(rName)(address) = left
+  }
+}
+  }
+
+  /**
+   * Release the assigned resources to the resource pool
+   * @param assignedResource resource to be released
+   */
+  def release(assignedResource: Map[String, Map[String, Long]]): Unit = {
+assignedResource.foreach { case (rName, taskResAmounts) =>
+  val availableResourceAmounts = internalResources.getOrElse(rName,
+throw new SparkException(s"Try to release an address from $rName that 
doesn't exist"))
+  taskResAmounts.foreach { case (address, amount) =>
+val prevInternalTotalAmount = 

Re: [PR] [SPARK-43393][SQL] Address sequence expression overflow bug. [spark]

2023-11-15 Thread via GitHub


dongjoon-hyun commented on PR #41072:
URL: https://github.com/apache/spark/pull/41072#issuecomment-1813257544

   Could you fix the compilation of your PRs, @thepinetree ?





Re: [PR] [SPARK-45941][PS] Upgrade `pandas` to version 2.1.3 [spark]

2023-11-15 Thread via GitHub


dongjoon-hyun commented on PR #43822:
URL: https://github.com/apache/spark/pull/43822#issuecomment-1813254752

   Thanks. Could you make CI happy, @bjornjorgensen ?





[PR] [SPARK-45941][PS] Update `pandas` to version 2.1.3 [spark]

2023-11-15 Thread via GitHub


bjornjorgensen opened a new pull request, #43822:
URL: https://github.com/apache/spark/pull/43822

   ### What changes were proposed in this pull request?
   Update pandas from 2.1.2 to 2.1.3
   
   ### Why are the changes needed?
   Fixed infinite recursion from operations that return a new object on some 
DataFrame subclasses ([GH 
55763](https://github.com/pandas-dev/pandas/issues/55763))
   and Fix 
[read_parquet()](https://pandas.pydata.org/docs/reference/api/pandas.read_parquet.html#pandas.read_parquet)
 and 
[read_feather()](https://pandas.pydata.org/docs/reference/api/pandas.read_feather.html#pandas.read_feather)
 for [CVE-2023-47248](https://www.cve.org/CVERecord?id=CVE-2023-47248) ([GH 
55894](https://github.com/pandas-dev/pandas/issues/55894))
   
   [Release notes for 
2.1.3](https://pandas.pydata.org/docs/whatsnew/v2.1.3.html)
   
   ### Does this PR introduce _any_ user-facing change?
   No.
   
   
   ### How was this patch tested?
   Pass GA
   
   ### Was this patch authored or co-authored using generative AI tooling?
   No.





Re: [PR] [SPARK-45762][CORE] Support shuffle managers defined in user jars by changing startup order [spark]

2023-11-15 Thread via GitHub


abellina commented on PR #43627:
URL: https://github.com/apache/spark/pull/43627#issuecomment-1813246792

   Thanks @mridulm, yes, the commits make sense; they bring back the late initialization in the driver. I tested the change. The main difference from your patch, @mridulm, is that I still had to get the shuffle manager class names using the method we added to the `ShuffleManager` object here https://github.com/apache/spark/pull/43627/files#diff-42a673b8fa5f2b999371dc97a5de7ebd2c2ec19447353d39efb7e8ebc012fe32R592, because the `shuffleManager` is not set yet at this point.
   
   @tgravescs fyi





Re: [PR] [SPARK-45527][CORE] Use fraction to do the resource calculation [spark]

2023-11-15 Thread via GitHub


tgravescs commented on PR #43494:
URL: https://github.com/apache/spark/pull/43494#issuecomment-1813235365

   >  23/11/13 09:40:22 INFO TaskSetManager: Starting task 1.0 in stage 0.0 
(TID 1) (10.51.70.102, executor 0, partition 1, 
   PROCESS_LOCAL, 7823 bytes) taskResourceAssignments Map(gpu -> Map(0 -> 
2000))
   
   > Should I change this behavior to align with the original one?
   
   Yeah, that log message isn't very readable. For now let's just change it back to something similar to the previous format, e.g. "resource" -> [addresses], i.e. (gpu -> [0]).
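
   For illustration, a tiny sketch of the formatting being asked for here (a hypothetical helper, not the PR's actual code):

```scala
// Hypothetical helper: render Map("gpu" -> Map("0" -> 2000L)) as "Map(gpu -> [0])"
// by logging only the assigned addresses and dropping the internal long amounts.
def formatAssignments(assignments: Map[String, Map[String, Long]]): String =
  assignments
    .map { case (rName, addressAmounts) =>
      s"$rName -> [${addressAmounts.keys.toSeq.sorted.mkString(",")}]"
    }
    .mkString("Map(", ", ", ")")

// formatAssignments(Map("gpu" -> Map("0" -> 2000L))) == "Map(gpu -> [0])"
```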





Re: [PR] [SPARK-45927][PYTHON] Update path handling in Python data source [spark]

2023-11-15 Thread via GitHub


allisonwang-db commented on code in PR #43809:
URL: https://github.com/apache/spark/pull/43809#discussion_r1394689877


##
sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala:
##
@@ -246,7 +246,15 @@ class DataFrameReader private[sql](sparkSession: 
SparkSession) extends Logging {
 val builder = 
sparkSession.sharedState.dataSourceManager.lookupDataSource(source)
 // Unless the legacy path option behavior is enabled, the extraOptions here
 // should not include "path" or "paths" as keys.
-val plan = builder(sparkSession, source, paths, userSpecifiedSchema, 
extraOptions)
+// Add path to the options field. Note currently it only supports a single 
path.
+val optionsWithPath = if (paths.isEmpty) {
+  extraOptions
+} else if (paths.length == 1) {
+extraOptions + ("path" -> paths.head)
+} else {
+  throw QueryCompilationErrors.multiplePathsUnsupportedError(source, paths)

Review Comment:
   Yes, we only apply this check to Python data sources (it's in `loadUserDefinedDataSource`). The behavior is indeed different from the v1 source, but I find it more user-friendly to raise an explicit error than to silently ignore multiple paths.
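
   As a rough sketch of the rule described here (illustrative only, not the actual `loadUserDefinedDataSource` code), the path handling amounts to:

```scala
// Illustrative only: zero paths pass the options through unchanged, one path is
// folded into the options as "path", and multiple paths raise an explicit error.
def addPathOption(paths: Seq[String], options: Map[String, String]): Map[String, String] =
  paths match {
    case Seq()  => options
    case Seq(p) => options + ("path" -> p)
    case many   => throw new IllegalArgumentException(
      s"Python data sources currently support a single path; got ${many.length}")
  }
```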






Re: [PR] [SPARK-45927][PYTHON] Update path handling in Python data source [spark]

2023-11-15 Thread via GitHub


allisonwang-db commented on code in PR #43809:
URL: https://github.com/apache/spark/pull/43809#discussion_r1394685914


##
python/pyspark/sql/datasource.py:
##
@@ -45,30 +45,19 @@ class DataSource(ABC):
 """
 
 @final
-def __init__(
-self,
-paths: List[str],
-userSpecifiedSchema: Optional[StructType],

Review Comment:
   This field is actually not used. Both the `reader` and `writer` functions 
take in the `schema` parameter, and we can pass in the actual schema there.






Re: [PR] [SPARK-45810][Python] Create Python UDTF API to stop consuming rows from the input table [spark]

2023-11-15 Thread via GitHub


dtenedor commented on code in PR #43682:
URL: https://github.com/apache/spark/pull/43682#discussion_r1394630877


##
python/pyspark/sql/tests/test_udtf.py:
##
@@ -2482,6 +2533,7 @@ def tearDownClass(cls):
 super(UDTFTests, cls).tearDownClass()
 
 
+'''

Review Comment:
   My mistake on this, reverted.



##
python/pyspark/sql/tests/test_udtf.py:
##
@@ -2775,7 +2827,7 @@ def tearDownClass(cls):
 
cls.spark.conf.unset("spark.sql.execution.pythonUDTF.arrow.enabled")
 finally:
 super(UDTFArrowTests, cls).tearDownClass()
-
+'''

Review Comment:
   My mistake on this, reverted.






Re: [PR] [SPARK-43393][SQL][3.5] Address sequence expression overflow bug. [spark]

2023-11-15 Thread via GitHub


dongjoon-hyun commented on PR #43820:
URL: https://github.com/apache/spark/pull/43820#issuecomment-1813034968

   Could you fix the compilation?
   ```
   [error] 
/home/runner/work/spark/spark/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CollectionExpressionsSuite.scala:787:27:
 not found: value toSQLId
   [error] "functionName" -> toSQLId("sequence"),
   [error]   
   ```
   





Re: [PR] [SPARK-43393][SQL][3.3] Address sequence expression overflow bug. [spark]

2023-11-15 Thread via GitHub


dongjoon-hyun commented on PR #43821:
URL: https://github.com/apache/spark/pull/43821#issuecomment-1813032065

   Is GitHub Action triggered on this PR, @thepinetree ?





Re: [PR] [SPARK-43393][SQL] Address sequence expression overflow bug. [spark]

2023-11-15 Thread via GitHub


dongjoon-hyun commented on PR #41072:
URL: https://github.com/apache/spark/pull/41072#issuecomment-1813024170

   Thank you so much!





Re: [PR] [SPARK-43393][SQL] Address sequence expression overflow bug. [spark]

2023-11-15 Thread via GitHub


thepinetree commented on PR #41072:
URL: https://github.com/apache/spark/pull/41072#issuecomment-1813023418

   @dongjoon-hyun @cloud-fan 
   
   Backport PRs:
   * 3.3: https://github.com/apache/spark/pull/43821
   * 3.4: https://github.com/apache/spark/pull/43819
   * 3.5: https://github.com/apache/spark/pull/43820





[PR] [SPARK-43393][SQL][3.3] Address sequence expression overflow bug. [spark]

2023-11-15 Thread via GitHub


thepinetree opened a new pull request, #43821:
URL: https://github.com/apache/spark/pull/43821

   ### What changes were proposed in this pull request?
   Spark has a (long-standing) overflow bug in the `sequence` expression.
   
   Consider the following operations:
   ```
   spark.sql("CREATE TABLE foo (l LONG);")
   spark.sql(s"INSERT INTO foo VALUES (${Long.MaxValue});")
   spark.sql("SELECT sequence(0, l) FROM foo;").collect()
   ```
   
   The result of these operations will be:
   ```
   Array[org.apache.spark.sql.Row] = Array([WrappedArray()])
   ```
   an unintended consequence of overflow.
   
   The sequence is applied to values `0` and `Long.MaxValue` with a step size 
of `1` which uses a length computation defined 
[here](https://github.com/apache/spark/blob/16411188c7ba6cb19c46a2bd512b2485a4c03e2c/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala#L3451).
 In this calculation, with `start = 0`, `stop = Long.MaxValue`, and `step = 1`, 
the calculated `len` overflows to `Long.MinValue`. The computation, in binary 
looks like:
   
   ```
     0111...1111   (stop = Long.MaxValue: a zero followed by 63 ones)
   - 0000...0000   (start = 0)
   -------------
     0111...1111
   / 0000...0001   (step = 1)
   -------------
     0111...1111
   + 0000...0001
   -------------
     1000...0000   (a one followed by 63 zeros = Long.MinValue)
   ```
   
   The following 
[check](https://github.com/apache/spark/blob/16411188c7ba6cb19c46a2bd512b2485a4c03e2c/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala#L3454)
 passes as the negative `Long.MinValue` is still `<= MAX_ROUNDED_ARRAY_LENGTH`. 
The following cast to `toInt` uses this representation and [truncates the upper 
bits](https://github.com/apache/spark/blob/16411188c7ba6cb19c46a2bd512b2485a4c03e2c/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala#L3457)
 resulting in an empty length of `0`.
   
   Other overflows are similarly problematic.
   
   This PR addresses the issue by checking numeric operations in the length 
computation for overflow.
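
   As a hedged illustration of that approach (a sketch, not the exact patch in `collectionOperations.scala`), the length computation can lean on `java.lang.Math`'s exact arithmetic so an intermediate overflow raises instead of wrapping:

```scala
import java.lang.Math.{addExact, subtractExact}

// Sketch only: sequence length for (start, stop, step) with overflow checks.
def checkedSequenceLength(start: Long, stop: Long, step: Long): Long = {
  require(step != 0, "step must be non-zero")
  // (stop - start) / step + 1, with the subtraction and addition checked.
  val span = subtractExact(stop, start)
  addExact(span / step, 1L)
}

// checkedSequenceLength(0L, Long.MaxValue, 1L) now throws ArithmeticException
// instead of silently wrapping to Long.MinValue as described above.
```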
   
   ### Why are the changes needed?
   There is a correctness bug from overflow in the `sequence` expression.
   
   ### Does this PR introduce _any_ user-facing change?
   No.
   
   ### How was this patch tested?
   Tests added in `CollectionExpressionsSuite.scala`.





[PR] [SPARK-43393][SQL][3.5] Address sequence expression overflow bug. [spark]

2023-11-15 Thread via GitHub


thepinetree opened a new pull request, #43820:
URL: https://github.com/apache/spark/pull/43820

   ### What changes were proposed in this pull request?
   Spark has a (long-standing) overflow bug in the `sequence` expression.
   
   Consider the following operations:
   ```
   spark.sql("CREATE TABLE foo (l LONG);")
   spark.sql(s"INSERT INTO foo VALUES (${Long.MaxValue});")
   spark.sql("SELECT sequence(0, l) FROM foo;").collect()
   ```
   
   The result of these operations will be:
   ```
   Array[org.apache.spark.sql.Row] = Array([WrappedArray()])
   ```
   an unintended consequence of overflow.
   
   The sequence is applied to values `0` and `Long.MaxValue` with a step size 
of `1` which uses a length computation defined 
[here](https://github.com/apache/spark/blob/16411188c7ba6cb19c46a2bd512b2485a4c03e2c/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala#L3451).
 In this calculation, with `start = 0`, `stop = Long.MaxValue`, and `step = 1`, 
the calculated `len` overflows to `Long.MinValue`. The computation, in binary 
looks like:
   
   ```
     0111...1111   (stop = Long.MaxValue: a zero followed by 63 ones)
   - 0000...0000   (start = 0)
   -------------
     0111...1111
   / 0000...0001   (step = 1)
   -------------
     0111...1111
   + 0000...0001
   -------------
     1000...0000   (a one followed by 63 zeros = Long.MinValue)
   ```
   
   The following 
[check](https://github.com/apache/spark/blob/16411188c7ba6cb19c46a2bd512b2485a4c03e2c/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala#L3454)
 passes as the negative `Long.MinValue` is still `<= MAX_ROUNDED_ARRAY_LENGTH`. 
The following cast to `toInt` uses this representation and [truncates the upper 
bits](https://github.com/apache/spark/blob/16411188c7ba6cb19c46a2bd512b2485a4c03e2c/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala#L3457)
 resulting in an empty length of `0`.
   
   Other overflows are similarly problematic.
   
   This PR addresses the issue by checking numeric operations in the length 
computation for overflow.
   
   ### Why are the changes needed?
   There is a correctness bug from overflow in the `sequence` expression.
   
   ### Does this PR introduce _any_ user-facing change?
   No.
   
   ### How was this patch tested?
   Tests added in `CollectionExpressionsSuite.scala`.





[PR] [SPARK-43393][SQL][3.4] Address sequence expression overflow bug. [spark]

2023-11-15 Thread via GitHub


thepinetree opened a new pull request, #43819:
URL: https://github.com/apache/spark/pull/43819

   ### What changes were proposed in this pull request?
   Spark has a (long-standing) overflow bug in the `sequence` expression.
   
   Consider the following operations:
   ```
   spark.sql("CREATE TABLE foo (l LONG);")
   spark.sql(s"INSERT INTO foo VALUES (${Long.MaxValue});")
   spark.sql("SELECT sequence(0, l) FROM foo;").collect()
   ```
   
   The result of these operations will be:
   ```
   Array[org.apache.spark.sql.Row] = Array([WrappedArray()])
   ```
   an unintended consequence of overflow.
   
   The sequence is applied to values `0` and `Long.MaxValue` with a step size 
of `1` which uses a length computation defined 
[here](https://github.com/apache/spark/blob/16411188c7ba6cb19c46a2bd512b2485a4c03e2c/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala#L3451).
 In this calculation, with `start = 0`, `stop = Long.MaxValue`, and `step = 1`, 
the calculated `len` overflows to `Long.MinValue`. The computation, in binary 
looks like:
   
   ```
     0111...1111   (stop = Long.MaxValue: a zero followed by 63 ones)
   - 0000...0000   (start = 0)
   -------------
     0111...1111
   / 0000...0001   (step = 1)
   -------------
     0111...1111
   + 0000...0001
   -------------
     1000...0000   (a one followed by 63 zeros = Long.MinValue)
   ```
   
   The following 
[check](https://github.com/apache/spark/blob/16411188c7ba6cb19c46a2bd512b2485a4c03e2c/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala#L3454)
 passes as the negative `Long.MinValue` is still `<= MAX_ROUNDED_ARRAY_LENGTH`. 
The following cast to `toInt` uses this representation and [truncates the upper 
bits](https://github.com/apache/spark/blob/16411188c7ba6cb19c46a2bd512b2485a4c03e2c/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala#L3457)
 resulting in an empty length of `0`.
   
   Other overflows are similarly problematic.
   
   This PR addresses the issue by checking numeric operations in the length 
computation for overflow.
   
   ### Why are the changes needed?
   There is a correctness bug from overflow in the `sequence` expression.
   
   ### Does this PR introduce _any_ user-facing change?
   No.
   
   ### How was this patch tested?
   Tests added in `CollectionExpressionsSuite.scala`.





Re: [PR] [SPARK-45934][DOCS] Fix `Spark Standalone` documentation table layout [spark]

2023-11-15 Thread via GitHub


dongjoon-hyun commented on code in PR #43814:
URL: https://github.com/apache/spark/pull/43814#discussion_r1394491810


##
docs/running-on-kubernetes.md:
##
@@ -590,41 +590,41 @@ Some of these include:
 
 See the [configuration page](configuration.html) for information on Spark 
configurations.  The following configurations are specific to Spark on 
Kubernetes.
 
- Spark Properties
+ Spark Properties (`spark.kubernetes.` prefix is omitted.)
 
 
 Property NameDefaultMeaningSince 
Version
 
-  spark.kubernetes.context

Review Comment:
   That's true. Let me spin off the `kubernetes` part from this PR, @yaooqinn.






Re: [PR] [SPARK-43393][SQL] Address sequence expression overflow bug. [spark]

2023-11-15 Thread via GitHub


dongjoon-hyun commented on PR #41072:
URL: https://github.com/apache/spark/pull/41072#issuecomment-1812937385

   Oh this seems to break branch-3.5.
   - https://github.com/apache/spark/actions/runs/6873765275
   
   Let me revert this from branch-3.5. Given the situation, we can start 
backport from branch-3.5 to branch-3.3 as three separate PRs, @thepinetree .





Re: [PR] [SPARK-45562][DOCS] Regenerate `docs/sql-error-conditions.md` and add `42KDF` to `SQLSTATE table` in `error/README.md` [spark]

2023-11-15 Thread via GitHub


dongjoon-hyun closed pull request #43817: [SPARK-45562][DOCS] Regenerate 
`docs/sql-error-conditions.md` and add `42KDF` to `SQLSTATE table` in 
`error/README.md`
URL: https://github.com/apache/spark/pull/43817





Re: [PR] [SPARK-45934][DOCS] Fix `spark-standalone.md` and `running-on-kubernetes.md` table layout [spark]

2023-11-15 Thread via GitHub


dongjoon-hyun commented on code in PR #43814:
URL: https://github.com/apache/spark/pull/43814#discussion_r1394491810


##
docs/running-on-kubernetes.md:
##
@@ -590,41 +590,41 @@ Some of these include:
 
 See the [configuration page](configuration.html) for information on Spark 
configurations.  The following configurations are specific to Spark on 
Kubernetes.
 
- Spark Properties
+ Spark Properties (`spark.kubernetes.` prefix is omitted.)
 
 
 Property NameDefaultMeaningSince 
Version
 
-  spark.kubernetes.context

Review Comment:
   That's true. Let me split the `kubernetes` part from this PR, @yaooqinn.






Re: [PR] [SPARK-45873][CORE][YARN][K8S] Make ExecutorFailureTracker more tolerant when app remains sufficient resources [spark]

2023-11-15 Thread via GitHub


tgravescs commented on PR #43746:
URL: https://github.com/apache/spark/pull/43746#issuecomment-1812838085

   > > Preemption on yarn shouldn't be going against the number of failed 
executors. If it is then something has changed and we should fix that.
   
   > Yes, you are right
   
   What do you mean by this? Are you saying the Spark on YARN handling of preempted containers is not working properly, meaning that if a container is preempted it should not show up as an executor failure? Are you seeing those preempted containers show up as failed? Or are you saying that, yes, Spark on YARN doesn't mark preempted containers as failed?
   
   > What does 'this feature' point to?
   
   Sorry, I misunderstood your environment here; I thought you were running on k8s, but it looks like you are running on YARN. By "this feature" I mean the spark.yarn.max.executor.failures/spark.executor.maxNumFailures config and its functionality.
   
   So unless YARN preemption handling is broken (please answer the question above), you gave one very specific use case where a user added a bad JAR, and in that use case it seems like you just don't want spark.executor.maxNumFailures enabled at all. You said you don't want the app to fail, so that admins can come fix things up without it affecting other users. If that is the case, then Spark should allow users to turn spark.executor.maxNumFailures off, or I assume you could do the same thing by setting it to Int.MaxValue.
   
   As implemented this seems very arbitrary, and I would think it hard for a normal user to set and use this feature. You have it as a ratio, which I'm normally in favor of, but that really only works if max executors is set, so it is really just a hardcoded number. That number seems arbitrary, as it just depends on whether you get lucky and happen to have that many executors before some user pushes a bad jar. I don't understand why this isn't the same as the minimum number of executors, as that seems more in line: saying you need some minimum number of executors for this application to run, and, by the way, it's OK to keep running with them even if launching new executors is failing.
   
   If there are other issues with Spark Connect and added jars, maybe that is a different conversation about isolation (https://issues.apache.org/jira/browse/SPARK-44146). Or maybe it needs to better prevent users from adding jars with the same name.
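
   For reference, a minimal sketch of the Int.MaxValue workaround mentioned above (assuming `spark.executor.maxNumFailures` is the effective config on this deployment):

```scala
import org.apache.spark.SparkConf

// Sketch only: effectively disable the executor-failure limit by making the threshold unreachable.
val conf = new SparkConf().set("spark.executor.maxNumFailures", Int.MaxValue.toString)
```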





Re: [PR] [SPARK-45873][CORE][YARN][K8S] Make ExecutorFailureTracker more tolerant when app remains sufficient resources [spark]

2023-11-15 Thread via GitHub


tgravescs commented on code in PR #43746:
URL: https://github.com/apache/spark/pull/43746#discussion_r1394411937


##
core/src/main/scala/org/apache/spark/internal/config/package.scala:
##
@@ -2087,6 +2087,17 @@ package object config {
   .doubleConf
   .createOptional
 
+  private[spark] val SCHEDULER_MIN_RESOURCES_TO_SURVIVE_RATIO =
+ConfigBuilder("spark.scheduler.minResourcesToSurviveRatio")

Review Comment:
   I misread the title, sorry ignore this.






Re: [PR] [SPARK-45533][CORE] Use j.l.r.Cleaner instead of finalize for RocksDBIterator/LevelDBIterator [spark]

2023-11-15 Thread via GitHub


mridulm commented on code in PR #43502:
URL: https://github.com/apache/spark/pull/43502#discussion_r1394409363


##
common/kvstore/src/main/java/org/apache/spark/util/kvstore/LevelDBIterator.java:
##
@@ -182,23 +193,34 @@ public boolean skip(long n) {
 
   @Override
   public synchronized void close() throws IOException {
-db.notifyIteratorClosed(this);
+db.notifyIteratorClosed(it);
 if (!closed) {
-  it.close();
-  closed = true;
-  next = null;
+  try {
+it.close();
+  } finally {
+closed = true;
+next = null;
+cancelResourceClean();

Review Comment:
   This was perhaps discussed earlier; please point me to it if I am missing it.
   Why not simply call `this.cleanable.clean();` in `close` (in place of notify + it.close)?
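
   For clarity, a minimal sketch of the idiom being suggested (not the PR's code): register the cleanup action with a `Cleaner` and have `close()` just invoke the same `Cleanable`, so explicit close and GC-triggered cleanup share one path and stay idempotent.

```scala
import java.lang.ref.Cleaner

// Sketch only; names are made up for illustration.
object CleanerSketch {
  private val cleaner = Cleaner.create()

  final class Resource extends AutoCloseable {
    // The cleanup action must not capture the Resource itself, or it would
    // never become phantom-reachable.
    private val cleanable = cleaner.register(this, () => println("releasing native handle"))
    override def close(): Unit = cleanable.clean()  // runs the action at most once
  }

  def main(args: Array[String]): Unit = {
    val r = new Resource
    r.close()  // prints once
    r.close()  // second call is a no-op
  }
}
```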



##
common/kvstore/src/main/java/org/apache/spark/util/kvstore/LevelDB.java:
##
@@ -322,26 +323,15 @@ public void close() throws IOException {
 }
   }
 
-  /**
-   * Closes the given iterator if the DB is still open. Trying to close a JNI 
LevelDB handle
-   * with a closed DB can cause JVM crashes, so this ensures that situation 
does not happen.
-   */
-  void closeIterator(LevelDBIterator it) throws IOException {
-notifyIteratorClosed(it);
-synchronized (this._db) {
-  DB _db = this._db.get();
-  if (_db != null) {
-it.close();
-  }
-}
-  }

Review Comment:
   resource cleaner has a reference to both `dbIterator` and `levelDB` - not 
sure what I am missing ?






Re: [PR] [SPARK-45533][CORE] Use j.l.r.Cleaner instead of finalize for RocksDBIterator/LevelDBIterator [spark]

2023-11-15 Thread via GitHub


zhaomin1423 commented on code in PR #43502:
URL: https://github.com/apache/spark/pull/43502#discussion_r1394398171


##
common/kvstore/src/main/java/org/apache/spark/util/kvstore/LevelDB.java:
##
@@ -322,26 +323,15 @@ public void close() throws IOException {
 }
   }
 
-  /**
-   * Closes the given iterator if the DB is still open. Trying to close a JNI 
LevelDB handle
-   * with a closed DB can cause JVM crashes, so this ensures that situation 
does not happen.
-   */
-  void closeIterator(LevelDBIterator it) throws IOException {
-notifyIteratorClosed(it);
-synchronized (this._db) {
-  DB _db = this._db.get();
-  if (_db != null) {
-it.close();
-  }
-}
-  }

Review Comment:
   The main reason is that the ResourceCleaner can't hold a reference to the LevelDBIterator, so I changed it. If you have a better suggestion, please let me know and I will be happy to change it.






Re: [PR] [SPARK-45533][CORE] Use j.l.r.Cleaner instead of finalize for RocksDBIterator/LevelDBIterator [spark]

2023-11-15 Thread via GitHub


mridulm commented on code in PR #43502:
URL: https://github.com/apache/spark/pull/43502#discussion_r1394375372


##
common/kvstore/src/main/java/org/apache/spark/util/kvstore/LevelDB.java:
##
@@ -322,26 +323,15 @@ public void close() throws IOException {
 }
   }
 
-  /**
-   * Closes the given iterator if the DB is still open. Trying to close a JNI 
LevelDB handle
-   * with a closed DB can cause JVM crashes, so this ensures that situation 
does not happen.
-   */
-  void closeIterator(LevelDBIterator it) throws IOException {
-notifyIteratorClosed(it);
-synchronized (this._db) {
-  DB _db = this._db.get();
-  if (_db != null) {
-it.close();
-  }
-}
-  }

Review Comment:
   The logic is consistent - and would remain the same with and without this 
move - I was wondering why the change ? Whether this was simply refactoring or 
there is any other reason for it.
   
   (Note: I am not advocating for reverting, just want to understand why this 
was done).
   






Re: [PR] [SPARK-45533][CORE] Use j.l.r.Cleaner instead of finalize for RocksDBIterator/LevelDBIterator [spark]

2023-11-15 Thread via GitHub


mridulm commented on code in PR #43502:
URL: https://github.com/apache/spark/pull/43502#discussion_r1394375372


##
common/kvstore/src/main/java/org/apache/spark/util/kvstore/LevelDB.java:
##
@@ -322,26 +323,15 @@ public void close() throws IOException {
 }
   }
 
-  /**
-   * Closes the given iterator if the DB is still open. Trying to close a JNI 
LevelDB handle
-   * with a closed DB can cause JVM crashes, so this ensures that situation 
does not happen.
-   */
-  void closeIterator(LevelDBIterator it) throws IOException {
-notifyIteratorClosed(it);
-synchronized (this._db) {
-  DB _db = this._db.get();
-  if (_db != null) {
-it.close();
-  }
-}
-  }

Review Comment:
   The logic is consistent - and would remain the same with and without this 
move - I was wondering why the change ? Whether this was simply refactoring or 
there is any other reason for it.
   






Re: [PR] [SPARK-45938][INFRA] Add `utils` to the dependencies of the `core` module in `module.py` [spark]

2023-11-15 Thread via GitHub


LuciferYang commented on code in PR #43818:
URL: https://github.com/apache/spark/pull/43818#discussion_r1394365136


##
dev/sparktestsupport/modules.py:
##
@@ -178,7 +178,7 @@ def __hash__(self):
 
 core = Module(
 name="core",
-dependencies=[kvstore, network_common, network_shuffle, unsafe, launcher],
+dependencies=[kvstore, network_common, network_shuffle, unsafe, launcher, 
utils],

Review Comment:
   cc @zhengruifeng @HyukjinKwon, the `utils` module is also a direct dependency of `unsafe` and `network-common`. Do we need to modify the dependencies of `unsafe` and `network-common` as well, or is the current change sufficient?
   






[PR] [SPARK-45938][INFRA] Add `utils` to the dependency list of the `core` module in `module.py` [spark]

2023-11-15 Thread via GitHub


LuciferYang opened a new pull request, #43818:
URL: https://github.com/apache/spark/pull/43818

   
   
   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
   





Re: [PR] [SPARK-45747][SS] Use prefix key information in state metadata to handle reading state for session window aggregation [spark]

2023-11-15 Thread via GitHub


HeartSaVioR commented on PR #43788:
URL: https://github.com/apache/spark/pull/43788#issuecomment-1812633610

   @chaoqin-li1123 Could you please rebase your change onto the latest master branch? The merge script thinks I'm the main author because my commits are listed here.





Re: [PR] [SPARK-45533][CORE] Use j.l.r.Cleaner instead of finalize for RocksDBIterator/LevelDBIterator [spark]

2023-11-15 Thread via GitHub


zhaomin1423 commented on code in PR #43502:
URL: https://github.com/apache/spark/pull/43502#discussion_r1394277068


##
common/kvstore/src/main/java/org/apache/spark/util/kvstore/LevelDB.java:
##
@@ -322,26 +323,15 @@ public void close() throws IOException {
 }
   }
 
-  /**
-   * Closes the given iterator if the DB is still open. Trying to close a JNI 
LevelDB handle
-   * with a closed DB can cause JVM crashes, so this ensures that situation 
does not happen.
-   */
-  void closeIterator(LevelDBIterator it) throws IOException {
-notifyIteratorClosed(it);
-synchronized (this._db) {
-  DB _db = this._db.get();
-  if (_db != null) {
-it.close();
-  }
-}
-  }

Review Comment:
   It was originally in the finalize method; now it has been moved to 
ResourceCleaner.run. I think the logic is consistent.
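
   For illustration, here is a minimal, self-contained Java sketch of the general 
`java.lang.ref.Cleaner` pattern being discussed (class and method names are 
hypothetical, not the PR's actual code): the cleanup state lives in a static nested 
`Runnable` that holds no reference back to the tracked object, and an explicit 
`close()` runs the same action.

```java
import java.lang.ref.Cleaner;

public class CleanerSketch {
  private static final Cleaner CLEANER = Cleaner.create();

  // Hypothetical stand-in for a native LevelDB/RocksDB iterator handle.
  static final class NativeHandle {
    private boolean closed = false;
    synchronized void close() {
      if (!closed) {
        closed = true;
        System.out.println("native handle released");
      }
    }
  }

  // The cleaning action: it must not reference the tracked iterator object,
  // otherwise that object would never become phantom reachable.
  private static final class ResourceCleaner implements Runnable {
    private final NativeHandle handle;
    ResourceCleaner(NativeHandle handle) { this.handle = handle; }
    @Override public void run() { handle.close(); }
  }

  static final class TrackedIterator implements AutoCloseable {
    private final NativeHandle handle = new NativeHandle();
    private final Cleaner.Cleanable cleanable;

    TrackedIterator() {
      // Register the cleanup action; it runs at most once,
      // either via close() or when the iterator becomes unreachable.
      this.cleanable = CLEANER.register(this, new ResourceCleaner(handle));
    }

    @Override public void close() {
      cleanable.clean(); // runs the action now and deregisters it
    }
  }

  public static void main(String[] args) {
    try (TrackedIterator it = new TrackedIterator()) {
      // use the iterator
    } // close() -> cleanable.clean() -> handle.close()
  }
}
```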



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-45533][CORE] Use j.l.r.Cleaner instead of finalize for RocksDBIterator/LevelDBIterator [spark]

2023-11-15 Thread via GitHub


zhaomin1423 commented on code in PR #43502:
URL: https://github.com/apache/spark/pull/43502#discussion_r1394277068


##
common/kvstore/src/main/java/org/apache/spark/util/kvstore/LevelDB.java:
##
@@ -322,26 +323,15 @@ public void close() throws IOException {
 }
   }
 
-  /**
-   * Closes the given iterator if the DB is still open. Trying to close a JNI 
LevelDB handle
-   * with a closed DB can cause JVM crashes, so this ensures that situation 
does not happen.
-   */
-  void closeIterator(LevelDBIterator it) throws IOException {
-notifyIteratorClosed(it);
-synchronized (this._db) {
-  DB _db = this._db.get();
-  if (_db != null) {
-it.close();
-  }
-}
-  }

Review Comment:
   It was originally used in the finalize method, but now it has been moved to 
ResourceCleaner.run. I think the logic is consistent.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-45747][SS] Use prefix key information in state metadata to handle reading state for session window aggregation [spark]

2023-11-15 Thread via GitHub


HeartSaVioR commented on PR #43788:
URL: https://github.com/apache/spark/pull/43788#issuecomment-1812628737

   Thanks! Merging to master.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-45562][DOCS] Regenerate `docs/sql-error-conditions.md` and add `42KDF` to `SQLSTATE table` in `error/README.md` [spark]

2023-11-15 Thread via GitHub


sandip-db commented on PR #43817:
URL: https://github.com/apache/spark/pull/43817#issuecomment-1812621901

   LGTM. @LuciferYang  Thanks for the quick fix.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-45919][CORE][SQL] Use Java 16 `record` to simplify Java class definition [spark]

2023-11-15 Thread via GitHub


LuciferYang commented on PR #43796:
URL: https://github.com/apache/spark/pull/43796#issuecomment-1812528892

   Wait for https://github.com/apache/spark/pull/43817


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[PR] [SPARK-45562][CORE][DOCS] Regenerate `docs/sql-error-conditions.md` and update `SQLSTATE table` in `error/README.md` [spark]

2023-11-15 Thread via GitHub


LuciferYang opened a new pull request, #43817:
URL: https://github.com/apache/spark/pull/43817

   
   
   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-45562][CORE][DOCS] Regenerate `docs/sql-error-conditions.md` and add `42KDF` to `SQLSTATE table` in `error/README.md` [spark]

2023-11-15 Thread via GitHub


LuciferYang commented on PR #43817:
URL: https://github.com/apache/spark/pull/43817#issuecomment-1812524632

   cc @HyukjinKwon @beliefer @sandip-db 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-45562][SQL][FOLLOW-UP] XML: Fix SQLSTATE for missing rowTag error [spark]

2023-11-15 Thread via GitHub


LuciferYang commented on code in PR #43804:
URL: https://github.com/apache/spark/pull/43804#discussion_r1394181483


##
docs/sql-error-conditions.md:
##
@@ -2375,9 +2375,3 @@ The operation `` requires a ``. 
But `` is a
 The `` requires `` parameters but the actual number 
is ``.
 
 For more details see 
[WRONG_NUM_ARGS](sql-error-conditions-wrong-num-args-error-class.html)
-
-### XML_ROW_TAG_MISSING
-
-[SQLSTATE: 
42000](sql-error-conditions-sqlstates.html#class-42-syntax-error-or-access-rule-violation)

Review Comment:
   Give a fix: https://github.com/apache/spark/pull/43817



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-45562][SQL][FOLLOW-UP] XML: Fix SQLSTATE for missing rowTag error [spark]

2023-11-15 Thread via GitHub


LuciferYang commented on code in PR #43804:
URL: https://github.com/apache/spark/pull/43804#discussion_r1394156014


##
docs/sql-error-conditions.md:
##
@@ -2375,9 +2375,3 @@ The operation `` requires a ``. 
But `` is a
 The `` requires `` parameters but the actual number 
is ``.
 
 For more details see 
[WRONG_NUM_ARGS](sql-error-conditions-wrong-num-args-error-class.html)
-
-### XML_ROW_TAG_MISSING
-
-[SQLSTATE: 
42000](sql-error-conditions-sqlstates.html#class-42-syntax-error-or-access-rule-violation)

Review Comment:
   Yes, this needs to be fixed. Additionally, the `SQLSTATE table start` section in 
the `error/README.md` file also needs to be fixed, as there is no definition 
for the `42KDF` status. @sandip-db, could you quickly fix this issue? Or 
should we revert this PR first, @HyukjinKwon? GA keeps failing because of this 
issue.
   
   https://github.com/apache/spark/actions/runs/6874376893/job/18695941494
   
   ```
   [info] - SQLSTATE invariants *** FAILED *** (30 milliseconds)
   [info]   fx.apply(s) was false 42KDF (SparkThrowableSuite.scala:74)
   [info]   org.scalatest.exceptions.TestFailedException:
   [info]   at 
org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:472)
   [info]   at 
org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:471)
   [info]   at 
org.scalatest.Assertions$.newAssertionFailedException(Assertions.scala:1231)
   [info]   at 
org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:1295)
   [info]   at 
org.apache.spark.SparkThrowableSuite.$anonfun$checkCondition$1(SparkThrowableSuite.scala:74)
   [info]   at scala.collection.immutable.List.foreach(List.scala:333)
   [info]   at 
org.apache.spark.SparkThrowableSuite.checkCondition(SparkThrowableSuite.scala:73)
   [info]   at 
org.apache.spark.SparkThrowableSuite.$anonfun$new$6(SparkThrowableSuite.scala:138)
   [info]   at 
scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.scala:18)
   [info]   at 
org.scalatest.enablers.Timed$$anon$1.timeoutAfter(Timed.scala:127)
   
   [info] - Error classes match with document *** FAILED *** (71 milliseconds)
   [info]   "...gs-error-class.html)[
   [info]   
   [info]   ### XML_ROW_TAG_MISSING
   [info]   
   [info]   [SQLSTATE: 
42KDF](sql-error-conditions-sqlstates.html#class-42-syntax-error-or-access-rule-violation)
   [info]   
   [info]   `` option is required for reading files in XML format.]" 
did not equal "...gs-error-class.html)[]" The error class document is not up to 
date. Please regenerate it. (SparkThrowableSuite.scala:346)
   [info]   Analysis:
   [info]   "...gs-error-class.html)[
   [info] 
   [info] ### XML_ROW_TAG_MISSING
   [info] 
   [info] [SQLSTATE: 
42KDF](sql-error-conditions-sqlstates.html#class-42-syntax-error-or-access-rule-violation)
   [info] 
   [info] `` option is required for reading files in XML format.]" -> 
"...gs-error-class.html)[]"
   ```
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-45905][SQL] Least common type between decimal types should retain integral digits first [spark]

2023-11-15 Thread via GitHub


cloud-fan closed pull request #43781: [SPARK-45905][SQL] Least common type 
between decimal types should retain integral digits first
URL: https://github.com/apache/spark/pull/43781


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-45905][SQL] Least common type between decimal types should retain integral digits first [spark]

2023-11-15 Thread via GitHub


cloud-fan commented on PR #43781:
URL: https://github.com/apache/spark/pull/43781#issuecomment-1812465509

   The failure is unrelated
   ```
   Extension error:
   Could not import extension sphinx_copybutton (exception: No module named 
'sphinx_copybutton')
   make: *** [Makefile:35: html] Error 2
   ```
   
   I'm merging it to master, thanks!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-45924][SQL] Fixing the canonicalization of SubqueryAdaptiveBroadcastExec and making it equivalent with SubqueryBroadcastExec [spark]

2023-11-15 Thread via GitHub


beliefer commented on PR #43806:
URL: https://github.com/apache/spark/pull/43806#issuecomment-1812439348

   > Ideally that should have happened, but what I see is one stage containing 
subquery adaptive exec while the incoming exchange contains subquery exec. Also, this 
is just one of the issues. The main PR will be the one which requires additional 
functions to be implemented by DataSourceV2impl. Please refer to the ticket which 
depends on this PR. Maybe this itself can be fixed by ensuring the incoming 
exchange contains the right exec; then tinkering with canonicalize may not be 
needed. But in that case the buildPlan should be canonicalized.
   
   That is impossible. `SubqueryAdaptiveBroadcastExec` will be eliminated along with 
`DynamicPruningExpression`.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-45934][DOCS] Fix `spark-standalone.md` and `running-on-kubernetes.md` table layout [spark]

2023-11-15 Thread via GitHub


yaooqinn commented on code in PR #43814:
URL: https://github.com/apache/spark/pull/43814#discussion_r1394086066


##
docs/running-on-kubernetes.md:
##
@@ -590,41 +590,41 @@ Some of these include:
 
 See the [configuration page](configuration.html) for information on Spark 
configurations.  The following configurations are specific to Spark on 
Kubernetes.
 
- Spark Properties
+ Spark Properties (`spark.kubernetes.` prefix is omitted.)
 
 
 Property NameDefaultMeaningSince 
Version
 
-  spark.kubernetes.context

Review Comment:
   This might break the doc search?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-45919][CORE][SQL] Use Java 16 `record` to simplify Java class definition [spark]

2023-11-15 Thread via GitHub


LuciferYang commented on code in PR #43796:
URL: https://github.com/apache/spark/pull/43796#discussion_r1394074332


##
common/network-shuffle/src/test/java/org/apache/spark/network/shuffle/RemoteBlockPushResolverSuite.java:
##
@@ -1612,19 +1612,8 @@ private void verifyMetrics(
 assertEquals(expectedIgnoredBlocksBytes, ignoredBlockBytes.getCount(), 
"ignored block bytes");
   }
 
-  private static class PushBlock {
-private final int shuffleId;
-private final int shuffleMergeId;
-private final int mapIndex;
-private final int reduceId;
-private final ByteBuffer buffer;
-PushBlock(int shuffleId, int shuffleMergeId, int mapIndex, int reduceId, 
ByteBuffer buffer) {
-  this.shuffleId = shuffleId;
-  this.shuffleMergeId = shuffleMergeId;
-  this.mapIndex = mapIndex;
-  this.reduceId = reduceId;
-  this.buffer = buffer;
-}
+  private record PushBlock(
+  int shuffleId, int shuffleMergeId, int mapIndex, int reduceId, 
ByteBuffer buffer) {

Review Comment:
   done



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-45919][CORE][SQL] Use Java 16 `record` to simplify Java class definition [spark]

2023-11-15 Thread via GitHub


LuciferYang commented on code in PR #43796:
URL: https://github.com/apache/spark/pull/43796#discussion_r1394064308


##
common/network-yarn/src/main/java/org/apache/spark/network/yarn/YarnShuffleServiceMetrics.java:
##
@@ -153,24 +153,6 @@ private static ShuffleServiceMetricsInfo 
getShuffleServiceMetricsInfoForGenericV
   valueName + " value of " + baseName);
   }
 
-  private static class ShuffleServiceMetricsInfo implements MetricsInfo {

Review Comment:
   Yes, the `static` modifier is redundant for inner records.
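
   For context, a small hypothetical Java sketch (not Spark code) of why the 
modifier is redundant: a record declared inside a class is implicitly static, so 
both forms below behave the same.

```java
public class RecordNestingSketch {
  // A nested record is implicitly static, so the explicit modifier adds nothing.
  record MetricsInfo(String name, String description) { }
  static record MetricsInfoExplicit(String name, String description) { }

  public static void main(String[] args) {
    // Both can be instantiated without an enclosing instance,
    // which confirms they behave as static nested types.
    System.out.println(new MetricsInfo("shuffle", "bytes written"));
    System.out.println(new MetricsInfoExplicit("shuffle", "bytes written"));
  }
}
```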



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-45919][CORE][SQL] Use Java 16 `record` to simplify Java class definition [spark]

2023-11-15 Thread via GitHub


LuciferYang commented on code in PR #43796:
URL: https://github.com/apache/spark/pull/43796#discussion_r1394067701


##
common/network-shuffle/src/test/java/org/apache/spark/network/shuffle/RemoteBlockPushResolverSuite.java:
##
@@ -1612,19 +1612,8 @@ private void verifyMetrics(
 assertEquals(expectedIgnoredBlocksBytes, ignoredBlockBytes.getCount(), 
"ignored block bytes");
   }
 
-  private static class PushBlock {
-private final int shuffleId;
-private final int shuffleMergeId;
-private final int mapIndex;
-private final int reduceId;
-private final ByteBuffer buffer;
-PushBlock(int shuffleId, int shuffleMergeId, int mapIndex, int reduceId, 
ByteBuffer buffer) {
-  this.shuffleId = shuffleId;
-  this.shuffleMergeId = shuffleMergeId;
-  this.mapIndex = mapIndex;
-  this.reduceId = reduceId;
-  this.buffer = buffer;
-}
+  private record PushBlock(
+  int shuffleId, int shuffleMergeId, int mapIndex, int reduceId, 
ByteBuffer buffer) {

Review Comment:
   OK, let me revert this one. The previous commit did not revert it because this 
usage is in a test case.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-45919][CORE][SQL] Use Java 16 `record` to simplify Java class definition [spark]

2023-11-15 Thread via GitHub


LuciferYang commented on code in PR #43796:
URL: https://github.com/apache/spark/pull/43796#discussion_r1394064308


##
common/network-yarn/src/main/java/org/apache/spark/network/yarn/YarnShuffleServiceMetrics.java:
##
@@ -153,24 +153,6 @@ private static ShuffleServiceMetricsInfo 
getShuffleServiceMetricsInfoForGenericV
   valueName + " value of " + baseName);
   }
 
-  private static class ShuffleServiceMetricsInfo implements MetricsInfo {

Review Comment:
   Yes, the `static` modifier is redundant for inner records.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-45851][CONNECT][SCALA] Support multiple policies in scala client [spark]

2023-11-15 Thread via GitHub


cdkrot commented on PR #43757:
URL: https://github.com/apache/spark/pull/43757#issuecomment-1812291366

   cc @hvanhovell let's merge when tests pass :). 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-45919][CORE][SQL] Use Java 16 `record` to simplify Java class definition [spark]

2023-11-15 Thread via GitHub


dongjoon-hyun commented on code in PR #43796:
URL: https://github.com/apache/spark/pull/43796#discussion_r1394022682


##
common/network-yarn/src/main/java/org/apache/spark/network/yarn/YarnShuffleServiceMetrics.java:
##
@@ -153,24 +153,6 @@ private static ShuffleServiceMetricsInfo 
getShuffleServiceMetricsInfoForGenericV
   valueName + " value of " + baseName);
   }
 
-  private static class ShuffleServiceMetricsInfo implements MetricsInfo {

Review Comment:
   Is `static class` equivalent to `record`? Specifically, the `static` part.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-45919][CORE][SQL] Use Java 16 `record` to simplify Java class definition [spark]

2023-11-15 Thread via GitHub


dongjoon-hyun commented on code in PR #43796:
URL: https://github.com/apache/spark/pull/43796#discussion_r1394018312


##
common/network-shuffle/src/test/java/org/apache/spark/network/shuffle/RemoteBlockPushResolverSuite.java:
##
@@ -1612,19 +1612,8 @@ private void verifyMetrics(
 assertEquals(expectedIgnoredBlocksBytes, ignoredBlockBytes.getCount(), 
"ignored block bytes");
   }
 
-  private static class PushBlock {
-private final int shuffleId;
-private final int shuffleMergeId;
-private final int mapIndex;
-private final int reduceId;
-private final ByteBuffer buffer;
-PushBlock(int shuffleId, int shuffleMergeId, int mapIndex, int reduceId, 
ByteBuffer buffer) {
-  this.shuffleId = shuffleId;
-  this.shuffleMergeId = shuffleMergeId;
-  this.mapIndex = mapIndex;
-  this.reduceId = reduceId;
-  this.buffer = buffer;
-}
+  private record PushBlock(
+  int shuffleId, int shuffleMergeId, int mapIndex, int reduceId, 
ByteBuffer buffer) {

Review Comment:
   These fields should be `private`, shouldn't they?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-45934][DOCS] Fix `spark-standalone.md` and `running-on-kubernetes.md` table layout [spark]

2023-11-15 Thread via GitHub


dongjoon-hyun commented on PR #43814:
URL: https://github.com/apache/spark/pull/43814#issuecomment-1812232936

   Could you review this when you have some time, @yaooqinn ?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-45915][SQL] Treat decimal(x, 0) the same as IntegralType in `PromoteStrings` [spark]

2023-11-15 Thread via GitHub


dongjoon-hyun commented on PR #43812:
URL: https://github.com/apache/spark/pull/43812#issuecomment-1812231907

   Merged to master for Apache Spark 4.0.0.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-45915][SQL] Treat decimal(x, 0) the same as IntegralType in `PromoteStrings` [spark]

2023-11-15 Thread via GitHub


dongjoon-hyun closed pull request #43812: [SPARK-45915][SQL] Treat decimal(x, 
0) the same as IntegralType in `PromoteStrings`
URL: https://github.com/apache/spark/pull/43812


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[PR] [SPARK-45935][PYTHON][DOCS] Fix RST files link substitutions error [spark]

2023-11-15 Thread via GitHub


panbingkun opened a new pull request, #43815:
URL: https://github.com/apache/spark/pull/43815

   
   
   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-45851][CONNECT][SCALA] Support multiple policies in scala client [spark]

2023-11-15 Thread via GitHub


cdkrot commented on code in PR #43757:
URL: https://github.com/apache/spark/pull/43757#discussion_r1394006876


##
connector/connect/common/src/main/scala/org/apache/spark/sql/connect/client/RetryPolicy.scala:
##
@@ -0,0 +1,141 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.connect.client
+
+import scala.concurrent.duration.{Duration, FiniteDuration}
+import scala.util.Random
+
+import io.grpc.{Status, StatusRuntimeException}
+
+/**
+ * [[RetryPolicy]] configure the retry mechanism in [[GrpcRetryHandler]]
+ *
+ * @param maxRetries
+ * Maximum number of retries.
+ * @param initialBackoff
+ * Start value of the exponential backoff (ms).
+ * @param maxBackoff
+ * Maximal value of the exponential backoff (ms).
+ * @param backoffMultiplier
+ * Multiplicative base of the exponential backoff.
+ * @param canRetry
+ * Function that determines whether a retry is to be performed in the event of 
an error.
+ */
+case class RetryPolicy(
+  maxRetries: Option[Int] = None,
+  initialBackoff: FiniteDuration = FiniteDuration(1000, "ms"),
+  maxBackoff: Option[FiniteDuration] = None,
+  backoffMultiplier: Double = 1.0,
+  jitter: FiniteDuration = FiniteDuration(0, "s"),
+  minJitterThreshold: FiniteDuration = FiniteDuration(0, "s"),
+  canRetry: Throwable => Boolean,
+  name: String) {
+
+  def getName: String = name
+
+  def toState: RetryPolicy.RetryPolicyState = new 
RetryPolicy.RetryPolicyState(this)
+}
+
+object RetryPolicy {
+  def defaultPolicy(): RetryPolicy = RetryPolicy(
+name = "DefaultPolicy",
+// Please synchronize changes here with Python side:
+// pyspark/sql/connect/client/core.py
+//
+// Note: these constants are selected so that the maximum tolerated wait 
is guaranteed
+// to be at least 10 minutes
+maxRetries = Some(15),
+initialBackoff = FiniteDuration(50, "ms"),
+maxBackoff = Some(FiniteDuration(1, "min")),
+backoffMultiplier = 4.0,
+jitter = FiniteDuration(500, "ms"),
+minJitterThreshold = FiniteDuration(2, "s"),
+canRetry = defaultPolicyRetryException)
+
+  // list of policies to be used by this client
+  def defaultPolicies(): Seq[RetryPolicy] = List(defaultPolicy())
+
+  // represents a state of the specific policy
+  // (how many retries have happened and how much to wait until next one)
+  private class RetryPolicyState(val policy: RetryPolicy) {
+private var numberAttempts = 0
+private var nextWait: Duration = policy.initialBackoff
+
+// return waiting time until next attempt, or None if has exceeded max 
retries
+def nextAttempt(): Option[Duration] = {
+  if (policy.maxRetries.isDefined && numberAttempts >= 
policy.maxRetries.get) {
+return None
+  }
+
+  numberAttempts += 1
+
+  var currentWait = nextWait
+  nextWait = nextWait * policy.backoffMultiplier
+  if (policy.maxBackoff.isDefined) {
+nextWait = nextWait min policy.maxBackoff.get
+  }
+
+  if (currentWait >= policy.minJitterThreshold) {
+currentWait += Random.nextDouble() * policy.jitter
+  }
+
+  Some(currentWait)
+}
+
+def canRetry(throwable: Throwable): Boolean = policy.canRetry(throwable)
+
+def getName: String = policy.getName
+  }
+
+  /**
+   * Default canRetry in [[RetryPolicy]].
+   *
+   * @param e
+   * The exception to check.
+   * @return
+   * true if the exception is a [[StatusRuntimeException]] with code 
UNAVAILABLE.
+   */
+  private[client] def defaultPolicyRetryException(e: Throwable): Boolean = {
+e match {
+  case _: RetryPolicy.RetryException => true
+  case e: StatusRuntimeException =>
+val statusCode: Status.Code = e.getStatus.getCode
+
+if (statusCode == Status.Code.INTERNAL) {
+  val msg: String = e.toString
+
+  // This error happens if another RPC preempts this RPC.
+  if (msg.contains("INVALID_CURSOR.DISCONNECTED")) {
+return true
+  }
+}
+
+if (statusCode == Status.Code.UNAVAILABLE) {
+  return true
+}
+false
+  case _ => false
+}
+  }
+
+  /**
+   * An exception that can be thrown upstream when inside retry and which will 
be always 

[PR] [SPARK-45934][DOCS] Fix `spark-standalone.md` and `running-on-kubernetes.md` table layout [spark]

2023-11-15 Thread via GitHub


dongjoon-hyun opened a new pull request, #43814:
URL: https://github.com/apache/spark/pull/43814

   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-45851][CONNECT][SCALA] Support multiple policies in scala client [spark]

2023-11-15 Thread via GitHub


cdkrot commented on code in PR #43757:
URL: https://github.com/apache/spark/pull/43757#discussion_r1394001717


##
connector/connect/common/src/main/scala/org/apache/spark/sql/connect/client/RetryPolicy.scala:
##
@@ -0,0 +1,140 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.connect.client
+
+import scala.concurrent.duration.{Duration, FiniteDuration}
+import scala.util.Random
+
+import io.grpc.{Status, StatusRuntimeException}
+
+/**
+ * [[RetryPolicy]] configure the retry mechanism in [[GrpcRetryHandler]]
+ *
+ * @param maxRetries
+ * Maximum number of retries.
+ * @param initialBackoff
+ * Start value of the exponential backoff (ms).
+ * @param maxBackoff
+ * Maximal value of the exponential backoff (ms).
+ * @param backoffMultiplier
+ * Multiplicative base of the exponential backoff.
+ * @param canRetry
+ * Function that determines whether a retry is to be performed in the event of 
an error.
+ */
+case class RetryPolicy(
+  maxRetries: Option[Int] = None,
+  initialBackoff: FiniteDuration = FiniteDuration(1000, "ms"),
+  maxBackoff: Option[FiniteDuration] = None,
+  backoffMultiplier: Double = 1.0,
+  jitter: FiniteDuration = FiniteDuration(0, "s"),
+  minJitterThreshold: FiniteDuration = FiniteDuration(0, "s"),
+  canRetry: Throwable => Boolean,
+  name: String) {
+
+  def getName: String = name
+
+  def toState: RetryPolicy.RetryPolicyState = new 
RetryPolicy.RetryPolicyState(this)
+}
+
+object RetryPolicy {
+  def defaultPolicy(): RetryPolicy = RetryPolicy(
+name = "DefaultPolicy",
+// Please synchronize changes here with Python side:
+// pyspark/sql/connect/client/core.py

Review Comment:
   Might make sense :). My reasoning was that PRs should be linearizable, so I made 
the change there, and it can stay there for now.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-45919][CORE][SQL] Use Java 16 `record` to simplify Java class definition [spark]

2023-11-15 Thread via GitHub


LuciferYang commented on PR #43796:
URL: https://github.com/apache/spark/pull/43796#issuecomment-1812139559

   > > @dongjoon-hyun I want to clarify the issue. We don't want to use 
`record` here because `field` in the original class doesn't provide an 
Accessor, but since `record` automatically generates an Accessor method, it 
exposes the information of `field`. So, in similar scenarios, we shouldn't use 
`record`, right?
   > 
   > Yes, exactly. That's the point.
   
   Is it ok now?
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-44496][SQL][FOLLOW-UP] CalendarIntervalType is also orderable [spark]

2023-11-15 Thread via GitHub


cloud-fan commented on PR #43805:
URL: https://github.com/apache/spark/pull/43805#issuecomment-1812104254

   So we just need to update the doc


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [MINOR] Fix some typo [spark]

2023-11-15 Thread via GitHub


panbingkun commented on code in PR #43724:
URL: https://github.com/apache/spark/pull/43724#discussion_r1393903025


##
docs/cloud-integration.md:
##
@@ -121,12 +121,12 @@ for talking to cloud infrastructures, in which case this 
module may not be neede
 Spark jobs must authenticate with the object stores to access data within them.
 
 1. When Spark is running in a cloud infrastructure, the credentials are 
usually automatically set up.
-1. `spark-submit` is able to read the `AWS_ENDPOINT_URL`, `AWS_ACCESS_KEY_ID`, 
`AWS_SECRET_ACCESS_KEY`
+2. `spark-submit` is able to read the `AWS_ENDPOINT_URL`, `AWS_ACCESS_KEY_ID`, 
`AWS_SECRET_ACCESS_KEY`

Review Comment:
   Done.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-45904][SQL][CONNECT] Mode function should supports sort with order direction [spark]

2023-11-15 Thread via GitHub


cloud-fan commented on code in PR #43786:
URL: https://github.com/apache/spark/pull/43786#discussion_r1393889797


##
connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/functions.scala:
##
@@ -842,19 +842,20 @@ object functions {
* @group agg_funcs
* @since 3.4.0
*/
-  def mode(e: Column): Column = mode(e, deterministic = false)
+  def mode(e: Column): Column = Column.fn("mode", e)
 
   /**
* Aggregate function: returns the most frequent value in a group.
*
-   * When multiple values have the same greatest frequency then either any of 
values is returned
-   * if deterministic is false or is not defined, or the lowest value is 
returned if deterministic
-   * is true.
+   * If there are multiple values with the greatest frequency only one value 
will be returned. The
+   * value will be chosen based on optional sort direction. Use ascending 
order to get smallest
+   * value or descending order to get largest value from multiple values with 
the same frequency.
+   * If this clause is not specified the exact chosen value is not determined.
*
* @group agg_funcs
* @since 4.0.0
*/
-  def mode(e: Column, deterministic: Boolean): Column = Column.fn("mode", e, 
lit(deterministic))
+  def mode(e: Column, isSortAsc: Boolean): Column = Column.fn("mode", e, 
lit(isSortAsc))

Review Comment:
   If we do want to support specifying an ordering, we should support ordering 
columns. I don't agree with this partial API change. Let's finish the SQL side 
first, then we can think about what the Scala API should look like.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-45915][SQL] Treat decimal(x, 0) the same as IntegralType in `PromoteStrings` [spark]

2023-11-15 Thread via GitHub


wangyum commented on code in PR #43812:
URL: https://github.com/apache/spark/pull/43812#discussion_r1393881574


##
sql/core/src/test/resources/sql-tests/analyzer-results/typeCoercion/native/binaryComparison.sql.out:
##
@@ -1330,7 +1330,7 @@ Project [NOT (cast(cast(null as string) as bigint) = 
cast(1 as bigint)) AS (NOT
 -- !query
 SELECT cast(1 as decimal(10, 0)) = '1' FROM t
 -- !query analysis
-Project [(cast(cast(1 as decimal(10,0)) as double) = cast(1 as double)) AS 
(CAST(1 AS DECIMAL(10,0)) = 1)#x]
+Project [(cast(1 as decimal(10,0)) = cast(1 as decimal(10,0))) AS (CAST(1 AS 
DECIMAL(10,0)) = 1)#x]

Review Comment:
   Yes.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-45924][SQL] Fixing the canonicalization of SubqueryAdaptiveBroadcastExec and making it equivalent with SubqueryBroadcastExec [spark]

2023-11-15 Thread via GitHub


ahshahid commented on PR #43806:
URL: https://github.com/apache/spark/pull/43806#issuecomment-1812050963

   > AFAIK, the `SubqueryAdaptiveBroadcastExec` only used for dynamic partition 
pruning. `SubqueryAdaptiveBroadcastExec` will be replaced with 
`SubqueryBroadcastExec` and the later must reuse the broadcast exchange.
   
   Ideally that should have happened, but what I see is one stage containing 
subquery adaptive exec while the incoming exchange contains subquery exec.
   Also, this is just one of the issues. The main PR will be the one which 
requires additional functions to be implemented by DataSourceV2impl.
   Please refer to the ticket which depends on this PR.
   Maybe this itself can be fixed by ensuring the incoming exchange contains the right 
exec; then tinkering with canonicalize may not be needed.
   But in that case the buildPlan should be canonicalized.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [MINOR][DOCS] Clarify collect_list -> ArrayType [spark]

2023-11-15 Thread via GitHub


landlord-matt commented on PR #43787:
URL: https://github.com/apache/spark/pull/43787#issuecomment-1812049206

   @HyukjinKwon: I now also did a similar thing for collect_set. Any comments 
on the proposal? I thought you would appreciate this suggestion :'(


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-45924][SQL] Fixing the canonicalization of SubqueryAdaptiveBroadcastExec and making it equivalent with SubqueryBroadcastExec [spark]

2023-11-15 Thread via GitHub


beliefer commented on code in PR #43806:
URL: https://github.com/apache/spark/pull/43806#discussion_r1393856527


##
sql/core/src/main/scala/org/apache/spark/sql/execution/SubqueryAdaptiveBroadcastExec.scala:
##
@@ -46,7 +46,8 @@ case class SubqueryAdaptiveBroadcastExec(
 
   protected override def doCanonicalize(): SparkPlan = {
 val keys = buildKeys.map(k => QueryPlan.normalizeExpressions(k, 
child.output))
-copy(name = "dpp", buildKeys = keys, child = child.canonicalized)
+SubqueryBroadcastExec(name = "dpp", index = index, buildKeys = keys,

Review Comment:
   The implementation looks somewhat at odds with the design.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-45924][SQL] Fixing the canonicalization of SubqueryAdaptiveBroadcastExec and making it equivalent with SubqueryBroadcastExec [spark]

2023-11-15 Thread via GitHub


beliefer commented on PR #43806:
URL: https://github.com/apache/spark/pull/43806#issuecomment-1812033293

   AFAIK, `SubqueryAdaptiveBroadcastExec` is only used for dynamic partition 
pruning.
   `SubqueryAdaptiveBroadcastExec` will be replaced with 
`SubqueryBroadcastExec`, and the latter must reuse the broadcast exchange.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-45764][PYTHON][DOCS] Make code block copyable [spark]

2023-11-15 Thread via GitHub


HyukjinKwon commented on PR #43799:
URL: https://github.com/apache/spark/pull/43799#issuecomment-1812018431

   @panbingkun would you mind creating a backport PR? Actually, yeah, I think 
it's an important improvement to the docs.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-45919][CORE][SQL] Use Java 16 `record` to simplify Java class definition [spark]

2023-11-15 Thread via GitHub


dongjoon-hyun commented on PR #43796:
URL: https://github.com/apache/spark/pull/43796#issuecomment-1812013863

   > @dongjoon-hyun I want to clarify the issue. We don't want to use `record` 
here because `field` in the original class doesn't provide an Accessor, but 
since `record` automatically generates an Accessor method, it exposes the 
information of `field`. So, in similar scenarios, we shouldn't use `record`, 
right?
   
   Yes, exactly. That's the point.
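
   As a hedged illustration of that point (hypothetical names, not the code from this 
PR): converting a class whose fields were `private` with no getters into a `record` 
implicitly adds public accessor methods, so the component values become part of the 
type's visible API.

```java
import java.nio.ByteBuffer;

public class RecordAccessorSketch {
  // Before: fields are private and the class exposes no getters at all.
  static final class BlockAsClass {
    private final int shuffleId;
    private final ByteBuffer buffer;
    BlockAsClass(int shuffleId, ByteBuffer buffer) {
      this.shuffleId = shuffleId;
      this.buffer = buffer;
    }
  }

  // After: the record generates public shuffleId() and buffer() accessors,
  // plus equals/hashCode/toString, so the same state is now externally visible.
  record BlockAsRecord(int shuffleId, ByteBuffer buffer) { }

  public static void main(String[] args) {
    BlockAsRecord r = new BlockAsRecord(1, ByteBuffer.allocate(4));
    System.out.println(r.shuffleId()); // this accessor exists only in the record version
  }
}
```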


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-45562][SQL][FOLLOW-UP] XML: Fix SQLSTATE for missing rowTag error [spark]

2023-11-15 Thread via GitHub


beliefer commented on code in PR #43804:
URL: https://github.com/apache/spark/pull/43804#discussion_r1393832007


##
docs/sql-error-conditions.md:
##
@@ -2375,9 +2375,3 @@ The operation `` requires a ``. 
But `` is a
 The `` requires `` parameters but the actual number 
is ``.
 
 For more details see 
[WRONG_NUM_ARGS](sql-error-conditions-wrong-num-args-error-class.html)
-
-### XML_ROW_TAG_MISSING
-
-[SQLSTATE: 
42000](sql-error-conditions-sqlstates.html#class-42-syntax-error-or-access-rule-violation)

Review Comment:
   We don't need to update this one?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-45926][SQL] Implementing equals and hashCode which takes into account pushed runtime filters , in InMemoryTable related scans [spark]

2023-11-15 Thread via GitHub


beliefer commented on PR #43808:
URL: https://github.com/apache/spark/pull/43808#issuecomment-1812006973

   It seems this PR is unrelated to runtime filters. I guess what you mean is DS V2 
filter pushdown.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-45915][SQL] Treat decimal(x, 0) the same as IntegralType in `PromoteStrings` [spark]

2023-11-15 Thread via GitHub


dongjoon-hyun commented on code in PR #43812:
URL: https://github.com/apache/spark/pull/43812#discussion_r1393823245


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala:
##
@@ -934,8 +934,8 @@ object TypeCoercion extends TypeCoercionBase {
 // There is no proper decimal type we can pick,
 // using double type is the best we can do.
 // See SPARK-22469 for details.
-case (n: DecimalType, s: StringType) => Some(DoubleType)
-case (s: StringType, n: DecimalType) => Some(DoubleType)
+case (DecimalType.Fixed(_, s), _: StringType) if s > 0 => Some(DoubleType)
+case (_: StringType, DecimalType.Fixed(_, s)) if s > 0 => Some(DoubleType)

Review Comment:
   Could you add some description about the `s == 0` case after line 936, 
please?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[PR] [SPARK-45898][SQL] Support groupingSets operation in dataframe api [spark]

2023-11-15 Thread via GitHub


JacobZheng0927 opened a new pull request, #43813:
URL: https://github.com/apache/spark/pull/43813

   ### What changes were proposed in this pull request?
   Add a `groupingSets` method to the Dataset API.
   
   `SELECT col1, col2, col3, sum(col4) FROM t GROUP BY col1, col2, col3 
GROUPING SETS ((col1, col2), ())`
   This SQL can be equivalently expressed with the following code:
   `df.groupingSets(Seq(Seq("col1", "col2"), Seq()), "col1", "col2", 
"col3").sum("col4")`
   
   ### Why are the changes needed?
   Currently, grouping sets can only be used in Spark SQL; the feature is not 
available when developing with the Dataset API.
   
   ### Does this PR introduce _any_ user-facing change?
   Yes. This PR introduces the use of `groupingSets` in the Dataset API.
   
   ### How was this patch tested?
   Tests added in `DataFrameAggregateSuite.scala`.
   
   
   ### Was this patch authored or co-authored using generative AI tooling?
   No
   

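   For illustration, a minimal sketch of how the proposed API would sit next to the 
SQL form that works today; the `groupingSets` call mirrors the signature quoted in 
the description above and remains hypothetical until this PR is merged.

   ```scala
   import org.apache.spark.sql.SparkSession

   val spark = SparkSession.builder().master("local[*]").appName("grouping-sets-sketch").getOrCreate()
   import spark.implicits._

   val df = Seq(("a", "x", "p", 1L), ("a", "y", "q", 2L), ("b", "x", "p", 3L))
     .toDF("col1", "col2", "col3", "col4")

   // SQL form that works on current Spark:
   df.createOrReplaceTempView("t")
   spark.sql(
     """SELECT col1, col2, col3, sum(col4) FROM t
       |GROUP BY col1, col2, col3 GROUPING SETS ((col1, col2), ())""".stripMargin
   ).show()

   // Proposed Dataset API equivalent (hypothetical until this PR lands):
   // df.groupingSets(Seq(Seq("col1", "col2"), Seq()), "col1", "col2", "col3").sum("col4").show()
   ```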

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-45915][SQL] Treat decimal(x, 0) the same as IntegralType in `PromoteStrings` [spark]

2023-11-15 Thread via GitHub


dongjoon-hyun commented on code in PR #43812:
URL: https://github.com/apache/spark/pull/43812#discussion_r1393820692


##
sql/core/src/test/resources/sql-tests/analyzer-results/typeCoercion/native/binaryComparison.sql.out:
##
@@ -1330,7 +1330,7 @@ Project [NOT (cast(cast(null as string) as bigint) = 
cast(1 as bigint)) AS (NOT
 -- !query
 SELECT cast(1 as decimal(10, 0)) = '1' FROM t
 -- !query analysis
-Project [(cast(cast(1 as decimal(10,0)) as double) = cast(1 as double)) AS 
(CAST(1 AS DECIMAL(10,0)) = 1)#x]
+Project [(cast(1 as decimal(10,0)) = cast(1 as decimal(10,0))) AS (CAST(1 AS 
DECIMAL(10,0)) = 1)#x]

Review Comment:
   So, is this the main motivation? `cast(1 as double)` -> `cast(1 as 
decimal(10,0))`?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-44496][SQL][FOLLOW-UP] CalendarIntervalType is also orderable [spark]

2023-11-15 Thread via GitHub


cloud-fan commented on PR #43805:
URL: https://github.com/apache/spark/pull/43805#issuecomment-1811991530

   It's not really comparable, as we can't compare `30 days` and `1 month`.

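   To make the ambiguity concrete, a small sketch (assuming a SparkSession named 
`spark`): depending on the anchor date, `30 days` can be either shorter or longer 
than `1 month`, so there is no consistent ordering between the two.

   ```scala
   // Illustrative only; assumes a SparkSession named `spark` is already available.
   // Starting from 2024-02-01 (a 29-day February), 30 days reaches further than 1 month ...
   spark.sql("SELECT date'2024-02-01' + INTERVAL '1' MONTH AS plus_month, date'2024-02-01' + INTERVAL '30' DAY AS plus_days").show()
   // ... but starting from 2024-01-01 (a 31-day January), 1 month reaches further than 30 days.
   spark.sql("SELECT date'2024-01-01' + INTERVAL '1' MONTH AS plus_month, date'2024-01-01' + INTERVAL '30' DAY AS plus_days").show()
   ```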

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-43393][SQL] Address sequence expression overflow bug. [spark]

2023-11-15 Thread via GitHub


cloud-fan commented on PR #41072:
URL: https://github.com/apache/spark/pull/41072#issuecomment-1811989510

   SGTM. @thepinetree, can you help create backport PRs? Thanks!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-45764][PYTHON][DOCS] Make code block copyable [spark]

2023-11-15 Thread via GitHub


zhengruifeng commented on PR #43799:
URL: https://github.com/apache/spark/pull/43799#issuecomment-1811986470

   Shall we backport this to other branches, so that new maintenance releases 
will benefit from this fix?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-45925][SQL] Making SubqueryBroadcastExec equivalent to SubqueryAdaptiveBroadcastExec [spark]

2023-11-15 Thread via GitHub


ahshahid commented on PR #43807:
URL: https://github.com/apache/spark/pull/43807#issuecomment-1811975176

   I think if I get a clean run on PR 
[SPARK-45924](https://github.com/apache/spark/pull/43806), then this PR can be 
closed without merging.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-45924][SQL] Fixing the canonicalization of SubqueryAdaptiveBroadcastExec and making it equivalent with SubqueryBroadcastExec [spark]

2023-11-15 Thread via GitHub


ahshahid commented on PR #43806:
URL: https://github.com/apache/spark/pull/43806#issuecomment-1811971501

   I have reworked the PR to just canonicalize SubqueryAdaptiveBroadcastExec as 
SubqueryBroadcastExec. This also fixes the exchange-reuse issue and seems to be a 
less impactful change.
   It also means that PR 
[SPARK-45925](https://github.com/apache/spark/pull/43807) can be closed.

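   For intuition only, a self-contained toy model of the canonicalization idea 
described above; the real SubqueryAdaptiveBroadcastExec and SubqueryBroadcastExec 
nodes carry more state, so the class names, fields, and `canonicalized` logic below 
are simplified, hypothetical stand-ins rather than the PR's actual code.

   ```scala
   // Toy model only -- simplified, hypothetical stand-ins for the real Spark plan nodes.
   sealed trait PlanNode { def canonicalized: PlanNode }

   // Stand-in for SubqueryBroadcastExec: canonicalizes by normalizing its own fields.
   case class BroadcastSubquery(buildKeys: Seq[String], child: String) extends PlanNode {
     def canonicalized: PlanNode = BroadcastSubquery(buildKeys.sorted, child)
   }

   // Stand-in for SubqueryAdaptiveBroadcastExec: canonicalizes *as* the non-adaptive node,
   // so plans that differ only in the adaptive wrapper compare equal and exchanges can be reused.
   case class AdaptiveBroadcastSubquery(buildKeys: Seq[String], child: String) extends PlanNode {
     def canonicalized: PlanNode = BroadcastSubquery(buildKeys, child).canonicalized
   }

   object CanonicalizationSketch extends App {
     val adaptive = AdaptiveBroadcastSubquery(Seq("k2", "k1"), "scan(t)")
     val plain    = BroadcastSubquery(Seq("k1", "k2"), "scan(t)")
     // Equality after canonicalization is what enables reuse of the broadcast exchange.
     println(adaptive.canonicalized == plain.canonicalized) // prints: true
   }
   ```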

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org


