[GitHub] [spark] zhengruifeng commented on a diff in pull request #39602: [SPARK-42021][CONNECT][PYTHON] Make `createDataFrame` support `array.array`

2023-01-15 Thread GitBox


zhengruifeng commented on code in PR #39602:
URL: https://github.com/apache/spark/pull/39602#discussion_r1070927723


##
python/pyspark/sql/connect/session.py:
##
@@ -299,9 +296,9 @@ def createDataFrame(
 # we need to convert it to [[1], [2], [3]] to be able to infer schema.
 _data = [[d] for d in _data]
 
-try:
-    _inferred_schema = self._inferSchemaFromList(_data, _cols)
-except Exception:
+_inferred_schema = self._inferSchemaFromList(_data, _cols)

Review Comment:
   The changes in this file are there so that the error is actually thrown in `test_array_types`.
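In other words, the broad `try/except` that this hunk removes used to swallow the inference failure; without it the original error reaches the caller. A minimal model of the behavior change (function names here are illustrative, not the real session internals):

```python
def infer_schema_old(infer, data):
    # the removed pattern: a broad `except` hid the real inference error
    try:
        return infer(data)
    except Exception:
        return None

def infer_schema_new(infer, data):
    # after the change, the error simply propagates to the caller
    return infer(data)

def failing_infer(data):
    raise ValueError("can not infer schema from empty dataset")

old_result = infer_schema_old(failing_infer, [])  # error swallowed, returns None
try:
    infer_schema_new(failing_infer, [])
    new_raised = False
except ValueError:
    new_raised = True  # error surfaced to the test
```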



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] zhengruifeng opened a new pull request, #39602: [SPARK-42021][CONNECT][PYTHON] Make `createDataFrame` support `array.array`

2023-01-15 Thread GitBox


zhengruifeng opened a new pull request, #39602:
URL: https://github.com/apache/spark/pull/39602

   
   ### What changes were proposed in this pull request?
   Make `createDataFrame` support `array.array`
   
   
   ### Why are the changes needed?
   for parity
   
   
   ### Does this PR introduce _any_ user-facing change?
   yes
   
   
   ### How was this patch tested?
   enabled UT and added UT
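A rough model of the inference path this PR touches: `array.array` values iterate like plain lists, so they can be wrapped into one-column rows for schema inference the same way (a sketch with illustrative names, not the real `createDataFrame` internals):

```python
from array import array

def wrap_for_inference(data):
    # mirrors the `_data = [[d] for d in _data]` line from session.py:
    # bare values become one-column rows so the schema can be inferred
    return [[d] for d in data]

ints = array("i", [1, 2, 3])     # typed, C-backed sequence
rows = wrap_for_inference(ints)  # iterates like a list of ints
```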
   





[GitHub] [spark] ulysses-you commented on a diff in pull request #39598: [SPARK-41708][SQL][FOLLOWUP] Override `equals` and `hashCode` of `TableIdentifier` to make `CatalogFileIndex` print the same re

2023-01-15 Thread GitBox


ulysses-you commented on code in PR #39598:
URL: https://github.com/apache/spark/pull/39598#discussion_r1070918433


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/identifiers.scala:
##
@@ -104,6 +104,13 @@ case class TableIdentifier(table: String, database: Option[String], catalog: Opt
 
   def this(table: String) = this(table, None, None)
   def this(table: String, database: Option[String]) = this(table, database, None)
+
+  override def equals(obj: Any): Boolean = obj match {

Review Comment:
   it has already been used






[GitHub] [spark] peter-toth commented on a diff in pull request #38034: [SPARK-40599][SQL] Add multiTransform methods to TreeNode to generate alternatives

2023-01-15 Thread GitBox


peter-toth commented on code in PR #38034:
URL: https://github.com/apache/spark/pull/38034#discussion_r1070234296


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/trees/TreeNode.scala:
##
@@ -618,6 +618,165 @@ abstract class TreeNode[BaseType <: TreeNode[BaseType]] extends Product with Tre
 }
   }
 
+  /**
+   * Returns alternative copies of this node where `rule` has been recursively applied to the tree.
+   *
+   * Users should not expect a specific directionality. If a specific directionality is needed,
+   * multiTransformDown or multiTransformUp should be used.
+   *
+   * @param rule a function used to generate transformed alternatives for a node
+   * @return the stream of alternatives
+   */
+  def multiTransform(rule: PartialFunction[BaseType, Seq[BaseType]]): Stream[BaseType] = {
+    multiTransformDown(rule)
+  }
+
+  /**
+   * Returns alternative copies of this node where `rule` has been recursively applied to the tree.
+   *
+   * Users should not expect a specific directionality. If a specific directionality is needed,
+   * multiTransformDownWithPruning or multiTransformUpWithPruning should be used.
+   *
+   * @param rule   a function used to generate transformed alternatives for a node
+   * @param cond   a Lambda expression to prune tree traversals. If `cond.apply` returns false
+   *               on a TreeNode T, skips processing T and its subtree; otherwise, processes
+   *               T and its subtree recursively.
+   * @param ruleId is a unique Id for `rule` to prune unnecessary tree traversals. When it is
+   *               UnknownRuleId, no pruning happens. Otherwise, if `rule` (with id `ruleId`)
+   *               has been marked as ineffective on a TreeNode T, skips processing T and its
+   *               subtree. Do not pass it if the rule is not purely functional and reads a
+   *               varying initial state for different invocations.
+   * @return the stream of alternatives
+   */
+  def multiTransformWithPruning(
+      cond: TreePatternBits => Boolean,
+      ruleId: RuleId = UnknownRuleId
+    )(rule: PartialFunction[BaseType, Seq[BaseType]]): Stream[BaseType] = {
+    multiTransformDownWithPruning(cond, ruleId)(rule).map(_._1)
+  }
+
+  /**
+   * Returns alternative copies of this node where `rule` has been recursively applied to it and all
+   * of its children (pre-order).
+   *
+   * @param rule the function used to generate transformed alternatives for a node
+   * @return the stream of alternatives
+   */
+  def multiTransformDown(rule: PartialFunction[BaseType, Seq[BaseType]]): Stream[BaseType] = {
+    multiTransformDownWithPruning(AlwaysProcess.fn, UnknownRuleId)(rule).map(_._1)
+  }
+
+  /**
+   * Returns alternative copies of this node where `rule` has been recursively applied to it and all
+   * of its children (pre-order).
+   *
+   * As it is very easy to generate an enormous number of alternatives when the input tree is huge
+   * or when the rule returns a large number of alternatives, this function returns the alternatives
+   * as a lazy `Stream`, so the number of alternatives generated can be limited at the caller side
+   * as needed.
+   *
+   * To indicate that the original node without any transformation is a valid alternative, the rule
+   * can either:
+   * - not apply, or
+   * - return an empty `Seq`, or
+   * - return a `Seq` that contains a node that is equal to the original node.
+   *
+   * Please note that this function always considers the original node a valid alternative (even
+   * if the original node is not included in the returned `Seq`) if the rule can transform any of
+   * the descendants of the node. E.g. consider a simple expression:
+   *   `Add(a, b)`
+   * and a rule that returns:
+   *   `Seq(1, 2)` for `a` and
+   *   `Seq(10, 20)` for `b` and
+   *   `Seq(11, 12, 21, 22)` for `Add(a, b)` (note that the original `Add(a, b)` is not returned)
+   * then the result of `multiTransform` is:
+   *   `Seq(11, 12, 21, 22, Add(1, 10), Add(2, 10), Add(1, 20), Add(2, 20))`.
+   * This feature makes the usage of `multiTransform` easier, as a rule that transforms a non-leaf
+   * node doesn't need to take into account that it can also transform a descendant of that node,
+   * and so it doesn't need to return the non-leaf node itself in the list of alternatives just to
+   * keep generating alternatives.
+   *
+   * @param rule   a function used to generate transformed alternatives for a node
+   * @param cond   a Lambda expression to prune tree traversals. If `cond.apply` returns false
+   *               on a TreeNode T, skips processing T and its subtree; otherwise, processes
+   *               T and its subtree recursively.
+   * @param ruleId is a unique Id for `rule` to prune unnecessary tree traversals. When it is
+   *               UnknownRuleId, no pruning happens. Otherwise, if `rule` (with id `ruleId`)
+   *               has been marked 
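The worked `Add(a, b)` example from the ScalaDoc above can be reproduced with a small toy model (illustrative Python, not the Spark implementation; nodes are plain tuples and the rule is a dict):

```python
from itertools import product

def multi_transform(node, rule):
    # pre-order: the node's own alternatives come first, then copies of the
    # node rebuilt from its children's alternatives (leftmost child varying
    # fastest, as in the ScalaDoc example)
    node_alts = list(rule.get(node, []))
    rebuilt = []
    if isinstance(node, tuple):
        op, *children = node
        streams = [multi_transform(c, rule) for c in children]
        if any(s != [c] for s, c in zip(streams, children)):
            # a descendant was transformed, so the (rebuilt) original node
            # also counts as a valid alternative
            for combo in product(*reversed(streams)):
                rebuilt.append((op, *reversed(combo)))
    if not node_alts and not rebuilt:
        return [node]  # untransformed node is itself the only alternative
    return node_alts + rebuilt

rule = {"a": [1, 2], "b": [10, 20], ("Add", "a", "b"): [11, 12, 21, 22]}
alts = multi_transform(("Add", "a", "b"), rule)
```

The real method returns a lazy `Stream` so callers can cap the number of alternatives; a plain list suffices for this small example.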

[GitHub] [spark] dongjoon-hyun opened a new pull request, #39601: [SPARK-42087][SQL][TESTS] Use `--no-same-owner` when `HiveExternalCatalogVersionsSuite` untars

2023-01-15 Thread GitBox


dongjoon-hyun opened a new pull request, #39601:
URL: https://github.com/apache/spark/pull/39601

   ### What changes were proposed in this pull request?
   
   This PR aims to use a new parameter `--no-same-owner` when `HiveExternalCatalogVersionsSuite` untars.
   
   ### Why are the changes needed?
   
   **BEFORE**
   ```
   root@edeccfee6d53:/spark# tar xfz spark-3.3.1-bin-hadoop3.tgz
   root@edeccfee6d53:/spark# ls -al spark-3.3.1-bin-hadoop3
   total 96
   drwxr-xr-x  17 110302528 1000   544 Oct 15 10:32 .
   drwxr-xr-x   4 root  root   128 Jan 16 07:30 ..
   -rw-r--r--   1 root  root 22940 Oct 15 10:32 LICENSE
   -rw-r--r--   1 root  root 57842 Oct 15 10:32 NOTICE
   drwxr-xr-x   3 110302528 1000    96 Oct 15 10:32 R
   -rw-r--r--   1 root  root  4461 Oct 15 10:32 README.md
   -rw-r--r--   1 root  root   165 Oct 15 10:32 RELEASE
   drwxr-xr-x  29 110302528 1000   928 Oct 15 10:32 bin
   drwxr-xr-x   8 110302528 1000   256 Oct 15 10:32 conf
   drwxr-xr-x   5 110302528 1000   160 Oct 15 10:32 data
   drwxr-xr-x   4 110302528 1000   128 Oct 15 10:32 examples
   drwxr-xr-x 250 110302528 1000  8000 Oct 15 10:32 jars
   drwxr-xr-x   4 110302528 1000   128 Oct 15 10:32 kubernetes
   drwxr-xr-x  60 110302528 1000  1920 Oct 15 10:32 licenses
   drwxr-xr-x  19 110302528 1000   608 Oct 15 10:32 python
   drwxr-xr-x  29 110302528 1000   928 Oct 15 10:32 sbin
   drwxr-xr-x   3 110302528 1000    96 Oct 15 10:32 yarn
   ```
   
   **AFTER**
   ```
   root@edeccfee6d53:/spark# tar xfz spark-3.3.1-bin-hadoop3.tgz --no-same-owner
   root@edeccfee6d53:/spark# ls -al spark-3.3.1-bin-hadoop3
   total 96
   drwxr-xr-x  17 root root   544 Oct 15 10:32 .
   drwxr-xr-x   4 root root   128 Jan 16 07:34 ..
   -rw-r--r--   1 root root 22940 Oct 15 10:32 LICENSE
   -rw-r--r--   1 root root 57842 Oct 15 10:32 NOTICE
   drwxr-xr-x   3 root root    96 Oct 15 10:32 R
   -rw-r--r--   1 root root  4461 Oct 15 10:32 README.md
   -rw-r--r--   1 root root   165 Oct 15 10:32 RELEASE
   drwxr-xr-x  29 root root   928 Oct 15 10:32 bin
   drwxr-xr-x   8 root root   256 Oct 15 10:32 conf
   drwxr-xr-x   5 root root   160 Oct 15 10:32 data
   drwxr-xr-x   4 root root   128 Oct 15 10:32 examples
   drwxr-xr-x 250 root root  8000 Oct 15 10:32 jars
   drwxr-xr-x   4 root root   128 Oct 15 10:32 kubernetes
   drwxr-xr-x  60 root root  1920 Oct 15 10:32 licenses
   drwxr-xr-x  19 root root   608 Oct 15 10:32 python
   drwxr-xr-x  29 root root   928 Oct 15 10:32 sbin
   drwxr-xr-x   3 root root    96 Oct 15 10:32 yarn
   ```
   
   ### Does this PR introduce _any_ user-facing change?
   
   No. This is a test-only improvement.
   
   ### How was this patch tested?
   
   Pass the CIs.
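Python's `tarfile` gives a compact way to see what the flag is about: an archive records each member's uid/gid, and GNU `tar` run as root restores them unless `--no-same-owner` is passed. The sketch below (illustrative, not the suite's code) round-trips an archive whose member carries the foreign uid/gid seen in the BEFORE listing; `tarfile` only chowns when running as root, matching the `--no-same-owner` behavior for ordinary users:

```python
import io
import os
import tarfile
import tempfile

def roundtrip(name, payload):
    # build a .tgz in memory whose member records a foreign owner,
    # then extract it and read the file back
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tf:
        info = tarfile.TarInfo(name)
        info.size = len(payload)
        info.uid, info.gid = 110302528, 1000  # ownership stored in the archive
        tf.addfile(info, io.BytesIO(payload))
    buf.seek(0)
    dest = tempfile.mkdtemp()
    with tarfile.open(fileobj=buf, mode="r:gz") as tf:
        tf.extractall(dest)
    with open(os.path.join(dest, name), "rb") as f:
        return f.read()
```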





[GitHub] [spark] LuciferYang commented on a diff in pull request #39598: [SPARK-41708][SQL][FOLLOWUP] Override `equals` and `hashCode` of `TableIdentifier` to make `CatalogFileIndex` print the same re

2023-01-15 Thread GitBox


LuciferYang commented on code in PR #39598:
URL: https://github.com/apache/spark/pull/39598#discussion_r1070909295


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/identifiers.scala:
##
@@ -104,6 +104,13 @@ case class TableIdentifier(table: String, database: Option[String], catalog: Opt
 
   def this(table: String) = this(table, None, None)
   def this(table: String, database: Option[String]) = this(table, database, None)
+
+  override def equals(obj: Any): Boolean = obj match {

Review Comment:
   @ulysses-you do you mean 
   
   
https://github.com/apache/spark/blob/2062c743df40c81ee2347ead8e9ae8d078f10c57/sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala#L173
   
   change it to use `SQLQueryTestHelper.replaceNotIncludedMsg(output)`?






[GitHub] [spark] ulysses-you commented on a diff in pull request #39598: [SPARK-41708][SQL][FOLLOWUP] Override `equals` and `hashCode` of `TableIdentifier` to make `CatalogFileIndex` print the same re

2023-01-15 Thread GitBox


ulysses-you commented on code in PR #39598:
URL: https://github.com/apache/spark/pull/39598#discussion_r1070905951


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/identifiers.scala:
##
@@ -104,6 +104,13 @@ case class TableIdentifier(table: String, database: Option[String], catalog: Opt
 
   def this(table: String) = this(table, None, None)
   def this(table: String, database: Option[String]) = this(table, database, None)
+
+  override def equals(obj: Any): Boolean = obj match {

Review Comment:
   We can replace `CatalogFileIndex@xxx` with `CatalogFileIndex` in `SQLQueryTestHelper#replaceNotIncludedMsg` to recover the test.
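The normalization being suggested can be sketched as a one-line substitution (a hypothetical Python analogue of the Scala `replaceNotIncludedMsg` change, not the actual patch):

```python
import re

def scrub_file_index(output: str) -> str:
    # drop the JVM identity-hash suffix (ClassName@hexhash) so golden-file
    # output stays stable across runs
    return re.sub(r"(CatalogFileIndex)@[0-9a-f]+", r"\1", output)

scrubbed = scrub_file_index(
    "Location: org.apache.spark.sql.execution.datasources.CatalogFileIndex@5d3e2f71"
)
```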






[GitHub] [spark] EnricoMi commented on a diff in pull request #39556: [SPARK-42049][SQL] Improve AliasAwareOutputExpression

2023-01-15 Thread GitBox


EnricoMi commented on code in PR #39556:
URL: https://github.com/apache/spark/pull/39556#discussion_r1070900791


##
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala:
##
@@ -435,6 +435,15 @@ object SQLConf {
 .booleanConf
 .createWithDefault(true)
 
+  val EXPRESSION_PROJECTION_CANDIDATE_LIMIT =
+    buildConf("spark.sql.optimizer.expressionProjectionCandidateLimit")
+      .doc("The maximum number of the candidate of out put expressions whose alias are replaced." +
+        " It can preserve the output partitioning and ordering." +
+        " Negative value means disable this optimization.")

Review Comment:
   ```suggestion
      .doc("The maximum number of candidates for output expressions whose aliases are replaced." +
        " This can preserve the output partitioning and ordering." +
        " A negative value means to disable this optimization.")
   ```
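The contract the doc string describes (cap the candidate stream, negative value disables the cap) is easy to picture with a lazy product (illustrative Python, not the Spark code):

```python
from itertools import islice, product

def limited_candidates(alias_streams, limit):
    # lazily enumerate output-expression candidates; a negative limit
    # disables the cap, mirroring the config's documented contract
    combos = product(*alias_streams)
    if limit < 0:
        return list(combos)
    return list(islice(combos, limit))

capped = limited_candidates([["a", "a1"], ["b", "b1"]], 3)
uncapped = limited_candidates([["a", "a1"], ["b", "b1"]], -1)
```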






[GitHub] [spark] LuciferYang commented on pull request #39598: [SPARK-41708][SQL][FOLLOWUP] Override `equals` and `hashCode` of `TableIdentifier` to make `CatalogFileIndex` print the same results when

2023-01-15 Thread GitBox


LuciferYang commented on PR #39598:
URL: https://github.com/apache/spark/pull/39598#issuecomment-1383586233

   > Could you make a separate test PR to scrub the hashCode instead of this follow-up, @LuciferYang ?
   >
   > https://github.com/apache/spark/blob/2062c743df40c81ee2347ead8e9ae8d078f10c57/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala#L843-L845
   >
   > I believe you can do that here in SQLQueryTestSuite.
   >
   > https://github.com/apache/spark/blob/2062c743df40c81ee2347ead8e9ae8d078f10c57/sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala#L173
   
   ok, let me try this





[GitHub] [spark] EnricoMi commented on a diff in pull request #37525: [WIP][SPARK-40086][SPARK-42049][SQL] Improve AliasAwareOutputPartitioning and AliasAwareQueryOutputOrdering to take all aliases in

2023-01-15 Thread GitBox


EnricoMi commented on code in PR #37525:
URL: https://github.com/apache/spark/pull/37525#discussion_r1070898729


##
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala:
##
@@ -435,6 +435,15 @@ object SQLConf {
 .booleanConf
 .createWithDefault(true)
 
+  val EXPRESSION_PROJECTION_CANDIDATE_LIMIT =
+    buildConf("spark.sql.optimizer.expressionProjectionCandidateLimit")
+      .doc("The maximum number of the candidate of out put expressions whose alias are replaced." +
+        " It can preserve the output partitioning and ordering." +
+        " Negative value means disable this optimization.")

Review Comment:
   ```suggestion
      .doc("The maximum number of candidates for output expressions whose aliases are replaced." +
        " This can preserve the output partitioning and ordering." +
        " Negative value means to disable this optimization.")
   ```






[GitHub] [spark] zhengruifeng commented on pull request #39600: [SPARK-42032][SPARK-41988][CONNECT][PYTHON][TESTS] `MapType` related doctests sort the dicts

2023-01-15 Thread GitBox


zhengruifeng commented on PR #39600:
URL: https://github.com/apache/spark/pull/39600#issuecomment-1383583994

   @HyukjinKwon @beliefer 





[GitHub] [spark] zhengruifeng opened a new pull request, #39600: [SPARK-42032][SPARK-41988][CONNECT][PYTHON][TESTS] `MapType` related doctests sort the dicts

2023-01-15 Thread GitBox


zhengruifeng opened a new pull request, #39600:
URL: https://github.com/apache/spark/pull/39600

   ### What changes were proposed in this pull request?
   `MapType` related doctests sort the dicts
   
   ### Why are the changes needed?
   `MapType` in Spark doesn't guarantee the ordering of entries
   
   
   ### Does this PR introduce _any_ user-facing change?
   no, test-only
   
   ### How was this patch tested?
   updated UT
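The fix pattern is the usual one for doctests over hash-based maps: sort the entries before printing so the expected output is deterministic (a generic sketch, not the actual doctest diff):

```python
def deterministic(d):
    # MapType (like hash maps generally) guarantees no entry order, so
    # doctests should print entries in a sorted, reproducible order
    return dict(sorted(d.items()))

out = deterministic({"b": 2, "a": 1, "c": 3})
```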





[GitHub] [spark] LuciferYang commented on a diff in pull request #39598: [SPARK-41708][SQL][FOLLOWUP] Override `equals` and `hashCode` of `TableIdentifier` to make `CatalogFileIndex` print the same re

2023-01-15 Thread GitBox


LuciferYang commented on code in PR #39598:
URL: https://github.com/apache/spark/pull/39598#discussion_r1070897426


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/identifiers.scala:
##
@@ -104,6 +104,13 @@ case class TableIdentifier(table: String, database: Option[String], catalog: Opt
 
   def this(table: String) = this(table, None, None)
   def this(table: String, database: Option[String]) = this(table, database, None)
+
+  override def equals(obj: Any): Boolean = obj match {

Review Comment:
   just print `org.apache.spark.sql.execution.datasources.CatalogFileIndex`?






[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #39598: [SPARK-41708][SQL][FOLLOWUP] Override `equals` and `hashCode` of `TableIdentifier` to make `CatalogFileIndex` print the same

2023-01-15 Thread GitBox


dongjoon-hyun commented on code in PR #39598:
URL: https://github.com/apache/spark/pull/39598#discussion_r1070896775


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/identifiers.scala:
##
@@ -104,6 +104,13 @@ case class TableIdentifier(table: String, database: Option[String], catalog: Opt
 
   def this(table: String) = this(table, None, None)
   def this(table: String, database: Option[String]) = this(table, database, None)
+
+  override def equals(obj: Any): Boolean = obj match {

Review Comment:
   The better solution would be to normalize the output by removing the `CatalogFileIndex@xxx` part.






[GitHub] [spark] LuciferYang commented on a diff in pull request #39598: [SPARK-41708][SQL][FOLLOWUP] Override `equals` and `hashCode` of `TableIdentifier` to make `CatalogFileIndex` print the same re

2023-01-15 Thread GitBox


LuciferYang commented on code in PR #39598:
URL: https://github.com/apache/spark/pull/39598#discussion_r1070893736


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/identifiers.scala:
##
@@ -104,6 +104,13 @@ case class TableIdentifier(table: String, database: Option[String], catalog: Opt
 
   def this(table: String) = this(table, None, None)
   def this(table: String, database: Option[String]) = this(table, database, None)
+
+  override def equals(obj: Any): Boolean = obj match {

Review Comment:
   Currently it prints `org.apache.spark.sql.execution.datasources.CatalogFileIndex@hashCode`.
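The unstable token comes from the JVM's default `Object.toString` (`ClassName@identityHash`); Python's default `repr` behaves the same way, which makes the flakiness easy to demonstrate (the class name below is illustrative):

```python
class CatalogFileIndexLike:
    pass  # no __repr__ override, so the default identity-based one is used

a, b = CatalogFileIndexLike(), CatalogFileIndexLike()
# each instance prints with its own memory address, so two logically
# equivalent objects render differently, just like ClassName@hashCode
# output from the JVM
```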






[GitHub] [spark] LuciferYang commented on a diff in pull request #39598: [SPARK-41708][SQL][FOLLOWUP] Override `equals` and `hashCode` of `TableIdentifier` to make `CatalogFileIndex` print the same re

2023-01-15 Thread GitBox


LuciferYang commented on code in PR #39598:
URL: https://github.com/apache/spark/pull/39598#discussion_r1070893144


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/identifiers.scala:
##
@@ -104,6 +104,13 @@ case class TableIdentifier(table: String, database: Option[String], catalog: Opt
 
   def this(table: String) = this(table, None, None)
   def this(table: String, database: Option[String]) = this(table, database, None)
+
+  override def equals(obj: Any): Boolean = obj match {

Review Comment:
   cc @cloud-fan @HyukjinKwon @dongjoon-hyun @ulysses-you 
   
   The daily build GA `Master Scala 2.13 + Hadoop 3 + JDK 8` has been failing since https://github.com/apache/spark/pull/39468 was merged.
   
   - https://github.com/apache/spark/actions/runs/3924877161
   - https://github.com/apache/spark/actions/runs/3919886270
   - https://github.com/apache/spark/actions/runs/3914162890
   - https://github.com/apache/spark/actions/runs/3905252989
   - https://github.com/apache/spark/actions/runs/3895875210
   
   Are there any suggestions for a better solution? For example, overriding `toString` of `CatalogFileIndex`? What would be better to print?






[GitHub] [spark] dongjoon-hyun commented on pull request #39599: [SPARK-42086][SQL][TESTS] Sort test cases in SQLQueryTestSuite

2023-01-15 Thread GitBox


dongjoon-hyun commented on PR #39599:
URL: https://github.com/apache/spark/pull/39599#issuecomment-1383569534

   Thank you, @HyukjinKwon !





[GitHub] [spark] dongjoon-hyun closed pull request #39593: [SPARK-42083][K8S] Make `(Executor|StatefulSet)PodsAllocator` extendable

2023-01-15 Thread GitBox


dongjoon-hyun closed pull request #39593: [SPARK-42083][K8S] Make 
`(Executor|StatefulSet)PodsAllocator` extendable
URL: https://github.com/apache/spark/pull/39593





[GitHub] [spark] dongjoon-hyun commented on pull request #39593: [SPARK-42083][K8S] Make `(Executor|StatefulSet)PodsAllocator` extendable

2023-01-15 Thread GitBox


dongjoon-hyun commented on PR #39593:
URL: https://github.com/apache/spark/pull/39593#issuecomment-1383568837

   All K8s tests passed. Merged to master for Apache Spark 3.4.0.





[GitHub] [spark] dongjoon-hyun opened a new pull request, #39599: [SPARK-42086][SQL][TESTS] Sort test cases in SQLQueryTestSuite

2023-01-15 Thread GitBox


dongjoon-hyun opened a new pull request, #39599:
URL: https://github.com/apache/spark/pull/39599

   
   
   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   





[GitHub] [spark] HyukjinKwon commented on a diff in pull request #39492: [SPARK-41876][CONNECT][PYTHON] `test_dataframe` should catch both `AnalysisException` and `SparkConnectAnalysisException`

2023-01-15 Thread GitBox


HyukjinKwon commented on code in PR #39492:
URL: https://github.com/apache/spark/pull/39492#discussion_r1070885000


##
python/pyspark/sql/tests/test_dataframe.py:
##
@@ -42,6 +42,7 @@
 DayTimeIntervalType,
 )
 from pyspark.sql.utils import AnalysisException, IllegalArgumentException
+from pyspark.sql.connect.client import SparkConnectAnalysisException

Review Comment:
   https://github.com/apache/spark/pull/39387 is merged. Could we move the Spark Connect errors into there, and fix the test cases here? See also https://github.com/apache/spark/pull/39591. cc @itholic 






[GitHub] [spark] HyukjinKwon commented on a diff in pull request #39450: [SPARK-41897][CONNECT][TESTS] Enable tests with error mismatch in connect/test_parity_functions.py

2023-01-15 Thread GitBox


HyukjinKwon commented on code in PR #39450:
URL: https://github.com/apache/spark/pull/39450#discussion_r1070884942


##
python/pyspark/sql/tests/test_functions.py:
##
@@ -24,6 +24,7 @@
 
 from py4j.protocol import Py4JJavaError
 from pyspark.sql import Row, Window, types
+from pyspark.sql.connect.client import SparkConnectException

Review Comment:
   https://github.com/apache/spark/pull/39387 is merged. Could we move the Spark Connect errors into there, and fix the test cases here? See also https://github.com/apache/spark/pull/39591. cc @itholic 






[GitHub] [spark] HyukjinKwon commented on a diff in pull request #39591: [SPARK-42078][PYTHON] Migrate errors thrown by JVM into `PySparkException`.

2023-01-15 Thread GitBox


HyukjinKwon commented on code in PR #39591:
URL: https://github.com/apache/spark/pull/39591#discussion_r1070884205


##
python/pyspark/errors/exceptions.py:
##
@@ -74,3 +79,187 @@ def getMessageParameters(self) -> Optional[Dict[str, str]]:
 
 def __str__(self) -> str:
 return f"[{self.getErrorClass()}] {self.message}"
+
+
+class CapturedException(PySparkException):
+def __init__(
+self,
+desc: Optional[str] = None,
+stackTrace: Optional[str] = None,
+cause: Optional[Py4JJavaError] = None,
+origin: Optional[Py4JJavaError] = None,
+):
+# desc & stackTrace vs origin are mutually exclusive.
+# cause is optional.
+assert (origin is not None and desc is None and stackTrace is None) or (
+origin is None and desc is not None and stackTrace is not None
+)
+
+self.desc = desc if desc is not None else cast(Py4JJavaError, origin).getMessage()
+assert SparkContext._jvm is not None
+self.stackTrace = (
+stackTrace
+if stackTrace is not None
+else (SparkContext._jvm.org.apache.spark.util.Utils.exceptionString(origin))
+)
+self.cause = convert_exception(cause) if cause is not None else None
+if self.cause is None and origin is not None and origin.getCause() is not None:
+self.cause = convert_exception(origin.getCause())
+self._origin = origin
+
+def __str__(self) -> str:
+assert SparkContext._jvm is not None
+
+jvm = SparkContext._jvm
+sql_conf = jvm.org.apache.spark.sql.internal.SQLConf.get()
+debug_enabled = sql_conf.pysparkJVMStacktraceEnabled()
+desc = self.desc
+if debug_enabled:
+desc = desc + "\n\nJVM stacktrace:\n%s" % self.stackTrace
+return str(desc)
+
+def getErrorClass(self) -> Optional[str]:
+assert SparkContext._gateway is not None
+
+gw = SparkContext._gateway
+if self._origin is not None and is_instance_of(
+gw, self._origin, "org.apache.spark.SparkThrowable"
+):
+return self._origin.getErrorClass()
+else:
+return None
+
+def getSqlState(self) -> Optional[str]:
+assert SparkContext._gateway is not None
+
+gw = SparkContext._gateway
+if self._origin is not None and is_instance_of(
+gw, self._origin, "org.apache.spark.SparkThrowable"
+):
+return self._origin.getSqlState()
+else:
+return None
+
+
+def convert_exception(e: Py4JJavaError) -> CapturedException:

Review Comment:
   If you're exposing these classes as API, you should list the classes in `__all__` and make them documented.






[GitHub] [spark] LuciferYang opened a new pull request, #39598: [SPARK-41708][SQL][FOLLOWUP] Override `equals` and `hashCode` to make `CatalogFileIndex` print the same results when using Scala 2.12 and Scala 2.13

2023-01-15 Thread GitBox


LuciferYang opened a new pull request, #39598:
URL: https://github.com/apache/spark/pull/39598

   
   
   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   





[GitHub] [spark] dongjoon-hyun commented on pull request #39593: [SPARK-42083][K8S] Make `(Executor|StatefulSet)PodsAllocator` extendable

2023-01-15 Thread GitBox


dongjoon-hyun commented on PR #39593:
URL: https://github.com/apache/spark/pull/39593#issuecomment-1383542749

   Thank you, @viirya and @huaxingao !





[GitHub] [spark] dongjoon-hyun commented on pull request #39597: [SPARK-41990][SQL][FOLLOWUP] Add comments to explain why `FieldReference.column` is used when `ParseException` happens

2023-01-15 Thread GitBox


dongjoon-hyun commented on PR #39597:
URL: https://github.com/apache/spark/pull/39597#issuecomment-1383542350

   +1 from my side. I'll leave this PR to @cloud-fan . Thanks!





[GitHub] [spark] huaxingao commented on pull request #39597: [SPARK-41990][SQL][FOLLOWUP] Add comments to explain why `FieldReference.column` is used when `ParseException` happens

2023-01-15 Thread GitBox


huaxingao commented on PR #39597:
URL: https://github.com/apache/spark/pull/39597#issuecomment-1383529292

   cc @cloud-fan @dongjoon-hyun 





[GitHub] [spark] huaxingao opened a new pull request, #39597: [SPARK-41990][SQL][FOLLOWUP] Add comments to explain why `FieldReference.column` is used when `ParseException` happens

2023-01-15 Thread GitBox


huaxingao opened a new pull request, #39597:
URL: https://github.com/apache/spark/pull/39597

   
   
   ### What changes were proposed in this pull request?
   
   Add comments to explain why `FieldReference.column` is used when 
`ParseException` happens
   
   ### Why are the changes needed?
   To address this 
[comment](https://github.com/apache/spark/pull/39564#discussion_r1070788648)
   
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   
   ### How was this patch tested?
   Only comments are added. No need to test.
   





[GitHub] [spark] cloud-fan commented on pull request #39596: [SPARK-42084][SQL] Avoid leaking the qualified-access-only restriction

2023-01-15 Thread GitBox


cloud-fan commented on PR #39596:
URL: https://github.com/apache/spark/pull/39596#issuecomment-1383528602

   cc @viirya @gengliangwang 





[GitHub] [spark] cloud-fan opened a new pull request, #39596: [SPARK-42084][SQL] Avoid leaking the qualified-access-only restriction

2023-01-15 Thread GitBox


cloud-fan opened a new pull request, #39596:
URL: https://github.com/apache/spark/pull/39596

   
   
   ### What changes were proposed in this pull request?
   
   This is a better fix than https://github.com/apache/spark/pull/39077 and 
https://github.com/apache/spark/pull/38862
   
   The special attribute metadata `__qualified_access_only` is very risky, as 
it breaks normal column resolution. The aforementioned 2 PRs remove the 
restriction in `SubqueryAlias` and `Alias`, but it's not good enough as we may 
forget to do the same thing for new logical plans/expressions in the future. 
It's also problematic if advanced users manipulate logical plans and 
expressions directly, when there is no `SubqueryAlias` and `Alias` to remove 
the restriction.
   
   To be safe, we should only apply this restriction when resolving join hidden 
columns, which means the plan node right above `Project(Join(using or natural 
join))`. This PR simply removes the restriction when a column is resolved from 
a sequence of `Attributes`, and also when adding the `Project` hidden columns 
to its output. This makes sure that the qualified-access-only restriction will 
not be leaked to normal column resolution, but only metadata column resolution.
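   The stripping behaviour described above can be sketched in pure Python (an assumed simplification: real Catalyst attributes and metadata are Scala classes, not dicts, and the names below are illustrative): the qualified-access-only marker is dropped whenever a column is resolved from a plain sequence of attributes, so the restriction cannot leak into normal column resolution.

```python
# Marker key carried in attribute metadata (mirrors the special metadata
# key discussed in the PR; the resolution function itself is a toy model).
QUALIFIED_ACCESS_ONLY = "__qualified_access_only"


def resolve_from_attributes(attributes, name):
    """Resolve `name` from a sequence of attributes, stripping the marker."""
    for attr in attributes:
        if attr["name"] == name:
            # Strip the restriction on successful resolution so it does not
            # propagate past the join's hidden-column handling.
            metadata = {
                k: v for k, v in attr["metadata"].items()
                if k != QUALIFIED_ACCESS_ONLY
            }
            return {"name": attr["name"], "metadata": metadata}
    return None


hidden_column = {"name": "id", "metadata": {QUALIFIED_ACCESS_ONLY: True}}
resolved = resolve_from_attributes([hidden_column], "id")
```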
   
   ### Why are the changes needed?
   
   To make the join hidden column feature more robust
   
   ### Does this PR introduce _any_ user-facing change?
   
   No
   
   ### How was this patch tested?
   
   existing tests





[GitHub] [spark] xinrong-meng commented on a diff in pull request #39585: [WIP] Unregistered Python UDF in Spark Connect

2023-01-15 Thread GitBox


xinrong-meng commented on code in PR #39585:
URL: https://github.com/apache/spark/pull/39585#discussion_r1070861631


##
connector/connect/common/src/main/protobuf/spark/connect/expressions.proto:
##
@@ -43,6 +43,8 @@ message Expression {
 Window window = 11;
 UnresolvedExtractValue unresolved_extract_value = 12;
 UpdateFields update_fields = 13;
+PythonFunction python_function = 14;

Review Comment:
   Sounds good.






[GitHub] [spark] xinrong-meng commented on pull request #39585: [WIP] Unregistered Python UDF in Spark Connect

2023-01-15 Thread GitBox


xinrong-meng commented on PR #39585:
URL: https://github.com/apache/spark/pull/39585#issuecomment-1383524848

   Thanks @grundprinzip! I didn't notice your reviews before the force push. I was going to make the commit history clear by categorizing commits into client, server, etc. (via commit messages) before the PR is ready for review.





[GitHub] [spark] huaxingao commented on a diff in pull request #39564: [SPARK-41990][SQL] Use `FieldReference.column` instead of `apply` in V1 to V2 filter conversion

2023-01-15 Thread GitBox


huaxingao commented on code in PR #39564:
URL: https://github.com/apache/spark/pull/39564#discussion_r1070860837


##
sql/catalyst/src/main/scala/org/apache/spark/sql/sources/filters.scala:
##
@@ -74,6 +75,15 @@ sealed abstract class Filter {
* Converts V1 filter to V2 filter
*/
   private[sql] def toV2: Predicate
+
+  protected def toV2Column(attribute: String): NamedReference = {
+try {
+  FieldReference(attribute)

Review Comment:
   I will have a follow up to add some comments.






[GitHub] [spark] huaxingao commented on a diff in pull request #39564: [SPARK-41990][SQL] Use `FieldReference.column` instead of `apply` in V1 to V2 filter conversion

2023-01-15 Thread GitBox


huaxingao commented on code in PR #39564:
URL: https://github.com/apache/spark/pull/39564#discussion_r1070860368


##
sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCSuite.scala:
##
@@ -1963,4 +1977,9 @@ class JDBCSuite extends QueryTest with SharedSparkSession {
   }
 }
   }
+
+  test("SPARK-41990: Filter with composite name") {
+val df = sql("SELECT * FROM composite_name WHERE `last name` = 'smith'")

Review Comment:
   Right. Support for nested predicate pushdown is calculated by this 
[method](https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceUtils.scala#L108),
 which returns false for JDBC V1.
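   As a rough illustration of the distinction this PR is about (a simplified Python stand-in; the real `FieldReference` is Scala and its `apply` uses Spark's SQL parser, which would reject a name like `last name` outright rather than split it): `apply` interprets the attribute string as a multipart name, while `column` takes the whole string as a single top-level column name.

```python
def field_reference_apply(name):
    # Stand-in for the parsing path: splits on dots into nested name parts.
    return tuple(name.split("."))


def field_reference_column(name):
    # Stand-in for FieldReference.column: exactly one verbatim name part.
    return (name,)
```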






[GitHub] [spark] xinrong-meng commented on a diff in pull request #39585: [WIP] Unregistered Python UDF in Spark Connect

2023-01-15 Thread GitBox


xinrong-meng commented on code in PR #39585:
URL: https://github.com/apache/spark/pull/39585#discussion_r1070860268


##
python/pyspark/sql/connect/udf.py:
##
@@ -0,0 +1,184 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+"""
+User-defined function related classes and functions
+"""
+import functools
+from typing import Callable, Any, TYPE_CHECKING, Optional
+
+from pyspark import cloudpickle
+from pyspark.sql.connect.expressions import ColumnReference, PythonFunction, PythonUDF
+from pyspark.sql.connect.column import Column
+from pyspark.sql.types import DataType
+
+
+if TYPE_CHECKING:
+from pyspark.sql.connect._typing import (
+ColumnOrName,
+DataTypeOrString,
+UserDefinedFunctionLike,
+)
+from pyspark.sql.types import StringType
+from pyspark.sql.session import SparkSession
+
+
+def _wrap_function(

Review Comment:
   Removed.






[GitHub] [spark] dongjoon-hyun commented on pull request #39593: [SPARK-42083][K8S] Make `ExecutorPodsAllocator` extendable

2023-01-15 Thread GitBox


dongjoon-hyun commented on PR #39593:
URL: https://github.com/apache/spark/pull/39593#issuecomment-1383521322

   Thank you. Sure, let me open it too.





[GitHub] [spark] czxm opened a new pull request, #39595: [SPARK-38230][SQL] InsertIntoHadoopFsRelationCommand unnecessarily fetches details of partitions in most cases

2023-01-15 Thread GitBox


czxm opened a new pull request, #39595:
URL: https://github.com/apache/spark/pull/39595

   ### What changes were proposed in this pull request?
   This is based on @coalchan 's  previous work of 
https://github.com/apache/spark/pull/35549 . I enhanced it as per @jackylee-ch 
's comment below so that no new parameter is needed.
   https://github.com/apache/spark/pull/35549#issuecomment-1062790013
   
   ### Why are the changes needed?
   1) method listPartitions is order to get locations of partitions and compute 
custom partition locations(variable customPartitionLocations), but 
customPartitionLocations is only used while overwriting static hive partitions
   2) method listPartitions is costly and only used in one case. So we can use 
listPartitionNames to reduce requests on hive metastore db.
   
   ### Does this PR introduce _any_ user-facing change?
   No. 
   
   ### How was this patch tested?
   1) created a nested partitioned table
   2) run the insert/overwrite static/dynamic partition cases such as:
   insert overwrite table p1 partition(p1="1",p2="1.1",p3="1.1.1") select 1
   insert into table p1 partition(p1="1",p2="1.1",p3="1.1.1") select 1
   insert overwrite table p1 partition(p1="1",p2,p3) select 1,"1.1","1.1.1"
   insert into table p1 partition(p1="1",p2,p3) select 1,"1.1","1.1.1"





[GitHub] [spark] zhengruifeng opened a new pull request, #39594: [SPARK-42085][CONNECT][PYTHON] Make `from_arrow_schema` support nested types

2023-01-15 Thread GitBox


zhengruifeng opened a new pull request, #39594:
URL: https://github.com/apache/spark/pull/39594

   ### What changes were proposed in this pull request?
   Make `from_arrow_schema` support nested types
   
   
   ### Why are the changes needed?
   so that we no longer need to fetch the schema via `self.schema`, which was a workaround for `from_arrow_schema` not supporting some nested types
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   ### How was this patch tested?
   existing UT





[GitHub] [spark] dongjoon-hyun commented on pull request #39593: [SPARK-42083][K8S] Make `ExecutorPodsAllocator` extendable

2023-01-15 Thread GitBox


dongjoon-hyun commented on PR #39593:
URL: https://github.com/apache/spark/pull/39593#issuecomment-1383513453

   Could you review this please, @viirya ?





[GitHub] [spark] dongjoon-hyun opened a new pull request, #39593: [SPARK-42083][K8S] Make `ExecutorPodsAllocator` extendable

2023-01-15 Thread GitBox


dongjoon-hyun opened a new pull request, #39593:
URL: https://github.com/apache/spark/pull/39593

   ### What changes were proposed in this pull request?
   
   This PR aims to make `ExecutorPodsAllocator` extendable by switching `private` to `protected`.
   
   ### Why are the changes needed?
   
   This will help the users to use `ExternalClusterManager` interface in K8s 
environment by reusing more.
   
   ### Does this PR introduce _any_ user-facing change?
   
   No.
   
   ### How was this patch tested?
   
   Pass the CIs.





[GitHub] [spark] cloud-fan commented on pull request #39592: [SPARK-42081][SQL] Improve the plan change validation

2023-01-15 Thread GitBox


cloud-fan commented on PR #39592:
URL: https://github.com/apache/spark/pull/39592#issuecomment-1383506642

   cc @dongjinleekr @viirya @gengliangwang @dtenedor 





[GitHub] [spark] cloud-fan opened a new pull request, #39592: [SPARK-42081][SQL] Improve the plan change validation

2023-01-15 Thread GitBox


cloud-fan opened a new pull request, #39592:
URL: https://github.com/apache/spark/pull/39592

   
   
   ### What changes were proposed in this pull request?
   
   Today, we validate the plan change after each rule execution, which is very 
useful to find issues in the rule. However, it has 2 problems:
   1. it can only be used in the tests
   2. it just returns a boolean, making it hard to know what's wrong with the rule
   
   This PR addresses these 2 problems:
   1. adds a new config to allow people to turn on it to debug production 
workloads
   2. throws an exception immediately if a violation is found.
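   The fail-fast behaviour described above can be sketched as follows (hedged sketch: `PlanValidationError`, `validate_plan`, and the dict-based plans are illustrative, not Spark's actual API): instead of returning a boolean after all rules run, validation raises as soon as a rule produces an invalid plan, naming the offending rule.

```python
class PlanValidationError(Exception):
    """Raised immediately when a rule breaks a plan invariant."""


def validate_plan(plan):
    # Toy invariant: every plan must keep at least one output column.
    # Returns a description of the problem, or None if the plan is valid.
    return "plan has no output columns" if not plan["output"] else None


def execute_rules(plan, rules, validate=False):
    for rule in rules:
        plan = rule(plan)
        if validate:
            problem = validate_plan(plan)
            if problem is not None:
                # Fail fast with the rule name and the violated invariant,
                # rather than silently returning False at the end.
                raise PlanValidationError(f"rule {rule.__name__}: {problem}")
    return plan


def drop_all_columns(plan):
    # A deliberately broken rule used to demonstrate the validation.
    return {**plan, "output": []}
```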
   
   ### Why are the changes needed?
   
   To make it easier to debug queries using plan change validation.
   
   ### Does this PR introduce _any_ user-facing change?
   
   No, the new config is internal.
   
   ### How was this patch tested?
   
   updated tests





[GitHub] [spark] zhengruifeng commented on pull request #39588: [SPARK-42077][CONNECT][PYTHON] Literal should throw TypeError for unsupported DataType

2023-01-15 Thread GitBox


zhengruifeng commented on PR #39588:
URL: https://github.com/apache/spark/pull/39588#issuecomment-1383504836

   merged into master





[GitHub] [spark] zhengruifeng closed pull request #39588: [SPARK-42077][CONNECT][PYTHON] Literal should throw TypeError for unsupported DataType

2023-01-15 Thread GitBox


zhengruifeng closed pull request #39588: [SPARK-42077][CONNECT][PYTHON] Literal 
should throw TypeError for unsupported DataType
URL: https://github.com/apache/spark/pull/39588





[GitHub] [spark] dongjoon-hyun closed pull request #39589: [MINOR][CORE][TESTS] Suppress unchecked warnings in OneForOneStreamManagerSuite

2023-01-15 Thread GitBox


dongjoon-hyun closed pull request #39589: [MINOR][CORE][TESTS] Suppress 
unchecked warnings in OneForOneStreamManagerSuite
URL: https://github.com/apache/spark/pull/39589





[GitHub] [spark] dongjoon-hyun commented on pull request #39589: [MINOR][CORE][TESTS] Suppress unchecked warnings in OneForOneStreamManagerSuite

2023-01-15 Thread GitBox


dongjoon-hyun commented on PR #39589:
URL: https://github.com/apache/spark/pull/39589#issuecomment-1383503641

   Thank you. Building tests passed. Merged to master.





[GitHub] [spark] zhengruifeng commented on pull request #39587: [SPARK-42076][CONNECT][PYTHON] Factor data conversion `arrow -> rows` out to `conversion.py`

2023-01-15 Thread GitBox


zhengruifeng commented on PR #39587:
URL: https://github.com/apache/spark/pull/39587#issuecomment-1383502337

   merged into master, thanks @HyukjinKwon for the reviews!





[GitHub] [spark] zhengruifeng closed pull request #39587: [SPARK-42076][CONNECT][PYTHON] Factor data conversion `arrow -> rows` out to `conversion.py`

2023-01-15 Thread GitBox


zhengruifeng closed pull request #39587: [SPARK-42076][CONNECT][PYTHON] Factor 
data conversion `arrow -> rows` out to `conversion.py`
URL: https://github.com/apache/spark/pull/39587





[GitHub] [spark] itholic opened a new pull request, #39591: [SPARK-42078][PYTHON] Migrate errors thrown by JVM into `PySparkException`.

2023-01-15 Thread GitBox


itholic opened a new pull request, #39591:
URL: https://github.com/apache/spark/pull/39591

   ### What changes were proposed in this pull request?
   
   This PR proposes to migrate errors thrown by JVM into `PySparkException`.
   
   
   ### Why are the changes needed?
   
   Because the errors thrown by the JVM side (e.g. `AnalysisException`, `ParseException`, `IllegalArgumentException`, ...) are currently managed by `pyspark/sql/utils.py`, which is not the right place to handle PySpark errors.
   
   Now we have separate error package `pyspark.errors` which is introduced from 
https://github.com/apache/spark/pull/39387, we should centralize whole errors 
generated by PySpark side into single management point.
   
   So, here I suggest that all errors occurring in the JVM inherit 
PySparkException so that errors can be managed consistently.
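   A minimal sketch of the proposed hierarchy (an assumed simplification, not the exact `pyspark.errors` code): JVM-originated errors inherit from the single `PySparkException` base, so every PySpark error exposes the same interface.

```python
class PySparkException(Exception):
    """Single base class for all PySpark errors, whether raised in Python
    or captured from the JVM."""

    def __init__(self, message, error_class=None):
        super().__init__(message)
        self.message = message
        self._error_class = error_class

    def getErrorClass(self):
        # Uniform accessor regardless of where the error originated.
        return self._error_class


class AnalysisException(PySparkException):
    """Previously surfaced via pyspark/sql/utils.py; in the proposal it is
    part of the one centralized hierarchy."""


err = AnalysisException("cannot resolve 'x'", error_class="UNRESOLVED_COLUMN")
```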
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   No, this PR suggests to improve an internal error handling structure without 
any user-facing change.
   
   
   ### How was this patch tested?
   
   
   The existing CI should pass.
   





[GitHub] [spark] zhengruifeng opened a new pull request, #39590: [SPARK-42079][CONNECT][PYTHON] Rename proto messages for `toDF` and `withColumnsRenamed`

2023-01-15 Thread GitBox


zhengruifeng opened a new pull request, #39590:
URL: https://github.com/apache/spark/pull/39590

   ### What changes were proposed in this pull request?
   Rename proto messages for `toDF` and `withColumnsRenamed`
   
   
   ### Why are the changes needed?
   to be consistent with other messages: we usually name the proto messages 
after the corresponding logical plan or dataframe API.
   
   
   ### Does this PR introduce _any_ user-facing change?
   no, dev-only
   
   
   ### How was this patch tested?
   existing UT





[GitHub] [spark] cloud-fan commented on a diff in pull request #39558: [SPARK-41982][SQL] Partitions of type string should not be treated as numeric types

2023-01-15 Thread GitBox


cloud-fan commented on code in PR #39558:
URL: https://github.com/apache/spark/pull/39558#discussion_r1070835545


##
sql/core/src/test/scala/org/apache/spark/sql/SQLInsertTestSuite.scala:
##
@@ -351,6 +351,36 @@ trait SQLInsertTestSuite extends QueryTest with 
SQLTestUtils {
   }
 }
   }
+
+  test("SPARK-41982: When the partition is not enclosed with quotation marks, 
" +
+"the partition type is string and keepPartitionSpecAsStringLiteral is 
enabled, " +
+"we need to treat it as a string type partition") {

Review Comment:
   can we shorten the test name?






[GitHub] [spark] cloud-fan commented on a diff in pull request #39558: [SPARK-41982][SQL] Partitions of type string should not be treated as numeric types

2023-01-15 Thread GitBox


cloud-fan commented on code in PR #39558:
URL: https://github.com/apache/spark/pull/39558#discussion_r1070835291


##
sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala:
##
@@ -363,7 +363,8 @@ class SparkSqlAstBuilder extends AstBuilder {
* Convert a constants list into a String sequence.
*/
   override def visitConstantList(ctx: ConstantListContext): Seq[String] = 
withOrigin(ctx) {
-ctx.constant.asScala.map(v => visitStringConstant(v, legacyNullAsString = 
false)).toSeq
+ctx.constant.asScala.map(v => visitStringConstant(v,

Review Comment:
   can we add default parameter value for `visitStringConstant`?






[GitHub] [spark] cloud-fan commented on a diff in pull request #39558: [SPARK-41982][SQL] Partitions of type string should not be treated as numeric types

2023-01-15 Thread GitBox


cloud-fan commented on code in PR #39558:
URL: https://github.com/apache/spark/pull/39558#discussion_r1070835079


##
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala:
##
@@ -4059,6 +4059,15 @@ object SQLConf {
   .booleanConf
   .createWithDefault(true)
 
+  val LEGACY_KEEP_PARTITION_SPEC_AS_STRING_LITERAL =

Review Comment:
   Can we put it near `LEGACY_PARSE_NULL_PARTITION_SPEC_AS_STRING_LITERAL` and 
follow its config doc as well?
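To illustrate the behavior this legacy config gates, here is a hedged Python
sketch. The function and flag names are illustrative; the real logic lives in
the Scala parser (`visitStringConstant` and friends).

```python
# Illustrative model only: with the legacy flag enabled, an unquoted
# partition value such as dt=0123 is kept as the string "0123" instead of
# being narrowed to a numeric literal (which would drop the leading zero).
def parse_partition_value(raw: str, keep_as_string: bool):
    if keep_as_string:
        return raw
    try:
        return int(raw)  # numeric narrowing loses formatting like leading zeros
    except ValueError:
        return raw

assert parse_partition_value("0123", keep_as_string=True) == "0123"
assert parse_partition_value("0123", keep_as_string=False) == 123
```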






[GitHub] [spark] dongjoon-hyun opened a new pull request, #39589: [MINOR][CORE][TESTS] Suppress unchecked warnings in OneForOneStreamManagerSuite

2023-01-15 Thread GitBox


dongjoon-hyun opened a new pull request, #39589:
URL: https://github.com/apache/spark/pull/39589

   ### What changes were proposed in this pull request?
   
   This PR aims to suppress the two test code compilation warnings in 
`network-common` module.
   ```
   [warn] 
/Users/dongjoon/APACHE/spark-merge/common/network-common/src/test/java/org/apache/spark/network/server/OneForOneStreamManagerSuite.java:105:1:
  [unchecked] unchecked conversion
   [warn] Iterator<ManagedBuffer> buffers = Mockito.mock(Iterator.class);
   [warn]                                               ^
   [warn]   required: Iterator<ManagedBuffer>
   [warn]   found:    Iterator
   [warn] 
/Users/dongjoon/APACHE/spark-merge/common/network-common/src/test/java/org/apache/spark/network/server/OneForOneStreamManagerSuite.java:111:1:
  [unchecked] unchecked conversion
   [warn] Iterator<ManagedBuffer> buffers2 = Mockito.mock(Iterator.class);
   [warn]                                                ^
   [warn]   required: Iterator<ManagedBuffer>
   [warn]   found:    Iterator
   ```
   
   ### Why are the changes needed?
   
   To suppress warnings in `common` directory.
   
   ### Does this PR introduce _any_ user-facing change?
   
   No.
   
   ### How was this patch tested?
   
   Manually check the output with the following.
   ```
   $ build/sbt "network-common/test"
   ```





[GitHub] [spark] cloud-fan commented on a diff in pull request #39558: [SPARK-41982][SQL] Partitions of type string should not be treated as numeric types

2023-01-15 Thread GitBox


cloud-fan commented on code in PR #39558:
URL: https://github.com/apache/spark/pull/39558#discussion_r1070834762


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala:
##
@@ -507,14 +507,19 @@ class AstBuilder extends SqlBaseParserBaseVisitor[AnyRef] 
with SQLConfHelper wit
   ctx: PartitionSpecContext): Map[String, Option[String]] = 
withOrigin(ctx) {
 val legacyNullAsString =
   conf.getConf(SQLConf.LEGACY_PARSE_NULL_PARTITION_SPEC_AS_STRING_LITERAL)
+val keepPartitionTypeAsString =

Review Comment:
   ```suggestion
   val keepPartitionSpecAsString =
   ```






[GitHub] [spark] grundprinzip commented on a diff in pull request #39585: [WIP] Unregistered Python UDF in Spark Connect

2023-01-15 Thread GitBox


grundprinzip commented on code in PR #39585:
URL: https://github.com/apache/spark/pull/39585#discussion_r1070822033


##
python/pyspark/sql/connect/udf.py:
##
@@ -0,0 +1,184 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+"""
+User-defined function related classes and functions
+"""
+import functools
+from typing import Callable, Any, TYPE_CHECKING, Optional
+
+from pyspark import cloudpickle
+from pyspark.sql.connect.expressions import ColumnReference, PythonFunction, 
PythonUDF
+from pyspark.sql.connect.column import Column
+from pyspark.sql.types import DataType
+
+
+if TYPE_CHECKING:
+from pyspark.sql.connect._typing import (
+ColumnOrName,
+DataTypeOrString,
+UserDefinedFunctionLike,
+)
+from pyspark.sql.types import StringType
+from pyspark.sql.session import SparkSession
+
+
+def _wrap_function(

Review Comment:
   is this still called? Since the client does not have a JVM, it should not 
be. 


##
connector/connect/common/src/main/protobuf/spark/connect/expressions.proto:
##
@@ -216,6 +218,19 @@ message Expression {
 bool is_user_defined_function = 4;
   }
 
+  message PythonUDF {

Review Comment:
   I'm wondering if it makes sense to generalize this message to something like 
"session UDF" in which there is a oneof for the different languages. 
   
   This will avoid logical duplication for the Scala UDF work (and other 
languages). 
   
   @hvanhovell  what's your take?



##
connector/connect/common/src/main/protobuf/spark/connect/expressions.proto:
##
@@ -43,6 +43,8 @@ message Expression {
 Window window = 11;
 UnresolvedExtractValue unresolved_extract_value = 12;
 UpdateFields update_fields = 13;
+PythonFunction python_function = 14;

Review Comment:
   It looks like this message is never used outside the Python UDF so it should 
not be listed here. 
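   For intuition, here is a hedged sketch of the round trip a Spark Connect
Python UDF takes: the client serializes the function body, the server
deserializes and applies it. Real PySpark uses `cloudpickle` and a protobuf
`PythonUDF` message; plain `pickle` of a module-level function and the helper
names below stand in for illustration only.

```python
# Illustrative sketch of client-side UDF serialization for transport.
import pickle

def add_one(x):
    return x + 1

def serialize_udf(func, return_type):
    # Client side: pack the function and its declared return type into bytes.
    return pickle.dumps((func, return_type))

def deserialize_udf(payload):
    # Server side: recover the function and return type from the wire bytes.
    return pickle.loads(payload)

func, rtype = deserialize_udf(serialize_udf(add_one, "int"))
assert func(41) == 42 and rtype == "int"
```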






[GitHub] [spark] tedyu closed pull request #39586: [MINOR] Drop saslTimeoutSeen

2023-01-15 Thread GitBox


tedyu closed pull request #39586: [MINOR] Drop saslTimeoutSeen
URL: https://github.com/apache/spark/pull/39586





[GitHub] [spark] databricks-david-lewis commented on pull request #39488: [SPARK-41970] Introduce SparkPath for typesafety

2023-01-15 Thread GitBox


databricks-david-lewis commented on PR #39488:
URL: https://github.com/apache/spark/pull/39488#issuecomment-1383440531

   Thanks for the comment @JoshRosen! You put it very well.
   
   One of the other places I wanted to use `SparkPath` was in the streaming 
interfaces (`FileStreamSink`, `FileStreamSource`, `SinkFileStatus`, and maybe 
more).
   
   I also wanted to standardize on the currently expected behavior of users 
providing hadoop-path-strings in SQL statements (table locations, SQLOnFiles, 
DataFrame-Reader/-Writer, etc). Right now they all (I think) end up in a `new 
Path(_)`, but could benefit from type safety.
   
   @mridulm , this PR is ready for review I think. Please take a look. I think 
the sooner we can get it into a Spark minor version the better. Even if that 
means we are incrementally changing the internal interfaces in subsequent minor 
versions.





[GitHub] [spark] cloud-fan commented on a diff in pull request #37525: [WIP][SPARK-40086][SPARK-42049][SQL] Improve AliasAwareOutputPartitioning and AliasAwareQueryOutputOrdering to take all aliases i

2023-01-15 Thread GitBox


cloud-fan commented on code in PR #37525:
URL: https://github.com/apache/spark/pull/37525#discussion_r1070800149


##
sql/core/src/main/scala/org/apache/spark/sql/execution/AliasAwareOutputExpression.scala:
##
@@ -16,24 +16,53 @@
  */
 package org.apache.spark.sql.execution
 
-import org.apache.spark.sql.catalyst.expressions.{Alias, Expression, 
NamedExpression, SortOrder}
-import org.apache.spark.sql.catalyst.plans.physical.{HashPartitioning, 
Partitioning, PartitioningCollection, UnknownPartitioning}
+import scala.collection.mutable
+
+import org.apache.spark.sql.catalyst.expressions.{Alias, Attribute, 
Expression, NamedExpression, SortOrder}
+import org.apache.spark.sql.catalyst.plans.physical.{Partitioning, 
PartitioningCollection, UnknownPartitioning}
+import org.apache.spark.sql.internal.SQLConf
 
 /**
  * A trait that provides functionality to handle aliases in the 
`outputExpressions`.
  */
 trait AliasAwareOutputExpression extends UnaryExecNode {
   protected def outputExpressions: Seq[NamedExpression]
 
-  private lazy val aliasMap = outputExpressions.collect {
-case a @ Alias(child, _) => child.canonicalized -> a.toAttribute
-  }.toMap
+  private lazy val aliasMap = {
+val aliases = mutable.Map[Expression, mutable.ListBuffer[Attribute]]()
+// Add aliases to the map. If multiple aliases are defined for a source 
attribute then add all of them.
+outputExpressions.foreach {
+  case a @ Alias(child, _) =>
+// This prepend is needed to make the first element of the 
`ListBuffer` point to the last
+// occurrence of an aliased child. This is to keep the previous 
behavior:
+// - when we return `Partitioning`s in `outputPartitioning()`, the 
first should be same as
+//   it was previously
+// - when we return a `SortOrder` in `outputOrdering()`, it should be the 
same as
+//   previously
+a.toAttribute +=: aliases.getOrElseUpdate(child.canonicalized, 
mutable.ListBuffer.empty)
+  case _ =>
+}
+// Append identity mapping of an attribute to the map if both the 
attribute and its aliased
+// version can be found in `outputExpressions`.
+outputExpressions.foreach {
+  case a: Attribute if aliases.contains(a.canonicalized) => 
aliases(a.canonicalized) += a
+  case _ =>
+}
+aliases
+  }
 
   protected def hasAlias: Boolean = aliasMap.nonEmpty
 
-  protected def normalizeExpression(exp: Expression): Expression = {
+  // This function returns a limited number of alternatives to mapped `exp`.
+  protected def normalizeExpression(exp: Expression): Seq[Expression] = {
+exp.multiTransform {
+  case e: Expression if aliasMap.contains(e.canonicalized) => 
aliasMap(e.canonicalized).toSeq

Review Comment:
   Yea the key is that `multiTransform` allows you to return the original expr 
so that it can be expanded again with other mappings.
   ```
   input.multiTransform {
 case e if exprAliasMap.contains(e) => e +: exprAliasMap(e)
 case e if attrAliasMap.contains(e) => attrAliasMap(e)
   }
   ```
   @peter-toth do you agree that it's better to let the caller decide if the 
original expr should be included or not? It seems currently the framework 
always include the original expr.
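   A toy Python model of the `multiTransform` behavior under discussion may
help. The real API works on Catalyst expression trees in Scala; the dict-based
version below is an illustration only. Note how including the original
expression among a mapping's alternatives is exactly what lets it be expanded
again by other mappings.

```python
# Illustrative sketch: expand an expression into all alternatives under an
# alias map. Expressions are modeled as tuples like ('add', 'a', 'b').
from itertools import product

def multi_transform(expr, alias_map):
    if isinstance(expr, str):
        # Leaf: return its alternatives, or just itself if unmapped.
        return alias_map.get(expr, [expr])
    op, *args = expr
    # Expand each argument independently, then take the cross product.
    alternatives = [multi_transform(a, alias_map) for a in args]
    return [(op, *combo) for combo in product(*alternatives)]

# 'a' is aliased to both 'x' and 'y'; the caller chose to keep 'a' itself too.
out = multi_transform(('add', 'a', 'b'), {'a': ['a', 'x', 'y']})
assert out == [('add', 'a', 'b'), ('add', 'x', 'b'), ('add', 'y', 'b')]
```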






[GitHub] [spark] databricks-david-lewis commented on a diff in pull request #39488: [SPARK-41970] Introduce SparkPath for typesafety

2023-01-15 Thread GitBox


databricks-david-lewis commented on code in PR #39488:
URL: https://github.com/apache/spark/pull/39488#discussion_r1070796985


##
core/src/main/scala/org/apache/spark/paths/SparkPath.scala:
##
@@ -0,0 +1,40 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.paths
+
+import java.net.URI
+
+import org.apache.hadoop.fs.{FileStatus, Path}
+
+/**
+ * A SparkPath is the result of a URI.toString call.
+ */
+case class SparkPath private (private val underlying: String) {
+  def uriEncoded: String = underlying
+  def toUri: URI = new URI(underlying)
+  def toPath: Path = new Path(toUri)
+  override def toString: String = underlying
+}
+
+object SparkPath {
+  def fromPathString(str: String): SparkPath = fromPath(new Path(str))

Review Comment:
   I have revamped the comments in this class, please let me know what you 
think.
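   For intuition, a rough Python analogue of the `SparkPath` idea quoted above
(a small typed wrapper whose invariant is that its string is already
URI-encoded). This is a sketch, not the Scala API; the class and method names
are illustrative.

```python
# Illustrative sketch: a wrapper that records that its string is URI-encoded,
# so spaces appear as %20 and never literally.
from dataclasses import dataclass
from urllib.parse import urlparse

@dataclass(frozen=True)
class UriEncodedPath:
    underlying: str  # invariant: the result of a URI.toString-style encoding

    def uri_encoded(self) -> str:
        return self.underlying

    def scheme(self) -> str:
        return urlparse(self.underlying).scheme

p = UriEncodedPath("s3://bucket/dir%20with%20space/file")
assert p.scheme() == "s3"
assert "%20" in p.uri_encoded()
```

   The value of such a wrapper is that code receiving a `UriEncodedPath` no
longer has to guess whether a given string still needs encoding.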






[GitHub] [spark] ulysses-you commented on pull request #39571: [SPARK-42064][SQL] Implement bloom filter join hint

2023-01-15 Thread GitBox


ulysses-you commented on PR #39571:
URL: https://github.com/apache/spark/pull/39571#issuecomment-1383430507

   I mean something like https://github.com/apache/spark/pull/32355, e.g. we 
cannot prune the left side of a left outer join, so we should log a warning if 
the bloom_filter_join hint points to the left side. There are lots of limits 
that prevent injecting a runtime filter even when we have the bloom filter hint.





[GitHub] [spark] cloud-fan commented on pull request #35969: [SPARK-38651][SQL] Add `spark.sql.legacy.allowEmptySchemaWrite`

2023-01-15 Thread GitBox


cloud-fan commented on PR #35969:
URL: https://github.com/apache/spark/pull/35969#issuecomment-1383428417

   sorry for the late review. If this is a valid use case, shall we just allow 
it in certain file formats like ORC? We can pass the entire query schema to 
`FileFormat.supportsDataType` and only fail an empty `StructType` if we are 
sure the format doesn't support it (Parquet is one of them).





[GitHub] [spark] databricks-david-lewis commented on a diff in pull request #39488: [SPARK-41970] Introduce SparkPath for typesafety

2023-01-15 Thread GitBox


databricks-david-lewis commented on code in PR #39488:
URL: https://github.com/apache/spark/pull/39488#discussion_r1070791606


##
common/network-common/src/main/java/org/apache/spark/network/util/JavaUtils.java:
##
@@ -383,6 +383,7 @@ public static File createTempDir() throws IOException {
   public static File createTempDir(String root, String namePrefix) throws 
IOException {
 if (root == null) root = System.getProperty("java.io.tmpdir");
 if (namePrefix == null) namePrefix = "spark";
+namePrefix = namePrefix + " with a space";

Review Comment:
   In my brief search I found that they all came to this method in the end 
(withTempDir, etc). I'm actually undoing this change for now, because it causes 
too many failures and I don't have the bandwidth to fix the tests right now.
   Here is a PR in my branch where I made only this change: 
https://github.com/databricks-david-lewis/spark-opensource/pull/2.
   I am hoping to fix the problems with spark's tests, and then propose that as 
a PR, but I don't know when I will be able to.
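   The testing idea behind the quoted diff can be sketched in Python: force a
space into every temp-directory name so any code that round-trips paths
through URIs must handle percent-encoding correctly. The helper name below is
illustrative, not Spark's API.

```python
# Illustrative sketch: temp dirs whose names contain a space surface
# path-encoding bugs in tests that would otherwise pass by accident.
import os
import tempfile

def create_temp_dir(prefix="spark"):
    # Mirror the experiment in the diff: append " with a space" to the prefix.
    return tempfile.mkdtemp(prefix=prefix + " with a space")

d = create_temp_dir()
assert " " in os.path.basename(d)
assert os.path.isdir(d)
```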






[GitHub] [spark] zhengruifeng opened a new pull request, #39588: [SPARK-42077][CONNECT][PYTHON] Literal should throw TypeError for unsupported DataType

2023-01-15 Thread GitBox


zhengruifeng opened a new pull request, #39588:
URL: https://github.com/apache/spark/pull/39588

   ### What changes were proposed in this pull request?
   Make `Literal` throw TypeError for unsupported DataType
   
   
   ### Why are the changes needed?
   Literal should throw TypeError for unsupported DataType
   
   
   ### Does this PR introduce _any_ user-facing change?
   yes
   
   ### How was this patch tested?
   updated UT
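   
   The proposed behavior can be sketched as follows. The supported set and the
`lit` helper below are assumptions for illustration, not the exact list or API
used by Spark Connect.

```python
# Illustrative sketch: raise TypeError when a literal is built from an
# unsupported Python type instead of failing later with an obscure error.
import datetime
import decimal

_SUPPORTED_LITERAL_TYPES = (
    bool, int, float, str, bytes, bytearray,
    datetime.date, datetime.datetime, decimal.Decimal,
)

def lit(value):
    if value is not None and not isinstance(value, _SUPPORTED_LITERAL_TYPES):
        raise TypeError(f"Unsupported literal type: {type(value).__name__}")
    return value

try:
    lit(object())
    raised = False
except TypeError:
    raised = True
assert raised and lit(42) == 42
```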
   





[GitHub] [spark] databricks-david-lewis commented on a diff in pull request #39488: [SPARK-41970] Introduce SparkPath for typesafety

2023-01-15 Thread GitBox


databricks-david-lewis commented on code in PR #39488:
URL: https://github.com/apache/spark/pull/39488#discussion_r1070790755


##
core/src/main/scala/org/apache/spark/paths/SparkPath.scala:
##
@@ -0,0 +1,40 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.paths
+
+import java.net.URI
+
+import org.apache.hadoop.fs.{FileStatus, Path}
+
+/**
+ * A SparkPath is the result of a URI.toString call.
+ */
+case class SparkPath private (private val underlying: String) {
+  def uriEncoded: String = underlying

Review Comment:
   I don't feel strongly. I went with "uri" because it was from a 
`java.net.URI`. I will change it to `urlEncoded`.






[GitHub] [spark] cloud-fan commented on a diff in pull request #39131: [SPARK-41162][SQL] Fix anti- and semi-join for self-join with aggregations

2023-01-15 Thread GitBox


cloud-fan commented on code in PR #39131:
URL: https://github.com/apache/spark/pull/39131#discussion_r1070790235


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/PushDownLeftSemiAntiJoin.scala:
##
@@ -61,9 +61,10 @@ object PushDownLeftSemiAntiJoin extends Rule[LogicalPlan]
   }
 
 // LeftSemi/LeftAnti over Aggregate, only push down if join can be planned 
as broadcast join.
-case join @ Join(agg: Aggregate, rightOp, LeftSemiOrAnti(_), _, _)
+case join @ Join(agg: Aggregate, rightOp, LeftSemiOrAnti(_), joinCond, _)
 if agg.aggregateExpressions.forall(_.deterministic) && 
agg.groupingExpressions.nonEmpty &&
   
!agg.aggregateExpressions.exists(ScalarSubquery.hasCorrelatedScalarSubquery) &&
+  canPushThroughCondition(agg.children, joinCond, rightOp) &&

Review Comment:
   This is probably an existing issue. If the `Aggregate` does `a as x, b as 
y`, then the join condition won't reference `a` and `b` and this check is 
useless.
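   A toy illustration of why aliases defeat a reference-based check (Python
model only; Spark's check operates on attribute references in Scala):

```python
# Illustrative sketch: if the Aggregate renames `a` to `x`, a join condition
# referencing `x` never mentions `a`, so intersecting the condition's
# references with the aggregate's input columns finds nothing.
def cond_references_agg_inputs(join_cond_refs, agg_input_cols):
    return bool(join_cond_refs & agg_input_cols)

# SELECT a AS x ... : the join condition uses `x`, not `a`.
assert cond_references_agg_inputs({"x"}, {"a", "b"}) is False
# Without the alias, the overlap is detected.
assert cond_references_agg_inputs({"a"}, {"a", "b"}) is True
```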






[GitHub] [spark] zhengruifeng opened a new pull request, #39587: [SPARK-42076][CONNECT][PYTHON] Factor data conversion `arrow -> rows` out to `conversion.py`

2023-01-15 Thread GitBox


zhengruifeng opened a new pull request, #39587:
URL: https://github.com/apache/spark/pull/39587

   ### What changes were proposed in this pull request?
   Factor data conversion `arrow -> rows` out to `conversion.py`
   
   
   ### Why are the changes needed?
   to make it clearer, and also more consistent with PySpark
   
   
   ### Does this PR introduce _any_ user-facing change?
   no, refactor-only
   
   
   ### How was this patch tested?
   existing UT
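   The refactor concerns converting columnar Arrow results into rows. As a
hedged, dependency-free sketch of that direction of conversion (real code
operates on `pyarrow` Tables and produces `Row` objects; a plain dict of
columns stands in for an Arrow batch here):

```python
# Illustrative sketch: turn a columnar batch into a list of row tuples.
def columns_to_rows(columns):
    names = list(columns)
    n = len(next(iter(columns.values()), []))
    return [tuple(columns[name][i] for name in names) for i in range(n)]

rows = columns_to_rows({"id": [1, 2], "name": ["a", "b"]})
assert rows == [(1, "a"), (2, "b")]
```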





[GitHub] [spark] cloud-fan commented on a diff in pull request #39564: [SPARK-41990][SQL] Use `FieldReference.column` instead of `apply` in V1 to V2 filter conversion

2023-01-15 Thread GitBox


cloud-fan commented on code in PR #39564:
URL: https://github.com/apache/spark/pull/39564#discussion_r107072


##
sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCSuite.scala:
##
@@ -1963,4 +1977,9 @@ class JDBCSuite extends QueryTest with SharedSparkSession 
{
   }
 }
   }
+
+  test("SPARK-41990: Filter with composite name") {
+val df = sql("SELECT * FROM composite_name WHERE `last name` = 'smith'")

Review Comment:
   is it because we always disable nested predicate pushdown for JDBC v1?






[GitHub] [spark] cloud-fan commented on a diff in pull request #39564: [SPARK-41990][SQL] Use `FieldReference.column` instead of `apply` in V1 to V2 filter conversion

2023-01-15 Thread GitBox


cloud-fan commented on code in PR #39564:
URL: https://github.com/apache/spark/pull/39564#discussion_r1070788648


##
sql/catalyst/src/main/scala/org/apache/spark/sql/sources/filters.scala:
##
@@ -74,6 +75,15 @@ sealed abstract class Filter {
* Converts V1 filter to V2 filter
*/
   private[sql] def toV2: Predicate
+
+  protected def toV2Column(attribute: String): NamedReference = {
+try {
+  FieldReference(attribute)

Review Comment:
   We should add some comments to explain it.
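   A toy model of the distinction at play (illustrative only; the real parsing
is done by `FieldReference` in Scala): the `apply`-style constructor parses
the string as a multi-part identifier, while `column` treats the whole string
as one column name, which matters for names containing spaces or dots.

```python
def field_reference_apply(name: str):
    # apply-style: parse as a dotted multi-part identifier (toy version;
    # the real parser also handles backquoted parts, which this sketch omits).
    return name.split(".")

def field_reference_column(name: str):
    # column-style: the whole string is a single column name.
    return [name]

assert field_reference_apply("a.b") == ["a", "b"]
assert field_reference_column("last name") == ["last name"]
```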






[GitHub] [spark] wangyum commented on pull request #39571: [SPARK-42064][SQL] Implement bloom filter join hint

2023-01-15 Thread GitBox


wangyum commented on PR #39571:
URL: https://github.com/apache/spark/pull/39571#issuecomment-1383418899

   > Make sense to me. Shall we show some warn logs if the hint can not work ? 
We do the similar things for other join hints.
   
   Please see `ResolveJoinStrategyHints`.





[GitHub] [spark] yaooqinn commented on a diff in pull request #39555: [SPARK-42051][SQL] Codegen Support for HiveGenericUDF

2023-01-15 Thread GitBox


yaooqinn commented on code in PR #39555:
URL: https://github.com/apache/spark/pull/39555#discussion_r1070776840


##
sql/hive/src/main/scala/org/apache/spark/sql/hive/hiveUDFs.scala:
##
@@ -128,11 +129,10 @@ private[hive] class DeferredObjectAdapter(oi: 
ObjectInspector, dataType: DataTyp
   override def get(): AnyRef = wrapper(func()).asInstanceOf[AnyRef]
 }
 
-private[hive] case class HiveGenericUDF(
+case class HiveGenericUDF(

Review Comment:
   It seems quite complicated to handle the Hive value mapping and function wrapping.






[GitHub] [spark] LuciferYang commented on pull request #39584: [SPARK-42074][SQL] Enable `KryoSerializer` in `TPCDSQueryBenchmark` to enforce SQL class registration

2023-01-15 Thread GitBox


LuciferYang commented on PR #39584:
URL: https://github.com/apache/spark/pull/39584#issuecomment-1383392745

   Late LGTM





[GitHub] [spark] HyukjinKwon commented on pull request #38376: [SPARK-40817][K8S] `spark.files` should preserve remote files

2023-01-15 Thread GitBox


HyukjinKwon commented on PR #38376:
URL: https://github.com/apache/spark/pull/38376#issuecomment-1383390931

   > @antonipp Is this related to https://github.com/apache/spark/pull/39488 ?
   
   I don't think this is related. `isLocalDependency` itself may be related, because it assumes that a `null` scheme means a local file (which is arguably wrong, since it assumes the default URL scheme is `file`), but this PR itself isn't related to that.
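   The scheme-based "is this a local dependency?" check being questioned can be sketched as follows — a hedged Python stand-in, not Spark's actual Scala implementation, and `is_local_dependency` is a hypothetical name:

   ```python
   # Sketch: classify a dependency URI as local by its scheme. Treating a
   # missing scheme as "file" is exactly the assumption being questioned
   # above: the effective default scheme may be configured differently.
   from urllib.parse import urlparse

   def is_local_dependency(uri: str) -> bool:
       scheme = urlparse(uri).scheme
       return scheme in ("", "file")

   assert is_local_dependency("/tmp/app.jar")          # no scheme -> "local"
   assert is_local_dependency("file:///tmp/app.jar")
   assert not is_local_dependency("hdfs:///tmp/app.jar")
   assert not is_local_dependency("s3a://bucket/app.jar")
   ```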





[GitHub] [spark] HyukjinKwon commented on a diff in pull request #38376: [SPARK-40817][K8S] `spark.files` should preserve remote files

2023-01-15 Thread GitBox


HyukjinKwon commented on code in PR #38376:
URL: https://github.com/apache/spark/pull/38376#discussion_r1070769064


##
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/BasicDriverFeatureStep.scala:
##
@@ -168,27 +168,27 @@ private[spark] class BasicDriverFeatureStep(conf: 
KubernetesDriverConf)
   MEMORY_OVERHEAD_FACTOR.key -> defaultOverheadFactor.toString)
 // try upload local, resolvable files to a hadoop compatible file system
 Seq(JARS, FILES, ARCHIVES, SUBMIT_PYTHON_FILES).foreach { key =>
-  val uris = conf.get(key).filter(uri => 
KubernetesUtils.isLocalAndResolvable(uri))
+  val (localUris, remoteUris) =
+conf.get(key).partition(uri => 
KubernetesUtils.isLocalAndResolvable(uri))
   val value = {
 if (key == ARCHIVES) {

Review Comment:
   In the case of archives, it should also resolve the URL properly for `remoteUris` in order to remove fragments (e.g., `file:///tmp/tmp4542734800151332666.txt.tar.gz#test_tar_gz`). Otherwise, it would fail to upload the file.
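   Removing the fragment before upload amounts to dropping everything after the `#`. A minimal Python sketch of that step (illustrative only, not the code path Spark actually uses):

   ```python
   # Strip the URI fragment (the "#test_tar_gz" suffix used to rename an
   # unpacked archive) so the remaining URI points at the uploadable file.
   from urllib.parse import urldefrag

   def strip_fragment(uri: str) -> str:
       url, _fragment = urldefrag(uri)
       return url

   assert (strip_fragment("file:///tmp/archive.tar.gz#test_tar_gz")
           == "file:///tmp/archive.tar.gz")
   assert strip_fragment("hdfs:///data/a.zip") == "hdfs:///data/a.zip"
   ```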






[GitHub] [spark] ulysses-you commented on pull request #39571: [SPARK-42064][SQL] Implement bloom filter join hint

2023-01-15 Thread GitBox


ulysses-you commented on PR #39571:
URL: https://github.com/apache/spark/pull/39571#issuecomment-1383386714

   Makes sense to me. Shall we show some warning logs if the hint cannot work? We do similar things for other join hints.





[GitHub] [spark] tedyu opened a new pull request, #39586: [MINOR] Drop saslTimeoutSeen

2023-01-15 Thread GitBox


tedyu opened a new pull request, #39586:
URL: https://github.com/apache/spark/pull/39586

   ### What changes were proposed in this pull request?
   This PR proposes to drop the `saslTimeoutSeen` field from `RetryingBlockTransferor`.
   
   ### Why are the changes needed?
   This field doesn't affect the return value for `shouldRetry`.
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   ### How was this patch tested?
   Existing test suite.





[GitHub] [spark] ulysses-you commented on a diff in pull request #37525: [WIP][SPARK-40086][SPARK-42049][SQL] Improve AliasAwareOutputPartitioning and AliasAwareQueryOutputOrdering to take all aliases

2023-01-15 Thread GitBox


ulysses-you commented on code in PR #37525:
URL: https://github.com/apache/spark/pull/37525#discussion_r1070753260


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/AliasAwareOutputExpression.scala:
##
@@ -70,53 +66,16 @@ trait AliasAwareOutputExpression extends SQLConfHelper {
   protected def normalizeExpression(
   expr: Expression,
   pruneFunc: (Expression, AttributeSet) => Option[Expression]): 
Seq[Expression] = {
-val normalizedCandidates = new mutable.HashSet[Expression]()
-normalizedCandidates.add(expr)
 val outputSet = AttributeSet(outputExpressions.map(_.toAttribute))
-
-def pruneCandidate(candidate: Expression): Option[Expression] = {
+expr.multiTransform {

Review Comment:
   According to my usage, `multiTransform` should support at least 4 kinds of pruning:
   - a maximum limit on the size of the returned result
   - prune a result whose references are not a subset of the output
   - prune an intermediate result if the alias map does not contain any other sub-expression
   - prune a sub-expression, e.g. `PartitionCollection(a, b)` -> `PartitionCollection(a)` if `b` is not a subset of the output
   
   If all these requirements can be met, I think it's good to switch to `multiTransform`.
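   The requirements above — multiple candidate rewrites per node, an eager pruning function, and a cap on the result size — can be sketched with a toy multi-transform over nested tuples. The names and structure here are hypothetical, loosely mirroring (not reproducing) Catalyst's `multiTransform`:

   ```python
   # Toy "multiTransform": rewrite an expression bottom-up where each leaf
   # may map to several alternatives, dropping candidates a pruning
   # predicate rejects and capping the number of returned expressions.
   from itertools import product

   def multi_transform(expr, alternatives, prune, limit):
       """expr: nested tuples like ("add", "a", "b"); leaves are strings.
       alternatives: dict mapping a leaf to its candidate rewrites.
       prune: predicate; candidates failing it are dropped eagerly.
       limit: max number of rewritten expressions to return."""
       if isinstance(expr, str):
           cands = alternatives.get(expr, [expr])
           return [c for c in cands if prune(c)][:limit]
       op, *children = expr
       child_options = [multi_transform(c, alternatives, prune, limit)
                        for c in children]
       results = []
       for combo in product(*child_options):
           results.append((op, *combo))
           if len(results) >= limit:   # enforce the max-result limit
               break
       return results

   alts = {"a": ["x", "y"], "b": ["z", "w"]}
   out = multi_transform(("add", "a", "b"), alts,
                         prune=lambda c: c != "w", limit=3)
   assert out == [("add", "x", "z"), ("add", "y", "z")]
   ```

   Here pruning happens as soon as a candidate leaf is generated, rather than after building the full cross product, which is the "eager pruning" behavior the comment asks for.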






[GitHub] [spark] ulysses-you commented on a diff in pull request #37525: [WIP][SPARK-40086][SPARK-42049][SQL] Improve AliasAwareOutputPartitioning and AliasAwareQueryOutputOrdering to take all aliases

2023-01-15 Thread GitBox


ulysses-you commented on code in PR #37525:
URL: https://github.com/apache/spark/pull/37525#discussion_r1070753260


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/AliasAwareOutputExpression.scala:
##
@@ -70,53 +66,16 @@ trait AliasAwareOutputExpression extends SQLConfHelper {
   protected def normalizeExpression(
   expr: Expression,
   pruneFunc: (Expression, AttributeSet) => Option[Expression]): 
Seq[Expression] = {
-val normalizedCandidates = new mutable.HashSet[Expression]()
-normalizedCandidates.add(expr)
 val outputSet = AttributeSet(outputExpressions.map(_.toAttribute))
-
-def pruneCandidate(candidate: Expression): Option[Expression] = {
+expr.multiTransform {

Review Comment:
   According to my usage, `multiTransform` should support at least 3 kinds of pruning:
   1. a maximum limit on the size of the returned result
   2. an eager pruning function that can:
   - prune a result whose references are not a subset of the output
   - prune an intermediate result if the alias map does not contain any other sub-expression
   - prune a sub-expression, e.g. `PartitionCollection(a, b)` -> `PartitionCollection(a)` if `b` is not a subset of the output
   
   If all these requirements can be met, I think it's good to switch to `multiTransform`.






[GitHub] [spark] xinrong-meng opened a new pull request, #39585: [WIP] Unregistered Python UDF in Spark Connect

2023-01-15 Thread GitBox


xinrong-meng opened a new pull request, #39585:
URL: https://github.com/apache/spark/pull/39585

   ### What changes were proposed in this pull request?
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   





[GitHub] [spark] HyukjinKwon closed pull request #39387: [SPARK-41586][PYTHON] Introduce `pyspark.errors` and error classes for PySpark.

2023-01-15 Thread GitBox


HyukjinKwon closed pull request #39387: [SPARK-41586][PYTHON] Introduce 
`pyspark.errors` and error classes for PySpark.
URL: https://github.com/apache/spark/pull/39387





[GitHub] [spark] HyukjinKwon commented on pull request #39387: [SPARK-41586][PYTHON] Introduce `pyspark.errors` and error classes for PySpark.

2023-01-15 Thread GitBox


HyukjinKwon commented on PR #39387:
URL: https://github.com/apache/spark/pull/39387#issuecomment-1383328922

   Merged to master.





[GitHub] [spark] beliefer commented on pull request #39091: [SPARK-41527][CONNECT][PYTHON] Implement `DataFrame.observe`

2023-01-15 Thread GitBox


beliefer commented on PR #39091:
URL: https://github.com/apache/spark/pull/39091#issuecomment-1383316479

   ping @hvanhovell 





[GitHub] [spark] HyukjinKwon commented on pull request #39572: [SPARK-39979][SQL] Add option to use large variable width vectors for arrow UDF operations

2023-01-15 Thread GitBox


HyukjinKwon commented on PR #39572:
URL: https://github.com/apache/spark/pull/39572#issuecomment-1383312668

   cc @BryanCutler @viirya @ueshin FYI





[GitHub] [spark] HyukjinKwon closed pull request #39570: [SPARK-41903][CONNECT][PYTHON] `Literal` should support 1-dim ndarray

2023-01-15 Thread GitBox


HyukjinKwon closed pull request #39570: [SPARK-41903][CONNECT][PYTHON] 
`Literal` should support 1-dim ndarray
URL: https://github.com/apache/spark/pull/39570





[GitHub] [spark] HyukjinKwon commented on pull request #39570: [SPARK-41903][CONNECT][PYTHON] `Literal` should support 1-dim ndarray

2023-01-15 Thread GitBox


HyukjinKwon commented on PR #39570:
URL: https://github.com/apache/spark/pull/39570#issuecomment-1383310216

   Merged to master.





[GitHub] [spark] HyukjinKwon closed pull request #39583: [SPARK-42073][CONNECT][PYTHON][TESTS] Enable tests in common/test_parity_serde, common/test_parity_types

2023-01-15 Thread GitBox


HyukjinKwon closed pull request #39583: [SPARK-42073][CONNECT][PYTHON][TESTS] 
Enable tests in common/test_parity_serde, common/test_parity_types
URL: https://github.com/apache/spark/pull/39583





[GitHub] [spark] wangyum commented on pull request #39571: [SPARK-42064][SQL] Implement bloom filter join hint

2023-01-15 Thread GitBox


wangyum commented on PR #39571:
URL: https://github.com/apache/spark/pull/39571#issuecomment-1383309269

@sigmod @cloud-fan





[GitHub] [spark] HyukjinKwon commented on pull request #39583: [SPARK-42073][CONNECT][PYTHON][TESTS] Enable tests in common/test_parity_serde, common/test_parity_types

2023-01-15 Thread GitBox


HyukjinKwon commented on PR #39583:
URL: https://github.com/apache/spark/pull/39583#issuecomment-1383309267

   Merged to master.





[GitHub] [spark] mridulm commented on pull request #39580: [MINOR] [FOLLOW-UP] Throw Timeout exception when SASL Request Retry is enabled

2023-01-15 Thread GitBox


mridulm commented on PR #39580:
URL: https://github.com/apache/spark/pull/39580#issuecomment-1383309053

   Thanks for the interest, @tedyu!
   Please feel free to ping me in case there are other improvements or bug fixes you contribute.





[GitHub] [spark] HyukjinKwon closed pull request #39582: [SPARK-41921][CONNECT][TESTS][FOLLOW-UP] Enable doctests in connect/column and connect/readwriter

2023-01-15 Thread GitBox


HyukjinKwon closed pull request #39582: 
[SPARK-41921][CONNECT][TESTS][FOLLOW-UP] Enable doctests in connect/column and 
connect/readwriter
URL: https://github.com/apache/spark/pull/39582





[GitHub] [spark] HyukjinKwon commented on pull request #39582: [SPARK-41921][CONNECT][TESTS][FOLLOW-UP] Enable doctests in connect/column and connect/readwriter

2023-01-15 Thread GitBox


HyukjinKwon commented on PR #39582:
URL: https://github.com/apache/spark/pull/39582#issuecomment-1383303216

   Actually https://github.com/apache/spark/pull/39581 had the same change. I 
merged that instead.





[GitHub] [spark] HyukjinKwon commented on pull request #39582: [SPARK-41921][CONNECT][TESTS][FOLLOW-UP] Enable doctests in connect/column and connect/readwriter

2023-01-15 Thread GitBox


HyukjinKwon commented on PR #39582:
URL: https://github.com/apache/spark/pull/39582#issuecomment-1383301651

   Merged to master.





[GitHub] [spark] HyukjinKwon closed pull request #39581: [SPARK-42011][SPARK-42012][CONNECT][PYTHON][TESTS][FOLLOW-UP] Enable csv, orc tests in connect/test_parity_datasources.py

2023-01-15 Thread GitBox


HyukjinKwon closed pull request #39581: 
[SPARK-42011][SPARK-42012][CONNECT][PYTHON][TESTS][FOLLOW-UP] Enable csv, orc 
tests in connect/test_parity_datasources.py
URL: https://github.com/apache/spark/pull/39581





[GitHub] [spark] HyukjinKwon commented on pull request #39581: [SPARK-42011][SPARK-42012][CONNECT][PYTHON][TESTS][FOLLOW-UP] Enable csv, orc tests in connect/test_parity_datasources.py

2023-01-15 Thread GitBox


HyukjinKwon commented on PR #39581:
URL: https://github.com/apache/spark/pull/39581#issuecomment-1383301343

   Merged to master.





[GitHub] [spark] github-actions[bot] closed pull request #37822: [SPARK-40381][DEPLOY] Support standalone worker recommission

2023-01-15 Thread GitBox


github-actions[bot] closed pull request #37822: [SPARK-40381][DEPLOY] Support 
standalone worker recommission
URL: https://github.com/apache/spark/pull/37822





[GitHub] [spark] github-actions[bot] closed pull request #38073: [SPARK-40632][SQL] Do not inject runtime filter if join condition reference non simple expression

2023-01-15 Thread GitBox


github-actions[bot] closed pull request #38073: [SPARK-40632][SQL] Do not 
inject runtime filter if join condition reference non simple expression
URL: https://github.com/apache/spark/pull/38073





[GitHub] [spark] github-actions[bot] closed pull request #37809: [SPARK-40355][SQL] Improve pushdown for orc & parquet when cast scenario

2023-01-15 Thread GitBox


github-actions[bot] closed pull request #37809: [SPARK-40355][SQL] Improve 
pushdown for orc & parquet when cast scenario
URL: https://github.com/apache/spark/pull/37809





[GitHub] [spark] github-actions[bot] commented on pull request #38092: [SPARK-40647][CORE] DAGScheduler should fail job until all related running tasks have been killed

2023-01-15 Thread GitBox


github-actions[bot] commented on PR #38092:
URL: https://github.com/apache/spark/pull/38092#issuecomment-1383298058

   We're closing this PR because it hasn't been updated in a while. This isn't 
a judgement on the merit of the PR in any way. It's just a way of keeping the 
PR queue manageable.
   If you'd like to revive this PR, please reopen it and ask a committer to 
remove the Stale tag!





[GitHub] [spark] github-actions[bot] closed pull request #37996: [SPARK-40558][SQL] Add Reusable Exchange in Bloom creation side plan

2023-01-15 Thread GitBox


github-actions[bot] closed pull request #37996: [SPARK-40558][SQL] Add Reusable 
Exchange in Bloom creation side plan
URL: https://github.com/apache/spark/pull/37996





[GitHub] [spark] dongjoon-hyun commented on pull request #39584: [SPARK-42074][SQL] Enable `KryoSerializer` in `TPCDSQueryBenchmark` to enforce SQL class registration

2023-01-15 Thread GitBox


dongjoon-hyun commented on PR #39584:
URL: https://github.com/apache/spark/pull/39584#issuecomment-1383296513

   Merged to master for Apache Spark 3.4.0.




