[GitHub] [spark] grundprinzip commented on a diff in pull request #39016: [SPARK-41477][CONNECT][PYTHON] Correctly infer the datatype of literal integers

2022-12-09 Thread GitBox


grundprinzip commented on code in PR #39016:
URL: https://github.com/apache/spark/pull/39016#discussion_r1045017515


##
python/pyspark/sql/connect/column.py:
##
@@ -180,7 +180,12 @@ def to_plan(self, session: "SparkConnectClient") -> "proto.Expression":
 elif isinstance(self._value, bool):
 expr.literal.boolean = bool(self._value)
 elif isinstance(self._value, int):
-expr.literal.long = int(self._value)
+if -2147483648 <= self._value <= 2147483647:

Review Comment:
   can we extract these magic values out into "constants"? 
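A minimal sketch of what extracting those magic values might look like — the constant names (`JVM_INT_MIN`, `JVM_INT_MAX`) are illustrative, not necessarily the ones ultimately adopted in `pyspark/sql/connect/column.py`:

```python
# Hypothetical constants for the JVM 32-bit integer bounds; the actual PR
# may have chosen different names or a different place for them.
JVM_INT_MIN = -(1 << 31)     # -2147483648
JVM_INT_MAX = (1 << 31) - 1  # 2147483647

def fits_in_jvm_int(value: int) -> bool:
    # True when a Python int can be stored in a 32-bit JVM int field.
    return JVM_INT_MIN <= value <= JVM_INT_MAX
```

The bounds check in `to_plan` can then be written against the named constants instead of repeating the raw numbers in every branch.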



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] MaxGekk commented on pull request #38972: [SPARK-41443][SQL] Assign a name to the error class _LEGACY_ERROR_TEMP_1061

2022-12-09 Thread GitBox


MaxGekk commented on PR #38972:
URL: https://github.com/apache/spark/pull/38972#issuecomment-1345173649

   @panbingkun Could you fix the test failure? It seems to be related to your changes:
   ```
   sbt.ForkMain$ForkError: org.scalatest.exceptions.TestFailedException: "[COLUMN_NOT_FOUND] The column `some_random_column` cannot be found. Verify the spelling and correctness of the column name according to the SQL config "spark.sql.caseSensitive"." did not contain "does not exist"
   at org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:472)
   at org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:471)
   at org.scalatest.Assertions$.newAssertionFailedException(Assertions.scala:1231)
   at org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:1295)
   at org.apache.spark.sql.StatisticsCollectionSuite.$anonfun$new$15(StatisticsCollectionSuite.scala:136)
   ```





[GitHub] [spark] MaxGekk commented on a diff in pull request #38998: [SPARK-41463][SQL][TESTS] Ensure error class names contain only capital letters, numbers and underscores

2022-12-09 Thread GitBox


MaxGekk commented on code in PR #38998:
URL: https://github.com/apache/spark/pull/38998#discussion_r1045013373


##
core/src/test/scala/org/apache/spark/SparkThrowableSuite.scala:
##
@@ -147,6 +147,18 @@ class SparkThrowableSuite extends SparkFunSuite {
 assert(rereadErrorClassToInfoMap == errorReader.errorInfoMap)
   }
 
+  test("Error class names should contain only capital letters, numbers and underscores") {
+    val allowedChars = "[A-Z0-9_]*"
+    errorReader.errorInfoMap.foreach { e =>
+      assert(e._1.matches(allowedChars))

Review Comment:
   This one doesn't show which error class among 1500 is invalid, see:
   ```
   e._1.matches(allowedChars) was false
   ScalaTestFailureLocation: org.apache.spark.SparkThrowableSuite at (SparkThrowableSuite.scala:153)
   org.scalatest.exceptions.TestFailedException: e._1.matches(allowedChars) was false
   ```
   Could you output the error class name:
   ```suggestion
     assert(e._1.matches(allowedChars), s"The error class is invalid: ${e._1}")
   ```



##
core/src/test/scala/org/apache/spark/SparkThrowableSuite.scala:
##
@@ -147,6 +147,18 @@ class SparkThrowableSuite extends SparkFunSuite {
 assert(rereadErrorClassToInfoMap == errorReader.errorInfoMap)
   }
 
+  test("Error class names should contain only capital letters, numbers and underscores") {
+    val allowedChars = "[A-Z0-9_]*"
+    errorReader.errorInfoMap.foreach { e =>
+      assert(e._1.matches(allowedChars))
+      e._2.subClass.map { s =>
+        s.keys.foreach { k =>
+          assert(k.matches(allowedChars))

Review Comment:
   ```suggestion
     assert(k.matches(allowedChars), s"The error sub-class is invalid: $k")
   ```






[GitHub] [spark] MaxGekk closed pull request #38954: [SPARK-41417][CORE][SQL] Rename `_LEGACY_ERROR_TEMP_0019` to `INVALID_TYPED_LITERAL`

2022-12-09 Thread GitBox


MaxGekk closed pull request #38954: [SPARK-41417][CORE][SQL] Rename `_LEGACY_ERROR_TEMP_0019` to `INVALID_TYPED_LITERAL`
URL: https://github.com/apache/spark/pull/38954





[GitHub] [spark] MaxGekk commented on pull request #38954: [SPARK-41417][CORE][SQL] Rename `_LEGACY_ERROR_TEMP_0019` to `INVALID_TYPED_LITERAL`

2022-12-09 Thread GitBox


MaxGekk commented on PR #38954:
URL: https://github.com/apache/spark/pull/38954#issuecomment-1345157464

   +1, LGTM. Merging to master.
   Thank you, @LuciferYang.





[GitHub] [spark] AmplabJenkins commented on pull request #39012: [SPARK-41475][CONNECT] Fix lint-scala command error and typo

2022-12-09 Thread GitBox


AmplabJenkins commented on PR #39012:
URL: https://github.com/apache/spark/pull/39012#issuecomment-1345155848

   Can one of the admins verify this patch?





[GitHub] [spark] AmplabJenkins commented on pull request #39013: [SPARK-41472][CONNECT][PYTHON] Implement the rest of string/binary functions

2022-12-09 Thread GitBox


AmplabJenkins commented on PR #39013:
URL: https://github.com/apache/spark/pull/39013#issuecomment-1345155835

   Can one of the admins verify this patch?





[GitHub] [spark] dongjoon-hyun commented on pull request #39005: [SPARK-41467][BUILD] Upgrade httpclient from 4.5.13 to 4.5.14

2022-12-09 Thread GitBox


dongjoon-hyun commented on PR #39005:
URL: https://github.com/apache/spark/pull/39005#issuecomment-1345151502

   All tests passed. Merged to master. Thank you, @panbingkun and all.





[GitHub] [spark] dongjoon-hyun closed pull request #39005: [SPARK-41467][BUILD] Upgrade httpclient from 4.5.13 to 4.5.14

2022-12-09 Thread GitBox


dongjoon-hyun closed pull request #39005: [SPARK-41467][BUILD] Upgrade httpclient from 4.5.13 to 4.5.14
URL: https://github.com/apache/spark/pull/39005





[GitHub] [spark] dongjoon-hyun commented on pull request #39015: [SPARK-41476][INFRA] Prevent `README.md` from triggering CIs

2022-12-09 Thread GitBox


dongjoon-hyun commented on PR #39015:
URL: https://github.com/apache/spark/pull/39015#issuecomment-1345151056

   Thank you, @HyukjinKwon !





[GitHub] [spark] zhengruifeng opened a new pull request, #39016: [SPARK-41477][CONNECT][PYTHON] Correctly infer the datatype of literal integers

2022-12-09 Thread GitBox


zhengruifeng opened a new pull request, #39016:
URL: https://github.com/apache/spark/pull/39016

   ### What changes were proposed in this pull request?
   Check the bounds of integer values and choose the correct datatype.
   
   
   ### Why are the changes needed?
   To match PySpark's behavior:
   
   ```
   
   In [15]: int_max = 2147483647

   In [16]: long_max = 9223372036854775807

   In [17]: sdf = spark.range(0, 1)

   In [18]: sdf.select(lit(0), lit(int_max), lit(int_max + 1), lit(long_max))
   Out[18]: DataFrame[0: int, 2147483647: int, 2147483648: bigint, 9223372036854775807: bigint]

   In [19]: sdf.select(lit(0), lit(int_max), lit(int_max + 1), lit(long_max), lit(long_max + 1))
   ---------------------------------------------------------------------------
   Py4JError                                 Traceback (most recent call last)
   Cell In[19], line 1
   ----> 1 sdf.select(lit(0), lit(int_max), lit(int_max + 1), lit(long_max), lit(long_max + 1))

   File ~/Dev/spark/python/pyspark/sql/functions.py:179, in lit(col)
       177 if dt is not None:
       178     return _invoke_function("lit", col).astype(dt)
   --> 179 return _invoke_function("lit", col)

   File ~/Dev/spark/python/pyspark/sql/functions.py:88, in _invoke_function(name, *args)
        86 assert SparkContext._active_spark_context is not None
        87 jf = _get_jvm_function(name, SparkContext._active_spark_context)
   ---> 88 return Column(jf(*args))

   ...

   File ~/Dev/spark/python/pyspark/sql/utils.py:199, in capture_sql_exception.<locals>.deco(*a, **kw)
       197 def deco(*a: Any, **kw: Any) -> Any:
       198     try:
   --> 199         return f(*a, **kw)
       200     except Py4JJavaError as e:
       201         converted = convert_exception(e.java_exception)

   File ~/Dev/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/protocol.py:330, in get_return_value(answer, gateway_client, target_id, name)
       326         raise Py4JJavaError(
       327             "An error occurred while calling {0}{1}{2}.\n".
       328             format(target_id, ".", name), value)
       329     else:
   --> 330         raise Py4JError(
       331             "An error occurred while calling {0}{1}{2}. Trace:\n{3}\n".
       332             format(target_id, ".", name, value))
       333 else:
       334     raise Py4JError(
       335         "An error occurred while calling {0}{1}{2}".
       336         format(target_id, ".", name))

   Py4JError: An error occurred while calling z:org.apache.spark.sql.functions.lit. Trace:
   java.lang.NumberFormatException: For input string: "9223372036854775808"
   at java.base/java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
   
   ```
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   no
   
   
   ### How was this patch tested?
   Added a unit test.
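As a stand-alone model of the inference rule described above (a sketch, not the actual Connect code; `infer_literal_type` is a hypothetical helper):

```python
INT_MAX = 2147483647            # 2**31 - 1
LONG_MAX = 9223372036854775807  # 2**63 - 1

def infer_literal_type(value: int) -> str:
    # Mirror the PySpark behavior shown in the session above:
    # int32-range values become "int", int64-range values "bigint",
    # and anything wider is an error.
    if -INT_MAX - 1 <= value <= INT_MAX:
        return "int"
    if -LONG_MAX - 1 <= value <= LONG_MAX:
        return "bigint"
    raise ValueError(f"literal out of 64-bit range: {value}")
```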





[GitHub] [spark] dongjoon-hyun commented on pull request #39015: [SPARK-41476][INFRA] Prevent `README.md` from triggering CIs

2022-12-09 Thread GitBox


dongjoon-hyun commented on PR #39015:
URL: https://github.com/apache/spark/pull/39015#issuecomment-1345146063

   BTW, if you don't mind, I'd like to land this on all live branches to reduce community resource usage. WDYT, @HyukjinKwon and @gengliangwang?





[GitHub] [spark] dongjoon-hyun closed pull request #38991: [SPARK-41457][PYTHON][TESTS] Refactor type annotations and dependency checks in tests

2022-12-09 Thread GitBox


dongjoon-hyun closed pull request #38991: [SPARK-41457][PYTHON][TESTS] Refactor type annotations and dependency checks in tests
URL: https://github.com/apache/spark/pull/38991





[GitHub] [spark] dongjoon-hyun commented on pull request #38991: [SPARK-41457][PYTHON][TESTS] Refactor type annotations and dependency checks in tests

2022-12-09 Thread GitBox


dongjoon-hyun commented on PR #38991:
URL: https://github.com/apache/spark/pull/38991#issuecomment-1345145538

   All Python and linter tests passed. Merged to master.





[GitHub] [spark] dongjoon-hyun closed pull request #39012: [SPARK-41475][CONNECT] Fix lint-scala command error and typo

2022-12-09 Thread GitBox


dongjoon-hyun closed pull request #39012: [SPARK-41475][CONNECT] Fix lint-scala command error and typo
URL: https://github.com/apache/spark/pull/39012





[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #39015: [SPARK-41476][INFRA] Prevent `README.md` from triggering CIs

2022-12-09 Thread GitBox


dongjoon-hyun commented on code in PR #39015:
URL: https://github.com/apache/spark/pull/39015#discussion_r1044989412


##
dev/sparktestsupport/utils.py:
##
@@ -34,19 +34,22 @@ def determine_modules_for_files(filenames):
 Given a list of filenames, return the set of modules that contain those files.
 If a file is not associated with a more specific submodule, then this method will consider that
 file to belong to the 'root' module. `.github` directory is counted only in GitHub Actions,
-and `appveyor.yml` is always ignored because this file is dedicated only to AppVeyor builds.
+and `appveyor.yml` is always ignored because this file is dedicated only to AppVeyor builds,
+and `README.md` is always ignored too.
 
 >>> sorted(x.name for x in determine_modules_for_files(["python/pyspark/a.py", "sql/core/foo"]))
 ['pyspark-core', 'sql']
 >>> [x.name for x in determine_modules_for_files(["file_not_matched_by_any_subproject"])]
 ['root']
->>> [x.name for x in determine_modules_for_files(["appveyor.yml"])]
+>>> [x.name for x in determine_modules_for_files(["appveyor.yml", "sql/README.md"])]
 []
 """
 changed_modules = set()
 for filename in filenames:
 if filename in ("appveyor.yml",):
 continue
+if filename.endswith("README.md"):

Review Comment:
   Since we need `endswith` for `README.md`, we have a new `if` statement, @HyukjinKwon.
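A self-contained sketch of the ignore rules discussed in this diff (simplified: the real `determine_modules_for_files` also resolves filenames to modules, which is omitted here; `files_that_trigger_ci` is an illustrative name):

```python
def files_that_trigger_ci(filenames):
    # appveyor.yml is matched exactly because it lives only at the repo
    # root, while README.md needs endswith() because README files appear
    # in many subdirectories, e.g. sql/README.md.
    kept = []
    for filename in filenames:
        if filename in ("appveyor.yml",):
            continue
        if filename.endswith("README.md"):
            continue
        kept.append(filename)
    return kept
```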






[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #39015: [SPARK-41476][INFRA] Prevent `README.md` from triggering CIs

2022-12-09 Thread GitBox


dongjoon-hyun commented on code in PR #39015:
URL: https://github.com/apache/spark/pull/39015#discussion_r1044988801


##
dev/sparktestsupport/utils.py:
##
@@ -84,7 +84,7 @@ def identify_changed_files_from_git_commits(patch_sha, target_branch=None, targe
 ["git", "diff", "--name-only", patch_sha, diff_target], universal_newlines=True
 )
 # Remove any empty strings
-return [f for f in raw_output.split("\n") if f]
+return [f for f in raw_output.split("\n") if f and not f.endswith("README.md")]

Review Comment:
   Oh, right. I forgot that. Let me move this there.






[GitHub] [spark] shuyouZZ commented on a diff in pull request #38983: [SPARK-41447][CORE] Clean up expired event log files that don't exist in listing db

2022-12-09 Thread GitBox


shuyouZZ commented on code in PR #38983:
URL: https://github.com/apache/spark/pull/38983#discussion_r1044983924


##
core/src/test/scala/org/apache/spark/deploy/history/FsHistoryProviderSuite.scala:
##
@@ -1705,6 +1705,47 @@ abstract class FsHistoryProviderSuite extends SparkFunSuite with Matchers with P
 provider.stop()
   }
 
+  test("clean up expired event log files that don't exist in listing db") {

Review Comment:
   > Could you add a test prefix like the following?
   > 
   > ```scala
   > - test("clean up expired event log files that don't exist in listing db") {
   > + test("SPARK-41447: clean up expired event log files that don't exist in listing db") {
   > ```
   
   @dongjoon-hyun, could you review this too? Thanks.






[GitHub] [spark] viirya commented on pull request #38993: [SPARK-41459][SQL] fix thrift server operation log output is empty

2022-12-09 Thread GitBox


viirya commented on PR #38993:
URL: https://github.com/apache/spark/pull/38993#issuecomment-1345117121

   > BTW, seems you change the PR description template by mistake. Can you 
restore the template?
   
   Can you restore it to the standard template?





[GitHub] [spark] viirya commented on a diff in pull request #38993: [SPARK-41459][SQL] fix thrift server operation log output is empty

2022-12-09 Thread GitBox


viirya commented on code in PR #38993:
URL: https://github.com/apache/spark/pull/38993#discussion_r1044982387


##
sql/hive-thriftserver/src/main/java/org/apache/hive/service/cli/operation/LogDivertAppender.java:
##
@@ -276,12 +276,19 @@ private static StringLayout initLayout(OperationLog.LoggingLevel loggingMode) {
 return getLayout(isVerbose, layout);
   }
 
-  public LogDivertAppender(OperationManager operationManager,
-OperationLog.LoggingLevel loggingMode) {
+  public static LogDivertAppender create(OperationManager operationManager,
+ OperationLog.LoggingLevel loggingMode) {
+CharArrayWriter writer = new CharArrayWriter();
+return new LogDivertAppender(operationManager, loggingMode, writer);
+  }
+
+  private LogDivertAppender(OperationManager operationManager,
+OperationLog.LoggingLevel loggingMode, CharArrayWriter writer) {

Review Comment:
   Oh, yea. I didn't run the suggested one.






[GitHub] [spark] idealspark commented on a diff in pull request #38993: [SPARK-41459][SQL] fix thrift server operation log output is empty

2022-12-09 Thread GitBox


idealspark commented on code in PR #38993:
URL: https://github.com/apache/spark/pull/38993#discussion_r1044970950


##
sql/hive-thriftserver/src/main/java/org/apache/hive/service/cli/operation/LogDivertAppender.java:
##
@@ -276,12 +276,19 @@ private static StringLayout initLayout(OperationLog.LoggingLevel loggingMode) {
 return getLayout(isVerbose, layout);
   }
 
-  public LogDivertAppender(OperationManager operationManager,
-OperationLog.LoggingLevel loggingMode) {
+  public static LogDivertAppender create(OperationManager operationManager,
+ OperationLog.LoggingLevel loggingMode) {
+CharArrayWriter writer = new CharArrayWriter();
+return new LogDivertAppender(operationManager, loggingMode, writer);
+  }
+
+  private LogDivertAppender(OperationManager operationManager,

Review Comment:
   We need to pass the `CharArrayWriter` as a parameter to the `LogDivertAppender` constructor, which then passes it on to the super constructor.






[GitHub] [spark] idealspark commented on a diff in pull request #38993: [SPARK-41459][SQL] fix thrift server operation log output is empty

2022-12-09 Thread GitBox


idealspark commented on code in PR #38993:
URL: https://github.com/apache/spark/pull/38993#discussion_r1044969362


##
sql/hive-thriftserver/src/main/java/org/apache/hive/service/cli/operation/LogDivertAppender.java:
##
@@ -276,12 +276,19 @@ private static StringLayout initLayout(OperationLog.LoggingLevel loggingMode) {
 return getLayout(isVerbose, layout);
   }
 
-  public LogDivertAppender(OperationManager operationManager,
-OperationLog.LoggingLevel loggingMode) {
+  public static LogDivertAppender create(OperationManager operationManager,
+ OperationLog.LoggingLevel loggingMode) {
+CharArrayWriter writer = new CharArrayWriter();
+return new LogDivertAppender(operationManager, loggingMode, writer);
+  }
+
+  private LogDivertAppender(OperationManager operationManager,
+OperationLog.LoggingLevel loggingMode, CharArrayWriter writer) {

Review Comment:
   There is no way to create the `CharArrayWriter` inside the original constructor, because the original constructor must first call the super constructor, and the super constructor needs a `CharArrayWriter` as a parameter:
   ```
   private LogDivertAppender(OperationManager operationManager,
       OperationLog.LoggingLevel loggingMode, CharArrayWriter writer) {
     super("LogDivertAppender", initLayout(loggingMode), null, false, true, Property.EMPTY_ARRAY,
         new WriterManager(writer, "LogDivertAppender", initLayout(loggingMode), true));
   }
   ```
   
   If we create the `CharArrayWriter` inside the original constructor as you suggest, compilation will fail.






[GitHub] [spark] dengziming commented on a diff in pull request #38984: [SPARK-41349][CONNECT][PYTHON] Implement DataFrame.hint

2022-12-09 Thread GitBox


dengziming commented on code in PR #38984:
URL: https://github.com/apache/spark/pull/38984#discussion_r1044965012


##
connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala:
##
@@ -305,7 +305,11 @@ class SparkConnectPlanner(session: SparkSession) {
   }
 
   private def transformHint(rel: proto.Hint): LogicalPlan = {
-val params = rel.getParametersList.asScala.map(toCatalystValue).toSeq
+val params = rel.getParametersList.asScala.map(toCatalystValue).toSeq.map{
+  case name: String => UnresolvedAttribute(name)
+  case l: Long => l.toInt

Review Comment:
   Hi @HyukjinKwon, using a long value will fail the Scala pattern match here, since it only accepts an int value:
   
   https://github.com/apache/spark/blob/c78f935185cbfae2e45adb393def0bd389a89c7e/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveHints.scala#L242-L249






[GitHub] [spark] idealspark commented on a diff in pull request #38993: [SPARK-41459][SQL] fix thrift server operation log output is empty

2022-12-09 Thread GitBox


idealspark commented on code in PR #38993:
URL: https://github.com/apache/spark/pull/38993#discussion_r1044964246


##
sql/hive-thriftserver/src/main/java/org/apache/hive/service/cli/operation/LogDivertAppender.java:
##
@@ -276,12 +276,19 @@ private static StringLayout initLayout(OperationLog.LoggingLevel loggingMode) {
 return getLayout(isVerbose, layout);
   }
 
-  public LogDivertAppender(OperationManager operationManager,
-OperationLog.LoggingLevel loggingMode) {
+  public static LogDivertAppender create(OperationManager operationManager,
+ OperationLog.LoggingLevel loggingMode) {
+CharArrayWriter writer = new CharArrayWriter();
+return new LogDivertAppender(operationManager, loggingMode, writer);
+  }
+
+  private LogDivertAppender(OperationManager operationManager,
+OperationLog.LoggingLevel loggingMode, CharArrayWriter writer) {
 super("LogDivertAppender", initLayout(loggingMode), null, false, true, Property.EMPTY_ARRAY,
-new WriterManager(new CharArrayWriter(), "LogDivertAppender",

Review Comment:
   Yes, it used another `CharArrayWriter`, and the `append()` method in this class does not call `super.append()`, so that `CharArrayWriter` stays empty.






[GitHub] [spark] panbingkun commented on a diff in pull request #38972: [SPARK-41443][SQL] Assign a name to the error class _LEGACY_ERROR_TEMP_1061

2022-12-09 Thread GitBox


panbingkun commented on code in PR #38972:
URL: https://github.com/apache/spark/pull/38972#discussion_r1044960581


##
core/src/main/resources/error/error-classes.json:
##
@@ -109,6 +109,11 @@
   "The column  already exists. Consider to choose another name 
or rename the existing column."
 ]
   },
+  "COLUMN_NOT_FOUND" : {
+"message" : [
+  "The column  does not exist."

Review Comment:
   Done



##
sql/core/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveSessionCatalog.scala:
##
@@ -155,7 +155,7 @@ class ResolveSessionCatalog(val catalogManager: 
CatalogManager)
 case DescribeColumn(ResolvedV1TableIdentifier(ident), column, isExtended, 
output) =>
   column match {
 case u: UnresolvedAttribute =>
-  throw QueryCompilationErrors.columnDoesNotExistError(u.name)
+  throw QueryCompilationErrors.columnNotExistError(u.name)

Review Comment:
   Done






[GitHub] [spark] HyukjinKwon commented on a diff in pull request #38984: [SPARK-41349][CONNECT][PYTHON] Implement DataFrame.hint

2022-12-09 Thread GitBox


HyukjinKwon commented on code in PR #38984:
URL: https://github.com/apache/spark/pull/38984#discussion_r1044958569


##
connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala:
##
@@ -305,7 +305,11 @@ class SparkConnectPlanner(session: SparkSession) {
   }
 
   private def transformHint(rel: proto.Hint): LogicalPlan = {
-val params = rel.getParametersList.asScala.map(toCatalystValue).toSeq
+val params = rel.getParametersList.asScala.map(toCatalystValue).toSeq.map{
+  case name: String => UnresolvedAttribute(name)
+  case l: Long => l.toInt

Review Comment:
   Sorry if I am being dumb, but why do we need to pass an int literal? Can't we
just pass a long literal?
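
For context, the int-versus-long distinction under discussion can be sketched as below; `literal_kind` and the constants are illustrative, not the actual pyspark API:

```python
# Boundaries of a signed 32-bit integer.
INT32_MIN, INT32_MAX = -2147483648, 2147483647

def literal_kind(value: int) -> str:
    # Python ints are unbounded, so a client must pick a proto field for a
    # literal: values that fit in 32 bits become "integer", wider ones "long".
    if INT32_MIN <= value <= INT32_MAX:
        return "integer"
    return "long"

print(literal_kind(42))      # integer
print(literal_kind(2**40))   # long
```
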






[GitHub] [spark] Ngone51 commented on a diff in pull request #38876: [SPARK-41360][CORE] Avoid BlockManager re-registration if the executor has been lost

2022-12-09 Thread GitBox


Ngone51 commented on code in PR #38876:
URL: https://github.com/apache/spark/pull/38876#discussion_r1044955782


##
core/src/main/scala/org/apache/spark/storage/BlockManager.scala:
##
@@ -637,9 +637,11 @@ private[spark] class BlockManager(
   def reregister(): Unit = {
 // TODO: We might need to rate limit re-registering.
 logInfo(s"BlockManager $blockManagerId re-registering with master")
-master.registerBlockManager(blockManagerId, 
diskBlockManager.localDirsString, maxOnHeapMemory,
-  maxOffHeapMemory, storageEndpoint)
-reportAllBlocks()
+val id = master.registerBlockManager(blockManagerId, 
diskBlockManager.localDirsString,
+  maxOnHeapMemory, maxOffHeapMemory, storageEndpoint, isReRegister = true)
+if (id.executorId != BlockManagerId.INVALID_EXECUTOR_ID) {
+  reportAllBlocks()
+}

Review Comment:
   Added the termination logic here. cc @mridulm 
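
The re-registration guard being discussed can be sketched roughly as follows; all names (`FakeMaster`, `reregister`, the sentinel string) are illustrative, not Spark's actual API:

```python
INVALID_EXECUTOR_ID = "invalid"

class FakeMaster:
    """Toy stand-in for the BlockManager master (illustrative only)."""

    def __init__(self, answer):
        self.answer = answer

    def register(self, block_manager_id, is_re_register=False):
        # A real master would answer with an invalid executor id when the
        # executor has already been marked lost.
        return self.answer

def reregister(master, block_manager_id, report_all_blocks, terminate):
    returned_id = master.register(block_manager_id, is_re_register=True)
    if returned_id != INVALID_EXECUTOR_ID:
        report_all_blocks()
    else:
        # The master no longer tracks this executor: stop instead of
        # reporting stale blocks.
        terminate()

calls = []
reregister(FakeMaster("exec-1"), "bm-1",
           lambda: calls.append("report"), lambda: calls.append("stop"))
reregister(FakeMaster(INVALID_EXECUTOR_ID), "bm-1",
           lambda: calls.append("report"), lambda: calls.append("stop"))
print(calls)  # ['report', 'stop']
```
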






[GitHub] [spark] HyukjinKwon commented on a diff in pull request #38991: [SPARK-41457][PYTHON][TESTS] Refactor type annotations and dependency checks in tests

2022-12-09 Thread GitBox


HyukjinKwon commented on code in PR #38991:
URL: https://github.com/apache/spark/pull/38991#discussion_r1044955147


##
dev/lint-python:
##
@@ -104,7 +104,7 @@ function mypy_data_test {
   -c dev/pyproject.toml \
   --rootdir python \
   --mypy-only-local-stub \
-  --mypy-ini-file python/mypy.ini \
+  --mypy-ini-file dev/mypy.ini \

Review Comment:
   Okay, MyPy requires the configuration file to be located in the root directory
of the package. Let me move it back for now.






[GitHub] [spark] HyukjinKwon commented on a diff in pull request #39015: [SPARK-41476][INFRA] Prevent `README.md` from triggering CIs

2022-12-09 Thread GitBox


HyukjinKwon commented on code in PR #39015:
URL: https://github.com/apache/spark/pull/39015#discussion_r1044952894


##
dev/sparktestsupport/utils.py:
##
@@ -84,7 +84,7 @@ def identify_changed_files_from_git_commits(patch_sha, 
target_branch=None, targe
 ["git", "diff", "--name-only", patch_sha, diff_target], 
universal_newlines=True
 )
 # Remove any empty strings
-return [f for f in raw_output.split("\n") if f]
+return [f for f in raw_output.split("\n") if f and not 
f.endswith("README.md")]

Review Comment:
   Should we maybe fix it in L48 `if filename in ("appveyor.yml",):` at 
`determine_modules_for_files` above?
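
Illustratively, the filter from the diff above behaves like this self-contained sketch (`filter_changed_files` is a hypothetical name, not the actual `dev/sparktestsupport` helper):

```python
def filter_changed_files(raw_output: str):
    # Drop empty strings and any README.md so documentation-only changes
    # do not trigger test modules in CI.
    return [f for f in raw_output.split("\n")
            if f and not f.endswith("README.md")]

print(filter_changed_files("core/src/Foo.scala\n\ndocs/README.md\n"))
# ['core/src/Foo.scala']
```
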






[GitHub] [spark] dongjoon-hyun commented on pull request #39014: [SPARK-41474][PROTOBUF][BUILD] Exclude `proto` files from `spark-protobuf` jar

2022-12-09 Thread GitBox


dongjoon-hyun commented on PR #39014:
URL: https://github.com/apache/spark/pull/39014#issuecomment-1344990963

   All tests passed. Merged to master for Apache Spark 3.4.0.





[GitHub] [spark] dongjoon-hyun closed pull request #39014: [SPARK-41474][PROTOBUF][BUILD] Exclude `proto` files from `spark-protobuf` jar

2022-12-09 Thread GitBox


dongjoon-hyun closed pull request #39014: [SPARK-41474][PROTOBUF][BUILD] 
Exclude `proto` files from `spark-protobuf` jar
URL: https://github.com/apache/spark/pull/39014





[GitHub] [spark] dongjoon-hyun commented on pull request #39015: [SPARK-41476][INFRA] Prevent `README.md` from triggering CIs

2022-12-09 Thread GitBox


dongjoon-hyun commented on PR #39015:
URL: https://github.com/apache/spark/pull/39015#issuecomment-1344988102

   Thank you so much, @gengliangwang !





[GitHub] [spark] dongjoon-hyun commented on pull request #39015: [SPARK-41476][INFRA] Prevent `README.md` from triggering CIs

2022-12-09 Thread GitBox


dongjoon-hyun commented on PR #39015:
URL: https://github.com/apache/spark/pull/39015#issuecomment-1344987151

   Could you review this too please, @gengliangwang ?





[GitHub] [spark] SandishKumarHN commented on a diff in pull request #38922: [SPARK-41396][SQL][PROTOBUF] OneOf field support and recursion checks

2022-12-09 Thread GitBox


SandishKumarHN commented on code in PR #38922:
URL: https://github.com/apache/spark/pull/38922#discussion_r1044857157


##
connector/protobuf/src/main/scala/org/apache/spark/sql/protobuf/utils/ProtobufOptions.scala:
##
@@ -38,6 +38,12 @@ private[sql] class ProtobufOptions(
 
   val parseMode: ParseMode =
 parameters.get("mode").map(ParseMode.fromString).getOrElse(FailFastMode)
+
+  val circularReferenceType: String = 
parameters.getOrElse("circularReferenceType", "FIELD_NAME")

Review Comment:
   @rangadi @baganokodo2022 thanks for the quick meeting. The conclusion was to
use the descriptor type's full name; I also added unit tests with a more complex schema.
   ```
   val recordName = fd.getMessageType.getFullName
   ```
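
Keying circular-reference checks on the message type's full name (rather than the field name) can be illustrated with a toy schema; the structure below is hypothetical, not the spark-protobuf implementation:

```python
# Toy schema: message full name -> full names of its nested message fields.
SCHEMA = {
    "pkg.Node": ["pkg.Node", "pkg.Leaf"],  # self-referential message
    "pkg.Leaf": [],
}

def is_recursive(message: str, schema: dict, seen: frozenset = frozenset()) -> bool:
    # Track visited *full names*, so a renamed field that still references
    # the same message type is caught as a circular reference.
    if message in seen:
        return True
    return any(is_recursive(child, schema, seen | {message})
               for child in schema.get(message, []))

print(is_recursive("pkg.Node", SCHEMA))  # True
print(is_recursive("pkg.Leaf", SCHEMA))  # False
```
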






[GitHub] [spark] dongjoon-hyun opened a new pull request, #39015: [SPARK-41476][INFRA] Prevent `README.md` from triggering CIs

2022-12-09 Thread GitBox


dongjoon-hyun opened a new pull request, #39015:
URL: https://github.com/apache/spark/pull/39015

   
   
   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   





[GitHub] [spark] dongjoon-hyun commented on pull request #39014: [SPARK-41474][PROTOBUF][BUILD] Exclude `proto` files from `spark-protobuf` jar

2022-12-09 Thread GitBox


dongjoon-hyun commented on PR #39014:
URL: https://github.com/apache/spark/pull/39014#issuecomment-1344983516

   Thank you, @gengliangwang !





[GitHub] [spark] dongjoon-hyun commented on pull request #39014: [SPARK-41474][PROTOBUF][BUILD] Exclude `proto` files from `spark-protobuf` jar

2022-12-09 Thread GitBox


dongjoon-hyun commented on PR #39014:
URL: https://github.com/apache/spark/pull/39014#issuecomment-1344977841

   Thank you, @SandishKumarHN .





[GitHub] [spark] SandishKumarHN commented on pull request #39014: [SPARK-41474][PROTOBUF][BUILD] Exclude `proto` files from `spark-protobuf` jar

2022-12-09 Thread GitBox


SandishKumarHN commented on PR #39014:
URL: https://github.com/apache/spark/pull/39014#issuecomment-1344977544

   @dongjoon-hyun LGTM, thanks for the PR. 





[GitHub] [spark] dongjoon-hyun commented on pull request #39014: [SPARK-41474][PROTOBUF][BUILD] Exclude `proto` files from `spark-protobuf` jar

2022-12-09 Thread GitBox


dongjoon-hyun commented on PR #39014:
URL: https://github.com/apache/spark/pull/39014#issuecomment-1344964855

   Could you review this, @SandishKumarHN and @gengliangwang ?





[GitHub] [spark] zhengruifeng commented on pull request #38946: [SPARK-41414][CONNECT][PYTHON] Implement date/timestamp functions

2022-12-09 Thread GitBox


zhengruifeng commented on PR #38946:
URL: https://github.com/apache/spark/pull/38946#issuecomment-1344961425

   merged into master





[GitHub] [spark] zhengruifeng closed pull request #38946: [SPARK-41414][CONNECT][PYTHON] Implement date/timestamp functions

2022-12-09 Thread GitBox


zhengruifeng closed pull request #38946: [SPARK-41414][CONNECT][PYTHON] 
Implement date/timestamp functions
URL: https://github.com/apache/spark/pull/38946





[GitHub] [spark] srowen commented on a diff in pull request #38996: [SPARK-41008][MLLIB] Follow-up isotonic regression features deduplica…

2022-12-09 Thread GitBox


srowen commented on code in PR #38996:
URL: https://github.com/apache/spark/pull/38996#discussion_r1044938891


##
docs/mllib-isotonic-regression.md:
##
@@ -43,7 +43,17 @@ best fitting the original data points.
 which uses an approach to
 [parallelizing isotonic 
regression](https://doi.org/10.1007/978-3-642-99789-1_10).
 The training input is an RDD of tuples of three double values that represent
-label, feature and weight in this order. Additionally, IsotonicRegression 
algorithm has one
+label, feature and weight in this order. In case there are multiple tuples with
+the same feature then these tuples are aggregated into a single tuple as 
follows:
+
+* Aggregated label is the weighted average of all labels.
+* Aggregated feature is the weighted average of all equal features. It is 
possible

Review Comment:
   (Sorry if this double-posts)
   Yeah, I was confused about this too. If feature values are pooled when
exactly equal, why average them? If anything, that introduces tiny errors.
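
The pooling rule quoted from the docs can be sketched as toy code (not the MLlib implementation):

```python
from collections import defaultdict

def aggregate_duplicates(points):
    """Pool (label, feature, weight) tuples that share a feature value:
    weighted-average label, summed weight. Since the pooled features are
    exactly equal, their weighted average is just the feature itself."""
    acc = defaultdict(lambda: [0.0, 0.0])  # feature -> [sum(w*label), sum(w)]
    for label, feature, weight in points:
        acc[feature][0] += weight * label
        acc[feature][1] += weight
    return sorted(((s / w, f, w) for f, (s, w) in acc.items()),
                  key=lambda t: t[1])  # order by feature

print(aggregate_duplicates([(1.0, 0.5, 1.0), (3.0, 0.5, 1.0), (2.0, 1.5, 1.0)]))
# [(2.0, 0.5, 2.0), (2.0, 1.5, 1.0)]
```
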






[GitHub] [spark] dengziming commented on a diff in pull request #39012: [SPARK-41475][CONNECT]: Fix lint-scala command error and typo

2022-12-09 Thread GitBox


dengziming commented on code in PR #39012:
URL: https://github.com/apache/spark/pull/39012#discussion_r1044935825


##
dev/lint-scala:
##
@@ -29,14 +29,14 @@ ERRORS=$(./build/mvn \
 -Dscalafmt.skip=false \
 -Dscalafmt.validateOnly=true \
 -Dscalafmt.changedOnly=false \
--pl connector/connect \
+-pl connector/connect/server \

Review Comment:
   Thank you for your reminders, I have filed one and changed the title.






[GitHub] [spark] dengziming commented on a diff in pull request #39012: [MINOR][CONNECT]: Fix command error and typo

2022-12-09 Thread GitBox


dengziming commented on code in PR #39012:
URL: https://github.com/apache/spark/pull/39012#discussion_r1044932932


##
connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala:
##
@@ -322,8 +322,7 @@ class SparkConnectPlanner(session: SparkSession) {
 None,
 rel.getVariableColumnName,
 Seq(rel.getValueColumnName),
-transformRelation(rel.getInput)
-  )
+transformRelation(rel.getInput))

Review Comment:
   Yes, I re-ran lint-scala after changing it.






[GitHub] [spark] dongjoon-hyun commented on pull request #38994: [SPARK-41329][CONNECT] Resolve circular imports in Spark Connect

2022-12-09 Thread GitBox


dongjoon-hyun commented on PR #38994:
URL: https://github.com/apache/spark/pull/38994#issuecomment-1344949254

   No problem at all. If you checked locally, that's more than enough. :)





[GitHub] [spark] dongjoon-hyun opened a new pull request, #39014: [SPARK-41474][BUILD] Exclude proto files from spark-protobuf

2022-12-09 Thread GitBox


dongjoon-hyun opened a new pull request, #39014:
URL: https://github.com/apache/spark/pull/39014

   
   
   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   





[GitHub] [spark] HyukjinKwon commented on pull request #38994: [SPARK-41329][CONNECT] Resolve circular imports in Spark Connect

2022-12-09 Thread GitBox


HyukjinKwon commented on PR #38994:
URL: https://github.com/apache/spark/pull/38994#issuecomment-1344946168

   Oh yeah, the tests passed but the linter failed. I just removed one unused
ignore in the linter, and I checked it locally.
   
   But sorry, I should have waited for the test results. I will make sure of
that next time.





[GitHub] [spark] dongjoon-hyun commented on pull request #38994: [SPARK-41329][CONNECT] Resolve circular imports in Spark Connect

2022-12-09 Thread GitBox


dongjoon-hyun commented on PR #38994:
URL: https://github.com/apache/spark/pull/38994#issuecomment-1344945736

   Ur, does this pass CI?





[GitHub] [spark] HyukjinKwon closed pull request #38994: [SPARK-41329][CONNECT] Resolve circular imports in Spark Connect

2022-12-09 Thread GitBox


HyukjinKwon closed pull request #38994: [SPARK-41329][CONNECT] Resolve circular 
imports in Spark Connect
URL: https://github.com/apache/spark/pull/38994





[GitHub] [spark] HyukjinKwon commented on pull request #38994: [SPARK-41329][CONNECT] Resolve circular imports in Spark Connect

2022-12-09 Thread GitBox


HyukjinKwon commented on PR #38994:
URL: https://github.com/apache/spark/pull/38994#issuecomment-1344945421

   Merged to master.





[GitHub] [spark] HyukjinKwon commented on a diff in pull request #38994: [SPARK-41329][CONNECT] Resolve circular imports in Spark Connect

2022-12-09 Thread GitBox


HyukjinKwon commented on code in PR #38994:
URL: https://github.com/apache/spark/pull/38994#discussion_r1044928308


##
python/pyspark/sql/connect/column.py:
##
@@ -706,28 +705,30 @@ def substr(self, startPos: Union[int, "Column"], length: 
Union[int, "Column"]) -
 length_t=type(length),
 )
 )
-from pyspark.sql.connect.function_builder import functions as F
 
 if isinstance(length, int):
-length_exp = self._lit(length)
+length_exp = lit(length)
 elif isinstance(length, Column):
 length_exp = length
 else:
 raise TypeError("Unsupported type for substr().")
 
 if isinstance(startPos, int):
-start_exp = self._lit(startPos)
+start_exp = lit(startPos)
 else:
 start_exp = startPos
 
-return F.substr(self, start_exp, length_exp)
+return F.substr(self, start_exp, length_exp)  # type: 
ignore[attr-defined]

Review Comment:
   ```suggestion
   return F.substr(self, start_exp, length_exp)
   ```






[GitHub] [spark] gengliangwang commented on a diff in pull request #39000: [SPARK-41461][BUILD][CORE] Support user configurable protoc executables when building Spark Core.

2022-12-09 Thread GitBox


gengliangwang commented on code in PR #39000:
URL: https://github.com/apache/spark/pull/39000#discussion_r1044927630


##
core/pom.xml:
##
@@ -713,6 +693,71 @@
 
   
 
+
+  default-protoc
+  
+
+  !skipDefaultProtoc
+
+  
+  
+
+  
+com.github.os72
+protoc-jar-maven-plugin
+${protoc-jar-maven-plugin.version}
+
+  
+generate-sources
+
+  run
+
+
+  
com.google.protobuf:protoc:${protobuf.version}
+  ${protobuf.version}
+  
+src/main/protobuf
+  
+
+  
+
+  
+
+  
+
+
+  user-defined-protoc
+  
+
${env.CORE_PROTOC_EXEC_PATH}
+  
+  
+
+  
+com.github.os72
+protoc-jar-maven-plugin
+${protoc-jar-maven-plugin.version}
+
+  
+generate-sources
+
+  run
+
+
+  
com.google.protobuf:protoc:${protobuf.version}
+  ${protobuf.version}
+  ${core.protoc.executable.path}
+  
+src/main/protobuf
+  
+
+  
+
+  
+
+  
+
   
 
+

Review Comment:
   nit: remove empty lines?






[GitHub] [spark] github-actions[bot] closed pull request #36921: [SPARK-39481][SQL] Do not push down complex filter condition

2022-12-09 Thread GitBox


github-actions[bot] closed pull request #36921: [SPARK-39481][SQL] Do not push 
down complex filter condition
URL: https://github.com/apache/spark/pull/36921





[GitHub] [spark] dongjoon-hyun commented on pull request #38985: [SPARK-41451][K8S] Avoid using empty abbrevMarker in StringUtils.abbreviate

2022-12-09 Thread GitBox


dongjoon-hyun commented on PR #38985:
URL: https://github.com/apache/spark/pull/38985#issuecomment-1344930220

   Thank you for closing, @pan3793 .





[GitHub] [spark] gengliangwang closed pull request #38988: [SPARK-41456][SQL] Improve the performance of try_cast

2022-12-09 Thread GitBox


gengliangwang closed pull request #38988: [SPARK-41456][SQL] Improve the 
performance of try_cast
URL: https://github.com/apache/spark/pull/38988





[GitHub] [spark] gengliangwang commented on pull request #38988: [SPARK-41456][SQL] Improve the performance of try_cast

2022-12-09 Thread GitBox


gengliangwang commented on PR #38988:
URL: https://github.com/apache/spark/pull/38988#issuecomment-1344894973

   @cloud-fan @HyukjinKwon @LuciferYang Thanks for the review. 
   Merging to master





[GitHub] [spark] xinrong-meng commented on pull request #39009: [SPARK-41225][CONNECT][PYTHON][FOLLOW-UP] Disable unsupported functions.

2022-12-09 Thread GitBox


xinrong-meng commented on PR #39009:
URL: https://github.com/apache/spark/pull/39009#issuecomment-1344864352

   Merged to master, thanks!





[GitHub] [spark] xinrong-meng closed pull request #39009: [SPARK-41225][CONNECT][PYTHON][FOLLOW-UP] Disable unsupported functions.

2022-12-09 Thread GitBox


xinrong-meng closed pull request #39009: 
[SPARK-41225][CONNECT][PYTHON][FOLLOW-UP] Disable unsupported functions.
URL: https://github.com/apache/spark/pull/39009





[GitHub] [spark] xinrong-meng opened a new pull request, #39013: [SPARK-41472][CONNECT][PYTHON] Implement the rest of string/binary functions

2022-12-09 Thread GitBox


xinrong-meng opened a new pull request, #39013:
URL: https://github.com/apache/spark/pull/39013

   ### What changes were proposed in this pull request?
   Implement the rest of string/binary functions. The first batch is in 
https://github.com/apache/spark/pull/38921.
   
   
   ### Why are the changes needed?
   For API coverage on Spark Connect.
   
   ### Does this PR introduce _any_ user-facing change?
   Yes. New functions are available on Spark Connect.
   
   ### How was this patch tested?
   Unit tests.





[GitHub] [spark] dongjoon-hyun closed pull request #38982: [SPARK-41376][CORE][3.2] Correct the Netty preferDirectBufs check logic on executor start

2022-12-09 Thread GitBox


dongjoon-hyun closed pull request #38982: [SPARK-41376][CORE][3.2] Correct the 
Netty preferDirectBufs check logic on executor start
URL: https://github.com/apache/spark/pull/38982





[GitHub] [spark] otterc commented on a diff in pull request #36165: [SPARK-36620][SHUFFLE] Add Push Based Shuffle client side read metrics

2022-12-09 Thread GitBox


otterc commented on code in PR #36165:
URL: https://github.com/apache/spark/pull/36165#discussion_r1044862699


##
core/src/test/resources/HistoryServerExpectations/one_stage_json_with_partitionId_expectation.json:
##
@@ -26,13 +26,23 @@
   "outputBytes" : 0,
   "outputRecords" : 0,
   "shuffleRemoteBlocksFetched" : 0,
-  "shuffleLocalBlocksFetched" : 80,
+  "shuffleLocalBlocksFetched" : 0,

Review Comment:
   Same here. Why did we change the value?



##
core/src/test/resources/HistoryServerExpectations/one_stage_json_with_partitionId_expectation.json:
##
@@ -26,13 +26,23 @@
   "outputBytes" : 0,
   "outputRecords" : 0,
   "shuffleRemoteBlocksFetched" : 0,
-  "shuffleLocalBlocksFetched" : 80,
+  "shuffleLocalBlocksFetched" : 0,
   "shuffleFetchWaitTime" : 0,
   "shuffleRemoteBytesRead" : 0,
   "shuffleRemoteBytesReadToDisk" : 0,
-  "shuffleLocalBytesRead" : 3760,
-  "shuffleReadBytes" : 3760,
-  "shuffleReadRecords" : 100,
+  "shuffleLocalBytesRead" : 0,
+  "shuffleReadBytes" : 0,
+  "shuffleReadRecords" : 0,

Review Comment:
   Same for these.



##
core/src/test/resources/HistoryServerExpectations/one_stage_json_with_partitionId_expectation.json:
##
@@ -76,12 +86,24 @@
 },
 "shuffleReadMetrics" : {
   "remoteBlocksFetched" : 0,
-  "localBlocksFetched" : 8,
+  "localBlocksFetched" : 0,
   "fetchWaitTime" : 0,
   "remoteBytesRead" : 0,
   "remoteBytesReadToDisk" : 0,
-  "localBytesRead" : 376,
-  "recordsRead" : 10
+  "localBytesRead" : 0,
+  "recordsRead" : 0,

Review Comment:
   this as well



##
core/src/test/resources/HistoryServerExpectations/one_stage_json_with_partitionId_expectation.json:
##
@@ -127,12 +149,24 @@
 },
 "shuffleReadMetrics" : {
   "remoteBlocksFetched" : 0,
-  "localBlocksFetched" : 8,
+  "localBlocksFetched" : 0,
   "fetchWaitTime" : 0,
   "remoteBytesRead" : 0,
   "remoteBytesReadToDisk" : 0,
-  "localBytesRead" : 376,
-  "recordsRead" : 10
+  "localBytesRead" : 0,
+  "recordsRead" : 0,

Review Comment:
   existing value changed



##
core/src/test/resources/HistoryServerExpectations/failed_stage_list_json_expectation.json:
##
@@ -66,5 +76,7 @@
 "MajorGCCount" : 0,
 "MajorGCTime" : 0,
 "TotalGCTime" : 0
-  }
-} ]
+  },
+  "isPushBasedShuffleEnabled" : false,
+  "shuffleMergersCount" : 0
+} ]

Review Comment:
   No newline at the end of file



##
core/src/test/resources/HistoryServerExpectations/one_stage_json_with_partitionId_expectation.json:
##
@@ -76,12 +86,24 @@
 },
 "shuffleReadMetrics" : {
   "remoteBlocksFetched" : 0,
-  "localBlocksFetched" : 8,
+  "localBlocksFetched" : 0,

Review Comment:
   here as well



##
core/src/test/resources/HistoryServerExpectations/one_stage_json_with_partitionId_expectation.json:
##
@@ -178,12 +212,24 @@
 },
 "shuffleReadMetrics" : {
   "remoteBlocksFetched" : 0,
-  "localBlocksFetched" : 8,
+  "localBlocksFetched" : 0,

Review Comment:
   same here



##
core/src/test/resources/HistoryServerExpectations/one_stage_json_with_partitionId_expectation.json:
##
@@ -178,12 +212,24 @@
 },
 "shuffleReadMetrics" : {
   "remoteBlocksFetched" : 0,
-  "localBlocksFetched" : 8,
+  "localBlocksFetched" : 0,
   "fetchWaitTime" : 0,
   "remoteBytesRead" : 0,
   "remoteBytesReadToDisk" : 0,
-  "localBytesRead" : 372,
-  "recordsRead" : 9
+  "localBytesRead" : 0,
+  "recordsRead" : 0,

Review Comment:
   same here



##
core/src/test/resources/HistoryServerExpectations/one_stage_json_with_partitionId_expectation.json:
##
@@ -127,12 +149,24 @@
 },
 "shuffleReadMetrics" : {
   "remoteBlocksFetched" : 0,
-  "localBlocksFetched" : 8,
+  "localBlocksFetched" : 0,

Review Comment:
   value is changed



##
core/src/test/resources/HistoryServerExpectations/failed_stage_list_json_expectation.json:
##
@@ -27,13 +27,23 @@
   "outputBytes" : 0,
   "outputRecords" : 0,
   "shuffleRemoteBlocksFetched" : 0,
-  "shuffleLocalBlocksFetched" : 64,
-  "shuffleFetchWaitTime" : 1,
+  "shuffleLocalBlocksFetched" : 0,
+  "shuffleFetchWaitTime" : 0,

Review Comment:
   Why are we changing these values?



##
core/src/test/resources/HistoryServerExpectations/one_stage_json_with_partitionId_expectation.json:
##
@@ -620,5 +750,7 @@
 "MajorGCCount" : 0,
 "MajorGCTime" : 0,
 "TotalGCTime" : 0
-  }
+  },
+  "isPushBasedShuffleEnabled" : false,
+  "shuffleMergersCount" : 0
 } ]

Review Comment:
   We can add a new line at the end of file.




[GitHub] [spark] SandishKumarHN commented on a diff in pull request #38922: [SPARK-41396][SQL][PROTOBUF] OneOf field support and recursion checks

2022-12-09 Thread GitBox


SandishKumarHN commented on code in PR #38922:
URL: https://github.com/apache/spark/pull/38922#discussion_r1044857157


##
connector/protobuf/src/main/scala/org/apache/spark/sql/protobuf/utils/ProtobufOptions.scala:
##
@@ -38,6 +38,12 @@ private[sql] class ProtobufOptions(
 
   val parseMode: ParseMode =
 parameters.get("mode").map(ParseMode.fromString).getOrElse(FailFastMode)
+
+  val circularReferenceType: String = 
parameters.getOrElse("circularReferenceType", "FIELD_NAME")

Review Comment:
   @rangadi @baganokodo2022 thanks for the quick meeting. Conclusion: use the 
descriptor's full type name, and add unit tests with some complex schemas. 
   ```
   val recordName = fd.getMessageType.getFullName
   ```
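
   For illustration only (not the actual connector code), the agreed recursion 
check can be sketched by tracking the full type names of the message descriptors 
already seen on the current path; all names below are hypothetical:

   ```python
   def find_cycle(schema, seen=()):
       """schema: mapping of field name -> either a leaf type (str) or a
       (type_full_name, nested_schema) tuple for message-typed fields."""
       for _field, typ in schema.items():
           if isinstance(typ, tuple):
               full_name, nested = typ
               if full_name in seen:  # same descriptor type already on this path
                   return full_name
               cycle = find_cycle(nested, seen + (full_name,))
               if cycle:
                   return cycle
       return None

   # A self-referential message, like Person { Person friend = 1; }
   person = {"name": "string"}
   person["friend"] = ("example.Person", person)

   assert find_cycle({"p": ("example.Person", person)}) == "example.Person"
   assert find_cycle({"x": "string"}) is None
   ```

   Keying on the descriptor's full name (rather than the field name) means two 
differently named fields of the same message type are still caught as a cycle.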



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] mridulm commented on a diff in pull request #36165: [SPARK-36620][SHUFFLE] Add Push Based Shuffle client side read metrics

2022-12-09 Thread GitBox


mridulm commented on code in PR #36165:
URL: https://github.com/apache/spark/pull/36165#discussion_r1044823829


##
core/src/main/scala/org/apache/spark/status/api/v1/api.scala:
##
@@ -302,7 +312,9 @@ class StageData private[spark](
 @JsonDeserialize(using = classOf[ExecutorMetricsJsonDeserializer])
 val peakExecutorMetrics: Option[ExecutorMetrics],
 val taskMetricsDistributions: Option[TaskMetricDistributions],
-val executorMetricsDistributions: Option[ExecutorMetricsDistributions])
+val executorMetricsDistributions: Option[ExecutorMetricsDistributions],
+val isPushBasedShuffleEnabled: Boolean,

Review Comment:
   We actually have a lot of `pushBased*` references in the codebase.
   Given the block ids are named shufflePush, that might be a good prefix to 
settle on.
   






[GitHub] [spark] anchovYu commented on a diff in pull request #38776: [SPARK-27561][SQL] Support implicit lateral column alias resolution on Project and refactor Analyzer

2022-12-09 Thread GitBox


anchovYu commented on code in PR #38776:
URL: https://github.com/apache/spark/pull/38776#discussion_r1044807443


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/namedExpressions.scala:
##
@@ -424,8 +424,51 @@ case class OuterReference(e: NamedExpression)
   override def qualifier: Seq[String] = e.qualifier
   override def exprId: ExprId = e.exprId
   override def toAttribute: Attribute = e.toAttribute
-  override def newInstance(): NamedExpression = OuterReference(e.newInstance())
+  override def newInstance(): NamedExpression =
+OuterReference(e.newInstance()).setNameParts(nameParts)
   final override val nodePatterns: Seq[TreePattern] = Seq(OUTER_REFERENCE)
+
+  // optional field, the original name parts of UnresolvedAttribute before it 
is resolved to
+  // OuterReference. Used in rule ResolveLateralColumnAlias to convert 
OuterReference back to
+  // LateralColumnAliasReference.
+  var nameParts: Option[Seq[String]] = None
+  def setNameParts(newNameParts: Option[Seq[String]]): OuterReference = {

Review Comment:
   As we discussed before, I feel it is not safe to do so, given that in the 
current ResolveOuterReference solution each rule is applied only once. I made up 
a query (it can't run; it's just for demonstration):
   ```
   SELECT *
   FROM range(1, 7)
   WHERE (
 SELECT id2
 FROM (SELECT dept * 2.0 AS id, id + 1 AS id2 FROM $testTable)) > 5
   ORDER BY id
   ```
   It is possible that `dept * 2.0` is not resolved because it needs type 
conversion, so the LCA rule doesn't apply. Then it just wraps the `id` in `id + 
1 AS id2` as OuterReference.






[GitHub] [spark] WolverineJiang commented on pull request #39000: [SPARK-41461][BUILD][CORE] Support user configurable protoc executables when building Spark Core.

2022-12-09 Thread GitBox


WolverineJiang commented on PR #39000:
URL: https://github.com/apache/spark/pull/39000#issuecomment-1344724455

   The core module's pom has active profiles, so activeByDefault does not take 
effect. Therefore, a property is used instead.





[GitHub] [spark] amaliujia commented on pull request #39004: [SPARK-41466][BUILD] Change Scala Style configuration to catch AnyFunSuite instead of FunSuite

2022-12-09 Thread GitBox


amaliujia commented on PR #39004:
URL: https://github.com/apache/spark/pull/39004#issuecomment-1344680268

   late LGTM thanks!





[GitHub] [spark] anchovYu commented on a diff in pull request #38776: [SPARK-27561][SQL] Support implicit lateral column alias resolution on Project and refactor Analyzer

2022-12-09 Thread GitBox


anchovYu commented on code in PR #38776:
URL: https://github.com/apache/spark/pull/38776#discussion_r1044744657


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala:
##
@@ -638,6 +638,14 @@ trait CheckAnalysis extends PredicateHelper with 
LookupCatalog with QueryErrorsB
   case UnresolvedWindowExpression(_, windowSpec) =>
 throw 
QueryCompilationErrors.windowSpecificationNotDefinedError(windowSpec.name)
 })
+// This should not happen, resolved Project or Aggregate should 
restore or resolve
+// all lateral column alias references. Add check for extra safe.

Review Comment:
   I intentionally didn't add it: I don't want attributes that can actually be 
resolved as LCA to show up in the error msg as UnresolvedAttribute. Also note 
that, unlike RemoveTempResolvedColumn, an LCARef can't be directly resolved to 
the NamedExpression inside it, because the resulting plan wouldn't be right - 
there is no alias pushdown.






[GitHub] [spark] viirya commented on pull request #38993: [SPARK-41459][SQL] fix thrift server operation log output is empty

2022-12-09 Thread GitBox


viirya commented on PR #38993:
URL: https://github.com/apache/spark/pull/38993#issuecomment-1344649595

   BTW, it seems you changed the PR description template by mistake. Can you 
restore it?





[GitHub] [spark] viirya commented on a diff in pull request #38993: [SPARK-41459][SQL] fix thrift server operation log output is empty

2022-12-09 Thread GitBox


viirya commented on code in PR #38993:
URL: https://github.com/apache/spark/pull/38993#discussion_r1044740898


##
sql/hive-thriftserver/src/main/java/org/apache/hive/service/cli/operation/LogDivertAppender.java:
##
@@ -276,12 +276,19 @@ private static StringLayout 
initLayout(OperationLog.LoggingLevel loggingMode) {
 return getLayout(isVerbose, layout);
   }
 
-  public LogDivertAppender(OperationManager operationManager,
-OperationLog.LoggingLevel loggingMode) {
+  public static LogDivertAppender create(OperationManager operationManager,
+ OperationLog.LoggingLevel 
loggingMode) {
+CharArrayWriter writer = new CharArrayWriter();
+return new LogDivertAppender(operationManager, loggingMode, writer);
+  }
+
+  private LogDivertAppender(OperationManager operationManager,
+OperationLog.LoggingLevel loggingMode, CharArrayWriter writer) {

Review Comment:
   Isn't it enough to just create the `CharArrayWriter` in the original constructor?
   
   ```suggestion
 public LogDivertAppender(OperationManager operationManager,
OperationLog.LoggingLevel 
loggingMode) {
   CharArrayWriter writer = new CharArrayWriter();
   ```






[GitHub] [spark] viirya commented on a diff in pull request #38993: [SPARK-41459][SQL] fix thrift server operation log output is empty

2022-12-09 Thread GitBox


viirya commented on code in PR #38993:
URL: https://github.com/apache/spark/pull/38993#discussion_r1044738513


##
sql/hive-thriftserver/src/main/java/org/apache/hive/service/cli/operation/LogDivertAppender.java:
##
@@ -276,12 +276,19 @@ private static StringLayout 
initLayout(OperationLog.LoggingLevel loggingMode) {
 return getLayout(isVerbose, layout);
   }
 
-  public LogDivertAppender(OperationManager operationManager,
-OperationLog.LoggingLevel loggingMode) {
+  public static LogDivertAppender create(OperationManager operationManager,
+ OperationLog.LoggingLevel 
loggingMode) {
+CharArrayWriter writer = new CharArrayWriter();
+return new LogDivertAppender(operationManager, loggingMode, writer);
+  }
+
+  private LogDivertAppender(OperationManager operationManager,

Review Comment:
   Any reason to change it to private constructor and add `create` for 
constructing?
   
   






[GitHub] [spark] viirya commented on a diff in pull request #38993: [SPARK-41459][SQL] fix thrift server operation log output is empty

2022-12-09 Thread GitBox


viirya commented on code in PR #38993:
URL: https://github.com/apache/spark/pull/38993#discussion_r1044737028


##
sql/hive-thriftserver/src/main/java/org/apache/hive/service/cli/operation/LogDivertAppender.java:
##
@@ -276,12 +276,19 @@ private static StringLayout 
initLayout(OperationLog.LoggingLevel loggingMode) {
 return getLayout(isVerbose, layout);
   }
 
-  public LogDivertAppender(OperationManager operationManager,
-OperationLog.LoggingLevel loggingMode) {
+  public static LogDivertAppender create(OperationManager operationManager,
+ OperationLog.LoggingLevel 
loggingMode) {
+CharArrayWriter writer = new CharArrayWriter();
+return new LogDivertAppender(operationManager, loggingMode, writer);
+  }
+
+  private LogDivertAppender(OperationManager operationManager,
+OperationLog.LoggingLevel loggingMode, CharArrayWriter writer) {
 super("LogDivertAppender", initLayout(loggingMode), null, false, true, 
Property.EMPTY_ARRAY,
-new WriterManager(new CharArrayWriter(), "LogDivertAppender",

Review Comment:
   The issue is here using another `CharArrayWriter`?
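
   As a rough analogy (plain Python, not the actual Hive/Log4j code path), the 
failure mode being discussed is writing to one buffer while reading from a 
different instance, which leaves the read side permanently empty:

   ```python
   import io

   # Two separate buffers: the appender writes to one, the reader polls the other.
   write_buf = io.StringIO()
   read_buf = io.StringIO()  # created separately, like the second CharArrayWriter

   write_buf.write("operation log line\n")
   assert read_buf.getvalue() == ""  # reader sees nothing: output looks "empty"

   # Fix: share a single buffer instance between the writer and the reader.
   shared = io.StringIO()
   shared.write("operation log line\n")
   assert shared.getvalue() == "operation log line\n"
   ```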






[GitHub] [spark] dongjoon-hyun commented on pull request #38982: [SPARK-41376][CORE][3.2] Correct the Netty preferDirectBufs check logic on executor start

2022-12-09 Thread GitBox


dongjoon-hyun commented on PR #38982:
URL: https://github.com/apache/spark/pull/38982#issuecomment-1344618354

   Merged to branch-3.2 too. Thank you, @pan3793 and @Yikun .





[GitHub] [spark] vinodkc commented on pull request #38608: [SPARK-41080][SQL] Support Bit manipulation function SETBIT

2022-12-09 Thread GitBox


vinodkc commented on PR #38608:
URL: https://github.com/apache/spark/pull/38608#issuecomment-1344618388

   @gengliangwang, can you please review it?





[GitHub] [spark] amaliujia commented on a diff in pull request #38984: [SPARK-41349][CONNECT][PYTHON] Implement DataFrame.hint

2022-12-09 Thread GitBox


amaliujia commented on code in PR #38984:
URL: https://github.com/apache/spark/pull/38984#discussion_r1044705096


##
connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala:
##
@@ -305,7 +305,11 @@ class SparkConnectPlanner(session: SparkSession) {
   }
 
   private def transformHint(rel: proto.Hint): LogicalPlan = {
-val params = rel.getParametersList.asScala.map(toCatalystValue).toSeq
+val params = rel.getParametersList.asScala.map(toCatalystValue).toSeq.map{
+  case name: String => UnresolvedAttribute(name)
+  case l: Long => l.toInt

Review Comment:
   This becomes a bit tricky: Python is just one of the clients; Scala will also 
have a client in the future.
   
   I'm actually not sure what the best way is to deal with cross-language 
differences. cc @HyukjinKwon any idea?
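
   For what it's worth, the conversion in this hunk can be sketched in plain 
Python (the helper name, the tagged-tuple return, and the int32 range are 
illustrative, not the actual planner API):

   ```python
   def normalize_hint_param(value):
       """Map a decoded proto literal back to what hint resolution expects.

       Hypothetical helper: strings are treated as column names (analogous to
       wrapping in UnresolvedAttribute), and integer literals (which arrive as
       int64 from the Python client) are narrowed to the int range that hints
       such as REPARTITION accept.
       """
       if isinstance(value, str):
           return ("unresolved_attribute", value)
       if isinstance(value, int):
           if not (-2**31 <= value <= 2**31 - 1):
               raise ValueError(f"hint parameter out of int range: {value}")
           return ("int", value)
       raise TypeError(f"unsupported hint parameter type: {type(value).__name__}")

   assert normalize_hint_param("id") == ("unresolved_attribute", "id")
   assert normalize_hint_param(3) == ("int", 3)
   ```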






[GitHub] [spark] dongjoon-hyun commented on pull request #38991: [SPARK-41457][PYTHON][TESTS] Refactor type annotations and dependency checks in tests

2022-12-09 Thread GitBox


dongjoon-hyun commented on PR #38991:
URL: https://github.com/apache/spark/pull/38991#issuecomment-1344612390

   Could you check the linter failure?
   ```
   starting mypy data test...
   annotations failed data checks:
   = test session starts 
==
   platform linux -- Python 3.9.5, pytest-7.1.3, pluggy-1.0.0
   rootdir: /__w/spark/spark/python, configfile: ../dev/pyproject.toml
   plugins: mypy-plugins-1.9.3
   collected 40 items
   
   python/pyspark/ml/tests/typing/test_classification.yml FF[  
5%]
   python/pyspark/ml/tests/typing/test_clustering.yaml F[  
7%]
   python/pyspark/ml/tests/typing/test_evaluation.yml F [ 
10%]
   python/pyspark/ml/tests/typing/test_feature.yml FF   [ 
15%]
   python/pyspark/ml/tests/typing/test_param.yml F  [ 
17%]
   python/pyspark/ml/tests/typing/test_readable.yml F   [ 
20%]
   python/pyspark/ml/tests/typing/test_regression.yml FFF   [ 
27%]
   python/pyspark/sql/tests/typing/test_column.yml F[ 
30%]
   python/pyspark/sql/tests/typing/test_dataframe.yml FFF   [ 
47%]
   python/pyspark/sql/tests/typing/test_functions.yml F [ 
50%]
   python/pyspark/sql/tests/typing/test_readwriter.yml FF   [ 
55%]
   python/pyspark/sql/tests/typing/test_session.yml F   [ 
67%]
   python/pyspark/sql/tests/typing/test_udf.yml FFF [ 
85%]
   python/pyspark/tests/typing/test_context.yml F   [ 
87%]
   python/pyspark/tests/typing/test_core.yml F  [ 
90%]
   python/pyspark/tests/typing/test_rdd.yml FFF [ 
97%]
   python/pyspark/tests/typing/test_resultiterable.yml F
[100%]
   ```





[GitHub] [spark] xinrong-meng commented on a diff in pull request #38946: [SPARK-41414][CONNECT][PYTHON] Implement date/timestamp functions

2022-12-09 Thread GitBox


xinrong-meng commented on code in PR #38946:
URL: https://github.com/apache/spark/pull/38946#discussion_r1044694589


##
python/pyspark/sql/tests/connect/test_connect_function.py:
##
@@ -645,6 +645,153 @@ def test_string_functions(self):
 sdf.select(SF.encode("c", "UTF-8")).toPandas(),
 )
 
+# TODO(SPARK-41283): To compare toPandas for test cases with dtypes marked
+def test_date_ts_functions(self):
+from pyspark.sql import functions as SF
+from pyspark.sql.connect import functions as CF
+
+query = """
+SELECT * FROM VALUES
+('1997/02/28 10:30:00', '2023/03/01 06:00:00', 'JST', 1428476400, 
2020, 12, 6),
+('2000/01/01 04:30:05', '2020/05/01 12:15:00', 'PST', 1403892395, 
2022, 12, 6)
+AS tab(ts1, ts2, tz, seconds, Y, M, D)
+"""
+# +---+---+---+--++---+---+
+# |ts1|ts2| tz|   seconds|   Y|  M|  D|
+# +---+---+---+--++---+---+
+# |1997/02/28 10:30:00|2023/03/01 06:00:00|JST|1428476400|2020| 12|  6|
+# |2000/01/01 04:30:05|2020/05/01 12:15:00|PST|1403892395|2022| 12|  6|
+# +---+---+---+--++---+---+
+
+cdf = self.connect.sql(query)
+sdf = self.spark.sql(query)
+
+# With no parameters
+for cfunc, sfunc in [
+(CF.current_date, SF.current_date),
+]:
+self.assert_eq(
+cdf.select(cfunc()).toPandas(),
+sdf.select(sfunc()).toPandas(),
+)
+
+# current_timestamp
+# [left]:  datetime64[ns, America/Los_Angeles]
+# [right]: datetime64[ns]
+plan = cdf.select(CF.current_timestamp())._plan.to_proto(self.connect)
+self.assertEqual(
+plan.root.project.expressions.unresolved_function.function_name, 
"current_timestamp"

Review Comment:
   Thank you both! The tests are improved in 
https://github.com/apache/spark/pull/38946/commits/0493a01a4e39795a522cdc58f8a69b0ed261c832.






[GitHub] [spark] gengliangwang commented on pull request #38999: [SPARK-41450][BUILD] Fix shading in `core` module

2022-12-09 Thread GitBox


gengliangwang commented on PR #38999:
URL: https://github.com/apache/spark/pull/38999#issuecomment-1344591174

   @pan3793 Thanks for fixing it!





[GitHub] [spark] otterc commented on a diff in pull request #36165: [SPARK-36620][SHUFFLE] Add Push Based Shuffle client side read metrics

2022-12-09 Thread GitBox


otterc commented on code in PR #36165:
URL: https://github.com/apache/spark/pull/36165#discussion_r1044679570


##
core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala:
##
@@ -726,6 +736,61 @@ final class ShuffleBlockFetcherIterator(
 }
   }
 
+  // Number of map blocks in a ShuffleBlockChunk
+  private def getShuffleChunkCardinality(blockId: ShuffleBlockChunkId): Int = {
+
pushBasedFetchHelper.getRoaringBitMap(blockId).map(_.getCardinality).getOrElse(0)
+  }
+
+  // Check if the merged block is local to the host or not
+  private def isLocalMergedBlockAddress(address: BlockManagerId): Boolean = {
+address.executorId.isEmpty && address.host == 
blockManager.blockManagerId.host

Review Comment:
   This method already exists as 
`PushBasedFetchHelper.isLocalPushMergedBlockAddress`; we should just use that 
(maybe make it static). Also, the executor id is not empty - it's 
`SHUFFLE_MERGER_IDENTIFIER`.






[GitHub] [spark] steveloughran commented on pull request #38974: [SPARK-41392][BUILD] Make maven build Spark master with Hadoop 3.4.0-SNAPSHOT successful

2022-12-09 Thread GitBox


steveloughran commented on PR #38974:
URL: https://github.com/apache/spark/pull/38974#issuecomment-1344581573

   yeah, not going to happen for a while; 3.3.5 RC0 coming soon though; just 
trying to wrap up an abfs prefetch bug





[GitHub] [spark] LuciferYang commented on pull request #38974: [SPARK-41392][BUILD] Make maven build Spark master with Hadoop 3.4.0-SNAPSHOT successful

2022-12-09 Thread GitBox


LuciferYang commented on PR #38974:
URL: https://github.com/apache/spark/pull/38974#issuecomment-1344577731

   fine to me, close first ~





[GitHub] [spark] LuciferYang closed pull request #38974: [SPARK-41392][BUILD] Make maven build Spark master with Hadoop 3.4.0-SNAPSHOT successful

2022-12-09 Thread GitBox


LuciferYang closed pull request #38974: [SPARK-41392][BUILD] Make maven build Spark master with Hadoop 3.4.0-SNAPSHOT successful
URL: https://github.com/apache/spark/pull/38974





[GitHub] [spark] MaxGekk commented on pull request #38997: [SPARK-41462][SQL] Date and timestamp type can up cast to TimestampNTZ

2022-12-09 Thread GitBox


MaxGekk commented on PR #38997:
URL: https://github.com/apache/spark/pull/38997#issuecomment-1344574665

   +1, LGTM. Merging to master.
   Thank you, @gengliangwang and @cloud-fan for review.
   





[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #39012: [MINOR][CONNECT]: Fix command error and typo

2022-12-09 Thread GitBox


dongjoon-hyun commented on code in PR #39012:
URL: https://github.com/apache/spark/pull/39012#discussion_r1044671065


##
connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala:
##
@@ -322,8 +322,7 @@ class SparkConnectPlanner(session: SparkSession) {
 None,
 rel.getVariableColumnName,
 Seq(rel.getValueColumnName),
-transformRelation(rel.getInput)
-  )
+transformRelation(rel.getInput))

Review Comment:
   Is this required for linter?






[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #39012: [MINOR][CONNECT]: Fix command error and typo

2022-12-09 Thread GitBox


dongjoon-hyun commented on code in PR #39012:
URL: https://github.com/apache/spark/pull/39012#discussion_r1044670574


##
dev/lint-scala:
##
@@ -29,14 +29,14 @@ ERRORS=$(./build/mvn \
 -Dscalafmt.skip=false \
 -Dscalafmt.validateOnly=true \
 -Dscalafmt.changedOnly=false \
--pl connector/connect \
+-pl connector/connect/server \

Review Comment:
   For this case, we need an official JIRA ID instead of `MINOR`. Could you file one, @dengziming?






[GitHub] [spark] dongjoon-hyun commented on pull request #39004: [SPARK-41466][BUILD] Change Scala Style configuration to catch AnyFunSuite instead of FunSuite

2022-12-09 Thread GitBox


dongjoon-hyun commented on PR #39004:
URL: https://github.com/apache/spark/pull/39004#issuecomment-1344576249

   Merged to master.





[GitHub] [spark] dongjoon-hyun closed pull request #39004: [SPARK-41466][BUILD] Change Scala Style configuration to catch AnyFunSuite instead of FunSuite

2022-12-09 Thread GitBox


dongjoon-hyun closed pull request #39004: [SPARK-41466][BUILD] Change Scala Style configuration to catch AnyFunSuite instead of FunSuite
URL: https://github.com/apache/spark/pull/39004





[GitHub] [spark] MaxGekk closed pull request #38997: [SPARK-41462][SQL] Date and timestamp type can up cast to TimestampNTZ

2022-12-09 Thread GitBox


MaxGekk closed pull request #38997: [SPARK-41462][SQL] Date and timestamp type can up cast to TimestampNTZ
URL: https://github.com/apache/spark/pull/38997





[GitHub] [spark] rednaxelafx commented on pull request #38923: [SPARK-41395][SQL] `InterpretedMutableProjection` should use `setDecimal` to set null values for decimals in an unsafe row

2022-12-09 Thread GitBox


rednaxelafx commented on PR #38923:
URL: https://github.com/apache/spark/pull/38923#issuecomment-1344558867

   Post-hoc review: LGTM, this is a good catch. 





[GitHub] [spark] warrenzhu25 commented on pull request #39011: [WIP][SPARK-41469][CORE] Avoid unnecessary task rerun on decommissioned executor lost if shuffle data migrated

2022-12-09 Thread GitBox


warrenzhu25 commented on PR #39011:
URL: https://github.com/apache/spark/pull/39011#issuecomment-1344553081

   > cc @warrenzhu25 too
   
   It's really the change I want. Great work.





[GitHub] [spark] wineternity commented on a diff in pull request #38702: [SPARK-41187][CORE] LiveExecutor MemoryLeak in AppStatusListener when ExecutorLost happen

2022-12-09 Thread GitBox


wineternity commented on code in PR #38702:
URL: https://github.com/apache/spark/pull/38702#discussion_r1044596644


##
core/src/main/scala/org/apache/spark/status/AppStatusListener.scala:
##
@@ -674,22 +674,30 @@ private[spark] class AppStatusListener(
   delta
 }.orNull
 
-val (completedDelta, failedDelta, killedDelta) = event.reason match {
+// SPARK-41187: For `SparkListenerTaskEnd` with `Resubmitted` reason, which is raised by
+// executor lost, it can lead to negative `LiveStage.activeTasks` since there's no
+// corresponding `SparkListenerTaskStart` event for each of them. The negative activeTasks
+// will make the stage always remains in the live stage list as it can never meet the
+// condition activeTasks == 0. This in turn causes the dead executor to never be retained
+// if that live stage's submissionTime is less than the dead executor's removeTime.
+val (completedDelta, failedDelta, killedDelta, activeTasksDelta) = event.reason match {

Review Comment:
   fixed
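
   A minimal sketch of the accounting bug the diff above addresses (names assumed, not Spark's actual listener code): a `Resubmitted` task-end arrives without a matching task-start, so unconditionally decrementing the active-task counter drives it negative, and the stage can then never satisfy the `activeTasks == 0` cleanup condition.

   ```scala
   // Toy counter mimicking LiveStage.activeTasks bookkeeping.
   var activeTasks = 0

   def onTaskStart(): Unit = activeTasks += 1

   def onTaskEnd(reason: String): Unit = reason match {
     // A resubmitted task had no matching start event, so do not decrement.
     case "Resubmitted" => ()
     case _             => activeTasks -= 1
   }

   onTaskStart()
   onTaskEnd("Success")
   onTaskEnd("Resubmitted") // without the guard this would leave activeTasks at -1
   assert(activeTasks == 0)
   ```

   With the guard in place the counter returns to zero, so the stage can be cleaned up and the dead executor retained.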






[GitHub] [spark] Ngone51 commented on a diff in pull request #38702: [SPARK-41187][CORE] LiveExecutor MemoryLeak in AppStatusListener when ExecutorLost happen

2022-12-09 Thread GitBox


Ngone51 commented on code in PR #38702:
URL: https://github.com/apache/spark/pull/38702#discussion_r1044581470


##
core/src/main/scala/org/apache/spark/status/AppStatusListener.scala:
##
@@ -674,22 +674,30 @@ private[spark] class AppStatusListener(
   delta
 }.orNull
 
-val (completedDelta, failedDelta, killedDelta) = event.reason match {
+// SPARK-41187: For `SparkListenerTaskEnd` with `Resubmitted` reason, which is raised by
+// executor lost, it can lead to negative `LiveStage.activeTasks` since there's no
+// corresponding `SparkListenerTaskStart` event for each of them. The negative activeTasks
+// will make the stage always remains in the live stage list as it can never meet the
+// condition activeTasks == 0. This in turn causes the dead executor to never be retained
+// if that live stage's submissionTime is less than the dead executor's removeTime.
+val (completedDelta, failedDelta, killedDelta, activeTasksDelta) = event.reason match {

Review Comment:
   nit:
   ```suggestion
   val (completedDelta, failedDelta, killedDelta, activeDelta) = event.reason match {
   ```






[GitHub] [spark] dengziming opened a new pull request, #39012: [MINOR][CONNECT]: Fix command error and typo

2022-12-09 Thread GitBox


dengziming opened a new pull request, #39012:
URL: https://github.com/apache/spark/pull/39012

   ### What changes were proposed in this pull request?
   We separated connect into server and common, but failed to update the `lint-scala` tool accordingly.
   Fix a typo: fase -> false.
   Format the code.
   
   
   ### Why are the changes needed?
   We cannot check the Scala code format without it.
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   
   ### How was this patch tested?
   No
   





[GitHub] [spark] dongjoon-hyun commented on pull request #38999: [SPARK-41450][BUILD] Fix shading in `core` module

2022-12-09 Thread GitBox


dongjoon-hyun commented on PR #38999:
URL: https://github.com/apache/spark/pull/38999#issuecomment-1344459575

   Merged to master. Thank you, @pan3793 and all!





[GitHub] [spark] dongjoon-hyun closed pull request #38999: [SPARK-41450][BUILD] Fix shading in `core` module

2022-12-09 Thread GitBox


dongjoon-hyun closed pull request #38999: [SPARK-41450][BUILD] Fix shading in `core` module
URL: https://github.com/apache/spark/pull/38999





[GitHub] [spark] Ngone51 commented on pull request #39011: [WIP][SPARK-41469][CORE] Avoid unnecessary task rerun on decommissioned executor lost if shuffle data migrated

2022-12-09 Thread GitBox


Ngone51 commented on PR #39011:
URL: https://github.com/apache/spark/pull/39011#issuecomment-1344456134

   Mark as WIP first regarding the compilation error and missing ut. Any feedback is still welcome.





[GitHub] [spark] vinodkc commented on pull request #38146: [SPARK-40687][SQL] Support data masking built-in function 'mask'

2022-12-09 Thread GitBox


vinodkc commented on PR #38146:
URL: https://github.com/apache/spark/pull/38146#issuecomment-1344455185

   @gengliangwang, Review comments are addressed. 





[GitHub] [spark] Ngone51 commented on a diff in pull request #39011: [SPARK-41469][CORE] Avoid unnecessary task rerun on decommissioned executor lost if shuffle data migrated

2022-12-09 Thread GitBox


Ngone51 commented on code in PR #39011:
URL: https://github.com/apache/spark/pull/39011#discussion_r1044572392


##
core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala:
##
@@ -1046,17 +1046,46 @@ private[spark] class TaskSetManager(
 
   /** Called by TaskScheduler when an executor is lost so we can re-enqueue our tasks */
   override def executorLost(execId: String, host: String, reason: ExecutorLossReason): Unit = {
-// Re-enqueue any tasks that ran on the failed executor if this is a shuffle map stage,
-// and we are not using an external shuffle server which could serve the shuffle outputs.
-// The reason is the next stage wouldn't be able to fetch the data from this dead executor
-// so we would need to rerun these tasks on other executors.
-if (isShuffleMapTasks && !env.blockManager.externalShuffleServiceEnabled && !isZombie) {
+// Re-enqueue any tasks with potential shuffle data loss that ran on the failed executor
+// if this is a shuffle map stage, and we are not using an external shuffle server which
+// could serve the shuffle outputs or the executor lost is caused by decommission (which
+// can destroy the whole host). The reason is the next stage wouldn't be able to fetch the
+// data from this dead executor so we would need to rerun these tasks on other executors.
+val maybeShuffleMapOutputLoss = isShuffleMapTasks &&
+  (reason.isInstanceOf[ExecutorDecommission] || !env.blockManager.externalShuffleServiceEnabled)
+if (maybeShuffleMapOutputLoss && !isZombie) {
   for ((tid, info) <- taskInfos if info.executorId == execId) {
 val index = info.index
+lazy val isShuffleMapOutputAvailable = reason match {
+  case ExecutorDecommission(_, _) =>
+val shuffleId = tasks(0).asInstanceOf[ShuffleMapTask].shuffleId
+val mapId = if (conf.get(config.SHUFFLE_USE_OLD_FETCH_PROTOCOL)) {
+  info.partitionId
+} else {
+  tid
+}
+val locationOpt = env.mapOutputTracker.asInstanceOf[MapOutputTrackerMaster]
+  .getMapOutputLocation(shuffleId, mapId)
+// There are 3 cases of locationOpt:
+// 1) locationOpt.isDefined && locationOpt.get.host == host:
+//this case implies that the shuffle map output is still on the lost executor. The
+//map output file is supposed to lose so we should rerun this task;
+// 2) locationOpt.isDefined && locationOpt.get.host != host:
+//this case implies that the shuffle map output has been migrated to another
+//host. The task doesn't need to rerun;
+// 3) location.isEmpty:

Review Comment:
   ```suggestion
   // 3) locationOpt.isEmpty:
   ```
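
   A hedged sketch of the three-way decision the diff's comment enumerates (simplified types, hypothetical helper name; the behavior for the unknown-location case is an assumption, since the quoted excerpt cuts off before it): rerun a task only when its map output is still on the lost host, or when its location cannot be determined.

   ```scala
   // Decide whether a finished shuffle map task must be re-enqueued after
   // a decommissioned executor on `lostHost` is lost.
   def needsRerun(locationOpt: Option[String], lostHost: String): Boolean = locationOpt match {
     case Some(host) if host == lostHost => true  // 1) output still on the lost host: rerun
     case Some(_)                        => false // 2) output migrated to another host: keep
     case None                           => true  // 3) location unknown: rerun to be safe
   }

   assert(needsRerun(Some("hostA"), "hostA"))
   assert(!needsRerun(Some("hostB"), "hostA"))
   assert(needsRerun(None, "hostA"))
   ```

   Treating the unknown-location case conservatively trades a possible redundant recomputation for correctness, which matches the PR's goal of skipping reruns only when migration is confirmed.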






[GitHub] [spark] Ngone51 commented on pull request #39011: [SPARK-41469][CORE] Avoid unnecessary task rerun on decommissioned executor lost if shuffle data migrated

2022-12-09 Thread GitBox


Ngone51 commented on PR #39011:
URL: https://github.com/apache/spark/pull/39011#issuecomment-137148

   cc @warrenzhu25 too





[GitHub] [spark] Ngone51 opened a new pull request, #39011: [SPARK-41469][CORE] Avoid unnecessary task rerun on decommissioned executor lost if shuffle data migrated

2022-12-09 Thread GitBox


Ngone51 opened a new pull request, #39011:
URL: https://github.com/apache/spark/pull/39011

   
   
   ### What changes were proposed in this pull request?
   
   
   This PR proposes to avoid rerunning the finished shuffle map task in `TaskSetManager.executorLost()` if the executor lost is caused by decommission and the shuffle data has been successfully migrated.
   
   ### Why are the changes needed?
   
   
   To avoid unnecessary task recomputation.
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   No.
   
   
   ### How was this patch tested?
   
   
   (Will try to add unit tests tmr)




