[GitHub] [spark] grundprinzip commented on a diff in pull request #39016: [SPARK-41477][CONNECT][PYTHON] Correctly infer the datatype of literal integers
grundprinzip commented on code in PR #39016: URL: https://github.com/apache/spark/pull/39016#discussion_r1045017515

## python/pyspark/sql/connect/column.py:

```
@@ -180,7 +180,12 @@ def to_plan(self, session: "SparkConnectClient") -> "proto.Expression":
         elif isinstance(self._value, bool):
             expr.literal.boolean = bool(self._value)
         elif isinstance(self._value, int):
-            expr.literal.long = int(self._value)
+            if -2147483648 <= self._value <= 2147483647:
```

Review Comment: can we extract these magic values out into "constants"?

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
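The reviewer's suggestion can be sketched as follows; the constant names and the helper function are hypothetical illustrations, not the names the PR ends up using:

```python
# Hypothetical names for the magic values flagged in the review: the bounds
# of a JVM 32-bit signed integer.
JVM_INT_MIN = -(1 << 31)      # -2147483648
JVM_INT_MAX = (1 << 31) - 1   # 2147483647


def infer_integer_literal_type(value: int) -> str:
    """Return the Spark SQL type a Python int literal maps to: values that
    fit in 32 bits become "integer", anything larger becomes "long"."""
    if JVM_INT_MIN <= value <= JVM_INT_MAX:
        return "integer"
    return "long"
```

Naming the bounds once keeps the range check readable and avoids the two literals drifting apart across call sites.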
[GitHub] [spark] MaxGekk commented on pull request #38972: [SPARK-41443][SQL] Assign a name to the error class _LEGACY_ERROR_TEMP_1061
MaxGekk commented on PR #38972: URL: https://github.com/apache/spark/pull/38972#issuecomment-1345173649

@panbingkun Could you fix the test failure? It seems to be related to your changes:

```
sbt.ForkMain$ForkError: org.scalatest.exceptions.TestFailedException: "[COLUMN_NOT_FOUND] The column `some_random_column` cannot be found. Verify the spelling and correctness of the column name according to the SQL config "spark.sql.caseSensitive"." did not contain "does not exist"
    at org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:472)
    at org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:471)
    at org.scalatest.Assertions$.newAssertionFailedException(Assertions.scala:1231)
    at org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:1295)
    at org.apache.spark.sql.StatisticsCollectionSuite.$anonfun$new$15(StatisticsCollectionSuite.scala:136)
```
[GitHub] [spark] MaxGekk commented on a diff in pull request #38998: [SPARK-41463][SQL][TESTS] Ensure error class names contain only capital letters, numbers and underscores
MaxGekk commented on code in PR #38998: URL: https://github.com/apache/spark/pull/38998#discussion_r1045013373

## core/src/test/scala/org/apache/spark/SparkThrowableSuite.scala:

```
@@ -147,6 +147,18 @@ class SparkThrowableSuite extends SparkFunSuite {
     assert(rereadErrorClassToInfoMap == errorReader.errorInfoMap)
   }

+  test("Error class names should contain only capital letters, numbers and underscores") {
+    val allowedChars = "[A-Z0-9_]*"
+    errorReader.errorInfoMap.foreach { e =>
+      assert(e._1.matches(allowedChars))
```

Review Comment: This one doesn't show which error class among 1500 is invalid, see:

```
e._1.matches(allowedChars) was false
ScalaTestFailureLocation: org.apache.spark.SparkThrowableSuite at (SparkThrowableSuite.scala:153)
org.scalatest.exceptions.TestFailedException: e._1.matches(allowedChars) was false
```

Could you output the error class name:

```suggestion
      assert(e._1.matches(allowedChars), s"The error class is invalid: ${e._1}")
```

## core/src/test/scala/org/apache/spark/SparkThrowableSuite.scala:

```
+      e._2.subClass.map { s =>
+        s.keys.foreach { k =>
+          assert(k.matches(allowedChars))
```

Review Comment:

```suggestion
          assert(k.matches(allowedChars), s"The error sub-class is invalid: $k")
```
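The suggested pattern, attaching the offending value to the assertion message instead of asserting a bare boolean, looks like this in a minimal Python sketch (the function name is illustrative):

```python
import re

# Same character class as the Scala test: capital letters, digits, underscores.
_ALLOWED = re.compile(r"[A-Z0-9_]*\Z")


def check_error_class_name(name: str) -> None:
    # The message names the invalid entry, so a failure among ~1500 classes
    # points straight at the culprit instead of "matches(...) was false".
    assert _ALLOWED.match(name), f"The error class is invalid: {name}"
```

A failing check then reports, e.g., `The error class is invalid: bad-name`, which is exactly the diagnostic information the bare assertion loses.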
[GitHub] [spark] MaxGekk closed pull request #38954: [SPARK-41417][CORE][SQL] Rename `_LEGACY_ERROR_TEMP_0019` to `INVALID_TYPED_LITERAL`
MaxGekk closed pull request #38954: [SPARK-41417][CORE][SQL] Rename `_LEGACY_ERROR_TEMP_0019` to `INVALID_TYPED_LITERAL` URL: https://github.com/apache/spark/pull/38954
[GitHub] [spark] MaxGekk commented on pull request #38954: [SPARK-41417][CORE][SQL] Rename `_LEGACY_ERROR_TEMP_0019` to `INVALID_TYPED_LITERAL`
MaxGekk commented on PR #38954: URL: https://github.com/apache/spark/pull/38954#issuecomment-1345157464 +1, LGTM. Merging to master. Thank you, @LuciferYang.
[GitHub] [spark] AmplabJenkins commented on pull request #39012: [SPARK-41475][CONNECT] Fix lint-scala command error and typo
AmplabJenkins commented on PR #39012: URL: https://github.com/apache/spark/pull/39012#issuecomment-1345155848 Can one of the admins verify this patch?
[GitHub] [spark] AmplabJenkins commented on pull request #39013: [SPARK-41472][CONNECT][PYTHON] Implement the rest of string/binary functions
AmplabJenkins commented on PR #39013: URL: https://github.com/apache/spark/pull/39013#issuecomment-1345155835 Can one of the admins verify this patch?
[GitHub] [spark] dongjoon-hyun commented on pull request #39005: [SPARK-41467][BUILD] Upgrade httpclient from 4.5.13 to 4.5.14
dongjoon-hyun commented on PR #39005: URL: https://github.com/apache/spark/pull/39005#issuecomment-1345151502 All tests passed. Merged to master. Thank you, @panbingkun and all.
[GitHub] [spark] dongjoon-hyun closed pull request #39005: [SPARK-41467][BUILD] Upgrade httpclient from 4.5.13 to 4.5.14
dongjoon-hyun closed pull request #39005: [SPARK-41467][BUILD] Upgrade httpclient from 4.5.13 to 4.5.14 URL: https://github.com/apache/spark/pull/39005
[GitHub] [spark] dongjoon-hyun commented on pull request #39015: [SPARK-41476][INFRA] Prevent `README.md` from triggering CIs
dongjoon-hyun commented on PR #39015: URL: https://github.com/apache/spark/pull/39015#issuecomment-1345151056 Thank you, @HyukjinKwon !
[GitHub] [spark] zhengruifeng opened a new pull request, #39016: [SPARK-41477][CONNECT][PYTHON] Correctly infer the datatype of literal integers
zhengruifeng opened a new pull request, #39016: URL: https://github.com/apache/spark/pull/39016

### What changes were proposed in this pull request?
Check the bounds of integer literals and choose the correct datatype.

### Why are the changes needed?
To match PySpark:

```
In [15]: int_max = 2147483647

In [16]: long_max = 9223372036854775807

In [17]: sdf = spark.range(0, 1)

In [18]: sdf.select(lit(0), lit(int_max), lit(int_max + 1), lit(long_max))
Out[18]: DataFrame[0: int, 2147483647: int, 2147483648: bigint, 9223372036854775807: bigint]

In [19]: sdf.select(lit(0), lit(int_max), lit(int_max + 1), lit(long_max), lit(long_max + 1))
---------------------------------------------------------------------------
Py4JError                                 Traceback (most recent call last)
Cell In[19], line 1
----> 1 sdf.select(lit(0), lit(int_max), lit(int_max + 1), lit(long_max), lit(long_max + 1))

File ~/Dev/spark/python/pyspark/sql/functions.py:179, in lit(col)
    177 if dt is not None:
    178     return _invoke_function("lit", col).astype(dt)
--> 179 return _invoke_function("lit", col)

File ~/Dev/spark/python/pyspark/sql/functions.py:88, in _invoke_function(name, *args)
     86 assert SparkContext._active_spark_context is not None
     87 jf = _get_jvm_function(name, SparkContext._active_spark_context)
---> 88 return Column(jf(*args))

...

File ~/Dev/spark/python/pyspark/sql/utils.py:199, in capture_sql_exception.<locals>.deco(*a, **kw)
    197 def deco(*a: Any, **kw: Any) -> Any:
    198     try:
--> 199         return f(*a, **kw)
    200     except Py4JJavaError as e:
    201         converted = convert_exception(e.java_exception)

File ~/Dev/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/protocol.py:330, in get_return_value(answer, gateway_client, target_id, name)
    326     raise Py4JJavaError(
    327         "An error occurred while calling {0}{1}{2}.\n".
    328         format(target_id, ".", name), value)
    329 else:
--> 330     raise Py4JError(
    331         "An error occurred while calling {0}{1}{2}. Trace:\n{3}\n".
    332         format(target_id, ".", name, value))
    333 else:
    334     raise Py4JError(
    335         "An error occurred while calling {0}{1}{2}".
    336         format(target_id, ".", name))

Py4JError: An error occurred while calling z:org.apache.spark.sql.functions.lit. Trace:
java.lang.NumberFormatException: For input string: "9223372036854775808"
	at java.base/java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
```

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Added UT.
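The behavior the PR matches can be sketched as a plain bounds check; the helper below is an illustration of the inference, not PySpark's actual code path:

```python
# JVM numeric bounds. Values outside the 64-bit long range cannot be
# represented at all and are better rejected on the client side than
# surfaced as a NumberFormatException from the JVM.
INT_MIN, INT_MAX = -(1 << 31), (1 << 31) - 1
LONG_MIN, LONG_MAX = -(1 << 63), (1 << 63) - 1


def literal_spark_type(value: int) -> str:
    """Map a Python int literal to the Spark SQL type PySpark infers."""
    if not (LONG_MIN <= value <= LONG_MAX):
        raise ValueError(f"integer literal out of bigint range: {value}")
    return "int" if INT_MIN <= value <= INT_MAX else "bigint"
```

This reproduces the session above: `int_max` stays `int`, `int_max + 1` and `long_max` become `bigint`, and `long_max + 1` fails fast with a clear error.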
[GitHub] [spark] dongjoon-hyun commented on pull request #39015: [SPARK-41476][INFRA] Prevent `README.md` from triggering CIs
dongjoon-hyun commented on PR #39015: URL: https://github.com/apache/spark/pull/39015#issuecomment-1345146063 BTW, if you don't mind, I'd like to land this on all live branches to reduce community resource usage. WDYT, @HyukjinKwon and @gengliangwang?
[GitHub] [spark] dongjoon-hyun closed pull request #38991: [SPARK-41457][PYTHON][TESTS] Refactor type annotations and dependency checks in tests
dongjoon-hyun closed pull request #38991: [SPARK-41457][PYTHON][TESTS] Refactor type annotations and dependency checks in tests URL: https://github.com/apache/spark/pull/38991
[GitHub] [spark] dongjoon-hyun commented on pull request #38991: [SPARK-41457][PYTHON][TESTS] Refactor type annotations and dependency checks in tests
dongjoon-hyun commented on PR #38991: URL: https://github.com/apache/spark/pull/38991#issuecomment-1345145538 All python and linter tests passed. Merged to master.
[GitHub] [spark] dongjoon-hyun closed pull request #39012: [SPARK-41475][CONNECT] Fix lint-scala command error and typo
dongjoon-hyun closed pull request #39012: [SPARK-41475][CONNECT] Fix lint-scala command error and typo URL: https://github.com/apache/spark/pull/39012
[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #39015: [SPARK-41476][INFRA] Prevent `README.md` from triggering CIs
dongjoon-hyun commented on code in PR #39015: URL: https://github.com/apache/spark/pull/39015#discussion_r1044989412

## dev/sparktestsupport/utils.py:

```
@@ -34,19 +34,22 @@ def determine_modules_for_files(filenames):
     Given a list of filenames, return the set of modules that contain those files.
     If a file is not associated with a more specific submodule, then this method will consider that
     file to belong to the 'root' module. `.github` directory is counted only in GitHub Actions,
-    and `appveyor.yml` is always ignored because this file is dedicated only to AppVeyor builds.
+    and `appveyor.yml` is always ignored because this file is dedicated only to AppVeyor builds,
+    and `README.md` is always ignored too.

     >>> sorted(x.name for x in determine_modules_for_files(["python/pyspark/a.py", "sql/core/foo"]))
     ['pyspark-core', 'sql']
     >>> [x.name for x in determine_modules_for_files(["file_not_matched_by_any_subproject"])]
     ['root']
-    >>> [x.name for x in determine_modules_for_files(["appveyor.yml"])]
+    >>> [x.name for x in determine_modules_for_files(["appveyor.yml", "sql/README.md"])]
     []
     """
     changed_modules = set()
     for filename in filenames:
         if filename in ("appveyor.yml",):
             continue
+        if filename.endswith("README.md"):
```

Review Comment: Since we need `endswith` for `README.md`, we have a new `if` statement, @HyukjinKwon .
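A minimal, self-contained sketch of the filtering rule under discussion (the real function maps files to modules; this hypothetical stub only shows the skip logic):

```python
def files_that_trigger_ci(filenames):
    """Drop appveyor.yml and any README.md before module detection, so
    documentation-only changes do not trigger CI builds."""
    triggering = []
    for filename in filenames:
        if filename in ("appveyor.yml",):
            continue  # exact-name match: dedicated to AppVeyor builds
        if filename.endswith("README.md"):
            continue  # suffix match: READMEs live in many subdirectories
        triggering.append(filename)
    return triggering
```

The separate `endswith` branch is needed because, unlike `appveyor.yml`, a `README.md` can appear anywhere in the tree (`sql/README.md`, `python/README.md`, ...), so an exact-name membership test would miss most of them.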
[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #39015: [SPARK-41476][INFRA] Prevent `README.md` from triggering CIs
dongjoon-hyun commented on code in PR #39015: URL: https://github.com/apache/spark/pull/39015#discussion_r1044988801

## dev/sparktestsupport/utils.py:

```
@@ -84,7 +84,7 @@ def identify_changed_files_from_git_commits(patch_sha, target_branch=None, targe
         ["git", "diff", "--name-only", patch_sha, diff_target], universal_newlines=True
     )
     # Remove any empty strings
-    return [f for f in raw_output.split("\n") if f]
+    return [f for f in raw_output.split("\n") if f and not f.endswith("README.md")]
```

Review Comment: Oh, right. I forgot that. Let me move this there.
[GitHub] [spark] shuyouZZ commented on a diff in pull request #38983: [SPARK-41447][CORE] Clean up expired event log files that don't exist in listing db
shuyouZZ commented on code in PR #38983: URL: https://github.com/apache/spark/pull/38983#discussion_r1044983924

## core/src/test/scala/org/apache/spark/deploy/history/FsHistoryProviderSuite.scala:

```
@@ -1705,6 +1705,47 @@ abstract class FsHistoryProviderSuite extends SparkFunSuite with Matchers with P
     provider.stop()
   }

+  test("clean up expired event log files that don't exist in listing db") {
```

Review Comment:

> Could you add a test prefix like the following?
>
> ```scala
> - test("clean up expired event log files that don't exist in listing db") {
> + test("SPARK-41447: clean up expired event log files that don't exist in listing db") {
> ```

@dongjoon-hyun, could you review this too, thanks.
[GitHub] [spark] viirya commented on pull request #38993: [SPARK-41459][SQL] fix thrift server operation log output is empty
viirya commented on PR #38993: URL: https://github.com/apache/spark/pull/38993#issuecomment-1345117121

> BTW, seems you change the PR description template by mistake. Can you restore the template?

Can you restore to the standard template?
[GitHub] [spark] viirya commented on a diff in pull request #38993: [SPARK-41459][SQL] fix thrift server operation log output is empty
viirya commented on code in PR #38993: URL: https://github.com/apache/spark/pull/38993#discussion_r1044982387

## sql/hive-thriftserver/src/main/java/org/apache/hive/service/cli/operation/LogDivertAppender.java:

```
@@ -276,12 +276,19 @@ private static StringLayout initLayout(OperationLog.LoggingLevel loggingMode) {
     return getLayout(isVerbose, layout);
   }

-  public LogDivertAppender(OperationManager operationManager,
-      OperationLog.LoggingLevel loggingMode) {
+  public static LogDivertAppender create(OperationManager operationManager,
+      OperationLog.LoggingLevel loggingMode) {
+    CharArrayWriter writer = new CharArrayWriter();
+    return new LogDivertAppender(operationManager, loggingMode, writer);
+  }
+
+  private LogDivertAppender(OperationManager operationManager,
+      OperationLog.LoggingLevel loggingMode, CharArrayWriter writer) {
```

Review Comment: Oh, yea. I didn't run the suggested one.
[GitHub] [spark] idealspark commented on a diff in pull request #38993: [SPARK-41459][SQL] fix thrift server operation log output is empty
idealspark commented on code in PR #38993: URL: https://github.com/apache/spark/pull/38993#discussion_r1044970950

## sql/hive-thriftserver/src/main/java/org/apache/hive/service/cli/operation/LogDivertAppender.java:

```
-  public LogDivertAppender(OperationManager operationManager,
-      OperationLog.LoggingLevel loggingMode) {
+  public static LogDivertAppender create(OperationManager operationManager,
+      OperationLog.LoggingLevel loggingMode) {
+    CharArrayWriter writer = new CharArrayWriter();
+    return new LogDivertAppender(operationManager, loggingMode, writer);
+  }
+
+  private LogDivertAppender(OperationManager operationManager,
```

Review Comment: We need the CharArrayWriter as a parameter passed to the LogDivertAppender constructor, which then passes it to the super constructor.
[GitHub] [spark] idealspark commented on a diff in pull request #38993: [SPARK-41459][SQL] fix thrift server operation log output is empty
idealspark commented on code in PR #38993: URL: https://github.com/apache/spark/pull/38993#discussion_r1044969362

## sql/hive-thriftserver/src/main/java/org/apache/hive/service/cli/operation/LogDivertAppender.java:

```
+  private LogDivertAppender(OperationManager operationManager,
+      OperationLog.LoggingLevel loggingMode, CharArrayWriter writer) {
```

Review Comment: There is no way to create the CharArrayWriter in the original constructor, because the original constructor must first call the super constructor, and the super constructor needs a CharArrayWriter as a parameter:

```
private LogDivertAppender(OperationManager operationManager,
    OperationLog.LoggingLevel loggingMode, CharArrayWriter writer) {
  super("LogDivertAppender", initLayout(loggingMode), null, false, true,
      Property.EMPTY_ARRAY,
      new WriterManager(writer, "LogDivertAppender", initLayout(loggingMode), true));
}
```

If we create the CharArrayWriter in the original constructor as you suggest, compilation will fail.
[GitHub] [spark] dengziming commented on a diff in pull request #38984: [SPARK-41349][CONNECT][PYTHON] Implement DataFrame.hint
dengziming commented on code in PR #38984: URL: https://github.com/apache/spark/pull/38984#discussion_r1044965012

## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala:

```
@@ -305,7 +305,11 @@ class SparkConnectPlanner(session: SparkSession) {
   }

   private def transformHint(rel: proto.Hint): LogicalPlan = {
-    val params = rel.getParametersList.asScala.map(toCatalystValue).toSeq
+    val params = rel.getParametersList.asScala.map(toCatalystValue).toSeq.map {
+      case name: String => UnresolvedAttribute(name)
+      case l: Long => l.toInt
```

Review Comment: Hi @HyukjinKwon , using a long value will fail the Scala pattern match here, since it only accepts an int value: https://github.com/apache/spark/blob/c78f935185cbfae2e45adb393def0bd389a89c7e/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveHints.scala#L242-L249
[GitHub] [spark] idealspark commented on a diff in pull request #38993: [SPARK-41459][SQL] fix thrift server operation log output is empty
idealspark commented on code in PR #38993: URL: https://github.com/apache/spark/pull/38993#discussion_r1044964246

## sql/hive-thriftserver/src/main/java/org/apache/hive/service/cli/operation/LogDivertAppender.java:

```
@@ -276,12 +276,19 @@ private static StringLayout initLayout(OperationLog.LoggingLevel loggingMode) {
     return getLayout(isVerbose, layout);
   }

-  public LogDivertAppender(OperationManager operationManager,
-      OperationLog.LoggingLevel loggingMode) {
+  public static LogDivertAppender create(OperationManager operationManager,
+      OperationLog.LoggingLevel loggingMode) {
+    CharArrayWriter writer = new CharArrayWriter();
+    return new LogDivertAppender(operationManager, loggingMode, writer);
+  }
+
+  private LogDivertAppender(OperationManager operationManager,
+      OperationLog.LoggingLevel loggingMode, CharArrayWriter writer) {
     super("LogDivertAppender", initLayout(loggingMode), null, false, true,
         Property.EMPTY_ARRAY,
-        new WriterManager(new CharArrayWriter(),
```

Review Comment: Yes, it uses another CharArrayWriter, and in this class the append() method does not call super.append(), so the CharArrayWriter stays empty.
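The bug pattern under discussion, where an appender hands a freshly constructed writer to its base class while the rest of the code reads from a different writer, can be reproduced in a few lines of Python; the class names are illustrative stand-ins, not the thrift server's actual classes:

```python
import io


class BrokenAppender:
    """Writes land in a throwaway writer, so the one we read stays empty."""
    def __init__(self):
        self.writer = io.StringIO()   # the writer callers read from...
        self._sink = io.StringIO()    # ...but appends go to this other one
    def append(self, msg):
        self._sink.write(msg)
    def contents(self):
        return self.writer.getvalue()


class FixedAppender:
    """Factory creates the writer once and threads the same instance
    through, mirroring the static create() method in the patch."""
    @classmethod
    def create(cls):
        return cls(io.StringIO())
    def __init__(self, writer):
        self.writer = writer
    def append(self, msg):
        self.writer.write(msg)
    def contents(self):
        return self.writer.getvalue()
```

The static factory exists because, as noted above, the writer must be built before the super constructor runs; constructing it in the factory and passing the single instance down guarantees the writer being appended to is the writer being read.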
[GitHub] [spark] panbingkun commented on a diff in pull request #38972: [SPARK-41443][SQL] Assign a name to the error class _LEGACY_ERROR_TEMP_1061
panbingkun commented on code in PR #38972: URL: https://github.com/apache/spark/pull/38972#discussion_r1044960581

## core/src/main/resources/error/error-classes.json:

```
@@ -109,6 +109,11 @@
         "The column already exists. Consider to choose another name or rename the existing column."
       ]
     },
+    "COLUMN_NOT_FOUND" : {
+      "message" : [
+        "The column does not exist."
```

Review Comment: Done

## sql/core/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveSessionCatalog.scala:

```
@@ -155,7 +155,7 @@ class ResolveSessionCatalog(val catalogManager: CatalogManager)
     case DescribeColumn(ResolvedV1TableIdentifier(ident), column, isExtended, output) =>
       column match {
         case u: UnresolvedAttribute =>
-          throw QueryCompilationErrors.columnDoesNotExistError(u.name)
+          throw QueryCompilationErrors.columnNotExistError(u.name)
```

Review Comment: Done
[GitHub] [spark] HyukjinKwon commented on a diff in pull request #38984: [SPARK-41349][CONNECT][PYTHON] Implement DataFrame.hint
HyukjinKwon commented on code in PR #38984: URL: https://github.com/apache/spark/pull/38984#discussion_r1044958569

## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala:

```
@@ -305,7 +305,11 @@ class SparkConnectPlanner(session: SparkSession) {
   }

   private def transformHint(rel: proto.Hint): LogicalPlan = {
-    val params = rel.getParametersList.asScala.map(toCatalystValue).toSeq
+    val params = rel.getParametersList.asScala.map(toCatalystValue).toSeq.map {
+      case name: String => UnresolvedAttribute(name)
+      case l: Long => l.toInt
```

Review Comment: Sorry if I am being dumb, but why do we need to pass an int literal? Can't we just pass a long literal?
[GitHub] [spark] Ngone51 commented on a diff in pull request #38876: [SPARK-41360][CORE] Avoid BlockManager re-registration if the executor has been lost
Ngone51 commented on code in PR #38876: URL: https://github.com/apache/spark/pull/38876#discussion_r1044955782 ## core/src/main/scala/org/apache/spark/storage/BlockManager.scala: ## @@ -637,9 +637,11 @@ private[spark] class BlockManager( def reregister(): Unit = { // TODO: We might need to rate limit re-registering. logInfo(s"BlockManager $blockManagerId re-registering with master") -master.registerBlockManager(blockManagerId, diskBlockManager.localDirsString, maxOnHeapMemory, - maxOffHeapMemory, storageEndpoint) -reportAllBlocks() +val id = master.registerBlockManager(blockManagerId, diskBlockManager.localDirsString, + maxOnHeapMemory, maxOffHeapMemory, storageEndpoint, isReRegister = true) +if (id.executorId != BlockManagerId.INVALID_EXECUTOR_ID) { + reportAllBlocks() +} Review Comment: Added the termination logic here. cc @mridulm
[GitHub] [spark] HyukjinKwon commented on a diff in pull request #38991: [SPARK-41457][PYTHON][TESTS] Refactor type annotations and dependency checks in tests
HyukjinKwon commented on code in PR #38991: URL: https://github.com/apache/spark/pull/38991#discussion_r1044955147 ## dev/lint-python: ## @@ -104,7 +104,7 @@ function mypy_data_test { -c dev/pyproject.toml \ --rootdir python \ --mypy-only-local-stub \ - --mypy-ini-file python/mypy.ini \ + --mypy-ini-file dev/mypy.ini \ Review Comment: Okay, MyPy requires the configuration file to be located in the root directory of the package. Let me move it back for now.
[GitHub] [spark] HyukjinKwon commented on a diff in pull request #39015: [SPARK-41476][INFRA] Prevent `README.md` from triggering CIs
HyukjinKwon commented on code in PR #39015: URL: https://github.com/apache/spark/pull/39015#discussion_r1044952894 ## dev/sparktestsupport/utils.py: ## @@ -84,7 +84,7 @@ def identify_changed_files_from_git_commits(patch_sha, target_branch=None, targe ["git", "diff", "--name-only", patch_sha, diff_target], universal_newlines=True ) # Remove any empty strings -return [f for f in raw_output.split("\n") if f] +return [f for f in raw_output.split("\n") if f and not f.endswith("README.md")] Review Comment: Should we maybe fix it in L48 `if filename in ("appveyor.yml",):` at `determine_modules_for_files` above?
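The suggestion is to treat `README.md` the way `appveyor.yml` is already treated in `determine_modules_for_files`, i.e. classify the file once as "triggers no test modules" rather than filtering the raw `git diff` output. A hedged sketch of such a predicate; this is illustrative and not the actual `dev/sparktestsupport` code:

```python
import os


def is_ignorable_change(filename):
    """Return True for changed files that should not trigger any test modules.

    The file list mirrors the review suggestion (appveyor.yml plus any
    README.md); it is an illustration, not Spark's real implementation.
    """
    if os.path.basename(filename) in ("appveyor.yml",):
        return True
    if filename.endswith("README.md"):
        return True
    return False
```

Centralizing the check in one predicate means both CI entry points agree on which files are ignorable.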
[GitHub] [spark] dongjoon-hyun commented on pull request #39014: [SPARK-41474][PROTOBUF][BUILD] Exclude `proto` files from `spark-protobuf` jar
dongjoon-hyun commented on PR #39014: URL: https://github.com/apache/spark/pull/39014#issuecomment-1344990963 All tests passed. Merged to master for Apache Spark 3.4.0.
[GitHub] [spark] dongjoon-hyun closed pull request #39014: [SPARK-41474][PROTOBUF][BUILD] Exclude `proto` files from `spark-protobuf` jar
dongjoon-hyun closed pull request #39014: [SPARK-41474][PROTOBUF][BUILD] Exclude `proto` files from `spark-protobuf` jar URL: https://github.com/apache/spark/pull/39014
[GitHub] [spark] dongjoon-hyun commented on pull request #39015: [SPARK-41476][INFRA] Prevent `README.md` from triggering CIs
dongjoon-hyun commented on PR #39015: URL: https://github.com/apache/spark/pull/39015#issuecomment-1344988102 Thank you so much, @gengliangwang !
[GitHub] [spark] dongjoon-hyun commented on pull request #39015: [SPARK-41476][INFRA] Prevent `README.md` from triggering CIs
dongjoon-hyun commented on PR #39015: URL: https://github.com/apache/spark/pull/39015#issuecomment-1344987151 Could you review this too please, @gengliangwang ?
[GitHub] [spark] SandishKumarHN commented on a diff in pull request #38922: [SPARK-41396][SQL][PROTOBUF] OneOf field support and recursion checks
SandishKumarHN commented on code in PR #38922: URL: https://github.com/apache/spark/pull/38922#discussion_r1044857157 ## connector/protobuf/src/main/scala/org/apache/spark/sql/protobuf/utils/ProtobufOptions.scala: ## @@ -38,6 +38,12 @@ private[sql] class ProtobufOptions( val parseMode: ParseMode = parameters.get("mode").map(ParseMode.fromString).getOrElse(FailFastMode) + + val circularReferenceType: String = parameters.getOrElse("circularReferenceType", "FIELD_NAME") Review Comment: @rangadi @baganokodo2022 thanks for the quick meeting. The conclusion was to use the descriptor type's full name, and I added unit tests with some complex schemas. ``` val recordName = fd.getMessageType.getFullName ```
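Keying the recursion check on the descriptor's full type name (`fd.getMessageType.getFullName`) rather than on the field name means two differently named fields of the same message type are recognized as the same node in a cycle. A rough Python sketch of the idea, with assumed names; the real check lives in Spark's protobuf-to-Catalyst schema converter:

```python
def visit_message_type(full_name, seen=()):
    """Record a visited protobuf message type by its full name.

    Raises ValueError if the type was already seen on the current
    traversal path, i.e. a circular reference. Illustrative only.
    """
    if full_name in seen:
        raise ValueError(f"Found recursive reference: {full_name}")
    return seen + (full_name,)
```

Threading the `seen` path through the traversal (rather than a global set) correctly allows the same type to appear on sibling branches while still rejecting true cycles.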
[GitHub] [spark] dongjoon-hyun opened a new pull request, #39015: [SPARK-41476][INFRA] Prevent `README.md` from triggering CIs
dongjoon-hyun opened a new pull request, #39015: URL: https://github.com/apache/spark/pull/39015 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was this patch tested?
[GitHub] [spark] dongjoon-hyun commented on pull request #39014: [SPARK-41474][PROTOBUF][BUILD] Exclude `proto` files from `spark-protobuf` jar
dongjoon-hyun commented on PR #39014: URL: https://github.com/apache/spark/pull/39014#issuecomment-1344983516 Thank you, @gengliangwang !
[GitHub] [spark] dongjoon-hyun commented on pull request #39014: [SPARK-41474][PROTOBUF][BUILD] Exclude `proto` files from `spark-protobuf` jar
dongjoon-hyun commented on PR #39014: URL: https://github.com/apache/spark/pull/39014#issuecomment-1344977841 Thank you, @SandishKumarHN .
[GitHub] [spark] SandishKumarHN commented on pull request #39014: [SPARK-41474][PROTOBUF][BUILD] Exclude `proto` files from `spark-protobuf` jar
SandishKumarHN commented on PR #39014: URL: https://github.com/apache/spark/pull/39014#issuecomment-1344977544 @dongjoon-hyun LGTM, thanks for the PR.
[GitHub] [spark] dongjoon-hyun commented on pull request #39014: [SPARK-41474][PROTOBUF][BUILD] Exclude `proto` files from `spark-protobuf` jar
dongjoon-hyun commented on PR #39014: URL: https://github.com/apache/spark/pull/39014#issuecomment-1344964855 Could you review this, @SandishKumarHN and @gengliangwang ?
[GitHub] [spark] zhengruifeng commented on pull request #38946: [SPARK-41414][CONNECT][PYTHON] Implement date/timestamp functions
zhengruifeng commented on PR #38946: URL: https://github.com/apache/spark/pull/38946#issuecomment-1344961425 merged into master
[GitHub] [spark] zhengruifeng closed pull request #38946: [SPARK-41414][CONNECT][PYTHON] Implement date/timestamp functions
zhengruifeng closed pull request #38946: [SPARK-41414][CONNECT][PYTHON] Implement date/timestamp functions URL: https://github.com/apache/spark/pull/38946
[GitHub] [spark] srowen commented on a diff in pull request #38996: [SPARK-41008][MLLIB] Follow-up isotonic regression features deduplica…
srowen commented on code in PR #38996: URL: https://github.com/apache/spark/pull/38996#discussion_r1044938891 ## docs/mllib-isotonic-regression.md: ## @@ -43,7 +43,17 @@ best fitting the original data points. which uses an approach to [parallelizing isotonic regression](https://doi.org/10.1007/978-3-642-99789-1_10). The training input is an RDD of tuples of three double values that represent -label, feature and weight in this order. Additionally, IsotonicRegression algorithm has one +label, feature and weight in this order. In case there are multiple tuples with +the same feature then these tuples are aggregated into a single tuple as follows: + +* Aggregated label is the weighted average of all labels. +* Aggregated feature is the weighted average of all equal features. It is possible Review Comment: (Sorry if this double-posts) Yeah, I was confused about this too. If feature values are pooled when exactly equal, why average them? If anything, that introduces tiny errors.
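The documented aggregation — weighted-average label, summed weight, and (the point srowen questions) weighted-average feature for tuples whose features are exactly equal — can be sketched as below. Since exactly equal features average back to themselves, the sketch simply keeps the shared feature value; this is an illustration of the doc text, not MLlib's implementation:

```python
from collections import defaultdict


def aggregate_duplicates(points):
    """Pool (label, feature, weight) tuples that share a feature value.

    Aggregated label = weighted mean of the labels; aggregated weight =
    sum of the weights; the feature value is shared, so averaging equal
    features is a no-op. Sketch of the documented behavior only.
    """
    groups = defaultdict(list)
    for label, feature, weight in points:
        groups[feature].append((label, weight))
    out = []
    for feature, lw in sorted(groups.items()):
        total_w = sum(w for _, w in lw)
        mean_label = sum(l * w for l, w in lw) / total_w
        out.append((mean_label, feature, total_w))
    return out
```

Deduplicating before the pool-adjacent-violators pass keeps the monotone fit well defined when several observations land on the same feature value.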
[GitHub] [spark] dengziming commented on a diff in pull request #39012: [SPARK-41475][CONNECT] Fix lint-scala command error and typo
dengziming commented on code in PR #39012: URL: https://github.com/apache/spark/pull/39012#discussion_r1044935825 ## dev/lint-scala: ## @@ -29,14 +29,14 @@ ERRORS=$(./build/mvn \ -Dscalafmt.skip=false \ -Dscalafmt.validateOnly=true \ -Dscalafmt.changedOnly=false \ --pl connector/connect \ +-pl connector/connect/server \ Review Comment: Thank you for the reminder; I have filed one and changed the title.
[GitHub] [spark] dengziming commented on a diff in pull request #39012: [MINOR][CONNECT]: Fix command error and typo
dengziming commented on code in PR #39012: URL: https://github.com/apache/spark/pull/39012#discussion_r1044932932 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala: ## @@ -322,8 +322,7 @@ class SparkConnectPlanner(session: SparkSession) { None, rel.getVariableColumnName, Seq(rel.getValueColumnName), -transformRelation(rel.getInput) - ) +transformRelation(rel.getInput)) Review Comment: Yes, I re-ran lint-scala after changing it.
[GitHub] [spark] dongjoon-hyun commented on pull request #38994: [SPARK-41329][CONNECT] Resolve circular imports in Spark Connect
dongjoon-hyun commented on PR #38994: URL: https://github.com/apache/spark/pull/38994#issuecomment-1344949254 No problem at all. If you checked locally, that's more than enough. :)
[GitHub] [spark] dongjoon-hyun opened a new pull request, #39014: [SPARK-41474][BUILD] Exclude proto files from spark-protobuf
dongjoon-hyun opened a new pull request, #39014: URL: https://github.com/apache/spark/pull/39014 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was this patch tested?
[GitHub] [spark] HyukjinKwon commented on pull request #38994: [SPARK-41329][CONNECT] Resolve circular imports in Spark Connect
HyukjinKwon commented on PR #38994: URL: https://github.com/apache/spark/pull/38994#issuecomment-1344946168 Oh yeah, the tests passed but the linter failed. I just removed one unused ignore in the linter, and I checked it locally. But sorry, I should have waited for the test results. I will make sure of that next time.
[GitHub] [spark] dongjoon-hyun commented on pull request #38994: [SPARK-41329][CONNECT] Resolve circular imports in Spark Connect
dongjoon-hyun commented on PR #38994: URL: https://github.com/apache/spark/pull/38994#issuecomment-1344945736 Ur, does this pass CI?
[GitHub] [spark] HyukjinKwon closed pull request #38994: [SPARK-41329][CONNECT] Resolve circular imports in Spark Connect
HyukjinKwon closed pull request #38994: [SPARK-41329][CONNECT] Resolve circular imports in Spark Connect URL: https://github.com/apache/spark/pull/38994
[GitHub] [spark] HyukjinKwon commented on pull request #38994: [SPARK-41329][CONNECT] Resolve circular imports in Spark Connect
HyukjinKwon commented on PR #38994: URL: https://github.com/apache/spark/pull/38994#issuecomment-1344945421 Merged to master.
[GitHub] [spark] HyukjinKwon commented on a diff in pull request #38994: [SPARK-41329][CONNECT] Resolve circular imports in Spark Connect
HyukjinKwon commented on code in PR #38994: URL: https://github.com/apache/spark/pull/38994#discussion_r1044928308 ## python/pyspark/sql/connect/column.py: ## @@ -706,28 +705,30 @@ def substr(self, startPos: Union[int, "Column"], length: Union[int, "Column"]) - length_t=type(length), ) ) -from pyspark.sql.connect.function_builder import functions as F if isinstance(length, int): -length_exp = self._lit(length) +length_exp = lit(length) elif isinstance(length, Column): length_exp = length else: raise TypeError("Unsupported type for substr().") if isinstance(startPos, int): -start_exp = self._lit(startPos) +start_exp = lit(startPos) else: start_exp = startPos -return F.substr(self, start_exp, length_exp) +return F.substr(self, start_exp, length_exp) # type: ignore[attr-defined] Review Comment: ```suggestion return F.substr(self, start_exp, length_exp) ```
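The diff above swaps the removed `self._lit` helper for the module-level `lit`, coercing plain ints into literal expressions while passing `Column` arguments through. That coercion pattern can be sketched generically; the helper and type names below are stand-ins for the pyspark.sql.connect internals, not the real API:

```python
def coerce_operand(value, lit, column_cls):
    """Turn an int into a literal expression via `lit`; pass columns through.

    `lit` and `column_cls` stand in for pyspark.sql.connect helpers
    (assumed names for illustration).
    """
    if isinstance(value, int):
        return lit(value)
    if isinstance(value, column_cls):
        return value
    raise TypeError("Unsupported type for substr().")
```

Handling both `startPos` and `length` through one coercion path keeps the two branches of `substr` from drifting apart.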
[GitHub] [spark] gengliangwang commented on a diff in pull request #39000: [SPARK-41461][BUILD][CORE] Support user configurable protoc executables when building Spark Core.
gengliangwang commented on code in PR #39000: URL: https://github.com/apache/spark/pull/39000#discussion_r1044927630 ## core/pom.xml: ## @@ -713,6 +693,71 @@ + + default-protoc + + + !skipDefaultProtoc + + + + + +com.github.os72 +protoc-jar-maven-plugin +${protoc-jar-maven-plugin.version} + + +generate-sources + + run + + + com.google.protobuf:protoc:${protobuf.version} + ${protobuf.version} + +src/main/protobuf + + + + + + + + + + user-defined-protoc + + ${env.CORE_PROTOC_EXEC_PATH} + + + + +com.github.os72 +protoc-jar-maven-plugin +${protoc-jar-maven-plugin.version} + + +generate-sources + + run + + + com.google.protobuf:protoc:${protobuf.version} + ${protobuf.version} + ${core.protoc.executable.path} + +src/main/protobuf + + + + + + + + + Review Comment: nit: remove empty lines?
[GitHub] [spark] github-actions[bot] closed pull request #36921: [SPARK-39481][SQL] Do not push down complex filter condition
github-actions[bot] closed pull request #36921: [SPARK-39481][SQL] Do not push down complex filter condition URL: https://github.com/apache/spark/pull/36921
[GitHub] [spark] dongjoon-hyun commented on pull request #38985: [SPARK-41451][K8S] Avoid using empty abbrevMarker in StringUtils.abbreviate
dongjoon-hyun commented on PR #38985: URL: https://github.com/apache/spark/pull/38985#issuecomment-1344930220 Thank you for closing, @pan3793 .
[GitHub] [spark] gengliangwang closed pull request #38988: [SPARK-41456][SQL] Improve the performance of try_cast
gengliangwang closed pull request #38988: [SPARK-41456][SQL] Improve the performance of try_cast URL: https://github.com/apache/spark/pull/38988
[GitHub] [spark] gengliangwang commented on pull request #38988: [SPARK-41456][SQL] Improve the performance of try_cast
gengliangwang commented on PR #38988: URL: https://github.com/apache/spark/pull/38988#issuecomment-1344894973 @cloud-fan @HyukjinKwon @LuciferYang Thanks for the review. Merging to master
[GitHub] [spark] xinrong-meng commented on pull request #39009: [SPARK-41225][CONNECT][PYTHON][FOLLOW-UP] Disable unsupported functions.
xinrong-meng commented on PR #39009: URL: https://github.com/apache/spark/pull/39009#issuecomment-1344864352 Merged to master, thanks!
[GitHub] [spark] xinrong-meng closed pull request #39009: [SPARK-41225][CONNECT][PYTHON][FOLLOW-UP] Disable unsupported functions.
xinrong-meng closed pull request #39009: [SPARK-41225][CONNECT][PYTHON][FOLLOW-UP] Disable unsupported functions. URL: https://github.com/apache/spark/pull/39009
[GitHub] [spark] xinrong-meng opened a new pull request, #39013: [SPARK-41472][CONNECT][PYTHON] Implement the rest of string/binary functions
xinrong-meng opened a new pull request, #39013: URL: https://github.com/apache/spark/pull/39013 ### What changes were proposed in this pull request? Implement the rest of string/binary functions. The first commit is https://github.com/apache/spark/pull/38921. ### Why are the changes needed? For API coverage on Spark Connect. ### Does this PR introduce _any_ user-facing change? Yes. New functions are available on Spark Connect. ### How was this patch tested? Unit tests.
[GitHub] [spark] dongjoon-hyun closed pull request #38982: [SPARK-41376][CORE][3.2] Correct the Netty preferDirectBufs check logic on executor start
dongjoon-hyun closed pull request #38982: [SPARK-41376][CORE][3.2] Correct the Netty preferDirectBufs check logic on executor start URL: https://github.com/apache/spark/pull/38982
[GitHub] [spark] otterc commented on a diff in pull request #36165: [SPARK-36620][SHUFFLE] Add Push Based Shuffle client side read metrics
otterc commented on code in PR #36165: URL: https://github.com/apache/spark/pull/36165#discussion_r1044862699 ## core/src/test/resources/HistoryServerExpectations/one_stage_json_with_partitionId_expectation.json: ## @@ -26,13 +26,23 @@ "outputBytes" : 0, "outputRecords" : 0, "shuffleRemoteBlocksFetched" : 0, - "shuffleLocalBlocksFetched" : 80, + "shuffleLocalBlocksFetched" : 0, Review Comment: Same here. Why did we change the value? ## core/src/test/resources/HistoryServerExpectations/one_stage_json_with_partitionId_expectation.json: ## @@ -26,13 +26,23 @@ "outputBytes" : 0, "outputRecords" : 0, "shuffleRemoteBlocksFetched" : 0, - "shuffleLocalBlocksFetched" : 80, + "shuffleLocalBlocksFetched" : 0, "shuffleFetchWaitTime" : 0, "shuffleRemoteBytesRead" : 0, "shuffleRemoteBytesReadToDisk" : 0, - "shuffleLocalBytesRead" : 3760, - "shuffleReadBytes" : 3760, - "shuffleReadRecords" : 100, + "shuffleLocalBytesRead" : 0, + "shuffleReadBytes" : 0, + "shuffleReadRecords" : 0, Review Comment: Same for these. 
## core/src/test/resources/HistoryServerExpectations/one_stage_json_with_partitionId_expectation.json: ## @@ -76,12 +86,24 @@ }, "shuffleReadMetrics" : { "remoteBlocksFetched" : 0, - "localBlocksFetched" : 8, + "localBlocksFetched" : 0, "fetchWaitTime" : 0, "remoteBytesRead" : 0, "remoteBytesReadToDisk" : 0, - "localBytesRead" : 376, - "recordsRead" : 10 + "localBytesRead" : 0, + "recordsRead" : 0, Review Comment: this as well ## core/src/test/resources/HistoryServerExpectations/one_stage_json_with_partitionId_expectation.json: ## @@ -127,12 +149,24 @@ }, "shuffleReadMetrics" : { "remoteBlocksFetched" : 0, - "localBlocksFetched" : 8, + "localBlocksFetched" : 0, "fetchWaitTime" : 0, "remoteBytesRead" : 0, "remoteBytesReadToDisk" : 0, - "localBytesRead" : 376, - "recordsRead" : 10 + "localBytesRead" : 0, + "recordsRead" : 0, Review Comment: existing value changed ## core/src/test/resources/HistoryServerExpectations/failed_stage_list_json_expectation.json: ## @@ -66,5 +76,7 @@ "MajorGCCount" : 0, "MajorGCTime" : 0, "TotalGCTime" : 0 - } -} ] + }, + "isPushBasedShuffleEnabled" : false, + "shuffleMergersCount" : 0 +} ] Review Comment: No newline at the end of file ## core/src/test/resources/HistoryServerExpectations/one_stage_json_with_partitionId_expectation.json: ## @@ -76,12 +86,24 @@ }, "shuffleReadMetrics" : { "remoteBlocksFetched" : 0, - "localBlocksFetched" : 8, + "localBlocksFetched" : 0, Review Comment: here as well ## core/src/test/resources/HistoryServerExpectations/one_stage_json_with_partitionId_expectation.json: ## @@ -178,12 +212,24 @@ }, "shuffleReadMetrics" : { "remoteBlocksFetched" : 0, - "localBlocksFetched" : 8, + "localBlocksFetched" : 0, Review Comment: same here ## core/src/test/resources/HistoryServerExpectations/one_stage_json_with_partitionId_expectation.json: ## @@ -178,12 +212,24 @@ }, "shuffleReadMetrics" : { "remoteBlocksFetched" : 0, - "localBlocksFetched" : 8, + "localBlocksFetched" : 0, "fetchWaitTime" : 0, "remoteBytesRead" : 0, 
"remoteBytesReadToDisk" : 0, - "localBytesRead" : 372, - "recordsRead" : 9 + "localBytesRead" : 0, + "recordsRead" : 0, Review Comment: same here ## core/src/test/resources/HistoryServerExpectations/one_stage_json_with_partitionId_expectation.json: ## @@ -127,12 +149,24 @@ }, "shuffleReadMetrics" : { "remoteBlocksFetched" : 0, - "localBlocksFetched" : 8, + "localBlocksFetched" : 0, Review Comment: value is changed ## core/src/test/resources/HistoryServerExpectations/failed_stage_list_json_expectation.json: ## @@ -27,13 +27,23 @@ "outputBytes" : 0, "outputRecords" : 0, "shuffleRemoteBlocksFetched" : 0, - "shuffleLocalBlocksFetched" : 64, - "shuffleFetchWaitTime" : 1, + "shuffleLocalBlocksFetched" : 0, + "shuffleFetchWaitTime" : 0, Review Comment: Why are we changing these values? ## core/src/test/resources/HistoryServerExpectations/one_stage_json_with_partitionId_expectation.json: ## @@ -620,5 +750,7 @@ "MajorGCCount" : 0, "MajorGCTime" : 0, "TotalGCTime" : 0 - } + }, + "isPushBasedShuffleEnabled" : false, + "shuffleMergersCount" : 0 } ] Review Comment: We can add a new line at the end of file. -- This is an automated message from the Apache
[GitHub] [spark] SandishKumarHN commented on a diff in pull request #38922: [SPARK-41396][SQL][PROTOBUF] OneOf field support and recursion checks
SandishKumarHN commented on code in PR #38922: URL: https://github.com/apache/spark/pull/38922#discussion_r1044857157 ## connector/protobuf/src/main/scala/org/apache/spark/sql/protobuf/utils/ProtobufOptions.scala: ## @@ -38,6 +38,12 @@ private[sql] class ProtobufOptions( val parseMode: ParseMode = parameters.get("mode").map(ParseMode.fromString).getOrElse(FailFastMode) + + val circularReferenceType: String = parameters.getOrElse("circularReferenceType", "FIELD_NAME") Review Comment: @rangadi @baganokodo2022 thanks for the quick meeting. Conclusion of the meeting: use the descriptor type name, and I added unit tests with some complex schemas. ``` val recordName = fd.getMessageType.getFullName ```
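To make the agreed approach concrete: below is a hedged Python sketch of the idea (the actual implementation is Scala over protobuf `Descriptor` objects; the dict shape and function name here are invented for illustration). Recursion is detected when a message type's full name, the analogue of `fd.getMessageType.getFullName`, reappears on the current descent path:

```python
def has_circular_reference(descriptor, path=()):
    # A message type is circular when its full name (the analogue of
    # fd.getMessageType.getFullName) already appears on the descent path.
    full_name = descriptor["full_name"]
    if full_name in path:
        return True
    # Descend only into message-typed fields, carrying the path so far.
    return any(
        has_circular_reference(child, path + (full_name,))
        for child in descriptor.get("message_fields", [])
    )
```

Keying on the type's full name rather than the field name avoids false positives when two unrelated fields merely share a name, which is presumably why the meeting settled on the descriptor type name.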
[GitHub] [spark] mridulm commented on a diff in pull request #36165: [SPARK-36620][SHUFFLE] Add Push Based Shuffle client side read metrics
mridulm commented on code in PR #36165: URL: https://github.com/apache/spark/pull/36165#discussion_r1044823829 ## core/src/main/scala/org/apache/spark/status/api/v1/api.scala: ## @@ -302,7 +312,9 @@ class StageData private[spark]( @JsonDeserialize(using = classOf[ExecutorMetricsJsonDeserializer]) val peakExecutorMetrics: Option[ExecutorMetrics], val taskMetricsDistributions: Option[TaskMetricDistributions], -val executorMetricsDistributions: Option[ExecutorMetricsDistributions]) +val executorMetricsDistributions: Option[ExecutorMetricsDistributions], +val isPushBasedShuffleEnabled: Boolean, Review Comment: We actually have a lot of `pushBased*` references in the codebase. Given that the block ids are named shufflePush, that might be a good prefix to settle on.
[GitHub] [spark] anchovYu commented on a diff in pull request #38776: [SPARK-27561][SQL] Support implicit lateral column alias resolution on Project and refactor Analyzer
anchovYu commented on code in PR #38776: URL: https://github.com/apache/spark/pull/38776#discussion_r1044807443 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/namedExpressions.scala: ## @@ -424,8 +424,51 @@ case class OuterReference(e: NamedExpression) override def qualifier: Seq[String] = e.qualifier override def exprId: ExprId = e.exprId override def toAttribute: Attribute = e.toAttribute - override def newInstance(): NamedExpression = OuterReference(e.newInstance()) + override def newInstance(): NamedExpression = +OuterReference(e.newInstance()).setNameParts(nameParts) final override val nodePatterns: Seq[TreePattern] = Seq(OUTER_REFERENCE) + + // optional field, the original name parts of UnresolvedAttribute before it is resolved to + // OuterReference. Used in rule ResolveLateralColumnAlias to convert OuterReference back to + // LateralColumnAliasReference. + var nameParts: Option[Seq[String]] = None + def setNameParts(newNameParts: Option[Seq[String]]): OuterReference = { Review Comment: As we discussed before, I feel it is not safe to do so, given the current solution in ResolveOuterReference where each rule is applied only once. I made up a query (it can't run, just for demonstration): ``` SELECT * FROM range(1, 7) WHERE ( SELECT id2 FROM (SELECT dept * 2.0 AS id, id + 1 AS id2 FROM $testTable)) > 5 ORDER BY id ``` It is possible that `dept * 2.0` is not resolved because it needs type conversion, so the LCA rule doesn't apply. Then it just wraps the `id` in `id + 1 AS id2` as OuterReference.
[GitHub] [spark] WolverineJiang commented on pull request #39000: [SPARK-41461][BUILD][CORE] Support user configurable protoc executables when building Spark Core.
WolverineJiang commented on PR #39000: URL: https://github.com/apache/spark/pull/39000#issuecomment-1344724455 The pom of the core module has active profiles, so `activeByDefault` does not take effect. Therefore, a property is used instead.
[GitHub] [spark] amaliujia commented on pull request #39004: [SPARK-41466][BUILD] Change Scala Style configuration to catch AnyFunSuite instead of FunSuite
amaliujia commented on PR #39004: URL: https://github.com/apache/spark/pull/39004#issuecomment-1344680268 late LGTM, thanks!
[GitHub] [spark] anchovYu commented on a diff in pull request #38776: [SPARK-27561][SQL] Support implicit lateral column alias resolution on Project and refactor Analyzer
anchovYu commented on code in PR #38776: URL: https://github.com/apache/spark/pull/38776#discussion_r1044744657 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala: ## @@ -638,6 +638,14 @@ trait CheckAnalysis extends PredicateHelper with LookupCatalog with QueryErrorsB case UnresolvedWindowExpression(_, windowSpec) => throw QueryCompilationErrors.windowSpecificationNotDefinedError(windowSpec.name) }) +// This should not happen: a resolved Project or Aggregate should restore or resolve +// all lateral column alias references. Add the check to be extra safe. Review Comment: I intentionally didn't add it. This is because I don't want attributes that can actually be resolved as LCA to show in the error msg as UnresolvedAttribute. Also note that, unlike RemoveTempResolvedColumn, LCARef can't be directly resolved to the NamedExpression inside of it, because the plan won't be right - there is no alias push down.
[GitHub] [spark] viirya commented on pull request #38993: [SPARK-41459][SQL] fix thrift server operation log output is empty
viirya commented on PR #38993: URL: https://github.com/apache/spark/pull/38993#issuecomment-1344649595 BTW, it seems you changed the PR description template by mistake. Can you restore the template?
[GitHub] [spark] viirya commented on a diff in pull request #38993: [SPARK-41459][SQL] fix thrift server operation log output is empty
viirya commented on code in PR #38993: URL: https://github.com/apache/spark/pull/38993#discussion_r1044740898 ## sql/hive-thriftserver/src/main/java/org/apache/hive/service/cli/operation/LogDivertAppender.java: ## @@ -276,12 +276,19 @@ private static StringLayout initLayout(OperationLog.LoggingLevel loggingMode) { return getLayout(isVerbose, layout); } - public LogDivertAppender(OperationManager operationManager, -OperationLog.LoggingLevel loggingMode) { + public static LogDivertAppender create(OperationManager operationManager, + OperationLog.LoggingLevel loggingMode) { +CharArrayWriter writer = new CharArrayWriter(); +return new LogDivertAppender(operationManager, loggingMode, writer); + } + + private LogDivertAppender(OperationManager operationManager, +OperationLog.LoggingLevel loggingMode, CharArrayWriter writer) { Review Comment: Isn't it enough to just create the `CharArrayWriter` in the original constructor? ```suggestion public LogDivertAppender(OperationManager operationManager, OperationLog.LoggingLevel loggingMode) { CharArrayWriter writer = new CharArrayWriter(); ```
[GitHub] [spark] viirya commented on a diff in pull request #38993: [SPARK-41459][SQL] fix thrift server operation log output is empty
viirya commented on code in PR #38993: URL: https://github.com/apache/spark/pull/38993#discussion_r1044738513 ## sql/hive-thriftserver/src/main/java/org/apache/hive/service/cli/operation/LogDivertAppender.java: ## @@ -276,12 +276,19 @@ private static StringLayout initLayout(OperationLog.LoggingLevel loggingMode) { return getLayout(isVerbose, layout); } - public LogDivertAppender(OperationManager operationManager, -OperationLog.LoggingLevel loggingMode) { + public static LogDivertAppender create(OperationManager operationManager, + OperationLog.LoggingLevel loggingMode) { +CharArrayWriter writer = new CharArrayWriter(); +return new LogDivertAppender(operationManager, loggingMode, writer); + } + + private LogDivertAppender(OperationManager operationManager, Review Comment: Any reason to change it to a private constructor and add `create` for construction?
[GitHub] [spark] viirya commented on a diff in pull request #38993: [SPARK-41459][SQL] fix thrift server operation log output is empty
viirya commented on code in PR #38993: URL: https://github.com/apache/spark/pull/38993#discussion_r1044737028 ## sql/hive-thriftserver/src/main/java/org/apache/hive/service/cli/operation/LogDivertAppender.java: ## @@ -276,12 +276,19 @@ private static StringLayout initLayout(OperationLog.LoggingLevel loggingMode) { return getLayout(isVerbose, layout); } - public LogDivertAppender(OperationManager operationManager, -OperationLog.LoggingLevel loggingMode) { + public static LogDivertAppender create(OperationManager operationManager, + OperationLog.LoggingLevel loggingMode) { +CharArrayWriter writer = new CharArrayWriter(); +return new LogDivertAppender(operationManager, loggingMode, writer); + } + + private LogDivertAppender(OperationManager operationManager, +OperationLog.LoggingLevel loggingMode, CharArrayWriter writer) { super("LogDivertAppender", initLayout(loggingMode), null, false, true, Property.EMPTY_ARRAY, -new WriterManager(new CharArrayWriter(), "LogDivertAppender", Review Comment: Is the issue that another `CharArrayWriter` is used here?
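The review question above can be answered with a miniature model. This Python sketch (class and field names are invented; the real code is the Java `LogDivertAppender`) shows why handing the superclass a second, freshly constructed buffer instead of the stored one makes the operation log read back empty:

```python
from io import StringIO


class LogDivertAppenderSketch:
    def __init__(self, share_writer):
        # The buffer that the operation-log reader polls.
        self.writer = StringIO()
        # Before the fix, the superclass was handed a brand-new buffer
        # instead of self.writer, so appended output never reached the
        # buffer that gets read back.
        self.sink = self.writer if share_writer else StringIO()

    def append(self, event):
        # Log events are written through the sink the superclass manages.
        self.sink.write(event)

    def read_output(self):
        # The reader always consumes from the stored writer field.
        return self.writer.getvalue()
```

Constructing the writer once and passing the same instance to both places, which is what the `create` factory in the diff does, makes `read_output` see what `append` wrote.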
[GitHub] [spark] dongjoon-hyun commented on pull request #38982: [SPARK-41376][CORE][3.2] Correct the Netty preferDirectBufs check logic on executor start
dongjoon-hyun commented on PR #38982: URL: https://github.com/apache/spark/pull/38982#issuecomment-1344618354 Merged to branch-3.2 too. Thank you, @pan3793 and @Yikun.
[GitHub] [spark] vinodkc commented on pull request #38608: [SPARK-41080][SQL] Support Bit manipulation function SETBIT
vinodkc commented on PR #38608: URL: https://github.com/apache/spark/pull/38608#issuecomment-1344618388 @gengliangwang, can you please review it?
[GitHub] [spark] amaliujia commented on a diff in pull request #38984: [SPARK-41349][CONNECT][PYTHON] Implement DataFrame.hint
amaliujia commented on code in PR #38984: URL: https://github.com/apache/spark/pull/38984#discussion_r1044705096 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala: ## @@ -305,7 +305,11 @@ class SparkConnectPlanner(session: SparkSession) { } private def transformHint(rel: proto.Hint): LogicalPlan = { -val params = rel.getParametersList.asScala.map(toCatalystValue).toSeq +val params = rel.getParametersList.asScala.map(toCatalystValue).toSeq.map{ + case name: String => UnresolvedAttribute(name) + case l: Long => l.toInt Review Comment: This becomes a bit tricky: Python is just one of the clients, and Scala will also have a client in the future. I'm actually not sure what the best way is to deal with cross-language differences. cc @HyukjinKwon any idea?
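The conversion under discussion, where strings become attribute references and longs are narrowed to 32-bit ints because the Python client serializes every integer as a long, can be sketched as follows. This is a hedged Python model of the Scala pattern match; the tuple encoding of `UnresolvedAttribute` is invented for illustration:

```python
def transform_hint_parameters(params):
    out = []
    for p in params:
        if isinstance(p, str):
            # A string hint parameter names a column.
            out.append(("UnresolvedAttribute", p))
        elif isinstance(p, bool):
            # bool is a subclass of int in Python, so reject it before
            # the integer branch below can claim it.
            raise TypeError("booleans are not valid hint parameters")
        elif isinstance(p, int):
            # The client sends 64-bit longs; server-side hints expect
            # 32-bit ints, hence the narrowing (l.toInt in Scala).
            out.append(("Int", int(p)))
        else:
            raise TypeError(f"unsupported hint parameter: {p!r}")
    return out
```

The sketch only shows the shape of the mapping; which parameter types each hint actually accepts is exactly the cross-language question raised in the comment.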
[GitHub] [spark] dongjoon-hyun commented on pull request #38991: [SPARK-41457][PYTHON][TESTS] Refactor type annotations and dependency checks in tests
dongjoon-hyun commented on PR #38991: URL: https://github.com/apache/spark/pull/38991#issuecomment-1344612390 Could you check the linter failure? ``` starting mypy data test... annotations failed data checks: = test session starts == platform linux -- Python 3.9.5, pytest-7.1.3, pluggy-1.0.0 rootdir: /__w/spark/spark/python, configfile: ../dev/pyproject.toml plugins: mypy-plugins-1.9.3 collected 40 items python/pyspark/ml/tests/typing/test_classification.yml FF[ 5%] python/pyspark/ml/tests/typing/test_clustering.yaml F[ 7%] python/pyspark/ml/tests/typing/test_evaluation.yml F [ 10%] python/pyspark/ml/tests/typing/test_feature.yml FF [ 15%] python/pyspark/ml/tests/typing/test_param.yml F [ 17%] python/pyspark/ml/tests/typing/test_readable.yml F [ 20%] python/pyspark/ml/tests/typing/test_regression.yml FFF [ 27%] python/pyspark/sql/tests/typing/test_column.yml F[ 30%] python/pyspark/sql/tests/typing/test_dataframe.yml FFF [ 47%] python/pyspark/sql/tests/typing/test_functions.yml F [ 50%] python/pyspark/sql/tests/typing/test_readwriter.yml FF [ 55%] python/pyspark/sql/tests/typing/test_session.yml F [ 67%] python/pyspark/sql/tests/typing/test_udf.yml FFF [ 85%] python/pyspark/tests/typing/test_context.yml F [ 87%] python/pyspark/tests/typing/test_core.yml F [ 90%] python/pyspark/tests/typing/test_rdd.yml FFF [ 97%] python/pyspark/tests/typing/test_resultiterable.yml F [100%] ```
[GitHub] [spark] xinrong-meng commented on a diff in pull request #38946: [SPARK-41414][CONNECT][PYTHON] Implement date/timestamp functions
xinrong-meng commented on code in PR #38946: URL: https://github.com/apache/spark/pull/38946#discussion_r1044694589 ## python/pyspark/sql/tests/connect/test_connect_function.py: ## @@ -645,6 +645,153 @@ def test_string_functions(self): sdf.select(SF.encode("c", "UTF-8")).toPandas(), ) +# TODO(SPARK-41283): To compare toPandas for test cases with dtypes marked +def test_date_ts_functions(self): +from pyspark.sql import functions as SF +from pyspark.sql.connect import functions as CF + +query = """ +SELECT * FROM VALUES +('1997/02/28 10:30:00', '2023/03/01 06:00:00', 'JST', 1428476400, 2020, 12, 6), +('2000/01/01 04:30:05', '2020/05/01 12:15:00', 'PST', 1403892395, 2022, 12, 6) +AS tab(ts1, ts2, tz, seconds, Y, M, D) +""" +# +---+---+---+--++---+---+ +# |ts1|ts2| tz| seconds| Y| M| D| +# +---+---+---+--++---+---+ +# |1997/02/28 10:30:00|2023/03/01 06:00:00|JST|1428476400|2020| 12| 6| +# |2000/01/01 04:30:05|2020/05/01 12:15:00|PST|1403892395|2022| 12| 6| +# +---+---+---+--++---+---+ + +cdf = self.connect.sql(query) +sdf = self.spark.sql(query) + +# With no parameters +for cfunc, sfunc in [ +(CF.current_date, SF.current_date), +]: +self.assert_eq( +cdf.select(cfunc()).toPandas(), +sdf.select(sfunc()).toPandas(), +) + +# current_timestamp +# [left]: datetime64[ns, America/Los_Angeles] +# [right]: datetime64[ns] +plan = cdf.select(CF.current_timestamp())._plan.to_proto(self.connect) +self.assertEqual( +plan.root.project.expressions.unresolved_function.function_name, "current_timestamp" Review Comment: Thank you both! The tests are improved in https://github.com/apache/spark/pull/38946/commits/0493a01a4e39795a522cdc58f8a69b0ed261c832.
[GitHub] [spark] gengliangwang commented on pull request #38999: [SPARK-41450][BUILD] Fix shading in `core` module
gengliangwang commented on PR #38999: URL: https://github.com/apache/spark/pull/38999#issuecomment-1344591174 @pan3793 Thanks for fixing it!
[GitHub] [spark] otterc commented on a diff in pull request #36165: [SPARK-36620][SHUFFLE] Add Push Based Shuffle client side read metrics
otterc commented on code in PR #36165: URL: https://github.com/apache/spark/pull/36165#discussion_r1044679570 ## core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala: ## @@ -726,6 +736,61 @@ final class ShuffleBlockFetcherIterator( } } + + // Number of map blocks in a ShuffleBlockChunk + private def getShuffleChunkCardinality(blockId: ShuffleBlockChunkId): Int = { + pushBasedFetchHelper.getRoaringBitMap(blockId).map(_.getCardinality).getOrElse(0) + } + + // Check if the merged block is local to the host or not + private def isLocalMergedBlockAddress(address: BlockManagerId): Boolean = { +address.executorId.isEmpty && address.host == blockManager.blockManagerId.host Review Comment: This method is there in PushBasedFetchHelper. `PushBasedFetchHelper.isLocalPushMergedBlockAddress`. We should try to just use that. We can maybe make that static. The executor id is not empty. It's `SHUFFLE_MERGER_IDENTIFIER`.
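The point about the executor id can be made concrete. Push-merged blocks are reported under a well-known merger identifier rather than an empty executor id; the sketch below models the check the reviewer asks to reuse. This is a hedged Python stand-in for the Scala helper, and the identifier value is an assumption to be verified against `BlockManagerId`:

```python
# Assumed value of BlockManagerId.SHUFFLE_MERGER_IDENTIFIER; verify
# against the actual Spark source before relying on it.
SHUFFLE_MERGER_IDENTIFIER = "shuffle-push-merger"


def is_local_push_merged_block(executor_id, block_host, local_host):
    # A push-merged block is local when it was reported by the shuffle
    # merger (not an empty executor id, as the draft in the diff assumed)
    # on the same host as this block manager.
    return executor_id == SHUFFLE_MERGER_IDENTIFIER and block_host == local_host
```

Checking for an empty executor id, as the diff does, would never match a merged-block address, which is the reviewer's objection.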
[GitHub] [spark] steveloughran commented on pull request #38974: [SPARK-41392][BUILD] Make maven build Spark master with Hadoop 3.4.0-SNAPSHOT successful
steveloughran commented on PR #38974: URL: https://github.com/apache/spark/pull/38974#issuecomment-1344581573 yeah, not going to happen for a while; 3.3.5 RC0 coming soon though; just trying to wrap up an abfs prefetch bug
[GitHub] [spark] LuciferYang commented on pull request #38974: [SPARK-41392][BUILD] Make maven build Spark master with Hadoop 3.4.0-SNAPSHOT successful
LuciferYang commented on PR #38974: URL: https://github.com/apache/spark/pull/38974#issuecomment-1344577731 Fine with me, closing it first ~
[GitHub] [spark] LuciferYang closed pull request #38974: [SPARK-41392][BUILD] Make maven build Spark master with Hadoop 3.4.0-SNAPSHOT successful
LuciferYang closed pull request #38974: [SPARK-41392][BUILD] Make maven build Spark master with Hadoop 3.4.0-SNAPSHOT successful URL: https://github.com/apache/spark/pull/38974
[GitHub] [spark] MaxGekk commented on pull request #38997: [SPARK-41462][SQL] Date and timestamp type can up cast to TimestampNTZ
MaxGekk commented on PR #38997: URL: https://github.com/apache/spark/pull/38997#issuecomment-1344574665 +1, LGTM. Merging to master. Thank you, @gengliangwang and @cloud-fan for the review.
[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #39012: [MINOR][CONNECT]: Fix command error and typo
dongjoon-hyun commented on code in PR #39012: URL: https://github.com/apache/spark/pull/39012#discussion_r1044671065 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala: ## @@ -322,8 +322,7 @@ class SparkConnectPlanner(session: SparkSession) { None, rel.getVariableColumnName, Seq(rel.getValueColumnName), -transformRelation(rel.getInput) - ) +transformRelation(rel.getInput)) Review Comment: Is this required for linter? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #39012: [MINOR][CONNECT]: Fix command error and typo
dongjoon-hyun commented on code in PR #39012: URL: https://github.com/apache/spark/pull/39012#discussion_r1044670574 ## dev/lint-scala: ## @@ -29,14 +29,14 @@ ERRORS=$(./build/mvn \ -Dscalafmt.skip=false \ -Dscalafmt.validateOnly=true \ -Dscalafmt.changedOnly=false \ --pl connector/connect \ +-pl connector/connect/server \ Review Comment: For this case, we need an official JIRA ID instead of `MINOR`. Could you file one, @dengziming?
[GitHub] [spark] dongjoon-hyun commented on pull request #39004: [SPARK-41466][BUILD] Change Scala Style configuration to catch AnyFunSuite instead of FunSuite
dongjoon-hyun commented on PR #39004: URL: https://github.com/apache/spark/pull/39004#issuecomment-1344576249 Merged to master.
[GitHub] [spark] dongjoon-hyun closed pull request #39004: [SPARK-41466][BUILD] Change Scala Style configuration to catch AnyFunSuite instead of FunSuite
dongjoon-hyun closed pull request #39004: [SPARK-41466][BUILD] Change Scala Style configuration to catch AnyFunSuite instead of FunSuite URL: https://github.com/apache/spark/pull/39004
[GitHub] [spark] MaxGekk closed pull request #38997: [SPARK-41462][SQL] Date and timestamp type can up cast to TimestampNTZ
MaxGekk closed pull request #38997: [SPARK-41462][SQL] Date and timestamp type can up cast to TimestampNTZ URL: https://github.com/apache/spark/pull/38997
[GitHub] [spark] rednaxelafx commented on pull request #38923: [SPARK-41395][SQL] `InterpretedMutableProjection` should use `setDecimal` to set null values for decimals in an unsafe row
rednaxelafx commented on PR #38923: URL: https://github.com/apache/spark/pull/38923#issuecomment-1344558867 Post-hoc review: LGTM, this is a good catch.
[GitHub] [spark] warrenzhu25 commented on pull request #39011: [WIP][SPARK-41469][CORE] Avoid unnecessary task rerun on decommissioned executor lost if shuffle data migrated
warrenzhu25 commented on PR #39011: URL: https://github.com/apache/spark/pull/39011#issuecomment-1344553081

> cc @warrenzhu25 too

It's really the change I want. Great work.
[GitHub] [spark] wineternity commented on a diff in pull request #38702: [SPARK-41187][CORE] LiveExecutor MemoryLeak in AppStatusListener when ExecutorLost happen
wineternity commented on code in PR #38702: URL: https://github.com/apache/spark/pull/38702#discussion_r1044596644

## core/src/main/scala/org/apache/spark/status/AppStatusListener.scala:

@@ -674,22 +674,30 @@ private[spark] class AppStatusListener(
    delta
  }.orNull
-val (completedDelta, failedDelta, killedDelta) = event.reason match {
+// SPARK-41187: For `SparkListenerTaskEnd` with `Resubmitted` reason, which is raised by
+// executor lost, it can lead to negative `LiveStage.activeTasks` since there's no
+// corresponding `SparkListenerTaskStart` event for each of them. The negative activeTasks
+// will make the stage always remain in the live stage list, as it can never meet the
+// condition activeTasks == 0. This in turn causes the dead executor to never be cleaned up
+// if that live stage's submissionTime is less than the dead executor's removeTime.
+val (completedDelta, failedDelta, killedDelta, activeTasksDelta) = event.reason match {

Review Comment: fixed
[GitHub] [spark] Ngone51 commented on a diff in pull request #38702: [SPARK-41187][CORE] LiveExecutor MemoryLeak in AppStatusListener when ExecutorLost happen
Ngone51 commented on code in PR #38702: URL: https://github.com/apache/spark/pull/38702#discussion_r1044581470

## core/src/main/scala/org/apache/spark/status/AppStatusListener.scala:

@@ -674,22 +674,30 @@ private[spark] class AppStatusListener(
    delta
  }.orNull
-val (completedDelta, failedDelta, killedDelta) = event.reason match {
+// SPARK-41187: For `SparkListenerTaskEnd` with `Resubmitted` reason, which is raised by
+// executor lost, it can lead to negative `LiveStage.activeTasks` since there's no
+// corresponding `SparkListenerTaskStart` event for each of them. The negative activeTasks
+// will make the stage always remain in the live stage list, as it can never meet the
+// condition activeTasks == 0. This in turn causes the dead executor to never be cleaned up
+// if that live stage's submissionTime is less than the dead executor's removeTime.
+val (completedDelta, failedDelta, killedDelta, activeTasksDelta) = event.reason match {

Review Comment: nit:
```suggestion
val (completedDelta, failedDelta, killedDelta, activeDelta) = event.reason match {
```
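The accounting bug behind the review thread above is easiest to see in isolation. Below is a minimal sketch (in Python, not Spark's actual Scala `AppStatusListener`) of why a `Resubmitted` task-end must not decrement the active-task counter: such an event has no matching task-start, so a blind decrement drives the counter negative and the stage can never satisfy the `activeTasks == 0` cleanup condition. The class and method names here are illustrative only.

```python
# Sketch of the SPARK-41187 accounting fix: treat a Resubmitted task-end
# as a failure, but leave activeTasks untouched because no corresponding
# task-start event was ever emitted for it.

class LiveStage:
    def __init__(self):
        self.active_tasks = 0
        self.failed_tasks = 0

    def on_task_start(self):
        self.active_tasks += 1

    def on_task_end(self, reason):
        # A Resubmitted end event contributes a failure but no active delta.
        active_delta = 0 if reason == "Resubmitted" else -1
        failed_delta = 1 if reason in ("Resubmitted", "TaskFailed") else 0
        self.active_tasks += active_delta
        self.failed_tasks += failed_delta

stage = LiveStage()
stage.on_task_start()
stage.on_task_end("Success")       # normal lifecycle: counter returns to 0
stage.on_task_end("Resubmitted")   # executor lost: no matching task-start
assert stage.active_tasks == 0     # stage can now be cleaned up
assert stage.failed_tasks == 1
```

Without the `Resubmitted` special case, the final `active_tasks` would be -1, which is exactly the leak described in the diff's comment.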
[GitHub] [spark] dengziming opened a new pull request, #39012: [MINOR][CONNECT]: Fix command error and typo
dengziming opened a new pull request, #39012: URL: https://github.com/apache/spark/pull/39012

### What changes were proposed in this pull request?
We separated connect into server and common modules, but failed to update the `lint-scala` tool accordingly. This PR also fixes a typo (fase -> false) and formats the code.

### Why are the changes needed?
Without this fix we can't check the scala code format of the new connect modules.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
No
[GitHub] [spark] dongjoon-hyun commented on pull request #38999: [SPARK-41450][BUILD] Fix shading in `core` module
dongjoon-hyun commented on PR #38999: URL: https://github.com/apache/spark/pull/38999#issuecomment-1344459575 Merged to master. Thank you, @pan3793 and all!
[GitHub] [spark] dongjoon-hyun closed pull request #38999: [SPARK-41450][BUILD] Fix shading in `core` module
dongjoon-hyun closed pull request #38999: [SPARK-41450][BUILD] Fix shading in `core` module URL: https://github.com/apache/spark/pull/38999
[GitHub] [spark] Ngone51 commented on pull request #39011: [WIP][SPARK-41469][CORE] Avoid unnecessary task rerun on decommissioned executor lost if shuffle data migrated
Ngone51 commented on PR #39011: URL: https://github.com/apache/spark/pull/39011#issuecomment-1344456134 Marking as WIP for now due to the compilation error and missing unit tests. Any feedback is still welcome.
[GitHub] [spark] vinodkc commented on pull request #38146: [SPARK-40687][SQL] Support data masking built-in function 'mask'
vinodkc commented on PR #38146: URL: https://github.com/apache/spark/pull/38146#issuecomment-1344455185 @gengliangwang, review comments are addressed.
[GitHub] [spark] Ngone51 commented on a diff in pull request #39011: [SPARK-41469][CORE] Avoid unnecessary task rerun on decommissioned executor lost if shuffle data migrated
Ngone51 commented on code in PR #39011: URL: https://github.com/apache/spark/pull/39011#discussion_r1044572392

## core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala:

@@ -1046,17 +1046,46 @@ private[spark] class TaskSetManager(
  /** Called by TaskScheduler when an executor is lost so we can re-enqueue our tasks */
  override def executorLost(execId: String, host: String, reason: ExecutorLossReason): Unit = {
-// Re-enqueue any tasks that ran on the failed executor if this is a shuffle map stage,
-// and we are not using an external shuffle server which could serve the shuffle outputs.
-// The reason is the next stage wouldn't be able to fetch the data from this dead executor
-// so we would need to rerun these tasks on other executors.
-if (isShuffleMapTasks && !env.blockManager.externalShuffleServiceEnabled && !isZombie) {
+// Re-enqueue any tasks with potential shuffle data loss that ran on the failed executor
+// if this is a shuffle map stage, and we are not using an external shuffle server which
+// could serve the shuffle outputs or the executor lost is caused by decommission (which
+// can destroy the whole host). The reason is the next stage wouldn't be able to fetch the
+// data from this dead executor so we would need to rerun these tasks on other executors.
+val maybeShuffleMapOutputLoss = isShuffleMapTasks &&
+  (reason.isInstanceOf[ExecutorDecommission] || !env.blockManager.externalShuffleServiceEnabled)
+if (maybeShuffleMapOutputLoss && !isZombie) {
   for ((tid, info) <- taskInfos if info.executorId == execId) {
     val index = info.index
+lazy val isShuffleMapOutputAvailable = reason match {
+  case ExecutorDecommission(_, _) =>
+    val shuffleId = tasks(0).asInstanceOf[ShuffleMapTask].shuffleId
+    val mapId = if (conf.get(config.SHUFFLE_USE_OLD_FETCH_PROTOCOL)) {
+      info.partitionId
+    } else {
+      tid
+    }
+    val locationOpt = env.mapOutputTracker.asInstanceOf[MapOutputTrackerMaster]
+      .getMapOutputLocation(shuffleId, mapId)
+    // There are 3 cases of locationOpt:
+    // 1) locationOpt.isDefined && locationOpt.get.host == host:
+    //    this case implies that the shuffle map output is still on the lost executor. The
+    //    map output file is supposed to lose so we should rerun this task;
+    // 2) locationOpt.isDefined && locationOpt.get.host != host:
+    //    this case implies that the shuffle map output has been migrated to another
+    //    host. The task doesn't need to rerun;
+    // 3) location.isEmpty:

Review Comment:
```suggestion
    // 3) locationOpt.isEmpty:
```
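The three-way decision in the diff above can be sketched compactly. The following is a hypothetical Python illustration, not Spark's Scala implementation: `should_rerun` stands in for the `isShuffleMapOutputAvailable` check, the location argument stands in for `MapOutputTrackerMaster.getMapOutputLocation`, and case 3 (no tracked location) is assumed here to conservatively rerun the task, since the diff is cut off before stating that case's handling.

```python
# Sketch of the executorLost decision for a decommissioned executor:
# rerun a finished shuffle map task only if its map output cannot be
# fetched from a surviving host.

def should_rerun(map_output_location, lost_host):
    """map_output_location is None (case 3) or a host string (cases 1 and 2)."""
    if map_output_location is None:
        # Case 3: the tracker has no location; conservatively rerun (assumption).
        return True
    if map_output_location == lost_host:
        # Case 1: the output still lives on the lost host, so it is gone.
        return True
    # Case 2: the output was migrated to a healthy host; no rerun needed.
    return False

assert should_rerun("host-a", "host-a") is True   # case 1: output lost with host
assert should_rerun("host-b", "host-a") is False  # case 2: output migrated
assert should_rerun(None, "host-a") is True       # case 3: unknown location
```

Case 2 is the whole point of the PR: before this change, every finished map task on a lost executor was re-enqueued, even when shuffle block decommissioning had already migrated its output elsewhere.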
[GitHub] [spark] Ngone51 commented on pull request #39011: [SPARK-41469][CORE] Avoid unnecessary task rerun on decommissioned executor lost if shuffle data migrated
Ngone51 commented on PR #39011: URL: https://github.com/apache/spark/pull/39011#issuecomment-137148 cc @warrenzhu25 too
[GitHub] [spark] Ngone51 opened a new pull request, #39011: [SPARK-41469][CORE] Avoid unnecessary task rerun on decommissioned executor lost if shuffle data migrated
Ngone51 opened a new pull request, #39011: URL: https://github.com/apache/spark/pull/39011

### What changes were proposed in this pull request?
This PR proposes to avoid rerunning the finished shuffle map task in `TaskSetManager.executorLost()` if the executor lost is caused by decommission and the shuffle data has been successfully migrated.

### Why are the changes needed?
To avoid unnecessary task recomputation.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
(Will try to add unit tests tmr)