[GitHub] [spark] yaooqinn commented on pull request #40524: [SPARK-42898][SQL] Mark that string/date casts do not need time zone id
yaooqinn commented on PR #40524: URL: https://github.com/apache/spark/pull/40524#issuecomment-1643292939 cc @ulysses-you who is the last one to touch this part -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] yaooqinn commented on pull request #41951: [SPARK-44367][SQL][UI] Show error message on UI for each failed query
yaooqinn commented on PR #41951: URL: https://github.com/apache/spark/pull/41951#issuecomment-1643284132 Thanks, merged to master and 3.5. PS, K8s IT failures are unrelated.
[GitHub] [spark] yaooqinn closed pull request #41951: [SPARK-44367][SQL][UI] Show error message on UI for each failed query
yaooqinn closed pull request #41951: [SPARK-44367][SQL][UI] Show error message on UI for each failed query URL: https://github.com/apache/spark/pull/41951
[GitHub] [spark] LuciferYang commented on pull request #42088: [SPARK-44491][INFRA] Add `branch-3.5` to `publish_snapshot` GitHub Action job
LuciferYang commented on PR #42088: URL: https://github.com/apache/spark/pull/42088#issuecomment-1643272217 cc @HyukjinKwon @zhengruifeng
[GitHub] [spark] panbingkun opened a new pull request, #42088: [SPARK-44491][INFRA] Add `branch-3.5` to `publish_snapshot` GitHub Action job
panbingkun opened a new pull request, #42088: URL: https://github.com/apache/spark/pull/42088

### What changes were proposed in this pull request?
This PR aims to add `branch-3.5` to the `publish_snapshot` GitHub Action job.

### Why are the changes needed?
Since the GitHub Action cron job is executed only on the `master` branch, we need to register the newly added branch on `master`.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
N/A. Manual check.
[GitHub] [spark] mathewjacob1002 opened a new pull request, #42087: [SPARK-44264] Added Example to Deepspeed Distributor
mathewjacob1002 opened a new pull request, #42087: URL: https://github.com/apache/spark/pull/42087

### What changes were proposed in this pull request?
Added examples to the docstring showing how to use `DeepspeedTorchDistributor`.

### Why are the changes needed?
More concrete examples allow for a better understanding of the feature.

### Does this PR introduce _any_ user-facing change?
Yes, docs changes.

### How was this patch tested?
`make html`
[GitHub] [spark] zhengruifeng closed pull request #42040: [WIP][SPARK-43611][SQL][PS][CONNCECT] Fix unexpected `AnalysisException` from Spark Connect client
zhengruifeng closed pull request #42040: [WIP][SPARK-43611][SQL][PS][CONNCECT] Fix unexpected `AnalysisException` from Spark Connect client URL: https://github.com/apache/spark/pull/42040
[GitHub] [spark] zhengruifeng commented on pull request #42040: [WIP][SPARK-43611][SQL][PS][CONNCECT] Fix unexpected `AnalysisException` from Spark Connect client
zhengruifeng commented on PR #42040: URL: https://github.com/apache/spark/pull/42040#issuecomment-1643204561 close this one in favor of https://github.com/apache/spark/pull/42086
[GitHub] [spark] beliefer commented on pull request #41932: [SPARK-44131][SQL][PYTHON][CONNECT][FOLLOWUP] Support qualified function name for call_function
beliefer commented on PR #41932: URL: https://github.com/apache/spark/pull/41932#issuecomment-1643189573 The CI failure is unrelated to this PR.
[GitHub] [spark] zhengruifeng commented on pull request #42086: [SPARK-43611][SQL][PS][CONNCECT] Make `ExtractWindowExpressions` retain the `PLAN_ID_TAG `
zhengruifeng commented on PR #42086: URL: https://github.com/apache/spark/pull/42086#issuecomment-1643187065 I have checked with @cloud-fan that we might have to modify the rules one by one. The good news is that 90% of the UTs can be resolved by this single rule, and I think we only need to touch another 3~4 rules. @HyukjinKwon @itholic
[GitHub] [spark] zhengruifeng opened a new pull request, #42086: [SPARK-43611][SQL][PS][CONNCECT] Make `ExtractWindowExpressions` retain the `PLAN_ID_TAG `
zhengruifeng opened a new pull request, #42086: URL: https://github.com/apache/spark/pull/42086

### What changes were proposed in this pull request?
Make `ExtractWindowExpressions` retain the `PLAN_ID_TAG`.

### Why are the changes needed?
In https://github.com/apache/spark/pull/39925, we introduced a new mechanism to resolve an expression against a specified plan. However, sometimes the plan ID may be eliminated by the analyzer, and then some expressions cannot be correctly resolved; this issue is the main blocker of PS on Connect.

### Does this PR introduce _any_ user-facing change?
Yes, a lot of Pandas APIs are enabled.

### How was this patch tested?
Enabled 113 out of all the 122 related UTs.
[GitHub] [spark] cxzl25 opened a new pull request, #42085: [SPARK-44490][WEBUI] Remove `TaskPagedTable` in StagePage
cxzl25 opened a new pull request, #42085: URL: https://github.com/apache/spark/pull/42085

### What changes were proposed in this pull request?
Remove `TaskPagedTable`.

### Why are the changes needed?
In [SPARK-21809](https://issues.apache.org/jira/browse/SPARK-21809), we introduced `stagespage-template.html` to show the running status of a Stage. `TaskPagedTable` is no longer in effect, but there are still many PRs updating the related code.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Local test
[GitHub] [spark] HyukjinKwon closed pull request #41948: [SPARK-44380][SQL][PYTHON] Support for Python UDTF to analyze in Python
HyukjinKwon closed pull request #41948: [SPARK-44380][SQL][PYTHON] Support for Python UDTF to analyze in Python URL: https://github.com/apache/spark/pull/41948
[GitHub] [spark] HyukjinKwon commented on pull request #41948: [SPARK-44380][SQL][PYTHON] Support for Python UDTF to analyze in Python
HyukjinKwon commented on PR #41948: URL: https://github.com/apache/spark/pull/41948#issuecomment-1643063875 Merged to master.
[GitHub] [spark] beliefer opened a new pull request, #42084: [SPARK-44292][SQL][FOLLOWUP] Make TYPE_CHECK_FAILURE_WITH_HINT use correct name
beliefer opened a new pull request, #42084: URL: https://github.com/apache/spark/pull/42084

### What changes were proposed in this pull request?
https://github.com/apache/spark/pull/41850 uses `TYPE_CHECK_FAILURE_WITH_HINT`; it should be `DATATYPE_MISMATCH.TYPE_CHECK_FAILURE_WITH_HINT`.

### Why are the changes needed?
Fix a bug.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
N/A
[GitHub] [spark] cloud-fan closed pull request #42007: [SPARK-44431][SQL] Fix behavior of null IN (empty list) in optimization rules
cloud-fan closed pull request #42007: [SPARK-44431][SQL] Fix behavior of null IN (empty list) in optimization rules URL: https://github.com/apache/spark/pull/42007
[GitHub] [spark] cloud-fan commented on pull request #42007: [SPARK-44431][SQL] Fix behavior of null IN (empty list) in optimization rules
cloud-fan commented on PR #42007: URL: https://github.com/apache/spark/pull/42007#issuecomment-1643037992 thanks, merging to master/3.5!
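For context on the semantics this PR title refers to: under SQL's three-valued logic, `NULL IN (<empty list>)` must evaluate to FALSE rather than NULL, because an IN predicate over an empty list has no elements to be unknown about. A minimal Python sketch of these semantics (illustrative only, not Spark's optimizer code; `sql_in` is a hypothetical helper):

```python
def sql_in(x, values):
    """Evaluate SQL `x IN (values)` with three-valued logic.

    Returns True, False, or None, where None models SQL NULL/unknown.
    """
    if not values:
        # Empty IN-list: always FALSE, even when x is NULL -- this is
        # the case the optimization rules got wrong.
        return False
    if x is None:
        # NULL compared against a non-empty list: unknown.
        return None
    if any(v == x for v in values if v is not None):
        return True
    if any(v is None for v in values):
        # x might equal the NULL element: unknown.
        return None
    return False
```

The first branch is the empty-list case named in the PR title; the remaining branches follow the usual NULL-propagation rules for IN.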
[GitHub] [spark] richardc-db opened a new pull request, #42083: Support deserializing long types when creating `Metadata` object from JObject
richardc-db opened a new pull request, #42083: URL: https://github.com/apache/spark/pull/42083

### What changes were proposed in this pull request?
Adds support for deserializing long types when creating `Metadata` objects from `JObject`s.

### Why are the changes needed?
Previously, the code would crash when adding a `long` value to the `Metadata` object.

### Does this PR introduce _any_ user-facing change?
no

### How was this patch tested?
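As a hypothetical illustration of the failure mode (a Python sketch, not the actual Scala/json4s deserializer; both decoder names are made up): a decoder whose integer case only accepts 32-bit values crashes as soon as a metadata value needs a long, while matching the full integer range handles both sizes in one branch:

```python
import json

INT32_MIN, INT32_MAX = -2**31, 2**31 - 1

def decode_strict(v):
    """Buggy decoder: accepts only 32-bit integers (no 'long' case)."""
    if isinstance(v, int) and not isinstance(v, bool) and INT32_MIN <= v <= INT32_MAX:
        return v
    raise ValueError(f"unsupported metadata value: {v!r}")

def decode_fixed(v):
    """Fixed decoder: one branch covers int- and long-sized values."""
    if isinstance(v, int) and not isinstance(v, bool):
        return v
    raise ValueError(f"unsupported metadata value: {v!r}")

# A value that exceeds the 32-bit range, as parsed from JSON metadata.
meta = json.loads('{"rowCount": 3000000000}')
```

`decode_strict(meta["rowCount"])` raises, while `decode_fixed` returns the value unchanged.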
[GitHub] [spark] HyukjinKwon commented on a diff in pull request #42037: [SPARK-44305][SQL] Dynamically choose whether to broadcast hadoop conf
HyukjinKwon commented on code in PR #42037: URL: https://github.com/apache/spark/pull/42037#discussion_r1268879200

## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcFileFormat.scala:

```diff
@@ -152,15 +153,25 @@ class OrcFileFormat
       assert(supportBatch(sparkSession, resultSchema))
     }

-    OrcConf.IS_SCHEMA_EVOLUTION_CASE_SENSITIVE.setBoolean(hadoopConf, sqlConf.caseSensitiveAnalysis)
-
-    val broadcastedConf =
-      sparkSession.sparkContext.broadcast(new SerializableConfiguration(hadoopConf))
     val isCaseSensitive = sparkSession.sessionState.conf.caseSensitiveAnalysis
+    OrcConf.IS_SCHEMA_EVOLUTION_CASE_SENSITIVE.setBoolean(hadoopConf, isCaseSensitive)
+    val broadcastedHadoopConf = if (options.isEmpty) {
+      Option.empty
+    } else {
+      Option(sparkSession.sparkContext.broadcast(new SerializableConfiguration(hadoopConf)))
+    }
+
     val orcFilterPushDown = sparkSession.sessionState.conf.orcFilterPushDown
+    val sparkConf = sparkSession.sparkContext.conf
     (file: PartitionedFile) => {
-      val conf = broadcastedConf.value.value
+      val conf = if (broadcastedHadoopConf.isDefined) {
+        broadcastedHadoopConf.get.value.value
```

Review Comment: Oh, I see. You're passing the original one here. Okay, I am good with this approach.
[GitHub] [spark] panbingkun commented on a diff in pull request #41349: [SPARK-43839][SQL] Convert `_LEGACY_ERROR_TEMP_1337` to `UNSUPPORTED_FEATURE.TIME_TRAVEL`
panbingkun commented on code in PR #41349: URL: https://github.com/apache/spark/pull/41349#discussion_r1268872661

## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/V2SessionCatalog.scala:

```diff
@@ -89,10 +89,12 @@ class V2SessionCatalog(catalog: SessionCatalog)
       throw QueryCompilationErrors.timeTravelUnsupportedError(
```

Review Comment: @cloud-fan Follow-up PR: https://github.com/apache/spark/pull/42082
[GitHub] [spark] panbingkun commented on pull request #42082: [SPARK-43839][SQL][FOLLOWUP] Convert _LEGACY_ERROR_TEMP_1337 to UNSUPPORTED_FEATURE.TIME_TRAVEL
panbingkun commented on PR #42082: URL: https://github.com/apache/spark/pull/42082#issuecomment-1643026104 cc @cloud-fan @MaxGekk
[GitHub] [spark] panbingkun commented on a diff in pull request #42082: [SPARK-43839][SQL][FOLLOWUP] Convert _LEGACY_ERROR_TEMP_1337 to UNSUPPORTED_FEATURE.TIME_TRAVEL
panbingkun commented on code in PR #42082: URL: https://github.com/apache/spark/pull/42082#discussion_r1268872155

## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/V2SessionCatalog.scala:

```diff
@@ -88,19 +88,11 @@ class V2SessionCatalog(catalog: SessionCatalog)
   }

   private def failTimeTravel(ident: Identifier, t: Table): Table = {
-    t match {
-      case V1Table(catalogTable) =>
-        if (catalogTable.tableType == CatalogTableType.VIEW) {
-          throw QueryCompilationErrors.timeTravelUnsupportedError(
-            toSQLId(catalogTable.identifier.nameParts))
-        } else {
-          throw QueryCompilationErrors.timeTravelUnsupportedError(
-            toSQLId(catalogTable.identifier.nameParts))
-        }
-
-      case _ => throw QueryCompilationErrors.timeTravelUnsupportedError(
-        toSQLId(ident.asTableIdentifier.nameParts))
+    val nameParts = t match {
+      case V1Table(catalogTable) => catalogTable.identifier.nameParts
+      case _ => ident.asTableIdentifier.nameParts
```

Review Comment: If the code here is simplified to `val nameParts = ident.asTableIdentifier.nameParts`, in some scenarios `relationId` will not attach the catalog name in the error message, such as: https://github.com/apache/spark/blob/e0c79c637c16df92aa3c6fb14f151132ea26e92f/sql/core/src/test/scala/org/apache/spark/sql/errors/QueryExecutionErrorsSuite.scala#L979-L990
[GitHub] [spark] panbingkun opened a new pull request, #42082: [SPARK-43839][SQL][FOLLOWUP] Convert _LEGACY_ERROR_TEMP_1337 to UNSUPPORTED_FEATURE.TIME_TRAVEL
panbingkun opened a new pull request, #42082: URL: https://github.com/apache/spark/pull/42082

### What changes were proposed in this pull request?
- This PR follows up https://github.com/apache/spark/pull/41349.
- It aims to simplify the code logic after merging `_LEGACY_ERROR_TEMP_1337` into `UNSUPPORTED_FEATURE.TIME_TRAVEL`.

### Why are the changes needed?
Simplify the code logic.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Pass GA.
[GitHub] [spark] Hisoka-X commented on pull request #42081: [SPARK-44487][TEST] Fix KubernetesSuite report NPE when not set spark.kubernetes.test.unpackSparkDir
Hisoka-X commented on PR #42081: URL: https://github.com/apache/spark/pull/42081#issuecomment-1643018942 > Is this related to the GA failure of the master? No, just a bug I found while debugging.
[GitHub] [spark] LuciferYang commented on pull request #42081: [SPARK-44487][TEST] Fix KubernetesSuite report NPE when not set spark.kubernetes.test.unpackSparkDir
LuciferYang commented on PR #42081: URL: https://github.com/apache/spark/pull/42081#issuecomment-1643016092 Is this related to the GA failure of the master?
[GitHub] [spark] beliefer commented on a diff in pull request #41850: [SPARK-44292][SQL] Assign names to the error class _LEGACY_ERROR_TEMP_[2315-2319]
beliefer commented on code in PR #41850: URL: https://github.com/apache/spark/pull/41850#discussion_r1268862800

## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala:

```diff
@@ -277,13 +277,13 @@ trait CheckAnalysis extends PredicateHelper with LookupCatalog with QueryErrorsB
             e.dataTypeMismatch(e, checkRes)
           case TypeCheckResult.TypeCheckFailure(message) =>
             e.setTagValue(DATA_TYPE_MISMATCH_ERROR, true)
-            extraHintForAnsiTypeCoercionExpression(operator)
+            val extraHint = extraHintForAnsiTypeCoercionExpression(operator)
             e.failAnalysis(
-              errorClass = "_LEGACY_ERROR_TEMP_2315",
+              errorClass = "TYPE_CHECK_FAILURE_WITH_HINT",
```

Review Comment: OK, let me see it.
[GitHub] [spark] Hisoka-X commented on pull request #41347: [SPARK-43838][SQL] Fix subquery on single table with having clause can't be optimized
Hisoka-X commented on PR #41347: URL: https://github.com/apache/spark/pull/41347#issuecomment-1643006858 Thanks @cloud-fan for your help and @allisonwang-db
[GitHub] [spark] cloud-fan closed pull request #41347: [SPARK-43838][SQL] Fix subquery on single table with having clause can't be optimized
cloud-fan closed pull request #41347: [SPARK-43838][SQL] Fix subquery on single table with having clause can't be optimized URL: https://github.com/apache/spark/pull/41347
[GitHub] [spark] cloud-fan commented on pull request #41347: [SPARK-43838][SQL] Fix subquery on single table with having clause can't be optimized
cloud-fan commented on PR #41347: URL: https://github.com/apache/spark/pull/41347#issuecomment-1643004942 The k8s failure is unrelated, I'm merging it to master, thanks!
[GitHub] [spark] panbingkun commented on a diff in pull request #41349: [SPARK-43839][SQL] Convert `_LEGACY_ERROR_TEMP_1337` to `UNSUPPORTED_FEATURE.TIME_TRAVEL`
panbingkun commented on code in PR #41349: URL: https://github.com/apache/spark/pull/41349#discussion_r1268852208

## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/V2SessionCatalog.scala:

```diff
@@ -89,10 +89,12 @@ class V2SessionCatalog(catalog: SessionCatalog)
       throw QueryCompilationErrors.timeTravelUnsupportedError(
```

Review Comment: Yeah, after merging, the code here can be simplified. It's my bad, let me do it immediately.
[GitHub] [spark] Hisoka-X commented on pull request #42081: [SPARK-44487][TEST] Fix KubernetesSuite report NPE when not set spark.kubernetes.test.unpackSparkDir
Hisoka-X commented on PR #42081: URL: https://github.com/apache/spark/pull/42081#issuecomment-1642997959 cc @Yikun @HyukjinKwon
[GitHub] [spark] Hisoka-X opened a new pull request, #42081: [SPARK-44487][TEST] Fix KubernetesSuite report NPE when not set spark.kubernetes.test.unpackSparkDir
Hisoka-X opened a new pull request, #42081: URL: https://github.com/apache/spark/pull/42081

### What changes were proposed in this pull request?
Fix KubernetesSuite reporting an NPE when `spark.kubernetes.test.unpackSparkDir` is not set:

```java
Exception encountered when invoking run on a nested suite.
java.lang.NullPointerException
  at sun.nio.fs.UnixPath.normalizeAndCheck(UnixPath.java:77)
  at sun.nio.fs.UnixPath.<init>(UnixPath.java:71)
  at sun.nio.fs.UnixFileSystem.getPath(UnixFileSystem.java:281)
  at java.nio.file.Paths.get(Paths.java:84)
  at org.apache.spark.deploy.k8s.integrationtest.KubernetesSuite.$anonfun$beforeAll$4(KubernetesSuite.scala:164)
  at org.apache.spark.deploy.k8s.integrationtest.KubernetesSuite.$anonfun$beforeAll$4$adapted(KubernetesSuite.scala:163)
  at scala.collection.LinearSeqOptimized.find(LinearSeqOptimized.scala:115)
  at scala.collection.LinearSeqOptimized.find$(LinearSeqOptimized.scala:112)
  at scala.collection.immutable.List.find(List.scala:91)
  at org.apache.spark.deploy.k8s.integrationtest.KubernetesSuite.beforeAll(KubernetesSuite.scala:163)
```

### Why are the changes needed?
`spark.kubernetes.test.unpackSparkDir` is an optional config, not a required one.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Existing tests; verified locally.
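The fix pattern, sketched in Python with a hypothetical helper (the actual suite is Scala): treat the setting as optional and check for absence before constructing a path, instead of passing a missing value straight into the path API, which is what triggered the NPE above:

```python
from pathlib import Path
from typing import Optional

def resolve_unpack_dir(conf: dict) -> Optional[Path]:
    """Return the unpack dir as a Path, or None when the key is unset.

    Hypothetical helper: guards the optional setting instead of calling
    the path constructor with a missing (null) value.
    """
    value = conf.get("spark.kubernetes.test.unpackSparkDir")
    if value is None:
        # Optional setting: skip the path logic rather than crash.
        return None
    return Path(value)
```

With the guard in place, an empty configuration yields `None` and the suite can simply skip the unpack step.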
[GitHub] [spark] panbingkun commented on a diff in pull request #41349: [SPARK-43839][SQL] Convert `_LEGACY_ERROR_TEMP_1337` to `UNSUPPORTED_FEATURE.TIME_TRAVEL`
panbingkun commented on code in PR #41349: URL: https://github.com/apache/spark/pull/41349#discussion_r1268851681

## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/V2SessionCatalog.scala:

```diff
@@ -89,10 +89,12 @@ class V2SessionCatalog(catalog: SessionCatalog)
       throw QueryCompilationErrors.timeTravelUnsupportedError(
         toSQLId(catalogTable.identifier.nameParts))
     } else {
-      throw QueryCompilationErrors.tableNotSupportTimeTravelError(ident)
```

Review Comment: Its corresponding error class `_LEGACY_ERROR_TEMP_1337` has been merged into `UNSUPPORTED_FEATURE.TIME_TRAVEL`, so it has been removed here.
[GitHub] [spark] itholic commented on a diff in pull request #41711: [SPARK-44155] Adding a dev utility to improve error messages based on LLM
itholic commented on code in PR #41711: URL: https://github.com/apache/spark/pull/41711#discussion_r1268843077 ## dev/error_message_refiner.py: ## @@ -0,0 +1,265 @@ +#!/usr/bin/env python3 + +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +#http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +""" +Utility for refining error messages based on LLM. + +Usage: +python error_message_refiner.py [--model_name=] + +Arguments: + Required. +The name of the error class to refine the messages for. +The list of error classes is located in + `common/utils/src/main/resources/error/error-classes.json`. + +Options: +--model_name= Optional. +The version of Chat GPT to use for refining the error messages. +If not provided, the default version("gpt-3.5-turbo") will be used. + +Example usage: +python error_message_refiner.py CANNOT_DECODE_URL --model_name=gpt-4 + +Description: +This script refines error messages using the LLM based approach. +It takes the name of the error class as a required argument and, optionally, +allows specifying the version of Chat GPT to use for refining the messages. + +Options: +--model_name: Specifies the version of Chat GPT. + If not provided, the default version("gpt-3.5-turbo") will be used. 
+ +Note: +- Ensure that the necessary dependencies are installed before running the script. +- Ensure that the valid API key is entered in the `api-key.txt`. +- The refined error messages will be displayed in the console output. +- To use the gpt-4 model, you need to join the waitlist. Please refer to + https://help.openai.com/en/articles/7102672-how-can-i-access-gpt-4 for more details. +""" + +import argparse +import json +import openai +import re +import subprocess +import random +import os +from typing import Tuple, Optional + +from sparktestsupport import SPARK_HOME + +PATH_TO_ERROR_CLASS = f"{SPARK_HOME}/common/utils/src/main/resources/error/error-classes.json" +# Register your own open API key for the environment variable. +# The API key can be obtained from https://platform.openai.com/account/api-keys. +OPENAI_API_KEY = os.environ.get("OPENAI_API_KEY") + +openai.api_key = OPENAI_API_KEY + + +def _git_grep_files(search_string: str, exclude: str = None) -> str: +""" +Executes 'git grep' command to search for files containing the given search string. +Returns the file path where the search string is found. +""" +result = subprocess.run( +["git", "grep", "-l", search_string, "--", f"{SPARK_HOME}/*.scala"], +capture_output=True, +text=True, +) +output = result.stdout.strip() + +files = output.split("\n") +files = [file for file in files if "Suite" not in file] +if exclude is not None: +files = [file for file in files if exclude not in file] +file = random.choice(files) +return file + + +def _find_function(file_name: str, search_string: str) -> Optional[str]: +""" +Searches for a function in the given file containing the specified search string. +Returns the name of the function if found, otherwise None. 
+""" +with open(file_name, "r") as file: +content = file.read() +functions = re.findall(r"def\s+(\w+)\s*\(", content) + +for function in functions: +function_content = re.search( + rf"def\s+{re.escape(function)}(?:(?!def).)*?{re.escape(search_string)}", +content, +re.DOTALL, +) +if function_content and search_string in function_content.group(0): +return function + +return None + + +def _find_func_body(file_name: str, search_string: str) -> Optional[str]: +""" +Searches for a function body in the given file containing the specified search string. +Returns the function body if found, otherwise None. +""" +with open(file_name, "r") as file: +content = file.read() +functions = re.findall(r"def\s+(\w+)\s*\(", content) + +for function in functions: +function_content = re.search( +
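The `_find_function` helper quoted in the diff above works in two regex passes: collect every `def` name, then lazily match from each `def` up to (but never across) the next `def` to see whether that body mentions the search string. A minimal standalone sketch of the same approach (the sample source and names are illustrative, not Spark's actual error helpers):

```python
import re
from typing import Optional

SAMPLE_SCALA = """
def cannotDecodeUrlError(url: String): Throwable = {
  new SparkIllegalArgumentException("CANNOT_DECODE_URL", Map("url" -> url))
}
def otherError(): Throwable = {
  new SparkException("OTHER")
}
"""

def find_function(content: str, search_string: str) -> Optional[str]:
    """Return the first function whose body contains search_string, else None."""
    for function in re.findall(r"def\s+(\w+)\s*\(", content):
        # Lazily scan from this def toward search_string; the (?!def)
        # lookahead stops the scan from leaking into the next function.
        match = re.search(
            rf"def\s+{re.escape(function)}(?:(?!def).)*?{re.escape(search_string)}",
            content,
            re.DOTALL,
        )
        if match and search_string in match.group(0):
            return function
    return None
```

The lookahead trick is what scopes the match to one function body without needing a real Scala parser.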
[GitHub] [spark] panbingkun commented on a diff in pull request #41349: [SPARK-43839][SQL] Convert `_LEGACY_ERROR_TEMP_1337` to `UNSUPPORTED_FEATURE.TIME_TRAVEL`
panbingkun commented on code in PR #41349: URL: https://github.com/apache/spark/pull/41349#discussion_r1268843065 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/V2SessionCatalog.scala: ## @@ -89,10 +89,12 @@ class V2SessionCatalog(catalog: SessionCatalog) throw QueryCompilationErrors.timeTravelUnsupportedError( toSQLId(catalogTable.identifier.nameParts)) } else { - throw QueryCompilationErrors.tableNotSupportTimeTravelError(ident) Review Comment: Let me investigate it.
[GitHub] [spark] zhenlineo commented on a diff in pull request #42009: [SPARK-44422][CONNECT] Spark Connect fine grained interrupt
zhenlineo commented on code in PR #42009: URL: https://github.com/apache/spark/pull/42009#discussion_r1268824443 ## connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/SparkSessionE2ESuite.scala: ## @@ -96,5 +103,151 @@ class SparkSessionE2ESuite extends RemoteSparkSession { assert(e2.getMessage.contains("OPERATION_CANCELED"), s"Unexpected exception: $e2") finished = true assert(ThreadUtils.awaitResult(interruptor, 10.seconds)) +assert(interrupted.distinct.length == 2, s"Interrupted operations: ${interrupted.distinct}.") + } + + test("interrupt tag") { +val session = spark +import session.implicits._ + +// global ExecutionContext has only 2 threads in Apache Spark CI +// create own thread pool for four Futures used in this test +val numThreads = 4 +val fpool = ThreadUtils.newForkJoinPool("job-tags-test-thread-pool", numThreads) +val executionContext = ExecutionContext.fromExecutorService(fpool) + +val q1 = Future { + assert(spark.getTags() == Set()) + spark.addTag("two") + assert(spark.getTags() == Set("two")) + spark.clearTags() // check that clearing all tags works + assert(spark.getTags() == Set()) + spark.addTag("one") + assert(spark.getTags() == Set("one")) + try { +spark + .range(10) + .map(n => { +Thread.sleep(3); n + }) + .collect() + } finally { +spark.clearTags() // clear for the case of thread reuse by another Future + } +}(executionContext) +val q2 = Future { + assert(spark.getTags() == Set()) + spark.addTag("one") + spark.addTag("two") + spark.addTag("one") + spark.addTag("two") // duplicates shouldn't matter + try { +spark + .range(10) + .map(n => { +Thread.sleep(3); n + }) + .collect() + } finally { +spark.clearTags() // clear for the case of thread reuse by another Future + } +}(executionContext) +val q3 = Future { + assert(spark.getTags() == Set()) + spark.addTag("foo") + spark.removeTag("foo") + assert(spark.getTags() == Set()) // check that remove works removing the last tag + spark.addTag("two") + assert(spark.getTags() == 
Set("two")) + try { +spark + .range(10) + .map(n => { +Thread.sleep(3); n + }) + .collect() + } finally { +spark.clearTags() // clear for the case of thread reuse by another Future + } +}(executionContext) +val q4 = Future { + assert(spark.getTags() == Set()) + spark.addTag("one") + spark.addTag("two") + spark.addTag("two") + assert(spark.getTags() == Set("one", "two")) + spark.removeTag("two") // check that remove works, despite duplicate add + assert(spark.getTags() == Set("one")) + try { +spark + .range(10) + .map(n => { +Thread.sleep(3); n + }) + .collect() + } finally { +spark.clearTags() // clear for the case of thread reuse by another Future + } +}(executionContext) +val interrupted = mutable.ListBuffer[String]() + +// q2 and q3 should be cancelled +interrupted.clear() +eventually(timeout(20.seconds), interval(1.seconds)) { + val ids = spark.interruptTag("two") + interrupted ++= ids + assert( +interrupted.distinct.length == 2, +s"Interrupted operations: ${interrupted.distinct}.") +} +val e2 = intercept[SparkException] { + ThreadUtils.awaitResult(q2, 1.minute) +} +assert(e2.getCause.getMessage contains "OPERATION_CANCELED") +val e3 = intercept[SparkException] { + ThreadUtils.awaitResult(q3, 1.minute) +} +assert(e3.getCause.getMessage contains "OPERATION_CANCELED") +assert(interrupted.distinct.length == 2, s"Interrupted operations: ${interrupted.distinct}.") + +// q1 and q4 should be cancelled +interrupted.clear() +eventually(timeout(20.seconds), interval(1.seconds)) { + val ids = spark.interruptTag("one") + interrupted ++= ids + assert( +interrupted.distinct.length == 2, +s"Interrupted operations: ${interrupted.distinct}.") +} +val e1 = intercept[SparkException] { + ThreadUtils.awaitResult(q1, 1.minute) +} +assert(e1.getCause.getMessage contains "OPERATION_CANCELED") +val e4 = intercept[SparkException] { + ThreadUtils.awaitResult(q4, 1.minute) +} +assert(e4.getCause.getMessage contains "OPERATION_CANCELED") +assert(interrupted.distinct.length == 2, 
s"Interrupted operations: ${interrupted.distinct}.") + } + + test("interrupt operation") { +val session = spark +import session.implicits._ + +val result = spark + .range(10) + .map(n => { +Thread.sleep(5000); n + }) + .collectResult()
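The tag contract exercised by the test above — duplicate adds collapse, `removeTag` deletes the tag no matter how many times it was added, `clearTags` empties everything — is exactly plain set semantics. A toy Python model of that contract (method names mirror the Connect API, but this is only a sketch, not the client implementation):

```python
class TagScope:
    """Toy model of per-thread operation tags in the Spark Connect client."""

    def __init__(self):
        self._tags = set()

    def add_tag(self, tag: str) -> None:
        self._tags.add(tag)        # duplicate adds are no-ops

    def remove_tag(self, tag: str) -> None:
        self._tags.discard(tag)    # removes the tag despite duplicate adds

    def clear_tags(self) -> None:
        self._tags.clear()

    def get_tags(self) -> frozenset:
        return frozenset(self._tags)
```

Interrupting by tag then reduces to "cancel every operation whose tag set contains the given tag", which is why `interruptTag("two")` in the test cancels exactly q2 and q3.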
[GitHub] [spark] ericm-db opened a new pull request, #42080: Statestoresuite threadpool
ericm-db opened a new pull request, #42080: URL: https://github.com/apache/spark/pull/42080 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was this patch tested?
[GitHub] [spark] allisonwang-db commented on pull request #42075: [SPARK-43966][SQL][PYTHON] Support non-deterministic table-valued functions
allisonwang-db commented on PR #42075: URL: https://github.com/apache/spark/pull/42075#issuecomment-1642927436 cc @HyukjinKwon @cloud-fan
[GitHub] [spark] github-actions[bot] commented on pull request #40608: [SPARK-35198][CONNECT][CORE][PYTHON][SQL] Add support for calling debugCodegen from Python & Java
github-actions[bot] commented on PR #40608: URL: https://github.com/apache/spark/pull/40608#issuecomment-1642927305 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable. If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!
[GitHub] [spark] github-actions[bot] commented on pull request #40728: [WIP][SPARK-39634][SQL] Allow file splitting in combination with row index generation
github-actions[bot] commented on PR #40728: URL: https://github.com/apache/spark/pull/40728#issuecomment-1642927268 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable. If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!
[GitHub] [spark] xinrong-meng opened a new pull request, #42079: [WIP][SPARK-44486][PYTHON][CONNECT] Implement PyArrow `self_destruct` feature for `toPandas`
xinrong-meng opened a new pull request, #42079: URL: https://github.com/apache/spark/pull/42079 ### What changes were proposed in this pull request? Implement the Arrow `self_destruct` option for `toPandas` to reduce memory usage. The Spark configuration `spark.sql.execution.arrow.pyspark.selfDestruct.enabled` can now be used to enable PyArrow's `self_destruct` feature in Spark Connect, which saves memory when creating a Pandas DataFrame via `toPandas` by freeing Arrow-allocated memory while the Pandas DataFrame is being built. ### Why are the changes needed? Reach parity with vanilla PySpark. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? TBD
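Conceptually, `self_destruct` frees each Arrow batch as soon as it has been converted, instead of keeping the entire Arrow copy alive alongside the growing Pandas copy until the end. A toy Python illustration of that idea (no PyArrow involved; "freeing" is simulated by dropping references, and the peak-memory proxy just counts live elements):

```python
def convert_batches(batches, self_destruct=False):
    """Convert a list of 'batches' into one result list, optionally freeing
    each batch right after it is consumed (the self_destruct idea)."""
    result = []
    peak_live = 0
    live = list(batches)  # stands in for Arrow-allocated batches
    for batch in list(live):
        result.extend(batch)  # stands in for the Pandas-side copy
        # Peak is measured while both copies of this batch are still alive.
        peak_live = max(peak_live, len(result) + sum(len(b) for b in live))
        if self_destruct:
            live.remove(batch)  # release the Arrow-side copy eagerly
    return result, peak_live
```

The result is identical either way; only the peak number of simultaneously live elements shrinks, which is exactly the trade the real feature makes (at the cost of Arrow data not being reusable afterwards).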
[GitHub] [spark] HyukjinKwon closed pull request #42072: [SPARK-44481][CONNECT][PYTHON] Make pyspark.sql.is_remote an API
HyukjinKwon closed pull request #42072: [SPARK-44481][CONNECT][PYTHON] Make pyspark.sql.is_remote an API URL: https://github.com/apache/spark/pull/42072
[GitHub] [spark] HyukjinKwon commented on pull request #42072: [SPARK-44481][CONNECT][PYTHON] Make pyspark.sql.is_remote an API
HyukjinKwon commented on PR #42072: URL: https://github.com/apache/spark/pull/42072#issuecomment-1642913899 Merged to master.
[GitHub] [spark] HyukjinKwon closed pull request #41831: [SPARK-44278][CONNECT] Implement a GRPC server interceptor that cleans up thread local properties
HyukjinKwon closed pull request #41831: [SPARK-44278][CONNECT] Implement a GRPC server interceptor that cleans up thread local properties URL: https://github.com/apache/spark/pull/41831
[GitHub] [spark] HyukjinKwon commented on pull request #41831: [SPARK-44278][CONNECT] Implement a GRPC server interceptor that cleans up thread local properties
HyukjinKwon commented on PR #41831: URL: https://github.com/apache/spark/pull/41831#issuecomment-1642910764 Merged to master.
[GitHub] [spark] HyukjinKwon commented on a diff in pull request #42072: [SPARK-44481][CONNECT][PYTHON] Make pyspark.sql.is_remote an API
HyukjinKwon commented on code in PR #42072: URL: https://github.com/apache/spark/pull/42072#discussion_r1268791343 ## python/pyspark/sql/__init__.py: ## @@ -72,4 +73,5 @@ "DataFrameWriter", "DataFrameWriterV2", "PandasCogroupedOps", +"is_remote", Review Comment: I think we should just expose it under `pyspark.sql` because APIs are usually exposed under that namespace. I think you can use it as: ```python is_remote = getattr(pyspark.sql, "is_remote", getattr(pyspark.sql.utils, "is_remote", None))() ```
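The fallback pattern suggested in the snippet above — look the attribute up in its new home first, then fall back to the old one — can be exercised with stand-in namespaces (the "modules" here are dummies, not pyspark itself):

```python
from types import SimpleNamespace
from typing import Optional

# Dummy stand-ins for pyspark.sql (new home) and pyspark.sql.utils (old home).
new_home = SimpleNamespace(is_remote=lambda: True)
old_home = SimpleNamespace(is_remote=lambda: False)

def resolve_is_remote(primary, fallback) -> Optional[bool]:
    """Prefer `is_remote` on primary, fall back to fallback, else None."""
    fn = getattr(primary, "is_remote", getattr(fallback, "is_remote", None))
    return fn() if fn is not None else None
```

The nested `getattr` keeps callers working across versions where the function lived only in the old location.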
[GitHub] [spark] juliuszsompolski commented on a diff in pull request #42009: [SPARK-44422][CONNECT] Spark Connect fine grained interrupt
juliuszsompolski commented on code in PR #42009: URL: https://github.com/apache/spark/pull/42009#discussion_r1268751980 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/connect/client/SparkResult.scala: ## @@ -40,6 +40,7 @@ private[sql] class SparkResult[T]( extends AutoCloseable with Cleanable { self => + private[this] var opId: String = null Review Comment: Yeah, but I adapted to the surrounding style of ``` private[this] var opId: String = null private[this] var numRecords: Int = 0 private[this] var structType: StructType = _ private[this] var arrowSchema: pojo.Schema = _ private[this] var nextResultIndex: Int = 0 ``` From ``` def schema: StructType = { if (structType == null) { processResponses(stopOnSchema = true) } structType } ``` it looks like I can assume `_` is null, so I could make it `_` and maybe that's more idiomatic.
[GitHub] [spark] asl3 commented on a diff in pull request #41711: [SPARK-44155] Adding a dev utility to improve error messages based on LLM
asl3 commented on code in PR #41711: URL: https://github.com/apache/spark/pull/41711#discussion_r1268722533 ## dev/error_message_refiner.py: ## @@ -0,0 +1,265 @@ +#!/usr/bin/env python3 + +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +#http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +""" +Utility for refining error messages based on LLM. + +Usage: +python error_message_refiner.py [--model_name=] + +Arguments: + Required. +The name of the error class to refine the messages for. +The list of error classes is located in + `common/utils/src/main/resources/error/error-classes.json`. + +Options: +--model_name= Optional. +The version of Chat GPT to use for refining the error messages. +If not provided, the default version("gpt-3.5-turbo") will be used. + +Example usage: +python error_message_refiner.py CANNOT_DECODE_URL --model_name=gpt-4 + +Description: +This script refines error messages using the LLM based approach. +It takes the name of the error class as a required argument and, optionally, +allows specifying the version of Chat GPT to use for refining the messages. + +Options: +--model_name: Specifies the version of Chat GPT. + If not provided, the default version("gpt-3.5-turbo") will be used. 
+ +Note: +- Ensure that the necessary dependencies are installed before running the script. +- Ensure that the valid API key is entered in the `api-key.txt`. +- The refined error messages will be displayed in the console output. +- To use the gpt-4 model, you need to join the waitlist. Please refer to + https://help.openai.com/en/articles/7102672-how-can-i-access-gpt-4 for more details. +""" + +import argparse +import json +import openai +import re +import subprocess +import random +import os +from typing import Tuple, Optional + +from sparktestsupport import SPARK_HOME + +PATH_TO_ERROR_CLASS = f"{SPARK_HOME}/common/utils/src/main/resources/error/error-classes.json" +# Register your own open API key for the environment variable. +# The API key can be obtained from https://platform.openai.com/account/api-keys. +OPENAI_API_KEY = os.environ.get("OPENAI_API_KEY") + +openai.api_key = OPENAI_API_KEY + + +def _git_grep_files(search_string: str, exclude: str = None) -> str: +""" +Executes 'git grep' command to search for files containing the given search string. +Returns the file path where the search string is found. +""" +result = subprocess.run( +["git", "grep", "-l", search_string, "--", f"{SPARK_HOME}/*.scala"], +capture_output=True, +text=True, +) +output = result.stdout.strip() + +files = output.split("\n") +files = [file for file in files if "Suite" not in file] +if exclude is not None: +files = [file for file in files if exclude not in file] +file = random.choice(files) +return file + + +def _find_function(file_name: str, search_string: str) -> Optional[str]: +""" +Searches for a function in the given file containing the specified search string. +Returns the name of the function if found, otherwise None. 
+""" +with open(file_name, "r") as file: +content = file.read() +functions = re.findall(r"def\s+(\w+)\s*\(", content) + +for function in functions: +function_content = re.search( + rf"def\s+{re.escape(function)}(?:(?!def).)*?{re.escape(search_string)}", +content, +re.DOTALL, +) +if function_content and search_string in function_content.group(0): +return function + +return None + + +def _find_func_body(file_name: str, search_string: str) -> Optional[str]: +""" +Searches for a function body in the given file containing the specified search string. +Returns the function body if found, otherwise None. +""" +with open(file_name, "r") as file: +content = file.read() +functions = re.findall(r"def\s+(\w+)\s*\(", content) + +for function in functions: +function_content = re.search( +
[GitHub] [spark] asl3 commented on a diff in pull request #41711: [SPARK-44155] Adding a dev utility to improve error messages based on LLM
asl3 commented on code in PR #41711: URL: https://github.com/apache/spark/pull/41711#discussion_r1268719990

## dev/error_message_refiner.py:

@@ -0,0 +1,265 @@
+#!/usr/bin/env python3
+
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+"""
+Utility for refining error messages based on LLM.
+
+Usage:
+    python error_message_refiner.py [--model_name=]
+
+Arguments:
+     Required.
+        The name of the error class to refine the messages for.
+        The list of error classes is located in
+        `common/utils/src/main/resources/error/error-classes.json`.
+
+Options:
+    --model_name=  Optional.
+        The version of Chat GPT to use for refining the error messages.
+        If not provided, the default version ("gpt-3.5-turbo") will be used.
+
+Example usage:
+    python error_message_refiner.py CANNOT_DECODE_URL --model_name=gpt-4
+
+Description:
+    This script refines error messages using the LLM-based approach.
+    It takes the name of the error class as a required argument and, optionally,
+    allows specifying the version of Chat GPT to use for refining the messages.
+
+Options:
+    --model_name: Specifies the version of Chat GPT.
+        If not provided, the default version ("gpt-3.5-turbo") will be used.
+
+Note:
+- Ensure that the necessary dependencies are installed before running the script.
+- Ensure that a valid API key is entered in `api-key.txt`.
+- The refined error messages will be displayed in the console output.
+- To use the gpt-4 model, you need to join the waitlist. Please refer to
+  https://help.openai.com/en/articles/7102672-how-can-i-access-gpt-4 for more details.
+"""
+
+import argparse
+import json
+import openai
+import re
+import subprocess
+import random
+import os
+from typing import Tuple, Optional
+
+from sparktestsupport import SPARK_HOME
+
+PATH_TO_ERROR_CLASS = f"{SPARK_HOME}/common/utils/src/main/resources/error/error-classes.json"
+# Register your own OpenAI API key in the environment variable.
+# The API key can be obtained from https://platform.openai.com/account/api-keys.
+OPENAI_API_KEY = os.environ.get("OPENAI_API_KEY")
+
+openai.api_key = OPENAI_API_KEY
+
+
+def _git_grep_files(search_string: str, exclude: str = None) -> str:
+    """
+    Executes 'git grep' command to search for files containing the given search string.
+    Returns the file path where the search string is found.
+    """
+    result = subprocess.run(
+        ["git", "grep", "-l", search_string, "--", f"{SPARK_HOME}/*.scala"],
+        capture_output=True,
+        text=True,
+    )
+    output = result.stdout.strip()
+
+    files = output.split("\n")
+    files = [file for file in files if "Suite" not in file]
+    if exclude is not None:
+        files = [file for file in files if exclude not in file]
+    file = random.choice(files)
+    return file
+
+
+def _find_function(file_name: str, search_string: str) -> Optional[str]:
+    """
+    Searches for a function in the given file containing the specified search string.
+    Returns the name of the function if found, otherwise None.
+    """
+    with open(file_name, "r") as file:
+        content = file.read()
+    functions = re.findall(r"def\s+(\w+)\s*\(", content)
+
+    for function in functions:
+        function_content = re.search(
+            rf"def\s+{re.escape(function)}(?:(?!def).)*?{re.escape(search_string)}",
+            content,
+            re.DOTALL,
+        )
+        if function_content and search_string in function_content.group(0):
+            return function
+
+    return None
+
+
+def _find_func_body(file_name: str, search_string: str) -> Optional[str]:
+    """
+    Searches for a function body in the given file containing the specified search string.
+    Returns the function body if found, otherwise None.
+    """
+    with open(file_name, "r") as file:
+        content = file.read()
+    functions = re.findall(r"def\s+(\w+)\s*\(", content)
+
+    for function in functions:
+        function_content = re.search(
[GitHub] [spark] grundprinzip commented on a diff in pull request #42009: [SPARK-44422][CONNECT] Spark Connect fine grained interrupt
grundprinzip commented on code in PR #42009: URL: https://github.com/apache/spark/pull/42009#discussion_r1268715542

## connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/connect/client/SparkResult.scala:

@@ -40,6 +40,7 @@ private[sql] class SparkResult[T](
     extends AutoCloseable
     with Cleanable { self =>
+  private[this] var opId: String = null

Review Comment: is using `Option[String]` more idiomatic?

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] mathewjacob1002 opened a new pull request, #42078: [WIP][DO NOT REVIEW] Testing stuff
mathewjacob1002 opened a new pull request, #42078: URL: https://github.com/apache/spark/pull/42078

First commit
[GitHub] [spark] juliuszsompolski commented on pull request #42009: [SPARK-44422][CONNECT] Spark Connect fine grained interrupt
juliuszsompolski commented on PR #42009: URL: https://github.com/apache/spark/pull/42009#issuecomment-1642777980

cc @zhenlineo @HyukjinKwon
[GitHub] [spark] WweiL opened a new pull request, #42077: [SPARK-44484][SS]Add batchDuration to StreamingQueryProgress json method
WweiL opened a new pull request, #42077: URL: https://github.com/apache/spark/pull/42077

### What changes were proposed in this pull request?
Add the missing field batchDuration to the StreamingQueryProgress json method. Also modify tests accordingly.

### Why are the changes needed?
Add a missing field.

### Does this PR introduce _any_ user-facing change?
Probably yes - in calls to `query.lastProgress` or `query.recentProgress`, and inside listeners, this new field will show up.

### How was this patch tested?
Existing unit tests.
[GitHub] [spark] ueshin commented on a diff in pull request #41948: [SPARK-44380][SQL][PYTHON] Support for Python UDTF to analyze in Python
ueshin commented on code in PR #41948: URL: https://github.com/apache/spark/pull/41948#discussion_r1268593999

## python/pyspark/sql/udtf.py:

@@ -153,6 +175,19 @@ def _validate_udtf_handler(cls: Any) -> None:
             error_class="INVALID_UDTF_NO_EVAL", message_parameters={"name": cls.__name__}
         )
+    has_analyze_staticmethod = hasattr(cls, "analyze") and isinstance(
+        inspect.getattr_static(cls, "analyze"), staticmethod

Review Comment: So now at [4e2b6be](https://github.com/apache/spark/pull/41948/commits/4e2b6befbd015fcdad1ddf6ccecd79a630ac7d60):
- When `returnType` is not provided, the class must have an `analyze` static method; otherwise raise an error `INVALID_UDTF_RETURN_TYPE`.
- When `returnType` is provided and the class also has any `analyze` attribute, raise an error `INVALID_UDTF_BOTH_RETURN_TYPE_AND_ANALYZE`.
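The two validation rules described in that comment can be sketched as a simplified stand-in (this is an illustration of the rules, not the actual code in `pyspark.sql.udtf`; the function and error strings are assumptions):

```python
import inspect
from typing import Any, Optional


def validate_udtf_handler(cls: Any, return_type: Optional[str]) -> None:
    """Toy version of the validation: returnType and a static analyze()
    are mutually exclusive, and at least one of them must be present."""
    has_analyze = hasattr(cls, "analyze")
    has_analyze_staticmethod = has_analyze and isinstance(
        inspect.getattr_static(cls, "analyze"), staticmethod
    )
    if return_type is None and not has_analyze_staticmethod:
        raise TypeError("INVALID_UDTF_RETURN_TYPE")
    if return_type is not None and has_analyze:
        raise TypeError("INVALID_UDTF_BOTH_RETURN_TYPE_AND_ANALYZE")


class GoodUDTF:
    @staticmethod
    def analyze(*args):
        return "id int"

    def eval(self, *args):
        yield (1,)


# Passes: no returnType given, and analyze is a proper staticmethod.
validate_udtf_handler(GoodUDTF, None)
```

Note the use of `inspect.getattr_static`, which retrieves the class attribute without triggering the descriptor protocol, so a `staticmethod` can still be recognized as such.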
[GitHub] [spark] hvanhovell opened a new pull request, #42076: [SPARK-44449][CONNECT] Upcasting for direct Arrow Deserialization
hvanhovell opened a new pull request, #42076: URL: https://github.com/apache/spark/pull/42076

### What changes were proposed in this pull request?

### Why are the changes needed?

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?
[GitHub] [spark] dtenedor commented on a diff in pull request #41948: [SPARK-44380][SQL][PYTHON] Support for Python UDTF to analyze in Python
dtenedor commented on code in PR #41948: URL: https://github.com/apache/spark/pull/41948#discussion_r1268552267

## python/pyspark/sql/udtf.py:

@@ -153,6 +175,19 @@ def _validate_udtf_handler(cls: Any) -> None:
             error_class="INVALID_UDTF_NO_EVAL", message_parameters={"name": cls.__name__}
         )
+    has_analyze_staticmethod = hasattr(cls, "analyze") and isinstance(
+        inspect.getattr_static(cls, "analyze"), staticmethod

Review Comment: I would recommend returning an error in this case if it's easy to do so, in order to prevent confusion. This would be in addition to the existing error that @ueshin pointed out that we are already returning:

> It IS checking if the UDTF class has an analyze method and throw an error if it is not static, and the error message will be like:

```
The UDTF '' is invalid. It does not specify its return type or implement the required 'analyze' static method. Please specify the return type or implement the 'analyze' static method in '' and try again.
```
[GitHub] [spark] juliuszsompolski commented on a diff in pull request #42009: [SPARK-44422][CONNECT] Spark Connect fine grained interrupt
juliuszsompolski commented on code in PR #42009: URL: https://github.com/apache/spark/pull/42009#discussion_r1268542613

## connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/SparkSession.scala:

@@ -613,16 +613,30 @@ class SparkSession private[sql] (
   /**
    * Interrupt all operations of this session currently running on the connected server.
    *
-   * TODO/WIP: Currently it will interrupt the Spark Jobs running on the server, triggered from
-   * ExecutePlan requests. If an operation is not running a Spark Job, it becomes an noop and the
-   * operation will continue afterwards, possibly with more Spark Jobs.
-

Review Comment: This has actually been fixed by https://github.com/apache/spark/pull/41315 (now the execution runs in a different thread, and the interrupt interrupts that thread, not only Spark Jobs).
[GitHub] [spark] ueshin commented on pull request #41948: [SPARK-44380][SQL][PYTHON] Support for Python UDTF to analyze in Python
ueshin commented on PR #41948: URL: https://github.com/apache/spark/pull/41948#issuecomment-1642586143

The failing test is [Spark on Kubernetes Integration test](https://github.com/ueshin/apache-spark/actions/runs/5596078914/jobs/10247102939#logs) that seems to be broken in the `master` branch.
[GitHub] [spark] allisonwang-db opened a new pull request, #42075: [SPARK-43966][SQL][PYTHON] Support non-deterministic table-valued functions
allisonwang-db opened a new pull request, #42075: URL: https://github.com/apache/spark/pull/42075

### What changes were proposed in this pull request?
This PR supports non-deterministic table-valued functions. More specifically, it supports running non-deterministic Python UDTFs and built-in table-valued generator functions with non-deterministic input values.

### Why are the changes needed?
To make table-valued functions more versatile.

### Does this PR introduce _any_ user-facing change?
Yes. Before this PR, Spark will throw an exception when running a non-deterministic Python UDTF:
```
select * from random_udtf(1)
AnalysisException: [INVALID_NON_DETERMINISTIC_EXPRESSIONS] The operator expects a deterministic expression,
```
After this PR, it is supported.

### How was this patch tested?
Existing and new unit tests.
[GitHub] [spark] siying opened a new pull request, #42074: [SPARK-44464][SS] Fix applyInPandasWithStatePythonRunner to output rows that have Null as first column value
siying opened a new pull request, #42074: URL: https://github.com/apache/spark/pull/42074

### What changes were proposed in this pull request?
Change the serialization format for group-by-with-state outputs: include an explicit hidden column indicating how many data and state records there are.

### Why are the changes needed?
The current implementation of ApplyInPandasWithStatePythonRunner cannot deal with outputs where the first column of the row is null, as it cannot distinguish the case where the column is null from the case where the field is left empty because there are fewer data records than state records. It causes incorrect results for the former case.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Add unit tests that cover null cases and different other scenarios.

Closes #42046 from siying/pypanda.

Authored-by: Siying Dong
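The core idea of the fix - carry explicit record counts instead of inferring them from null columns - can be sketched with a toy encoding (this is an illustration only, not the actual Arrow-based format used by ApplyInPandasWithStatePythonRunner):

```python
from typing import List, Optional, Tuple

Row = Tuple[Optional[int], Optional[str]]


def encode(data: List[Row], state: List[Row]) -> dict:
    # Carry explicit counts so a None in the first column of a data row
    # cannot be confused with padding emitted when len(data) < len(state).
    return {"num_data": len(data), "num_state": len(state), "rows": data + state}


def decode(batch: dict) -> Tuple[List[Row], List[Row]]:
    n = batch["num_data"]
    return batch["rows"][:n], batch["rows"][n:]


data = [(None, "a")]              # first column is legitimately null
state = [(1, "s1"), (2, "s2")]    # more state records than data records
data_out, state_out = decode(encode(data, state))
print(data_out)  # [(None, 'a')] - the null survives unambiguously
```

Without the count fields, a decoder that detects the data/state boundary by looking for a null first column would misclassify the legitimate `(None, "a")` row, which is exactly the bug described above.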
[GitHub] [spark] siying commented on pull request #42046: [SPARK-44464][SS] Implement applyInPandasWithState in PySpark
siying commented on PR #42046: URL: https://github.com/apache/spark/pull/42046#issuecomment-1642557386

> @siying There was a conflict. Could you please create a PR against branch-3.4? Thanks in advance!
>
> (Btw, I didn't indicate that title is not accurate. Could you please fix the title when you submit a PR for 3.4?)

Oops. I didn't realize the title was wrong. Perhaps I copied the wrong title.
[GitHub] [spark] siying commented on pull request #42046: [SPARK-44464][SS] Implement applyInPandasWithState in PySpark
siying commented on PR #42046: URL: https://github.com/apache/spark/pull/42046#issuecomment-1642555659

Sure. Will do that.
[GitHub] [spark] ericm-db commented on pull request #42066: [SPARK-44480][SS] Use thread pool to perform maintenance activity for hdfs/rocksdb state store providers
ericm-db commented on PR #42066: URL: https://github.com/apache/spark/pull/42066#issuecomment-1642470993

> CI looks failing. Could you please look into it? https://github.com/ericm-db/spark/actions/runs/5595467996/jobs/10231376237

@HeartSaVioR looks like I was setting the runnable maintenance task incorrectly. It should be fine now.
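The pattern in the PR title - fanning per-provider maintenance work out to a thread pool instead of running it in one sequential loop - can be sketched as follows (provider names and the maintenance body are made up for illustration; this is not Spark's actual state store maintenance code):

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical state store providers whose maintenance (snapshotting,
# old-file cleanup, ...) should proceed in parallel.
providers = ["hdfs-0", "hdfs-1", "rocksdb-0", "rocksdb-1"]


def do_maintenance(provider: str) -> str:
    # Stand-in for StateStoreProvider.doMaintenance(); just reports completion.
    return f"maintained:{provider}"


# A bounded pool caps concurrent maintenance tasks while letting a slow
# provider stop blocking the others.
with ThreadPoolExecutor(max_workers=2) as pool:
    results = list(pool.map(do_maintenance, providers))

print(results)
```

`ThreadPoolExecutor.map` preserves submission order in its results, so the output lists the providers in the same order they were scheduled even though execution interleaves.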
[GitHub] [spark] amaliujia commented on a diff in pull request #41928: [SPARK-44475][SQL][CONNECT] Relocate DataType and Parser to sql/api
amaliujia commented on code in PR #41928: URL: https://github.com/apache/spark/pull/41928#discussion_r1268235385

## sql/api/src/main/scala/org/apache/spark/sql/catalyst/util/SparkAnalysisUtils.scala:

@@ -0,0 +1,68 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.catalyst.util
+
+import org.apache.spark.sql.errors.DataTypeErrors
+
+trait SparkAnalysisUtils {

Review Comment: done
[GitHub] [spark] amaliujia commented on a diff in pull request #41928: [SPARK-44475][SQL][CONNECT] Relocate DataType and Parser to sql/api
amaliujia commented on code in PR #41928: URL: https://github.com/apache/spark/pull/41928#discussion_r1268233819

## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/ParserUtils.scala:

@@ -16,34 +16,20 @@
  */
 package org.apache.spark.sql.catalyst.parser

-import java.lang.{Long => JLong}
-import java.nio.CharBuffer
 import java.util
 import java.util.Locale

 import org.antlr.v4.runtime.{ParserRuleContext, Token}
 import org.antlr.v4.runtime.misc.Interval
-import org.antlr.v4.runtime.tree.{ParseTree, TerminalNode, TerminalNodeImpl}
+import org.antlr.v4.runtime.tree.{ParseTree, TerminalNodeImpl}

-import org.apache.spark.sql.catalyst.trees.{CurrentOrigin, Origin}
+import org.apache.spark.sql.catalyst.util.SparkParserUtils
 import org.apache.spark.sql.errors.QueryParsingErrors

 /**
  * A collection of utility methods for use during the parsing process.
  */
-object ParserUtils {
-
-  val U16_CHAR_PATTERN = """\\u([a-fA-F0-9]{4})(?s).*""".r
-  val U32_CHAR_PATTERN = """\\U([a-fA-F0-9]{8})(?s).*""".r
-  val OCTAL_CHAR_PATTERN = """\\([01][0-7]{2})(?s).*""".r
-  val ESCAPED_CHAR_PATTERN = """\\((?s).)(?s).*""".r
-
-  /** Get the command which created the token. */
-  def command(ctx: ParserRuleContext): String = {
-    val stream = ctx.getStart.getInputStream
-    stream.getText(Interval.of(0, stream.size() - 1))
-  }
-
+object ParserUtils extends SparkParserUtils {

Review Comment: We could. I happened to move only the smallest part needed for this refactoring PR, and it seems to be enough for moving a small portion.
[GitHub] [spark] cloud-fan commented on a diff in pull request #41349: [SPARK-43839][SQL] Convert `_LEGACY_ERROR_TEMP_1337` to `UNSUPPORTED_FEATURE.TIME_TRAVEL`
cloud-fan commented on code in PR #41349: URL: https://github.com/apache/spark/pull/41349#discussion_r1268198286

## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/V2SessionCatalog.scala:

@@ -89,10 +89,12 @@ class V2SessionCatalog(catalog: SessionCatalog)
         throw QueryCompilationErrors.timeTravelUnsupportedError(

Review Comment: do we still need the `if-else`?
[GitHub] [spark] cloud-fan commented on a diff in pull request #41349: [SPARK-43839][SQL] Convert `_LEGACY_ERROR_TEMP_1337` to `UNSUPPORTED_FEATURE.TIME_TRAVEL`
cloud-fan commented on code in PR #41349: URL: https://github.com/apache/spark/pull/41349#discussion_r1268192638

## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/V2SessionCatalog.scala:

@@ -89,10 +89,12 @@ class V2SessionCatalog(catalog: SessionCatalog)
         throw QueryCompilationErrors.timeTravelUnsupportedError(
           toSQLId(catalogTable.identifier.nameParts))
       } else {
-        throw QueryCompilationErrors.tableNotSupportTimeTravelError(ident)

Review Comment: can we remove `tableNotSupportTimeTravelError` from `QueryCompilationErrors`? @panbingkun @beliefer
[GitHub] [spark] cloud-fan commented on a diff in pull request #41850: [SPARK-44292][SQL] Assign names to the error class _LEGACY_ERROR_TEMP_[2315-2319]
cloud-fan commented on code in PR #41850: URL: https://github.com/apache/spark/pull/41850#discussion_r1268138809

## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala:

@@ -277,13 +277,13 @@ trait CheckAnalysis extends PredicateHelper with LookupCatalog with QueryErrorsB
                 e.dataTypeMismatch(e, checkRes)
               case TypeCheckResult.TypeCheckFailure(message) =>
                 e.setTagValue(DATA_TYPE_MISMATCH_ERROR, true)
-                extraHintForAnsiTypeCoercionExpression(operator)
+                val extraHint = extraHintForAnsiTypeCoercionExpression(operator)
                 e.failAnalysis(
-                  errorClass = "_LEGACY_ERROR_TEMP_2315",
+                  errorClass = "TYPE_CHECK_FAILURE_WITH_HINT",

Review Comment: This should be `DATATYPE_MISMATCH.TYPE_CHECK_FAILURE_WITH_HINT`. Can we add a test? @beliefer
[GitHub] [spark] hvanhovell closed pull request #42011: [SPARK-44396][Connect] Direct Arrow Deserialization
hvanhovell closed pull request #42011: [SPARK-44396][Connect] Direct Arrow Deserialization URL: https://github.com/apache/spark/pull/42011
[GitHub] [spark] hvanhovell commented on pull request #42011: [SPARK-44396][Connect] Direct Arrow Deserialization
hvanhovell commented on PR #42011: URL: https://github.com/apache/spark/pull/42011#issuecomment-1642080475

Merging this. Test failure is unrelated.
[GitHub] [spark] jchen5 commented on pull request #42007: [SPARK-44431][SQL] Fix behavior of null IN (empty list) in optimization rules
jchen5 commented on PR #42007: URL: https://github.com/apache/spark/pull/42007#issuecomment-1642057004

@cloud-fan PR is updated
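For context on the semantics this PR title refers to: under SQL three-valued logic, `x IN (empty list)` is definitively false even when `x` is NULL, because there is nothing to match; the subtle case is only `NULL IN (non-empty list)`, which is unknown. A sketch of the intended evaluation (an illustration of standard SQL semantics, not Spark's actual `In` expression code):

```python
from typing import List, Optional


def sql_in(value: Optional[int], values: List[Optional[int]]) -> Optional[bool]:
    """Three-valued IN: returns True, False, or None (SQL NULL/unknown)."""
    if not values:
        return False          # empty list: nothing can match, even for NULL
    if value is None:
        return None           # NULL IN (non-empty ...) is unknown
    if value in [v for v in values if v is not None]:
        return True
    # No match found; a NULL in the list makes the result unknown.
    return None if any(v is None for v in values) else False


print(sql_in(None, []))       # False, not None
print(sql_in(None, [1, 2]))   # None (unknown)
print(sql_in(1, [1, None]))   # True
```

The bug class the PR addresses is optimization rules that fold the empty-list case to NULL instead of false, which diverges from this truth table.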
[GitHub] [spark] heyihong commented on pull request #41831: [SPARK-44278][CONNECT] Implement a GRPC server interceptor that cleans up thread local properties
heyihong commented on PR #41831: URL: https://github.com/apache/spark/pull/41831#issuecomment-1642046358

> Should this interceptor be included by default (as the outermost interceptor)?

I am not sure... But maybe we can do this in a separate pr if needed
[GitHub] [spark] harupy commented on a diff in pull request #42072: [SPARK-44481][CONNECT][PYTHON] Make pyspark.sql.is_remote an API
harupy commented on code in PR #42072: URL: https://github.com/apache/spark/pull/42072#discussion_r1267980107

## python/pyspark/sql/__init__.py:

@@ -72,4 +73,5 @@
     "DataFrameWriter",
     "DataFrameWriterV2",
     "PandasCogroupedOps",
+    "is_remote",

Review Comment: Can we expose this as `pyspark.sql.utils.is_remote`?
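The if-else dispatch that exposing `is_remote` enables can be sketched as follows. This is a simplified stand-in: the toy `is_remote` below only checks the `SPARK_REMOTE` environment variable, which is one of the signals the real pyspark helper considers, and `make_session_description` is a made-up consumer.

```python
import os


def is_remote() -> bool:
    """Simplified stand-in for the is_remote API: reports whether we are
    in Spark Connect mode based on the SPARK_REMOTE environment variable."""
    return "SPARK_REMOTE" in os.environ


def make_session_description() -> str:
    # The dispatch pattern end users can now write: route to the Connect
    # code path or the legacy code path.
    if is_remote():
        return "connect"
    return "legacy"


os.environ.pop("SPARK_REMOTE", None)
print(make_session_description())        # legacy

os.environ["SPARK_REMOTE"] = "sc://localhost"
print(make_session_description())        # connect
```

Libraries built on top of PySpark use exactly this kind of check to avoid touching JVM-only APIs when running against a Connect session.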
[GitHub] [spark] beliefer commented on a diff in pull request #41932: [SPARK-44131][SQL][PYTHON][CONNECT][FOLLOWUP] Support qualified function name for call_function
beliefer commented on code in PR #41932: URL: https://github.com/apache/spark/pull/41932#discussion_r1267967007

## connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/connect/client/util/RemoteSparkSession.scala:

@@ -139,7 +139,18 @@ object SparkConnectServerUtils {
       .map(clientTestJar => Seq("--jars", clientTestJar.getCanonicalPath))
       .getOrElse(Seq.empty)

-    writerV2Configs ++ hiveTestConfigs ++ udfTestConfigs

Review Comment: Thank you for the double check. @LuciferYang
[GitHub] [spark] LuciferYang commented on a diff in pull request #41932: [SPARK-44131][SQL][PYTHON][CONNECT][FOLLOWUP] Support qualified function name for call_function
LuciferYang commented on code in PR #41932: URL: https://github.com/apache/spark/pull/41932#discussion_r1267940632

## connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/connect/client/util/RemoteSparkSession.scala:

@@ -139,7 +139,18 @@ object SparkConnectServerUtils {
       .map(clientTestJar => Seq("--jars", clientTestJar.getCanonicalPath))
       .getOrElse(Seq.empty)

-    writerV2Configs ++ hiveTestConfigs ++ udfTestConfigs

Review Comment: Both Maven and sbt are OK now, thanks @beliefer
[GitHub] [spark] panbingkun commented on pull request #42073: [SPARK-44482][CONNECT] Connect server should can specify the bind address
panbingkun commented on PR #42073: URL: https://github.com/apache/spark/pull/42073#issuecomment-1641909732

The manual testing process is as follows:
1. My local env is as follows: 172.xxx.xxx.xxx
2. Set the following configuration in `spark-defaults.conf`:
   - spark.connect.grpc.binding.address 127.0.0.1
   - spark.connect.grpc.binding.port 5000
3. Start the connect server: `sh sbin/start-connect-server.sh`
4. Specify the host name and port to connect to the server:
   - Success scenario: https://github.com/apache/spark/assets/15246973/01e2b7ce-5e1c-4c3b-b321-e97df43b8a26
   - Failure scenario: https://github.com/apache/spark/assets/15246973/d7de08dc-ac46-4e45-98ca-2d54646fbdbb
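The bind-address behavior being tested above - a server bound to a configured address (cf. `spark.connect.grpc.binding.address 127.0.0.1`) accepting connections on that address - can be sketched with a plain TCP socket (a toy illustration, not the gRPC server itself; port 0 is used so the OS picks a free port):

```python
import socket

# Bind a listener to the configured address, as the connect server would.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))      # port 0: let the OS choose a free port
server.listen(1)
host, port = server.getsockname()

# A client connecting to the bound host:port succeeds; connecting via any
# other local address would be refused, which is the failure scenario above.
client = socket.create_connection((host, port), timeout=2)
conn, _ = server.accept()
print("connected to", host)

client.close()
conn.close()
server.close()
```

Binding to `127.0.0.1` rather than `0.0.0.0` restricts the server to loopback clients, which is the point of making the bind address configurable.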
[GitHub] [spark] panbingkun opened a new pull request, #42073: [SPARK-44482][CONNECT] Connect server should can specify the bind address
panbingkun opened a new pull request, #42073: URL: https://github.com/apache/spark/pull/42073

### What changes were proposed in this pull request?

### Why are the changes needed?

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?
- Manually test.
- Pass GA.
[GitHub] [spark] beliefer commented on a diff in pull request #41932: [SPARK-44131][SQL][PYTHON][CONNECT][FOLLOWUP] Support qualified function name for call_function
beliefer commented on code in PR #41932: URL: https://github.com/apache/spark/pull/41932#discussion_r1267892365

## connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/ClientE2ETestSuite.scala:

@@ -1161,6 +1161,27 @@ class ClientE2ETestSuite extends RemoteSparkSession with SQLHelper with PrivateM
     val joined = ds1.joinWith(ds2, $"a.value._1" === $"b.value._2", "inner")
     checkSameResult(Seq((Some((2, 3)), Some((1, 2, joined)
   }
+
+  test("call_function") {
+    val session: SparkSession = spark
+    import session.implicits._
+    val testData = spark.range(5).repartition(1)
+    try {
+      session.sql("CREATE FUNCTION custom_sum AS 'test.org.apache.spark.sql.MyDoubleSum'")

Review Comment: I have fixed the issue. Please wait for the CI.
[GitHub] [spark] HyukjinKwon opened a new pull request, #42072: [SPARK-44481][CONNECT][PYTHON] Make pyspark.sql.is_remote an API
HyukjinKwon opened a new pull request, #42072: URL: https://github.com/apache/spark/pull/42072 ### What changes were proposed in this pull request? ### Why are the changes needed? For the end users to be able to do if-else, e.g., for dispatching the code path to the legacy mode or connect mode. ### Does this PR introduce _any_ user-facing change? Yes, it exposes a method as an API. ### How was this patch tested? Manually built and checked the documentation.
[GitHub] [spark] yaooqinn closed pull request #42054: [SPARK-44470][BUILD] Setting version to 4.0.0-SNAPSHOT
yaooqinn closed pull request #42054: [SPARK-44470][BUILD] Setting version to 4.0.0-SNAPSHOT URL: https://github.com/apache/spark/pull/42054
[GitHub] [spark] yaooqinn commented on pull request #42054: [SPARK-44470][BUILD] Setting version to 4.0.0-SNAPSHOT
yaooqinn commented on PR #42054: URL: https://github.com/apache/spark/pull/42054#issuecomment-1641794593 Closed as a duplicate; fixed by SPARK-44467
[GitHub] [spark] Deependra-Patel opened a new pull request, #42071: [SPARK-44209] Expose amount of shuffle data available on the node
Deependra-Patel opened a new pull request, #42071: URL: https://github.com/apache/spark/pull/42071 This will be available as an external shuffle service metric. ### What changes were proposed in this pull request? Adds three more metrics to ShuffleMetrics (exposed by the External Shuffle Service): "totalShuffleDataBytes", "numAppsWithShuffleData" and "lastNodeShuffleMetricRefreshEpochMillis". We add a new daemon that scans the disk every 30s (configurable) and sums up the total size of the shuffle data. ### Why are the changes needed? Adding these metrics would help in: 1. Deciding if we can decommission the node when no shuffle data is present. 2. Better live monitoring of customers' workloads to see if there is skewed shuffle data on the node. ### Does this PR introduce _any_ user-facing change? This documentation will need to be updated in the next release: https://spark.apache.org/docs/latest/monitoring.html#component-instance--shuffleservice ### How was this patch tested? UTs are added. Also tested manually on a YARN cluster; metrics are getting published.
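The daemon described above can be sketched roughly as follows. This is an illustrative stand-in only, not the PR's actual implementation; the class and method names are invented, and the real code wires the total into the external shuffle service's `ShuffleMetrics`.

```java
import java.io.File;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.LongConsumer;

// Illustrative shuffle-size scanner: periodically walks a directory tree,
// sums file sizes, and reports the total to a metrics callback.
public class ShuffleSizeScanner {
    // Recursively sum the sizes of all regular files under f.
    public static long dirSize(File f) {
        if (f.isFile()) return f.length();
        File[] children = f.listFiles();
        long total = 0;
        if (children != null) {
            for (File c : children) total += dirSize(c);
        }
        return total;
    }

    // Schedule the scan at a fixed interval on a single daemon thread,
    // mirroring the PR's configurable 30s default.
    public static ScheduledExecutorService start(File root, long intervalSec, LongConsumer report) {
        ScheduledExecutorService exec = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "shuffle-size-scanner");
            t.setDaemon(true); // daemon so it never blocks JVM shutdown
            return t;
        });
        exec.scheduleWithFixedDelay(() -> report.accept(dirSize(root)), 0, intervalSec, TimeUnit.SECONDS);
        return exec;
    }
}
```

Using `scheduleWithFixedDelay` (rather than a fixed rate) avoids overlapping scans when a walk over a large shuffle directory takes longer than the interval.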
[GitHub] [spark] HyukjinKwon closed pull request #42070: [MINOR][INFRA] Update the labeler for CORE and CONNECT
HyukjinKwon closed pull request #42070: [MINOR][INFRA] Update the labeler for CORE and CONNECT URL: https://github.com/apache/spark/pull/42070
[GitHub] [spark] HyukjinKwon commented on pull request #42070: [MINOR][INFRA] Update the labeler for CORE and CONNECT
HyukjinKwon commented on PR #42070: URL: https://github.com/apache/spark/pull/42070#issuecomment-1641707945 Merged to master.
[GitHub] [spark] HyukjinKwon commented on pull request #42068: [SPARK-44361][SQL][FOLLOW-UP] Remove unused variables and fix import statements
HyukjinKwon commented on PR #42068: URL: https://github.com/apache/spark/pull/42068#issuecomment-1641706328 Merged to master and branch-3.5.
[GitHub] [spark] HyukjinKwon closed pull request #42068: [SPARK-44361][SQL][FOLLOW-UP] Remove unused variables and fix import statements
HyukjinKwon closed pull request #42068: [SPARK-44361][SQL][FOLLOW-UP] Remove unused variables and fix import statements URL: https://github.com/apache/spark/pull/42068
[GitHub] [spark] Hisoka-X commented on pull request #41949: [SPARK-44375][SQL] Use PartitionEvaluator API in DebugExec
Hisoka-X commented on PR #41949: URL: https://github.com/apache/spark/pull/41949#issuecomment-1641693941 Thanks @cloud-fan @viirya @HyukjinKwon
[GitHub] [spark] yaooqinn commented on pull request #41951: [SPARK-44367][SQL][UI] Show error message on UI for each failed query
yaooqinn commented on PR #41951: URL: https://github.com/apache/spark/pull/41951#issuecomment-1641676820 How about this? ![image](https://github.com/apache/spark/assets/8326978/7a466124-f28a-4a84-944c-6fbeb577145e) ![image](https://github.com/apache/spark/assets/8326978/ca1fca1b-9733-4bcc-a708-6b229d3c7509)
[GitHub] [spark] HyukjinKwon closed pull request #42067: [SPARK-44264][ML][PYTHON] Support Distributed Training of Functions Using Deepspeed
HyukjinKwon closed pull request #42067: [SPARK-44264][ML][PYTHON] Support Distributed Training of Functions Using Deepspeed URL: https://github.com/apache/spark/pull/42067
[GitHub] [spark] HyukjinKwon commented on pull request #42067: [SPARK-44264][ML][PYTHON] Support Distributed Training of Functions Using Deepspeed
HyukjinKwon commented on PR #42067: URL: https://github.com/apache/spark/pull/42067#issuecomment-1641654294 Merged to master and branch-3.5.
[GitHub] [spark] vicennial commented on a diff in pull request #42069: [WIP] Fix class loading problem caused by stub user classes not found on the server classpath
vicennial commented on code in PR #42069: URL: https://github.com/apache/spark/pull/42069#discussion_r1267729075 ## core/src/main/scala/org/apache/spark/executor/Executor.scala: ## @@ -860,6 +860,14 @@ private[spark] class Executor( } } + private def getIsolatedSession( + taskDescription: TaskDescription) = { +taskDescription.artifacts.state match { + case Some(jobArtifactState) => +isolatedSessionCache.get(jobArtifactState.uuid, () => newSessionState(jobArtifactState)) + case _ => defaultSessionState +} + } Review Comment: Nit: This refactoring is unnecessary here. This also modifies [OSS-synced code](https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/executor/Executor.scala#L516-L521) so we should leave it as it is
[GitHub] [spark] vicennial commented on a diff in pull request #42069: [WIP] Fix class loading problem caused by stub user classes not found on the server classpath
vicennial commented on code in PR #42069: URL: https://github.com/apache/spark/pull/42069#discussion_r1267726085 ## core/src/main/java/org/apache/spark/util/ChildFirstURLClassLoader.java: ## @@ -40,6 +40,15 @@ public ChildFirstURLClassLoader(URL[] urls, ClassLoader parent) { this.parent = new ParentClassLoader(parent); } + /** + * Specify the realParent if there is a need to load in the order of + * `realParent -> urls (child) -> parent`. Review Comment: What does `realParent` mean here? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
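For context on the ordering being asked about: a `realParent -> urls (child) -> parent` lookup can be sketched roughly as below. This is an illustrative sketch only, not Spark's actual `ChildFirstURLClassLoader` (for brevity it omits the `findLoadedClass` check a production loader would do first).

```java
import java.net.URL;
import java.net.URLClassLoader;

// Illustrative three-level loader: consult an optional "realParent" first,
// then this loader's own URLs (child-first), then the regular parent.
public class ThreeLevelClassLoader extends URLClassLoader {
    private final ClassLoader realParent;     // tried first, may be null
    private final ClassLoader fallbackParent; // tried last

    public ThreeLevelClassLoader(URL[] urls, ClassLoader parent, ClassLoader realParent) {
        super(urls, null); // null disables the built-in parent-first delegation
        this.realParent = realParent;
        this.fallbackParent = parent;
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
        Class<?> cls = null;
        if (realParent != null) {
            try { cls = realParent.loadClass(name); } catch (ClassNotFoundException ignored) { }
        }
        if (cls == null) {
            // findClass searches only this loader's URLs (the "child" level).
            try { cls = findClass(name); } catch (ClassNotFoundException ignored) { }
        }
        if (cls == null) {
            cls = fallbackParent.loadClass(name); // may throw ClassNotFoundException
        }
        if (resolve) resolveClass(cls);
        return cls;
    }
}
```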
[GitHub] [spark] vicennial commented on a diff in pull request #42069: [WIP] Fix class loading problem caused by stub user classes not found on the server classpath
vicennial commented on code in PR #42069: URL: https://github.com/apache/spark/pull/42069#discussion_r1267724082 ## core/src/main/scala/org/apache/spark/executor/Executor.scala: ## @@ -543,11 +538,16 @@ private[spark] class Executor( // requires access to properties contained within (e.g. for access control). Executor.taskDeserializationProps.set(taskDescription.properties) -updateDependencies( +val updated = updateDependencies( taskDescription.artifacts.files, taskDescription.artifacts.jars, taskDescription.artifacts.archives, isolatedSession) +if (updated) { + // reset the thread class loader + val newIsolatedSession = getIsolatedSession(taskDescription) + Thread.currentThread.setContextClassLoader(newIsolatedSession.replClassLoader) +} Review Comment: `updateDependencies` is called in several places, could we integrate this logic into the method itself to prevent cases where we end up using the outdated instance? Maybe we can leave the `Thread.currentThread.setContextClassLoader(newIsolatedSession.replClassLoader)` part here
[GitHub] [spark] vicennial commented on a diff in pull request #42069: [WIP] Fix class loading problem caused by stub user classes not found on the server classpath
vicennial commented on code in PR #42069: URL: https://github.com/apache/spark/pull/42069#discussion_r1267720733 ## core/src/main/scala/org/apache/spark/executor/Executor.scala: ## @@ -1102,12 +1117,19 @@ private[spark] class Executor( val url = new File(root, localName).toURI.toURL if (!state.urlClassLoader.getURLs().contains(url)) { logInfo(s"Adding $url to class loader") -state.urlClassLoader.addURL(url) +// TODO: make use of the session cache for the class loader. +// Currently we invalidate all when adding a new url to always clear the stubbed +// classes in the current repl class loader. +// This is not always needed if the newly added jar does not contain any stubbed +// classes. +isolatedSessionCache.invalidateAll() Review Comment: Shouldn't we only invalidate the `isolatedSession` that is being updated? Each session has its own individual `URLClassLoader`
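The distinction the reviewer is drawing, invalidating one session's entry versus `invalidateAll()`, can be sketched with a plain map-backed cache. This is illustrative only; the real code uses the executor's session cache keyed by the job artifact state UUID, each entry owning its own `URLClassLoader`.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Illustrative session cache: invalidating a single key rebuilds only that
// session's state on the next get(), while invalidateAll() forces every
// session's state (and hence its class loader) to be rebuilt.
public class SessionCache<K, V> {
    private final Map<K, V> cache = new ConcurrentHashMap<>();
    private final Function<K, V> loader;

    public SessionCache(Function<K, V> loader) { this.loader = loader; }

    public V get(K key) { return cache.computeIfAbsent(key, loader); }

    public void invalidate(K key) { cache.remove(key); }   // narrow: one session

    public void invalidateAll() { cache.clear(); }         // broad: all sessions
}
```

With per-key invalidation, sessions whose jars did not change keep their warmed-up class loaders, which is exactly the cost the `invalidateAll()` in the diff pays unnecessarily.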
[GitHub] [spark] hvanhovell commented on a diff in pull request #41928: [SPARK-44475][SQL][CONNECT] Relocate DataType and Parser to sql/api
hvanhovell commented on code in PR #41928: URL: https://github.com/apache/spark/pull/41928#discussion_r1267677534 ## sql/api/src/main/scala/org/apache/spark/sql/types/Decimal.scala: ## @@ -143,7 +142,7 @@ final class Decimal extends Ordered[Decimal] with Serializable { // the scale is 2. The expected precision should be 2. this._precision = decimal.scale this._scale = decimal.scale -} else if (decimal.scale < 0 && !SQLConf.get.allowNegativeScaleOfDecimalEnabled) { +} else if (decimal.scale < 0 && !SqlApiConf.get.allowNegativeScaleOfDecimalEnabled) { Review Comment: As soon as SQLConf is loaded we connect them: https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala#L185
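For reference on what the `decimal.scale < 0` branch in the diff is guarding against: these are plain `java.math.BigDecimal` semantics, not Spark's `Decimal` itself, and the class name below is invented for the example.

```java
import java.math.BigDecimal;
import java.math.BigInteger;

// A negative scale means the unscaled value is multiplied by 10^(-scale):
// unscaled value 1 with scale -3 represents the integer 1000, so the value
// needs zero digits after the decimal point but a precision check still
// has to account for the trailing zeros.
public class NegativeScale {
    public static BigDecimal oneTimesTenCubed() {
        return new BigDecimal(BigInteger.ONE, -3);
    }
}
```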
[GitHub] [spark] HeartSaVioR commented on pull request #42066: [SPARK-44480][SS] Use thread pool to perform maintenance activity for hdfs/rocksdb state store providers
HeartSaVioR commented on PR #42066: URL: https://github.com/apache/spark/pull/42066#issuecomment-1641586354 CI looks failing. Could you please look into it? https://github.com/ericm-db/spark/actions/runs/5595467996/jobs/10231376237
[GitHub] [spark] juliuszsompolski commented on a diff in pull request #41964: [SPARK-44394][CONNECT][WEBUI] Add a Spark UI page for Spark Connect
juliuszsompolski commented on code in PR #41964: URL: https://github.com/apache/spark/pull/41964#discussion_r1267671107 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectServer.scala: ## @@ -49,6 +49,7 @@ object SparkConnectServer extends Logging { SparkConnectService.server.awaitTermination() } finally { session.stop() + SparkConnectService.uiTab.foreach(_.detach()) Review Comment: yeah... it seems that in this path the server is killed by a signal (via `sbin/stop-connect-server.sh`) and then `SparkConnectService.stop()` is never called. It would be nice to call `SparkConnectService.stop()` here, but then it would try to shut down the server again... In any case, it's unrelated to this PR, so let's leave it like it is now.
[GitHub] [spark] mridulm commented on pull request #41821: [SPARK-44272][YARN] Path Inconsistency when Operating statCache within Yarn Client
mridulm commented on PR #41821: URL: https://github.com/apache/spark/pull/41821#issuecomment-1641556251 Merged to master and branch-3.5. Thanks for working on this @shuwang21 ! FYI @xuanyuanking, who is RM for 3.5
[GitHub] [spark] mridulm closed pull request #41821: [SPARK-44272][YARN] Path Inconsistency when Operating statCache within Yarn Client
mridulm closed pull request #41821: [SPARK-44272][YARN] Path Inconsistency when Operating statCache within Yarn Client URL: https://github.com/apache/spark/pull/41821