[jira] [Assigned] (SPARK-46557) Refine docstring for DataFrame.schema/explain/printSchema
[ https://issues.apache.org/jira/browse/SPARK-46557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon reassigned SPARK-46557:

Assignee: Hyukjin Kwon

> Refine docstring for DataFrame.schema/explain/printSchema
>
> Key: SPARK-46557
> URL: https://issues.apache.org/jira/browse/SPARK-46557
> Project: Spark
> Issue Type: Sub-task
> Components: Documentation, PySpark
> Affects Versions: 4.0.0
> Reporter: Hyukjin Kwon
> Assignee: Hyukjin Kwon
> Priority: Major
> Labels: pull-request-available

--
This message was sent by Atlassian Jira (v8.20.10#820010)

To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-46557) Refine docstring for DataFrame.schema/explain/printSchema
[ https://issues.apache.org/jira/browse/SPARK-46557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-46557.

Fix Version/s: 4.0.0
Resolution: Fixed

Issue resolved by pull request 44553
[https://github.com/apache/spark/pull/44553]

> Refine docstring for DataFrame.schema/explain/printSchema
>
> Key: SPARK-46557
> URL: https://issues.apache.org/jira/browse/SPARK-46557
> Project: Spark
> Issue Type: Sub-task
> Components: Documentation, PySpark
> Affects Versions: 4.0.0
> Reporter: Hyukjin Kwon
> Assignee: Hyukjin Kwon
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
[jira] [Resolved] (SPARK-46561) Use `exists` instead of `filter + nonEmpty` to get `showResourceColumn` in `MasterPage.scala`
[ https://issues.apache.org/jira/browse/SPARK-46561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yang Jie resolved SPARK-46561.

Resolution: Won't Fix

> Use `exists` instead of `filter + nonEmpty` to get `showResourceColumn` in
> `MasterPage.scala`
>
> Key: SPARK-46561
> URL: https://issues.apache.org/jira/browse/SPARK-46561
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core, Web UI
> Affects Versions: 4.0.0
> Reporter: Yang Jie
> Priority: Minor
> Labels: pull-request-available
>
> {code:java}
> def render(request: HttpServletRequest): Seq[Node] = {
>   val state = getMasterState
>   val showResourceColumn =
>     state.workers.filter(_.resourcesInfoUsed.nonEmpty).nonEmpty{code}
> We can use `exists` instead of
> `workers.filter(_.resourcesInfoUsed.nonEmpty).nonEmpty`.
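The change proposed in SPARK-46561 can be sketched as follows. This is a minimal illustration, not Spark's actual code: `WorkerInfo` here is a simplified stand-in for Spark's real worker type, modeling only the `resourcesInfoUsed` field the ticket mentions.

```scala
// Simplified stand-in for Spark's WorkerInfo; only the field relevant
// to the ticket is modeled here.
case class WorkerInfo(resourcesInfoUsed: Map[String, Long])

object MasterPageSketch {
  // Before: materializes an intermediate filtered collection just to
  // test whether it is empty.
  def showResourceColumnOld(workers: Seq[WorkerInfo]): Boolean =
    workers.filter(_.resourcesInfoUsed.nonEmpty).nonEmpty

  // After: `exists` short-circuits on the first match and allocates
  // no intermediate collection, with identical results.
  def showResourceColumnNew(workers: Seq[WorkerInfo]): Boolean =
    workers.exists(_.resourcesInfoUsed.nonEmpty)
}
```

Both forms return the same Boolean; the `exists` version simply avoids the throwaway allocation, which is why linters commonly flag `filter(...).nonEmpty`.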
[jira] [Updated] (SPARK-46562) Remove retrieval of `keytabFile` from `UserGroupInformation` in `HiveAuthFactory`
[ https://issues.apache.org/jira/browse/SPARK-46562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated SPARK-46562:

Labels: pull-request-available (was: )

> Remove retrieval of `keytabFile` from `UserGroupInformation` in
> `HiveAuthFactory`
>
> Key: SPARK-46562
> URL: https://issues.apache.org/jira/browse/SPARK-46562
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 4.0.0
> Reporter: Yang Jie
> Priority: Major
> Labels: pull-request-available
[jira] [Updated] (SPARK-46562) Remove retrieval of `keytabFile` from `UserGroupInformation` in `HiveAuthFactory`
[ https://issues.apache.org/jira/browse/SPARK-46562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yang Jie updated SPARK-46562:

Summary: Remove retrieval of `keytabFile` from `UserGroupInformation` in `HiveAuthFactory` (was: Remove the process of obtaining `keytabFile` from `UserGroupInformation` in `HiveAuthFactory`)

> Remove retrieval of `keytabFile` from `UserGroupInformation` in
> `HiveAuthFactory`
>
> Key: SPARK-46562
> URL: https://issues.apache.org/jira/browse/SPARK-46562
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 4.0.0
> Reporter: Yang Jie
> Priority: Major
[jira] [Resolved] (SPARK-46540) Respect column names when Python data source read function outputs named Row objects
[ https://issues.apache.org/jira/browse/SPARK-46540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-46540.

Fix Version/s: 4.0.0
Resolution: Fixed

Issue resolved by pull request 44531
[https://github.com/apache/spark/pull/44531]

> Respect column names when Python data source read function outputs named Row
> objects
>
> Key: SPARK-46540
> URL: https://issues.apache.org/jira/browse/SPARK-46540
> Project: Spark
> Issue Type: Sub-task
> Components: PySpark
> Affects Versions: 4.0.0
> Reporter: Allison Wang
> Assignee: Allison Wang
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
[jira] [Assigned] (SPARK-46540) Respect column names when Python data source read function outputs named Row objects
[ https://issues.apache.org/jira/browse/SPARK-46540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon reassigned SPARK-46540:

Assignee: Allison Wang

> Respect column names when Python data source read function outputs named Row
> objects
>
> Key: SPARK-46540
> URL: https://issues.apache.org/jira/browse/SPARK-46540
> Project: Spark
> Issue Type: Sub-task
> Components: PySpark
> Affects Versions: 4.0.0
> Reporter: Allison Wang
> Assignee: Allison Wang
> Priority: Major
> Labels: pull-request-available
[jira] [Created] (SPARK-46562) Remove the process of obtaining `keytabFile` from `UserGroupInformation` in `HiveAuthFactory`
Yang Jie created SPARK-46562:

Summary: Remove the process of obtaining `keytabFile` from `UserGroupInformation` in `HiveAuthFactory`
Key: SPARK-46562
URL: https://issues.apache.org/jira/browse/SPARK-46562
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 4.0.0
Reporter: Yang Jie
[jira] [Resolved] (SPARK-46556) Refine docstring for DataFrame.createGlobalTempView/createOrReplaceGlobalTempView
[ https://issues.apache.org/jira/browse/SPARK-46556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-46556.

Fix Version/s: 4.0.0
Resolution: Fixed

Issue resolved by pull request 44552
[https://github.com/apache/spark/pull/44552]

> Refine docstring for
> DataFrame.createGlobalTempView/createOrReplaceGlobalTempView
>
> Key: SPARK-46556
> URL: https://issues.apache.org/jira/browse/SPARK-46556
> Project: Spark
> Issue Type: Sub-task
> Components: Documentation, PySpark
> Affects Versions: 4.0.0
> Reporter: Hyukjin Kwon
> Assignee: Hyukjin Kwon
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
[jira] [Assigned] (SPARK-46556) Refine docstring for DataFrame.createGlobalTempView/createOrReplaceGlobalTempView
[ https://issues.apache.org/jira/browse/SPARK-46556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon reassigned SPARK-46556:

Assignee: Hyukjin Kwon

> Refine docstring for
> DataFrame.createGlobalTempView/createOrReplaceGlobalTempView
>
> Key: SPARK-46556
> URL: https://issues.apache.org/jira/browse/SPARK-46556
> Project: Spark
> Issue Type: Sub-task
> Components: Documentation, PySpark
> Affects Versions: 4.0.0
> Reporter: Hyukjin Kwon
> Assignee: Hyukjin Kwon
> Priority: Major
> Labels: pull-request-available
[jira] [Assigned] (SPARK-46555) Refine docstring for DataFrame.createTempView/createOrReplaceTempView
[ https://issues.apache.org/jira/browse/SPARK-46555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon reassigned SPARK-46555:

Assignee: Hyukjin Kwon

> Refine docstring for DataFrame.createTempView/createOrReplaceTempView
>
> Key: SPARK-46555
> URL: https://issues.apache.org/jira/browse/SPARK-46555
> Project: Spark
> Issue Type: Sub-task
> Components: Documentation, PySpark
> Affects Versions: 4.0.0
> Reporter: Hyukjin Kwon
> Assignee: Hyukjin Kwon
> Priority: Major
> Labels: pull-request-available
[jira] [Resolved] (SPARK-46555) Refine docstring for DataFrame.createTempView/createOrReplaceTempView
[ https://issues.apache.org/jira/browse/SPARK-46555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-46555.

Fix Version/s: 4.0.0
Resolution: Fixed

Issue resolved by pull request 44551
[https://github.com/apache/spark/pull/44551]

> Refine docstring for DataFrame.createTempView/createOrReplaceTempView
>
> Key: SPARK-46555
> URL: https://issues.apache.org/jira/browse/SPARK-46555
> Project: Spark
> Issue Type: Sub-task
> Components: Documentation, PySpark
> Affects Versions: 4.0.0
> Reporter: Hyukjin Kwon
> Assignee: Hyukjin Kwon
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
[jira] [Updated] (SPARK-46561) Use `exists` instead of `filter + nonEmpty` to get `showResourceColumn` in `MasterPage.scala`
[ https://issues.apache.org/jira/browse/SPARK-46561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated SPARK-46561:

Labels: pull-request-available (was: )

> Use `exists` instead of `filter + nonEmpty` to get `showResourceColumn` in
> `MasterPage.scala`
>
> Key: SPARK-46561
> URL: https://issues.apache.org/jira/browse/SPARK-46561
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core, Web UI
> Affects Versions: 4.0.0
> Reporter: Yang Jie
> Priority: Minor
> Labels: pull-request-available
>
> {code:java}
> def render(request: HttpServletRequest): Seq[Node] = {
>   val state = getMasterState
>   val showResourceColumn =
>     state.workers.filter(_.resourcesInfoUsed.nonEmpty).nonEmpty{code}
> We can use `exists` instead of
> `workers.filter(_.resourcesInfoUsed.nonEmpty).nonEmpty`.
[jira] [Created] (SPARK-46561) Use `exists` instead of `filter + nonEmpty` to get `showResourceColumn` in `MasterPage.scala`
Yang Jie created SPARK-46561:

Summary: Use `exists` instead of `filter + nonEmpty` to get `showResourceColumn` in `MasterPage.scala`
Key: SPARK-46561
URL: https://issues.apache.org/jira/browse/SPARK-46561
Project: Spark
Issue Type: Improvement
Components: Spark Core, Web UI
Affects Versions: 4.0.0
Reporter: Yang Jie

{code:java}
def render(request: HttpServletRequest): Seq[Node] = {
  val state = getMasterState
  val showResourceColumn =
    state.workers.filter(_.resourcesInfoUsed.nonEmpty).nonEmpty{code}

We can use `exists` instead of `workers.filter(_.resourcesInfoUsed.nonEmpty).nonEmpty`.
[jira] [Updated] (SPARK-46560) Refine docstring `reverse`
[ https://issues.apache.org/jira/browse/SPARK-46560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

BingKun Pan updated SPARK-46560:

Summary: Refine docstring `reverse` (was: Refine docstring `reverse/flatten`)

> Refine docstring `reverse`
>
> Key: SPARK-46560
> URL: https://issues.apache.org/jira/browse/SPARK-46560
> Project: Spark
> Issue Type: Sub-task
> Components: Documentation
> Affects Versions: 4.0.0
> Reporter: BingKun Pan
> Priority: Minor
[jira] [Created] (SPARK-46560) Refine docstring `reverse/flatten`
BingKun Pan created SPARK-46560:

Summary: Refine docstring `reverse/flatten`
Key: SPARK-46560
URL: https://issues.apache.org/jira/browse/SPARK-46560
Project: Spark
Issue Type: Sub-task
Components: Documentation
Affects Versions: 4.0.0
Reporter: BingKun Pan
[jira] [Updated] (SPARK-46559) Wrap the `export` in the package name with backticks
[ https://issues.apache.org/jira/browse/SPARK-46559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated SPARK-46559:

Labels: pull-request-available (was: )

> Wrap the `export` in the package name with backticks
>
> Key: SPARK-46559
> URL: https://issues.apache.org/jira/browse/SPARK-46559
> Project: Spark
> Issue Type: Improvement
> Components: MLlib
> Affects Versions: 4.0.0
> Reporter: Yang Jie
> Priority: Minor
> Labels: pull-request-available
>
> `export` will be a keyword in Scala 3; using it directly in the package name
> will cause a compilation error.
[jira] [Created] (SPARK-46559) Wrap the `export` in the package name with backticks
Yang Jie created SPARK-46559:

Summary: Wrap the `export` in the package name with backticks
Key: SPARK-46559
URL: https://issues.apache.org/jira/browse/SPARK-46559
Project: Spark
Issue Type: Improvement
Components: MLlib
Affects Versions: 4.0.0
Reporter: Yang Jie

`export` will be a keyword in Scala 3; using it directly in the package name will cause a compilation error.
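The fix proposed in SPARK-46559 relies on Scala's backtick escaping, which turns a reserved word back into a plain identifier. A minimal sketch (the object and value names below are illustrative, not Spark code):

```scala
// `export` is an ordinary identifier in Scala 2 but becomes a keyword
// in Scala 3. Wrapping it in backticks keeps it usable as an identifier,
// which is exactly what the ticket does for a package segment such as
// `package a.b.`export``.
object BacktickDemo {
  // Without the backticks, this declaration would be a syntax error
  // under Scala 3.
  val `export`: String = "escaped identifier"

  def main(args: Array[String]): Unit =
    println(`export`)
}
```

The same escaping works anywhere an identifier is expected, including `import` clauses and package declarations, so existing packages named after new keywords can compile under both Scala 2.13 and Scala 3.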
[jira] [Updated] (SPARK-46558) Extract a helper method to eliminate the duplicate code in `GrpcExceptionConverter` that retrieves `MessageParameters` from `ErrorParams`
[ https://issues.apache.org/jira/browse/SPARK-46558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated SPARK-46558:

Labels: pull-request-available (was: )

> Extract a helper method to eliminate the duplicate code in
> `GrpcExceptionConverter` that retrieves `MessageParameters` from `ErrorParams`
>
> Key: SPARK-46558
> URL: https://issues.apache.org/jira/browse/SPARK-46558
> Project: Spark
> Issue Type: Improvement
> Components: Connect
> Affects Versions: 4.0.0
> Reporter: Yang Jie
> Priority: Major
> Labels: pull-request-available
>
> {code:java}
> params.errorClass match {
>   case Some(_) => params.messageParameters
>   case None => Map("message" -> params.message)
> } {code}
> The above code pattern appears 7 times in `GrpcExceptionConverter`.
[jira] [Created] (SPARK-46558) Extract a helper method to eliminate the duplicate code in `GrpcExceptionConverter` that retrieves `MessageParameters` from `ErrorParams`
Yang Jie created SPARK-46558:

Summary: Extract a helper method to eliminate the duplicate code in `GrpcExceptionConverter` that retrieves `MessageParameters` from `ErrorParams`
Key: SPARK-46558
URL: https://issues.apache.org/jira/browse/SPARK-46558
Project: Spark
Issue Type: Improvement
Components: Connect
Affects Versions: 4.0.0
Reporter: Yang Jie

{code:java}
params.errorClass match {
  case Some(_) => params.messageParameters
  case None => Map("message" -> params.message)
} {code}

The above code pattern appears 7 times in `GrpcExceptionConverter`.
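The extraction proposed in SPARK-46558 can be sketched as follows. Note the hedge: `ErrorParams` below is a minimal stand-in modeled from the fields the snippet uses (`message`, `errorClass`, `messageParameters`), not Spark Connect's actual type, and the helper name is hypothetical.

```scala
// Minimal stand-in for the ErrorParams fields referenced in the ticket.
case class ErrorParams(
    message: String,
    errorClass: Option[String],
    messageParameters: Map[String, String])

object GrpcExceptionConverterSketch {
  // The match from the ticket, written once instead of seven times.
  // Call sites replace the inline match with this helper.
  def errorParamsToMessageParameters(
      params: ErrorParams): Map[String, String] =
    params.errorClass match {
      case Some(_) => params.messageParameters
      case None    => Map("message" -> params.message)
    }
}
```

Each of the seven call sites then shrinks to a single call such as `errorParamsToMessageParameters(params)`, so any future change to the fallback behavior happens in one place.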
[jira] [Updated] (SPARK-46557) Refine docstring for DataFrame.schema/explain/printSchema
[ https://issues.apache.org/jira/browse/SPARK-46557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated SPARK-46557:

Labels: pull-request-available (was: )

> Refine docstring for DataFrame.schema/explain/printSchema
>
> Key: SPARK-46557
> URL: https://issues.apache.org/jira/browse/SPARK-46557
> Project: Spark
> Issue Type: Sub-task
> Components: Documentation, PySpark
> Affects Versions: 4.0.0
> Reporter: Hyukjin Kwon
> Priority: Major
> Labels: pull-request-available
[jira] [Updated] (SPARK-46557) Refine docstring for DataFrame.schema/explain/printSchema
[ https://issues.apache.org/jira/browse/SPARK-46557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon updated SPARK-46557:

Summary: Refine docstring for DataFrame.schema/explain/printSchema (was: Refine docstring for DataFrame.explain/printSchema)

> Refine docstring for DataFrame.schema/explain/printSchema
>
> Key: SPARK-46557
> URL: https://issues.apache.org/jira/browse/SPARK-46557
> Project: Spark
> Issue Type: Sub-task
> Components: Documentation, PySpark
> Affects Versions: 4.0.0
> Reporter: Hyukjin Kwon
> Priority: Major
[jira] [Created] (SPARK-46557) Refine docstring for DataFrame.explain/printSchema
Hyukjin Kwon created SPARK-46557:

Summary: Refine docstring for DataFrame.explain/printSchema
Key: SPARK-46557
URL: https://issues.apache.org/jira/browse/SPARK-46557
Project: Spark
Issue Type: Sub-task
Components: Documentation, PySpark
Affects Versions: 4.0.0
Reporter: Hyukjin Kwon
[jira] [Updated] (SPARK-46556) Refine docstring for DataFrame.createGlobalTempView/createOrReplaceGlobalTempView
[ https://issues.apache.org/jira/browse/SPARK-46556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated SPARK-46556:

Labels: pull-request-available (was: )

> Refine docstring for
> DataFrame.createGlobalTempView/createOrReplaceGlobalTempView
>
> Key: SPARK-46556
> URL: https://issues.apache.org/jira/browse/SPARK-46556
> Project: Spark
> Issue Type: Sub-task
> Components: Documentation, PySpark
> Affects Versions: 4.0.0
> Reporter: Hyukjin Kwon
> Priority: Major
> Labels: pull-request-available
[jira] [Created] (SPARK-46556) Refine docstring for DataFrame.createGlobalTempView/createOrReplaceGlobalTempView
Hyukjin Kwon created SPARK-46556:

Summary: Refine docstring for DataFrame.createGlobalTempView/createOrReplaceGlobalTempView
Key: SPARK-46556
URL: https://issues.apache.org/jira/browse/SPARK-46556
Project: Spark
Issue Type: Sub-task
Components: Documentation, PySpark
Affects Versions: 4.0.0
Reporter: Hyukjin Kwon
[jira] [Updated] (SPARK-46555) Refine docstring for DataFrame.createTempView/createOrReplaceTempView
[ https://issues.apache.org/jira/browse/SPARK-46555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated SPARK-46555:

Labels: pull-request-available (was: )

> Refine docstring for DataFrame.createTempView/createOrReplaceTempView
>
> Key: SPARK-46555
> URL: https://issues.apache.org/jira/browse/SPARK-46555
> Project: Spark
> Issue Type: Sub-task
> Components: Documentation, PySpark
> Affects Versions: 4.0.0
> Reporter: Hyukjin Kwon
> Priority: Major
> Labels: pull-request-available
[jira] [Commented] (SPARK-44115) Upgrade Apache ORC to 2.0
[ https://issues.apache.org/jira/browse/SPARK-44115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17801651#comment-17801651 ]

Ravi Jain commented on SPARK-44115:

Hi [~dongjoon], I am new here and would like to contribute. Let me know if this is something I can pick up, as I will require some direction to get started.

> Upgrade Apache ORC to 2.0
>
> Key: SPARK-44115
> URL: https://issues.apache.org/jira/browse/SPARK-44115
> Project: Spark
> Issue Type: Sub-task
> Components: Build
> Affects Versions: 4.0.0
> Reporter: Dongjoon Hyun
> Priority: Major
>
> Apache ORC community has the following release cycles, which are synchronized
> with Apache Spark releases.
> * ORC v2.0.0 (next year) for Apache Spark 4.0.x
> * ORC v1.9.0 (this month) for Apache Spark 3.5.x
> * ORC v1.8.x for Apache Spark 3.4.x
> * ORC v1.7.x for Apache Spark 3.3.x
> * ORC v1.6.x for Apache Spark 3.2.x
[jira] [Updated] (SPARK-46555) Refine docstring for DataFrame.createTempView/createOrReplaceTempView
[ https://issues.apache.org/jira/browse/SPARK-46555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon updated SPARK-46555:

Summary: Refine docstring for DataFrame.createTempView/createOrReplaceTempView (was: Refine docstring for DataFrame.registerTempTable/createTempView/createOrReplaceTempView)

> Refine docstring for DataFrame.createTempView/createOrReplaceTempView
>
> Key: SPARK-46555
> URL: https://issues.apache.org/jira/browse/SPARK-46555
> Project: Spark
> Issue Type: Sub-task
> Components: Documentation, PySpark
> Affects Versions: 4.0.0
> Reporter: Hyukjin Kwon
> Priority: Major
[jira] [Created] (SPARK-46555) Refine docstring for DataFrame.registerTempTable/createTempView/createOrReplaceTempView
Hyukjin Kwon created SPARK-46555:

Summary: Refine docstring for DataFrame.registerTempTable/createTempView/createOrReplaceTempView
Key: SPARK-46555
URL: https://issues.apache.org/jira/browse/SPARK-46555
Project: Spark
Issue Type: Sub-task
Components: Documentation, PySpark
Affects Versions: 4.0.0
Reporter: Hyukjin Kwon
[jira] [Updated] (SPARK-46477) Hive table (bucketdb.dept_part_buk) is bucketed but partition (location=nk) is not bucketed
[ https://issues.apache.org/jira/browse/SPARK-46477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ashish Sharma updated SPARK-46477:

Description:

Presto fails to read a partition of a Hive table updated by Spark SQL, with the following error:
{noformat}
Hive table (bucketdb.dept_part_buk) is bucketed but partition (location=nk) is not bucketed
{noformat}
Spark SQL that causes the read failure in Presto:
{noformat}
ALTER TABLE bucketdb.dept_part_buk PARTITION (location='nk') set LOCATION 'file:///tmp/location=nk1';
{noformat}

*Root Cause*
ALTER TABLE table_name [PARTITION partition_spec] SET LOCATION "loc"; drops the bucket column information in HMS.

Repro Script
{code:java}
CREATE TABLE bucketdb.dept_part_buk (
  deptno INT,
  dname STRING,
  location STRING)
PARTITIONED BY (location)
CLUSTERED BY (deptno) INTO 2 BUCKETS
STORED AS textfile
location 'file:///tmp';

ALTER TABLE bucketdb.dept_part_buk ADD IF NOT EXISTS PARTITION (location='nk') LOCATION 'file:///tmp/location=nk';

ALTER TABLE bucketdb.dept_part_buk PARTITION (location='nk') set LOCATION 'file:///tmp/location=nk1';
{code}

*Investigation*

*HMS state before running the query*
{noformat}
mysql> select * from SDS where SD_ID = 137;
| SD_ID | CD_ID | INPUT_FORMAT | IS_COMPRESSED | IS_STOREDASSUBDIRECTORIES | LOCATION | NUM_BUCKETS | OUTPUT_FORMAT | SERDE_ID |
| 137 | 106 | org.apache.hadoop.mapred.TextInputFormat | 0x00 | 0x00 | file:/tmp/location=nk | 2 | org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat | 137 |

mysql> select * from BUCKETING_COLS where SD_ID = 137;
| SD_ID | BUCKET_COL_NAME | INTEGER_IDX |
| 137 | deptno | 0 |
{noformat}

*Spark SQL Query*
ALTER TABLE bucketdb.dept_part_buk PARTITION (location='nk') set LOCATION 'file:///Users/someuser/sparkdata/location=nk1';

*HMS state after running the Spark SQL query*
{noformat}
mysql> select * from SDS where SD_ID = 137;
| SD_ID | CD_ID | INPUT_FORMAT | IS_COMPRESSED | IS_STOREDASSUBDIRECTORIES | LOCATION | NUM_BUCKETS | OUTPUT_FORMAT | SERDE_ID |
| 137 | 106 | org.apache.hadoop.mapred.TextInputFormat | 0x00 | 0x00 | file:/tmp/location=nk1 | 0 | org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat | 137 |

mysql> select * from BUCKETING_COLS where SD_ID = 141;
Empty set (0.00 sec)
{noformat}

*Problem*
1. NUM_BUCKETS is set to 0; it should be 2.
2. The row containing BUCKET_COL_NAME = deptno is deleted from the BUCKETING_COLS table.

Due to the above 2 problems, Presto/Hive is not able to detect the bucketing information for partition nk1. On the contrary, Spark is not affected. Spark doesn't consume bucket information cause
[jira] [Updated] (SPARK-46477) Hive table (bucketdb.dept_part_buk) is bucketed but partition (location=nk) is not bucketed
[ https://issues.apache.org/jira/browse/SPARK-46477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ashish Sharma updated SPARK-46477:

Description:

Presto fails to read a partition of a Hive table updated by Spark SQL, with the following error:
{noformat}
Hive table (bucketdb.dept_part_buk) is bucketed but partition (location=nk) is not bucketed
{noformat}
Spark SQL that causes the read failure in Presto:
{noformat}
ALTER TABLE bucketdb.dept_part_buk PARTITION (location='nk') set LOCATION 'file:///tmp/location=nk1';
{noformat}

*Root Cause*
ALTER TABLE table_name [PARTITION partition_spec] SET LOCATION "loc"; drops the bucket column information in HMS.

Repro Script
{code:java}
CREATE TABLE bucketdb.dept_part_buk (
  deptno INT,
  dname STRING,
  location STRING)
PARTITIONED BY (location)
CLUSTERED BY (deptno) INTO 2 BUCKETS
STORED AS textfile
location 'file:///tmp';

ALTER TABLE bucketdb.dept_part_buk ADD IF NOT EXISTS PARTITION (location='nk') LOCATION 'file:///tmp/location=nk';

ALTER TABLE bucketdb.dept_part_buk PARTITION (location='nk') set LOCATION 'file:///tmp/location=nk1';
{code}

*Investigation*

*HMS state before running the query*
{noformat}
mysql> select * from SDS where SD_ID = 137;
| SD_ID | CD_ID | INPUT_FORMAT | IS_COMPRESSED | IS_STOREDASSUBDIRECTORIES | LOCATION | NUM_BUCKETS | OUTPUT_FORMAT | SERDE_ID |
| 137 | 106 | org.apache.hadoop.mapred.TextInputFormat | 0x00 | 0x00 | file:/tmp/location=nk | 2 | org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat | 137 |

mysql> select * from BUCKETING_COLS where SD_ID = 137;
| SD_ID | BUCKET_COL_NAME | INTEGER_IDX |
| 137 | deptno | 0 |
{noformat}

*Spark SQL Query*
ALTER TABLE bucketdb.dept_part_buk PARTITION (location='nk') set LOCATION 'file:///Users/someuser/sparkdata/location=nk1';

*HMS state after running the Spark SQL query*
{noformat}
mysql> select * from SDS where SD_ID = 137;
| SD_ID | CD_ID | INPUT_FORMAT | IS_COMPRESSED | IS_STOREDASSUBDIRECTORIES | LOCATION | NUM_BUCKETS | OUTPUT_FORMAT | SERDE_ID |
| 137 | 106 | org.apache.hadoop.mapred.TextInputFormat | 0x00 | 0x00 | file:/tmp/location=nk1 | 0 | org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat | 137 |

mysql> select * from BUCKETING_COLS where SD_ID = 141;
Empty set (0.00 sec)
{noformat}

*Problem*
1. NUM_BUCKETS is set to 0; it should be 2.
2. The row containing BUCKET_COL_NAME = deptno is deleted from the BUCKETING_COLS table.

Due to the above 2 problems, Presto/Hive is not able to detect the bucketing information for partition nk1. On the contrary, Spark is not affected. Spark doesn't consume bucket information cause spark
[jira] [Updated] (SPARK-46477) Hive table (bucketdb.dept_part_buk) is bucketed but partition (location=nk) is not bucketed
[ https://issues.apache.org/jira/browse/SPARK-46477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashish Sharma updated SPARK-46477: -- Description:
Presto fails to read a partition of a Hive table that was updated by Spark SQL, with the following error:
{noformat}
Hive table (bucketdb.dept_part_buk) is bucketed but partition (location=nk) is not bucketed
{noformat}
The Spark SQL statement that causes the read failure in Presto:
{noformat}
ALTER TABLE bucketdb.dept_part_buk PARTITION (location='nk') SET LOCATION 'file:///tmp/location=nk1';
{noformat}
*Root Cause*
ALTER TABLE table_name [PARTITION partition_spec] SET LOCATION "loc"; drops the bucket column information in the Hive Metastore (HMS).
Repro script:
{code:java}
CREATE TABLE bucketdb.dept_part_buk (
  deptno INT,
  dname STRING,
  location STRING)
PARTITIONED BY (location)
CLUSTERED BY (deptno) INTO 2 BUCKETS
STORED AS textfile
LOCATION 'file:///tmp';

ALTER TABLE bucketdb.dept_part_buk ADD IF NOT EXISTS PARTITION (location='nk') LOCATION 'file:///tmp/location=nk';

ALTER TABLE bucketdb.dept_part_buk PARTITION (location='nk') SET LOCATION 'file:///tmp/location=nk1';
{code}
*Investigation*
HMS state before running the query:
{noformat}
mysql> select * from SDS where SD_ID = 137;
  SD_ID: 137
  CD_ID: 106
  INPUT_FORMAT: org.apache.hadoop.mapred.TextInputFormat
  IS_COMPRESSED: 0x00
  IS_STOREDASSUBDIRECTORIES: 0x00
  LOCATION: file:/tmp/location=nk
  NUM_BUCKETS: 2
  OUTPUT_FORMAT: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
  SERDE_ID: 137

mysql> select * from BUCKETING_COLS where SD_ID = 137;
  SD_ID: 137
  BUCKET_COL_NAME: deptno
  INTEGER_IDX: 0
{noformat}
*Spark SQL Query*
ALTER TABLE bucketdb.dept_part_buk PARTITION (location='nk') SET LOCATION 'file:///Users/someuser/sparkdata/location=nk1';
*HMS state after running the Spark SQL query*
{noformat}
mysql> select * from SDS where SD_ID = 137;
  SD_ID: 137
  CD_ID: 106
  INPUT_FORMAT: org.apache.hadoop.mapred.TextInputFormat
  IS_COMPRESSED: 0x00
  IS_STOREDASSUBDIRECTORIES: 0x00
  LOCATION: file:/tmp/location=nk1
  NUM_BUCKETS: 0
  OUTPUT_FORMAT: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
  SERDE_ID: 137

mysql> select * from BUCKETING_COLS where SD_ID = 141;
Empty set (0.00 sec)
{noformat}
*Problem*
1. NUM_BUCKETS is set to 0; it should remain 2.
2. The row with BUCKET_COL_NAME = deptno is deleted from the BUCKETING_COLS table.
Because of these two changes, Presto cannot detect the bucketing information for the relocated partition on the read path.
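A quick way to observe the metadata loss without querying the metastore tables directly is `DESCRIBE FORMATTED` on the partition. A minimal sketch using the repro's table and partition names (the exact field labels in the output vary by Hive/Spark version):

{code:sql}
-- Before the ALTER ... SET LOCATION, the partition's storage descriptor
-- should report Num Buckets: 2 and Bucket Columns: [deptno].
DESCRIBE FORMATTED bucketdb.dept_part_buk PARTITION (location='nk');

-- After relocating the partition, with this bug the same command reports
-- Num Buckets: 0 and an empty bucket column list for the partition.
ALTER TABLE bucketdb.dept_part_buk PARTITION (location='nk') SET LOCATION 'file:///tmp/location=nk1';
DESCRIBE FORMATTED bucketdb.dept_part_buk PARTITION (location='nk');
{code}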
[jira] [Resolved] (SPARK-46543) json_tuple throw PySparkValueError for empty fields
[ https://issues.apache.org/jira/browse/SPARK-46543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng resolved SPARK-46543. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 44534 [https://github.com/apache/spark/pull/44534] > json_tuple throw PySparkValueError for empty fields > --- > > Key: SPARK-46543 > URL: https://issues.apache.org/jira/browse/SPARK-46543 > Project: Spark > Issue Type: Improvement > Components: Connect, PySpark >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-46554) Upgrade slf4j to 2.0.10
[ https://issues.apache.org/jira/browse/SPARK-46554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Jie resolved SPARK-46554. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 44544 [https://github.com/apache/spark/pull/44544] > Upgrade slf4j to 2.0.10 > --- > > Key: SPARK-46554 > URL: https://issues.apache.org/jira/browse/SPARK-46554 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Assignee: BingKun Pan >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > >
[jira] [Resolved] (SPARK-46551) Refine docstring of `flatten/sequence/shuffle`
[ https://issues.apache.org/jira/browse/SPARK-46551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Jie resolved SPARK-46551. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 44548 [https://github.com/apache/spark/pull/44548] > Refine docstring of `flatten/sequence/shuffle` > -- > > Key: SPARK-46551 > URL: https://issues.apache.org/jira/browse/SPARK-46551 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > >
[jira] [Assigned] (SPARK-46551) Refine docstring of `flatten/sequence/shuffle`
[ https://issues.apache.org/jira/browse/SPARK-46551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Jie reassigned SPARK-46551: Assignee: Yang Jie > Refine docstring of `flatten/sequence/shuffle` > -- > > Key: SPARK-46551 > URL: https://issues.apache.org/jira/browse/SPARK-46551 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Major > Labels: pull-request-available >
[jira] [Updated] (SPARK-46553) FutureWarning for interpolate with object dtype
[ https://issues.apache.org/jira/browse/SPARK-46553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-46553: --- Labels: pull-request-available (was: ) > FutureWarning for interpolate with object dtype > --- > > Key: SPARK-46553 > URL: https://issues.apache.org/jira/browse/SPARK-46553 > Project: Spark > Issue Type: Bug > Components: Pandas API on Spark >Affects Versions: 4.0.0 >Reporter: Haejoon Lee >Priority: Major > Labels: pull-request-available > > >>> pdf.interpolate() > :1: FutureWarning: DataFrame.interpolate with object dtype is > deprecated and will raise in a future version. Call > obj.infer_objects(copy=False) before interpolating instead. > A B > 0 a 1 > 1 b 2 > 2 c 3
[jira] [Created] (SPARK-46554) Upgrade slf4j to 2.0.10
BingKun Pan created SPARK-46554: --- Summary: Upgrade slf4j to 2.0.10 Key: SPARK-46554 URL: https://issues.apache.org/jira/browse/SPARK-46554 Project: Spark Issue Type: Improvement Components: Build Affects Versions: 4.0.0 Reporter: BingKun Pan
[jira] [Created] (SPARK-46553) FutureWarning for interpolate with object dtype
Haejoon Lee created SPARK-46553: --- Summary: FutureWarning for interpolate with object dtype Key: SPARK-46553 URL: https://issues.apache.org/jira/browse/SPARK-46553 Project: Spark Issue Type: Bug Components: Pandas API on Spark Affects Versions: 4.0.0 Reporter: Haejoon Lee >>> pdf.interpolate() :1: FutureWarning: DataFrame.interpolate with object dtype is deprecated and will raise in a future version. Call obj.infer_objects(copy=False) before interpolating instead. A B 0 a 1 1 b 2 2 c 3
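The warning above comes from pandas itself: calling interpolate on a frame that contains object-dtype columns is deprecated. The infer_objects() call the warning recommends only helps when an object column actually holds numeric data; for genuinely string data like column "A", one workaround is to interpolate only the numeric columns. A minimal sketch against plain pandas (not the pandas-on-Spark code path; a missing value is added to "B" so interpolation has work to do):

```python
import pandas as pd

# Frame shaped like the issue's example: "A" is object dtype, "B" is numeric,
# with a gap in "B" for interpolate to fill.
pdf = pd.DataFrame({"A": ["a", "b", "c"], "B": [1.0, None, 3.0]})

# Calling pdf.interpolate() directly emits the FutureWarning because of the
# object-dtype column "A". Restricting interpolation to numeric columns avoids it.
numeric = pdf.select_dtypes(include="number").interpolate()
result = pdf.assign(**{c: numeric[c] for c in numeric.columns})

print(result["B"].tolist())  # [1.0, 2.0, 3.0] -- linear fill of the gap
```

select_dtypes keeps the string columns out of the deprecated code path entirely, while assign writes the interpolated numeric columns back without mutating the original frame.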
[jira] [Created] (SPARK-46552) Replace UnsupportedOperationException by SparkUnsupportedOperationException in catalyst
Max Gekk created SPARK-46552: Summary: Replace UnsupportedOperationException by SparkUnsupportedOperationException in catalyst Key: SPARK-46552 URL: https://issues.apache.org/jira/browse/SPARK-46552 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 4.0.0 Reporter: Max Gekk Assignee: Max Gekk Replace all occurrences of UnsupportedOperationException with SparkUnsupportedOperationException in the Catalyst code base, and introduce new legacy error classes with the _LEGACY_ERROR_TEMP_ prefix.