[jira] [Updated] (SPARK-49629) JsonProtocol should write 'Shuffle Push Read Metrics' and 'Merged Fallback Count' field of shuffle read metrics
[ https://issues.apache.org/jira/browse/SPARK-49629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-49629: --- Labels: pull-request-available (was: ) > JsonProtocol should write 'Shuffle Push Read Metrics' and 'Merged Fallback > Count' field of shuffle read metrics > --- > > Key: SPARK-49629 > URL: https://issues.apache.org/jira/browse/SPARK-49629 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Nicholas Jiang >Priority: Major > Labels: pull-request-available > > `JsonProtocol` currently writes the 'Push Based Shuffle' and 'Merged Fetch Fallback Count' > fields of shuffle read metrics, which is inconsistent with the task metrics fields > parsed from JSON. Therefore, `JsonProtocol` should write the 'Shuffle > Push Read Metrics' and 'Merged Fallback Count' fields of shuffle read metrics. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-49625) Spark Cluster Happy Path State Transition Test
[ https://issues.apache.org/jira/browse/SPARK-49625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-49625: --- Labels: pull-request-available (was: ) > Spark Cluster Happy Path State Transition Test > -- > > Key: SPARK-49625 > URL: https://issues.apache.org/jira/browse/SPARK-49625 > Project: Spark > Issue Type: Sub-task > Components: Kubernetes >Affects Versions: 4.0.0 >Reporter: Qi Tan >Priority: Minor > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-49627) Run SortMergeJoin in batch
[ https://issues.apache.org/jira/browse/SPARK-49627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-49627: --- Labels: pull-request-available (was: ) > Run SortMergeJoin in batch > -- > > Key: SPARK-49627 > URL: https://issues.apache.org/jira/browse/SPARK-49627 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Yuming Wang >Priority: Major > Labels: pull-request-available > Attachments: t2.snappy.parquet > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-49626) Support horizontal and vertical bar plots
[ https://issues.apache.org/jira/browse/SPARK-49626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-49626: --- Labels: pull-request-available (was: ) > Support horizontal and vertical bar plots > - > > Key: SPARK-49626 > URL: https://issues.apache.org/jira/browse/SPARK-49626 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 4.0.0 >Reporter: Xinrong Meng >Priority: Major > Labels: pull-request-available > > Support horizontal and vertical bar plots -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-43354) Re-enable test_create_dataframe_from_pandas_with_day_time_interval
[ https://issues.apache.org/jira/browse/SPARK-43354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-43354: --- Labels: pull-request-available (was: ) > Re-enable test_create_dataframe_from_pandas_with_day_time_interval > -- > > Key: SPARK-43354 > URL: https://issues.apache.org/jira/browse/SPARK-43354 > Project: Spark > Issue Type: Test > Components: PySpark, Tests >Affects Versions: 3.5.0 >Reporter: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > > This test fails with PyPy 3.8. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-49623) Rename prefix `appResources` in helm chart
[ https://issues.apache.org/jira/browse/SPARK-49623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-49623: --- Labels: pull-request-available (was: ) > Rename prefix `appResources` in helm chart > -- > > Key: SPARK-49623 > URL: https://issues.apache.org/jira/browse/SPARK-49623 > Project: Spark > Issue Type: Sub-task > Components: k8s >Affects Versions: kubernetes-operator-0.1.0 >Reporter: Zhou JIANG >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-49621) Disable a flaky `EXEC IMMEDIATE STACK OVERFLOW` test case
[ https://issues.apache.org/jira/browse/SPARK-49621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-49621: --- Labels: pull-request-available (was: ) > Disable a flaky `EXEC IMMEDIATE STACK OVERFLOW` test case > - > > Key: SPARK-49621 > URL: https://issues.apache.org/jira/browse/SPARK-49621 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Priority: Minor > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-49620) Fix `spark-rm` and `infra` docker files to create `pypy3.9` links
[ https://issues.apache.org/jira/browse/SPARK-49620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-49620: --- Labels: pull-request-available (was: ) > Fix `spark-rm` and `infra` docker files to create `pypy3.9` links > - > > Key: SPARK-49620 > URL: https://issues.apache.org/jira/browse/SPARK-49620 > Project: Spark > Issue Type: Sub-task > Components: Project Infra >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Priority: Minor > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-49618) Union (& UnionExec) node equality does not take unaligned branch positions into account, preventing reuse of exchanges and cached plans
[ https://issues.apache.org/jira/browse/SPARK-49618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-49618: --- Labels: pull-request-available (was: ) > Union (& UnionExec) node equality does not take unaligned branch positions > into account, preventing reuse of exchanges and cached plans > -- > > Key: SPARK-49618 > URL: https://issues.apache.org/jira/browse/SPARK-49618 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 4.0.0, 3.3.4, 3.4.3 >Reporter: Asif >Priority: Major > Labels: pull-request-available > Fix For: 3.4.4 > > > Ideally, Union(plan1, plan2) and Union(plan2, plan1) are logically equal, > so long as the output attributes of plan1 and plan2 match in terms of > name, data type, metadata, etc. (though they differ in exprId). > But because the current equality and hashCode depend on the order of the > children, the canonicalizations do not match. > This prevents exchange reuse in the following situation: > > Exchange 1 = Union(plan1, plan2) > Exchange 2 = Union(plan2, plan1) > > Similarly, the cache lookup also misses the InMemoryRelation. > > A PR with bug tests for the above scenarios will be submitted. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
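The canonicalization mismatch described in SPARK-49618 above can be sketched in plain Python (this is an illustrative model, not Spark's actual Union/UnionExec code): order-dependent equality rejects a logically equivalent Union with swapped branches, so an exchange or cached plan keyed on the plan is never reused, while an order-insensitive comparison would match.

```python
# Illustrative sketch (not Spark code): a Union node whose equality depends
# on child order misses logically equal plans with reordered branches.

class Union:
    def __init__(self, *children):
        self.children = tuple(children)

    # Order-dependent equality, as described in the issue.
    def __eq__(self, other):
        return isinstance(other, Union) and self.children == other.children

    def __hash__(self):
        return hash(self.children)

    # A possible fix: compare branches ignoring their positions, assuming
    # the output attributes (name, data type, metadata) already match.
    def semantically_equal(self, other):
        return (isinstance(other, Union)
                and sorted(self.children) == sorted(other.children))

u1 = Union("plan1", "plan2")
u2 = Union("plan2", "plan1")
print(u1 == u2)                   # False: exchange/cache reuse misses
print(u1.semantically_equal(u2))  # True: the plans are logically equal
```

Here the children are stand-in strings; in Spark the comparison would run on canonicalized child plans.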
[jira] [Updated] (SPARK-49619) Upgrade Gradle to 8.10.1
[ https://issues.apache.org/jira/browse/SPARK-49619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-49619: --- Labels: pull-request-available (was: ) > Upgrade Gradle to 8.10.1 > > > Key: SPARK-49619 > URL: https://issues.apache.org/jira/browse/SPARK-49619 > Project: Spark > Issue Type: Sub-task > Components: Build >Affects Versions: kubernetes-operator-0.1.0 >Reporter: Dongjoon Hyun >Priority: Minor > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-49561) PIVOT + UNPIVOT operators
[ https://issues.apache.org/jira/browse/SPARK-49561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-49561: --- Labels: pull-request-available (was: ) > PIVOT + UNPIVOT operators > - > > Key: SPARK-49561 > URL: https://issues.apache.org/jira/browse/SPARK-49561 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Daniel >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-49557) WHERE operator
[ https://issues.apache.org/jira/browse/SPARK-49557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-49557: --- Labels: pull-request-available (was: ) > WHERE operator > -- > > Key: SPARK-49557 > URL: https://issues.apache.org/jira/browse/SPARK-49557 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Daniel >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-49610) Use plan ID as the session-local plan cache key type
[ https://issues.apache.org/jira/browse/SPARK-49610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-49610: --- Labels: pull-request-available (was: ) > Use plan ID as the session-local plan cache key type > > > Key: SPARK-49610 > URL: https://issues.apache.org/jira/browse/SPARK-49610 > Project: Spark > Issue Type: Improvement > Components: Connect >Affects Versions: 4.0.0 >Reporter: Changgyoo Park >Priority: Major > Labels: pull-request-available > > Comparing protobuf messages is sometimes very expensive if the message is > very large. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
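The motivation in SPARK-49610 above can be sketched as follows (assumed semantics, not the actual Spark Connect implementation): keying the session-local plan cache by a small plan ID avoids hashing and comparing entire serialized plan messages on every lookup.

```python
# Sketch: cache keyed by the full serialized plan vs. by a plan ID.
# The byte string stands in for a large protobuf message.

big_plan_bytes = b"x" * 10_000_000

# Before: every lookup hashes/compares the full message bytes.
cache_by_message = {big_plan_bytes: "analyzed-plan"}

# After: the lookup key is the small plan ID carried in the message.
cache_by_id = {42: "analyzed-plan"}

print(cache_by_id[42])  # cheap lookup regardless of plan size
```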
[jira] [Updated] (SPARK-49611) Introduce TVF all_collations()
[ https://issues.apache.org/jira/browse/SPARK-49611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-49611: --- Labels: pull-request-available (was: ) > Introduce TVF all_collations() > -- > > Key: SPARK-49611 > URL: https://issues.apache.org/jira/browse/SPARK-49611 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-49597) Support non-column arguments in UDTF for simpler usage
[ https://issues.apache.org/jira/browse/SPARK-49597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-49597: -- Assignee: (was: Apache Spark) > Support non-column arguments in UDTF for simpler usage > -- > > Key: SPARK-49597 > URL: https://issues.apache.org/jira/browse/SPARK-49597 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Haejoon Lee >Priority: Major > Labels: pull-request-available > > Currently, UDTF can only accept column arguments, but users might find this > usage a bit inconvenient -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-49597) Support non-column arguments in UDTF for simpler usage
[ https://issues.apache.org/jira/browse/SPARK-49597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-49597: -- Assignee: Apache Spark > Support non-column arguments in UDTF for simpler usage > -- > > Key: SPARK-49597 > URL: https://issues.apache.org/jira/browse/SPARK-49597 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Haejoon Lee >Assignee: Apache Spark >Priority: Major > Labels: pull-request-available > > Currently, UDTF can only accept column arguments, but users might find this > usage a bit inconvenient -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-49609) Add API compatibility check between Classic and Connect
[ https://issues.apache.org/jira/browse/SPARK-49609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-49609: -- Assignee: Apache Spark > Add API compatibility check between Classic and Connect > --- > > Key: SPARK-49609 > URL: https://issues.apache.org/jira/browse/SPARK-49609 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Haejoon Lee >Assignee: Apache Spark >Priority: Major > Labels: pull-request-available > > We should ensure every API has the same signature between Classic and Connect -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-49606) Improve documentation of Pandas on Spark plotting API
[ https://issues.apache.org/jira/browse/SPARK-49606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-49606: -- Assignee: Apache Spark > Improve documentation of Pandas on Spark plotting API > - > > Key: SPARK-49606 > URL: https://issues.apache.org/jira/browse/SPARK-49606 > Project: Spark > Issue Type: Sub-task > Components: Documentation, PS >Affects Versions: 4.0.0 >Reporter: Xinrong Meng >Assignee: Apache Spark >Priority: Major > Labels: pull-request-available > > Improve documentation of Pandas on Spark plotting API following pandas 2.2 > (stable), see https://pandas.pydata.org/docs/reference/frame.html. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-49609) Add API compatibility check between Classic and Connect
[ https://issues.apache.org/jira/browse/SPARK-49609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-49609: -- Assignee: (was: Apache Spark) > Add API compatibility check between Classic and Connect > --- > > Key: SPARK-49609 > URL: https://issues.apache.org/jira/browse/SPARK-49609 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Haejoon Lee >Priority: Major > Labels: pull-request-available > > We should ensure every API has the same signature between Classic and Connect -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-49606) Improve documentation of Pandas on Spark plotting API
[ https://issues.apache.org/jira/browse/SPARK-49606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-49606: -- Assignee: (was: Apache Spark) > Improve documentation of Pandas on Spark plotting API > - > > Key: SPARK-49606 > URL: https://issues.apache.org/jira/browse/SPARK-49606 > Project: Spark > Issue Type: Sub-task > Components: Documentation, PS >Affects Versions: 4.0.0 >Reporter: Xinrong Meng >Priority: Major > Labels: pull-request-available > > Improve documentation of Pandas on Spark plotting API following pandas 2.2 > (stable), see https://pandas.pydata.org/docs/reference/frame.html. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-49609) Add API compatibility check between Classic and Connect
[ https://issues.apache.org/jira/browse/SPARK-49609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-49609: --- Labels: pull-request-available (was: ) > Add API compatibility check between Classic and Connect > --- > > Key: SPARK-49609 > URL: https://issues.apache.org/jira/browse/SPARK-49609 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Haejoon Lee >Priority: Major > Labels: pull-request-available > > We should ensure every API has the same signature between Classic and Connect -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-49244) [M1] Further exception improvements
[ https://issues.apache.org/jira/browse/SPARK-49244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-49244: -- Assignee: Apache Spark > [M1] Further exception improvements > --- > > Key: SPARK-49244 > URL: https://issues.apache.org/jira/browse/SPARK-49244 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Dusan Tisma >Assignee: Apache Spark >Priority: Major > Labels: pull-request-available > > We need to remove line numbers manually added to exceptions. Currently, some > exceptions print the line number twice. > Label exceptions need to use backquotes, the same as with variables, i.e. we need to > check whether toSQLId, toSQLStmt, and similar methods are applied to all > identifiers. > Maybe add some tests for {LINE} numbers? -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-49244) [M1] Further exception improvements
[ https://issues.apache.org/jira/browse/SPARK-49244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-49244: -- Assignee: (was: Apache Spark) > [M1] Further exception improvements > --- > > Key: SPARK-49244 > URL: https://issues.apache.org/jira/browse/SPARK-49244 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Dusan Tisma >Priority: Major > Labels: pull-request-available > > We need to remove line numbers manually added to exceptions. Currently, some > exceptions print the line number twice. > Label exceptions need to use backquotes, the same as with variables, i.e. we need to > check whether toSQLId, toSQLStmt, and similar methods are applied to all > identifiers. > Maybe add some tests for {LINE} numbers? -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-49162) Push down date_trunc function
[ https://issues.apache.org/jira/browse/SPARK-49162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-49162: -- Assignee: Apache Spark > Push down date_trunc function > - > > Key: SPARK-49162 > URL: https://issues.apache.org/jira/browse/SPARK-49162 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.5.2 >Reporter: Ivan Kukrkic >Assignee: Apache Spark >Priority: Minor > Labels: pull-request-available > > Postgres function date_trunc should be pushed down. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-49606) Improve documentation of Pandas on Spark plotting API
[ https://issues.apache.org/jira/browse/SPARK-49606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-49606: --- Labels: pull-request-available (was: ) > Improve documentation of Pandas on Spark plotting API > - > > Key: SPARK-49606 > URL: https://issues.apache.org/jira/browse/SPARK-49606 > Project: Spark > Issue Type: Sub-task > Components: Documentation, PS >Affects Versions: 4.0.0 >Reporter: Xinrong Meng >Priority: Major > Labels: pull-request-available > > Improve documentation of Pandas on Spark plotting API following pandas 2.2 > (stable), see https://pandas.pydata.org/docs/reference/frame.html. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-49605) Fix the prompt when `ascendingOrder` is `DataTypeMismatch` in `SortArray`
[ https://issues.apache.org/jira/browse/SPARK-49605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-49605: --- Labels: pull-request-available (was: ) > Fix the prompt when `ascendingOrder` is `DataTypeMismatch` in `SortArray` > - > > Key: SPARK-49605 > URL: https://issues.apache.org/jira/browse/SPARK-49605 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-49594) Add check on whether columnFamilies were added or removed to write StateSchemaV3 file
[ https://issues.apache.org/jira/browse/SPARK-49594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-49594: --- Labels: pull-request-available (was: ) > Add check on whether columnFamilies were added or removed to write > StateSchemaV3 file > - > > Key: SPARK-49594 > URL: https://issues.apache.org/jira/browse/SPARK-49594 > Project: Spark > Issue Type: New Feature > Components: Structured Streaming >Affects Versions: 4.0.0 >Reporter: Eric Marnadi >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-49602) Fix `assembly/pom.xml` to use `{project.version}` instead of `{version}`
[ https://issues.apache.org/jira/browse/SPARK-49602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-49602: --- Labels: pull-request-available (was: ) > Fix `assembly/pom.xml` to use `{project.version}` instead of `{version}` > > > Key: SPARK-49602 > URL: https://issues.apache.org/jira/browse/SPARK-49602 > Project: Spark > Issue Type: Sub-task > Components: Build >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-49262) Implement full trim sensitivity support
[ https://issues.apache.org/jira/browse/SPARK-49262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-49262: --- Labels: pull-request-available (was: ) > Implement full trim sensitivity support > --- > > Key: SPARK-49262 > URL: https://issues.apache.org/jira/browse/SPARK-49262 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-49600) Remove `Python 3.6 and older`-related logic from `try_simplify_traceback`
[ https://issues.apache.org/jira/browse/SPARK-49600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-49600: --- Labels: pull-request-available (was: ) > Remove `Python 3.6 and older`-related logic from `try_simplify_traceback` > -- > > Key: SPARK-49600 > URL: https://issues.apache.org/jira/browse/SPARK-49600 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Priority: Minor > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-49386) Add memory based thresholds for shuffle spill
[ https://issues.apache.org/jira/browse/SPARK-49386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-49386: --- Labels: pull-request-available (was: ) > Add memory based thresholds for shuffle spill > - > > Key: SPARK-49386 > URL: https://issues.apache.org/jira/browse/SPARK-49386 > Project: Spark > Issue Type: Improvement > Components: Spark Core, SQL >Affects Versions: 4.0.0 >Reporter: dzcxzl >Priority: Major > Labels: pull-request-available > > Currently, spills can only be triggered by an element-count threshold, configured via > {{spark.shuffle.spill.numElementsForceSpillThreshold}}. In some > scenarios, a single row may be very large in memory. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
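The gap described in SPARK-49386 above can be sketched as follows (an assumed decision rule, not Spark's actual spill code; the threshold values are illustrative defaults): spilling when either the element count or a new memory-based threshold is exceeded means a handful of very large rows still triggers a spill.

```python
# Sketch of a spill decision with both an element-count threshold
# (as today) and a proposed memory-based threshold.

def should_spill(num_elements: int, used_bytes: int,
                 num_elements_threshold: int = 1_000_000,
                 memory_threshold_bytes: int = 64 * 1024 * 1024) -> bool:
    # Spill if EITHER limit is exceeded.
    return (num_elements >= num_elements_threshold
            or used_bytes >= memory_threshold_bytes)

# Count-only logic would never spill 100 rows, but 100 rows of ~1 MiB
# each exceed the memory threshold.
print(should_spill(num_elements=100, used_bytes=100 * 1024 * 1024))  # True
```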
[jira] [Updated] (SPARK-49599) Upgrade snappy-java to 1.1.10.7
[ https://issues.apache.org/jira/browse/SPARK-49599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-49599: --- Labels: pull-request-available (was: ) > Upgrade snappy-java to 1.1.10.7 > --- > > Key: SPARK-49599 > URL: https://issues.apache.org/jira/browse/SPARK-49599 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 4.0.0 >Reporter: Yang Jie >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-49598) Support to add custom user defined labels on OnDemand PVCs
[ https://issues.apache.org/jira/browse/SPARK-49598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-49598: --- Labels: pull-request-available (was: ) > Support to add custom user defined labels on OnDemand PVCs > -- > > Key: SPARK-49598 > URL: https://issues.apache.org/jira/browse/SPARK-49598 > Project: Spark > Issue Type: Improvement > Components: Kubernetes >Affects Versions: 4.0.0 >Reporter: Prathit Malik >Priority: Minor > Labels: pull-request-available > > Currently, when a user sets > volumes.persistentVolumeClaim.[VolumeName].options.claimName=OnDemand, > PVCs are created with only one label, i.e. spark-app-selector = spark.app.id. > The objective of this Jira is to support custom user-defined labels for OnDemand PVCs -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
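A hedged sketch of how such labels might be passed at submit time. Only `claimName=OnDemand` appears in the issue; the `labels.*` option name, the `data` volume name, and the label key/value below are hypothetical placeholders for illustration.

```shell
# Existing: an OnDemand PVC gets only the spark-app-selector=<spark.app.id> label.
# Proposed (option name hypothetical): attach user-defined labels to the same PVC.
spark-submit \
  --conf spark.kubernetes.executor.volumes.persistentVolumeClaim.data.options.claimName=OnDemand \
  --conf spark.kubernetes.executor.volumes.persistentVolumeClaim.data.options.labels.team=analytics \
  main.jar
```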
[jira] [Updated] (SPARK-49597) Support non-column arguments in UDTF for simpler usage
[ https://issues.apache.org/jira/browse/SPARK-49597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-49597: --- Labels: pull-request-available (was: ) > Support non-column arguments in UDTF for simpler usage > -- > > Key: SPARK-49597 > URL: https://issues.apache.org/jira/browse/SPARK-49597 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Haejoon Lee >Priority: Major > Labels: pull-request-available > > Currently, UDTF can only accept column arguments, but users might find this > usage a bit inconvenient -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-48355) [M1] Support for CASE statement
[ https://issues.apache.org/jira/browse/SPARK-48355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-48355: -- Assignee: (was: Apache Spark) > [M1] Support for CASE statement > --- > > Key: SPARK-48355 > URL: https://issues.apache.org/jira/browse/SPARK-48355 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: David Milicevic >Priority: Major > Labels: pull-request-available > > Details TBD. > > For more details: > * Design doc in parent Jira item. > * [SQL ref > spec|https://docs.google.com/document/d/1cpSuR3KxRuTSJ4ZMQ73FJ4_-hjouNNU2zfI4vri6yhs/edit] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-48355) [M1] Support for CASE statement
[ https://issues.apache.org/jira/browse/SPARK-48355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-48355: -- Assignee: Apache Spark > [M1] Support for CASE statement > --- > > Key: SPARK-48355 > URL: https://issues.apache.org/jira/browse/SPARK-48355 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: David Milicevic >Assignee: Apache Spark >Priority: Major > Labels: pull-request-available > > Details TBD. > > For more details: > * Design doc in parent Jira item. > * [SQL ref > spec|https://docs.google.com/document/d/1cpSuR3KxRuTSJ4ZMQ73FJ4_-hjouNNU2zfI4vri6yhs/edit] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-49596) Improve the expression `FormatString`
[ https://issues.apache.org/jira/browse/SPARK-49596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-49596: --- Labels: pull-request-available (was: ) > Improve the expression `FormatString` > - > > Key: SPARK-49596 > URL: https://issues.apache.org/jira/browse/SPARK-49596 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-49162) Push down date_trunc function
[ https://issues.apache.org/jira/browse/SPARK-49162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-49162: -- Assignee: (was: Apache Spark) > Push down date_trunc function > - > > Key: SPARK-49162 > URL: https://issues.apache.org/jira/browse/SPARK-49162 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.5.2 >Reporter: Ivan Kukrkic >Priority: Minor > Labels: pull-request-available > > Postgres function date_trunc should be pushed down. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-49585) Get rid of unnecessary executions list in SessionHolder
[ https://issues.apache.org/jira/browse/SPARK-49585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-49585: --- Labels: pull-request-available (was: ) > Get rid of unnecessary executions list in SessionHolder > --- > > Key: SPARK-49585 > URL: https://issues.apache.org/jira/browse/SPARK-49585 > Project: Spark > Issue Type: Improvement > Components: Connect >Affects Versions: 4.0.0 >Reporter: Changgyoo Park >Priority: Minor > Labels: pull-request-available > > ExecutionManager.executions can fully substitute SessionHolder.executions. > Adverse effect: interrupt* will take longer if there are many sessions with many executions. > Mitigation: SessionHolder manages a set of operation IDs instead of ExecuteHolders. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-49595) Fix DataFrame.unpivot/melt in Spark Connect
[ https://issues.apache.org/jira/browse/SPARK-49595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-49595: --- Labels: pull-request-available (was: ) > Fix DataFrame.unpivot/melt in Spark Connect > --- > > Key: SPARK-49595 > URL: https://issues.apache.org/jira/browse/SPARK-49595 > Project: Spark > Issue Type: Bug > Components: Connect >Affects Versions: 4.0.0 >Reporter: Xinrong Meng >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-49569) Introduce Shim for missing spark/core classes
[ https://issues.apache.org/jira/browse/SPARK-49569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-49569: --- Labels: pull-request-available (was: ) > Introduce Shim for missing spark/core classes > - > > Key: SPARK-49569 > URL: https://issues.apache.org/jira/browse/SPARK-49569 > Project: Spark > Issue Type: New Feature > Components: Connect, SQL >Affects Versions: 4.0.0 >Reporter: Herman van Hövell >Priority: Major > Labels: pull-request-available > > Introduce shims for SparkContext, RDD, and QueryExecution. This will have to be in a separate module, and is supposed to be a compile-time dependency for the sql/api project, and an actual dependency for an independent Spark Connect Client. > We need these three classes to support all user-facing API in the sql/api project. This will allow us to make the classes the primary interface for Scala Dataset operations. > For Connect these methods will throw an (actionable) error, and for the classic client they will just work. On the Connect side, in the future we can use this to build better errors and provide (method-specific) mitigations. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-49590) E2E test template includes invalid spec field
[ https://issues.apache.org/jira/browse/SPARK-49590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-49590: --- Labels: pull-request-available (was: ) > E2E test template includes invalid spec field > - > > Key: SPARK-49590 > URL: https://issues.apache.org/jira/browse/SPARK-49590 > Project: Spark > Issue Type: Sub-task > Components: k8s >Affects Versions: kubernetes-operator-0.1.0 >Reporter: Zhou JIANG >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-49574) ExpressionEncoder should track its AgnosticEncoder
[ https://issues.apache.org/jira/browse/SPARK-49574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-49574: --- Labels: pull-request-available (was: ) > ExpressionEncoder should track its AgnosticEncoder > -- > > Key: SPARK-49574 > URL: https://issues.apache.org/jira/browse/SPARK-49574 > Project: Spark > Issue Type: New Feature > Components: Connect, SQL >Affects Versions: 4.0.0 >Reporter: Herman van Hövell >Assignee: Herman van Hövell >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-49584) Upgrade log4j2 to 2.24.0
[ https://issues.apache.org/jira/browse/SPARK-49584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-49584: -- Assignee: (was: Apache Spark) > Upgrade log4j2 to 2.24.0 > > > Key: SPARK-49584 > URL: https://issues.apache.org/jira/browse/SPARK-49584 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 4.0.0 >Reporter: Yang Jie >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-49548) Get rid of coarse-locking in SparkConnectSessionManager
[ https://issues.apache.org/jira/browse/SPARK-49548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-49548: -- Assignee: Apache Spark > Get rid of coarse-locking in SparkConnectSessionManager > --- > > Key: SPARK-49548 > URL: https://issues.apache.org/jira/browse/SPARK-49548 > Project: Spark > Issue Type: Improvement > Components: Connect >Affects Versions: 4.0.0 >Reporter: Changgyoo Park >Assignee: Apache Spark >Priority: Minor > Labels: pull-request-available > > Related to https://issues.apache.org/jira/browse/SPARK-49544. > -> This has never caused a real world problem, but we had better fix it in > tandem with https://issues.apache.org/jira/browse/SPARK-49544. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-49398) Cache Table with Parameter markers returns wrong error
[ https://issues.apache.org/jira/browse/SPARK-49398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-49398: -- Assignee: Apache Spark > Cache Table with Parameter markers returns wrong error > -- > > Key: SPARK-49398 > URL: https://issues.apache.org/jira/browse/SPARK-49398 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Mihailo Milosevic >Assignee: Apache Spark >Priority: Minor > Labels: pull-request-available, starter > > While investigating the OSS code structure, it was found that `CacheTableAsSelect`, when used with parameter markers in the select part of the query, fails with `UNBOUND_SQL_PARAMETER` even though logically it should fail with `UNSUPPORTED_FEATURE.PARAMETER_MARKER_IN_UNEXPECTED_STATEMENT`. The reason the second error is expected is that `CacheTableAsSelect` creates a temporary view, which should follow the same parameter-marker rules as views do. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-49584) Upgrade log4j2 to 2.24.0
[ https://issues.apache.org/jira/browse/SPARK-49584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-49584: --- Labels: pull-request-available (was: ) > Upgrade log4j2 to 2.24.0 > > > Key: SPARK-49584 > URL: https://issues.apache.org/jira/browse/SPARK-49584 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 4.0.0 >Reporter: Yang Jie >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-49398) Cache Table with Parameter markers returns wrong error
[ https://issues.apache.org/jira/browse/SPARK-49398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-49398: -- Assignee: (was: Apache Spark) > Cache Table with Parameter markers returns wrong error > -- > > Key: SPARK-49398 > URL: https://issues.apache.org/jira/browse/SPARK-49398 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Mihailo Milosevic >Priority: Minor > Labels: pull-request-available, starter > > While investigating the OSS code structure, it was found that `CacheTableAsSelect`, when used with parameter markers in the select part of the query, fails with `UNBOUND_SQL_PARAMETER` even though logically it should fail with `UNSUPPORTED_FEATURE.PARAMETER_MARKER_IN_UNEXPECTED_STATEMENT`. The reason the second error is expected is that `CacheTableAsSelect` creates a temporary view, which should follow the same parameter-marker rules as views do. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-49582) Fix "dispatch_window_method" utility and documentation
[ https://issues.apache.org/jira/browse/SPARK-49582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-49582: --- Labels: pull-request-available (was: ) > Fix "dispatch_window_method" utility and documentation > -- > > Key: SPARK-49582 > URL: https://issues.apache.org/jira/browse/SPARK-49582 > Project: Spark > Issue Type: Bug > Components: Connect, PySpark >Affects Versions: 4.0.0 >Reporter: Xinrong Meng >Priority: Major > Labels: pull-request-available > > Fix "dispatch_window_method" utility and documentation -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-49578) Change error message for CAST_INVALID_INPUT and CAST_OVERFLOW
[ https://issues.apache.org/jira/browse/SPARK-49578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-49578: --- Labels: pull-request-available (was: ) > Change error message for CAST_INVALID_INPUT and CAST_OVERFLOW > - > > Key: SPARK-49578 > URL: https://issues.apache.org/jira/browse/SPARK-49578 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Mihailo Milosevic >Priority: Major > Labels: pull-request-available > > CAST_INVALID_INPUT and CAST_OVERFLOW both contain suggested fixes for turning off ANSI mode. Now that Spark 4.0.0 has moved to ANSI mode on by default, we want to keep suggestions of this kind to a minimum. There is an existing implementation of `try_cast` that provides the same casting behavior as ANSI mode off, and that suggestion should be sufficient for users to move forward. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
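The `try_cast` semantics mentioned above return NULL instead of raising on invalid input; a minimal plain-Python sketch of that behavior (an illustration of the semantics, not Spark code):

```python
def try_cast_to_int(value):
    """Mimic Spark's try_cast(value AS INT): return None (SQL NULL)
    instead of raising a CAST_INVALID_INPUT error on bad input."""
    try:
        return int(value)
    except (TypeError, ValueError):
        return None

print(try_cast_to_int("42"))   # 42
print(try_cast_to_int("aaa"))  # None
```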
[jira] [Updated] (SPARK-49579) Rename errorClass in checkError()
[ https://issues.apache.org/jira/browse/SPARK-49579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-49579: --- Labels: pull-request-available (was: ) > Rename errorClass in checkError() > - > > Key: SPARK-49579 > URL: https://issues.apache.org/jira/browse/SPARK-49579 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Max Gekk >Assignee: Max Gekk >Priority: Major > Labels: pull-request-available > > Rename errorClass to condition in checkError() and related functions to > follow up the agreement of https://issues.apache.org/jira/browse/SPARK-46810 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-49576) Upload Python logs in CI
[ https://issues.apache.org/jira/browse/SPARK-49576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-49576: --- Labels: pull-request-available (was: ) > Upload Python logs in CI > > > Key: SPARK-49576 > URL: https://issues.apache.org/jira/browse/SPARK-49576 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > > e.g., > /__w/spark/spark/python/target/28a23950-46c7-45c5-a9b7-42e7d9b21518/python3.12__pyspark.sql.tests.connect.test_connect_session__ah_ug0xu.log) -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-49567) Use `classic` instead of `vanilla` from PySpark code base
[ https://issues.apache.org/jira/browse/SPARK-49567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-49567: --- Labels: pull-request-available (was: ) > Use `classic` instead of `vanilla` from PySpark code base > - > > Key: SPARK-49567 > URL: https://issues.apache.org/jira/browse/SPARK-49567 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Haejoon Lee >Priority: Major > Labels: pull-request-available > > We decided to use classic for legacy PySpark -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-49544) Severe lock contention in SparkConnectExecutionManager
[ https://issues.apache.org/jira/browse/SPARK-49544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-49544: --- Labels: pull-request-available (was: ) > Severe lock contention in SparkConnectExecutionManager > -- > > Key: SPARK-49544 > URL: https://issues.apache.org/jira/browse/SPARK-49544 > Project: Spark > Issue Type: Improvement > Components: Connect >Affects Versions: 4.0.0 >Reporter: Changgyoo Park >Priority: Major > Labels: pull-request-available > > Critical sections protected by executionsLock can become too broad when there are too many ExecuteHolders, e.g., >= 10^4. The problem is aggravated when there are many threads in the system, leading to priority inversion. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
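The usual remedy for this kind of contention is to keep the critical section down to the map operations themselves and run per-execution work outside the lock; a toy plain-Python sketch of that pattern (illustrative only, not the Connect server code):

```python
import threading

class ExecutionRegistry:
    """Toy registry: a short-lived lock guards only the map itself,
    so long-running per-execution work never runs under the lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self._executions = {}

    def add(self, op_id, holder):
        with self._lock:              # brief critical section
            self._executions[op_id] = holder

    def remove(self, op_id):
        with self._lock:
            return self._executions.pop(op_id, None)

    def interrupt_all(self):
        with self._lock:              # snapshot under the lock...
            holders = list(self._executions.values())
        for interrupt in holders:     # ...then do the work lock-free
            interrupt()

reg = ExecutionRegistry()
done = []
reg.add("op-1", lambda: done.append("op-1"))
reg.interrupt_all()
print(done)  # ['op-1']
```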
[jira] [Updated] (SPARK-49545) Increase timeout for build from 3 to 4 hours
[ https://issues.apache.org/jira/browse/SPARK-49545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-49545: --- Labels: pull-request-available (was: ) > Increase timeout for build from 3 to 4 hours > > > Key: SPARK-49545 > URL: https://issues.apache.org/jira/browse/SPARK-49545 > Project: Spark > Issue Type: Improvement > Components: Project Infra >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > > https://github.com/apache/spark/actions/workflows/build_python_3.12.yml fails > with hitting 3 hours. We should increase it up to 4 hours. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-49501) Catalog createTable API is double-escaping paths
[ https://issues.apache.org/jira/browse/SPARK-49501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-49501: -- Assignee: (was: Apache Spark) > Catalog createTable API is double-escaping paths > > > Key: SPARK-49501 > URL: https://issues.apache.org/jira/browse/SPARK-49501 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Christos Stavrakakis >Priority: Major > Labels: pull-request-available > > Creating an external table using {{spark.catalog.createTable}} results in incorrect escaping of special chars in paths. > Consider the following code: > {{spark.catalog.createTable("testTable", source = "parquet", schema = new StructType().add("id", "int"), description = "", options = Map("path" -> "/tmp/test table"))}} > The above call creates a table that is stored in {{/tmp/test%20table}} instead of {{/tmp/test table}}. Note that this behaviour is different from the SQL API, e.g. {{create table testTable(id int) using parquet location '/tmp/test table'}} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
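The misplaced directory in the report matches a path being percent-encoded when it should have been taken literally; a plain-Python illustration of the effect (using `urllib.parse`, not Spark's actual code path):

```python
from urllib.parse import quote

path = "/tmp/test table"
# Encoding a path that should have been taken literally yields the
# directory observed in the bug report:
encoded_once = quote(path)
print(encoded_once)         # /tmp/test%20table
# Encoding again (the classic double-escaping failure mode) would give:
print(quote(encoded_once))  # /tmp/test%2520table
```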
[jira] [Assigned] (SPARK-49506) Optimize ArrayBinarySearch for foldable array
[ https://issues.apache.org/jira/browse/SPARK-49506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-49506: -- Assignee: BingKun Pan (was: Apache Spark) > Optimize ArrayBinarySearch for foldable array > - > > Key: SPARK-49506 > URL: https://issues.apache.org/jira/browse/SPARK-49506 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Assignee: BingKun Pan >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-49540) Unify the usage of `distributed_sequence_id`
[ https://issues.apache.org/jira/browse/SPARK-49540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-49540: --- Labels: pull-request-available (was: ) > Unify the usage of `distributed_sequence_id` > > > Key: SPARK-49540 > URL: https://issues.apache.org/jira/browse/SPARK-49540 > Project: Spark > Issue Type: Improvement > Components: PS >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-49538) Detect unused message parameters
[ https://issues.apache.org/jira/browse/SPARK-49538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-49538: --- Labels: pull-request-available (was: ) > Detect unused message parameters > > > Key: SPARK-49538 > URL: https://issues.apache.org/jira/browse/SPARK-49538 > Project: Spark > Issue Type: Sub-task > Components: SQL, Tests >Affects Versions: 4.0.0 >Reporter: Max Gekk >Assignee: Max Gekk >Priority: Minor > Labels: pull-request-available > > The passed error-message parameters and the placeholders in the message format might not match. From the code-maintainability perspective, it would be nice to detect such cases while running tests. > For example, the error message format could look like: > {code} > "CANNOT_UP_CAST_DATATYPE" : { > "message" : [ > "Cannot up cast <expression> from <sourceType> to <targetType>.", > "<details>" > ], > "sqlState" : "42846" > }, > {code} > > but the passed message parameters include an extra parameter: > {code:scala} > messageParameters = Map( > "expression" -> "CAST('aaa' AS LONG)", > "sourceType" -> "STRING", > "targetType" -> "LONG", > "op" -> "CAST", // unused parameter > "details" -> "implicit cast" > )) > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
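The mismatch check described here can be sketched in plain Python (a standalone illustration assuming the `<name>` placeholder syntax used by Spark's error-message formats, not Spark's actual test utility):

```python
import re

def unused_message_parameters(message_format: str, parameters: dict) -> set:
    """Return the names of parameters that are passed to an error
    but never referenced by a <placeholder> in the message format."""
    placeholders = set(re.findall(r"<(\w+)>", message_format))
    return set(parameters) - placeholders

fmt = "Cannot up cast <expression> from <sourceType> to <targetType>. <details>"
params = {
    "expression": "CAST('aaa' AS LONG)",
    "sourceType": "STRING",
    "targetType": "LONG",
    "op": "CAST",          # extra parameter with no placeholder
    "details": "implicit cast",
}
print(unused_message_parameters(fmt, params))  # {'op'}
```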
[jira] [Updated] (SPARK-49537) Incorrect Join stats estimate
[ https://issues.apache.org/jira/browse/SPARK-49537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-49537: --- Labels: pull-request-available (was: ) > Incorrect Join stats estimate > - > > Key: SPARK-49537 > URL: https://issues.apache.org/jira/browse/SPARK-49537 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.5.0 >Reporter: Yuming Wang >Priority: Major > Labels: pull-request-available > Attachments: diable CBO.png, enable CBO.png > > Error message: > {noformat} > org.apache.hive.service.cli.HiveSQLException: Error running query: > org.apache.spark.SparkException: Cannot broadcast the table that is larger > than 4GB: 4GB > at > org.apache.spark.sql.hive.thriftserver.HiveThriftServerErrors$.runningQueryError(HiveThriftServerErrors.scala:45) > at > org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.org$apache$spark$sql$hive$thriftserver$SparkExecuteStatementOperation$$execute(SparkExecuteStatementOperation.scala:340) > at > org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2$$anon$3.$anonfun$run$2(SparkExecuteStatementOperation.scala:198) > at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) > {noformat} > Left side stats: > {noformat} > 36126 bytes, 2150 rows > {noformat} > |info_name|info_value| > |col_name|brand| > |data_type|string| > |comment|NULL| > |min|NULL| > |max|NULL| > |num_nulls|1| > |distinct_count|1980| > |avg_col_len|9| > |max_col_len|38| > |histogram|NULL| > Right side stats: > {noformat} > 13250653950 bytes, 1470064309 rows > {noformat} > |info_name|info_value| > |col_name|brand| > |data_type|string| > |comment|NULL| > |min|NULL| > |max|NULL| > |num_nulls|320713790| > |distinct_count|3896196| > |avg_col_len|8| > |max_col_len|69| > |histogram|NULL| > Join plan: > {noformat} > == Optimized Logical Plan == > Project [brand#612428, leaf_categ_name#612429, leaf_categ_id#612430, > GMV_LC_AMT#615773,
item_price#615665], Statistics(sizeInBytes=2.41E+25 B) > +- Join Inner, ((item_id#615802 = item_id#612432) AND (leaf_categ_id#615805 = > leaf_categ_id#612430)), Statistics(sizeInBytes=3.07E+25 B) >:- Project [brand#612428, leaf_categ_name#612429, leaf_categ_id#612430, > item_id#612432], Statistics(sizeInBytes=55.7 MiB, rowCount=8.11E+5) >: +- Join Inner, (brand#612434 = brand#612428), > Statistics(sizeInBytes=71.1 MiB, rowCount=8.11E+5) >: :- Project [brand#612428, leaf_categ_name#612429, > leaf_categ_id#612430], Statistics(sizeInBytes=136.4 KiB, rowCount=2.15E+3) >: : +- Filter (isnotnull(leaf_categ_id#612430) AND > isnotnull(brand#612428)), Statistics(sizeInBytes=170.0 KiB, rowCount=2.15E+3) >: : +- Relation > spark_catalog.tableA[brand#612428,leaf_categ_name#612429,leaf_categ_id#612430,dom_gmv#612431] > parquet, Statistics(sizeInBytes=170.1 KiB, rowCount=2.15E+3) >: +- Project [item_id#612432, brand#612434], > Statistics(sizeInBytes=38.5 GiB, rowCount=1.15E+9) >:+- Filter (isnotnull(item_id#612432) AND > isnotnull(brand#612434)), Statistics(sizeInBytes=42.8 GiB, rowCount=1.15E+9) >: +- Relation > spark_catalog.tableB[item_id#612432,auct_end_dt#612433,brand#612434] parquet, > Statistics(sizeInBytes=54.8 GiB, rowCount=1.47E+9) >+- Project [item_id#615802, leaf_categ_id#615805, CASE WHEN > tax_state#615824 IN (UK,EU) THEN cast(broundcast(quantity#615828 as > decimal(10,0)) * item_price#615827) + item_sales_tax_amt#615887) / > cast(quantity#615828 as decimal(10,0))), 2) as decimal(38,2)) ELSE > cast(item_price#615827 as decimal(38,2)) END AS item_price#615665, > coalesce(GMV_LC_AMT#615933, 0.00) AS gmv_lc_amt#615773], > Statistics(sizeInBytes=466.4 PiB) > +- Join LeftOuter, (cast(byr_curncy_id#615921 as decimal(9,0)) = > curncy_id#615796), Statistics(sizeInBytes=799.5 PiB) > :- Project [item_id#615802, leaf_categ_id#615805, tax_state#615824, > item_price#615827, quantity#615828, item_sales_tax_amt#615887, > byr_curncy_id#615921, GMV_LC_AMT#615933],
Statistics(sizeInBytes=756.7 TiB) > : +- Join LeftOuter, (cast(lstg_curncy_id#615848 as decimal(9,0)) = > curncy_id#612267), Statistics(sizeInBytes=894.2 TiB) > : :- Project [item_id#615802, leaf_categ_id#615805, > tax_state#615824, item_price#615827, quantity#615828, lstg_curncy_id#615848, > item_sales_tax_amt#615887, byr_curncy_id#615921, GMV_LC_AMT#615933], > Statistics(sizeInBytes=846.3 GiB) > : : +- Filter ((isnotnull(GMV_DT#615926) AND > isnotnull(seller_id#615806)) AND (GMV_DT#615926 >= 2023-09-01)) AND > (GMV_DT#615926 <= 2024-08-31)) AND isnotnull(item_id#615802)) AND > isnotnull(leaf_categ_id#615805)) AND site_id#615804 IN (0,100)) AND NOT > checkou
[jira] [Updated] (SPARK-49536) Add error handling for python streaming data source record prefetching
[ https://issues.apache.org/jira/browse/SPARK-49536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-49536: --- Labels: pull-request-available (was: ) > Add error handling for python streaming data source record prefetching > -- > > Key: SPARK-49536 > URL: https://issues.apache.org/jira/browse/SPARK-49536 > Project: Spark > Issue Type: Task > Components: PySpark, SS >Affects Versions: 4.0.0 >Reporter: Chaoqin Li >Priority: Major > Labels: pull-request-available > > Currently there is an assert that the status code returned from the python worker > is SpecialLengths.START_ARROW_STREAM while the python source runner is prefetching > records. To improve debuggability, check the status code and rethrow a > runtime error with a detailed error message. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
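The change described above amounts to replacing a bare assert on the worker's status code with an explicit check that raises a descriptive error. A minimal sketch, assuming the protocol constant value and the function name (`check_prefetch_status` is a hypothetical helper, not the actual PythonStreamingSourceRunner code):

```python
# Illustrative sketch: turn an opaque assert into a descriptive RuntimeError.
# The constant value below mirrors pyspark.serializers.SpecialLengths but is
# an assumption for this example.

class SpecialLengths:
    START_ARROW_STREAM = -6  # assumed value for illustration

def check_prefetch_status(status_code: int) -> None:
    """Raise a detailed error instead of a bare AssertionError."""
    if status_code != SpecialLengths.START_ARROW_STREAM:
        raise RuntimeError(
            f"Error while prefetching records: expected status code "
            f"{SpecialLengths.START_ARROW_STREAM} (START_ARROW_STREAM), "
            f"got {status_code} from the Python worker"
        )
```

The payoff is that a protocol mismatch now surfaces the offending status code in the message instead of a context-free assertion failure.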
[jira] [Updated] (SPARK-48376) [M1] Support for ITERATE statement
[ https://issues.apache.org/jira/browse/SPARK-48376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-48376: --- Labels: pull-request-available (was: ) > [M1] Support for ITERATE statement > -- > > Key: SPARK-48376 > URL: https://issues.apache.org/jira/browse/SPARK-48376 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: David Milicevic >Assignee: David Milicevic >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Add support for ITERATE statement in WHILE (and other) loops to SQL scripting > parser & interpreter. > This is the same functionality as CONTINUE in other languages. > > For more details: > * Design doc in parent Jira item. > * [SQL ref > spec|https://docs.google.com/document/d/1cpSuR3KxRuTSJ4ZMQ73FJ4_-hjouNNU2zfI4vri6yhs/edit]. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-49424) Consolidate Encoders in sql/api
[ https://issues.apache.org/jira/browse/SPARK-49424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-49424: --- Labels: pull-request-available (was: ) > Consolidate Encoders in sql/api > --- > > Key: SPARK-49424 > URL: https://issues.apache.org/jira/browse/SPARK-49424 > Project: Spark > Issue Type: New Feature > Components: Connect, SQL >Affects Versions: 4.0.0 >Reporter: Herman van Hövell >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-49534) `sql/hive` should not be prepended when `spark-hive_xxx.jar` is not in the classpath
[ https://issues.apache.org/jira/browse/SPARK-49534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-49534: -- Assignee: (was: Apache Spark) > `sql/hive` should not be prepended when `spark-hive_xxx.jar` is not in the > classpath > > > Key: SPARK-49534 > URL: https://issues.apache.org/jira/browse/SPARK-49534 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 4.0.0, 3.5.2 >Reporter: Yang Jie >Priority: Minor > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-49534) `sql/hive` should not be prepended when `spark-hive_xxx.jar` is not in the classpath
[ https://issues.apache.org/jira/browse/SPARK-49534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-49534: -- Assignee: Apache Spark > `sql/hive` should not be prepended when `spark-hive_xxx.jar` is not in the > classpath > > > Key: SPARK-49534 > URL: https://issues.apache.org/jira/browse/SPARK-49534 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 4.0.0, 3.5.2 >Reporter: Yang Jie >Assignee: Apache Spark >Priority: Minor > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-49534) `sql/hive` should not be prepended when `spark-hive_xxx.jar` is not in the classpath
[ https://issues.apache.org/jira/browse/SPARK-49534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-49534: --- Labels: pull-request-available (was: ) > `sql/hive` should not be prepended when `spark-hive_xxx.jar` is not in the > classpath > > > Key: SPARK-49534 > URL: https://issues.apache.org/jira/browse/SPARK-49534 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 4.0.0, 3.5.2 >Reporter: Yang Jie >Priority: Minor > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-49527) Generate Spark Operator Config Property Doc
[ https://issues.apache.org/jira/browse/SPARK-49527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-49527: --- Labels: pull-request-available (was: ) > Generate Spark Operator Config Property Doc > --- > > Key: SPARK-49527 > URL: https://issues.apache.org/jira/browse/SPARK-49527 > Project: Spark > Issue Type: Sub-task > Components: k8s >Affects Versions: kubernetes-operator-0.1.0 >Reporter: Zhou JIANG >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-49505) Create SQL functions to generate random strings or numbers within ranges
[ https://issues.apache.org/jira/browse/SPARK-49505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-49505: --- Labels: pull-request-available (was: ) > Create SQL functions to generate random strings or numbers within ranges > > > Key: SPARK-49505 > URL: https://issues.apache.org/jira/browse/SPARK-49505 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Daniel >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-49526) Windows-style paths are unsupported in ArtifactManager
[ https://issues.apache.org/jira/browse/SPARK-49526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-49526: --- Labels: pull-request-available (was: ) > Windows-style paths are unsupported in ArtifactManager > -- > > Key: SPARK-49526 > URL: https://issues.apache.org/jira/browse/SPARK-49526 > Project: Spark > Issue Type: Bug > Components: Connect >Affects Versions: 4.0.0 >Reporter: Venkata Sai Akhil Gudesa >Priority: Major > Labels: pull-request-available > > Currently, Windows-based clients will run into an issue when using the > `addArtifact` API, as the path passed to the server would contain backslashes > which the server would interpret as part of the file name rather than as a > separator. > E.g. if the client sends the name `pyfiles\abc.txt` to the server, then the > artifact would be written out as `/pyfiles\abc.txt` > instead of the correct `/pyfiles/abc.txt`. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
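The separator problem above can be illustrated with a short sketch. The actual ArtifactManager is Scala; `normalize_artifact_name` is a hypothetical helper showing one possible server-side normalization, with Python's posixpath standing in for the server's path handling:

```python
import posixpath

def normalize_artifact_name(name: str) -> str:
    """Treat backslashes from Windows clients as path separators.

    Without this, a name like 'pyfiles\\abc.txt' is interpreted as a
    single file name rather than 'abc.txt' under the 'pyfiles' directory.
    """
    return name.replace("\\", "/")

# A Windows client sends a backslash-separated name:
raw = "pyfiles\\abc.txt"

# To POSIX path handling the raw name has no separator at all,
# so the whole string is treated as one file name:
assert posixpath.dirname(raw) == ""
assert posixpath.basename(raw) == "pyfiles\\abc.txt"

# After normalization the directory component is recognized:
normalized = normalize_artifact_name(raw)
assert posixpath.dirname(normalized) == "pyfiles"
assert posixpath.basename(normalized) == "abc.txt"
```

A simple replace is only a sketch; a real fix would also need to consider names that legitimately contain backslashes on POSIX systems.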
[jira] [Updated] (SPARK-49525) Log improvement for server side streaming query listener bus listener
[ https://issues.apache.org/jira/browse/SPARK-49525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-49525: --- Labels: pull-request-available (was: ) > Log improvement for server side streaming query listener bus listener > - > > Key: SPARK-49525 > URL: https://issues.apache.org/jira/browse/SPARK-49525 > Project: Spark > Issue Type: Task > Components: Connect, SS >Affects Versions: 4.0.0 >Reporter: Wei Liu >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-49518) Use build-helper-maven-plugin to manage the code for volcano
[ https://issues.apache.org/jira/browse/SPARK-49518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-49518: -- Assignee: Apache Spark > Use build-helper-maven-plugin to manage the code for volcano > > > Key: SPARK-49518 > URL: https://issues.apache.org/jira/browse/SPARK-49518 > Project: Spark > Issue Type: Improvement > Components: Build, Kubernetes >Affects Versions: 4.0.0 >Reporter: Yang Jie >Assignee: Apache Spark >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-49507) Fix Expected only partition pruning predicates exception
[ https://issues.apache.org/jira/browse/SPARK-49507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-49507: --- Labels: pull-request-available (was: ) > Fix Expected only partition pruning predicates exception > > > Key: SPARK-49507 > URL: https://issues.apache.org/jira/browse/SPARK-49507 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.5.0, 4.0.0, 3.5.3 >Reporter: Yuming Wang >Priority: Major > Labels: pull-request-available > > How to reproduce: > {code:scala} > sql("CREATE TABLE t (ID BIGINT, DT STRING) USING parquet PARTITIONED BY (DT)") > sql("set spark.sql.hive.metastorePartitionPruningFastFallback=true") > sql("select * from t where dt=20240820").show > {code} > {noformat} > org.apache.spark.sql.AnalysisException: Expected only partition pruning > predicates: List(isnotnull(DT#21), (cast(DT#21 as bigint) = 20240820)). > at > org.apache.spark.sql.errors.QueryCompilationErrors$.nonPartitionPruningPredicatesNotExpectedError(QueryCompilationErrors.scala:2414) > at > org.apache.spark.sql.catalyst.catalog.ExternalCatalogUtils$.generatePartitionPredicateByFilter(ExternalCatalo > {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-49519) Combine options of table and relation when create CSVScanBuilder
[ https://issues.apache.org/jira/browse/SPARK-49519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-49519: --- Labels: pull-request-available (was: ) > Combine options of table and relation when create CSVScanBuilder > > > Key: SPARK-49519 > URL: https://issues.apache.org/jira/browse/SPARK-49519 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Jiayi Liu >Priority: Major > Labels: pull-request-available > > Currently, the {{CSVTable}} only uses the options from {{relation}} when > constructing the {{CSVScanBuilder}}, which leads to the omission of the > contents of {{CSVTable.options}}. For the {{TableCatalog}}, the > {{dsOptions}} can be set into the {{CSVTable.options}} returned by the > {{TableCatalog.loadTable}} method. If only the relation {{options}} are used > here, the {{TableCatalog}} will not be able to pass {{dsOptions}} that > contain CSV options to {{CSVScan}}. > Combining the two sets of options is the better approach. > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
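The proposed fix boils down to merging the two option maps when building the scan. A dictionary sketch of that merge; the precedence shown (relation options winning on conflicting keys) is an assumption for illustration, and the actual CSVTable code is Scala:

```python
def combine_scan_options(table_options: dict, relation_options: dict) -> dict:
    """Merge both option maps into one for the scan builder.

    Relation-level options overlay table-level ones on conflicting keys
    (an assumed precedence, for illustration only).
    """
    merged = dict(table_options)     # start from CSVTable.options (e.g. dsOptions)
    merged.update(relation_options)  # overlay the relation's options
    return merged

# Table-level options set via TableCatalog.loadTable (hypothetical values):
table_opts = {"header": "true", "delimiter": ";"}
# Options carried by the relation itself:
relation_opts = {"delimiter": ",", "path": "/data/x"}

merged = combine_scan_options(table_opts, relation_opts)
# Nothing from the table side is silently dropped anymore:
assert merged == {"header": "true", "delimiter": ",", "path": "/data/x"}
```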
[jira] [Updated] (SPARK-49516) Upgrade the minimum K8s version to v1.28
[ https://issues.apache.org/jira/browse/SPARK-49516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-49516: --- Labels: pull-request-available (was: ) > Upgrade the minimum K8s version to v1.28 > > > Key: SPARK-49516 > URL: https://issues.apache.org/jira/browse/SPARK-49516 > Project: Spark > Issue Type: Sub-task > Components: Kubernetes, Project Infra >Affects Versions: kubernetes-operator-0.1.0 >Reporter: Dongjoon Hyun >Priority: Minor > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-49509) Use Platform.allocateDirectBuffer instead of ByteBuffer.allocateDirect
[ https://issues.apache.org/jira/browse/SPARK-49509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-49509: -- Assignee: Apache Spark > Use Platform.allocateDirectBuffer instead of ByteBuffer.allocateDirect > -- > > Key: SPARK-49509 > URL: https://issues.apache.org/jira/browse/SPARK-49509 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: dzcxzl >Assignee: Apache Spark >Priority: Minor > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-49509) Use Platform.allocateDirectBuffer instead of ByteBuffer.allocateDirect
[ https://issues.apache.org/jira/browse/SPARK-49509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-49509: --- Labels: pull-request-available (was: ) > Use Platform.allocateDirectBuffer instead of ByteBuffer.allocateDirect > -- > > Key: SPARK-49509 > URL: https://issues.apache.org/jira/browse/SPARK-49509 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: dzcxzl >Priority: Minor > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-49506) Optimize ArrayBinarySearch for foldable array
[ https://issues.apache.org/jira/browse/SPARK-49506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-49506: --- Labels: pull-request-available (was: ) > Optimize ArrayBinarySearch for foldable array > - > > Key: SPARK-49506 > URL: https://issues.apache.org/jira/browse/SPARK-49506 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Assignee: BingKun Pan >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-49307) Support Kryo Serialization with AgnosticEncoders
[ https://issues.apache.org/jira/browse/SPARK-49307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-49307: --- Labels: pull-request-available (was: ) > Support Kryo Serialization with AgnosticEncoders > > > Key: SPARK-49307 > URL: https://issues.apache.org/jira/browse/SPARK-49307 > Project: Spark > Issue Type: New Feature > Components: Connect >Affects Versions: 4.0.0 >Reporter: Herman van Hövell >Priority: Major > Labels: pull-request-available > > Add support for Kryo based serialization to Agnostic encoders. This will > allow us to port the entire Encoders class from sql/core to sql/api. > Unfortunately supporting connect is not really possible at this moment. We > cannot share the configuration of the Kryo objects between the server and > connect. This is not possible because connect - by design - does > not have all the classes needed on its classpath. This makes constructing the > same configuration (with the same class ids) almost impossible. On top of > this, backwards compatibility will be a problem. > For connect the only way forward is to have a separately configured version > of the kryo serializer that leverages hard coded class ids. That is probably > going to need some form of configurability. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48965) toJSON produces wrong values if DecimalType information is lost in as[Product]
[ https://issues.apache.org/jira/browse/SPARK-48965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-48965: --- Labels: correctness pull-request-available (was: correctness) > toJSON produces wrong values if DecimalType information is lost in as[Product] > -- > > Key: SPARK-48965 > URL: https://issues.apache.org/jira/browse/SPARK-48965 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.1, 3.5.1 >Reporter: Dmitry Lapshin >Priority: Major > Labels: correctness, pull-request-available > > Consider this example: > {code:scala} > package com.jetbrains.jetstat.etl > import org.apache.spark.sql.SparkSession > import org.apache.spark.sql.types.DecimalType > object A { > case class Example(x: BigDecimal) > def main(args: Array[String]): Unit = { > val spark = SparkSession.builder() > .master("local[1]") > .getOrCreate() > import spark.implicits._ > val originalRaw = BigDecimal("123.456") > val original = Example(originalRaw) > val ds1 = spark.createDataset(Seq(original)) > val ds2 = ds1 > .withColumn("x", $"x" cast DecimalType(12, 6)) > val ds3 = ds2 > .as[Example] > println(s"DS1: schema=${ds1.schema}, > encoder.schema=${ds1.encoder.schema}") > println(s"DS2: schema=${ds1.schema}, > encoder.schema=${ds2.encoder.schema}") > println(s"DS3: schema=${ds1.schema}, > encoder.schema=${ds3.encoder.schema}") > val json1 = ds1.toJSON.collect().head > val json2 = ds2.toJSON.collect().head > val json3 = ds3.toJSON.collect().head > val collect1 = ds1.collect().head > val collect2_ = ds2.collect().head > val collect2 = collect2_.getDecimal(collect2_.fieldIndex("x")) > val collect3 = ds3.collect().head > println(s"Original: $original (scale = ${original.x.scale}, precision = > ${original.x.precision})") > println(s"Collect1: $collect1 (scale = ${collect1.x.scale}, precision = > ${collect1.x.precision})") > println(s"Collect2: $collect2 (scale = ${collect2.scale}, precision = > ${collect2.precision})") > 
println(s"Collect3: $collect3 (scale = ${collect3.x.scale}, precision = > ${collect3.x.precision})") > println(s"json1: $json1") > println(s"json2: $json2") > println(s"json3: $json3") > } > } > {code} > Running it, you'd see that json3 contains wrong data. After a bit of > debugging (I'm not deeply familiar with Spark internals), I've found that: > * In-memory representation of the data in this example used {{UnsafeRow}}, > whose {{.getDecimal}} uses compression to store small Decimal values as > longs, but doesn't remember decimal sizing parameters, > * However, there are at least two sources for precision & scale to pass to > that method: {{Dataset.schema}} (which is based on query execution, always > contains 38,18 for me) and {{Dataset.encoder.schema}} (that gets updated in > `ds2` to 12,6 but then is reset in `ds3`). Also, there is a > {{Dataset.deserializer}} that seems to be combining those two non-trivially. > * This doesn't seem to affect {{Dataset.collect()}} methods since they use > {{deserializer}}, but {{Dataset.toJSON}} only uses the first schema. > It seems to me that either {{.toJSON}} should be more aware of what's going on > or {{.as[]}} should be doing something else. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-49413) Create shared RuntimeConf interface
[ https://issues.apache.org/jira/browse/SPARK-49413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-49413: --- Labels: pull-request-available (was: ) > Create shared RuntimeConf interface > --- > > Key: SPARK-49413 > URL: https://issues.apache.org/jira/browse/SPARK-49413 > Project: Spark > Issue Type: New Feature > Components: Connect, SQL >Affects Versions: 4.0.0 >Reporter: Herman van Hövell >Priority: Major > Labels: pull-request-available > > Create a shared RuntimeConf interface in org.apache.spark.sql that is shared > between Classic and Connect. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-49504) Add `jjwt` profile
[ https://issues.apache.org/jira/browse/SPARK-49504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-49504: --- Labels: pull-request-available (was: ) > Add `jjwt` profile > -- > > Key: SPARK-49504 > URL: https://issues.apache.org/jira/browse/SPARK-49504 > Project: Spark > Issue Type: Sub-task > Components: Build >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > > This issue aims to add a new profile `jjwt` to provide `jjwt-impl` and > `jjwt-jackson` jars files in a Spark distribution -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-49467) Add support for state data source reader and list state
[ https://issues.apache.org/jira/browse/SPARK-49467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-49467: --- Labels: pull-request-available (was: ) > Add support for state data source reader and list state > --- > > Key: SPARK-49467 > URL: https://issues.apache.org/jira/browse/SPARK-49467 > Project: Spark > Issue Type: Task > Components: Structured Streaming >Affects Versions: 4.0.0 >Reporter: Anish Shrigondekar >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-27085) Migrate CSV to File Data Source V2
[ https://issues.apache.org/jira/browse/SPARK-27085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-27085: --- Labels: pull-request-available (was: ) > Migrate CSV to File Data Source V2 > -- > > Key: SPARK-27085 > URL: https://issues.apache.org/jira/browse/SPARK-27085 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Gengliang Wang >Assignee: Gengliang Wang >Priority: Major > Labels: pull-request-available > Fix For: 3.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-49502) Avoid NPE in SparkEnv.get.shuffleManager.unregisterShuffle
[ https://issues.apache.org/jira/browse/SPARK-49502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-49502: --- Labels: pull-request-available (was: ) > Avoid NPE in SparkEnv.get.shuffleManager.unregisterShuffle > -- > > Key: SPARK-49502 > URL: https://issues.apache.org/jira/browse/SPARK-49502 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: dzcxzl >Priority: Minor > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-49501) Catalog createTable API is double-escaping paths
[ https://issues.apache.org/jira/browse/SPARK-49501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-49501: --- Labels: pull-request-available (was: ) > Catalog createTable API is double-escaping paths > > > Key: SPARK-49501 > URL: https://issues.apache.org/jira/browse/SPARK-49501 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Christos Stavrakakis >Priority: Major > Labels: pull-request-available > > Creating an external table using {{spark.catalog.createTable}} results in > incorrect escaping of special chars in paths. > Consider the following code: > {{spark.catalog.createTable("testTable", source = "parquet", > schema = new StructType().add("id", "int"), description = "", options = > Map("path" -> "/tmp/test table"))}} > The above call creates a table that is stored in {{/tmp/test%20table}} > instead of {{/tmp/test table}}. Note that this behaviour is different > from the SQL API, e.g. {{create table testTable(id int) using parquet > location '/tmp/test table'}} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
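The symptom above is classic double-escaping: a path gets percent-encoded one more time than intended, so the encoded form is stored as the literal location. A short illustration using Python's urllib.parse (the actual escaping happens in Spark's path/URI handling, not in Python):

```python
from urllib.parse import quote, unquote

path = "/tmp/test table"

# One layer of escaping is what a URI-producing layer legitimately does:
once = quote(path)
assert once == "/tmp/test%20table"

# If a second layer escapes the already-escaped string, the '%' itself
# is encoded, and the stored location no longer matches the user's path:
twice = quote(once)
assert twice == "/tmp/test%2520table"

# A single unquote only undoes one layer, so the consumer that decodes
# once still sees the percent-encoded form instead of the real path:
assert unquote(twice) == once
assert unquote(once) == path
```

This is why the table lands in `/tmp/test%20table`: one layer decoded its input once, but the input had been encoded twice (or encoded when it was already a plain filesystem path).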
[jira] [Updated] (SPARK-49414) Create a shared DataFrameReader interface
[ https://issues.apache.org/jira/browse/SPARK-49414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-49414: --- Labels: pull-request-available (was: ) > Create a shared DataFrameReader interface > - > > Key: SPARK-49414 > URL: https://issues.apache.org/jira/browse/SPARK-49414 > Project: Spark > Issue Type: New Feature > Components: Connect, SQL >Affects Versions: 4.0.0 >Reporter: Herman van Hövell >Priority: Major > Labels: pull-request-available > > Create a shared DataFrameReader in org.apache.spark.sql.api that is shared > between Classic and Connect. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48698) Support analyze column stats for tables with collated columns
[ https://issues.apache.org/jira/browse/SPARK-48698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-48698: --- Labels: pull-request-available (was: ) > Support analyze column stats for tables with collated columns > - > > Key: SPARK-48698 > URL: https://issues.apache.org/jira/browse/SPARK-48698 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 4.0.0 >Reporter: Nikola Mandic >Priority: Major > Labels: pull-request-available > > Following sequence fails: > {code:java} > > create table t(s string collate utf8_lcase) using parquet; > > insert into t values ('A'); > > analyze table t compute statistics for all columns; > [UNSUPPORTED_FEATURE.ANALYZE_UNSUPPORTED_COLUMN_TYPE] The feature is not > supported: The ANALYZE TABLE FOR COLUMNS command does not support the type > "STRING COLLATE UTF8_LCASE" of the column `s` in the table > `spark_catalog`.`default`.`t`. SQLSTATE: 0A000 > {code} > Users should be able to run ANALYZE commands on tables which have columns > with collated type. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48348) [M0] Support for LEAVE statement
[ https://issues.apache.org/jira/browse/SPARK-48348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-48348: --- Labels: pull-request-available (was: ) > [M0] Support for LEAVE statement > > > Key: SPARK-48348 > URL: https://issues.apache.org/jira/browse/SPARK-48348 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: David Milicevic >Priority: Major > Labels: pull-request-available > > Add support for LEAVE statement in WHILE (and other) loops to SQL scripting > parser & interpreter. > This is the same functionality as BREAK in other languages. > > For more details: > * Design doc in parent Jira item. > * [SQL ref > spec|https://docs.google.com/document/d/1cpSuR3KxRuTSJ4ZMQ73FJ4_-hjouNNU2zfI4vri6yhs/edit]. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-49202) Register `binary_search_for_buckets` in the Scala side
[ https://issues.apache.org/jira/browse/SPARK-49202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-49202: -- Assignee: Ruifeng Zheng (was: Apache Spark) > Register `binary_search_for_buckets` in the Scala side > -- > > Key: SPARK-49202 > URL: https://issues.apache.org/jira/browse/SPARK-49202 > Project: Spark > Issue Type: Sub-task > Components: PS >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-49202) Register `binary_search_for_buckets` in the Scala side
[ https://issues.apache.org/jira/browse/SPARK-49202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-49202: -- Assignee: Apache Spark (was: Ruifeng Zheng) > Register `binary_search_for_buckets` in the Scala side > -- > > Key: SPARK-49202 > URL: https://issues.apache.org/jira/browse/SPARK-49202 > Project: Spark > Issue Type: Sub-task > Components: PS >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Assignee: Apache Spark >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org