[jira] [Commented] (SPARK-43956) Fix the bug doesn't display column's sql for Percentile[Cont|Disc]
[ https://issues.apache.org/jira/browse/SPARK-43956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17728926#comment-17728926 ] jiaan.geng commented on SPARK-43956: resolved by https://github.com/apache/spark/pull/41436 > Fix the bug doesn't display column's sql for Percentile[Cont|Disc] > -- > > Key: SPARK-43956 > URL: https://issues.apache.org/jira/browse/SPARK-43956 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.0 >Reporter: jiaan.geng >Assignee: jiaan.geng >Priority: Major > Fix For: 3.5.0 > > > Last year, I committed the Percentile[Cont|Disc] functions for Spark SQL. > Recently, I found that the sql method of Percentile[Cont|Disc] does not render > the column's SQL correctly. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-43956) Fix the bug doesn't display column's sql for Percentile[Cont|Disc]
[ https://issues.apache.org/jira/browse/SPARK-43956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang resolved SPARK-43956. - Assignee: jiaan.geng Resolution: Fixed > Fix the bug doesn't display column's sql for Percentile[Cont|Disc] > -- > > Key: SPARK-43956 > URL: https://issues.apache.org/jira/browse/SPARK-43956 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.0 >Reporter: jiaan.geng >Assignee: jiaan.geng >Priority: Major > Fix For: 3.5.0 > > > Last year, I committed the Percentile[Cont|Disc] functions for Spark SQL. > Recently, I found that the sql method of Percentile[Cont|Disc] does not render > the column's SQL correctly.
[jira] [Created] (SPARK-43956) Fix the bug doesn't display column's sql for Percentile[Cont|Disc]
jiaan.geng created SPARK-43956: -- Summary: Fix the bug doesn't display column's sql for Percentile[Cont|Disc] Key: SPARK-43956 URL: https://issues.apache.org/jira/browse/SPARK-43956 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.3.0 Reporter: jiaan.geng Fix For: 3.5.0 Last year, I committed the Percentile[Cont|Disc] functions for Spark SQL. Recently, I found that the sql method of Percentile[Cont|Disc] does not render the column's SQL correctly.
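To make the report above concrete, here is a minimal, hypothetical Python sketch (the class and method names are illustrative, not Spark's actual Scala internals) of the kind of fix involved: an ordered-set aggregate's sql method should interpolate the sort column's own SQL into the rendered string rather than omit it.

```python
# Hypothetical toy model of an expression tree whose sql() method must
# embed the child column's SQL. Not Spark's real implementation.

class Column:
    def __init__(self, name):
        self.name = name

    def sql(self):
        return self.name


class PercentileCont:
    def __init__(self, percentage, sort_child):
        self.percentage = percentage
        self.sort_child = sort_child

    def sql(self):
        # The essence of the fix: render the sort column's SQL inside
        # the WITHIN GROUP clause instead of dropping it.
        return (f"percentile_cont({self.percentage}) "
                f"WITHIN GROUP (ORDER BY {self.sort_child.sql()})")


print(PercentileCont(0.25, Column("v")).sql())
# -> percentile_cont(0.25) WITHIN GROUP (ORDER BY v)
```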
[jira] [Updated] (SPARK-43954) Upgrade sbt from 1.8.3 to 1.9.0
[ https://issues.apache.org/jira/browse/SPARK-43954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] BingKun Pan updated SPARK-43954: Description: Release notes: [https://github.com/sbt/sbt/releases/tag/v1.9.0] > Upgrade sbt from 1.8.3 to 1.9.0 > --- > > Key: SPARK-43954 > URL: https://issues.apache.org/jira/browse/SPARK-43954 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.5.0 >Reporter: BingKun Pan >Priority: Minor > > Release notes: [https://github.com/sbt/sbt/releases/tag/v1.9.0]
[jira] [Updated] (SPARK-43955) Upgrade `scalafmt` from 3.7.3 to 3.7.4
[ https://issues.apache.org/jira/browse/SPARK-43955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] BingKun Pan updated SPARK-43955: Description: Release notes: https://github.com/scalameta/scalafmt/releases/tag/v3.7.4 > Upgrade `scalafmt` from 3.7.3 to 3.7.4 > -- > > Key: SPARK-43955 > URL: https://issues.apache.org/jira/browse/SPARK-43955 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.5.0 >Reporter: BingKun Pan >Priority: Minor > > Release notes: https://github.com/scalameta/scalafmt/releases/tag/v3.7.4
[jira] [Created] (SPARK-43955) Upgrade `scalafmt` from 3.7.3 to 3.7.4
BingKun Pan created SPARK-43955: --- Summary: Upgrade `scalafmt` from 3.7.3 to 3.7.4 Key: SPARK-43955 URL: https://issues.apache.org/jira/browse/SPARK-43955 Project: Spark Issue Type: Improvement Components: Build Affects Versions: 3.5.0 Reporter: BingKun Pan
[jira] [Updated] (SPARK-43954) Upgrade sbt from 1.8.3 to 1.9.0
[ https://issues.apache.org/jira/browse/SPARK-43954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] BingKun Pan updated SPARK-43954: Summary: Upgrade sbt from 1.8.3 to 1.9.0 (was: Upgrade sbt from 1.8.2 to 1.9.0) > Upgrade sbt from 1.8.3 to 1.9.0 > --- > > Key: SPARK-43954 > URL: https://issues.apache.org/jira/browse/SPARK-43954 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.5.0 >Reporter: BingKun Pan >Priority: Minor >
[jira] [Created] (SPARK-43954) Upgrade sbt from 1.8.2 to 1.9.0
BingKun Pan created SPARK-43954: --- Summary: Upgrade sbt from 1.8.2 to 1.9.0 Key: SPARK-43954 URL: https://issues.apache.org/jira/browse/SPARK-43954 Project: Spark Issue Type: Improvement Components: Build Affects Versions: 3.5.0 Reporter: BingKun Pan
[jira] [Resolved] (SPARK-43380) Fix Avro data type conversion issues to avoid producing incorrect results
[ https://issues.apache.org/jira/browse/SPARK-43380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gengliang Wang resolved SPARK-43380. Fix Version/s: 3.5.0 (was: 3.4.0) Resolution: Fixed Issue resolved by pull request 41052 [https://github.com/apache/spark/pull/41052] > Fix Avro data type conversion issues to avoid producing incorrect results > - > > Key: SPARK-43380 > URL: https://issues.apache.org/jira/browse/SPARK-43380 > Project: Spark > Issue Type: Bug > Components: Connect >Affects Versions: 3.4.0 >Reporter: Zerui Bao >Priority: Major > Fix For: 3.5.0 > > > We found the following issues with open-source Avro: > * Interval types can be read as date or timestamp types that would lead to > wildly different results > * Decimal types can be read with lower precision, that leads to data being > read as {{null}} instead of suggesting that a wider decimal format should be > provided
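The decimal problem described above can be illustrated with a small, hypothetical Python sketch (using the standard decimal module; this is not Spark's Avro reader, and the function name is invented): a reader that silently returns null when a value exceeds the declared precision loses data, whereas the fix is to reject the narrower schema instead.

```python
from decimal import Decimal

def read_decimal(value, precision, scale):
    """Toy reader mimicking the silent-null behaviour described in the
    issue: if `value` does not fit the declared (precision, scale), it
    returns None instead of signalling that a wider decimal is needed."""
    quantized = value.quantize(Decimal(1).scaleb(-scale))
    digits = len(quantized.as_tuple().digits)
    if digits > precision:
        return None  # data silently lost -- the bug being fixed
    return quantized

print(read_decimal(Decimal("123.45"), 5, 2))     # fits: 123.45
print(read_decimal(Decimal("123456.78"), 5, 2))  # too wide: None
```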
[jira] [Resolved] (SPARK-36612) Support left outer join build left or right outer join build right in shuffled hash join
[ https://issues.apache.org/jira/browse/SPARK-36612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun resolved SPARK-36612. -- Fix Version/s: 3.5.0 Resolution: Fixed > Support left outer join build left or right outer join build right in > shuffled hash join > > > Key: SPARK-36612 > URL: https://issues.apache.org/jira/browse/SPARK-36612 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: mcdull_zhang >Assignee: Szehon Ho >Priority: Major > Fix For: 3.5.0 > > > Currently, Spark SQL does not support building the left side for a left outer > join (or the right side for a right outer join). > However, in our production environment there are many scenarios where small > tables are left-joined to large tables, and the large tables often have data > skew (which AQE currently cannot handle). > Inspired by SPARK-32399, we can use a similar approach to support building the > left side for a left outer join. > I think this change would be very useful; what do other members think?
[jira] [Assigned] (SPARK-36612) Support left outer join build left or right outer join build right in shuffled hash join
[ https://issues.apache.org/jira/browse/SPARK-36612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun reassigned SPARK-36612: Assignee: Szehon Ho > Support left outer join build left or right outer join build right in > shuffled hash join > > > Key: SPARK-36612 > URL: https://issues.apache.org/jira/browse/SPARK-36612 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: mcdull_zhang >Assignee: Szehon Ho >Priority: Major > > Currently, Spark SQL does not support building the left side for a left outer > join (or the right side for a right outer join). > However, in our production environment there are many scenarios where small > tables are left-joined to large tables, and the large tables often have data > skew (which AQE currently cannot handle). > Inspired by SPARK-32399, we can use a similar approach to support building the > left side for a left outer join. > I think this change would be very useful; what do other members think?
[jira] [Created] (SPARK-43953) Remove pass
Bjørn Jørgensen created SPARK-43953: --- Summary: Remove pass Key: SPARK-43953 URL: https://issues.apache.org/jira/browse/SPARK-43953 Project: Spark Issue Type: Improvement Components: Connect Affects Versions: 3.5.0 Reporter: Bjørn Jørgensen
[jira] [Resolved] (SPARK-43904) Upgrade jackson to 2.15.2
[ https://issues.apache.org/jira/browse/SPARK-43904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-43904. --- Fix Version/s: 3.5.0 Resolution: Fixed Issue resolved by pull request 41414 [https://github.com/apache/spark/pull/41414] > Upgrade jackson to 2.15.2 > - > > Key: SPARK-43904 > URL: https://issues.apache.org/jira/browse/SPARK-43904 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.5.0 >Reporter: BingKun Pan >Assignee: BingKun Pan >Priority: Minor > Fix For: 3.5.0 > >
[jira] [Assigned] (SPARK-43950) Upgrade kubernetes-client to 6.7.0
[ https://issues.apache.org/jira/browse/SPARK-43950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-43950: - Assignee: Bjørn Jørgensen > Upgrade kubernetes-client to 6.7.0 > -- > > Key: SPARK-43950 > URL: https://issues.apache.org/jira/browse/SPARK-43950 > Project: Spark > Issue Type: Dependency upgrade > Components: Build, Kubernetes >Affects Versions: 3.5.0 >Reporter: Bjørn Jørgensen >Assignee: Bjørn Jørgensen >Priority: Major >
[jira] [Resolved] (SPARK-43950) Upgrade kubernetes-client to 6.7.0
[ https://issues.apache.org/jira/browse/SPARK-43950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-43950. --- Fix Version/s: 3.5.0 Resolution: Fixed Issue resolved by pull request 41434 [https://github.com/apache/spark/pull/41434] > Upgrade kubernetes-client to 6.7.0 > -- > > Key: SPARK-43950 > URL: https://issues.apache.org/jira/browse/SPARK-43950 > Project: Spark > Issue Type: Dependency upgrade > Components: Build, Kubernetes >Affects Versions: 3.5.0 >Reporter: Bjørn Jørgensen >Assignee: Bjørn Jørgensen >Priority: Major > Fix For: 3.5.0 > >
[jira] [Assigned] (SPARK-43904) Upgrade jackson to 2.15.2
[ https://issues.apache.org/jira/browse/SPARK-43904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-43904: - Assignee: BingKun Pan > Upgrade jackson to 2.15.2 > - > > Key: SPARK-43904 > URL: https://issues.apache.org/jira/browse/SPARK-43904 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.5.0 >Reporter: BingKun Pan >Assignee: BingKun Pan >Priority: Minor >
[jira] [Assigned] (SPARK-43945) Fix bug for `SQLQueryTestSuite` when run on local env
[ https://issues.apache.org/jira/browse/SPARK-43945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Gekk reassigned SPARK-43945: Assignee: BingKun Pan > Fix bug for `SQLQueryTestSuite` when run on local env > - > > Key: SPARK-43945 > URL: https://issues.apache.org/jira/browse/SPARK-43945 > Project: Spark > Issue Type: Bug > Components: SQL, Tests >Affects Versions: 3.5.0 >Reporter: BingKun Pan >Assignee: BingKun Pan >Priority: Minor >
[jira] [Resolved] (SPARK-43945) Fix bug for `SQLQueryTestSuite` when run on local env
[ https://issues.apache.org/jira/browse/SPARK-43945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Gekk resolved SPARK-43945. -- Fix Version/s: 3.5.0 Resolution: Fixed Issue resolved by pull request 41431 [https://github.com/apache/spark/pull/41431] > Fix bug for `SQLQueryTestSuite` when run on local env > - > > Key: SPARK-43945 > URL: https://issues.apache.org/jira/browse/SPARK-43945 > Project: Spark > Issue Type: Bug > Components: SQL, Tests >Affects Versions: 3.5.0 >Reporter: BingKun Pan >Assignee: BingKun Pan >Priority: Minor > Fix For: 3.5.0 > >
[jira] [Updated] (SPARK-43922) Add named argument support in parser for function call
[ https://issues.apache.org/jira/browse/SPARK-43922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Yu updated SPARK-43922: --- Summary: Add named argument support in parser for function call (was: Add named parameter support in parser for function call) > Add named argument support in parser for function call > -- > > Key: SPARK-43922 > URL: https://issues.apache.org/jira/browse/SPARK-43922 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 3.5.0 >Reporter: Richard Yu >Priority: Major > > Today, we are implementing named parameter support for user defined > functions, some built-in functions, and table-valued functions. For the first > step towards building such a feature, we need to make some requisite changes > in the parser. > To accomplish this, in this issue, we plan to add some new syntax tokens to > the parser in Spark. Changes will also be made in the abstract syntax tree > builder as well to reflect these new tokens. Such changes will first be > restricted to normal function calls (table value functions will be treated > separately).
[jira] [Updated] (SPARK-43922) Add named argument support in parser for function call
[ https://issues.apache.org/jira/browse/SPARK-43922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Yu updated SPARK-43922: --- Description: Today, we are implementing named argument support for user defined functions, some built-in functions, and table-valued functions. For the first step towards building such a feature, we need to make some requisite changes in the parser. To accomplish this, in this issue, we plan to add some new syntax tokens to the parser in Spark. Changes will also be made in the abstract syntax tree builder as well to reflect these new tokens. Such changes will first be restricted to normal function calls (table value functions will be treated separately). was: Today, we are implementing named parameter support for user defined functions, some built-in functions, and table-valued functions. For the first step towards building such a feature, we need to make some requisite changes in the parser. To accomplish this, in this issue, we plan to add some new syntax tokens to the parser in Spark. Changes will also be made in the abstract syntax tree builder as well to reflect these new tokens. Such changes will first be restricted to normal function calls (table value functions will be treated separately). > Add named argument support in parser for function call > -- > > Key: SPARK-43922 > URL: https://issues.apache.org/jira/browse/SPARK-43922 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 3.5.0 >Reporter: Richard Yu >Priority: Major > > Today, we are implementing named argument support for user defined functions, > some built-in functions, and table-valued functions. For the first step > towards building such a feature, we need to make some requisite changes in > the parser. > To accomplish this, in this issue, we plan to add some new syntax tokens to > the parser in Spark. Changes will also be made in the abstract syntax tree > builder as well to reflect these new tokens. Such changes will first be > restricted to normal function calls (table value functions will be treated > separately).
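To illustrate what named-argument parsing buys, here is a minimal, hypothetical Python sketch. It assumes a `name => value` argument form for illustration (Spark's actual grammar tokens may differ) and splits a function call's argument list into positional and named arguments.

```python
import re

# Hypothetical sketch: split an argument list like "'abc', upperChar => 'X'"
# into positional args and named args. A real parser would tokenize properly;
# this naive comma split breaks on nested commas and is illustration only.
ARG_RE = re.compile(r"^\s*(?:(\w+)\s*=>\s*)?(.+?)\s*$")

def parse_args(arglist):
    positional, named = [], {}
    for raw in arglist.split(","):
        name, value = ARG_RE.match(raw).groups()
        if name:
            named[name] = value      # named argument: name => value
        else:
            positional.append(value)  # plain positional argument
    return positional, named

print(parse_args("'abc', upperChar => 'X'"))
# -> (["'abc'"], {'upperChar': "'X'"})
```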
[jira] [Commented] (SPARK-43952) Cancel Spark jobs not only by a single "jobgroup", but allow multiple "job tags"
[ https://issues.apache.org/jira/browse/SPARK-43952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17728791#comment-17728791 ] Juliusz Sompolski commented on SPARK-43952: --- Indirectly related to https://issues.apache.org/jira/browse/SPARK-43754, so that Spark Connect query cancellation does not conflict with other places that set job groups. > Cancel Spark jobs not only by a single "jobgroup", but allow multiple "job > tags" > > > Key: SPARK-43952 > URL: https://issues.apache.org/jira/browse/SPARK-43952 > Project: Spark > Issue Type: New Feature > Components: Spark Core >Affects Versions: 3.5.0 >Reporter: Juliusz Sompolski >Priority: Major > > Currently, the only way to cancel running Spark jobs is > SparkContext.cancelJobGroup, using a job group name that was previously set > using SparkContext.setJobGroup. This is problematic if multiple different > parts of the system want to do cancellation and set their own ids. > For example, > [https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/BroadcastExchangeExec.scala#L133] > sets its own job group, which may override the job group set by the user. As a > result, if the user cancels the job group they set, it will not cancel the > broadcast jobs launched from within their jobs. > As a solution, consider adding SparkContext.addJobTag / > SparkContext.removeJobTag, which would allow attaching multiple "tags" to > jobs, and introduce SparkContext.cancelJobsByTag for more flexible > cancellation of jobs.
[jira] [Commented] (SPARK-43754) Spark Connect Session & Query lifecycle
[ https://issues.apache.org/jira/browse/SPARK-43754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17728790#comment-17728790 ] Juliusz Sompolski commented on SPARK-43754: --- Indirectly related to https://issues.apache.org/jira/browse/SPARK-43952 (so that Spark Connect query cancellation does not conflict with other places that set job groups) > Spark Connect Session & Query lifecycle > --- > > Key: SPARK-43754 > URL: https://issues.apache.org/jira/browse/SPARK-43754 > Project: Spark > Issue Type: Epic > Components: Connect >Affects Versions: 3.5.0 >Reporter: Juliusz Sompolski >Priority: Major > > Currently, queries in Spark Connect are executed within the RPC handler. > We want to detach the RPC interface from actual sessions and execution, so > that we can make the interface more flexible: > * maintain long-running sessions, independent of an unbroken GRPC channel > * be able to cancel queries > * offer ways to deliver query results other than a push from the server
[jira] [Created] (SPARK-43952) Cancel Spark jobs not only by a single "jobgroup", but allow multiple "job tags"
Juliusz Sompolski created SPARK-43952: - Summary: Cancel Spark jobs not only by a single "jobgroup", but allow multiple "job tags" Key: SPARK-43952 URL: https://issues.apache.org/jira/browse/SPARK-43952 Project: Spark Issue Type: New Feature Components: Spark Core Affects Versions: 3.5.0 Reporter: Juliusz Sompolski Currently, the only way to cancel running Spark jobs is SparkContext.cancelJobGroup, using a job group name that was previously set using SparkContext.setJobGroup. This is problematic if multiple different parts of the system want to do cancellation and set their own ids. For example, [https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/BroadcastExchangeExec.scala#L133] sets its own job group, which may override the job group set by the user. As a result, if the user cancels the job group they set, it will not cancel the broadcast jobs launched from within their jobs. As a solution, consider adding SparkContext.addJobTag / SparkContext.removeJobTag, which would allow attaching multiple "tags" to jobs, and introduce SparkContext.cancelJobsByTag for more flexible cancellation of jobs.
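The proposed semantics above can be sketched in a few lines. This is a hypothetical Python toy model, not Spark's implementation: the method names follow the proposal (addJobTag / removeJobTag / cancelJobsByTag), the `JobScheduler` class and its internals are invented for illustration. The point is that a second component adding its own tag does not clobber the user's tag, so cancel-by-tag still reaches every job the user started.

```python
import threading

class JobScheduler:
    """Toy model of the proposal: the submitting thread carries a set of
    tags, each submitted job inherits a snapshot of them, and cancellation
    targets every job holding a given tag."""

    def __init__(self):
        self._local = threading.local()
        self.jobs = {}          # job_id -> set of tags at submission time
        self.cancelled = set()
        self._next_id = 0

    def _tags(self):
        if not hasattr(self._local, "tags"):
            self._local.tags = set()
        return self._local.tags

    def add_job_tag(self, tag):
        self._tags().add(tag)

    def remove_job_tag(self, tag):
        self._tags().discard(tag)

    def submit_job(self):
        self._next_id += 1
        self.jobs[self._next_id] = set(self._tags())
        return self._next_id

    def cancel_jobs_by_tag(self, tag):
        for job_id, tags in self.jobs.items():
            if tag in tags:
                self.cancelled.add(job_id)


sched = JobScheduler()
sched.add_job_tag("user-query-1")
a = sched.submit_job()
sched.add_job_tag("broadcast-exchange")  # extra tag does not replace the first
b = sched.submit_job()
sched.cancel_jobs_by_tag("user-query-1")
print(sorted(sched.cancelled))  # both jobs still carry the user's tag
```

Unlike a single job group, which the broadcast exchange silently overwrites, tags compose: each component adds and removes its own tag without disturbing the others.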
[jira] [Comment Edited] (SPARK-43864) Versions of the package net.sourceforge.htmlunit:htmlunit from 0 and before 3.0.0 are vulnerable to Remote Code Execution (RCE) via XSLT
[ https://issues.apache.org/jira/browse/SPARK-43864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17728759#comment-17728759 ] BingKun Pan edited comment on SPARK-43864 at 6/2/23 2:20 PM: - @[~gaoyajun02] Okay, let me investigate it first. was (Author: panbingkun): [~gaoyajun02] Okay, let me investigate it first. > Versions of the package net.sourceforge.htmlunit:htmlunit from 0 and before > 3.0.0 are vulnerable to Remote Code Execution (RCE) via XSLT > > > Key: SPARK-43864 > URL: https://issues.apache.org/jira/browse/SPARK-43864 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.4.0 >Reporter: gaoyajun02 >Priority: Minor > > CVE-2023-26119 Detail: [https://nvd.nist.gov/vuln/detail/CVE-2023-26119] > It is recommended to replace 'net.sourceforge.htmlunit' with 'org.htmlunit' in > Spark: > {code:java}
> <dependency>
>   <groupId>org.htmlunit</groupId>
>   <artifactId>htmlunit</artifactId>
>   <scope>test</scope>
> </dependency>
> <dependency>
>   <groupId>org.htmlunit</groupId>
>   <artifactId>htmlunit-core-js</artifactId>
>   <scope>test</scope>
> </dependency>
> {code}
> see: [https://www.htmlunit.org/migration.html]
[jira] [Commented] (SPARK-43864) Versions of the package net.sourceforge.htmlunit:htmlunit from 0 and before 3.0.0 are vulnerable to Remote Code Execution (RCE) via XSLT
[ https://issues.apache.org/jira/browse/SPARK-43864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17728759#comment-17728759 ] BingKun Pan commented on SPARK-43864: - [~gaoyajun02] Okay, let me investigate it first. > Versions of the package net.sourceforge.htmlunit:htmlunit from 0 and before > 3.0.0 are vulnerable to Remote Code Execution (RCE) via XSLT > > > Key: SPARK-43864 > URL: https://issues.apache.org/jira/browse/SPARK-43864 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.4.0 >Reporter: gaoyajun02 >Priority: Minor > > CVE-2023-26119 Detail: [https://nvd.nist.gov/vuln/detail/CVE-2023-26119] > It is recommended to replace 'net.sourceforge.htmlunit' with 'org.htmlunit' in > Spark: > {code:java}
> <dependency>
>   <groupId>org.htmlunit</groupId>
>   <artifactId>htmlunit</artifactId>
>   <scope>test</scope>
> </dependency>
> <dependency>
>   <groupId>org.htmlunit</groupId>
>   <artifactId>htmlunit-core-js</artifactId>
>   <scope>test</scope>
> </dependency>
> {code}
> see: [https://www.htmlunit.org/migration.html]
[jira] [Commented] (SPARK-43943) Add math functions to Scala and Python
[ https://issues.apache.org/jira/browse/SPARK-43943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17728738#comment-17728738 ] GridGain Integration commented on SPARK-43943: -- User 'zhengruifeng' has created a pull request for this issue: https://github.com/apache/spark/pull/41435 > Add math functions to Scala and Python > -- > > Key: SPARK-43943 > URL: https://issues.apache.org/jira/browse/SPARK-43943 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark, SQL >Affects Versions: 3.5.0 >Reporter: Ruifeng Zheng >Priority: Major > > Add following functions: > * ceiling > * e > * pi > * ln > * negative > * positive > * power > * sign > * std > * width_bucket > to: > * Scala API > * Python API > * Spark Connect Scala Client > * Spark Connect Python Client
[jira] [Resolved] (SPARK-43951) RocksDB state store can become corrupt on task retries
[ https://issues.apache.org/jira/browse/SPARK-43951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adam Binford resolved SPARK-43951. -- Resolution: Fixed > RocksDB state store can become corrupt on task retries > -- > > Key: SPARK-43951 > URL: https://issues.apache.org/jira/browse/SPARK-43951 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.0 >Reporter: Adam Binford >Priority: Major > > A couple of our streaming jobs have failed since upgrading to Spark 3.4 with > an error such as: > org.rocksdb.RocksDBException: Mismatch in unique ID on table file ###. > Expected: {###,###} Actual: {###,###} in file /MANIFEST- > This is due to the change from > [https://github.com/facebook/rocksdb/commit/6de7081cf37169989e289a4801187097f0c50fae] > that enabled unique ID checks by default, and I finally tracked down the > exact sequence of steps that leads to this failure in the way the RocksDB state > store is used. > # A task fails after uploading the checkpoint to HDFS. Let's say it uploaded > 11.zip for version 11 of the table, but the task failed before it could finish, > after successfully uploading the checkpoint. > # The same task is retried and goes back to load version 10 of the table as > expected. > # Cleanup/maintenance is called for this partition, which looks in HDFS for > persisted versions and sees up through version 11, since that zip file was > successfully uploaded by the previous task attempt. > # As part of resolving which SST files are part of each table version, > versionToRocksDBFiles.put(version, newResolvedFiles) is called for version 11 > with the SST files that were uploaded by the first, failed task attempt. > # The second attempt at the task commits and goes to sync its checkpoint to > HDFS. > # versionToRocksDBFiles contains the SST files to upload from step 4, and > these files are considered "the same" as what's in the local working dir > because the name and file size match. > # No SST files are uploaded because they matched above, but in reality the > unique ID inside the SST files is different (presumably this is just randomly > generated and inserted into each SST file?); it just doesn't affect the size. > # A new METADATA file is uploaded which has the new unique IDs listed inside. > # When version 11 of the table is read during the next batch, the unique IDs > in the METADATA file don't match the unique IDs in the SST files, which > causes the exception. > > This is basically a ticking time bomb for anyone using RocksDB. Thoughts on > possible fixes would be: > * Disable unique ID verification. I don't currently see a binding for this > in the RocksDB Java wrapper, so that would probably have to be added first. > * Disable checking whether files are already uploaded with the same size, and > just always upload SST files no matter what. > * Update the "same file" check to also be able to do some kind of CRC > comparison or something like that. > * Update the maintenance/cleanup to not update the versionToRocksDBFiles map.
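The core of steps 6-7 above is that a name-plus-size comparison cannot distinguish two files with the same length but different bytes. A small, hypothetical Python sketch (toy metadata dicts, not Spark's actual file manager) makes this concrete, and shows that a content checksum, one of the fixes suggested above, would catch the difference.

```python
import hashlib

def same_by_name_and_size(meta_a, meta_b):
    # The check described in the issue: file name and size only.
    return meta_a["name"] == meta_b["name"] and meta_a["size"] == meta_b["size"]

# Two SST files from two task attempts: identical name and length, but the
# embedded unique ID (modeled here as the payload prefix) differs.
attempt1 = b"unique-id:aaaa" + b"\x00" * 100
attempt2 = b"unique-id:bbbb" + b"\x00" * 100

meta1 = {"name": "000042.sst", "size": len(attempt1)}
meta2 = {"name": "000042.sst", "size": len(attempt2)}

print(same_by_name_and_size(meta1, meta2))  # True: upload is skipped
print(hashlib.sha256(attempt1).hexdigest() ==
      hashlib.sha256(attempt2).hexdigest())  # False: contents actually differ
```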
[jira] [Commented] (SPARK-43951) RocksDB state store can become corrupt on task retries
[ https://issues.apache.org/jira/browse/SPARK-43951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17728736#comment-17728736 ] Adam Binford commented on SPARK-43951: -- Of course as soon as I finish figuring all this out I found https://github.com/apache/spark/pull/41089 > RocksDB state store can become corrupt on task retries > -- > > Key: SPARK-43951 > URL: https://issues.apache.org/jira/browse/SPARK-43951 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.0 >Reporter: Adam Binford >Priority: Major > > A couple of our streaming jobs have failed since upgrading to Spark 3.4 with > an error such as: > org.rocksdb.RocksDBException: Mismatch in unique ID on table file ###. > Expected: [###,###} Actual\{###,###} in file /MANIFEST- > This is due to the change from > [https://github.com/facebook/rocksdb/commit/6de7081cf37169989e289a4801187097f0c50fae] > that enabled unique ID checks by default, and I finally tracked down the > exact sequence of steps that leads to this failure in the way RocksDB state > store is used. > # A task fails after uploading the checkpoint to HDFS. Lets say it uploaded > 11.zip to version 11 of the table, but the task failed before it could finish > after successfully uploading the checkpoint. > # The same task is retried and goes back to load version 10 of the table as > expected. > # Cleanup/maintenance is called for this partition, which looks in HDFS for > persisted versions and sees up through version 11 since that zip file was > successfully uploaded on the previous task. > # As part of resolving what SST files are part of each table version, > versionToRocksDBFiles.put(version, newResolvedFiles) is called for version 11 > with its SST files that were uploaded in the first failed task. > # The second attempt at the task commits and goes to sync its checkpoint to > HDFS. 
> # versionToRocksDBFiles contains the SST files to upload from step 4, and > these files are considered "the same" as what's in the local working dir > because the name and file size match. > # No SST files are uploaded because they matched above, but in reality the > unique ID inside the SST files is different (presumably this is just randomly > generated and inserted into each SST file?), it just doesn't affect the size. > # A new METADATA file is uploaded which has the new unique IDs listed inside. > # When version 11 of the table is read during the next batch, the unique IDs > in the METADATA file don't match the unique IDs in the SST files, which > causes the exception. > > This is basically a ticking time bomb for anyone using RocksDB. Thoughts on > possible fixes would be: > * Disable unique ID verification. I don't currently see a binding for this > in the RocksDB Java wrapper, so that would probably have to be added first. > * Disable checking if files are already uploaded with the same size, and > just always upload SST files no matter what. > * Update the "same file" check to also be able to do some kind of CRC > comparison or something like that. > * Update the maintenance/cleanup to not update the versionToRocksDBFiles map. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
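The faulty "same file" comparison described in the steps above, and the CRC-style fix proposed in the third bullet, can be sketched as follows. This is a minimal illustration with hypothetical helper names, not Spark's actual RocksDBFileManager code:

```python
import hashlib
import os


def file_checksum(path: str) -> str:
    # Stream the file in 1 MiB chunks so large SST files need not fit in memory.
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def same_by_name_and_size(local_path, remote_name, remote_size):
    # The check that enables the corruption: two SST files with identical
    # names and sizes but different embedded unique IDs compare "equal",
    # so the new file is never re-uploaded.
    return (os.path.basename(local_path) == remote_name
            and os.path.getsize(local_path) == remote_size)


def same_by_checksum(local_path, remote_name, remote_size, remote_md5):
    # Proposed fix: also compare content checksums, so files that differ
    # only in their embedded unique ID are detected and re-uploaded.
    return (same_by_name_and_size(local_path, remote_name, remote_size)
            and file_checksum(local_path) == remote_md5)
```

With name+size only, a local SST that was regenerated on task retry (same name, same size, new unique ID) passes the check; the checksum variant catches it.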
[jira] [Created] (SPARK-43951) RocksDB state store can become corrupt on task retries
Adam Binford created SPARK-43951: Summary: RocksDB state store can become corrupt on task retries Key: SPARK-43951 URL: https://issues.apache.org/jira/browse/SPARK-43951 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.4.0 Reporter: Adam Binford A couple of our streaming jobs have failed since upgrading to Spark 3.4 with an error such as: org.rocksdb.RocksDBException: Mismatch in unique ID on table file ###. Expected: {###,###} Actual: {###,###} in file /MANIFEST- This is due to the change from [https://github.com/facebook/rocksdb/commit/6de7081cf37169989e289a4801187097f0c50fae] that enabled unique ID checks by default, and I finally tracked down the exact sequence of steps that leads to this failure in the way RocksDB state store is used. # A task fails after uploading the checkpoint to HDFS. Let's say it uploaded 11.zip for version 11 of the table, but the task failed after successfully uploading the checkpoint, before it could finish. # The same task is retried and goes back to load version 10 of the table as expected. # Cleanup/maintenance is called for this partition, which looks in HDFS for persisted versions and sees up through version 11 since that zip file was successfully uploaded on the previous task. # As part of resolving which SST files are part of each table version, versionToRocksDBFiles.put(version, newResolvedFiles) is called for version 11 with the SST files that were uploaded in the first failed task. # The second attempt at the task commits and goes to sync its checkpoint to HDFS. # versionToRocksDBFiles contains the SST files to upload from step 4, and these files are considered "the same" as what's in the local working dir because the name and file size match. # No SST files are uploaded because they matched above, but in reality the unique ID inside the SST files is different (presumably this is just randomly generated and inserted into each SST file?), it just doesn't affect the size. 
# A new METADATA file is uploaded which has the new unique IDs listed inside. # When version 11 of the table is read during the next batch, the unique IDs in the METADATA file don't match the unique IDs in the SST files, which causes the exception. This is basically a ticking time bomb for anyone using RocksDB. Thoughts on possible fixes would be: * Disable unique ID verification. I don't currently see a binding for this in the RocksDB Java wrapper, so that would probably have to be added first. * Disable checking if files are already uploaded with the same size, and just always upload SST files no matter what. * Update the "same file" check to also be able to do some kind of CRC comparison or something like that. * Update the maintenance/cleanup to not update the versionToRocksDBFiles map. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-43864) Versions of the package net.sourceforge.htmlunit:htmlunit from 0 and before 3.0.0 are vulnerable to Remote Code Execution (RCE) via XSLT
[ https://issues.apache.org/jira/browse/SPARK-43864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17728724#comment-17728724 ] gaoyajun02 commented on SPARK-43864: It looks like a series of test package dependencies need to be changed. I'm not very familiar with these; can you solve it? @[~panbingkun] > Versions of the package net.sourceforge.htmlunit:htmlunit from 0 and before > 3.0.0 are vulnerable to Remote Code Execution (RCE) via XSLT > > > Key: SPARK-43864 > URL: https://issues.apache.org/jira/browse/SPARK-43864 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.4.0 >Reporter: gaoyajun02 >Priority: Minor > > CVE-2023-26119 Detail: [https://nvd.nist.gov/vuln/detail/CVE-2023-26119] > It is recommended to replace 'net.sourceforge.htmlunit' by 'org.htmlunit' in > Spark:
> {code:xml}
> <dependency>
>   <groupId>org.htmlunit</groupId>
>   <artifactId>htmlunit</artifactId>
>   <scope>test</scope>
> </dependency>
> <dependency>
>   <groupId>org.htmlunit</groupId>
>   <artifactId>htmlunit-core-js</artifactId>
>   <scope>test</scope>
> </dependency>
> {code}
> see: [https://www.htmlunit.org/migration.html] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-43351) Support Golang in Spark Connect
[ https://issues.apache.org/jira/browse/SPARK-43351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-43351. -- Assignee: BoYang Resolution: Fixed Fixed in https://github.com/apache/spark-connect-go/pull/6 > Support Golang in Spark Connect > --- > > Key: SPARK-43351 > URL: https://issues.apache.org/jira/browse/SPARK-43351 > Project: Spark > Issue Type: New Feature > Components: Connect >Affects Versions: 3.5.0 >Reporter: BoYang >Assignee: BoYang >Priority: Major > > Support Spark Connect client side in Go programming language -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-43950) Upgrade kubernetes-client to 6.7.0
Bjørn Jørgensen created SPARK-43950: --- Summary: Upgrade kubernetes-client to 6.7.0 Key: SPARK-43950 URL: https://issues.apache.org/jira/browse/SPARK-43950 Project: Spark Issue Type: Dependency upgrade Components: Build, Kubernetes Affects Versions: 3.5.0 Reporter: Bjørn Jørgensen -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-43949) Upgrade Cloudpickle to 2.2.1
[ https://issues.apache.org/jira/browse/SPARK-43949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-43949. -- Fix Version/s: 3.4.1 3.5.0 Assignee: Hyukjin Kwon Resolution: Fixed Fixed in https://github.com/apache/spark/pull/41433 > Upgrade Cloudpickle to 2.2.1 > > > Key: SPARK-43949 > URL: https://issues.apache.org/jira/browse/SPARK-43949 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 3.3.2, 3.4.0, 3.5.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Fix For: 3.4.1, 3.5.0 > > > Cloudpickle 2.2.1 has a fix for named tuple issue > (https://github.com/cloudpipe/cloudpickle/issues/460). PySpark relies on > namedtuple heavily especially for RDD. We should upgrade and fix it. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-43075) Change gRPC to grpcio when it is not installed.
[ https://issues.apache.org/jira/browse/SPARK-43075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17728681#comment-17728681 ] GridGain Integration commented on SPARK-43075: -- User 'bjornjorgensen' has created a pull request for this issue: https://github.com/apache/spark/pull/40716 > Change gRPC to grpcio when it is not installed. > --- > > Key: SPARK-43075 > URL: https://issues.apache.org/jira/browse/SPARK-43075 > Project: Spark > Issue Type: Improvement > Components: Connect >Affects Versions: 3.4.0, 3.5.0 >Reporter: Bjørn Jørgensen >Assignee: Bjørn Jørgensen >Priority: Major > Fix For: 3.4.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-43063) `df.show` handle null should print NULL instead of null
[ https://issues.apache.org/jira/browse/SPARK-43063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17728682#comment-17728682 ] GridGain Integration commented on SPARK-43063: -- User 'dongjoon-hyun' has created a pull request for this issue: https://github.com/apache/spark/pull/41432 > `df.show` handle null should print NULL instead of null > --- > > Key: SPARK-43063 > URL: https://issues.apache.org/jira/browse/SPARK-43063 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.0 >Reporter: yikaifei >Assignee: yikaifei >Priority: Trivial > Fix For: 3.5.0 > > > `df.show` should print NULL instead of null when handling null values, for > consistent behavior; > {code:java} > The following behavior is currently inconsistent: > ``` shell > scala> spark.sql("select decode(6, 1, 'Southlake', 2, 'San Francisco', 3, > 'New Jersey', 4, 'Seattle') as result").show(false) > +------+ > |result| > +------+ > |null  | > +------+ > ``` > ``` shell > spark-sql> DESC FUNCTION EXTENDED decode; > function_desc > Function: decode > Class: org.apache.spark.sql.catalyst.expressions.Decode > Usage: > decode(bin, charset) - Decodes the first argument using the second > argument character set. > decode(expr, search, result [, search, result ] ... [, default]) - > Compares expr > to each search value in order. If expr is equal to a search value, > decode returns > the corresponding result. If no match is found, then it returns > default. If default > is omitted, it returns null. 
> Extended Usage: > Examples: > > SELECT decode(encode('abc', 'utf-8'), 'utf-8'); >abc > > SELECT decode(2, 1, 'Southlake', 2, 'San Francisco', 3, 'New Jersey', > 4, 'Seattle', 'Non domestic'); >San Francisco > > SELECT decode(6, 1, 'Southlake', 2, 'San Francisco', 3, 'New Jersey', > 4, 'Seattle', 'Non domestic'); >Non domestic > > SELECT decode(6, 1, 'Southlake', 2, 'San Francisco', 3, 'New Jersey', > 4, 'Seattle'); >NULL > Since: 3.2.0 > Time taken: 0.074 seconds, Fetched 4 row(s) > ``` > ``` shell > spark-sql> select decode(6, 1, 'Southlake', 2, 'San Francisco', 3, 'New > Jersey', 4, 'Seattle'); > NULL > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
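The inconsistency described in the issue above is purely a display convention: `df.show` renders a SQL NULL as lowercase `null`, while the spark-sql CLI prints `NULL`. A minimal sketch of the requested convention, using hypothetical helper names rather than Spark's actual formatting code:

```python
def format_cell(value) -> str:
    # Render a SQL NULL consistently as "NULL" (the behavior SPARK-43063
    # asks df.show to adopt, matching the spark-sql CLI output).
    return "NULL" if value is None else str(value)


def show_row(row) -> str:
    # Join formatted cells in a df.show-style pipe-delimited line.
    return "|" + "|".join(format_cell(v) for v in row) + "|"
```

With this convention, the `decode(...)` example from the issue would print `NULL` in both the DataFrame and CLI paths.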
[jira] [Updated] (SPARK-43916) Add percentile like functions to Scala and Python API
[ https://issues.apache.org/jira/browse/SPARK-43916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jiaan.geng updated SPARK-43916: --- Summary: Add percentile like functions to Scala and Python API (was: Add percentile* to Scala and Python API) > Add percentile like functions to Scala and Python API > - > > Key: SPARK-43916 > URL: https://issues.apache.org/jira/browse/SPARK-43916 > Project: Spark > Issue Type: Sub-task > Components: PySpark, R, SQL >Affects Versions: 3.5.0 >Reporter: jiaan.geng >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-43916) Add percentile* to Scala and Python API
[ https://issues.apache.org/jira/browse/SPARK-43916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jiaan.geng updated SPARK-43916: --- Summary: Add percentile* to Scala and Python API (was: Add percentile to Scala and Python API) > Add percentile* to Scala and Python API > --- > > Key: SPARK-43916 > URL: https://issues.apache.org/jira/browse/SPARK-43916 > Project: Spark > Issue Type: Sub-task > Components: PySpark, R, SQL >Affects Versions: 3.5.0 >Reporter: jiaan.geng >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-43911) Use toSet to deduplicate the iterator data to prevent the creation of large Array
[ https://issues.apache.org/jira/browse/SPARK-43911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] mcdull_zhang updated SPARK-43911: - Summary: Use toSet to deduplicate the iterator data to prevent the creation of large Array (was: Directly use Set to consume iterator data to deduplicate, thereby reducing memory usage) > Use toSet to deduplicate the iterator data to prevent the creation of large > Array > - > > Key: SPARK-43911 > URL: https://issues.apache.org/jira/browse/SPARK-43911 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.0 >Reporter: mcdull_zhang >Priority: Minor > > When SubqueryBroadcastExec reuses the keys of Broadcast HashedRelation for > dynamic partition pruning, it will put all the keys in an Array, and then > call the distinct of the Array to remove the duplicates. > In general, Broadcast HashedRelation may have many rows, and the repetition > rate of this key is high. Doing so will cause this Array to occupy a large > amount of memory (and this memory is not managed by MemoryManager), which may > trigger OOM. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
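The memory difference described in SPARK-43911 above can be sketched in miniature. This is an illustration of the two deduplication strategies, not the actual SubqueryBroadcastExec code; function names are hypothetical:

```python
def distinct_via_array(keys_iter):
    # Simplified version of the current behavior: materialize every key
    # into one array, then deduplicate. Peak memory is proportional to
    # the total number of rows, even when few keys are distinct.
    keys = list(keys_iter)
    return list(dict.fromkeys(keys))  # order-preserving distinct


def distinct_via_set(keys_iter):
    # Proposed behavior: consume the iterator directly into a set, so
    # peak memory is proportional to the number of *distinct* keys.
    seen = set()
    for k in keys_iter:
        seen.add(k)
    return seen
```

When the broadcast relation has millions of rows but a handful of distinct partition keys, the second approach never materializes the large intermediate array.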
[jira] [Commented] (SPARK-43866) Partition filter condition should be pushed down to the metastore query if it is an equivalence predicate
[ https://issues.apache.org/jira/browse/SPARK-43866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17728643#comment-17728643 ] ASF GitHub Bot commented on SPARK-43866: User 'ming95' has created a pull request for this issue: https://github.com/apache/spark/pull/41370 > Partition filter condition should be pushed down to the metastore query if it > is an equivalence predicate > --- > > Key: SPARK-43866 > URL: https://issues.apache.org/jira/browse/SPARK-43866 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.0 >Reporter: ming95 >Priority: Major > > Typically, Hive partition fields are created as string types. > {code:sql} > CREATE TABLE if not exists test_tb ( > id int > ) > PARTITIONED BY (dt string) > {code} > However, cast data conversions are often introduced inadvertently during use. > For example: > {code:sql} > select * from test_tb where dt=20230505; > {code} > This prevents the condition `dt=20230505` from being pushed down into the > metastore, because `20230505` is an IntegralType, resulting in a request for > all partitions. However, in the case of equivalence predicates, partition > filter pushdown should still be supported. > This can affect execution performance in cases where the data table has very > many partitions. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
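The idea in SPARK-43866 above is that an equality comparison against a non-string literal can still be pushed to the metastore by using the literal's string form, whereas range comparisons cannot (string ordering differs from numeric ordering). A hypothetical sketch of that rule, not the actual Spark change:

```python
def pushable_filter(col, op, value):
    # For a string-typed partition column, a non-string literal normally
    # blocks metastore pushdown. For equality it is safe to compare the
    # literal's string form instead, so the filter can still be pushed.
    if isinstance(value, str):
        return f"{col} {op} '{value}'"
    if op == "=":
        return f"{col} = '{value}'"  # dt = 20230505 -> dt = '20230505'
    # e.g. dt > 20230505: lexicographic ordering on strings is not the
    # numeric ordering the user intended, so don't push the filter down.
    return None
```

A `None` result means the filter must be evaluated on the Spark side after listing all partitions, which is exactly the slow path the issue wants to avoid for equality predicates.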
[jira] [Commented] (SPARK-43879) Decouple handle command and send response on server side
[ https://issues.apache.org/jira/browse/SPARK-43879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17728642#comment-17728642 ] ASF GitHub Bot commented on SPARK-43879: User 'beliefer' has created a pull request for this issue: https://github.com/apache/spark/pull/41379 > Decouple handle command and send response on server side > > > Key: SPARK-43879 > URL: https://issues.apache.org/jira/browse/SPARK-43879 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.5.0 >Reporter: jiaan.geng >Priority: Major > > SparkConnectStreamHandler treat the request from connect client and send the > response back to connect client. SparkConnectStreamHandler hold a component > StreamObserver which is used to send response. > So I think we should keep the StreamObserver could be accessed only with > SparkConnectStreamHandler. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-43943) Add math functions to Scala and Python
[ https://issues.apache.org/jira/browse/SPARK-43943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng updated SPARK-43943: -- Description: Add following functions: * ceiling * e * pi * ln * negative * positive * power * sign * std * width_bucket to: * Scala API * Python API * Spark Connect Scala Client * Spark Connect Python Client was: Add following functions: * ceiling * e * pi * ln * mod * negative * positive * power * sign * std * width_bucket to: * Scala API * Python API * Spark Connect Scala Client * Spark Connect Python Client > Add math functions to Scala and Python > -- > > Key: SPARK-43943 > URL: https://issues.apache.org/jira/browse/SPARK-43943 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark, SQL >Affects Versions: 3.5.0 >Reporter: Ruifeng Zheng >Priority: Major > > Add following functions: > * ceiling > * e > * pi > * ln > * negative > * positive > * power > * sign > * std > * width_bucket > to: > * Scala API > * Python API > * Spark Connect Scala Client > * Spark Connect Python Client -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
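Of the functions listed in SPARK-43943 above, `width_bucket` has the least obvious semantics. A reference sketch following the standard SQL behavior for an ascending range (this is an illustration of the semantics, not Spark's implementation):

```python
import math


def width_bucket(v, min_value, max_value, num_buckets):
    # SQL-standard semantics for an ascending range: values below the
    # range fall into bucket 0, values at or above max_value fall into
    # the overflow bucket num_buckets + 1; otherwise the range is split
    # into num_buckets equal-width buckets numbered from 1.
    if v < min_value:
        return 0
    if v >= max_value:
        return num_buckets + 1
    width = (max_value - min_value) / num_buckets
    return int(math.floor((v - min_value) / width)) + 1
```

For example, with range [0, 10) split into 5 buckets, each bucket is 2 wide, so 5.0 lands in bucket 3.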
[jira] [Commented] (SPARK-43911) Directly use Set to consume iterator data to deduplicate, thereby reducing memory usage
[ https://issues.apache.org/jira/browse/SPARK-43911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17728634#comment-17728634 ] ASF GitHub Bot commented on SPARK-43911: User 'mcdull-zhang' has created a pull request for this issue: https://github.com/apache/spark/pull/41419 > Directly use Set to consume iterator data to deduplicate, thereby reducing > memory usage > --- > > Key: SPARK-43911 > URL: https://issues.apache.org/jira/browse/SPARK-43911 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.0 >Reporter: mcdull_zhang >Priority: Minor > > When SubqueryBroadcastExec reuses the keys of Broadcast HashedRelation for > dynamic partition pruning, it will put all the keys in an Array, and then > call the distinct of the Array to remove the duplicates. > In general, Broadcast HashedRelation may have many rows, and the repetition > rate of this key is high. Doing so will cause this Array to occupy a large > amount of memory (and this memory is not managed by MemoryManager), which may > trigger OOM. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-43949) Upgrade Cloudpickle to 2.2.1
Hyukjin Kwon created SPARK-43949: Summary: Upgrade Cloudpickle to 2.2.1 Key: SPARK-43949 URL: https://issues.apache.org/jira/browse/SPARK-43949 Project: Spark Issue Type: Bug Components: PySpark Affects Versions: 3.4.0, 3.3.2, 3.5.0 Reporter: Hyukjin Kwon Cloudpickle 2.2.1 has a fix for named tuple issue (https://github.com/cloudpipe/cloudpickle/issues/460). PySpark relies on namedtuple heavily especially for RDD. We should upgrade and fix it. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-43948) Assign names to the error class _LEGACY_ERROR_TEMP_[0050|0058|0059|1204]
BingKun Pan created SPARK-43948: --- Summary: Assign names to the error class _LEGACY_ERROR_TEMP_[0050|0058|0059|1204] Key: SPARK-43948 URL: https://issues.apache.org/jira/browse/SPARK-43948 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.5.0 Reporter: BingKun Pan -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-43946) Add rule to remove unused CTEDef
[ https://issues.apache.org/jira/browse/SPARK-43946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17728620#comment-17728620 ] jinhai-cloud commented on SPARK-43946: -- [~cloud_fan], Can you take a look at this issue for me? > Add rule to remove unused CTEDef > > > Key: SPARK-43946 > URL: https://issues.apache.org/jira/browse/SPARK-43946 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.4, 3.3.2, 3.4.0 >Reporter: jinhai-cloud >Priority: Major > > {code:java} > // code placeholder > with t1 as ( > select rand() c3 > ), > t2 as (select * from t1) > select c3 from t1 where c3 > 0 {code} > {code:java} > // code placeholder > === Applying Rule org.apache.spark.sql.catalyst.optimizer.InlineCTE === > WithCTE WithCTE > :- CTERelationDef 0, false :- CTERelationDef 0, > false > : +- Project [rand(3418873542988342437) AS c3#236] : +- Project > [rand(3418873542988342437) AS c3#236] > : +- OneRowRelation : +- OneRowRelation > !:- CTERelationDef 1, false +- Project [c3#236] > !: +- Project [c3#236] +- Filter (c3#236 > > cast(0 as double)) > !: +- CTERelationRef 0, true, [c3#236] +- > CTERelationRef 0, true, [c3#236] > !+- Project [c3#236] > ! +- Filter (c3#236 > cast(0 as double)) > ! +- CTERelationRef 0, true, [c3#236] > {code} > When the above query applies the inlineCTE rule, inline is not possible > because the refCount of CTERelationDef 0 is equal to 2. > However, according to the optimized logicalplan, the plan can be further > optimized because the refCount of CTERelationDef 0 is equal to 1. 
> Therefore, we can add the rule *RemoveUnusedCTEDef* to delete the > unreferenced CTERelationDef to prevent the refCount from being miscalculated > {code:java} > // code placeholder > Project [c3#236] > +- Filter (c3#236 > cast(0 as double)) > +- Project [rand(-7871530451581327544) AS c3#236] > +- OneRowRelation {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-43946) Add rule to remove unused CTEDef
[ https://issues.apache.org/jira/browse/SPARK-43946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jinhai-cloud updated SPARK-43946: - Description: {code:java} // code placeholder with t1 as ( select rand() c3 ), t2 as (select * from t1) select c3 from t1 where c3 > 0 {code} {code:java} // code placeholder === Applying Rule org.apache.spark.sql.catalyst.optimizer.InlineCTE === WithCTE WithCTE :- CTERelationDef 0, false :- CTERelationDef 0, false : +- Project [rand(3418873542988342437) AS c3#236] : +- Project [rand(3418873542988342437) AS c3#236] : +- OneRowRelation : +- OneRowRelation !:- CTERelationDef 1, false +- Project [c3#236] !: +- Project [c3#236] +- Filter (c3#236 > cast(0 as double)) !: +- CTERelationRef 0, true, [c3#236] +- CTERelationRef 0, true, [c3#236] !+- Project [c3#236] ! +- Filter (c3#236 > cast(0 as double)) ! +- CTERelationRef 0, true, [c3#236] {code} When the above query applies the inlineCTE rule, inline is not possible because the refCount of CTERelationDef 0 is equal to 2. However, according to the optimized logicalplan, the plan can be further optimized because the refCount of CTERelationDef 0 is equal to 1. 
Therefore, we can add the rule *RemoveUnusedCTEDef* to delete the unreferenced CTERelationDef to prevent the refCount from being miscalculated {code:java} // code placeholder Project [c3#236] +- Filter (c3#236 > cast(0 as double)) +- Project [rand(-7871530451581327544) AS c3#236] +- OneRowRelation {code} was: {code:java} // code placeholder with t1 as ( select rand() c3 ), t2 as (select * from t1) select c3 from t1 where c3 > 0 {code} {code:java} // code placeholder === Applying Rule org.apache.spark.sql.catalyst.optimizer.InlineCTE === WithCTE WithCTE :- CTERelationDef 0, false :- CTERelationDef 0, false : +- Project [rand(3418873542988342437) AS c3#236] : +- Project [rand(3418873542988342437) AS c3#236] : +- OneRowRelation : +- OneRowRelation !:- CTERelationDef 1, false +- Project [c3#236] !: +- Project [c3#236] +- Filter (c3#236 > cast(0 as double)) !: +- CTERelationRef 0, true, [c3#236] +- CTERelationRef 0, true, [c3#236] !+- Project [c3#236] ! +- Filter (c3#236 > cast(0 as double)) ! +- CTERelationRef 0, true, [c3#236] {code} When the above query applies the inlineCTE rule, inline is not possible because the refCount of CTERelationDef 0 is equal to 2. However, according to the optimized logicalplan, the plan can be further optimized because the refCount of CTERelationDef 0 is equal to 1. 
Therefore, we can add the rule *RemoveRedundantCTEDef* to delete the unreferenced CTERelationDef to prevent the refCount from being miscalculated {code:java} // code placeholder Project [c3#236] +- Filter (c3#236 > cast(0 as double)) +- Project [rand(-7871530451581327544) AS c3#236] +- OneRowRelation {code} > Add rule to remove unused CTEDef > > > Key: SPARK-43946 > URL: https://issues.apache.org/jira/browse/SPARK-43946 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.4, 3.3.2, 3.4.0 >Reporter: jinhai-cloud >Priority: Major > > {code:java} > // code placeholder > with t1 as ( > select rand() c3 > ), > t2 as (select * from t1) > select c3 from t1 where c3 > 0 {code} > {code:java} > // code placeholder > === Applying Rule org.apache.spark.sql.catalyst.optimizer.InlineCTE === > WithCTE WithCTE > :- CTERelationDef 0, false :- CTERelationDef 0, > false > : +- Project [rand(3418873542988342437) AS c3#236] : +- Project > [rand(3418873542988342437) AS c3#236] > : +- OneRowRelation : +- OneRowRelation > !:- CTERelationDef 1, false +- Project [c3#236] > !: +- Project [c3#236] +- Filter (c3#236 > > cast(0 as double)) > !: +- CTERelationRef 0, true, [c3#236] +- > CTERelationRef 0, true, [c3#236] > !+- Project [c3#236] > ! +- Filter (c3#236 > cast(0 as double)) > ! +- CTERelationRef 0, true, [c3#236] > {code} > When the above query applies the inlineCTE rule, inline is not possible > because the refCount of CTERelationDef 0 is equal to 2
[jira] [Updated] (SPARK-43946) Add rule to remove unused CTEDef
[ https://issues.apache.org/jira/browse/SPARK-43946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jinhai-cloud updated SPARK-43946: - Description: {code:java} // code placeholder with t1 as ( select rand() c3 ), t2 as (select * from t1) select c3 from t1 where c3 > 0 {code} {code:java} // code placeholder === Applying Rule org.apache.spark.sql.catalyst.optimizer.InlineCTE === WithCTE WithCTE :- CTERelationDef 0, false :- CTERelationDef 0, false : +- Project [rand(3418873542988342437) AS c3#236] : +- Project [rand(3418873542988342437) AS c3#236] : +- OneRowRelation : +- OneRowRelation !:- CTERelationDef 1, false +- Project [c3#236] !: +- Project [c3#236] +- Filter (c3#236 > cast(0 as double)) !: +- CTERelationRef 0, true, [c3#236] +- CTERelationRef 0, true, [c3#236] !+- Project [c3#236] ! +- Filter (c3#236 > cast(0 as double)) ! +- CTERelationRef 0, true, [c3#236] {code} When the above query applies the inlineCTE rule, inline is not possible because the refCount of CTERelationDef 0 is equal to 2. However, according to the optimized logicalplan, the plan can be further optimized because the refCount of CTERelationDef 0 is equal to 1. 
Therefore, we can add the rule *RemoveRedundantCTEDef* to delete the unreferenced CTERelationDef to prevent the refCount from being miscalculated {code:java} // code placeholder Project [c3#236] +- Filter (c3#236 > cast(0 as double)) +- Project [rand(-7871530451581327544) AS c3#236] +- OneRowRelation {code} > Add rule to remove unused CTEDef > > > Key: SPARK-43946 > URL: https://issues.apache.org/jira/browse/SPARK-43946 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.4, 3.3.2, 3.4.0 >Reporter: jinhai-cloud >Priority: Major > > {code:java} > // code placeholder > with t1 as ( > select rand() c3 > ), > t2 as (select * from t1) > select c3 from t1 where c3 > 0 {code} > {code:java} > // code placeholder > === Applying Rule org.apache.spark.sql.catalyst.optimizer.InlineCTE === > WithCTE WithCTE > :- CTERelationDef 0, false :- CTERelationDef 0, > false > : +- Project [rand(3418873542988342437) AS c3#236] : +- Project > [rand(3418873542988342437) AS c3#236] > : +- OneRowRelation : +- OneRowRelation > !:- CTERelationDef 1, false +- Project [c3#236] > !: +- Project [c3#236] +- Filter (c3#236 > > cast(0 as double)) > !: +- CTERelationRef 0, true, [c3#236] +- > CTERelationRef 0, true, [c3#236] > !+- Project [c3#236] > ! +- Filter (c3#236 > cast(0 as double)) > ! +- CTERelationRef 0, true, [c3#236] > {code} > When the above query applies the inlineCTE rule, inline is not possible > because the refCount of CTERelationDef 0 is equal to 2. > However, according to the optimized logicalplan, the plan can be further > optimized because the refCount of CTERelationDef 0 is equal to 1. 
> Therefore, we can add the rule *RemoveRedundantCTEDef* to delete the > unreferenced CTERelationDef to prevent the refCount from being miscalculated > {code:java} > // code placeholder > Project [c3#236] > +- Filter (c3#236 > cast(0 as double)) > +- Project [rand(-7871530451581327544) AS c3#236] > +- OneRowRelation {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-43947) Incorrect SparkException when missing config in resources in Stage-Level Scheduling
[ https://issues.apache.org/jira/browse/SPARK-43947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jacek Laskowski updated SPARK-43947: Summary: Incorrect SparkException when missing config in resources in Stage-Level Scheduling (was: Incorrect SparkException when missing amount in resources in Stage-Level Scheduling) > Incorrect SparkException when missing config in resources in Stage-Level > Scheduling > --- > > Key: SPARK-43947 > URL: https://issues.apache.org/jira/browse/SPARK-43947 > Project: Spark > Issue Type: Bug > Components: Scheduler >Affects Versions: 3.4.0 >Reporter: Jacek Laskowski >Priority: Minor > > [ResourceUtils.listResourceIds|https://github.com/apache/spark/blob/807abf9c53ee8c1c7ef69646ebd8a266f60d5580/core/src/main/scala/org/apache/spark/resource/ResourceUtils.scala#L152-L155] > can throw an exception for any missing config, not just `amount`. > {code:scala} > val index = key.indexOf('.') > if (index < 0) { > throw new SparkException(s"You must specify an amount config for > resource: $key " + > s"config: $componentName.$RESOURCE_PREFIX.$key") > } > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
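The bug in SPARK-43947 above is that the quoted Scala rejects any resource key without a `.` but always blames a missing "amount" config. A hypothetical Python rendition showing a message that does not assume which config is missing (the error wording here is a suggestion, not the actual fix):

```python
def parse_resource_key(component, key):
    # Keys look like "gpu.amount", "gpu.discoveryScript", etc. A key
    # without a '.' is malformed, but the missing piece is not
    # necessarily "amount", so the error should not claim it is.
    index = key.find(".")
    if index < 0:
        raise ValueError(
            f"No config specified for resource: {key} "
            f"(expected {component}.resource.{key}.<config>, "
            f"e.g. {component}.resource.{key}.amount)")
    return key[:index], key[index + 1:]
```

Splitting at the first `.` mirrors `key.indexOf('.')` in the Scala snippet; everything after it is the config name.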
[jira] [Created] (SPARK-43947) Incorrect SparkException when missing amount in resources in Stage-Level Scheduling
Jacek Laskowski created SPARK-43947:
---------------------------------------

             Summary: Incorrect SparkException when missing amount in resources in Stage-Level Scheduling
                 Key: SPARK-43947
                 URL: https://issues.apache.org/jira/browse/SPARK-43947
             Project: Spark
          Issue Type: Bug
          Components: Scheduler
    Affects Versions: 3.4.0
            Reporter: Jacek Laskowski


[ResourceUtils.listResourceIds|https://github.com/apache/spark/blob/807abf9c53ee8c1c7ef69646ebd8a266f60d5580/core/src/main/scala/org/apache/spark/resource/ResourceUtils.scala#L152-L155] can throw an exception for any missing config, not just `amount`.
{code:scala}
val index = key.indexOf('.')
if (index < 0) {
  throw new SparkException(s"You must specify an amount config for resource: $key " +
    s"config: $componentName.$RESOURCE_PREFIX.$key")
}
{code}
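The guard in the Scala snippet fires whenever the key has no `.` suffix at all, i.e. when *any* config is missing, yet the message blames `amount` specifically. A hypothetical stand-alone rewrite (plain Python, not Spark code; function and parameter names are my own) shows the parsing and a message that no longer over-claims:

```python
# A resource key like "gpu.amount" splits at the first '.' into
# (resource name, config). A key with no '.' is missing *some* config,
# not necessarily "amount", so the error message should say so.
def split_resource_key(key, component="spark.executor", prefix="resource"):
    index = key.find(".")
    if index < 0:
        raise ValueError(
            f"You must specify a config for resource: {key} "
            f"config: {component}.{prefix}.{key}")
    return key[:index], key[index + 1:]

assert split_resource_key("gpu.amount") == ("gpu", "amount")
assert split_resource_key("gpu.vendor") == ("gpu", "vendor")

try:
    split_resource_key("gpu")   # no '.' at all: no config present
    raised = False
except ValueError as e:
    raised = "config for resource: gpu" in str(e)
assert raised
```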
[jira] [Created] (SPARK-43946) Add rule to remove unused CTEDef
jinhai-cloud created SPARK-43946:
------------------------------------

             Summary: Add rule to remove unused CTEDef
                 Key: SPARK-43946
                 URL: https://issues.apache.org/jira/browse/SPARK-43946
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 3.4.0, 3.3.2, 3.2.4
            Reporter: jinhai-cloud
[jira] [Commented] (SPARK-43945) Fix bug for `SQLQueryTestSuite` when run on local env
[ https://issues.apache.org/jira/browse/SPARK-43945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17728603#comment-17728603 ]

Hudson commented on SPARK-43945:
--------------------------------

User 'panbingkun' has created a pull request for this issue:
https://github.com/apache/spark/pull/41431

> Fix bug for `SQLQueryTestSuite` when run on local env
> -----------------------------------------------------
>
>                 Key: SPARK-43945
>                 URL: https://issues.apache.org/jira/browse/SPARK-43945
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL, Tests
>    Affects Versions: 3.5.0
>            Reporter: BingKun Pan
>            Priority: Minor
[jira] [Commented] (SPARK-43798) Initial support for Python UDTFs
[ https://issues.apache.org/jira/browse/SPARK-43798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17728601#comment-17728601 ]

Hudson commented on SPARK-43798:
--------------------------------

User 'allisonwang-db' has created a pull request for this issue:
https://github.com/apache/spark/pull/41316

> Initial support for Python UDTFs
> --------------------------------
>
>                 Key: SPARK-43798
>                 URL: https://issues.apache.org/jira/browse/SPARK-43798
>             Project: Spark
>          Issue Type: Sub-task
>          Components: PySpark
>    Affects Versions: 3.5.0
>            Reporter: Allison Wang
>            Priority: Major
>
> Support Python user-defined table functions with batch eval.
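For readers unfamiliar with table functions: unlike a scalar UDF, a UDTF can emit zero or more output rows per input row, and "batch eval" means the engine drives it over a whole batch of rows. The following is a plain-Python analogue only; the class and helper names are my own and are not the PySpark API being added in this ticket.

```python
# Toy stand-in for a user-defined table function: eval() maps one input row
# to zero or more output rows, and a runner flattens the results over a batch.
class SplitWords:
    def eval(self, row):
        """Yield one output tuple per word in the input string."""
        for word in row.split():
            yield (word,)

def run_udtf(udtf, batch):
    """Batch evaluation: collect the rows produced for every input row."""
    out = []
    for row in batch:
        out.extend(udtf.eval(row))
    return out

rows = run_udtf(SplitWords(), ["hello world", "spark"])
assert rows == [("hello",), ("world",), ("spark",)]
```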
[jira] [Created] (SPARK-43945) Fix bug for `SQLQueryTestSuite` when run on local env
BingKun Pan created SPARK-43945:
---------------------------------------

             Summary: Fix bug for `SQLQueryTestSuite` when run on local env
                 Key: SPARK-43945
                 URL: https://issues.apache.org/jira/browse/SPARK-43945
             Project: Spark
          Issue Type: Bug
          Components: SQL, Tests
    Affects Versions: 3.5.0
            Reporter: BingKun Pan
[jira] [Updated] (SPARK-43916) Add percentile to Scala and Python API
[ https://issues.apache.org/jira/browse/SPARK-43916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

jiaan.geng updated SPARK-43916:
---
    Summary: Add percentile to Scala and Python API  (was: Add percentile to Scala, Python and R API)

> Add percentile to Scala and Python API
> --------------------------------------
>
>                 Key: SPARK-43916
>                 URL: https://issues.apache.org/jira/browse/SPARK-43916
>             Project: Spark
>          Issue Type: Sub-task
>          Components: PySpark, R, SQL
>    Affects Versions: 3.5.0
>            Reporter: jiaan.geng
>            Priority: Major
[jira] [Resolved] (SPARK-43927) Add cast alias to Scala and Python
[ https://issues.apache.org/jira/browse/SPARK-43927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ruifeng Zheng resolved SPARK-43927.
---
    Resolution: Not A Problem

> Add cast alias to Scala and Python
> ----------------------------------
>
>                 Key: SPARK-43927
>                 URL: https://issues.apache.org/jira/browse/SPARK-43927
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Connect, PySpark, SQL
>    Affects Versions: 3.5.0
>            Reporter: Ruifeng Zheng
>            Priority: Major
>
> add functions for:
>  * castAlias("boolean", BooleanType),
>  * castAlias("tinyint", ByteType),
>  * castAlias("smallint", ShortType),
>  * castAlias("int", IntegerType),
>  * castAlias("bigint", LongType),
>  * castAlias("float", FloatType),
>  * castAlias("double", DoubleType),
>  * castAlias("decimal", DecimalType.USER_DEFAULT),
>  * castAlias("date", DateType),
>  * castAlias("timestamp", TimestampType),
>  * castAlias("binary", BinaryType),
>  * castAlias("string", StringType),
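Each alias in the list above pairs a SQL type keyword with a Catalyst type, so the proposed function would be shorthand for a cast to that type. As a rough illustration only (plain Python, not the Spark API; `cast_sql` is a name I made up), the mapping and the SQL each alias stands for:

```python
# The alias -> Catalyst type pairs from the ticket, kept as strings here since
# this sketch does not import Spark. Each would back a shorthand cast function.
CAST_ALIASES = {
    "boolean": "BooleanType", "tinyint": "ByteType", "smallint": "ShortType",
    "int": "IntegerType", "bigint": "LongType", "float": "FloatType",
    "double": "DoubleType", "decimal": "DecimalType.USER_DEFAULT",
    "date": "DateType", "timestamp": "TimestampType",
    "binary": "BinaryType", "string": "StringType",
}

def cast_sql(expr: str, alias: str) -> str:
    """Render the CAST expression a given alias function would stand for."""
    if alias not in CAST_ALIASES:
        raise ValueError(f"unknown cast alias: {alias}")
    return f"CAST({expr} AS {alias.upper()})"

assert cast_sql("age", "string") == "CAST(age AS STRING)"
assert len(CAST_ALIASES) == 12
```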
[jira] [Commented] (SPARK-43907) Add SQL functions into Scala, Python and R API
[ https://issues.apache.org/jira/browse/SPARK-43907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17728598#comment-17728598 ]

jiaan.geng commented on SPARK-43907:
------------------------------------

[~gurwls223] Thank you for your feedback.

> Add SQL functions into Scala, Python and R API
> ----------------------------------------------
>
>                 Key: SPARK-43907
>                 URL: https://issues.apache.org/jira/browse/SPARK-43907
>             Project: Spark
>          Issue Type: Umbrella
>          Components: PySpark, SparkR, SQL
>    Affects Versions: 3.5.0
>            Reporter: Hyukjin Kwon
>            Priority: Major
>
> See the discussion in the dev mailing list (https://lists.apache.org/thread/0tdcfyzxzcv8w46qbgwys2rormhdgyqg).
> This is an umbrella JIRA to implement all SQL functions in Scala, Python and R.