[jira] [Created] (SPARK-47947) Add AssertDataFrameEquality util function for scala
Anh Tuan Pham created SPARK-47947: - Summary: Add AssertDataFrameEquality util function for scala Key: SPARK-47947 URL: https://issues.apache.org/jira/browse/SPARK-47947 Project: Spark Issue Type: Sub-task Components: Spark Core Affects Versions: 3.5.2 Reporter: Anh Tuan Pham Fix For: 3.5.2 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-47943) Add Operator CI Task for Java Build and Test
[ https://issues.apache.org/jira/browse/SPARK-47943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-47943: - Assignee: Zhou JIANG > Add Operator CI Task for Java Build and Test > > > Key: SPARK-47943 > URL: https://issues.apache.org/jira/browse/SPARK-47943 > Project: Spark > Issue Type: Sub-task > Components: k8s >Affects Versions: kubernetes-operator-0.1.0 >Reporter: Zhou JIANG >Assignee: Zhou JIANG >Priority: Major > Labels: pull-request-available > > We need to add a CI task to build and test Java code for upcoming operator pull > requests.
[jira] [Resolved] (SPARK-47943) Add Operator CI Task for Java Build and Test
[ https://issues.apache.org/jira/browse/SPARK-47943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-47943. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 7 [https://github.com/apache/spark-kubernetes-operator/pull/7] > Add Operator CI Task for Java Build and Test > > > Key: SPARK-47943 > URL: https://issues.apache.org/jira/browse/SPARK-47943 > Project: Spark > Issue Type: Sub-task > Components: k8s >Affects Versions: kubernetes-operator-0.1.0 >Reporter: Zhou JIANG >Assignee: Zhou JIANG >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > We need to add a CI task to build and test Java code for upcoming operator pull > requests.
[jira] [Resolved] (SPARK-47929) Setup Static Analysis for Operator
[ https://issues.apache.org/jira/browse/SPARK-47929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-47929. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 6 [https://github.com/apache/spark-kubernetes-operator/pull/6] > Setup Static Analysis for Operator > -- > > Key: SPARK-47929 > URL: https://issues.apache.org/jira/browse/SPARK-47929 > Project: Spark > Issue Type: Sub-task > Components: k8s >Affects Versions: kubernetes-operator-0.1.0 >Reporter: Zhou JIANG >Assignee: Zhou JIANG >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Add common analysis tasks including checkstyle, spotbugs, jacoco. Also > include spotless for style fix.
[jira] [Resolved] (SPARK-47938) MsSQLServer: Cannot find data type BYTE error
[ https://issues.apache.org/jira/browse/SPARK-47938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-47938. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46164 [https://github.com/apache/spark/pull/46164] > MsSQLServer: Cannot find data type BYTE error > - > > Key: SPARK-47938 > URL: https://issues.apache.org/jira/browse/SPARK-47938 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Kent Yao >Assignee: Kent Yao >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > >
[jira] [Resolved] (SPARK-47600) MLLib: Migrate logInfo with variables to structured logging framework
[ https://issues.apache.org/jira/browse/SPARK-47600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gengliang Wang resolved SPARK-47600. Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46151 [https://github.com/apache/spark/pull/46151] > MLLib: Migrate logInfo with variables to structured logging framework > - > > Key: SPARK-47600 > URL: https://issues.apache.org/jira/browse/SPARK-47600 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Gengliang Wang >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > >
[jira] [Assigned] (SPARK-47933) Parent Column class for Spark Connect and Spark Classic
[ https://issues.apache.org/jira/browse/SPARK-47933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-47933: Assignee: Hyukjin Kwon > Parent Column class for Spark Connect and Spark Classic > --- > > Key: SPARK-47933 > URL: https://issues.apache.org/jira/browse/SPARK-47933 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available >
[jira] [Resolved] (SPARK-47933) Parent Column class for Spark Connect and Spark Classic
[ https://issues.apache.org/jira/browse/SPARK-47933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-47933. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46155 [https://github.com/apache/spark/pull/46155] > Parent Column class for Spark Connect and Spark Classic > --- > > Key: SPARK-47933 > URL: https://issues.apache.org/jira/browse/SPARK-47933 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > >
[jira] [Created] (SPARK-47946) Nested field's nullable value could be invalid after extracted using GetStructField
Junyoung Cho created SPARK-47946: Summary: Nested field's nullable value could be invalid after extracted using GetStructField Key: SPARK-47946 URL: https://issues.apache.org/jira/browse/SPARK-47946 Project: Spark Issue Type: Bug Components: Spark Core, SQL Affects Versions: 3.4.2 Reporter: Junyoung Cho I've got an error when appending to a table using DataFrameWriterV2. The error occurred in TableOutputResolver.checkNullability. This error occurs when the data type of the schema is the same, but the order of the fields is different. I found that GetStructField.nullable returns an unexpected result. {code:java} override def nullable: Boolean = child.nullable || childSchema(ordinal).nullable {code} Even if the nested field has no nullability attribute, it returns true when the parent struct has a nullability attribute. ||Parent nullability||Child nullability||Result|| |true|true|true| |true|false|true| |false|true|true| |false|false|false| I think the logic should be changed to an AND operation, because both the parent and the child should be nullable for the field to be considered nullable. I want to check whether the current logic is reasonable, or whether my suggestion could cause other side effects.
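The two combination rules in the report can be sketched in plain Java (illustrative only; `GetStructField` itself lives in Spark's Catalyst code and is not reproduced here):

```java
// Illustrative sketch, not Spark's API: compares the current OR rule from
// GetStructField.nullable with the AND rule proposed in this report.
public class NullabilitySketch {
    // Current rule: the extracted field is nullable if the parent struct
    // can be null OR the field itself is declared nullable.
    static boolean currentNullable(boolean parentNullable, boolean fieldNullable) {
        return parentNullable || fieldNullable;
    }

    // Proposed rule: nullable only when both parent and field are nullable.
    static boolean proposedNullable(boolean parentNullable, boolean fieldNullable) {
        return parentNullable && fieldNullable;
    }

    public static void main(String[] args) {
        // The two rules disagree exactly on the mixed rows of the table above.
        System.out.println(currentNullable(true, false));   // prints true
        System.out.println(proposedNullable(true, false));  // prints false
    }
}
```

Note that the OR rule also covers the case where the whole parent struct is null (extracting a field from a null struct yields null), which may be the rationale behind the current behavior; the question raised here is which semantics checkNullability should rely on.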
[jira] [Updated] (SPARK-47899) StageFailed event should attach the exception chain
[ https://issues.apache.org/jira/browse/SPARK-47899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arjun Sahoo updated SPARK-47899: Affects Version/s: 3.5.1 > StageFailed event should attach the exception chain > --- > > Key: SPARK-47899 > URL: https://issues.apache.org/jira/browse/SPARK-47899 > Project: Spark > Issue Type: Improvement > Components: Scheduler, Spark Core >Affects Versions: 3.4.0, 3.5.1 >Reporter: Arjun Sahoo >Assignee: BingKun Pan >Priority: Minor > > As part of SPARK-39195, the task is marked as failed but the exception chain was > not sent; ultimately the cause becomes `null` in SparkException. It is not > convenient to find the root cause from the detailed message. > {code} > /** >* Called by the OutputCommitCoordinator to cancel stage due to data > duplication may happen. >*/ > private[scheduler] def stageFailed(stageId: Int, reason: String): Unit = { > eventProcessLoop.post(StageFailed(stageId, reason, None)) > } > {code}
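The difference is easy to see with plain `Throwable` chaining (a standalone sketch, not Spark's scheduler code):

```java
// Standalone sketch: why posting only a reason string (as in the quoted
// stageFailed above) loses the root cause, while passing the original
// exception as the cause preserves the full chain in the stack trace.
public class ExceptionChainSketch {
    public static void main(String[] args) {
        Exception root = new IllegalStateException("data duplication may have happened");

        // Reason string only: the cause is null, as the ticket describes.
        RuntimeException withoutChain =
            new RuntimeException("Stage failed: " + root.getMessage());

        // Attaching the original exception keeps the chain intact.
        RuntimeException withChain = new RuntimeException("Stage failed", root);

        System.out.println(withoutChain.getCause()); // prints null
        System.out.println(withChain.getCause());    // prints the IllegalStateException
    }
}
```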
[jira] [Updated] (SPARK-47943) Add Operator CI Task for Java Build and Test
[ https://issues.apache.org/jira/browse/SPARK-47943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-47943: --- Labels: pull-request-available (was: ) > Add Operator CI Task for Java Build and Test > > > Key: SPARK-47943 > URL: https://issues.apache.org/jira/browse/SPARK-47943 > Project: Spark > Issue Type: Sub-task > Components: k8s >Affects Versions: kubernetes-operator-0.1.0 >Reporter: Zhou JIANG >Priority: Major > Labels: pull-request-available > > We need to add a CI task to build and test Java code for upcoming operator pull > requests.
[jira] [Updated] (SPARK-47945) MsSQLServer: Document Mapping Spark SQL Data Types from Microsoft SQL Server and add tests
[ https://issues.apache.org/jira/browse/SPARK-47945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-47945: --- Labels: pull-request-available (was: ) > MsSQLServer: Document Mapping Spark SQL Data Types from Microsoft SQL Server > and add tests > -- > > Key: SPARK-47945 > URL: https://issues.apache.org/jira/browse/SPARK-47945 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Kent Yao >Priority: Major > Labels: pull-request-available >
[jira] [Created] (SPARK-47945) MsSQLServer: Document Mapping Spark SQL Data Types from Microsoft SQL Server and add tests
Kent Yao created SPARK-47945: Summary: MsSQLServer: Document Mapping Spark SQL Data Types from Microsoft SQL Server and add tests Key: SPARK-47945 URL: https://issues.apache.org/jira/browse/SPARK-47945 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 4.0.0 Reporter: Kent Yao
[jira] [Resolved] (SPARK-47937) Fix docstring of `hll_sketch_agg`
[ https://issues.apache.org/jira/browse/SPARK-47937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-47937. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46163 [https://github.com/apache/spark/pull/46163] > Fix docstring of `hll_sketch_agg` > - > > Key: SPARK-47937 > URL: https://issues.apache.org/jira/browse/SPARK-47937 > Project: Spark > Issue Type: Improvement > Components: Documentation, PySpark >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > >
[jira] [Assigned] (SPARK-47937) Fix docstring of `hll_sketch_agg`
[ https://issues.apache.org/jira/browse/SPARK-47937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-47937: - Assignee: Ruifeng Zheng > Fix docstring of `hll_sketch_agg` > - > > Key: SPARK-47937 > URL: https://issues.apache.org/jira/browse/SPARK-47937 > Project: Spark > Issue Type: Improvement > Components: Documentation, PySpark >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > Labels: pull-request-available >
[jira] [Commented] (SPARK-47903) Add remaining scalar types to the Python variant library
[ https://issues.apache.org/jira/browse/SPARK-47903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17839897#comment-17839897 ] Harsh Motwani commented on SPARK-47903: --- Follow Up: Some changes from another branch were accidentally pushed during the late stages of this PR. Making another PR to resolve this problem. > Add remaining scalar types to the Python variant library > > > Key: SPARK-47903 > URL: https://issues.apache.org/jira/browse/SPARK-47903 > Project: Spark > Issue Type: Sub-task > Components: PS >Affects Versions: 4.0.0 >Reporter: Harsh Motwani >Assignee: Harsh Motwani >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Added support for reading the remaining scalar data types (binary, timestamp, > timestamp_ntz, date, float) to the Python Variant library.
[jira] [Created] (SPARK-47943) Add Operator CI Task for Java Build and Test
Zhou JIANG created SPARK-47943: -- Summary: Add Operator CI Task for Java Build and Test Key: SPARK-47943 URL: https://issues.apache.org/jira/browse/SPARK-47943 Project: Spark Issue Type: Sub-task Components: k8s Affects Versions: kubernetes-operator-0.1.0 Reporter: Zhou JIANG We need to add a CI task to build and test Java code for upcoming operator pull requests.
[jira] [Resolved] (SPARK-47903) Add remaining scalar types to the Python variant library
[ https://issues.apache.org/jira/browse/SPARK-47903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-47903. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46122 [https://github.com/apache/spark/pull/46122] > Add remaining scalar types to the Python variant library > > > Key: SPARK-47903 > URL: https://issues.apache.org/jira/browse/SPARK-47903 > Project: Spark > Issue Type: Sub-task > Components: PS >Affects Versions: 4.0.0 >Reporter: Harsh Motwani >Assignee: Harsh Motwani >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Added support for reading the remaining scalar data types (binary, timestamp, > timestamp_ntz, date, float) to the Python Variant library.
[jira] [Assigned] (SPARK-47903) Add remaining scalar types to the Python variant library
[ https://issues.apache.org/jira/browse/SPARK-47903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-47903: Assignee: Harsh Motwani > Add remaining scalar types to the Python variant library > > > Key: SPARK-47903 > URL: https://issues.apache.org/jira/browse/SPARK-47903 > Project: Spark > Issue Type: Sub-task > Components: PS >Affects Versions: 4.0.0 >Reporter: Harsh Motwani >Assignee: Harsh Motwani >Priority: Major > Labels: pull-request-available > > Added support for reading the remaining scalar data types (binary, timestamp, > timestamp_ntz, date, float) to the Python Variant library.
[jira] [Resolved] (SPARK-47904) Preserve case in Avro schema when using enableStableIdentifiersForUnionType
[ https://issues.apache.org/jira/browse/SPARK-47904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-47904. --- Fix Version/s: 3.5.2 Resolution: Fixed Issue resolved by pull request 46169 [https://github.com/apache/spark/pull/46169] > Preserve case in Avro schema when using enableStableIdentifiersForUnionType > --- > > Key: SPARK-47904 > URL: https://issues.apache.org/jira/browse/SPARK-47904 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 4.0.0, 3.5.2 >Reporter: Ivan Sadikov >Assignee: Ivan Sadikov >Priority: Major > Labels: pull-request-available > Fix For: 3.5.2 > > > When enableStableIdentifiersForUnionType is enabled, all of the types are > lowercased which creates a problem when field types are case-sensitive: > {code:java} > Schema.createEnum("myENUM", "", null, List[String]("E1", "e2").asJava), > Schema.createRecord("myRecord2", "", null, false, List[Schema.Field](new > Schema.Field("F", Schema.create(Type.FLOAT))).asJava){code} > would become > {code:java} > struct> {code} > but instead should be > {code:java} > struct> {code} >
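The lowercasing itself is a one-liner to illustrate. This is a hypothetical helper, not Spark's Avro reader; the `member_` prefix mirrors the general style of stable identifiers, but the exact naming scheme is Spark-internal:

```java
import java.util.Locale;

// Hypothetical helper showing the difference between lowercasing a union
// branch's type name and preserving its case. Lowercasing collapses the
// case-sensitive Avro names quoted above (e.g. "myENUM").
public class StableIdentifierSketch {
    static String memberName(String typeName, boolean preserveCase) {
        String name = preserveCase ? typeName : typeName.toLowerCase(Locale.ROOT);
        return "member_" + name;
    }

    public static void main(String[] args) {
        System.out.println(memberName("myENUM", false)); // prints member_myenum
        System.out.println(memberName("myENUM", true));  // prints member_myENUM
    }
}
```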
[jira] [Updated] (SPARK-47904) Preserve case in Avro schema when using enableStableIdentifiersForUnionType
[ https://issues.apache.org/jira/browse/SPARK-47904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-47904: -- Fix Version/s: 4.0.0 > Preserve case in Avro schema when using enableStableIdentifiersForUnionType > --- > > Key: SPARK-47904 > URL: https://issues.apache.org/jira/browse/SPARK-47904 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 4.0.0, 3.5.2 >Reporter: Ivan Sadikov >Assignee: Ivan Sadikov >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0, 3.5.2 > > > When enableStableIdentifiersForUnionType is enabled, all of the types are > lowercased which creates a problem when field types are case-sensitive: > {code:java} > Schema.createEnum("myENUM", "", null, List[String]("E1", "e2").asJava), > Schema.createRecord("myRecord2", "", null, false, List[Schema.Field](new > Schema.Field("F", Schema.create(Type.FLOAT))).asJava){code} > would become > {code:java} > struct> {code} > but instead should be > {code:java} > struct> {code} >
[jira] [Assigned] (SPARK-47904) Preserve case in Avro schema when using enableStableIdentifiersForUnionType
[ https://issues.apache.org/jira/browse/SPARK-47904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-47904: - Assignee: Ivan Sadikov > Preserve case in Avro schema when using enableStableIdentifiersForUnionType > --- > > Key: SPARK-47904 > URL: https://issues.apache.org/jira/browse/SPARK-47904 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 4.0.0, 3.5.2 >Reporter: Ivan Sadikov >Assignee: Ivan Sadikov >Priority: Major > Labels: pull-request-available > > When enableStableIdentifiersForUnionType is enabled, all of the types are > lowercased which creates a problem when field types are case-sensitive: > {code:java} > Schema.createEnum("myENUM", "", null, List[String]("E1", "e2").asJava), > Schema.createRecord("myRecord2", "", null, false, List[Schema.Field](new > Schema.Field("F", Schema.create(Type.FLOAT))).asJava){code} > would become > {code:java} > struct> {code} > but instead should be > {code:java} > struct> {code} >
[jira] [Resolved] (SPARK-47942) Drop K8s v1.26 Support
[ https://issues.apache.org/jira/browse/SPARK-47942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-47942. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46168 [https://github.com/apache/spark/pull/46168] > Drop K8s v1.26 Support > -- > > Key: SPARK-47942 > URL: https://issues.apache.org/jira/browse/SPARK-47942 > Project: Spark > Issue Type: Sub-task > Components: Kubernetes >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > >
[jira] [Assigned] (SPARK-47942) Drop K8s v1.26 Support
[ https://issues.apache.org/jira/browse/SPARK-47942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-47942: - Assignee: Dongjoon Hyun > Drop K8s v1.26 Support > -- > > Key: SPARK-47942 > URL: https://issues.apache.org/jira/browse/SPARK-47942 > Project: Spark > Issue Type: Sub-task > Components: Kubernetes >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Labels: pull-request-available >
[jira] [Assigned] (SPARK-47907) Put removal of '!' as a synonym for 'NOT' on a keyword level under a config
[ https://issues.apache.org/jira/browse/SPARK-47907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gengliang Wang reassigned SPARK-47907: -- Assignee: Serge Rielau > Put removal of '!' as a synonym for 'NOT' on a keyword level under a config > --- > > Key: SPARK-47907 > URL: https://issues.apache.org/jira/browse/SPARK-47907 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Serge Rielau >Assignee: Serge Rielau >Priority: Major > Labels: pull-request-available > > Recently we dissolved the lexer equivalence between '!' and 'NOT'. > ! is a prefix operator and a synonym for NOT only in that case. > But NOT is used in many more cases in the grammar. > Given that there are a handful of known scenarios where users have exploited > the undocumented loophole, it's best to add a config. > Usage found so far is: > `c1 ! IN(1, 2)` > `c1 ! BETWEEN 1 AND 2` > `c1 ! LIKE 'a%'` > But there are worse cases: > c1 IS ! NULL > CREATE TABLE T(c1 INT ! NULL) > or even > CREATE TABLE IF ! EXISTS T(c1 INT)
[jira] [Resolved] (SPARK-47907) Put removal of '!' as a synonym for 'NOT' on a keyword level under a config
[ https://issues.apache.org/jira/browse/SPARK-47907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gengliang Wang resolved SPARK-47907. Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46138 [https://github.com/apache/spark/pull/46138] > Put removal of '!' as a synonym for 'NOT' on a keyword level under a config > --- > > Key: SPARK-47907 > URL: https://issues.apache.org/jira/browse/SPARK-47907 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Serge Rielau >Assignee: Serge Rielau >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Recently we dissolved the lexer equivalence between '!' and 'NOT'. > ! is a prefix operator and a synonym for NOT only in that case. > But NOT is used in many more cases in the grammar. > Given that there are a handful of known scenarios where users have exploited > the undocumented loophole, it's best to add a config. > Usage found so far is: > `c1 ! IN(1, 2)` > `c1 ! BETWEEN 1 AND 2` > `c1 ! LIKE 'a%'` > But there are worse cases: > c1 IS ! NULL > CREATE TABLE T(c1 INT ! NULL) > or even > CREATE TABLE IF ! EXISTS T(c1 INT)
[jira] [Updated] (SPARK-47942) Drop K8s v1.26 Support
[ https://issues.apache.org/jira/browse/SPARK-47942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-47942: --- Labels: pull-request-available (was: ) > Drop K8s v1.26 Support > -- > > Key: SPARK-47942 > URL: https://issues.apache.org/jira/browse/SPARK-47942 > Project: Spark > Issue Type: Sub-task > Components: Kubernetes >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Priority: Major > Labels: pull-request-available >
[jira] [Created] (SPARK-47942) Drop K8s v1.26 Support
Dongjoon Hyun created SPARK-47942: - Summary: Drop K8s v1.26 Support Key: SPARK-47942 URL: https://issues.apache.org/jira/browse/SPARK-47942 Project: Spark Issue Type: Sub-task Components: Kubernetes Affects Versions: 4.0.0 Reporter: Dongjoon Hyun
[jira] [Updated] (SPARK-47940) Upgrade `guava` dependency to `33.1.0-jre` in Docker IT
[ https://issues.apache.org/jira/browse/SPARK-47940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-47940: -- Reporter: Cheng Pan (was: Dongjoon Hyun) > Upgrade `guava` dependency to `33.1.0-jre` in Docker IT > --- > > Key: SPARK-47940 > URL: https://issues.apache.org/jira/browse/SPARK-47940 > Project: Spark > Issue Type: Sub-task > Components: Build, Tests >Affects Versions: 4.0.0 >Reporter: Cheng Pan >Assignee: Cheng Pan >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > >
[jira] [Resolved] (SPARK-47940) Upgrade `guava` dependency to `33.1.0-jre` in Docker IT
[ https://issues.apache.org/jira/browse/SPARK-47940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-47940. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46167 [https://github.com/apache/spark/pull/46167] > Upgrade `guava` dependency to `33.1.0-jre` in Docker IT > --- > > Key: SPARK-47940 > URL: https://issues.apache.org/jira/browse/SPARK-47940 > Project: Spark > Issue Type: Sub-task > Components: Build, Tests >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Assignee: Cheng Pan >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > >
[jira] [Updated] (SPARK-47941) Propagate ForeachBatch worker initialization errors to users for PySpark
[ https://issues.apache.org/jira/browse/SPARK-47941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-47941: --- Labels: pull-request-available (was: ) > Propagate ForeachBatch worker initialization errors to users for PySpark > > > Key: SPARK-47941 > URL: https://issues.apache.org/jira/browse/SPARK-47941 > Project: Spark > Issue Type: Improvement > Components: Connect, PySpark, Structured Streaming >Affects Versions: 4.0.0 >Reporter: Eric Marnadi >Priority: Major > Labels: pull-request-available > > Ensure that errors and exceptions thrown during foreachBatch worker > initialization are propagated to the user, instead of just stderr.
[jira] [Updated] (SPARK-47941) Propagate ForeachBatch worker initialization errors to users for PySpark
[ https://issues.apache.org/jira/browse/SPARK-47941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Marnadi updated SPARK-47941: - Summary: Propagate ForeachBatch worker initialization errors to users for PySpark (was: Propagate ForeachBatch initialization errors to users) > Propagate ForeachBatch worker initialization errors to users for PySpark > > > Key: SPARK-47941 > URL: https://issues.apache.org/jira/browse/SPARK-47941 > Project: Spark > Issue Type: Improvement > Components: Connect, Structured Streaming >Affects Versions: 4.0.0 >Reporter: Eric Marnadi >Priority: Major > > Ensure that errors and exceptions thrown during foreachBatch worker > initialization are propagated to the user, instead of just stderr.
[jira] [Updated] (SPARK-47940) Upgrade `guava` dependency to `33.1.0-jre` in Docker IT
[ https://issues.apache.org/jira/browse/SPARK-47940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-47940: --- Labels: pull-request-available (was: ) > Upgrade `guava` dependency to `33.1.0-jre` in Docker IT > --- > > Key: SPARK-47940 > URL: https://issues.apache.org/jira/browse/SPARK-47940 > Project: Spark > Issue Type: Sub-task > Components: Build, Tests >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Priority: Minor > Labels: pull-request-available >
[jira] [Created] (SPARK-47940) Upgrade `guava` dependency to `33.1.0-jre` in Docker IT
Dongjoon Hyun created SPARK-47940: - Summary: Upgrade `guava` dependency to `33.1.0-jre` in Docker IT Key: SPARK-47940 URL: https://issues.apache.org/jira/browse/SPARK-47940 Project: Spark Issue Type: Sub-task Components: Build, Tests Affects Versions: 4.0.0 Reporter: Dongjoon Hyun
[jira] [Assigned] (SPARK-47935) Pin pandas==2.0.3 for pypy3.8
[ https://issues.apache.org/jira/browse/SPARK-47935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-47935: - Assignee: Ruifeng Zheng > Pin pandas==2.0.3 for pypy3.8 > - > > Key: SPARK-47935 > URL: https://issues.apache.org/jira/browse/SPARK-47935 > Project: Spark > Issue Type: Improvement > Components: Project Infra, PySpark >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47935) Pin pandas==2.0.3 for pypy3.8
[ https://issues.apache.org/jira/browse/SPARK-47935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-47935. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46159 [https://github.com/apache/spark/pull/46159] > Pin pandas==2.0.3 for pypy3.8 > - > > Key: SPARK-47935 > URL: https://issues.apache.org/jira/browse/SPARK-47935 > Project: Spark > Issue Type: Improvement > Components: Project Infra, PySpark >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-47010) Kubernetes: support csi driver for volume type
[ https://issues.apache.org/jira/browse/SPARK-47010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17839707#comment-17839707 ] Oleg Frenkel commented on SPARK-47010: -- Posted question on Stackoverflow: https://stackoverflow.com/questions/78366961/apache-spark-supporting-csi-driver-for-volume-type > Kubernetes: support csi driver for volume type > -- > > Key: SPARK-47010 > URL: https://issues.apache.org/jira/browse/SPARK-47010 > Project: Spark > Issue Type: New Feature > Components: Kubernetes >Affects Versions: 3.5.0 >Reporter: Oleg Frenkel >Priority: Major > > Today Spark supports the following types of Kubernetes > [volumes|https://kubernetes.io/docs/concepts/storage/volumes/]: hostPath, > emptyDir, nfs and persistentVolumeClaim. > In our case, Kubernetes cluster is multi-tenant and we cannot make > cluster-wide changes when deploying our application to the Kubernetes > cluster. Our application requires static shared file system. So, we cannot > use hostPath (don't have control of hosting VMs) and persistentVolumeClaim > (requires cluster-wide change when deploying PV). Our security department > does not allow nfs. 
> What would help in our case is the use of a CSI driver (taken from here: > https://github.com/kubernetes-sigs/azurefile-csi-driver/blob/master/deploy/example/e2e_usage.md#option3-inline-volume):
> {code:yaml}
> kind: Pod
> apiVersion: v1
> metadata:
>   name: nginx-azurefile-inline-volume
> spec:
>   nodeSelector:
>     "kubernetes.io/os": linux
>   containers:
>     - image: mcr.microsoft.com/oss/nginx/nginx:1.19.5
>       name: nginx-azurefile
>       command:
>         - "/bin/bash"
>         - "-c"
>         - set -euo pipefail; while true; do echo $(date) >> /mnt/azurefile/outfile; sleep 1; done
>       volumeMounts:
>         - name: persistent-storage
>           mountPath: "/mnt/azurefile"
>           readOnly: false
>   volumes:
>     - name: persistent-storage
>       csi:
>         driver: file.csi.azure.com
>         volumeAttributes:
>           shareName: EXISTING_SHARE_NAME  # required
>           secretName: azure-secret  # required
>           mountOptions: "dir_mode=0777,file_mode=0777,cache=strict,actimeo=30,nosharesock"  # optional
> {code}
>
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-47869) Upgrade built in hive to Hive-4.0
[ https://issues.apache.org/jira/browse/SPARK-47869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17839669#comment-17839669 ] Cheng Pan commented on SPARK-47869: --- cross link SPARK-44114 > Upgrade built in hive to Hive-4.0 > - > > Key: SPARK-47869 > URL: https://issues.apache.org/jira/browse/SPARK-47869 > Project: Spark > Issue Type: Task > Components: Spark Core >Affects Versions: 3.5.1 >Reporter: Simhadri Govindappa >Priority: Major > > Hive 4.0 has been released. It brings in a lot of new features, bug fixes and > performance improvements. > We would like to update the version of hive used in spark to hive-4.0 > [https://lists.apache.org/thread/2jqpvsx8n801zb5pmlhb8f4zloq27p82] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-47939) Parameterized queries fail for DESCRIBE & EXPLAIN w/ UNBOUND_SQL_PARAMETER error
Vladimir Golubev created SPARK-47939: Summary: Parameterized queries fail for DESCRIBE & EXPLAIN w/ UNBOUND_SQL_PARAMETER error Key: SPARK-47939 URL: https://issues.apache.org/jira/browse/SPARK-47939 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 4.0.0 Reporter: Vladimir Golubev
*Succeeds:* scala> spark.sql("select ?", Array(1)).show();
*Fails:* spark.sql("describe select ?", Array(1)).show();
*Fails:* spark.sql("explain select ?", Array(1)).show();
Failures are of the form:
{code}
org.apache.spark.sql.catalyst.ExtendedAnalysisException: [UNBOUND_SQL_PARAMETER] Found the unbound parameter: _16. Please, fix `args` and provide a mapping of the parameter to either a SQL literal or collection constructor functions such as `map()`, `array()`, `struct()`. SQLSTATE: 42P02; line 1 pos 16;
'Project [unresolvedalias(posparameter(16))]
+- OneRowRelation
{code}
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
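For background on SPARK-47939: a parameterized `spark.sql` call binds each positional `?` before analysis, and the report suggests DESCRIBE/EXPLAIN-wrapped statements miss that binding, leaving `posparameter` nodes unbound. A toy Python sketch of positional binding (illustrative only — Spark binds parameters in the parsed logical plan, not by string substitution):

```python
def bind_positional(sql, args):
    """Naively replace positional '?' placeholders with literals.

    Toy illustration only: it ignores '?' inside string literals and does
    no escaping, which is exactly why real engines bind parameters in the
    parsed plan rather than in the SQL text.
    """
    parts = sql.split("?")
    if len(parts) - 1 != len(args):
        raise ValueError("unbound or extra SQL parameters")
    out = [parts[0]]
    for arg, rest in zip(args, parts[1:]):
        # Render strings as quoted literals, everything else via str()
        out.append(repr(arg) if isinstance(arg, str) else str(arg))
        out.append(rest)
    return "".join(out)
```

In this toy model, the bug would correspond to the `describe ...`/`explain ...` paths never reaching the binding step at all.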
[jira] [Assigned] (SPARK-47411) StringInstr, FindInSet (all collations)
[ https://issues.apache.org/jira/browse/SPARK-47411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-47411: --- Assignee: Milan Dankovic > StringInstr, FindInSet (all collations) > --- > > Key: SPARK-47411 > URL: https://issues.apache.org/jira/browse/SPARK-47411 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: Milan Dankovic >Priority: Major > Labels: pull-request-available > > Enable collation support for the *StringInstr* and *FindInSet* built-in > string functions in Spark. First confirm what is the expected behaviour for > these functions when given collated strings, and then move on to > implementation and testing. One way to go about this is to consider using > {_}StringSearch{_}, an efficient ICU service for string matching. Implement > the corresponding unit tests (CollationStringExpressionsSuite) and E2E tests > (CollationSuite) to reflect how this function should be used with collation > in SparkSQL, and feel free to use your chosen Spark SQL Editor to experiment > with the existing functions to learn more about how they work. In addition, > look into the possible use-cases and implementation of similar functions > within other open-source DBMS, such as > [PostgreSQL|https://www.postgresql.org/docs/]. > > The goal for this Jira ticket is to implement the *StringInstr* and > *FindInSet* functions so that they support all collation types currently > supported in Spark. To understand what changes were introduced in order to > enable full collation support for other existing functions in Spark, take a > look at the Spark PRs and Jira tickets for completed tasks in this parent > (for example: Contains, StartsWith, EndsWith). 
> > Read more about ICU [Collation Concepts|http://example.com/] and > [Collator|http://example.com/] class, as well as _StringSearch_ using the > [ICU user > guide|https://unicode-org.github.io/icu/userguide/collation/string-search.html] > and [ICU > docs|https://unicode-org.github.io/icu-docs/apidoc/released/icu4j/com/ibm/icu/text/StringSearch.html]. > Also, refer to the Unicode Technical Standard for string > [searching|https://www.unicode.org/reports/tr10/#Searching] and > [collation|https://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Type_Fallback]. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
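To make the ticket's goal concrete, here is a hedged Python approximation of `instr` and `find_in_set` under a simple lowercase collation. This mimics UTF8_LCASE-style matching via `str.casefold()` only; the actual Spark implementation uses ICU's StringSearch, which also handles accents, contractions, and locale-specific ordering:

```python
def instr_lcase(haystack, needle):
    """1-based position of needle in haystack under a lowercase collation;
    0 if absent (mirroring SQL instr semantics)."""
    return haystack.casefold().find(needle.casefold()) + 1

def find_in_set_lcase(item, string_list):
    """1-based index of item in a comma-separated list, case-insensitively;
    0 if absent or if item itself contains a comma (SQL find_in_set rule)."""
    if "," in item:
        return 0
    target = item.casefold()
    for i, element in enumerate(string_list.split(","), start=1):
        if element.casefold() == target:
            return i
    return 0
```

The real work in the ticket is wiring each supported collation to the right comparator, not the matching loop itself.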
[jira] [Resolved] (SPARK-47411) StringInstr, FindInSet (all collations)
[ https://issues.apache.org/jira/browse/SPARK-47411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-47411. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 45643 [https://github.com/apache/spark/pull/45643] > StringInstr, FindInSet (all collations) > --- > > Key: SPARK-47411 > URL: https://issues.apache.org/jira/browse/SPARK-47411 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: Milan Dankovic >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Enable collation support for the *StringInstr* and *FindInSet* built-in > string functions in Spark. First confirm what is the expected behaviour for > these functions when given collated strings, and then move on to > implementation and testing. One way to go about this is to consider using > {_}StringSearch{_}, an efficient ICU service for string matching. Implement > the corresponding unit tests (CollationStringExpressionsSuite) and E2E tests > (CollationSuite) to reflect how this function should be used with collation > in SparkSQL, and feel free to use your chosen Spark SQL Editor to experiment > with the existing functions to learn more about how they work. In addition, > look into the possible use-cases and implementation of similar functions > within other open-source DBMS, such as > [PostgreSQL|https://www.postgresql.org/docs/]. > > The goal for this Jira ticket is to implement the *StringInstr* and > *FindInSet* functions so that they support all collation types currently > supported in Spark. To understand what changes were introduced in order to > enable full collation support for other existing functions in Spark, take a > look at the Spark PRs and Jira tickets for completed tasks in this parent > (for example: Contains, StartsWith, EndsWith). 
> > Read more about ICU [Collation Concepts|http://example.com/] and > [Collator|http://example.com/] class, as well as _StringSearch_ using the > [ICU user > guide|https://unicode-org.github.io/icu/userguide/collation/string-search.html] > and [ICU > docs|https://unicode-org.github.io/icu-docs/apidoc/released/icu4j/com/ibm/icu/text/StringSearch.html]. > Also, refer to the Unicode Technical Standard for string > [searching|https://www.unicode.org/reports/tr10/#Searching] and > [collation|https://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Type_Fallback]. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-47938) MsSQLServer: Cannot find data type BYTE error
[ https://issues.apache.org/jira/browse/SPARK-47938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-47938: --- Labels: pull-request-available (was: ) > MsSQLServer: Cannot find data type BYTE error > - > > Key: SPARK-47938 > URL: https://issues.apache.org/jira/browse/SPARK-47938 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Kent Yao >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-47938) MsSQLServer: Cannot find data type BYTE error
Kent Yao created SPARK-47938: Summary: MsSQLServer: Cannot find data type BYTE error Key: SPARK-47938 URL: https://issues.apache.org/jira/browse/SPARK-47938 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 4.0.0 Reporter: Kent Yao -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47928) Speed up test "Add jar support Ivy URI in SQL"
[ https://issues.apache.org/jira/browse/SPARK-47928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Jie resolved SPARK-47928. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46150 [https://github.com/apache/spark/pull/46150] > Speed up test "Add jar support Ivy URI in SQL" > -- > > Key: SPARK-47928 > URL: https://issues.apache.org/jira/browse/SPARK-47928 > Project: Spark > Issue Type: Test > Components: SQL >Affects Versions: 3.2.0 >Reporter: Cheng Pan >Assignee: Cheng Pan >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-47928) Speed up test "Add jar support Ivy URI in SQL"
[ https://issues.apache.org/jira/browse/SPARK-47928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Jie reassigned SPARK-47928: Assignee: Cheng Pan > Speed up test "Add jar support Ivy URI in SQL" > -- > > Key: SPARK-47928 > URL: https://issues.apache.org/jira/browse/SPARK-47928 > Project: Spark > Issue Type: Test > Components: SQL >Affects Versions: 3.2.0 >Reporter: Cheng Pan >Assignee: Cheng Pan >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-47351) StringToMap & Mask (all collations)
[ https://issues.apache.org/jira/browse/SPARK-47351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uroš Bojanić updated SPARK-47351: - Summary: StringToMap & Mask (all collations) (was: StringToMap (all collations)) > StringToMap & Mask (all collations) > --- > > Key: SPARK-47351 > URL: https://issues.apache.org/jira/browse/SPARK-47351 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-47421) TBD
[ https://issues.apache.org/jira/browse/SPARK-47421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uroš Bojanić updated SPARK-47421: - Summary: TBD (was: Mask (all collations)) > TBD > --- > > Key: SPARK-47421 > URL: https://issues.apache.org/jira/browse/SPARK-47421 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-47937) Fix docstring of `hll_sketch_agg`
[ https://issues.apache.org/jira/browse/SPARK-47937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-47937: --- Labels: pull-request-available (was: ) > Fix docstring of `hll_sketch_agg` > - > > Key: SPARK-47937 > URL: https://issues.apache.org/jira/browse/SPARK-47937 > Project: Spark > Issue Type: Improvement > Components: Documentation, PySpark >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-47937) Fix docstring of `hll_sketch_agg`
Ruifeng Zheng created SPARK-47937: - Summary: Fix docstring of `hll_sketch_agg` Key: SPARK-47937 URL: https://issues.apache.org/jira/browse/SPARK-47937 Project: Spark Issue Type: Improvement Components: Documentation, PySpark Affects Versions: 4.0.0 Reporter: Ruifeng Zheng -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-47936) Improve `toUpperCase` & `toLowerCase` with `Locale.ROOT` rules
[ https://issues.apache.org/jira/browse/SPARK-47936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-47936: -- Assignee: (was: Apache Spark) > Improve `toUpperCase` & `toLowerCase` with `Locale.ROOT` rules > -- > > Key: SPARK-47936 > URL: https://issues.apache.org/jira/browse/SPARK-47936 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Priority: Minor > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-47936) Improve `toUpperCase` & `toLowerCase` with `Locale.ROOT` rules
[ https://issues.apache.org/jira/browse/SPARK-47936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-47936: --- Labels: pull-request-available (was: ) > Improve `toUpperCase` & `toLowerCase` with `Locale.ROOT` rules > -- > > Key: SPARK-47936 > URL: https://issues.apache.org/jira/browse/SPARK-47936 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Priority: Minor > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-47936) Improve `toUpperCase` & `toLowerCase` with `Locale.ROOT` rules
[ https://issues.apache.org/jira/browse/SPARK-47936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-47936: -- Assignee: Apache Spark > Improve `toUpperCase` & `toLowerCase` with `Locale.ROOT` rules > -- > > Key: SPARK-47936 > URL: https://issues.apache.org/jira/browse/SPARK-47936 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Assignee: Apache Spark >Priority: Minor > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-47873) Write collated strings to hive as regular strings
[ https://issues.apache.org/jira/browse/SPARK-47873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-47873: -- Assignee: (was: Apache Spark) > Write collated strings to hive as regular strings > - > > Key: SPARK-47873 > URL: https://issues.apache.org/jira/browse/SPARK-47873 > Project: Spark > Issue Type: Improvement > Components: Spark Core, SQL >Affects Versions: 4.0.0 >Reporter: Stefan Kandic >Priority: Major > Labels: pull-request-available > > As Hive doesn't support collations, we should write collated strings with a > regular string type but keep the collation in table metadata to properly read > them back. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-47873) Write collated strings to hive as regular strings
[ https://issues.apache.org/jira/browse/SPARK-47873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-47873: -- Assignee: Apache Spark > Write collated strings to hive as regular strings > - > > Key: SPARK-47873 > URL: https://issues.apache.org/jira/browse/SPARK-47873 > Project: Spark > Issue Type: Improvement > Components: Spark Core, SQL >Affects Versions: 4.0.0 >Reporter: Stefan Kandic >Assignee: Apache Spark >Priority: Major > Labels: pull-request-available > > As Hive doesn't support collations, we should write collated strings with a > regular string type but keep the collation in table metadata to properly read > them back. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-47936) Improve `toUpperCase` & `toLowerCase` with `Locale.ROOT` rules
BingKun Pan created SPARK-47936: --- Summary: Improve `toUpperCase` & `toLowerCase` with `Locale.ROOT` rules Key: SPARK-47936 URL: https://issues.apache.org/jira/browse/SPARK-47936 Project: Spark Issue Type: Improvement Components: Build Affects Versions: 4.0.0 Reporter: BingKun Pan -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-47351) StringToMap (all collations)
[ https://issues.apache.org/jira/browse/SPARK-47351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17839550#comment-17839550 ] Uroš Bojanić commented on SPARK-47351: -- working on this > StringToMap (all collations) > > > Key: SPARK-47351 > URL: https://issues.apache.org/jira/browse/SPARK-47351 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-47935) Pin pandas==2.0.3 for pypy3.8
[ https://issues.apache.org/jira/browse/SPARK-47935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-47935: --- Labels: pull-request-available (was: ) > Pin pandas==2.0.3 for pypy3.8 > - > > Key: SPARK-47935 > URL: https://issues.apache.org/jira/browse/SPARK-47935 > Project: Spark > Issue Type: Improvement > Components: Project Infra, PySpark >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-47935) Pin pandas==2.0.3 for pypy3.8
Ruifeng Zheng created SPARK-47935: - Summary: Pin pandas==2.0.3 for pypy3.8 Key: SPARK-47935 URL: https://issues.apache.org/jira/browse/SPARK-47935 Project: Spark Issue Type: Improvement Components: Project Infra, PySpark Affects Versions: 4.0.0 Reporter: Ruifeng Zheng -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-47350) SplitPart (binary & lowercase collation only)
[ https://issues.apache.org/jira/browse/SPARK-47350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-47350: --- Labels: pull-request-available (was: ) > SplitPart (binary & lowercase collation only) > - > > Key: SPARK-47350 > URL: https://issues.apache.org/jira/browse/SPARK-47350 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47900) Fix check for implicit collation
[ https://issues.apache.org/jira/browse/SPARK-47900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-47900. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46116 [https://github.com/apache/spark/pull/46116] > Fix check for implicit collation > > > Key: SPARK-47900 > URL: https://issues.apache.org/jira/browse/SPARK-47900 > Project: Spark > Issue Type: Improvement > Components: Spark Core, SQL >Affects Versions: 4.0.0 >Reporter: Stefan Kandic >Assignee: Stefan Kandic >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47930) Upgrade RoaringBitmap to 1.0.6
[ https://issues.apache.org/jira/browse/SPARK-47930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-47930. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46152 [https://github.com/apache/spark/pull/46152] > Upgrade RoaringBitmap to 1.0.6 > -- > > Key: SPARK-47930 > URL: https://issues.apache.org/jira/browse/SPARK-47930 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Assignee: BingKun Pan >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-47934) Inefficient Redirect Handling Due to Missing Trailing Slashes in URL Redirection
[ https://issues.apache.org/jira/browse/SPARK-47934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-47934: --- Labels: pull-request-available (was: ) > Inefficient Redirect Handling Due to Missing Trailing Slashes in URL > Redirection > > > Key: SPARK-47934 > URL: https://issues.apache.org/jira/browse/SPARK-47934 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.2.4, 3.3.2, 3.5.1, 3.4.3 >Reporter: huangzhir >Priority: Trivial > Labels: pull-request-available > Attachments: image-2024-04-22-15-14-13-468.png > > > *Summary:* > The current implementation of URL redirection in Spark's history web UI does > not consistently add trailing slashes to URLs when constructing redirection > targets. This inconsistency leads to additional HTTP redirects by Jetty, > which increases the load time and reduces the efficiency of the Spark UI. > *Problem Description:* > When constructing redirect URLs, particularly in scenarios where an attempt > ID needs to be appended, the system does not ensure that the base URL ends > with a slash. This omission results in the generated URL being redirected by > Jetty to add a trailing slash, thus causing an unnecessary additional HTTP > redirect. > For example, when the `shouldAppendAttemptId` flag is true, the URL is formed > without a trailing slash before the attempt ID is appended, leading to two > redirects: one by our logic to add the attempt ID, and another by Jetty to > correct the missing slash. > !image-2024-04-22-15-14-13-468.png! > *Proposed Solution:* > [https://github.com/apache/spark/blob/2d0b56c3eac611e743c41d16ea8e439bc8a504e4/core/src/main/scala/org/apache/spark/deploy/history/HistoryServer.scala#L118] > Ensure that all redirect URLs uniformly end with a trailing slash regardless > of whether an attempt ID is appended. 
This can be achieved by modifying the URL construction logic as follows:
> {code:scala}
> val redirect = if (shouldAppendAttemptId) {
>   req.getRequestURI.stripSuffix("/") + "/" + attemptId.get + "/"
> } else {
>   req.getRequestURI.stripSuffix("/") + "/"
> }
> {code}
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
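The proposal in SPARK-47934 is small enough to sanity-check outside Spark. A Python translation of the idea (hypothetical names; `removesuffix` plays the role of Scala's `stripSuffix`):

```python
def redirect_target(request_uri, attempt_id=None):
    """Build a redirect URL that always ends in '/', so the servlet
    container doesn't issue a second redirect just to add the slash.

    Illustrative port of the Scala snippet proposed in SPARK-47934;
    the function name and signature are hypothetical.
    """
    base = request_uri.removesuffix("/")  # mirrors Scala's stripSuffix("/")
    if attempt_id is not None:
        return base + "/" + attempt_id + "/"
    return base + "/"
```

Either branch returns a slash-terminated URL, which is the whole point: one redirect from the history server instead of one from the server plus one from Jetty.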
[jira] [Resolved] (SPARK-47890) Add python and scala dataframe variant expression aliases.
[ https://issues.apache.org/jira/browse/SPARK-47890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-47890. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46123 [https://github.com/apache/spark/pull/46123] > Add python and scala dataframe variant expression aliases. > -- > > Key: SPARK-47890 > URL: https://issues.apache.org/jira/browse/SPARK-47890 > Project: Spark > Issue Type: Sub-task > Components: PySpark, SQL >Affects Versions: 4.0.0 >Reporter: Chenhao Li >Assignee: Chenhao Li >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-47413) Substring, Right, Left (all collations)
[ https://issues.apache.org/jira/browse/SPARK-47413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-47413: --- Assignee: Gideon P > Substring, Right, Left (all collations) > --- > > Key: SPARK-47413 > URL: https://issues.apache.org/jira/browse/SPARK-47413 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: Gideon P >Priority: Major > Labels: pull-request-available > > Enable collation support for the *Substring* built-in string function in > Spark (including *Right* and *Left* functions). First confirm what is the > expected behaviour for these functions when given collated strings, then move > on to the implementation that would enable handling strings of all collation > types. Implement the corresponding unit tests > (CollationStringExpressionsSuite) and E2E tests (CollationSuite) to reflect > how this function should be used with collation in SparkSQL, and feel free to > use your chosen Spark SQL Editor to experiment with the existing functions to > learn more about how they work. In addition, look into the possible use-cases > and implementation of similar functions within other open-source DBMS, > such as [PostgreSQL|https://www.postgresql.org/docs/]. > > The goal for this Jira ticket is to implement the {*}Substring{*}, > {*}Right{*}, and *Left* functions so that they support all collation types > currently supported in Spark. To understand what changes were introduced in > order to enable full collation support for other existing functions in Spark, > take a > look at the Spark PRs and Jira tickets for completed tasks in this > parent (for example: Contains, StartsWith, EndsWith). > > Read more about ICU [Collation Concepts|http://example.com/] and > [Collator|http://example.com/] class. Also, refer to the Unicode Technical > Standard for > [collation|https://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Type_Fallback]. 
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
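The collation-sensitive behaviour this ticket targets can be illustrated outside Spark with the JDK's `java.text.Collator` (the JDK counterpart of the ICU Collator the ticket links to). This is a hedged sketch: `containsCollated` is an illustrative helper, not a Spark API, and a fixed-length window is a simplification that ignores collation expansions such as "ß"/"ss".

```java
import java.text.Collator;
import java.util.Locale;

public class CollationSketch {
    // Collation-aware "contains": compare each window of the haystack to the
    // needle with a Collator. At PRIMARY strength, case (and accent)
    // differences are ignored, unlike a binary UTF8 comparison.
    static boolean containsCollated(String haystack, String needle, Collator collator) {
        for (int i = 0; i + needle.length() <= haystack.length(); i++) {
            if (collator.compare(haystack.substring(i, i + needle.length()), needle) == 0) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Collator collator = Collator.getInstance(Locale.ROOT);
        collator.setStrength(Collator.PRIMARY);
        // Binary comparison is case-sensitive...
        System.out.println("Spark SQL".contains("spark"));                    // false
        // ...while the collated comparison matches regardless of case.
        System.out.println(containsCollated("Spark SQL", "spark", collator)); // true
    }
}
```

The same split, binary match versus collator-driven match, is what a collation-aware *Substring*/*Right*/*Left* implementation has to respect for every supported collation type.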
[jira] [Resolved] (SPARK-47413) Substring, Right, Left (all collations)
[ https://issues.apache.org/jira/browse/SPARK-47413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-47413. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46040 [https://github.com/apache/spark/pull/46040] > Substring, Right, Left (all collations) > --- > > Key: SPARK-47413 > URL: https://issues.apache.org/jira/browse/SPARK-47413 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: Gideon P >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Enable collation support for the *Substring* built-in string function in > Spark (including *Right* and *Left* functions). First confirm the > expected behaviour for these functions when given collated strings, then move > on to the implementation that would enable handling strings of all collation > types. Implement the corresponding unit tests > (CollationStringExpressionsSuite) and E2E tests (CollationSuite) to reflect > how this function should be used with collation in SparkSQL, and feel free to > use your chosen Spark SQL Editor to experiment with the existing functions to > learn more about how they work. In addition, look into the possible use-cases > and implementation of similar functions within other open-source DBMSs, > such as [PostgreSQL|https://www.postgresql.org/docs/]. > > The goal for this Jira ticket is to implement the {*}Substring{*}, > {*}Right{*}, and *Left* functions so that they support all collation types > currently supported in Spark. To understand what changes were introduced in > order to enable full collation support for other existing functions in Spark, > take a look at the Spark PRs and Jira tickets for completed tasks in this > parent (for example: Contains, StartsWith, EndsWith). > > Read more about ICU [Collation Concepts|http://example.com/] and > the [Collator|http://example.com/] class. 
Also, refer to the Unicode Technical > Standard for > [collation|https://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Type_Fallback]. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-47934) Inefficient Redirect Handling Due to Missing Trailing Slashes in URL Redirection
[ https://issues.apache.org/jira/browse/SPARK-47934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] huangzhir updated SPARK-47934: -- Attachment: image-2024-04-22-15-14-13-468.png > Inefficient Redirect Handling Due to Missing Trailing Slashes in URL > Redirection > > > Key: SPARK-47934 > URL: https://issues.apache.org/jira/browse/SPARK-47934 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.2.4, 3.3.2, 3.5.1, 3.4.3 >Reporter: huangzhir >Priority: Trivial > Attachments: image-2024-04-22-15-14-13-468.png > > > *{*}Summary:{*}* > The current implementation of URL redirection in Spark's history web UI does > not consistently add trailing slashes to URLs when constructing redirection > targets. This inconsistency leads to additional HTTP redirects by Jetty, > which increases the load time and reduces the efficiency of the Spark UI. > *{*}Problem Description:{*}* > When constructing redirect URLs, particularly in scenarios where an attempt > ID needs to be appended, the system does not ensure that the base URL ends > with a slash. This omission results in the generated URL being redirected by > Jetty to add a trailing slash, thus causing an unnecessary additional HTTP > redirect. > For example, when the `shouldAppendAttemptId` flag is true, the URL is formed > without a trailing slash before the attempt ID is appended, leading to two > redirects: one by our logic to add the attempt ID, and another by Jetty to > correct the missing slash. > !image-2024-04-22-15-06-29-357.png! > *{*}Proposed Solution:{*}* > [https://github.com/apache/spark/blob/2d0b56c3eac611e743c41d16ea8e439bc8a504e4/core/src/main/scala/org/apache/spark/deploy/history/HistoryServer.scala#L118] > Ensure that all redirect URLs uniformly end with a trailing slash regardless > of whether an attempt ID is appended. 
This can be achieved by modifying the > URL construction logic as follows: > ```scala > val redirect = if (shouldAppendAttemptId) { > req.getRequestURI.stripSuffix("/") + "/" + attemptId.get + "/" > } else { > req.getRequestURI.stripSuffix("/") + "/" > } > ``` > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
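The normalization proposed above can be exercised outside Spark. The sketch below mirrors the quoted Scala logic in Java; `normalizeRedirect` is an illustrative stand-in for the HistoryServer code, not an actual Spark method. Stripping any existing trailing slash before appending exactly one guarantees the emitted URL never triggers Jetty's add-a-slash redirect.

```java
public class RedirectSketch {
    // Mirror of the proposed Scala fix: strip one existing trailing slash
    // (like Scala's stripSuffix("/")), then append exactly one, plus the
    // attempt ID when present. The result always ends in a single "/".
    static String normalizeRedirect(String requestUri, String attemptId) {
        String base = requestUri.endsWith("/")
                ? requestUri.substring(0, requestUri.length() - 1)
                : requestUri;
        return attemptId == null ? base + "/" : base + "/" + attemptId + "/";
    }

    public static void main(String[] args) {
        // With or without an incoming slash, the output is already canonical,
        // so Jetty serves it directly instead of issuing a second redirect.
        System.out.println(normalizeRedirect("/history/app-1", null)); // /history/app-1/
        System.out.println(normalizeRedirect("/history/app-1/", "1")); // /history/app-1/1/
    }
}
```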
[jira] [Updated] (SPARK-47934) Inefficient Redirect Handling Due to Missing Trailing Slashes in URL Redirection
[ https://issues.apache.org/jira/browse/SPARK-47934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] huangzhir updated SPARK-47934: -- Description: *Summary:* The current implementation of URL redirection in Spark's history web UI does not consistently add trailing slashes to URLs when constructing redirection targets. This inconsistency leads to additional HTTP redirects by Jetty, which increases the load time and reduces the efficiency of the Spark UI. *Problem Description:* When constructing redirect URLs, particularly in scenarios where an attempt ID needs to be appended, the system does not ensure that the base URL ends with a slash. This omission results in the generated URL being redirected by Jetty to add a trailing slash, thus causing an unnecessary additional HTTP redirect. For example, when the `shouldAppendAttemptId` flag is true, the URL is formed without a trailing slash before the attempt ID is appended, leading to two redirects: one by our logic to add the attempt ID, and another by Jetty to correct the missing slash. !image-2024-04-22-15-14-13-468.png! *Proposed Solution:* [https://github.com/apache/spark/blob/2d0b56c3eac611e743c41d16ea8e439bc8a504e4/core/src/main/scala/org/apache/spark/deploy/history/HistoryServer.scala#L118] Ensure that all redirect URLs uniformly end with a trailing slash regardless of whether an attempt ID is appended. This can be achieved by modifying the URL construction logic as follows: ```scala val redirect = if (shouldAppendAttemptId) { req.getRequestURI.stripSuffix("/") + "/" + attemptId.get + "/" } else { req.getRequestURI.stripSuffix("/") + "/" } ``` was: {*}{{*}}Summary:{{*}}{*} The current implementation of URL redirection in Spark's history web UI does not consistently add trailing slashes to URLs when constructing redirection targets. This inconsistency leads to additional HTTP redirects by Jetty, which increases the load time and reduces the efficiency of the Spark UI. 
{*}{{*}}Problem Description:{{*}}{*} When constructing redirect URLs, particularly in scenarios where an attempt ID needs to be appended, the system does not ensure that the base URL ends with a slash. This omission results in the generated URL being redirected by Jetty to add a trailing slash, thus causing an unnecessary additional HTTP redirect. For example, when the `shouldAppendAttemptId` flag is true, the URL is formed without a trailing slash before the attempt ID is appended, leading to two redirects: one by our logic to add the attempt ID, and another by Jetty to correct the missing slash. !image-2024-04-22-15-14-13-468.png! {*}{{*}}Proposed Solution:{{*}}{*} [https://github.com/apache/spark/blob/2d0b56c3eac611e743c41d16ea8e439bc8a504e4/core/src/main/scala/org/apache/spark/deploy/history/HistoryServer.scala#L118] Ensure that all redirect URLs uniformly end with a trailing slash regardless of whether an attempt ID is appended. This can be achieved by modifying the URL construction logic as follows: ```scala val redirect = if (shouldAppendAttemptId) { req.getRequestURI.stripSuffix("/") + "/" + attemptId.get + "/" } else { req.getRequestURI.stripSuffix("/") + "/" } ``` > Inefficient Redirect Handling Due to Missing Trailing Slashes in URL > Redirection > > > Key: SPARK-47934 > URL: https://issues.apache.org/jira/browse/SPARK-47934 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.2.4, 3.3.2, 3.5.1, 3.4.3 >Reporter: huangzhir >Priority: Trivial > Attachments: image-2024-04-22-15-14-13-468.png > > > *Summary:* > The current implementation of URL redirection in Spark's history web UI does > not consistently add trailing slashes to URLs when constructing redirection > targets. This inconsistency leads to additional HTTP redirects by Jetty, > which increases the load time and reduces the efficiency of the Spark UI. 
> *Problem Description:* > When constructing redirect URLs, particularly in scenarios where an attempt > ID needs to be appended, the system does not ensure that the base URL ends > with a slash. This omission results in the generated URL being redirected by > Jetty to add a trailing slash, thus causing an unnecessary additional HTTP > redirect. > For example, when the `shouldAppendAttemptId` flag is true, the URL is formed > without a trailing slash before the attempt ID is appended, leading to two > redirects: one by our logic to add the attempt ID, and another by Jetty to > correct the missing slash. > !image-2024-04-22-15-14-13-468.png! > *Proposed Solution:* > [https://github.com/apache/spark/blob/2d0b56c3eac611e743c41d16ea8e439bc8a504e4/core/src/main/scala/org/apache/spark/deploy/history/HistoryServer.scala#L118] > Ensure that all
[jira] [Updated] (SPARK-47934) Inefficient Redirect Handling Due to Missing Trailing Slashes in URL Redirection
[ https://issues.apache.org/jira/browse/SPARK-47934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] huangzhir updated SPARK-47934: -- Description: {*}{{*}}Summary:{{*}}{*} The current implementation of URL redirection in Spark's history web UI does not consistently add trailing slashes to URLs when constructing redirection targets. This inconsistency leads to additional HTTP redirects by Jetty, which increases the load time and reduces the efficiency of the Spark UI. {*}{{*}}Problem Description:{{*}}{*} When constructing redirect URLs, particularly in scenarios where an attempt ID needs to be appended, the system does not ensure that the base URL ends with a slash. This omission results in the generated URL being redirected by Jetty to add a trailing slash, thus causing an unnecessary additional HTTP redirect. For example, when the `shouldAppendAttemptId` flag is true, the URL is formed without a trailing slash before the attempt ID is appended, leading to two redirects: one by our logic to add the attempt ID, and another by Jetty to correct the missing slash. !image-2024-04-22-15-14-13-468.png! {*}{{*}}Proposed Solution:{{*}}{*} [https://github.com/apache/spark/blob/2d0b56c3eac611e743c41d16ea8e439bc8a504e4/core/src/main/scala/org/apache/spark/deploy/history/HistoryServer.scala#L118] Ensure that all redirect URLs uniformly end with a trailing slash regardless of whether an attempt ID is appended. This can be achieved by modifying the URL construction logic as follows: ```scala val redirect = if (shouldAppendAttemptId) { req.getRequestURI.stripSuffix("/") + "/" + attemptId.get + "/" } else { req.getRequestURI.stripSuffix("/") + "/" } ``` was: *{*}Summary:{*}* The current implementation of URL redirection in Spark's history web UI does not consistently add trailing slashes to URLs when constructing redirection targets. 
This inconsistency leads to additional HTTP redirects by Jetty, which increases the load time and reduces the efficiency of the Spark UI. *{*}Problem Description:{*}* When constructing redirect URLs, particularly in scenarios where an attempt ID needs to be appended, the system does not ensure that the base URL ends with a slash. This omission results in the generated URL being redirected by Jetty to add a trailing slash, thus causing an unnecessary additional HTTP redirect. For example, when the `shouldAppendAttemptId` flag is true, the URL is formed without a trailing slash before the attempt ID is appended, leading to two redirects: one by our logic to add the attempt ID, and another by Jetty to correct the missing slash. !image-2024-04-22-15-06-29-357.png! *{*}Proposed Solution:{*}* [https://github.com/apache/spark/blob/2d0b56c3eac611e743c41d16ea8e439bc8a504e4/core/src/main/scala/org/apache/spark/deploy/history/HistoryServer.scala#L118] Ensure that all redirect URLs uniformly end with a trailing slash regardless of whether an attempt ID is appended. This can be achieved by modifying the URL construction logic as follows: ```scala val redirect = if (shouldAppendAttemptId) { req.getRequestURI.stripSuffix("/") + "/" + attemptId.get + "/" } else { req.getRequestURI.stripSuffix("/") + "/" } ``` > Inefficient Redirect Handling Due to Missing Trailing Slashes in URL > Redirection > > > Key: SPARK-47934 > URL: https://issues.apache.org/jira/browse/SPARK-47934 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.2.4, 3.3.2, 3.5.1, 3.4.3 >Reporter: huangzhir >Priority: Trivial > Attachments: image-2024-04-22-15-14-13-468.png > > > {*}{{*}}Summary:{{*}}{*} > The current implementation of URL redirection in Spark's history web UI does > not consistently add trailing slashes to URLs when constructing redirection > targets. 
This inconsistency leads to additional HTTP redirects by Jetty, > which increases the load time and reduces the efficiency of the Spark UI. > {*}{{*}}Problem Description:{{*}}{*} > When constructing redirect URLs, particularly in scenarios where an attempt > ID needs to be appended, the system does not ensure that the base URL ends > with a slash. This omission results in the generated URL being redirected by > Jetty to add a trailing slash, thus causing an unnecessary additional HTTP > redirect. > For example, when the `shouldAppendAttemptId` flag is true, the URL is formed > without a trailing slash before the attempt ID is appended, leading to two > redirects: one by our logic to add the attempt ID, and another by Jetty to > correct the missing slash. > !image-2024-04-22-15-14-13-468.png! > {*}{{*}}Proposed Solution:{{*}}{*} >
[jira] [Created] (SPARK-47934) Inefficient Redirect Handling Due to Missing Trailing Slashes in URL Redirection
huangzhir created SPARK-47934: - Summary: Inefficient Redirect Handling Due to Missing Trailing Slashes in URL Redirection Key: SPARK-47934 URL: https://issues.apache.org/jira/browse/SPARK-47934 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 3.4.3, 3.5.1, 3.3.2, 3.2.4 Reporter: huangzhir *{*}Summary:{*}* The current implementation of URL redirection in Spark's history web UI does not consistently add trailing slashes to URLs when constructing redirection targets. This inconsistency leads to additional HTTP redirects by Jetty, which increases the load time and reduces the efficiency of the Spark UI. *{*}Problem Description:{*}* When constructing redirect URLs, particularly in scenarios where an attempt ID needs to be appended, the system does not ensure that the base URL ends with a slash. This omission results in the generated URL being redirected by Jetty to add a trailing slash, thus causing an unnecessary additional HTTP redirect. For example, when the `shouldAppendAttemptId` flag is true, the URL is formed without a trailing slash before the attempt ID is appended, leading to two redirects: one by our logic to add the attempt ID, and another by Jetty to correct the missing slash. !image-2024-04-22-15-06-29-357.png! *{*}Proposed Solution:{*}* [https://github.com/apache/spark/blob/2d0b56c3eac611e743c41d16ea8e439bc8a504e4/core/src/main/scala/org/apache/spark/deploy/history/HistoryServer.scala#L118] Ensure that all redirect URLs uniformly end with a trailing slash regardless of whether an attempt ID is appended. This can be achieved by modifying the URL construction logic as follows: ```scala val redirect = if (shouldAppendAttemptId) { req.getRequestURI.stripSuffix("/") + "/" + attemptId.get + "/" } else { req.getRequestURI.stripSuffix("/") + "/" } ``` -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-47927) Nullability after join not respected in UDF
[ https://issues.apache.org/jira/browse/SPARK-47927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-47927: --- Labels: pull-request-available (was: ) > Nullability after join not respected in UDF > --- > > Key: SPARK-47927 > URL: https://issues.apache.org/jira/browse/SPARK-47927 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 4.0.0, 3.5.1, 3.4.3 >Reporter: Emil Ejbyfeldt >Priority: Major > Labels: pull-request-available > > {code:java} > val ds1 = Seq(1).toDS() > val ds2 = Seq[Int]().toDS() > val f = udf[(Int, Option[Int]), (Int, Option[Int])](identity) > ds1.join(ds2, ds1("value") === ds2("value"), > "outer").select(f(struct(ds1("value"), ds2("value")))).show() > ds1.join(ds2, ds1("value") === ds2("value"), > "outer").select(struct(ds1("value"), ds2("value"))).show() {code} > outputs > {code:java} > +---+ > |UDF(struct(value, value, value, value))| > +---+ > | {1, 0}| > +---+ > ++ > |struct(value, value)| > ++ > | {1, NULL}| > ++ {code} > So when the result is passed to the UDF, the nullability after the join is > not respected and we incorrectly end up with a 0 value instead of a null/None > value. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
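The failure pattern in SPARK-47927 can be sketched outside Spark: decoding a nullable cell into a primitive slot silently turns NULL into the type's default value (0 for Int), whereas a nullability-aware decoder preserves it as absent. A minimal Java sketch of that distinction (the decode helpers below are illustrative, not Spark internals):

```java
import java.util.Optional;

public class NullDecodeSketch {
    // Buggy path: the null flag is dropped and the primitive default leaks
    // out, which is how {1, NULL} becomes {1, 0} inside the UDF.
    static int decodeIgnoringNull(Integer cell) {
        return cell == null ? 0 : cell;
    }

    // Correct path: nullability survives decoding as an absent value (the
    // analogue of Scala's Option[Int] = None).
    static Optional<Integer> decodeNullAware(Integer cell) {
        return Optional.ofNullable(cell);
    }

    public static void main(String[] args) {
        Integer unmatchedOuterJoinCell = null; // ds2("value") for an unmatched row
        System.out.println(decodeIgnoringNull(unmatchedOuterJoinCell)); // 0 (the reported bug)
        System.out.println(decodeNullAware(unmatchedOuterJoinCell));    // Optional.empty
    }
}
```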
[jira] [Resolved] (SPARK-47932) Avoid using legacy commons-lang
[ https://issues.apache.org/jira/browse/SPARK-47932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Jie resolved SPARK-47932. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46154 [https://github.com/apache/spark/pull/46154] > Avoid using legacy commons-lang > --- > > Key: SPARK-47932 > URL: https://issues.apache.org/jira/browse/SPARK-47932 > Project: Spark > Issue Type: Test > Components: SQL, Tests >Affects Versions: 4.0.0 >Reporter: Cheng Pan >Assignee: Cheng Pan >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-47773) Enhancing the Flexibility of Spark's Physical Plan to Enable Execution on Various Native Engines
[ https://issues.apache.org/jira/browse/SPARK-47773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ke Jia updated SPARK-47773: --- Description: SPIP doc: [https://docs.google.com/document/d/1v7sndtIHIBdzc4YvLPI8InXxhI7SnnAQ5HvmM2DGjVE/edit?usp=sharing] Refined SPIP doc: [https://docs.google.com/document/d/1oY26KtqXoJJNHbAhtmVgaXSVt6NlO6t1iWYEuGvCc1s/edit#heading=h.ud7930xhlsm6] This [SPIP|https://docs.google.com/document/d/1v7sndtIHIBdzc4YvLPI8InXxhI7SnnAQ5HvmM2DGjVE/edit?usp=sharing] outlines the integration of Gluten's physical plan conversion, validation, and fallback framework into Apache Spark. The goal is to enhance Spark's flexibility and robustness in executing physical plans and to leverage Gluten's performance optimizations. Currently, Spark lacks an official cross-platform execution support for physical plans. Gluten's mechanism, which employs the Substrait standard, can convert and optimize Spark's physical plans, thus improving portability, interoperability, and execution efficiency. The design proposal advocates for the incorporation of the TransformSupport interface and its specialized variants—LeafTransformSupport, UnaryTransformSupport, and BinaryTransformSupport. These are instrumental in streamlining the conversion of different operator types into a Substrait-based common format. The validation phase entails a thorough assessment of the Substrait plan against native backends to ensure compatibility. In instances where validation does not succeed, Spark's native operators will be deployed, with requisite transformations to adapt data formats accordingly. The proposal emphasizes the centrality of the plan transformation phase, positing it as the foundational step. The subsequent validation and fallback procedures are slated for consideration upon the successful establishment of the initial phase. 
The integration of Gluten into Spark has already shown significant performance improvements with ClickHouse and Velox backends and has been successfully deployed in production by several customers. was: SPIP doc: https://docs.google.com/document/d/1v7sndtIHIBdzc4YvLPI8InXxhI7SnnAQ5HvmM2DGjVE/edit?usp=sharing This [SPIP|https://docs.google.com/document/d/1v7sndtIHIBdzc4YvLPI8InXxhI7SnnAQ5HvmM2DGjVE/edit?usp=sharing] outlines the integration of Gluten's physical plan conversion, validation, and fallback framework into Apache Spark. The goal is to enhance Spark's flexibility and robustness in executing physical plans and to leverage Gluten's performance optimizations. Currently, Spark lacks an official cross-platform execution support for physical plans. Gluten's mechanism, which employs the Substrait standard, can convert and optimize Spark's physical plans, thus improving portability, interoperability, and execution efficiency. The design proposal advocates for the incorporation of the TransformSupport interface and its specialized variants—LeafTransformSupport, UnaryTransformSupport, and BinaryTransformSupport. These are instrumental in streamlining the conversion of different operator types into a Substrait-based common format. The validation phase entails a thorough assessment of the Substrait plan against native backends to ensure compatibility. In instances where validation does not succeed, Spark's native operators will be deployed, with requisite transformations to adapt data formats accordingly. The proposal emphasizes the centrality of the plan transformation phase, positing it as the foundational step. The subsequent validation and fallback procedures are slated for consideration upon the successful establishment of the initial phase. The integration of Gluten into Spark has already shown significant performance improvements with ClickHouse and Velox backends and has been successfully deployed in production by several customers. 
> Enhancing the Flexibility of Spark's Physical Plan to Enable Execution on > Various Native Engines > > > Key: SPARK-47773 > URL: https://issues.apache.org/jira/browse/SPARK-47773 > Project: Spark > Issue Type: Epic > Components: SQL >Affects Versions: 4.0.0 >Reporter: Ke Jia >Priority: Major > > SPIP doc: > [https://docs.google.com/document/d/1v7sndtIHIBdzc4YvLPI8InXxhI7SnnAQ5HvmM2DGjVE/edit?usp=sharing] > > Refined SPIP doc: > [https://docs.google.com/document/d/1oY26KtqXoJJNHbAhtmVgaXSVt6NlO6t1iWYEuGvCc1s/edit#heading=h.ud7930xhlsm6] > > This > [SPIP|https://docs.google.com/document/d/1v7sndtIHIBdzc4YvLPI8InXxhI7SnnAQ5HvmM2DGjVE/edit?usp=sharing] > outlines the integration of Gluten's physical plan conversion, validation, > and fallback framework into Apache Spark. The goal is to enhance Spark's > flexibility and robustness in executing physical
[jira] [Comment Edited] (SPARK-47773) Enhancing the Flexibility of Spark's Physical Plan to Enable Execution on Various Native Engines
[ https://issues.apache.org/jira/browse/SPARK-47773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17839497#comment-17839497 ] Ke Jia edited comment on SPARK-47773 at 4/22/24 6:22 AM: - We have refined the above SPIP in accordance with the specifications from the Spark community. The latest version of the SPIP is now available [here|https://docs.google.com/document/d/1oY26KtqXoJJNHbAhtmVgaXSVt6NlO6t1iWYEuGvCc1s/edit?usp=sharing]. Welcome and value your suggestions and comments. was (Author: jk_self): We have refined the above [SPIP |[https://docs.google.com/document/d/1v7sndtIHIBdzc4YvLPI8InXxhI7SnnAQ5HvmM2DGjVE/edit?usp=sharing]]in accordance with the specifications from the Spark community. The latest version of the SPIP is now available [here|https://docs.google.com/document/d/1oY26KtqXoJJNHbAhtmVgaXSVt6NlO6t1iWYEuGvCc1s/edit?usp=sharing]. Welcome and value your suggestions and comments. > Enhancing the Flexibility of Spark's Physical Plan to Enable Execution on > Various Native Engines > > > Key: SPARK-47773 > URL: https://issues.apache.org/jira/browse/SPARK-47773 > Project: Spark > Issue Type: Epic > Components: SQL >Affects Versions: 4.0.0 >Reporter: Ke Jia >Priority: Major > > SPIP doc: > https://docs.google.com/document/d/1v7sndtIHIBdzc4YvLPI8InXxhI7SnnAQ5HvmM2DGjVE/edit?usp=sharing > This > [SPIP|https://docs.google.com/document/d/1v7sndtIHIBdzc4YvLPI8InXxhI7SnnAQ5HvmM2DGjVE/edit?usp=sharing] > outlines the integration of Gluten's physical plan conversion, validation, > and fallback framework into Apache Spark. The goal is to enhance Spark's > flexibility and robustness in executing physical plans and to leverage > Gluten's performance optimizations. Currently, Spark lacks an official > cross-platform execution support for physical plans. Gluten's mechanism, > which employs the Substrait standard, can convert and optimize Spark's > physical plans, thus improving portability, interoperability, and execution > efficiency. 
> The design proposal advocates for the incorporation of the TransformSupport > interface and its specialized variants—LeafTransformSupport, > UnaryTransformSupport, and BinaryTransformSupport. These are instrumental in > streamlining the conversion of different operator types into a > Substrait-based common format. The validation phase entails a thorough > assessment of the Substrait plan against native backends to ensure > compatibility. In instances where validation does not succeed, Spark's native > operators will be deployed, with requisite transformations to adapt data > formats accordingly. The proposal emphasizes the centrality of the plan > transformation phase, positing it as the foundational step. The subsequent > validation and fallback procedures are slated for consideration upon the > successful establishment of the initial phase. > The integration of Gluten into Spark has already shown significant > performance improvements with ClickHouse and Velox backends and has been > successfully deployed in production by several customers. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
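The transform/validate/fallback flow the SPIP describes can be sketched as an interface hierarchy. Every name and signature below is an assumption inferred from the SPIP text, not real Gluten or Spark code, and `String` stands in for a Substrait plan fragment:

```java
// Hypothetical sketch of the SPIP's TransformSupport hierarchy: an operator
// converts itself to a backend-agnostic plan, the result is validated against
// the native backend, and a failed validation falls back to Spark's own
// operator. All names are assumptions based on the SPIP description.
interface TransformSupport {
    boolean doValidate();   // can the native backend execute the converted plan?
    String doTransform();   // Substrait-style plan fragment (String as a stand-in)
}

interface LeafTransformSupport extends TransformSupport {}   // no children
interface UnaryTransformSupport extends TransformSupport {}  // one child
interface BinaryTransformSupport extends TransformSupport {} // two children

public class GlutenSketch {
    // Fallback skeleton: use the transformed plan only when validation passes,
    // otherwise keep the native Spark operator.
    static String planFor(TransformSupport op, String sparkNativePlan) {
        return op.doValidate() ? op.doTransform() : sparkNativePlan;
    }

    public static void main(String[] args) {
        TransformSupport supported = new UnaryTransformSupport() {
            public boolean doValidate() { return true; }
            public String doTransform() { return "substrait:project"; }
        };
        TransformSupport unsupported = new UnaryTransformSupport() {
            public boolean doValidate() { return false; }
            public String doTransform() { return "substrait:unsupported"; }
        };
        System.out.println(planFor(supported, "spark:project"));   // substrait:project
        System.out.println(planFor(unsupported, "spark:project")); // spark:project
    }
}
```

This matches the proposal's ordering: conversion is the foundational step, and validation plus per-operator fallback decide at runtime which engine executes each fragment.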
[jira] [Comment Edited] (SPARK-47773) Enhancing the Flexibility of Spark's Physical Plan to Enable Execution on Various Native Engines
[ https://issues.apache.org/jira/browse/SPARK-47773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17839497#comment-17839497 ] Ke Jia edited comment on SPARK-47773 at 4/22/24 6:22 AM: - We have refined the above [SPIP |[https://docs.google.com/document/d/1v7sndtIHIBdzc4YvLPI8InXxhI7SnnAQ5HvmM2DGjVE/edit?usp=sharing]]in accordance with the specifications from the Spark community. The latest version of the SPIP is now available [here|https://docs.google.com/document/d/1oY26KtqXoJJNHbAhtmVgaXSVt6NlO6t1iWYEuGvCc1s/edit?usp=sharing]. Welcome and value your suggestions and comments. was (Author: jk_self): We have refined the above [SPIP|[https://docs.google.com/document/d/1v7sndtIHIBdzc4YvLPI8InXxhI7SnnAQ5HvmM2DGjVE/edit?usp=sharing]] in accordance with the specifications from the Spark community. The latest version of the SPIP is now available [here|https://docs.google.com/document/d/1oY26KtqXoJJNHbAhtmVgaXSVt6NlO6t1iWYEuGvCc1s/edit?usp=sharing]. Welcome and value your suggestions and comments. > Enhancing the Flexibility of Spark's Physical Plan to Enable Execution on > Various Native Engines > > > Key: SPARK-47773 > URL: https://issues.apache.org/jira/browse/SPARK-47773 > Project: Spark > Issue Type: Epic > Components: SQL >Affects Versions: 4.0.0 >Reporter: Ke Jia >Priority: Major > > SPIP doc: > https://docs.google.com/document/d/1v7sndtIHIBdzc4YvLPI8InXxhI7SnnAQ5HvmM2DGjVE/edit?usp=sharing > This > [SPIP|https://docs.google.com/document/d/1v7sndtIHIBdzc4YvLPI8InXxhI7SnnAQ5HvmM2DGjVE/edit?usp=sharing] > outlines the integration of Gluten's physical plan conversion, validation, > and fallback framework into Apache Spark. The goal is to enhance Spark's > flexibility and robustness in executing physical plans and to leverage > Gluten's performance optimizations. Currently, Spark lacks an official > cross-platform execution support for physical plans. 
Gluten's mechanism, > which employs the Substrait standard, can convert and optimize Spark's > physical plans, thus improving portability, interoperability, and execution > efficiency. > The design proposal advocates for the incorporation of the TransformSupport > interface and its specialized variants—LeafTransformSupport, > UnaryTransformSupport, and BinaryTransformSupport. These are instrumental in > streamlining the conversion of different operator types into a > Substrait-based common format. The validation phase entails a thorough > assessment of the Substrait plan against native backends to ensure > compatibility. In instances where validation does not succeed, Spark's native > operators will be deployed, with requisite transformations to adapt data > formats accordingly. The proposal emphasizes the centrality of the plan > transformation phase, positing it as the foundational step. The subsequent > validation and fallback procedures are slated for consideration upon the > successful establishment of the initial phase. > The integration of Gluten into Spark has already shown significant > performance improvements with ClickHouse and Velox backends and has been > successfully deployed in production by several customers. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-47773) Enhancing the Flexibility of Spark's Physical Plan to Enable Execution on Various Native Engines
[ https://issues.apache.org/jira/browse/SPARK-47773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17839497#comment-17839497 ] Ke Jia commented on SPARK-47773: We have refined the above [SPIP|[https://docs.google.com/document/d/1v7sndtIHIBdzc4YvLPI8InXxhI7SnnAQ5HvmM2DGjVE/edit?usp=sharing]] in accordance with the specifications from the Spark community. The latest version of the SPIP is now available [here|https://docs.google.com/document/d/1oY26KtqXoJJNHbAhtmVgaXSVt6NlO6t1iWYEuGvCc1s/edit?usp=sharing]. Welcome and value your suggestions and comments. > Enhancing the Flexibility of Spark's Physical Plan to Enable Execution on > Various Native Engines > > > Key: SPARK-47773 > URL: https://issues.apache.org/jira/browse/SPARK-47773 > Project: Spark > Issue Type: Epic > Components: SQL >Affects Versions: 4.0.0 >Reporter: Ke Jia >Priority: Major > > SPIP doc: > https://docs.google.com/document/d/1v7sndtIHIBdzc4YvLPI8InXxhI7SnnAQ5HvmM2DGjVE/edit?usp=sharing > This > [SPIP|https://docs.google.com/document/d/1v7sndtIHIBdzc4YvLPI8InXxhI7SnnAQ5HvmM2DGjVE/edit?usp=sharing] > outlines the integration of Gluten's physical plan conversion, validation, > and fallback framework into Apache Spark. The goal is to enhance Spark's > flexibility and robustness in executing physical plans and to leverage > Gluten's performance optimizations. Currently, Spark lacks an official > cross-platform execution support for physical plans. Gluten's mechanism, > which employs the Substrait standard, can convert and optimize Spark's > physical plans, thus improving portability, interoperability, and execution > efficiency. > The design proposal advocates for the incorporation of the TransformSupport > interface and its specialized variants—LeafTransformSupport, > UnaryTransformSupport, and BinaryTransformSupport. These are instrumental in > streamlining the conversion of different operator types into a > Substrait-based common format. 
The validation phase entails a thorough > assessment of the Substrait plan against native backends to ensure > compatibility. In instances where validation does not succeed, Spark's native > operators will be deployed, with requisite transformations to adapt data > formats accordingly. The proposal emphasizes the centrality of the plan > transformation phase, positing it as the foundational step. The subsequent > validation and fallback procedures are slated for consideration upon the > successful establishment of the initial phase. > The integration of Gluten into Spark has already shown significant > performance improvements with ClickHouse and Velox backends and has been > successfully deployed in production by several customers. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org