[jira] [Updated] (SPARK-48267) Regression e2e test with SPARK-47305
[ https://issues.apache.org/jira/browse/SPARK-48267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jungtaek Lim updated SPARK-48267: - Fix Version/s: 3.5.2 > Regression e2e test with SPARK-47305 > > Key: SPARK-48267 > URL: https://issues.apache.org/jira/browse/SPARK-48267 > Project: Spark > Issue Type: Test > Components: Structured Streaming > Affects Versions: 4.0.0 > Reporter: Jungtaek Lim > Assignee: Jungtaek Lim > Priority: Major > Labels: pull-request-available > Fix For: 4.0.0, 3.5.2 > > In SPARK-47305, we fixed a bug in a query optimizer (QO) rule and added a test to demonstrate the issue and verify that the fix works, but the test covered only the bugfix itself because we had no clear idea of a simpler end-to-end reproducer at the time. We have since come up with a simple reproducer that is an e2e streaming query, and we'd like to add it as a regression test. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48267) Regression e2e test with SPARK-47305
[ https://issues.apache.org/jira/browse/SPARK-48267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jungtaek Lim resolved SPARK-48267. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46569 [https://github.com/apache/spark/pull/46569] > Regression e2e test with SPARK-47305 > > Key: SPARK-48267 > URL: https://issues.apache.org/jira/browse/SPARK-48267 > Project: Spark > Issue Type: Test > Components: Structured Streaming > Affects Versions: 4.0.0 > Reporter: Jungtaek Lim > Assignee: Jungtaek Lim > Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > In SPARK-47305, we fixed a bug in a query optimizer (QO) rule and added a test to demonstrate the issue and verify that the fix works, but the test covered only the bugfix itself because we had no clear idea of a simpler end-to-end reproducer at the time. We have since come up with a simple reproducer that is an e2e streaming query, and we'd like to add it as a regression test.
[jira] [Assigned] (SPARK-48267) Regression e2e test with SPARK-47305
[ https://issues.apache.org/jira/browse/SPARK-48267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jungtaek Lim reassigned SPARK-48267: Assignee: Jungtaek Lim > Regression e2e test with SPARK-47305 > > Key: SPARK-48267 > URL: https://issues.apache.org/jira/browse/SPARK-48267 > Project: Spark > Issue Type: Test > Components: Structured Streaming > Affects Versions: 4.0.0 > Reporter: Jungtaek Lim > Assignee: Jungtaek Lim > Priority: Major > Labels: pull-request-available > > In SPARK-47305, we fixed a bug in a query optimizer (QO) rule and added a test to demonstrate the issue and verify that the fix works, but the test covered only the bugfix itself because we had no clear idea of a simpler end-to-end reproducer at the time. We have since come up with a simple reproducer that is an e2e streaming query, and we'd like to add it as a regression test.
[jira] [Resolved] (SPARK-48157) CSV expressions (all collations)
[ https://issues.apache.org/jira/browse/SPARK-48157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-48157. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46504 [https://github.com/apache/spark/pull/46504] > CSV expressions (all collations) > > Key: SPARK-48157 > URL: https://issues.apache.org/jira/browse/SPARK-48157 > Project: Spark > Issue Type: Sub-task > Components: SQL > Affects Versions: 4.0.0 > Reporter: Uroš Bojanić > Assignee: Uroš Bojanić > Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Enable collation support for *CSV* built-in string functions in Spark ({*}CsvToStructs{*}, {*}SchemaOfCsv{*}, {*}StructsToCsv{*}). First confirm what the expected behaviour is for these functions when given collated strings, and then move on to implementation and testing. You will find these expressions in the *csvExpressions.scala* file, and they should mostly be pass-through functions. Implement the corresponding E2E SQL tests (CollationSQLExpressionsSuite) to reflect how these functions should be used with collation in SparkSQL, and feel free to use your chosen Spark SQL editor to experiment with the existing functions to learn more about how they work. In addition, look into the possible use-cases and implementation of similar functions within other open-source DBMSs, such as [PostgreSQL|https://www.postgresql.org/docs/]. > > The goal for this Jira ticket is to implement the *CSV* expressions so that they support all collation types currently supported in Spark. To understand what changes were introduced in order to enable full collation support for other existing functions in Spark, take a look at the Spark PRs and Jira tickets for completed tasks in this parent (for example: Ascii, Chr, Base64, UnBase64, Decode, StringDecode, Encode, ToBinary, FormatNumber, Sentences). 
> > Read more about ICU [Collation Concepts|http://example.com/] and the [Collator|http://example.com/] class. Also, refer to the Unicode Technical Standard for string [collation|https://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Type_Fallback].
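The ticket above says the CSV expressions should be "pass-through" under collation: parsing a CSV string does not depend on the column's collation, only string comparisons do. A minimal pure-Python sketch of that distinction, assuming `casefold()` as a stand-in for a case-insensitive collation such as UTF8_LCASE (the function names here are illustrative, not Spark APIs):

```python
import csv
import io

def csv_to_struct(line, fields):
    """Parse one CSV line into a dict -- parsing ignores collation entirely."""
    values = next(csv.reader(io.StringIO(line)))
    return dict(zip(fields, values))

def collated_equals(a, b, collation="UTF8_BINARY"):
    """Compare two strings under a (simplified) collation."""
    if collation == "UTF8_LCASE":   # case-insensitive stand-in
        return a.casefold() == b.casefold()
    return a == b                    # binary comparison

row = csv_to_struct("Alice,SEATTLE", ["name", "city"])
# Parsing is a pass-through: the result is the same whatever collation
# the column carries.
assert row == {"name": "Alice", "city": "SEATTLE"}
# Only comparisons change behavior under a collation.
assert collated_equals(row["city"], "seattle", "UTF8_LCASE")
assert not collated_equals(row["city"], "seattle", "UTF8_BINARY")
```

This is only a conceptual model; real collation support in Spark goes through ICU collators rather than `casefold()`.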
[jira] [Assigned] (SPARK-48157) CSV expressions (all collations)
[ https://issues.apache.org/jira/browse/SPARK-48157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-48157: --- Assignee: Uroš Bojanić > CSV expressions (all collations) > > Key: SPARK-48157 > URL: https://issues.apache.org/jira/browse/SPARK-48157 > Project: Spark > Issue Type: Sub-task > Components: SQL > Affects Versions: 4.0.0 > Reporter: Uroš Bojanić > Assignee: Uroš Bojanić > Priority: Major > Labels: pull-request-available > > Enable collation support for *CSV* built-in string functions in Spark ({*}CsvToStructs{*}, {*}SchemaOfCsv{*}, {*}StructsToCsv{*}). First confirm what the expected behaviour is for these functions when given collated strings, and then move on to implementation and testing. You will find these expressions in the *csvExpressions.scala* file, and they should mostly be pass-through functions. Implement the corresponding E2E SQL tests (CollationSQLExpressionsSuite) to reflect how these functions should be used with collation in SparkSQL, and feel free to use your chosen Spark SQL editor to experiment with the existing functions to learn more about how they work. In addition, look into the possible use-cases and implementation of similar functions within other open-source DBMSs, such as [PostgreSQL|https://www.postgresql.org/docs/]. > > The goal for this Jira ticket is to implement the *CSV* expressions so that they support all collation types currently supported in Spark. To understand what changes were introduced in order to enable full collation support for other existing functions in Spark, take a look at the Spark PRs and Jira tickets for completed tasks in this parent (for example: Ascii, Chr, Base64, UnBase64, Decode, StringDecode, Encode, ToBinary, FormatNumber, Sentences). > > Read more about ICU [Collation Concepts|http://example.com/] and the [Collator|http://example.com/] class. Also, refer to the Unicode Technical Standard for string [collation|https://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Type_Fallback].
[jira] [Updated] (SPARK-48157) CSV expressions (all collations)
[ https://issues.apache.org/jira/browse/SPARK-48157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-48157: --- Labels: pull-request-available (was: ) > CSV expressions (all collations) > > Key: SPARK-48157 > URL: https://issues.apache.org/jira/browse/SPARK-48157 > Project: Spark > Issue Type: Sub-task > Components: SQL > Affects Versions: 4.0.0 > Reporter: Uroš Bojanić > Priority: Major > Labels: pull-request-available > > Enable collation support for *CSV* built-in string functions in Spark ({*}CsvToStructs{*}, {*}SchemaOfCsv{*}, {*}StructsToCsv{*}). First confirm what the expected behaviour is for these functions when given collated strings, and then move on to implementation and testing. You will find these expressions in the *csvExpressions.scala* file, and they should mostly be pass-through functions. Implement the corresponding E2E SQL tests (CollationSQLExpressionsSuite) to reflect how these functions should be used with collation in SparkSQL, and feel free to use your chosen Spark SQL editor to experiment with the existing functions to learn more about how they work. In addition, look into the possible use-cases and implementation of similar functions within other open-source DBMSs, such as [PostgreSQL|https://www.postgresql.org/docs/]. > > The goal for this Jira ticket is to implement the *CSV* expressions so that they support all collation types currently supported in Spark. To understand what changes were introduced in order to enable full collation support for other existing functions in Spark, take a look at the Spark PRs and Jira tickets for completed tasks in this parent (for example: Ascii, Chr, Base64, UnBase64, Decode, StringDecode, Encode, ToBinary, FormatNumber, Sentences). > > Read more about ICU [Collation Concepts|http://example.com/] and the [Collator|http://example.com/] class. Also, refer to the Unicode Technical Standard for string [collation|https://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Type_Fallback].
[jira] [Resolved] (SPARK-48229) inputFile expressions (all collations)
[ https://issues.apache.org/jira/browse/SPARK-48229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-48229. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46503 [https://github.com/apache/spark/pull/46503] > inputFile expressions (all collations) > > Key: SPARK-48229 > URL: https://issues.apache.org/jira/browse/SPARK-48229 > Project: Spark > Issue Type: Sub-task > Components: SQL > Affects Versions: 4.0.0 > Reporter: Uroš Bojanić > Assignee: Uroš Bojanić > Priority: Major > Labels: pull-request-available > Fix For: 4.0.0
[jira] [Assigned] (SPARK-48265) Infer window group limit batch should do constant folding
[ https://issues.apache.org/jira/browse/SPARK-48265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-48265: --- Assignee: angerszhu > Infer window group limit batch should do constant folding > - > > Key: SPARK-48265 > URL: https://issues.apache.org/jira/browse/SPARK-48265 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 4.0.0, 3.5.1 >Reporter: angerszhu >Assignee: angerszhu >Priority: Major > Labels: pull-request-available > > {code:java} > 24/05/13 17:39:25 ERROR [main] PlanChangeLogger: > === Result of Batch LocalRelation === > GlobalLimit 21 > GlobalLimit 21 > +- LocalLimit 21 > +- LocalLimit 21 > ! +- Union false, false > +- > LocalLimit 21 > ! :- LocalLimit 21 > +- > Project [item_id#647L] > ! : +- Project [item_id#647L] > +- > Filter (((isnotnull(tz_type#734) AND (tz_type#734 = local)) AND > (grass_region#735 = BR)) AND isnotnull(grass_region#735)) > ! : +- Filter (((isnotnull(tz_type#734) AND (tz_type#734 = local)) > AND (grass_region#735 = BR)) AND isnotnull(grass_region#735)) > +- Relation db.table[,... 91 more fields] parquet > ! : +- Relation db.table[,... 91 more fields] parquet > ! +- LocalLimit 21 > ! +- Project [item_id#738L] > ! +- LocalRelation , [, ... 91 more fields] > 24/05/13 17:39:25 ERROR [main] PlanChangeLogger: Batch Check Cartesian > Products has no effect. > 24/05/13 17:39:25 ERROR [main] PlanChangeLogger: Batch RewriteSubquery has no > effect. > 24/05/13 17:39:25 ERROR [main] PlanChangeLogger: Batch > NormalizeFloatingNumbers has no effect. > 24/05/13 17:39:25 ERROR [main] PlanChangeLogger: Batch > ReplaceUpdateFieldsExpression has no effect. > 24/05/13 17:39:25 ERROR [main] PlanChangeLogger: Batch Optimize Metadata Only > Query has no effect. > 24/05/13 17:39:25 ERROR [main] PlanChangeLogger: Batch PartitionPruning has > no effect. > 24/05/13 17:39:25 ERROR [main] PlanChangeLogger: Batch InjectRuntimeFilter > has no effect. 
> 24/05/13 17:39:25 ERROR [main] PlanChangeLogger: Batch Pushdown Filters from > PartitionPruning has no effect. > 24/05/13 17:39:25 ERROR [main] PlanChangeLogger: Batch Cleanup filters that > cannot be pushed down has no effect. > 24/05/13 17:39:25 ERROR [main] PlanChangeLogger: Batch Extract Python UDFs > has no effect. > 24/05/13 17:39:25 ERROR [main] PlanChangeLogger: > === Applying Rule org.apache.spark.sql.catalyst.optimizer.EliminateLimits === > GlobalLimit 21 > GlobalLimit 21 > !+- LocalLimit 21 > +- LocalLimit > least(, ... 2 more fields) > ! +- LocalLimit 21 > +- Project > [item_id#647L] > ! +- Project [item_id#647L] > +- Filter > (((isnotnull(tz_type#734) AND (tz_type#734 = local)) AND (grass_region#735 = > BR)) AND isnotnull(grass_region#735)) > ! +- Filter (((isnotnull(tz_type#734) AND (tz_type#734 = local)) AND > (grass_region#735 = BR)) AND isnotnull(grass_region#735)) +- > Relation db.table[,... 91 more fields] parquet > ! +- Relation db.table[,... 91 more fields] parquet > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
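The plan log above shows EliminateLimits rewriting stacked limits into a `least(...)` expression; the reported problem is that a later batch then sees a non-literal limit because constant folding did not run again after the rewrite. The idea can be sketched in plain Python with a toy expression tree (this is an illustration of the optimization, not Spark's Catalyst API):

```python
# A toy expression: an int is a literal, a tuple ("least", args) is a call.
def eliminate_limits(limits):
    """Combine stacked limits into one least(...) call, like EliminateLimits."""
    return ("least", tuple(limits))

def constant_fold(expr):
    """Fold least(...) over literal arguments into a single literal."""
    if isinstance(expr, tuple) and expr[0] == "least":
        args = [constant_fold(a) for a in expr[1]]
        if all(isinstance(a, int) for a in args):
            return min(args)            # least of constants is a constant
        return ("least", tuple(args))
    return expr

combined = eliminate_limits([21, 21])
assert combined == ("least", (21, 21))   # not yet a literal limit
assert constant_fold(combined) == 21     # after folding: a usable constant
```

The fix tracked by this ticket is essentially to make sure a folding step like this runs in the "Infer window group limit" batch, so downstream rules see `21` rather than `least(21, 21)`.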
[jira] [Resolved] (SPARK-48265) Infer window group limit batch should do constant folding
[ https://issues.apache.org/jira/browse/SPARK-48265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-48265. - Fix Version/s: 3.5.2 4.0.0 Resolution: Fixed Issue resolved by pull request 46568 [https://github.com/apache/spark/pull/46568] > Infer window group limit batch should do constant folding > - > > Key: SPARK-48265 > URL: https://issues.apache.org/jira/browse/SPARK-48265 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 4.0.0, 3.5.1 >Reporter: angerszhu >Assignee: angerszhu >Priority: Major > Labels: pull-request-available > Fix For: 3.5.2, 4.0.0 > > > {code:java} > 24/05/13 17:39:25 ERROR [main] PlanChangeLogger: > === Result of Batch LocalRelation === > GlobalLimit 21 > GlobalLimit 21 > +- LocalLimit 21 > +- LocalLimit 21 > ! +- Union false, false > +- > LocalLimit 21 > ! :- LocalLimit 21 > +- > Project [item_id#647L] > ! : +- Project [item_id#647L] > +- > Filter (((isnotnull(tz_type#734) AND (tz_type#734 = local)) AND > (grass_region#735 = BR)) AND isnotnull(grass_region#735)) > ! : +- Filter (((isnotnull(tz_type#734) AND (tz_type#734 = local)) > AND (grass_region#735 = BR)) AND isnotnull(grass_region#735)) > +- Relation db.table[,... 91 more fields] parquet > ! : +- Relation db.table[,... 91 more fields] parquet > ! +- LocalLimit 21 > ! +- Project [item_id#738L] > ! +- LocalRelation , [, ... 91 more fields] > 24/05/13 17:39:25 ERROR [main] PlanChangeLogger: Batch Check Cartesian > Products has no effect. > 24/05/13 17:39:25 ERROR [main] PlanChangeLogger: Batch RewriteSubquery has no > effect. > 24/05/13 17:39:25 ERROR [main] PlanChangeLogger: Batch > NormalizeFloatingNumbers has no effect. > 24/05/13 17:39:25 ERROR [main] PlanChangeLogger: Batch > ReplaceUpdateFieldsExpression has no effect. > 24/05/13 17:39:25 ERROR [main] PlanChangeLogger: Batch Optimize Metadata Only > Query has no effect. > 24/05/13 17:39:25 ERROR [main] PlanChangeLogger: Batch PartitionPruning has > no effect. 
> 24/05/13 17:39:25 ERROR [main] PlanChangeLogger: Batch InjectRuntimeFilter > has no effect. > 24/05/13 17:39:25 ERROR [main] PlanChangeLogger: Batch Pushdown Filters from > PartitionPruning has no effect. > 24/05/13 17:39:25 ERROR [main] PlanChangeLogger: Batch Cleanup filters that > cannot be pushed down has no effect. > 24/05/13 17:39:25 ERROR [main] PlanChangeLogger: Batch Extract Python UDFs > has no effect. > 24/05/13 17:39:25 ERROR [main] PlanChangeLogger: > === Applying Rule org.apache.spark.sql.catalyst.optimizer.EliminateLimits === > GlobalLimit 21 > GlobalLimit 21 > !+- LocalLimit 21 > +- LocalLimit > least(, ... 2 more fields) > ! +- LocalLimit 21 > +- Project > [item_id#647L] > ! +- Project [item_id#647L] > +- Filter > (((isnotnull(tz_type#734) AND (tz_type#734 = local)) AND (grass_region#735 = > BR)) AND isnotnull(grass_region#735)) > ! +- Filter (((isnotnull(tz_type#734) AND (tz_type#734 = local)) AND > (grass_region#735 = BR)) AND isnotnull(grass_region#735)) +- > Relation db.table[,... 91 more fields] parquet > ! +- Relation db.table[,... 91 more fields] parquet > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48266) Move o.a.spark.sql.connect.dsl to test dir
[ https://issues.apache.org/jira/browse/SPARK-48266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48266. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46567 [https://github.com/apache/spark/pull/46567] > Move o.a.spark.sql.connect.dsl to test dir > > Key: SPARK-48266 > URL: https://issues.apache.org/jira/browse/SPARK-48266 > Project: Spark > Issue Type: Improvement > Components: Connect > Affects Versions: 4.0.0 > Reporter: Yang Jie > Assignee: Yang Jie > Priority: Major > Labels: pull-request-available > Fix For: 4.0.0
[jira] [Updated] (SPARK-48268) Add a configuration for SparkContext.setCheckpointDir
[ https://issues.apache.org/jira/browse/SPARK-48268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-48268: --- Labels: pull-request-available (was: ) > Add a configuration for SparkContext.setCheckpointDir > > Key: SPARK-48268 > URL: https://issues.apache.org/jira/browse/SPARK-48268 > Project: Spark > Issue Type: Improvement > Components: Spark Core > Affects Versions: 4.0.0 > Reporter: Hyukjin Kwon > Priority: Major > Labels: pull-request-available > > Would be great to have it
[jira] [Resolved] (SPARK-48241) CSV parsing failure with char/varchar type columns
[ https://issues.apache.org/jira/browse/SPARK-48241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-48241. - Fix Version/s: 3.5.2 Resolution: Fixed Issue resolved by pull request 46565 [https://github.com/apache/spark/pull/46565] > CSV parsing failure with char/varchar type columns > -- > > Key: SPARK-48241 > URL: https://issues.apache.org/jira/browse/SPARK-48241 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.5.1 >Reporter: Jiayi Liu >Assignee: Jiayi Liu >Priority: Major > Labels: pull-request-available > Fix For: 3.5.2, 4.0.0 > > > CSV table containing char and varchar columns will result in the following > error when selecting from the CSV table: > {code:java} > java.lang.IllegalArgumentException: requirement failed: requiredSchema > (struct) should be the subset of dataSchema > (struct). > at scala.Predef$.require(Predef.scala:281) > at > org.apache.spark.sql.catalyst.csv.UnivocityParser.(UnivocityParser.scala:56) > at > org.apache.spark.sql.execution.datasources.csv.CSVFileFormat.$anonfun$buildReader$2(CSVFileFormat.scala:127) > at > org.apache.spark.sql.execution.datasources.FileFormat$$anon$1.apply(FileFormat.scala:155) > at > org.apache.spark.sql.execution.datasources.FileFormat$$anon$1.apply(FileFormat.scala:140) > at > org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.org$apache$spark$sql$execution$datasources$FileScanRDD$$anon$$readCurrentFile(FileScanRDD.scala:231) > at > org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:293) > at > org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:125){code} > The reason for the error is that the StringType columns in the dataSchema and > requiredSchema of UnivocityParser are not consistent. It is due to the > metadata contained in the StringType StructField of the dataSchema, which is > missing in the requiredSchema. 
We need to retain the metadata when resolving schema.
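The failure mode described above can be modeled without Spark: the parser requires every field of requiredSchema to appear, equal in every respect, in dataSchema, so a field that lost its char/varchar metadata during resolution no longer counts as "the same" field. A pure-Python sketch, assuming fields as `(name, type, metadata)` tuples and the metadata key name as an illustration of what Spark stores (not a guaranteed identifier):

```python
# Toy StructField: (name, dtype, metadata dict).
def is_subset(required, data):
    """requiredSchema must be a subset of dataSchema, metadata included."""
    return all(field in data for field in required)

data_schema = [("c", "string", {"__CHAR_VARCHAR_TYPE_STRING": "char(5)"})]

# Metadata dropped during schema resolution: the subset check fails,
# which is the reported IllegalArgumentException.
required_bad = [("c", "string", {})]
assert not is_subset(required_bad, data_schema)

# Metadata retained, as the fix does: the check passes.
required_good = [("c", "string", {"__CHAR_VARCHAR_TYPE_STRING": "char(5)"})]
assert is_subset(required_good, data_schema)
```

The actual fix lives in Spark's schema resolution, but the invariant it restores is exactly this equality of fields including their metadata.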
[jira] [Assigned] (SPARK-48241) CSV parsing failure with char/varchar type columns
[ https://issues.apache.org/jira/browse/SPARK-48241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-48241: --- Assignee: Jiayi Liu > CSV parsing failure with char/varchar type columns > -- > > Key: SPARK-48241 > URL: https://issues.apache.org/jira/browse/SPARK-48241 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.5.1 >Reporter: Jiayi Liu >Assignee: Jiayi Liu >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > CSV table containing char and varchar columns will result in the following > error when selecting from the CSV table: > {code:java} > java.lang.IllegalArgumentException: requirement failed: requiredSchema > (struct) should be the subset of dataSchema > (struct). > at scala.Predef$.require(Predef.scala:281) > at > org.apache.spark.sql.catalyst.csv.UnivocityParser.(UnivocityParser.scala:56) > at > org.apache.spark.sql.execution.datasources.csv.CSVFileFormat.$anonfun$buildReader$2(CSVFileFormat.scala:127) > at > org.apache.spark.sql.execution.datasources.FileFormat$$anon$1.apply(FileFormat.scala:155) > at > org.apache.spark.sql.execution.datasources.FileFormat$$anon$1.apply(FileFormat.scala:140) > at > org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.org$apache$spark$sql$execution$datasources$FileScanRDD$$anon$$readCurrentFile(FileScanRDD.scala:231) > at > org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:293) > at > org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:125){code} > The reason for the error is that the StringType columns in the dataSchema and > requiredSchema of UnivocityParser are not consistent. It is due to the > metadata contained in the StringType StructField of the dataSchema, which is > missing in the requiredSchema. We need to retain the metadata when resolving > schema. 
[jira] [Created] (SPARK-48268) Add a configuration for SparkContext.setCheckpointDir
Hyukjin Kwon created SPARK-48268: Summary: Add a configuration for SparkContext.setCheckpointDir Key: SPARK-48268 URL: https://issues.apache.org/jira/browse/SPARK-48268 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 4.0.0 Reporter: Hyukjin Kwon Would be great to have it
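The proposal is to let a configuration key set the checkpoint directory at context startup, so callers need not invoke setCheckpointDir() programmatically. A minimal sketch of that behavior, assuming a hypothetical key name "spark.checkpoint.dir" and a stand-in class (neither is a confirmed Spark API from this ticket):

```python
class SparkContextStub:
    """Stand-in for SparkContext, only to illustrate the proposed config flow."""

    def __init__(self, conf):
        self._checkpoint_dir = None
        # Proposed behavior: apply the directory from configuration at startup.
        # "spark.checkpoint.dir" is an illustrative key name, not confirmed here.
        if "spark.checkpoint.dir" in conf:
            self.set_checkpoint_dir(conf["spark.checkpoint.dir"])

    def set_checkpoint_dir(self, path):
        self._checkpoint_dir = path

# With the config set, no explicit call is needed.
sc = SparkContextStub({"spark.checkpoint.dir": "/tmp/checkpoints"})
assert sc._checkpoint_dir == "/tmp/checkpoints"

# Without it, behavior matches today: unset until called explicitly.
sc2 = SparkContextStub({})
assert sc2._checkpoint_dir is None
sc2.set_checkpoint_dir("/tmp/other")
assert sc2._checkpoint_dir == "/tmp/other"
```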
[jira] [Updated] (SPARK-48258) Implement DataFrame.checkpoint and DataFrame.localCheckpoint
[ https://issues.apache.org/jira/browse/SPARK-48258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-48258: --- Labels: pull-request-available (was: ) > Implement DataFrame.checkpoint and DataFrame.localCheckpoint > > Key: SPARK-48258 > URL: https://issues.apache.org/jira/browse/SPARK-48258 > Project: Spark > Issue Type: Improvement > Components: Connect, PySpark > Affects Versions: 4.0.0 > Reporter: Hyukjin Kwon > Priority: Major > Labels: pull-request-available > > We should add DataFrame.checkpoint and DataFrame.localCheckpoint for feature parity.
[jira] [Resolved] (SPARK-48261) RoundRobin based coalesce in spark
[ https://issues.apache.org/jira/browse/SPARK-48261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Subham Singhal resolved SPARK-48261. Resolution: Abandoned > RoundRobin based coalesce in spark > > Key: SPARK-48261 > URL: https://issues.apache.org/jira/browse/SPARK-48261 > Project: Spark > Issue Type: New Feature > Components: Spark Core > Affects Versions: 3.5.1 > Reporter: Subham Singhal > Priority: Minor > > Currently the default coalesce does not take partition size into account and simply merges partitions, which often results in a non-uniform data distribution. There has been a proposal for size-based coalesce ([https://github.com/apache/spark/pull/27248]). I am proposing a custom round-robin coalesce which will distribute data evenly across partitions within the same executor.
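The difference the (abandoned) proposal targets can be shown in a few lines: plain coalesce merges neighboring partitions, so a skewed input stays skewed, while dealing rows round-robin keeps the target partitions within one row of each other. A pure-Python sketch of the round-robin idea, not Spark's Partitioner API:

```python
def roundrobin_coalesce(partitions, num_target):
    """Deal every row round-robin across num_target output partitions."""
    out = [[] for _ in range(num_target)]
    i = 0
    for part in partitions:
        for row in part:
            out[i % num_target].append(row)
            i += 1
    return out

# 6/1/1 rows: a skewed input that plain coalesce could merge into 7/1.
skewed = [[1, 2, 3, 4, 5, 6], [7], [8]]
result = roundrobin_coalesce(skewed, 2)
assert sorted(len(p) for p in result) == [4, 4]        # even split
assert sorted(sum(result, [])) == list(range(1, 9))    # no rows lost
```

Note that a real implementation would have to avoid a shuffle by restricting the dealing to partitions co-located on the same executor, which is what the ticket proposes.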
[jira] [Assigned] (SPARK-48209) Common (java side): Migrate `error/warn/info` with variables to structured logging framework
[ https://issues.apache.org/jira/browse/SPARK-48209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gengliang Wang reassigned SPARK-48209: -- Assignee: BingKun Pan > Common (java side): Migrate `error/warn/info` with variables to structured logging framework > > Key: SPARK-48209 > URL: https://issues.apache.org/jira/browse/SPARK-48209 > Project: Spark > Issue Type: Sub-task > Components: Spark Core > Affects Versions: 4.0.0 > Reporter: BingKun Pan > Assignee: BingKun Pan > Priority: Critical > Labels: pull-request-available
[jira] [Resolved] (SPARK-48209) Common (java side): Migrate `error/warn/info` with variables to structured logging framework
[ https://issues.apache.org/jira/browse/SPARK-48209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gengliang Wang resolved SPARK-48209. Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46493 [https://github.com/apache/spark/pull/46493] > Common (java side): Migrate `error/warn/info` with variables to structured logging framework > > Key: SPARK-48209 > URL: https://issues.apache.org/jira/browse/SPARK-48209 > Project: Spark > Issue Type: Sub-task > Components: Spark Core > Affects Versions: 4.0.0 > Reporter: BingKun Pan > Assignee: BingKun Pan > Priority: Critical > Labels: pull-request-available > Fix For: 4.0.0
[jira] [Updated] (SPARK-48267) Regression e2e test with SPARK-47305
[ https://issues.apache.org/jira/browse/SPARK-48267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-48267: --- Labels: pull-request-available (was: ) > Regression e2e test with SPARK-47305 > > Key: SPARK-48267 > URL: https://issues.apache.org/jira/browse/SPARK-48267 > Project: Spark > Issue Type: Test > Components: Structured Streaming > Affects Versions: 4.0.0 > Reporter: Jungtaek Lim > Priority: Major > Labels: pull-request-available > > In SPARK-47305, we fixed a bug in a query optimizer (QO) rule and added a test to demonstrate the issue and verify that the fix works, but the test covered only the bugfix itself because we had no clear idea of a simpler end-to-end reproducer at the time. We have since come up with a simple reproducer that is an e2e streaming query, and we'd like to add it as a regression test.
[jira] [Created] (SPARK-48267) Regression e2e test with SPARK-47305
Jungtaek Lim created SPARK-48267: Summary: Regression e2e test with SPARK-47305 Key: SPARK-48267 URL: https://issues.apache.org/jira/browse/SPARK-48267 Project: Spark Issue Type: Test Components: Structured Streaming Affects Versions: 4.0.0 Reporter: Jungtaek Lim In SPARK-47305, we fixed a bug in a query optimizer (QO) rule and added a test to demonstrate the issue and verify that the fix works, but the test covered only the bugfix itself because we had no clear idea of a simpler end-to-end reproducer at the time. We have since come up with a simple reproducer that is an e2e streaming query, and we'd like to add it as a regression test.
[jira] [Updated] (SPARK-48265) Infer window group limit batch should do constant folding
[ https://issues.apache.org/jira/browse/SPARK-48265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-48265: --- Labels: pull-request-available (was: ) > Infer window group limit batch should do constant folding > - > > Key: SPARK-48265 > URL: https://issues.apache.org/jira/browse/SPARK-48265 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 4.0.0, 3.5.1 >Reporter: angerszhu >Priority: Major > Labels: pull-request-available > > {code:java} > 24/05/13 17:39:25 ERROR [main] PlanChangeLogger: > === Result of Batch LocalRelation === > GlobalLimit 21 > GlobalLimit 21 > +- LocalLimit 21 > +- LocalLimit 21 > ! +- Union false, false > +- > LocalLimit 21 > ! :- LocalLimit 21 > +- > Project [item_id#647L] > ! : +- Project [item_id#647L] > +- > Filter (((isnotnull(tz_type#734) AND (tz_type#734 = local)) AND > (grass_region#735 = BR)) AND isnotnull(grass_region#735)) > ! : +- Filter (((isnotnull(tz_type#734) AND (tz_type#734 = local)) > AND (grass_region#735 = BR)) AND isnotnull(grass_region#735)) > +- Relation db.table[,... 91 more fields] parquet > ! : +- Relation db.table[,... 91 more fields] parquet > ! +- LocalLimit 21 > ! +- Project [item_id#738L] > ! +- LocalRelation , [, ... 91 more fields] > 24/05/13 17:39:25 ERROR [main] PlanChangeLogger: Batch Check Cartesian > Products has no effect. > 24/05/13 17:39:25 ERROR [main] PlanChangeLogger: Batch RewriteSubquery has no > effect. > 24/05/13 17:39:25 ERROR [main] PlanChangeLogger: Batch > NormalizeFloatingNumbers has no effect. > 24/05/13 17:39:25 ERROR [main] PlanChangeLogger: Batch > ReplaceUpdateFieldsExpression has no effect. > 24/05/13 17:39:25 ERROR [main] PlanChangeLogger: Batch Optimize Metadata Only > Query has no effect. > 24/05/13 17:39:25 ERROR [main] PlanChangeLogger: Batch PartitionPruning has > no effect. > 24/05/13 17:39:25 ERROR [main] PlanChangeLogger: Batch InjectRuntimeFilter > has no effect. 
> 24/05/13 17:39:25 ERROR [main] PlanChangeLogger: Batch Pushdown Filters from > PartitionPruning has no effect. > 24/05/13 17:39:25 ERROR [main] PlanChangeLogger: Batch Cleanup filters that > cannot be pushed down has no effect. > 24/05/13 17:39:25 ERROR [main] PlanChangeLogger: Batch Extract Python UDFs > has no effect. > 24/05/13 17:39:25 ERROR [main] PlanChangeLogger: > === Applying Rule org.apache.spark.sql.catalyst.optimizer.EliminateLimits === > GlobalLimit 21 > GlobalLimit 21 > !+- LocalLimit 21 > +- LocalLimit > least(, ... 2 more fields) > ! +- LocalLimit 21 > +- Project > [item_id#647L] > ! +- Project [item_id#647L] > +- Filter > (((isnotnull(tz_type#734) AND (tz_type#734 = local)) AND (grass_region#735 = > BR)) AND isnotnull(grass_region#735)) > ! +- Filter (((isnotnull(tz_type#734) AND (tz_type#734 = local)) AND > (grass_region#735 = BR)) AND isnotnull(grass_region#735)) +- > Relation db.table[,... 91 more fields] parquet > ! +- Relation db.table[,... 91 more fields] parquet > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48265) Infer window group limit batch should do constant folding
[ https://issues.apache.org/jira/browse/SPARK-48265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] angerszhu updated SPARK-48265: -- Description: 
{code:java}
24/05/13 17:39:25 ERROR [main] PlanChangeLogger:
=== Result of Batch LocalRelation ===
Before:
GlobalLimit 21
+- LocalLimit 21
   +- Union false, false
      :- LocalLimit 21
      :  +- Project [item_id#647L]
      :     +- Filter (((isnotnull(tz_type#734) AND (tz_type#734 = local)) AND (grass_region#735 = BR)) AND isnotnull(grass_region#735))
      :        +- Relation db.table[,... 91 more fields] parquet
      +- LocalLimit 21
         +- Project [item_id#738L]
            +- LocalRelation , [, ... 91 more fields]
After:
GlobalLimit 21
+- LocalLimit 21
   +- LocalLimit 21
      +- Project [item_id#647L]
         +- Filter (((isnotnull(tz_type#734) AND (tz_type#734 = local)) AND (grass_region#735 = BR)) AND isnotnull(grass_region#735))
            +- Relation db.table[,... 91 more fields] parquet

24/05/13 17:39:25 ERROR [main] PlanChangeLogger: Batch Check Cartesian Products has no effect.
24/05/13 17:39:25 ERROR [main] PlanChangeLogger: Batch RewriteSubquery has no effect.
24/05/13 17:39:25 ERROR [main] PlanChangeLogger: Batch NormalizeFloatingNumbers has no effect.
24/05/13 17:39:25 ERROR [main] PlanChangeLogger: Batch ReplaceUpdateFieldsExpression has no effect.
24/05/13 17:39:25 ERROR [main] PlanChangeLogger: Batch Optimize Metadata Only Query has no effect.
24/05/13 17:39:25 ERROR [main] PlanChangeLogger: Batch PartitionPruning has no effect.
24/05/13 17:39:25 ERROR [main] PlanChangeLogger: Batch InjectRuntimeFilter has no effect.
24/05/13 17:39:25 ERROR [main] PlanChangeLogger: Batch Pushdown Filters from PartitionPruning has no effect.
24/05/13 17:39:25 ERROR [main] PlanChangeLogger: Batch Cleanup filters that cannot be pushed down has no effect.
24/05/13 17:39:25 ERROR [main] PlanChangeLogger: Batch Extract Python UDFs has no effect.
24/05/13 17:39:25 ERROR [main] PlanChangeLogger:
=== Applying Rule org.apache.spark.sql.catalyst.optimizer.EliminateLimits ===
Before:
GlobalLimit 21
+- LocalLimit 21
   +- LocalLimit 21
      +- Project [item_id#647L]
         +- Filter (((isnotnull(tz_type#734) AND (tz_type#734 = local)) AND (grass_region#735 = BR)) AND isnotnull(grass_region#735))
            +- Relation db.table[,... 91 more fields] parquet
After:
GlobalLimit 21
+- LocalLimit least(, ... 2 more fields)
   +- Project [item_id#647L]
      +- Filter (((isnotnull(tz_type#734) AND (tz_type#734 = local)) AND (grass_region#735 = BR)) AND isnotnull(grass_region#735))
         +- Relation db.table[,... 91 more fields] parquet
{code}
> Infer window group limit batch should do constant folding > - > > Key: SPARK-48265 > URL: https://issues.apache.org/jira/browse/SPARK-48265 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 4.0.0, 3.5.1 >Reporter: angerszhu >Priority: Major > > {code:java} > 24/05/13 17:39:25 ERROR [main] PlanChangeLogger: > === Result of Batch LocalRelation === > GlobalLimit 21 > GlobalLimit 21 > +- LocalLimit 21 > +- LocalLimit 21 > ! +- Union false, false > +- > LocalLimit 21 > ! :- LocalLimit 21 > +- > Project [item_id#647L] > ! : +- Project [item_id#647L]
[jira] [Created] (SPARK-48265) Infer window group limit batch should do constant folding
angerszhu created SPARK-48265: - Summary: Infer window group limit batch should do constant folding Key: SPARK-48265 URL: https://issues.apache.org/jira/browse/SPARK-48265 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.5.1, 4.0.0 Reporter: angerszhu -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
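The plan log above comes down to an ordering problem: EliminateLimits combines nested limits into `least(a, b)`, and if no constant-folding pass runs afterwards, `least(21, 21)` survives into the final plan instead of the literal `21`. A toy sketch of that interaction (hypothetical miniature classes, not Catalyst's actual API):

```python
# Hypothetical miniature of the optimizer interaction described above:
# combining two adjacent limits yields least(21, 21), which only becomes
# the literal 21 if a constant-folding pass runs afterwards.
from dataclasses import dataclass

@dataclass(frozen=True)
class Literal:
    value: int

@dataclass(frozen=True)
class Least:
    left: object
    right: object

def combine_limits(a, b):
    # EliminateLimits-style rewrite: Limit(a, Limit(b, ...)) -> Limit(least(a, b), ...)
    return Least(a, b)

def constant_fold(expr):
    # Fold least(lit, lit) into a single literal.
    if isinstance(expr, Least):
        l, r = constant_fold(expr.left), constant_fold(expr.right)
        if isinstance(l, Literal) and isinstance(r, Literal):
            return Literal(min(l.value, r.value))
        return Least(l, r)
    return expr

combined = combine_limits(Literal(21), Literal(21))
print(combined)                 # Least(left=Literal(value=21), right=Literal(value=21))
print(constant_fold(combined))  # Literal(value=21)
```

Without the folding step, the unfolded `Least` node is what ends up in the plan, which mirrors the `LocalLimit least(, ... 2 more fields)` in the log.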
[jira] [Resolved] (SPARK-48259) Add 3 missing methods in dsl
[ https://issues.apache.org/jira/browse/SPARK-48259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng resolved SPARK-48259. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46559 [https://github.com/apache/spark/pull/46559] > Add 3 missing methods in dsl > > > Key: SPARK-48259 > URL: https://issues.apache.org/jira/browse/SPARK-48259 > Project: Spark > Issue Type: Test > Components: Connect, Tests >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48264) Upgrade `datasketches-java` to 6.0.0
[ https://issues.apache.org/jira/browse/SPARK-48264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-48264: --- Labels: pull-request-available (was: ) > Upgrade `datasketches-java` to 6.0.0 > > > Key: SPARK-48264 > URL: https://issues.apache.org/jira/browse/SPARK-48264 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Priority: Minor > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-48260) disable output committer coordination in one test of ParquetIOSuite
[ https://issues.apache.org/jira/browse/SPARK-48260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gengliang Wang reassigned SPARK-48260: -- Assignee: Gengliang Wang > disable output committer coordination in one test of ParquetIOSuite > --- > > Key: SPARK-48260 > URL: https://issues.apache.org/jira/browse/SPARK-48260 > Project: Spark > Issue Type: Test > Components: SQL >Affects Versions: 4.0.0 >Reporter: Wenchen Fan >Assignee: Gengliang Wang >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48260) disable output committer coordination in one test of ParquetIOSuite
[ https://issues.apache.org/jira/browse/SPARK-48260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gengliang Wang resolved SPARK-48260. Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46562 [https://github.com/apache/spark/pull/46562] > disable output committer coordination in one test of ParquetIOSuite > --- > > Key: SPARK-48260 > URL: https://issues.apache.org/jira/browse/SPARK-48260 > Project: Spark > Issue Type: Test > Components: SQL >Affects Versions: 4.0.0 >Reporter: Wenchen Fan >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-44953) Log a warning (or automatically disable) when shuffle tracking is enabled alongside another DA supported mechanism
[ https://issues.apache.org/jira/browse/SPARK-44953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Holden Karau reassigned SPARK-44953: Assignee: binjie yang > Log a warning (or automatically disable) when shuffle tracking is enabled > along side another DA supported mechanism > --- > > Key: SPARK-44953 > URL: https://issues.apache.org/jira/browse/SPARK-44953 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Holden Karau >Assignee: binjie yang >Priority: Major > Labels: pull-request-available > > Some people enable both shuffle tracking and another mechanism (like > migration) and then are confused when their jobs don't scale down. > > We should at least log a warning here (or automatically disable shuffle > tracking?) when it is configured alongside another DA supported mechanism. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-44953) Log a warning (or automatically disable) when shuffle tracking is enabled alongside another DA supported mechanism
[ https://issues.apache.org/jira/browse/SPARK-44953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Holden Karau resolved SPARK-44953. -- Resolution: Fixed > Log a warning (or automatically disable) when shuffle tracking is enabled > along side another DA supported mechanism > --- > > Key: SPARK-44953 > URL: https://issues.apache.org/jira/browse/SPARK-44953 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Holden Karau >Assignee: binjie yang >Priority: Major > Labels: pull-request-available > > Some people enable both shuffle tracking and another mechanism (like > migration) and then are confused when their jobs don't scale down. > > We should at least log a warning here (or automatically disable shuffle > tracking?) when it is configured alongside another DA supported mechanism. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
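The proposed warning could look roughly like this (a sketch only; the config keys are illustrative of the conflicting combination, and this is not Spark's actual implementation):

```python
# Sketch of the proposed check: warn when shuffle tracking is enabled
# together with another mechanism that also makes dynamic allocation (DA)
# safe (e.g. decommission/migration), since the combination can prevent
# executors from scaling down.
import logging

def check_da_conf(conf: dict) -> bool:
    """Return True (and log a warning) if conflicting DA mechanisms are set."""
    tracking = conf.get("spark.dynamicAllocation.shuffleTracking.enabled") == "true"
    other = (
        conf.get("spark.decommission.enabled") == "true"
        or conf.get("spark.shuffle.service.enabled") == "true"
    )
    if tracking and other:
        logging.warning(
            "shuffleTracking is enabled alongside another DA-supported "
            "mechanism; executors may not scale down as expected"
        )
        return True
    return False

print(check_da_conf({
    "spark.dynamicAllocation.shuffleTracking.enabled": "true",
    "spark.decommission.enabled": "true",
}))  # True
```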
[jira] [Updated] (SPARK-48233) Tests for non-stateful streaming with collations
[ https://issues.apache.org/jira/browse/SPARK-48233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-48233: --- Labels: pull-request-available (was: ) > Tests for non-stateful streaming with collations > > > Key: SPARK-48233 > URL: https://issues.apache.org/jira/browse/SPARK-48233 > Project: Spark > Issue Type: Task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Aleksandar Tomic >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-33752) Avoid the getSimpleMessage of AnalysisException adds semicolon repeatedly
[ https://issues.apache.org/jira/browse/SPARK-33752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-33752: --- Labels: pull-request-available (was: ) > Avoid the getSimpleMessage of AnalysisException adds semicolon repeatedly > - > > Key: SPARK-33752 > URL: https://issues.apache.org/jira/browse/SPARK-33752 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.0 >Reporter: Jiaan Geng >Assignee: Jiaan Geng >Priority: Major > Labels: pull-request-available > Fix For: 3.1.0 > > > The current getSimpleMessage of AnalysisException may add a semicolon > repeatedly. An example is shown below: > {code:java} > select decode() > {code} > The output will be: > {code:java} > org.apache.spark.sql.AnalysisException > Invalid number of arguments for function decode. Expected: 2; Found: 0;; line > 1 pos 7 > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
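The fix idea behind the ticket is easy to sketch (a hypothetical helper, not the actual Scala code): strip any trailing semicolons from the base message before appending position info, so the combined message never stacks `;;`.

```python
# Sketch: build the simple message idempotently with respect to a
# trailing semicolon in the base error text.
def simple_message(msg: str, line: int, pos: int) -> str:
    # rstrip(';') removes any run of trailing semicolons before we add ours.
    return f"{msg.rstrip(';')}; line {line} pos {pos}"

base = "Invalid number of arguments for function decode. Expected: 2; Found: 0;"
print(simple_message(base, 1, 7))
# Invalid number of arguments for function decode. Expected: 2; Found: 0; line 1 pos 7
```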
[jira] [Commented] (SPARK-48263) Collate expression not working when default collation set
[ https://issues.apache.org/jira/browse/SPARK-48263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17846035#comment-17846035 ] Nebojsa Savic commented on SPARK-48263: --- Working on it. > Collate expression not working when default collation set > - > > Key: SPARK-48263 > URL: https://issues.apache.org/jira/browse/SPARK-48263 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 4.0.0 >Reporter: Nebojsa Savic >Priority: Major > > When the default collation level config is set to some collation other than > UTF8_BINARY (e.g. UTF8_BINARY_LCASE) and we try to execute a COLLATE (or > collation) expression, it fails because only StringType(0) is accepted as the > argument for the collation name. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-47008) Spark to support S3 Express One Zone Storage
[ https://issues.apache.org/jira/browse/SPARK-47008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17846014#comment-17846014 ] Steve Loughran commented on SPARK-47008: yes, that looks like it. real PITA, this feature, though apparently it's there to let you know that you have outstanding uploads to purge -no lifecycle rules, see. FWIW you can explicitly create the real situation with a touch command under a __magic path: {code} hadoop fs -touch s3a://stevel--usw1-az2--x-s3/cli/__magic/__base/d/file.txt {code} this creates an incomplete upload under /cli//d/file.txt > Spark to support S3 Express One Zone Storage > > > Key: SPARK-47008 > URL: https://issues.apache.org/jira/browse/SPARK-47008 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Steve Loughran >Priority: Major > > Hadoop 3.4.0 adds support for AWS S3 Express One Zone Storage. > Most of this is transparent. However, one aspect which can surface as an > issue is that these stores report prefixes in a listing when there are > pending uploads, *even when there are no files underneath*. > This leads to a situation where a listStatus of a path returns a list of file > status entries which appears to contain one or more directories -but a > listStatus on that path raises a FileNotFoundException: there is nothing > there. > HADOOP-18996 handles this in all of the hadoop code, including FileInputFormat. > A filesystem can now be probed for inconsistent directory listings through > {{fs.hasPathCapability(path, "fs.capability.directory.listing.inconsistent")}} > If true, then treewalking code SHOULD NOT report a failure if, when walking > into a subdirectory, a list/getFileStatus on that directory raises a > FileNotFoundException. 
> Although most of this is handled in the hadoop code, there are some places > where treewalking is done inside Spark. These need to be identified and made > resilient to failure on the recurse down the tree: > * SparkHadoopUtil list methods, > * especially listLeafStatuses used by OrcFileOperator > * org.apache.spark.util.Utils#fetchHcfsFile > {{org.apache.hadoop.fs.FileUtil.maybeIgnoreMissingDirectory()}} can assist > here, or the logic can be replicated. Using the hadoop implementation would > be better from a maintenance perspective -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
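The resilience being asked for can be sketched with plain local-filesystem calls (a generic illustration, not the SparkHadoopUtil or Hadoop FileSystem API): a leaf-file walk that treats a "directory" which raises not-found during descent as empty, instead of failing the whole traversal.

```python
# Sketch: a recursive leaf-file listing that tolerates directories
# vanishing (or never really existing, as S3 Express can report for
# pending uploads) between the parent listing and the descent.
import os

def list_leaf_files(path):
    """Return all leaf (non-directory) paths under `path`, skipping
    directories that raise 'not found' when listed."""
    leaves = []
    try:
        entries = os.listdir(path)
    except FileNotFoundError:
        # The directory vanished or was only a phantom prefix: treat as empty.
        return leaves
    for name in entries:
        full = os.path.join(path, name)
        if os.path.isdir(full):
            leaves.extend(list_leaf_files(full))
        else:
            leaves.append(full)
    return leaves

print(list_leaf_files("/path/that/does/not/exist"))  # []
```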
[jira] [Created] (SPARK-48263) Collate expression not working when default collation set
Nebojsa Savic created SPARK-48263: - Summary: Collate expression not working when default collation set Key: SPARK-48263 URL: https://issues.apache.org/jira/browse/SPARK-48263 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 4.0.0 Reporter: Nebojsa Savic When the default collation level config is set to some collation other than UTF8_BINARY (e.g. UTF8_BINARY_LCASE) and we try to execute a COLLATE (or collation) expression, it fails because only StringType(0) is accepted as the argument for the collation name. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
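The shape of the bug can be sketched outside Spark (hypothetical classes; `StringType(0)` stands for UTF8_BINARY as in the report): the resolver compares the collation-name argument's type for exact equality with `StringType(0)` instead of accepting any string type, so it breaks as soon as the session default collation is something else.

```python
# Sketch: exact-type check vs. "any string type" check for the
# collation-name argument of collate().
from dataclasses import dataclass

@dataclass(frozen=True)
class StringType:
    collation_id: int = 0  # 0 = UTF8_BINARY in this sketch

def resolve_collate_buggy(arg_type):
    # Rejects the argument unless it is exactly StringType(0).
    return arg_type == StringType(0)

def resolve_collate_fixed(arg_type):
    # Accepts any string type, whatever the session default collation is.
    return isinstance(arg_type, StringType)

default = StringType(collation_id=1)  # e.g. default set to UTF8_BINARY_LCASE
print(resolve_collate_buggy(default), resolve_collate_fixed(default))  # False True
```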
[jira] [Resolved] (SPARK-48206) Add tests for window expression rewrites in RewriteWithExpression
[ https://issues.apache.org/jira/browse/SPARK-48206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-48206. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46492 [https://github.com/apache/spark/pull/46492] > Add tests for window expression rewrites in RewriteWithExpression > - > > Key: SPARK-48206 > URL: https://issues.apache.org/jira/browse/SPARK-48206 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Kelvin Jiang >Assignee: Kelvin Jiang >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Window expressions can be potentially problematic if we pull out a window > expression outside a `Window` operator. Right now this shouldn't happen but > we should add some tests to make sure it doesn't break. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-48206) Add tests for window expression rewrites in RewriteWithExpression
[ https://issues.apache.org/jira/browse/SPARK-48206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-48206: --- Assignee: Kelvin Jiang > Add tests for window expression rewrites in RewriteWithExpression > - > > Key: SPARK-48206 > URL: https://issues.apache.org/jira/browse/SPARK-48206 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Kelvin Jiang >Assignee: Kelvin Jiang >Priority: Major > Labels: pull-request-available > > Window expressions can be potentially problematic if we pull out a window > expression outside a `Window` operator. Right now this shouldn't happen but > we should add some tests to make sure it doesn't break. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48215) DateFormatClass (all collations)
[ https://issues.apache.org/jira/browse/SPARK-48215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-48215: --- Labels: pull-request-available (was: ) > DateFormatClass (all collations) > > > Key: SPARK-48215 > URL: https://issues.apache.org/jira/browse/SPARK-48215 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Priority: Major > Labels: pull-request-available > > Enable collation support for the *DateFormatClass* built-in function in > Spark. First confirm the expected behaviour for this expression when > given collated strings, and then move on to implementation and testing. You > will find this expression in the *datetimeExpressions.scala* file, and it > should be considered a pass-through function with respect to collation > awareness. Implement the corresponding E2E SQL tests > (CollationSQLExpressionsSuite) to reflect how this function should be used > with collation in SparkSQL, and feel free to use your chosen Spark SQL Editor > to experiment with the existing functions to learn more about how they work. > In addition, look into the possible use-cases and implementation of similar > functions within other open-source DBMSs, such as > [PostgreSQL|https://www.postgresql.org/docs/]. > > The goal for this Jira ticket is to implement the *DateFormatClass* > expression so that it supports all collation types currently supported in > Spark. To understand what changes were introduced in order to enable full > collation support for other existing functions in Spark, take a look at the > Spark PRs and Jira tickets for completed tasks in this parent (for example: > Ascii, Chr, Base64, UnBase64, Decode, StringDecode, Encode, ToBinary, > FormatNumber, Sentences). > > Read more about ICU [Collation Concepts|http://example.com/] and > [Collator|http://example.com/] class. 
Also, refer to the Unicode Technical > Standard for string > [collation|https://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Type_Fallback]. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-48031) Add schema evolution options to views
[ https://issues.apache.org/jira/browse/SPARK-48031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-48031: --- Assignee: Serge Rielau > Add schema evolution options to views > -- > > Key: SPARK-48031 > URL: https://issues.apache.org/jira/browse/SPARK-48031 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Serge Rielau >Assignee: Serge Rielau >Priority: Major > Labels: pull-request-available > > We want to provide the ability for views to react to changes in the query > resolution in ways other than simply failing the view. > For example, we want the view to be able to compensate for type changes by > casting the query result to the view column types. > Or to adopt column arity changes into the view. > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48031) Add schema evolution options to views
[ https://issues.apache.org/jira/browse/SPARK-48031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-48031. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46267 [https://github.com/apache/spark/pull/46267] > Add schema evolution options to views > -- > > Key: SPARK-48031 > URL: https://issues.apache.org/jira/browse/SPARK-48031 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Serge Rielau >Assignee: Serge Rielau >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > We want to provide the ability for views to react to changes in the query > resolution in ways other than simply failing the view. > For example, we want the view to be able to compensate for type changes by > casting the query result to the view column types. > Or to adopt column arity changes into the view. > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
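The "compensate by casting" behavior described in the ticket can be sketched with plain Python types standing in for SQL column types (an illustration only, not the Spark implementation): instead of failing the view when the underlying query's output types drift, cast each output column back to the view's declared type.

```python
# Sketch: adapt a query-result row to a view's declared schema by
# casting each column value to the declared type.
view_schema = {"id": int, "amount": float}  # the view's declared column types

def adapt_row(row: dict, schema: dict) -> dict:
    # Cast each value to the view's declared column type; a real
    # implementation would raise a clear error on an impossible cast.
    return {col: typ(row[col]) for col, typ in schema.items()}

# The underlying query now returns id as a string and amount as an int.
print(adapt_row({"id": "7", "amount": 3}, view_schema))  # {'id': 7, 'amount': 3.0}
```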
[jira] [Resolved] (SPARK-48257) Polish POM for Hive dependencies
[ https://issues.apache.org/jira/browse/SPARK-48257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Jie resolved SPARK-48257. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46558 [https://github.com/apache/spark/pull/46558] > Polish POM for Hive dependencies > > > Key: SPARK-48257 > URL: https://issues.apache.org/jira/browse/SPARK-48257 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 4.0.0 >Reporter: Cheng Pan >Assignee: Cheng Pan >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-48257) Polish POM for Hive dependencies
[ https://issues.apache.org/jira/browse/SPARK-48257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Jie reassigned SPARK-48257: Assignee: Cheng Pan > Polish POM for Hive dependencies > > > Key: SPARK-48257 > URL: https://issues.apache.org/jira/browse/SPARK-48257 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 4.0.0 >Reporter: Cheng Pan >Assignee: Cheng Pan >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48043) Kryo serialization issue with push-based shuffle
[ https://issues.apache.org/jira/browse/SPARK-48043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Romain Ardiet updated SPARK-48043: -- Description: I'm running a spark job on AWS EMR. I wanted to test the new push-based shuffle introduced in Spark 3.2 but it's failing with a kryo exception when I'm enabling it. The issue is happening when Executor starts, during KryoSerializerInstance.getAutoReset() check: {code:java} 24/04/24 15:36:22 ERROR YarnCoarseGrainedExecutorBackend: Executor self-exiting due to : Unable to create executor due to Failed to register classes with Kryo org.apache.spark.SparkException: Failed to register classes with Kryo at org.apache.spark.serializer.KryoSerializer.$anonfun$newKryo$5(KryoSerializer.scala:186) ~[spark-core_2.12-3.4.1-amzn-1.jar:3.4.1-amzn-1] at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) ~[scala-library-2.12.15.jar:?] at org.apache.spark.util.Utils$.withContextClassLoader(Utils.scala:241) ~[spark-core_2.12-3.4.1-amzn-1.jar:3.4.1-amzn-1] at org.apache.spark.serializer.KryoSerializer.newKryo(KryoSerializer.scala:174) ~[spark-core_2.12-3.4.1-amzn-1.jar:3.4.1-amzn-1] at org.apache.spark.serializer.KryoSerializer$$anon$1.create(KryoSerializer.scala:105) ~[spark-core_2.12-3.4.1-amzn-1.jar:3.4.1-amzn-1] at com.esotericsoftware.kryo.pool.KryoPoolQueueImpl.borrow(KryoPoolQueueImpl.java:48) ~[kryo-shaded-4.0.2.jar:?] 
at org.apache.spark.serializer.KryoSerializer$PoolWrapper.borrow(KryoSerializer.scala:112) ~[spark-core_2.12-3.4.1-amzn-1.jar:3.4.1-amzn-1] at org.apache.spark.serializer.KryoSerializerInstance.borrowKryo(KryoSerializer.scala:352) ~[spark-core_2.12-3.4.1-amzn-1.jar:3.4.1-amzn-1] at org.apache.spark.serializer.KryoSerializerInstance.getAutoReset(KryoSerializer.scala:452) ~[spark-core_2.12-3.4.1-amzn-1.jar:3.4.1-amzn-1] at org.apache.spark.serializer.KryoSerializer.supportsRelocationOfSerializedObjects$lzycompute(KryoSerializer.scala:259) ~[spark-core_2.12-3.4.1-amzn-1.jar:3.4.1-amzn-1] at org.apache.spark.serializer.KryoSerializer.supportsRelocationOfSerializedObjects(KryoSerializer.scala:255) ~[spark-core_2.12-3.4.1-amzn-1.jar:3.4.1-amzn-1] at org.apache.spark.util.Utils$.serializerIsSupported$lzycompute$1(Utils.scala:2721) ~[spark-core_2.12-3.4.1-amzn-1.jar:3.4.1-amzn-1] at org.apache.spark.util.Utils$.serializerIsSupported$1(Utils.scala:2716) ~[spark-core_2.12-3.4.1-amzn-1.jar:3.4.1-amzn-1] at org.apache.spark.util.Utils$.isPushBasedShuffleEnabled(Utils.scala:2730) ~[spark-core_2.12-3.4.1-amzn-1.jar:3.4.1-amzn-1] at org.apache.spark.storage.BlockManager.initialize(BlockManager.scala:554) ~[spark-core_2.12-3.4.1-amzn-1.jar:3.4.1-amzn-1] at org.apache.spark.executor.Executor.(Executor.scala:143) ~[spark-core_2.12-3.4.1-amzn-1.jar:3.4.1-amzn-1] at org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$receive$1.applyOrElse(CoarseGrainedExecutorBackend.scala:190) ~[spark-core_2.12-3.4.1-amzn-1.jar:3.4.1-amzn-1] at org.apache.spark.rpc.netty.Inbox.$anonfun$process$1(Inbox.scala:115) ~[spark-core_2.12-3.4.1-amzn-1.jar:3.4.1-amzn-1] at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:213) ~[spark-core_2.12-3.4.1-amzn-1.jar:3.4.1-amzn-1] at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:100) ~[spark-core_2.12-3.4.1-amzn-1.jar:3.4.1-amzn-1] at 
org.apache.spark.rpc.netty.MessageLoop.org$apache$spark$rpc$netty$MessageLoop$$receiveLoop(MessageLoop.scala:75) ~[spark-core_2.12-3.4.1-amzn-1.jar:3.4.1-amzn-1] at org.apache.spark.rpc.netty.MessageLoop$$anon$1.run(MessageLoop.scala:41) ~[spark-core_2.12-3.4.1-amzn-1.jar:3.4.1-amzn-1] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_402] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_402] at java.lang.Thread.run(Thread.java:750) ~[?:1.8.0_402] Caused by: java.lang.ClassNotFoundException: com.analytics.AnalyticsEventWrapper at java.net.URLClassLoader.findClass(URLClassLoader.java:387) ~[?:1.8.0_402] at java.lang.ClassLoader.loadClass(ClassLoader.java:418) ~[?:1.8.0_402] at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352) ~[?:1.8.0_402] at java.lang.ClassLoader.loadClass(ClassLoader.java:351) ~[?:1.8.0_402] at java.lang.Class.forName0(Native Method) ~[?:1.8.0_402] at java.lang.Class.forName(Class.java:348) ~[?:1.8.0_402] at org.apache.spark.util.Utils$.classForName(Utils.scala:228) ~[spark-core_2.12-3.4.1-amzn-1.jar:3.4.1-amzn-1] at org.apache.spark.serializer.KryoSerializer.$anonfun$newKryo$6(KryoSerializer.scala:177) ~[spark-core_2.12-3.4.1-amzn-1.jar:3.4.1-amzn-1] at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62) ~[scala-library-2.12.15.jar:?] at scala.collection.mutable.ResizableArray.foreach$(Resizab
[jira] [Created] (SPARK-48262) Substitute BinaryExpression for explicit Expressions in CollationTypeCast
Mihailo Milosevic created SPARK-48262: - Summary: Substitute BinaryExpression for explicit Expressions in CollationTypeCast Key: SPARK-48262 URL: https://issues.apache.org/jira/browse/SPARK-48262 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 4.0.0 Reporter: Mihailo Milosevic -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48231) Remove unused CodeHaus Jackson dependencies
[ https://issues.apache.org/jira/browse/SPARK-48231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-48231: -- Parent: (was: SPARK-47046) Issue Type: Bug (was: Sub-task) > Remove unused CodeHaus Jackson dependencies > --- > > Key: SPARK-48231 > URL: https://issues.apache.org/jira/browse/SPARK-48231 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 4.0.0 >Reporter: Cheng Pan >Assignee: Cheng Pan >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48230) Remove unused jodd-core
[ https://issues.apache.org/jira/browse/SPARK-48230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-48230: -- Parent: (was: SPARK-47046) Issue Type: Bug (was: Sub-task) > Remove unused jodd-core > --- > > Key: SPARK-48230 > URL: https://issues.apache.org/jira/browse/SPARK-48230 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 4.0.0 >Reporter: Cheng Pan >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-48261) RoundRobin based coalesce in spark
Subham Singhal created SPARK-48261: -- Summary: RoundRobin based coalesce in spark Key: SPARK-48261 URL: https://issues.apache.org/jira/browse/SPARK-48261 Project: Spark Issue Type: New Feature Components: Spark Core Affects Versions: 3.5.1 Reporter: Subham Singhal Currently, the default coalesce does not take partition size into account and simply merges partitions. This often results in non-uniform data distribution. There has been a proposal for size-based coalesce ([https://github.com/apache/spark/pull/27248]). I am proposing a custom round-robin coalesce which will distribute data evenly across partitions within the same executor. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
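The even-distribution idea behind the proposal can be sketched outside Spark as plain round-robin assignment of rows to a target number of partitions. This is a hypothetical illustration of the semantics, not the proposed Spark API:

```python
def round_robin_coalesce(rows, num_partitions):
    """Assign rows to partitions in round-robin order, so partition
    sizes differ by at most one row (unlike size-oblivious merging)."""
    partitions = [[] for _ in range(num_partitions)]
    for i, row in enumerate(rows):
        partitions[i % num_partitions].append(row)
    return partitions

parts = round_robin_coalesce(list(range(10)), 3)
sizes = [len(p) for p in parts]  # [4, 3, 3] -- skew is at most one row
```

The contrast with the default behaviour is that merging adjacent partitions can concentrate most rows in a few output partitions, while round-robin bounds the size difference at one.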
[jira] [Updated] (SPARK-48260) disable output committer coordination in one test of ParquetIOSuite
[ https://issues.apache.org/jira/browse/SPARK-48260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-48260: --- Labels: pull-request-available (was: ) > disable output committer coordination in one test of ParquetIOSuite > --- > > Key: SPARK-48260 > URL: https://issues.apache.org/jira/browse/SPARK-48260 > Project: Spark > Issue Type: Test > Components: SQL >Affects Versions: 4.0.0 >Reporter: Wenchen Fan >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-48260) disable output committer coordination in one test of ParquetIOSuite
Wenchen Fan created SPARK-48260: --- Summary: disable output committer coordination in one test of ParquetIOSuite Key: SPARK-48260 URL: https://issues.apache.org/jira/browse/SPARK-48260 Project: Spark Issue Type: Test Components: SQL Affects Versions: 4.0.0 Reporter: Wenchen Fan -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-48215) DateFormatClass (all collations)
[ https://issues.apache.org/jira/browse/SPARK-48215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17845880#comment-17845880 ] Nebojsa Savic commented on SPARK-48215: --- Starting work. > DateFormatClass (all collations) > > > Key: SPARK-48215 > URL: https://issues.apache.org/jira/browse/SPARK-48215 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Priority: Major > > Enable collation support for the *DateFormatClass* built-in function in > Spark. First confirm the expected behaviour for this expression when > given collated strings, and then move on to implementation and testing. You > will find this expression in the *datetimeExpressions.scala* file, and it > should be considered a pass-through function with respect to collation > awareness. Implement the corresponding E2E SQL tests > (CollationSQLExpressionsSuite) to reflect how this function should be used > with collation in SparkSQL, and feel free to use your chosen Spark SQL Editor > to experiment with the existing functions to learn more about how they work. > In addition, look into the possible use-cases and implementation of similar > functions within other open-source DBMS, such as > [PostgreSQL|https://www.postgresql.org/docs/]. > > The goal for this Jira ticket is to implement the *DateFormatClass* > expression so that it supports all collation types currently supported in > Spark. To understand what changes were introduced in order to enable full > collation support for other existing functions in Spark, take a look at the > Spark PRs and Jira tickets for completed tasks in this parent (for example: > Ascii, Chr, Base64, UnBase64, Decode, StringDecode, Encode, ToBinary, > FormatNumber, Sentences). > > Read more about ICU [Collation Concepts|http://example.com/] and > [Collator|http://example.com/] class. 
Also, refer to the Unicode Technical > Standard for string > [collation|https://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Type_Fallback]. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48259) Add 3 missing methods in dsl
[ https://issues.apache.org/jira/browse/SPARK-48259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-48259: --- Labels: pull-request-available (was: ) > Add 3 missing methods in dsl > > > Key: SPARK-48259 > URL: https://issues.apache.org/jira/browse/SPARK-48259 > Project: Spark > Issue Type: Test > Components: Connect, Tests >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-48259) Add 3 missing methods in dsl
Ruifeng Zheng created SPARK-48259: - Summary: Add 3 missing methods in dsl Key: SPARK-48259 URL: https://issues.apache.org/jira/browse/SPARK-48259 Project: Spark Issue Type: Test Components: Connect, Tests Affects Versions: 4.0.0 Reporter: Ruifeng Zheng -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-48258) Implement DataFrame.checkpoint and DataFrame.localCheckpoint
Hyukjin Kwon created SPARK-48258: Summary: Implement DataFrame.checkpoint and DataFrame.localCheckpoint Key: SPARK-48258 URL: https://issues.apache.org/jira/browse/SPARK-48258 Project: Spark Issue Type: Improvement Components: Connect, PySpark Affects Versions: 4.0.0 Reporter: Hyukjin Kwon We should add DataFrame.checkpoint and DataFrame.localCheckpoint for feature parity. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48254) Enhance Guava version extraction rule in dev/test-dependencies.sh
[ https://issues.apache.org/jira/browse/SPARK-48254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48254. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46555 [https://github.com/apache/spark/pull/46555] > Enhance Guava version extraction rule in dev/test-dependencies.sh > - > > Key: SPARK-48254 > URL: https://issues.apache.org/jira/browse/SPARK-48254 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 4.0.0 >Reporter: Cheng Pan >Assignee: Cheng Pan >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-48254) Enhance Guava version extraction rule in dev/test-dependencies.sh
[ https://issues.apache.org/jira/browse/SPARK-48254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-48254: Assignee: Cheng Pan > Enhance Guava version extraction rule in dev/test-dependencies.sh > - > > Key: SPARK-48254 > URL: https://issues.apache.org/jira/browse/SPARK-48254 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 4.0.0 >Reporter: Cheng Pan >Assignee: Cheng Pan >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48257) Polish POM for Hive dependencies
[ https://issues.apache.org/jira/browse/SPARK-48257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-48257: --- Labels: pull-request-available (was: ) > Polish POM for Hive dependencies > > > Key: SPARK-48257 > URL: https://issues.apache.org/jira/browse/SPARK-48257 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 4.0.0 >Reporter: Cheng Pan >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-48257) Polish POM for Hive dependencies
Cheng Pan created SPARK-48257: - Summary: Polish POM for Hive dependencies Key: SPARK-48257 URL: https://issues.apache.org/jira/browse/SPARK-48257 Project: Spark Issue Type: Improvement Components: Build Affects Versions: 4.0.0 Reporter: Cheng Pan -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48256) Add a rule to check file headers for the java side, and fix inconsistent files
[ https://issues.apache.org/jira/browse/SPARK-48256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-48256: --- Labels: pull-request-available (was: ) > Add a rule to check file headers for the java side, and fix inconsistent files > -- > > Key: SPARK-48256 > URL: https://issues.apache.org/jira/browse/SPARK-48256 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Priority: Minor > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48255) Guava should not respect hadoop.deps.scope
[ https://issues.apache.org/jira/browse/SPARK-48255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-48255: --- Labels: pull-request-available (was: ) > Guava should not respect hadoop.deps.scope > -- > > Key: SPARK-48255 > URL: https://issues.apache.org/jira/browse/SPARK-48255 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 4.0.0 >Reporter: Cheng Pan >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-48256) Add a rule to check file headers for the java side, and fix inconsistent files
BingKun Pan created SPARK-48256: --- Summary: Add a rule to check file headers for the java side, and fix inconsistent files Key: SPARK-48256 URL: https://issues.apache.org/jira/browse/SPARK-48256 Project: Spark Issue Type: Improvement Components: Build Affects Versions: 4.0.0 Reporter: BingKun Pan -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-47415) Levenshtein (all collations)
[ https://issues.apache.org/jira/browse/SPARK-47415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-47415: -- Assignee: (was: Apache Spark) > Levenshtein (all collations) > > > Key: SPARK-47415 > URL: https://issues.apache.org/jira/browse/SPARK-47415 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Priority: Major > Labels: pull-request-available > > Enable collation support for the *Levenshtein* built-in string function in > Spark. First confirm the expected behaviour for this function when > given collated strings, and then move on to implementation and testing. > Implement the corresponding unit tests and E2E sql tests to reflect how this > function should be used with collation in SparkSQL, and feel free to use your > chosen Spark SQL Editor to experiment with the existing functions to learn > more about how they work. In addition, look into the possible use-cases and > implementation of similar functions within other open-source DBMS, such > as [PostgreSQL|https://www.postgresql.org/docs/]. > > The goal for this Jira ticket is to implement the *Levenshtein* function so > it supports all collation types currently supported in Spark. To understand > what changes were introduced in order to enable full collation support for > other existing functions in Spark, take a look at the > Spark PRs and Jira > tickets for completed tasks in this parent (for example: Contains, > StartsWith, EndsWith). > > Read more about ICU [Collation Concepts|http://example.com/] and > [Collator|http://example.com/] class. Also, refer to the Unicode Technical > Standard for string > [searching|https://www.unicode.org/reports/tr10/#Searching] and > [collation|https://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Type_Fallback]. 
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
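Spark's built-in computes classic edit distance. A minimal Wagner-Fischer sketch of that algorithm is below; it compares code points for equality, which is exactly the step a collation-aware version would replace with a collator comparison (this is an illustration, not Spark's implementation):

```python
def levenshtein(a: str, b: str) -> int:
    """Classic Wagner-Fischer edit distance over code points,
    using two rolling rows of the DP table."""
    prev = list(range(len(b) + 1))  # distance from "" to each prefix of b
    for i, ca in enumerate(a, 1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, 1):
            # Equality check: a collation-aware variant would compare
            # ca and cb under the target collation instead.
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

levenshtein("kitten", "sitting")  # -> 3
```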
[jira] [Assigned] (SPARK-47415) Levenshtein (all collations)
[ https://issues.apache.org/jira/browse/SPARK-47415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-47415: -- Assignee: Apache Spark > Levenshtein (all collations) > > > Key: SPARK-47415 > URL: https://issues.apache.org/jira/browse/SPARK-47415 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: Apache Spark >Priority: Major > Labels: pull-request-available > > Enable collation support for the *Levenshtein* built-in string function in > Spark. First confirm the expected behaviour for this function when > given collated strings, and then move on to implementation and testing. > Implement the corresponding unit tests and E2E sql tests to reflect how this > function should be used with collation in SparkSQL, and feel free to use your > chosen Spark SQL Editor to experiment with the existing functions to learn > more about how they work. In addition, look into the possible use-cases and > implementation of similar functions within other open-source DBMS, such > as [PostgreSQL|https://www.postgresql.org/docs/]. > > The goal for this Jira ticket is to implement the *Levenshtein* function so > it supports all collation types currently supported in Spark. To understand > what changes were introduced in order to enable full collation support for > other existing functions in Spark, take a look at the > Spark PRs and Jira > tickets for completed tasks in this parent (for example: Contains, > StartsWith, EndsWith). > > Read more about ICU [Collation Concepts|http://example.com/] and > [Collator|http://example.com/] class. Also, refer to the Unicode Technical > Standard for string > [searching|https://www.unicode.org/reports/tr10/#Searching] and > [collation|https://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Type_Fallback]. 
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-48255) Guava should not respect hadoop.deps.scope
Cheng Pan created SPARK-48255: - Summary: Guava should not respect hadoop.deps.scope Key: SPARK-48255 URL: https://issues.apache.org/jira/browse/SPARK-48255 Project: Spark Issue Type: Improvement Components: Build Affects Versions: 4.0.0 Reporter: Cheng Pan -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-48254) Enhance Guava version extraction rule in dev/test-dependencies.sh
[ https://issues.apache.org/jira/browse/SPARK-48254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-48254: -- Assignee: (was: Apache Spark) > Enhance Guava version extraction rule in dev/test-dependencies.sh > - > > Key: SPARK-48254 > URL: https://issues.apache.org/jira/browse/SPARK-48254 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 4.0.0 >Reporter: Cheng Pan >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-48254) Enhance Guava version extraction rule in dev/test-dependencies.sh
[ https://issues.apache.org/jira/browse/SPARK-48254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-48254: -- Assignee: Apache Spark > Enhance Guava version extraction rule in dev/test-dependencies.sh > - > > Key: SPARK-48254 > URL: https://issues.apache.org/jira/browse/SPARK-48254 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 4.0.0 >Reporter: Cheng Pan >Assignee: Apache Spark >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48254) Enhance Guava version extraction rule in dev/test-dependencies.sh
[ https://issues.apache.org/jira/browse/SPARK-48254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-48254: --- Labels: pull-request-available (was: ) > Enhance Guava version extraction rule in dev/test-dependencies.sh > - > > Key: SPARK-48254 > URL: https://issues.apache.org/jira/browse/SPARK-48254 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 4.0.0 >Reporter: Cheng Pan >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-48254) Enhance Guava version extraction rule in dev/test-dependencies.sh
Cheng Pan created SPARK-48254: - Summary: Enhance Guava version extraction rule in dev/test-dependencies.sh Key: SPARK-48254 URL: https://issues.apache.org/jira/browse/SPARK-48254 Project: Spark Issue Type: Improvement Components: Build Affects Versions: 4.0.0 Reporter: Cheng Pan -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
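The actual rule lives in dev/test-dependencies.sh; the ticket gives no detail, but the kind of extraction involved can be sketched with sed. The jar filename below is a hypothetical example input, not taken from the script:

```shell
# Extract a Guava version from a dependency jar name such as
# "guava-33.1.0-jre.jar". Hypothetical sketch, not the real
# dev/test-dependencies.sh logic: classifier suffixes like "-jre"
# are what a naive "cut on dashes" rule would trip over.
jar="guava-33.1.0-jre.jar"
version=$(echo "$jar" | sed -E 's/^guava-([0-9]+(\.[0-9]+)*).*$/\1/')
echo "$version"   # prints 33.1.0
```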
[jira] [Updated] (SPARK-48253) Support default mode for Pandas API on Spark
[ https://issues.apache.org/jira/browse/SPARK-48253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-48253: --- Labels: pull-request-available (was: ) > Support default mode for Pandas API on Spark > > > Key: SPARK-48253 > URL: https://issues.apache.org/jira/browse/SPARK-48253 > Project: Spark > Issue Type: Bug > Components: Pandas API on Spark >Affects Versions: 4.0.0 >Reporter: Haejoon Lee >Priority: Major > Labels: pull-request-available > > To reduce the communication cost between the Python process and the JVM, we > suggest supporting a default mode. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-48248) Fix nested array to respect legacy conf of inferArrayTypeFromFirstElement
[ https://issues.apache.org/jira/browse/SPARK-48248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-48248: Assignee: Hyukjin Kwon > Fix nested array to respect legacy conf of inferArrayTypeFromFirstElement > - > > Key: SPARK-48248 > URL: https://issues.apache.org/jira/browse/SPARK-48248 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > > {code} > >>> spark.conf.set("spark.sql.pyspark.legacy.inferArrayTypeFromFirstElement.enabled", > >>> True) > >>> spark.createDataFrame(1, "a") > DataFrame[_1: array>] > {code} > should infer it as an array of integers -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48248) Fix nested array to respect legacy conf of inferArrayTypeFromFirstElement
[ https://issues.apache.org/jira/browse/SPARK-48248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48248. -- Fix Version/s: 3.4.4 3.5.2 4.0.0 Resolution: Fixed Issue resolved by pull request 46548 [https://github.com/apache/spark/pull/46548] > Fix nested array to respect legacy conf of inferArrayTypeFromFirstElement > - > > Key: SPARK-48248 > URL: https://issues.apache.org/jira/browse/SPARK-48248 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > Fix For: 3.4.4, 3.5.2, 4.0.0 > > > {code} > >>> spark.conf.set("spark.sql.pyspark.legacy.inferArrayTypeFromFirstElement.enabled", > >>> True) > >>> spark.createDataFrame(1, "a") > DataFrame[_1: array>] > {code} > should infer it as an array of integers -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
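The legacy flag's intended semantics can be modeled in plain Python: with the flag on, only the first element of an array decides the inferred element type, recursively for nested arrays. This is a simplified toy model of the behaviour the fix restores, not PySpark's actual inference code:

```python
def infer_type(value, from_first_element=True):
    """Toy model of PySpark array-type inference. With the legacy flag
    (from_first_element=True), only the first element determines the
    element type, applied recursively to nested arrays."""
    if isinstance(value, list):
        if not value:
            return "array<unknown>"
        if from_first_element:
            elem = infer_type(value[0], from_first_element)
        else:
            # Merge all element types (simplified: conflicting types
            # fall back to string).
            types = {infer_type(v, from_first_element) for v in value}
            elem = types.pop() if len(types) == 1 else "string"
        return f"array<{elem}>"
    return type(value).__name__

infer_type([[1, "a"]])                            # legacy: first element wins -> 'array<array<int>>'
infer_type([[1, "a"]], from_first_element=False)  # merged types -> 'array<array<string>>'
```

The bug was that nested arrays ignored the flag; the toy model shows why the two modes should produce different nested element types.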
[jira] [Resolved] (SPARK-48250) Enable array inference tests at test_parity_types.py
[ https://issues.apache.org/jira/browse/SPARK-48250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48250. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46550 [https://github.com/apache/spark/pull/46550] > Enable array inference tests at test_parity_types.py > > > Key: SPARK-48250 > URL: https://issues.apache.org/jira/browse/SPARK-48250 > Project: Spark > Issue Type: Test > Components: Connect, PySpark, Tests >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > > > Some tests in test_types.py are using RDD unnecessarily. We can remove that > to enable some tests with Spark Connect. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48252) Update CommonExpressionRef when necessary
[ https://issues.apache.org/jira/browse/SPARK-48252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-48252: --- Labels: pull-request-available (was: ) > Update CommonExpressionRef when necessary > - > > Key: SPARK-48252 > URL: https://issues.apache.org/jira/browse/SPARK-48252 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 4.0.0 >Reporter: Wenchen Fan >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-48253) Support default mode for Pandas API on Spark
Haejoon Lee created SPARK-48253: --- Summary: Support default mode for Pandas API on Spark Key: SPARK-48253 URL: https://issues.apache.org/jira/browse/SPARK-48253 Project: Spark Issue Type: Bug Components: Pandas API on Spark Affects Versions: 4.0.0 Reporter: Haejoon Lee To reduce the communication cost between the Python process and the JVM, we suggest supporting a default mode. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-48252) Update CommonExpressionRef when necessary
Wenchen Fan created SPARK-48252: --- Summary: Update CommonExpressionRef when necessary Key: SPARK-48252 URL: https://issues.apache.org/jira/browse/SPARK-48252 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 4.0.0 Reporter: Wenchen Fan -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48251) Disable `maven local cache` on GA's step `MIMA test`
[ https://issues.apache.org/jira/browse/SPARK-48251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-48251: --- Labels: pull-request-available (was: ) > Disable `maven local cache` on GA's step `MIMA test` > > > Key: SPARK-48251 > URL: https://issues.apache.org/jira/browse/SPARK-48251 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Priority: Minor > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48251) Disable `maven local cache` on GA's step `MIMA test`
[ https://issues.apache.org/jira/browse/SPARK-48251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] BingKun Pan updated SPARK-48251: Summary: Disable `maven local cache` on GA's step `MIMA test` (was: Disable `maven local cache` on step `MIMA test` of the GA's job `lint`) > Disable `maven local cache` on GA's step `MIMA test` > > > Key: SPARK-48251 > URL: https://issues.apache.org/jira/browse/SPARK-48251 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-48251) Disable `maven local cache` on step `MIMA test` of the GA's job `lint`
BingKun Pan created SPARK-48251: --- Summary: Disable `maven local cache` on step `MIMA test` of the GA's job `lint` Key: SPARK-48251 URL: https://issues.apache.org/jira/browse/SPARK-48251 Project: Spark Issue Type: Improvement Components: Build Affects Versions: 4.0.0 Reporter: BingKun Pan -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org