[jira] [Updated] (SPARK-33391) element_at with CreateArray not respect one based index
[ https://issues.apache.org/jira/browse/SPARK-33391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Leanken.Lin updated SPARK-33391: Description:

{code:scala}
var df = spark.sql("select element_at(array(3, 2, 1), 0)")
df.printSchema()
df = spark.sql("select element_at(array(3, 2, 1), 1)")
df.printSchema()
df = spark.sql("select element_at(array(3, 2, 1), 2)")
df.printSchema()
df = spark.sql("select element_at(array(3, 2, 1), 3)")
df.printSchema()
{code}

{noformat}
root
 |-- element_at(array(3, 2, 1), 0): integer (nullable = false)
root
 |-- element_at(array(3, 2, 1), 1): integer (nullable = false)
root
 |-- element_at(array(3, 2, 1), 2): integer (nullable = false)
root
 |-- element_at(array(3, 2, 1), 3): integer (nullable = true)
{noformat}

In this case, the nullable property of element_at over a CreateArray is computed incorrectly. element_at uses one-based indexing, so index 0 is invalid and indices 1 through 3 all hit non-null literals; the schema instead reports nullable = false for indices 0 to 2 and nullable = true only for index 3, i.e. the bounds check apparently treats the index as zero-based. (was: TODO)

> element_at with CreateArray not respect one based index > --- > > Key: SPARK-33391 > URL: https://issues.apache.org/jira/browse/SPARK-33391 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.0 >Reporter: Leanken.Lin >Priority: Major
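For context, a minimal sketch of the nullability rule the report implies, assuming a literal index over an array built from CreateArray; the function name and shape are illustrative, not Spark's actual ElementAt code:

{code:scala}
// Illustrative sketch only (not Spark's ElementAt): with a literal index over
// array(...), nullability follows from a one-based bounds check plus the
// nullability of the element that the index actually hits.
def elementAtNullable(childrenNullable: Seq[Boolean], index: Int): Boolean = {
  val n = childrenNullable.length
  if (index == 0 || math.abs(index) > n) true      // invalid or out of bounds: result is null
  else if (index > 0) childrenNullable(index - 1)  // one-based from the front
  else childrenNullable(n + index)                 // negative index counts from the end
}

// For the report above: elementAtNullable(Seq(false, false, false), 0) should be
// true, and indices 1 to 3 should be false -- the opposite of what Spark prints.
{code}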
[jira] [Updated] (SPARK-33391) element_at with CreateArray not respect one based index
[ https://issues.apache.org/jira/browse/SPARK-33391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Leanken.Lin updated SPARK-33391: Summary: element_at with CreateArray not respect one based index (was: element_at not respect one based index) > element_at with CreateArray not respect one based index > --- > > Key: SPARK-33391 > URL: https://issues.apache.org/jira/browse/SPARK-33391 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.0 >Reporter: Leanken.Lin >Priority: Major > > TODO
[jira] [Created] (SPARK-33391) element_at not respect one based index
Leanken.Lin created SPARK-33391: --- Summary: element_at not respect one based index Key: SPARK-33391 URL: https://issues.apache.org/jira/browse/SPARK-33391 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.1.0 Reporter: Leanken.Lin TODO
[jira] [Assigned] (SPARK-33390) Make Literal support char array
[ https://issues.apache.org/jira/browse/SPARK-33390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-33390: Assignee: Apache Spark > Make Literal support char array > --- > > Key: SPARK-33390 > URL: https://issues.apache.org/jira/browse/SPARK-33390 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.1.0 >Reporter: ulysses you >Assignee: Apache Spark >Priority: Minor > > Make Literal support char array.
[jira] [Commented] (SPARK-33390) Make Literal support char array
[ https://issues.apache.org/jira/browse/SPARK-33390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17228385#comment-17228385 ] Apache Spark commented on SPARK-33390: -- User 'ulysses-you' has created a pull request for this issue: https://github.com/apache/spark/pull/30295 > Make Literal support char array > --- > > Key: SPARK-33390 > URL: https://issues.apache.org/jira/browse/SPARK-33390 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.1.0 >Reporter: ulysses you >Priority: Minor > > Make Literal support char array.
[jira] [Assigned] (SPARK-33390) Make Literal support char array
[ https://issues.apache.org/jira/browse/SPARK-33390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-33390: Assignee: (was: Apache Spark) > Make Literal support char array > --- > > Key: SPARK-33390 > URL: https://issues.apache.org/jira/browse/SPARK-33390 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.1.0 >Reporter: ulysses you >Priority: Minor > > Make Literal support char array.
[jira] [Created] (SPARK-33390) Make Literal support char array
ulysses you created SPARK-33390: --- Summary: Make Literal support char array Key: SPARK-33390 URL: https://issues.apache.org/jira/browse/SPARK-33390 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.1.0 Reporter: ulysses you Make Literal support char array.
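A hedged sketch of what "support char array" presumably means here: Literal construction does not accept Array[Char] today, and the natural mapping is the one Scala and Java already use, a char array as a String. The helper below is illustrative, not the actual patch:

{code:scala}
// Illustrative only: convert a char-array value into the String value that a
// StringType Literal can already represent. The real change would live in
// Spark's Literal.apply pattern match.
def toLiteralValue(value: Any): Any = value match {
  case chars: Array[Char] => new String(chars) // Array[Char] -> StringType value
  case other              => other
}
{code}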
[jira] [Assigned] (SPARK-32405) Apply table options while creating tables in JDBC Table Catalog
[ https://issues.apache.org/jira/browse/SPARK-32405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-32405: --- Assignee: Huaxin Gao > Apply table options while creating tables in JDBC Table Catalog > --- > > Key: SPARK-32405 > URL: https://issues.apache.org/jira/browse/SPARK-32405 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.1.0 >Reporter: Maxim Gekk >Assignee: Huaxin Gao >Priority: Major > > We need to add an API to `JdbcDialect` to generate the SQL statement to > specify table options.
[jira] [Resolved] (SPARK-32405) Apply table options while creating tables in JDBC Table Catalog
[ https://issues.apache.org/jira/browse/SPARK-32405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-32405. - Fix Version/s: 3.1.0 Resolution: Fixed Issue resolved by pull request 30154 [https://github.com/apache/spark/pull/30154] > Apply table options while creating tables in JDBC Table Catalog > --- > > Key: SPARK-32405 > URL: https://issues.apache.org/jira/browse/SPARK-32405 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.1.0 >Reporter: Maxim Gekk >Assignee: Huaxin Gao >Priority: Major > Fix For: 3.1.0 > > > We need to add an API to `JdbcDialect` to generate the SQL statement to > specify table options.
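A rough sketch of the dialect hook this ticket describes, with invented names (this is not Spark's JdbcDialect API); it only illustrates the idea of rendering the user's table options into the generated CREATE TABLE statement:

{code:scala}
// Invented shape: append dialect-specific table options (e.g. MySQL's ENGINE)
// as a trailing clause of the generated DDL.
def createTableSql(table: String, columnsSql: String, options: Map[String, String]): String = {
  val optionsClause = options.map { case (k, v) => s"$k=$v" }.mkString(" ")
  s"CREATE TABLE $table ($columnsSql) $optionsClause".trim
}

// createTableSql("t", "id INT", Map("ENGINE" -> "InnoDB"))
// => CREATE TABLE t (id INT) ENGINE=InnoDB
{code}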
[jira] [Created] (SPARK-33389) make internal classes of SparkSession always using active SQLConf
Lu Lu created SPARK-33389: - Summary: make internal classes of SparkSession always using active SQLConf Key: SPARK-33389 URL: https://issues.apache.org/jira/browse/SPARK-33389 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.1.0 Reporter: Lu Lu
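The pattern behind this sub-task, sketched with an invented helper class; SQLConf.get itself is real Spark API and resolves the conf of the session active on the current thread:

{code:scala}
import org.apache.spark.sql.internal.SQLConf

// Hypothetical internal helper: reading SQLConf.get on each call picks up the
// *active* session's conf, instead of freezing the conf of whichever session
// happened to create the helper.
class SomeInternalHelper {
  def caseSensitive: Boolean = SQLConf.get.caseSensitiveAnalysis
}
{code}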
[jira] [Resolved] (SPARK-33387) Support ordered shuffle block migration
[ https://issues.apache.org/jira/browse/SPARK-33387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-33387. --- Fix Version/s: 3.1.0 Resolution: Fixed Issue resolved by pull request 30293 [https://github.com/apache/spark/pull/30293] > Support ordered shuffle block migration > --- > > Key: SPARK-33387 > URL: https://issues.apache.org/jira/browse/SPARK-33387 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.1.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Fix For: 3.1.0 > > > Since the current shuffle block migration works in a random order, the > failure during worker decommission affects all of the shuffles. This issue > aims to support ordered migration.
[jira] [Assigned] (SPARK-33387) Support ordered shuffle block migration
[ https://issues.apache.org/jira/browse/SPARK-33387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-33387: - Assignee: Dongjoon Hyun > Support ordered shuffle block migration > --- > > Key: SPARK-33387 > URL: https://issues.apache.org/jira/browse/SPARK-33387 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.1.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > > Since the current shuffle block migration works in a random order, the > failure during worker decommission affects all of the shuffles. This issue > aims to support ordered migration.
[jira] [Updated] (SPARK-33140) make all sub-class of Rule[QueryPlan] using SQLConf.get
[ https://issues.apache.org/jira/browse/SPARK-33140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lu Lu updated SPARK-33140: -- Summary: make all sub-class of Rule[QueryPlan] using SQLConf.get (was: make Analyzer rules using SQLConf.get) > make all sub-class of Rule[QueryPlan] using SQLConf.get > --- > > Key: SPARK-33140 > URL: https://issues.apache.org/jira/browse/SPARK-33140 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.1.0 >Reporter: Leanken.Lin >Assignee: Leanken.Lin >Priority: Major > Fix For: 3.1.0 > > > TODO
[jira] [Commented] (SPARK-33140) make Analyzer rules using SQLConf.get
[ https://issues.apache.org/jira/browse/SPARK-33140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17228356#comment-17228356 ] Apache Spark commented on SPARK-33140: -- User 'linhongliu-db' has created a pull request for this issue: https://github.com/apache/spark/pull/30294 > make Analyzer rules using SQLConf.get > - > > Key: SPARK-33140 > URL: https://issues.apache.org/jira/browse/SPARK-33140 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.1.0 >Reporter: Leanken.Lin >Assignee: Leanken.Lin >Priority: Major > Fix For: 3.1.0 > > > TODO
[jira] [Commented] (SPARK-33140) make Analyzer rules using SQLConf.get
[ https://issues.apache.org/jira/browse/SPARK-33140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17228357#comment-17228357 ] Apache Spark commented on SPARK-33140: -- User 'linhongliu-db' has created a pull request for this issue: https://github.com/apache/spark/pull/30294 > make Analyzer rules using SQLConf.get > - > > Key: SPARK-33140 > URL: https://issues.apache.org/jira/browse/SPARK-33140 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.1.0 >Reporter: Leanken.Lin >Assignee: Leanken.Lin >Priority: Major > Fix For: 3.1.0 > > > TODO
[jira] [Updated] (SPARK-33371) Support Python 3.9+ in PySpark
[ https://issues.apache.org/jira/browse/SPARK-33371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-33371: -- Fix Version/s: 3.0.2 > Support Python 3.9+ in PySpark > -- > > Key: SPARK-33371 > URL: https://issues.apache.org/jira/browse/SPARK-33371 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 3.0.1, 3.1.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Fix For: 3.0.2, 3.1.0 > > > Python 3.9 works with PySpark. We should fix setup.py accordingly.
[jira] [Updated] (SPARK-33387) Support ordered shuffle block migration
[ https://issues.apache.org/jira/browse/SPARK-33387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-33387: -- Description: Since the current shuffle block migration works in a random order, the failure during worker decommission affects all of the shuffles. This issue aims to support ordered migration. (was: Since the current shuffle block migration works in a random order like the following, the failure during worker decommission affects all of the shuffles. This issue aims to support ordered migration. shuffle_16_1900_0.index shuffle_19_2123_0.index shuffle_25_3792_0.index shuffle_25_3792_0.data shuffle_19_2123_0.data shuffle_16_1900_0.data shuffle_16_2015_0.index shuffle_16_2015_0.data shuffle_12_3264_0.index shuffle_14_4329_0.index shuffle_20_2463_0.index shuffle_20_2463_0.data shuffle_14_4329_0.data) > Support ordered shuffle block migration > --- > > Key: SPARK-33387 > URL: https://issues.apache.org/jira/browse/SPARK-33387 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.1.0 >Reporter: Dongjoon Hyun >Priority: Major > > Since the current shuffle block migration works in a random order, the > failure during worker decommission affects all of the shuffles. This issue > aims to support ordered migration.
[jira] [Created] (SPARK-33388) Merge In and InSet predicate
Yuming Wang created SPARK-33388: --- Summary: Merge In and InSet predicate Key: SPARK-33388 URL: https://issues.apache.org/jira/browse/SPARK-33388 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.1.0 Reporter: Yuming Wang Maybe we should create a base class for {{In}} and {{InSet}}, so that the two classes differ only in the expression tree while eval and codegen are shared. [https://github.com/apache/spark/pull/28269#issuecomment-655365714]
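A hand-wavy sketch of the proposed base class, with invented names and SQL's three-valued null semantics reduced to the simplest case; only the membership test differs between the two predicates, so evaluation (and by extension codegen) can be shared:

{code:scala}
// Invented shape, not Spark's expression tree: shared null-aware evaluation
// lives in the base trait, and subclasses supply only the membership test.
sealed trait InBase {
  protected def contains(v: Any): Boolean          // the one thing In/InSet differ on
  def eval(value: Any): Any =
    if (value == null) null else contains(value)   // simplified: ignores nulls inside the list
}

final case class In(list: Seq[Any]) extends InBase {
  protected def contains(v: Any): Boolean = list.contains(v) // linear scan over evaluated children
}

final case class InSet(hset: Set[Any]) extends InBase {
  protected def contains(v: Any): Boolean = hset.contains(v) // O(1) lookup in a prebuilt set
}
{code}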
[jira] [Assigned] (SPARK-33387) Support ordered shuffle block migration
[ https://issues.apache.org/jira/browse/SPARK-33387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-33387: Assignee: Apache Spark > Support ordered shuffle block migration > --- > > Key: SPARK-33387 > URL: https://issues.apache.org/jira/browse/SPARK-33387 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.1.0 >Reporter: Dongjoon Hyun >Assignee: Apache Spark >Priority: Major > > Since the current shuffle block migration works in a random order like the > following, the failure during worker decommission affects all of the > shuffles. This issue aims to support ordered migration. > shuffle_16_1900_0.index > shuffle_19_2123_0.index > shuffle_25_3792_0.index > shuffle_25_3792_0.data > shuffle_19_2123_0.data > shuffle_16_1900_0.data > shuffle_16_2015_0.index > shuffle_16_2015_0.data > shuffle_12_3264_0.index > shuffle_14_4329_0.index > shuffle_20_2463_0.index > shuffle_20_2463_0.data > shuffle_14_4329_0.data
[jira] [Assigned] (SPARK-33387) Support ordered shuffle block migration
[ https://issues.apache.org/jira/browse/SPARK-33387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-33387: Assignee: (was: Apache Spark) > Support ordered shuffle block migration > --- > > Key: SPARK-33387 > URL: https://issues.apache.org/jira/browse/SPARK-33387 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.1.0 >Reporter: Dongjoon Hyun >Priority: Major > > Since the current shuffle block migration works in a random order like the > following, the failure during worker decommission affects all of the > shuffles. This issue aims to support ordered migration. > shuffle_16_1900_0.index > shuffle_19_2123_0.index > shuffle_25_3792_0.index > shuffle_25_3792_0.data > shuffle_19_2123_0.data > shuffle_16_1900_0.data > shuffle_16_2015_0.index > shuffle_16_2015_0.data > shuffle_12_3264_0.index > shuffle_14_4329_0.index > shuffle_20_2463_0.index > shuffle_20_2463_0.data > shuffle_14_4329_0.data
[jira] [Commented] (SPARK-33387) Support ordered shuffle block migration
[ https://issues.apache.org/jira/browse/SPARK-33387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17228341#comment-17228341 ] Apache Spark commented on SPARK-33387: -- User 'dongjoon-hyun' has created a pull request for this issue: https://github.com/apache/spark/pull/30293 > Support ordered shuffle block migration > --- > > Key: SPARK-33387 > URL: https://issues.apache.org/jira/browse/SPARK-33387 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.1.0 >Reporter: Dongjoon Hyun >Priority: Major > > Since the current shuffle block migration works in a random order like the > following, the failure during worker decommission affects all of the > shuffles. This issue aims to support ordered migration. > shuffle_16_1900_0.index > shuffle_19_2123_0.index > shuffle_25_3792_0.index > shuffle_25_3792_0.data > shuffle_19_2123_0.data > shuffle_16_1900_0.data > shuffle_16_2015_0.index > shuffle_16_2015_0.data > shuffle_12_3264_0.index > shuffle_14_4329_0.index > shuffle_20_2463_0.index > shuffle_20_2463_0.data > shuffle_14_4329_0.data
[jira] [Assigned] (SPARK-33387) Support ordered shuffle block migration
[ https://issues.apache.org/jira/browse/SPARK-33387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-33387: Assignee: (was: Apache Spark) > Support ordered shuffle block migration > --- > > Key: SPARK-33387 > URL: https://issues.apache.org/jira/browse/SPARK-33387 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.1.0 >Reporter: Dongjoon Hyun >Priority: Major > > Since the current shuffle block migration works in a random order like the > following, the failure during worker decommission affects all of the > shuffles. This issue aims to support ordered migration. > shuffle_16_1900_0.index > shuffle_19_2123_0.index > shuffle_25_3792_0.index > shuffle_25_3792_0.data > shuffle_19_2123_0.data > shuffle_16_1900_0.data > shuffle_16_2015_0.index > shuffle_16_2015_0.data > shuffle_12_3264_0.index > shuffle_14_4329_0.index > shuffle_20_2463_0.index > shuffle_20_2463_0.data > shuffle_14_4329_0.data
[jira] [Created] (SPARK-33387) Support ordered shuffle block migration
Dongjoon Hyun created SPARK-33387: - Summary: Support ordered shuffle block migration Key: SPARK-33387 URL: https://issues.apache.org/jira/browse/SPARK-33387 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 3.1.0 Reporter: Dongjoon Hyun Since the current shuffle block migration works in a random order like the following, the failure during worker decommission affects all of the shuffles. This issue aims to support ordered migration.

{noformat}
shuffle_16_1900_0.index
shuffle_19_2123_0.index
shuffle_25_3792_0.index
shuffle_25_3792_0.data
shuffle_19_2123_0.data
shuffle_16_1900_0.data
shuffle_16_2015_0.index
shuffle_16_2015_0.data
shuffle_12_3264_0.index
shuffle_14_4329_0.index
shuffle_20_2463_0.index
shuffle_20_2463_0.data
shuffle_14_4329_0.data
{noformat}
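To make "ordered" concrete, a small illustrative sketch that parses the file names above and sorts them per (shuffleId, mapId), so the index and data files of one map output migrate together; the ordering Spark actually chose may differ, and the parsing is deliberately simplified:

{code:scala}
// Illustrative only: derive a deterministic migration order from names like
// shuffle_<shuffleId>_<mapId>_0.index / .data.
final case class ShuffleFile(name: String) {
  private val parts = name.stripSuffix(".index").stripSuffix(".data").split("_")
  val shuffleId: Int = parts(1).toInt
  val mapId: Long = parts(2).toLong
}

val files = Seq("shuffle_16_1900_0.index", "shuffle_25_3792_0.data", "shuffle_19_2123_0.index")
val ordered = files.map(ShuffleFile.apply).sortBy(f => (f.shuffleId, f.mapId))
// Files of the same map output now sit next to each other instead of randomly.
{code}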
[jira] [Updated] (SPARK-33244) Unify the code paths for spark.table and spark.read.table
[ https://issues.apache.org/jira/browse/SPARK-33244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuanjian Li updated SPARK-33244: Description: * The code paths of `spark.table` and `spark.read.table` should be the same. This behavior was broken by SPARK-32592, since we need to respect options in the `spark.read.table` API. * Add comments for {{spark.table}} to emphasize that it also supports reading streaming temp views was: * The code paths of `spark.table` and `spark.read.table` should be the same. This behavior was broken by SPARK-32592, since we need to respect options in the `spark.read.table` API. * Add a comment for {{spark.table}} to emphasize that it also supports reading streaming temp views > Unify the code paths for spark.table and spark.read.table > - > > Key: SPARK-33244 > URL: https://issues.apache.org/jira/browse/SPARK-33244 > Project: Spark > Issue Type: Improvement > Components: SQL, Structured Streaming >Affects Versions: 3.0.0 >Reporter: Yuanjian Li >Priority: Major > > * The code paths of `spark.table` and `spark.read.table` should be the same. > This behavior was broken by SPARK-32592, since we need to respect options in > the `spark.read.table` API. > * Add comments for {{spark.table}} to emphasize that it also supports reading > streaming temp views
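The two entry points in question, assuming an active SparkSession `spark` and an existing table `t` (both names are placeholders); per the description they should share one code path, with spark.read.table additionally honoring the reader's options (the SPARK-32592 behavior):

{code:scala}
// Both should resolve the same relation through the same code path.
val viaSession = spark.table("t")
val viaReader  = spark.read.option("someOption", "someValue").table("t") // carries reader options
{code}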
[jira] [Updated] (SPARK-33244) Unify the code paths for spark.table and spark.read.table
[ https://issues.apache.org/jira/browse/SPARK-33244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuanjian Li updated SPARK-33244: Description: * The code paths of `spark.table` and `spark.read.table` should be the same. This behavior was broken by SPARK-32592, since we need to respect options in the `spark.read.table` API. * Add a comment for {{spark.table}} to emphasize that it also supports reading streaming temp views was: The code paths of `spark.table` and `spark.read.table` should be the same. This behavior was broken by SPARK-32592, since we need to respect options in the `spark.read.table` API. > Unify the code paths for spark.table and spark.read.table > - > > Key: SPARK-33244 > URL: https://issues.apache.org/jira/browse/SPARK-33244 > Project: Spark > Issue Type: Improvement > Components: SQL, Structured Streaming >Affects Versions: 3.0.0 >Reporter: Yuanjian Li >Priority: Major > > * The code paths of `spark.table` and `spark.read.table` should be the same. > This behavior was broken by SPARK-32592, since we need to respect options in > the `spark.read.table` API. > * Add a comment for {{spark.table}} to emphasize that it also supports reading > streaming temp views
[jira] [Updated] (SPARK-33244) Unify the code paths for spark.table and spark.read.table
[ https://issues.apache.org/jira/browse/SPARK-33244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuanjian Li updated SPARK-33244: Description: The code paths of `spark.table` and `spark.read.table` should be the same. This behavior was broken by SPARK-32592, since we need to respect options in the `spark.read.table` API. (was: * Block reading streaming temp view via `spark.table` API * The code paths of `spark.table` and `spark.read.table` should be the same. This behavior was broken by SPARK-32592, since we need to respect options in the `spark.read.table` API.) > Unify the code paths for spark.table and spark.read.table > - > > Key: SPARK-33244 > URL: https://issues.apache.org/jira/browse/SPARK-33244 > Project: Spark > Issue Type: Improvement > Components: SQL, Structured Streaming >Affects Versions: 3.0.0 >Reporter: Yuanjian Li >Priority: Major > > The code paths of `spark.table` and `spark.read.table` should be the same. > This behavior was broken by SPARK-32592, since we need to respect options in > the `spark.read.table` API.
[jira] [Updated] (SPARK-33244) Unify the code paths for spark.table and spark.read.table
[ https://issues.apache.org/jira/browse/SPARK-33244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuanjian Li updated SPARK-33244: Summary: Unify the code paths for spark.table and spark.read.table (was: Block reading streaming temp view via `spark.table` API) > Unify the code paths for spark.table and spark.read.table > - > > Key: SPARK-33244 > URL: https://issues.apache.org/jira/browse/SPARK-33244 > Project: Spark > Issue Type: Improvement > Components: SQL, Structured Streaming >Affects Versions: 3.0.0 >Reporter: Yuanjian Li >Priority: Major > > * Block reading streaming temp view via `spark.table` API > * The code paths of `spark.table` and `spark.read.table` should be the same. > This behavior was broken by SPARK-32592, since we need to respect options in > the `spark.read.table` API.
[jira] [Commented] (SPARK-33166) Provide Search Function in Spark docs site
[ https://issues.apache.org/jira/browse/SPARK-33166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17228335#comment-17228335 ] Apache Spark commented on SPARK-33166: -- User 'beliefer' has created a pull request for this issue: https://github.com/apache/spark/pull/30292 > Provide Search Function in Spark docs site > -- > > Key: SPARK-33166 > URL: https://issues.apache.org/jira/browse/SPARK-33166 > Project: Spark > Issue Type: New Feature > Components: Documentation >Affects Versions: 3.1.0 >Reporter: Xiao Li >Priority: Major > > In the last few releases, our Spark documentation > https://spark.apache.org/docs/latest/ has become richer. It would be nice to > provide a search function to help our users find content faster.
[jira] [Assigned] (SPARK-33166) Provide Search Function in Spark docs site
[ https://issues.apache.org/jira/browse/SPARK-33166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-33166: Assignee: Apache Spark > Provide Search Function in Spark docs site > -- > > Key: SPARK-33166 > URL: https://issues.apache.org/jira/browse/SPARK-33166 > Project: Spark > Issue Type: New Feature > Components: Documentation >Affects Versions: 3.1.0 >Reporter: Xiao Li >Assignee: Apache Spark >Priority: Major > > In the last few releases, our Spark documentation > https://spark.apache.org/docs/latest/ has become richer. It would be nice to > provide a search function to help our users find content faster.
[jira] [Assigned] (SPARK-33166) Provide Search Function in Spark docs site
[ https://issues.apache.org/jira/browse/SPARK-33166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-33166: Assignee: (was: Apache Spark) > Provide Search Function in Spark docs site > -- > > Key: SPARK-33166 > URL: https://issues.apache.org/jira/browse/SPARK-33166 > Project: Spark > Issue Type: New Feature > Components: Documentation >Affects Versions: 3.1.0 >Reporter: Xiao Li >Priority: Major > > In the last few releases, our Spark documentation > https://spark.apache.org/docs/latest/ has become richer. It would be nice to > provide a search function to help our users find content faster.
[jira] [Updated] (SPARK-33386) Accessing array elements should failed if index is out of bound.
[ https://issues.apache.org/jira/browse/SPARK-33386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Leanken.Lin updated SPARK-33386: Description: When ANSI mode is enabled, accessing an array element with an out-of-bounds index should fail with an exception, but currently it returns null. (was: TODO) > Accessing array elements should failed if index is out of bound. > > > Key: SPARK-33386 > URL: https://issues.apache.org/jira/browse/SPARK-33386 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.1.0 >Reporter: Leanken.Lin >Priority: Major > > When ANSI mode is enabled, accessing an array element with an out-of-bounds index > should fail with an exception, but currently it returns null.
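What the ticket asks for, runnable in spark-shell (assuming an active session `spark` on Spark 3.x); today the query returns null, and under ANSI mode it should raise an out-of-bounds error instead:

{code:scala}
spark.conf.set("spark.sql.ansi.enabled", "true")
spark.sql("select array(1, 2, 3)[5]").show()
// current behaviour: null; expected with ANSI mode on: an index-out-of-bounds exception
{code}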
[jira] [Created] (SPARK-33386) Accessing array elements should failed if index is out of bound.
Leanken.Lin created SPARK-33386: --- Summary: Accessing array elements should failed if index is out of bound. Key: SPARK-33386 URL: https://issues.apache.org/jira/browse/SPARK-33386 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.1.0 Reporter: Leanken.Lin TODO
[jira] [Resolved] (SPARK-33384) Delete temporary file when cancelling writing to final path even underlying stream throwing error
[ https://issues.apache.org/jira/browse/SPARK-33384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-33384. --- Fix Version/s: 3.1.0 Resolution: Fixed Issue resolved by pull request 30290 [https://github.com/apache/spark/pull/30290] > Delete temporary file when cancelling writing to final path even underlying > stream throwing error > - > > Key: SPARK-33384 > URL: https://issues.apache.org/jira/browse/SPARK-33384 > Project: Spark > Issue Type: Improvement > Components: Structured Streaming >Affects Versions: 3.1.0 >Reporter: L. C. Hsieh >Assignee: L. C. Hsieh >Priority: Minor > Fix For: 3.1.0 > > > In {{RenameBasedFSDataOutputStream.cancel}}, we do two things in a single try/catch > block: close the underlying stream and delete the temporary file. Closing the > {{OutputStream}} can throw an {{IOException}}, so we could miss deleting the > temporary file.
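A minimal sketch of the fix the description implies, with invented member names (this is not the actual patch); moving the delete into a finally block means an IOException from close() can no longer skip the cleanup:

{code:scala}
import java.io.OutputStream
import java.nio.file.{Files, Path}

// Invented shape: split the two steps so a failed close() cannot skip the delete.
class CancellableWrite(stream: OutputStream, tempPath: Path) {
  def cancel(): Unit = {
    try {
      stream.close()                 // may throw IOException
    } finally {
      Files.deleteIfExists(tempPath) // runs even when close() fails
    }
  }
}
{code}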
[jira] [Assigned] (SPARK-33385) Bucket pruning support IsNaN
[ https://issues.apache.org/jira/browse/SPARK-33385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-33385: Assignee: (was: Apache Spark) > Bucket pruning support IsNaN > > > Key: SPARK-33385 > URL: https://issues.apache.org/jira/browse/SPARK-33385 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.1.0 >Reporter: Yuming Wang >Priority: Major > > {{IsNaN}} can also support bucket pruning.
[jira] [Assigned] (SPARK-33385) Bucket pruning support IsNaN
[ https://issues.apache.org/jira/browse/SPARK-33385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-33385: Assignee: Apache Spark > Bucket pruning support IsNaN > > > Key: SPARK-33385 > URL: https://issues.apache.org/jira/browse/SPARK-33385 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.1.0 >Reporter: Yuming Wang >Assignee: Apache Spark >Priority: Major > > {{IsNaN}} can also support bucket pruning.
[jira] [Commented] (SPARK-33385) Bucket pruning support IsNaN
[ https://issues.apache.org/jira/browse/SPARK-33385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17228324#comment-17228324 ] Apache Spark commented on SPARK-33385: -- User 'wangyum' has created a pull request for this issue: https://github.com/apache/spark/pull/30291 > Bucket pruning support IsNaN > > > Key: SPARK-33385 > URL: https://issues.apache.org/jira/browse/SPARK-33385 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.1.0 >Reporter: Yuming Wang >Priority: Major > > {{IsNaN}} can also support bucket pruning.
[jira] [Commented] (SPARK-33385) Bucket pruning support IsNaN
[ https://issues.apache.org/jira/browse/SPARK-33385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17228323#comment-17228323 ] Apache Spark commented on SPARK-33385: -- User 'wangyum' has created a pull request for this issue: https://github.com/apache/spark/pull/30291 > Bucket pruning support IsNaN > > > Key: SPARK-33385 > URL: https://issues.apache.org/jira/browse/SPARK-33385 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.1.0 >Reporter: Yuming Wang >Priority: Major > > {{IsNaN}} can also support bucket pruning.
[jira] [Created] (SPARK-33385) Bucket pruning support IsNaN
Yuming Wang created SPARK-33385: --- Summary: Bucket pruning support IsNaN Key: SPARK-33385 URL: https://issues.apache.org/jira/browse/SPARK-33385 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.1.0 Reporter: Yuming Wang {{IsNaN}} can also support bucket pruning.
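Why IsNaN qualifies, sketched with a deliberately simplified hash: IsNaN(col) pins the column to the single value Double.NaN, so like an equality filter it can be mapped to exactly one bucket. Spark's real bucket id comes from its Murmur3-based hash expression, not hashCode, so this is illustrative only:

{code:scala}
// Simplified illustration of pruning the scan to the one bucket NaN can live in.
def prunedBucket(numBuckets: Int): Int = {
  val nanHash = java.lang.Double.hashCode(Double.NaN) // stand-in for Spark's hash expression
  ((nanHash % numBuckets) + numBuckets) % numBuckets  // non-negative bucket id
}
{code}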
[jira] [Assigned] (SPARK-33352) Fix procedure-like declaration compilation warning in Scala 2.13
[ https://issues.apache.org/jira/browse/SPARK-33352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen reassigned SPARK-33352: Assignee: Yang Jie

> Fix procedure-like declaration compilation warning in Scala 2.13
> Key: SPARK-33352
> URL: https://issues.apache.org/jira/browse/SPARK-33352
> Project: Spark
> Issue Type: Sub-task
> Components: Build
> Affects Versions: 3.1.0
> Reporter: Yang Jie
> Assignee: Yang Jie
> Priority: Minor
>
> Similar to SPARK-29291; this just tracks Spark 3.1.0.
> There are two similar compilation warnings about procedure-like declaration in Scala 2.13.3:
> {code:java}
> [WARNING] [Warn] /spark/core/src/main/scala/org/apache/spark/HeartbeatReceiver.scala:70: procedure syntax is deprecated for constructors: add `=`, as in method definition
> [WARNING] [Warn] /spark/core/src/main/scala/org/apache/spark/storage/BlockManagerDecommissioner.scala:211: procedure syntax is deprecated: instead, add `: Unit =` to explicitly declare `run`'s return type
> {code}
> For constructors, the definition should be `this(...) = { }`, not `this(...) { }`; for methods without a return type, it should be `def methodName(...): Unit = {}`, not `def methodName(...) {}`.
[jira] [Resolved] (SPARK-33352) Fix procedure-like declaration compilation warning in Scala 2.13
[ https://issues.apache.org/jira/browse/SPARK-33352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen resolved SPARK-33352. -- Fix Version/s: 3.1.0 Resolution: Fixed Issue resolved by pull request 30255 [https://github.com/apache/spark/pull/30255]

> Fix procedure-like declaration compilation warning in Scala 2.13
> Key: SPARK-33352
> URL: https://issues.apache.org/jira/browse/SPARK-33352
> Project: Spark
> Issue Type: Sub-task
> Components: Build
> Affects Versions: 3.1.0
> Reporter: Yang Jie
> Assignee: Yang Jie
> Priority: Minor
> Fix For: 3.1.0
>
> Similar to SPARK-29291; this just tracks Spark 3.1.0.
> There are two similar compilation warnings about procedure-like declaration in Scala 2.13.3:
> {code:java}
> [WARNING] [Warn] /spark/core/src/main/scala/org/apache/spark/HeartbeatReceiver.scala:70: procedure syntax is deprecated for constructors: add `=`, as in method definition
> [WARNING] [Warn] /spark/core/src/main/scala/org/apache/spark/storage/BlockManagerDecommissioner.scala:211: procedure syntax is deprecated: instead, add `: Unit =` to explicitly declare `run`'s return type
> {code}
> For constructors, the definition should be `this(...) = { }`, not `this(...) { }`; for methods without a return type, it should be `def methodName(...): Unit = {}`, not `def methodName(...) {}`.
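The before/after for both warning classes, as compilable Scala; class and member names are placeholders, not the files cited above:

{code:scala}
class Receiver(name: String) {
  // was: def this() { this("default") }  -- deprecated procedure syntax for constructors
  def this() = this("default")

  // was: def run() { println(name) }     -- deprecated procedure syntax for methods
  def run(): Unit = {
    println(name)
  }
}
{code}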
[jira] [Commented] (SPARK-33384) Delete temporary file when cancelling writing to final path even underlying stream throwing error
[ https://issues.apache.org/jira/browse/SPARK-33384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17228270#comment-17228270 ] Apache Spark commented on SPARK-33384: -- User 'viirya' has created a pull request for this issue: https://github.com/apache/spark/pull/30290 > Delete temporary file when cancelling writing to final path even underlying > stream throwing error > - > > Key: SPARK-33384 > URL: https://issues.apache.org/jira/browse/SPARK-33384 > Project: Spark > Issue Type: Improvement > Components: Structured Streaming >Affects Versions: 3.1.0 >Reporter: L. C. Hsieh >Assignee: L. C. Hsieh >Priority: Minor > > In {{RenameBasedFSDataOutputStream.cancel}}, we do two things in a single try/catch > block: close the underlying stream and delete the temporary file. Closing the > {{OutputStream}} can throw an {{IOException}}, so we could miss deleting the > temporary file.
[jira] [Assigned] (SPARK-33384) Delete temporary file when cancelling writing to final path even underlying stream throwing error
[ https://issues.apache.org/jira/browse/SPARK-33384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-33384: Assignee: Apache Spark (was: L. C. Hsieh) > Delete temporary file when cancelling writing to final path even underlying > stream throwing error > - > > Key: SPARK-33384 > URL: https://issues.apache.org/jira/browse/SPARK-33384 > Project: Spark > Issue Type: Improvement > Components: Structured Streaming >Affects Versions: 3.1.0 >Reporter: L. C. Hsieh >Assignee: Apache Spark >Priority: Minor > > In {{RenameBasedFSDataOutputStream.cancel}}, we do two things in a single try/catch > block: close the underlying stream and delete the temporary file. Closing the > {{OutputStream}} can throw an {{IOException}}, so we could miss deleting the > temporary file.
[jira] [Commented] (SPARK-33384) Delete temporary file when cancelling writing to final path even underlying stream throwing error
[ https://issues.apache.org/jira/browse/SPARK-33384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17228269#comment-17228269 ] Apache Spark commented on SPARK-33384: -- User 'viirya' has created a pull request for this issue: https://github.com/apache/spark/pull/30290 > Delete temporary file when cancelling writing to final path even underlying > stream throwing error > - > > Key: SPARK-33384 > URL: https://issues.apache.org/jira/browse/SPARK-33384 > Project: Spark > Issue Type: Improvement > Components: Structured Streaming >Affects Versions: 3.1.0 >Reporter: L. C. Hsieh >Assignee: L. C. Hsieh >Priority: Minor > > In {{RenameBasedFSDataOutputStream.cancel}}, we do two things in a single try/catch > block: close the underlying stream and delete the temporary file. Closing the > {{OutputStream}} can throw an {{IOException}}, so we could miss deleting the > temporary file.
[jira] [Assigned] (SPARK-33384) Delete temporary file when cancelling writing to final path even underlying stream throwing error
[ https://issues.apache.org/jira/browse/SPARK-33384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-33384: Assignee: L. C. Hsieh (was: Apache Spark) > Delete temporary file when cancelling writing to final path even underlying > stream throwing error > - > > Key: SPARK-33384 > URL: https://issues.apache.org/jira/browse/SPARK-33384 > Project: Spark > Issue Type: Improvement > Components: Structured Streaming >Affects Versions: 3.1.0 >Reporter: L. C. Hsieh >Assignee: L. C. Hsieh >Priority: Minor > > In {{RenameBasedFSDataOutputStream.cancel}}, we do two things in a single try/catch > block: close the underlying stream and delete the temporary file. Closing the > {{OutputStream}} can throw an {{IOException}}, so we could miss deleting the > temporary file.
[jira] [Created] (SPARK-33384) Delete temporary file when cancelling writing to final path even underlying stream throwing error
L. C. Hsieh created SPARK-33384: --- Summary: Delete temporary file when cancelling writing to final path even underlying stream throwing error Key: SPARK-33384 URL: https://issues.apache.org/jira/browse/SPARK-33384 Project: Spark Issue Type: Improvement Components: Structured Streaming Affects Versions: 3.1.0 Reporter: L. C. Hsieh Assignee: L. C. Hsieh In {{RenameBasedFSDataOutputStream.cancel}}, we do two things in a single try/catch block: close the underlying stream and delete the temporary file. Closing the {{OutputStream}} can throw an {{IOException}}, so we could miss deleting the temporary file.
[jira] [Assigned] (SPARK-33141) capture SQL configs when creating permanent views
[ https://issues.apache.org/jira/browse/SPARK-33141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-33141: Assignee: (was: Apache Spark) > capture SQL configs when creating permanent views > - > > Key: SPARK-33141 > URL: https://issues.apache.org/jira/browse/SPARK-33141 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.1.0 >Reporter: Leanken.Lin >Priority: Major > > TODO
[jira] [Commented] (SPARK-33141) capture SQL configs when creating permanent views
[ https://issues.apache.org/jira/browse/SPARK-33141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17228069#comment-17228069 ] Apache Spark commented on SPARK-33141: -- User 'luluorta' has created a pull request for this issue: https://github.com/apache/spark/pull/30289 > capture SQL configs when creating permanent views > - > > Key: SPARK-33141 > URL: https://issues.apache.org/jira/browse/SPARK-33141 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.1.0 >Reporter: Leanken.Lin >Priority: Major > > TODO
[jira] [Assigned] (SPARK-33141) capture SQL configs when creating permanent views
[ https://issues.apache.org/jira/browse/SPARK-33141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-33141: Assignee: Apache Spark > capture SQL configs when creating permanent views > - > > Key: SPARK-33141 > URL: https://issues.apache.org/jira/browse/SPARK-33141 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.1.0 >Reporter: Leanken.Lin >Assignee: Apache Spark >Priority: Major > > TODO
[jira] [Commented] (SPARK-33141) capture SQL configs when creating permanent views
[ https://issues.apache.org/jira/browse/SPARK-33141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17228030#comment-17228030 ] Apache Spark commented on SPARK-33141: -- User 'luluorta' has created a pull request for this issue: https://github.com/apache/spark/pull/30289 > capture SQL configs when creating permanent views > - > > Key: SPARK-33141 > URL: https://issues.apache.org/jira/browse/SPARK-33141 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.1.0 >Reporter: Leanken.Lin >Priority: Major > > TODO
[jira] [Updated] (SPARK-33383) Improve performance of Column.isin Expression
[ https://issues.apache.org/jira/browse/SPARK-33383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Felix Wollschläger updated SPARK-33383: --- Description:

While asking [a question on Stackoverflow|https://stackoverflow.com/questions/64683189/usage-of-broadcast-variables-when-using-only-spark-sql-api] and running some local tests, I came across a performance bottleneck when using the _where_-condition _Column.isin_.

I have a set of allowed values ("whitelist") that is easily handled in memory (about 10k values). I thought simply using the _Column.isin_ expression in the SQL API should be the way to go. I assumed it would be runtime-equivalent to

{code}
df.filter(row => allowedValues.contains(row.getInt(0)))
{code}

However, when running a few tests locally, I realized that using _Column.isin_ is actually about 10 times slower than an _rdd.filter_ or a broadcast inner join.

Shouldn't {code}df.where(col("colname").isin(allowedValues)){code} perform (SQL-API overhead aside) as well as {code}df.filter(row => allowedValues.contains(row.getInt(0))){code}?

I used the following dummy code for my local tests:

{code:scala}
package example

import org.apache.spark.sql.functions.{broadcast, col, count}
import org.apache.spark.sql.{DataFrame, SparkSession}

import scala.util.Random

object Test {

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("Name")
      .master("local[*]")
      .config("spark.driver.host", "localhost")
      .config("spark.ui.enabled", "false")
      .getOrCreate()

    import spark.implicits._

    val _10Million = 10000000
    val random = new Random(1048394789305L)
    val values = Seq.fill(_10Million)(random.nextInt())
    val df = values.toDF("value")

    val allowedValues = getRandomElements(values, random, 10000)

    println("Starting ...")

    runWithInCollection(spark, df, allowedValues)
    println(" In Collection")

    runWithBroadcastDF(spark, df, allowedValues)
    println(" Broadcast DF")

    runWithBroadcastVariable(spark, df, allowedValues)
    println(" Broadcast Variable")
  }

  def getRandomElements[A](seq: Seq[A], random: Random, size: Int): Set[A] = {
    val builder = Set.newBuilder[A]
    for (i <- 0 until size) {
      builder += getRandomElement(seq, random)
    }
    builder.result()
  }

  def getRandomElement[A](seq: Seq[A], random: Random): A = {
    seq(random.nextInt(seq.length))
  }

  // I expected this one to be almost equivalent to the one with a broadcast variable,
  // but it's actually about 10 times slower
  def runWithInCollection(spark: SparkSession, df: DataFrame, allowedValues: Set[Int]): Unit = {
    spark.time {
      df.where(col("value").isInCollection(allowedValues)).runTestAggregation()
    }
  }

  // A bit slower than the one with a broadcast variable
  def runWithBroadcastDF(spark: SparkSession, df: DataFrame, allowedValues: Set[Int]): Unit = {
    import spark.implicits._

    val allowedValuesDF = allowedValues.toSeq.toDF("allowedValue")

    spark.time {
      df.join(broadcast(allowedValuesDF), col("value") === col("allowedValue")).runTestAggregation()
    }
  }

  // This is actually the fastest one
  def runWithBroadcastVariable(spark: SparkSession, df: DataFrame, allowedValues: Set[Int]): Unit = {
    val allowedValuesBroadcast = spark.sparkContext.broadcast(allowedValues)

    spark.time {
      df.filter(row => allowedValuesBroadcast.value.contains(row.getInt(0))).runTestAggregation()
    }
  }

  implicit class TestRunner(val df: DataFrame) {
    def runTestAggregation(): Unit = {
      df.agg(count("value")).show()
    }
  }
}
{code}
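One hedged observation on the benchmark: Spark's optimizer already rewrites an In list into the hash-based InSet once the list grows past spark.sql.optimizer.inSetConversionThreshold (default 10), so a 10k-value predicate should be hash-backed; the remaining gap versus the broadcast variable is what this ticket asks to close. Assuming an active session `spark`, the setting can be checked explicitly:

{code:scala}
// Confirm the In -> InSet rewrite threshold (an existing Spark SQL config).
spark.conf.get("spark.sql.optimizer.inSetConversionThreshold") // "10" by default
{code}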
[jira] [Updated] (SPARK-33383) Improve performance of Column.isin Expression
[ https://issues.apache.org/jira/browse/SPARK-33383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Felix Wollschläger updated SPARK-33383: --- Description: When I asked [a question on Stackoverflow|https://stackoverflow.com/questions/64683189/usage-of-broadcast-variables-when-using-only-spark-sql-api] and running some local tests, I came across a performance bottleneck when using the _where_-Condition _Column.isin_. I have a set of allowed-values ("whitelist") with a size that's handleable in-memory really good (about 10k values). I thought simply using the _Column.isin_ Expression in the SQL API should be the way to go. I assumed it would be runtime equivalent to {code} df.filter(row => allowedValues.contains(row.getInt(0))) {code} however, when running a few tests locally, I realized that using _Column.isin_ is actually about 10 times slower than a _rdd.filter_ or a broadcast-inner-join. Shouldn't {code}df.where(col("colname").isin(allowedValues)){code} perform (SQL-API overhead aside) perform as good as {code}df.filter(row => allowedValues.contains(row.getInt(0))){code} ? I used the following dummy code for my local tests: {code:scala} package example import org.apache.spark.sql.functions.{broadcast, col, count} import org.apache.spark.sql.{DataFrame, SparkSession} import scala.util.Random object Test { def main(args: Array[String]): Unit = { val spark = SparkSession.builder() .appName("Name") .master("local[*]") .config("spark.driver.host", "localhost") .config("spark.ui.enabled", "false") .getOrCreate() import spark.implicits._ val _10Million = 1000 val random = new Random(1048394789305L) val values = Seq.fill(_10Million)(random.nextInt()) val df = Seq.fill(_10Million)(random.nextInt()).toDF("value") val allowedValues = getRandomElements(values, random, 1) println("Starting ...") runWithInCollection(spark, df, allowedValues) println(" In Collection") runWithBroadcastDF(spark, df, allowedValues) println(" Broadcast DF") runWithBroadcastVariable(spark, df, allowedValues) println(" Broadcast Variable") } def getRandomElements[A](seq: Seq[A], random: Random, size: Int): Set[A] = { val builder = Set.newBuilder[A] for (i <- 0 until size) { builder += getRandomElement(seq, random) } builder.result() } def getRandomElement[A](seq: Seq[A], random: Random): A = { seq(random.nextInt(seq.length)) } // I expected this one to be almost equivalent to the one with a broadcast-variable, but it's actually about 10 times slower def runWithInCollection(spark: SparkSession, df: DataFrame, allowedValues: Set[Int]): Unit = { spark.time { df.where(col("value").isInCollection(allowedValues)).runTestAggregation() } } // A bit slower than the one with a broadcast variable def runWithBroadcastDF(spark: SparkSession, df: DataFrame, allowedValues: Set[Int]): Unit = { import spark.implicits._ val allowedValuesDF = allowedValues.toSeq.toDF("allowedValue") spark.time { df.join(broadcast(allowedValuesDF), col("value") === col("allowedValue")).runTestAggregation() } } // This is actually the fastest one def runWithBroadcastVariable(spark: SparkSession, df: DataFrame, allowedValues: Set[Int]): Unit = { val allowedValuesBroadcast = spark.sparkContext.broadcast(allowedValues) spark.time { df.filter(row => allowedValuesBroadcast.value.contains(row.getInt(0))).runTestAggregation() } } implicit class TestRunner(val df: DataFrame) { def runTestAggregation(): Unit = { df.agg(count("value")).show() } } } {code} was: When I asked [a question on 
Stackoverflow|https://stackoverflow.com/questions/64683189/usage-of-broadcast-variables-when-using-only-spark-sql-api] and running some local tests, I came across a performance bottleneck when using the `where`-Condition `Column.isin`. I have a set of allowed-values ("whitelist") with a size that's handleable in-memory really good (about 10k values). I thought simply using the `Column.isin` Expression in the SQL API should be the way to go. I assumed it would be runtime equivalent to ```scala df.filter(row => allowedValues.contains(row.getInt(0))) ``` {noformat} fdfsf {noformat} however, when running a few tests locally, I realized that using `Column.isin` is actually about 10 times slower than a ```rdd.filter``` or a broadcast-inner-join. Shouldn't ```df.where(col("colname").isin(allowedValues))``` perform (SQL-API overhead aside) perform as good as ```df.filter(row => allowedValues.contains(row.getInt(0)))``` ? {code:scala}
[jira] [Created] (SPARK-33383) Improve performance of Column.isin Expression
Felix Wollschläger created SPARK-33383:
---
Summary: Improve performance of Column.isin Expression
Key: SPARK-33383
URL: https://issues.apache.org/jira/browse/SPARK-33383
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 3.0.1, 2.4.4
Environment: macOS, Spark(-SQL) 2.4.4 and 3.0.1, Scala 2.12.10
Reporter: Felix Wollschläger

After asking [a question on Stackoverflow|https://stackoverflow.com/questions/64683189/usage-of-broadcast-variables-when-using-only-spark-sql-api] and running some local tests, I came across a performance bottleneck when using _Column.isin_ in a _where_ condition. I have a set of allowed values (a whitelist) that easily fits in memory (about 10k values), so I thought simply using the _Column.isin_ expression in the SQL API should be the way to go. I assumed it would be equivalent at runtime to {code}df.filter(row => allowedValues.contains(row.getInt(0))){code} However, in a few local tests I found that _Column.isin_ is actually about 10 times slower than an _rdd.filter_ or a broadcast inner join. Shouldn't {code}df.where(col("colname").isin(allowedValues)){code} perform (SQL API overhead aside) as well as {code}df.filter(row => allowedValues.contains(row.getInt(0))){code}?

{code:scala}
package example

import org.apache.spark.sql.functions.{broadcast, col, count}
import org.apache.spark.sql.{DataFrame, SparkSession}

import scala.util.Random

object Test {

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("Name")
      .master("local[*]")
      .config("spark.driver.host", "localhost")
      .config("spark.ui.enabled", "false")
      .getOrCreate()

    import spark.implicits._

    // Note: despite the name, this run only generates 1,000 rows.
    val _10Million = 1000
    val random = new Random(1048394789305L)
    val values = Seq.fill(_10Million)(random.nextInt())
    val df = Seq.fill(_10Million)(random.nextInt()).toDF("value")
    // Whitelist of size 1 in this run (the description mentions ~10k values).
    val allowedValues = getRandomElements(values, random, 1)

    println("Starting ...")

    runWithInCollection(spark, df, allowedValues)
    println(" In Collection")

    runWithBroadcastDF(spark, df, allowedValues)
    println(" Broadcast DF")

    runWithBroadcastVariable(spark, df, allowedValues)
    println(" Broadcast Variable")
  }

  def getRandomElements[A](seq: Seq[A], random: Random, size: Int): Set[A] = {
    val builder = Set.newBuilder[A]
    for (i <- 0 until size) {
      builder += getRandomElement(seq, random)
    }
    builder.result()
  }

  def getRandomElement[A](seq: Seq[A], random: Random): A = {
    seq(random.nextInt(seq.length))
  }

  // I expected this one to be almost equivalent to the one with a
  // broadcast variable, but it's actually about 10 times slower.
  def runWithInCollection(spark: SparkSession, df: DataFrame, allowedValues: Set[Int]): Unit = {
    spark.time {
      df.where(col("value").isInCollection(allowedValues)).runTestAggregation()
    }
  }

  // A bit slower than the one with a broadcast variable.
  def runWithBroadcastDF(spark: SparkSession, df: DataFrame, allowedValues: Set[Int]): Unit = {
    import spark.implicits._
    val allowedValuesDF = allowedValues.toSeq.toDF("allowedValue")
    spark.time {
      df.join(broadcast(allowedValuesDF), col("value") === col("allowedValue")).runTestAggregation()
    }
  }

  // This is actually the fastest one.
  def runWithBroadcastVariable(spark: SparkSession, df: DataFrame, allowedValues: Set[Int]): Unit = {
    val allowedValuesBroadcast = spark.sparkContext.broadcast(allowedValues)
    spark.time {
      df.filter(row => allowedValuesBroadcast.value.contains(row.getInt(0))).runTestAggregation()
    }
  }

  implicit class TestRunner(val df: DataFrame) {
    def runTestAggregation(): Unit = {
      df.agg(count("value")).show()
    }
  }
}
{code}
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-30527) Add IsNotNull filter when using In, InSet and InSubQuery
[ https://issues.apache.org/jira/browse/SPARK-30527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang resolved SPARK-30527. - Resolution: Invalid > Add IsNotNull filter when using In, InSet and InSubQuery > -- > > Key: SPARK-30527 > URL: https://issues.apache.org/jira/browse/SPARK-30527 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.1.0 >Reporter: ulysses you >Priority: Minor > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-32860) Encoders::bean doc incorrectly states maps are not supported
[ https://issues.apache.org/jira/browse/SPARK-32860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-32860. -- Fix Version/s: 3.1.0 3.0.2 Resolution: Fixed Fixed in https://github.com/apache/spark/pull/30274 > Encoders::bean doc incorrectly states maps are not supported > > > Key: SPARK-32860 > URL: https://issues.apache.org/jira/browse/SPARK-32860 > Project: Spark > Issue Type: Documentation > Components: SQL >Affects Versions: 2.4.6, 3.0.1, 3.1.0 >Reporter: Dan Ziemba >Assignee: Dan Ziemba >Priority: Trivial > Labels: starter > Fix For: 3.0.2, 3.1.0 > > > The documentation for the bean method in the Encoders class currently states: > {quote}collection types: only array and java.util.List currently, map support > is in progress > {quote} > But map support appears to work properly and has been available since 2.1.0 > according to SPARK-16706. Documentation should be updated to match what is / > is not actually supported (Set, Queue, etc?). -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
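Since the resolution confirms that map-typed bean properties work, a minimal sketch of what the corrected documentation describes. This assumes only the public _Encoders.bean_ API; the Prefs class and its property are hypothetical, not taken from the issue:

{code:scala}
import java.util.{HashMap => JHashMap, Map => JMap}

import org.apache.spark.sql.{Encoders, SparkSession}

// Hypothetical Java-style bean with a java.util.Map property (no-arg
// constructor plus getter/setter pair, as the bean encoder requires).
class Prefs {
  private var settings: JMap[String, String] = new JHashMap[String, String]()
  def getSettings: JMap[String, String] = settings
  def setSettings(m: JMap[String, String]): Unit = { settings = m }
}

object BeanMapDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("BeanMapDemo")
      .master("local[*]")
      .getOrCreate()

    val bean = new Prefs
    bean.getSettings.put("theme", "dark")

    // Encoders.bean derives a map<string,string> field for the property
    // instead of rejecting it, which is what the doc fix clarifies.
    val ds = spark.createDataset(java.util.Arrays.asList(bean))(Encoders.bean(classOf[Prefs]))
    ds.printSchema()
    ds.show(truncate = false)

    spark.stop()
  }
}
{code}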
[jira] [Assigned] (SPARK-32860) Encoders::bean doc incorrectly states maps are not supported
[ https://issues.apache.org/jira/browse/SPARK-32860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-32860: Assignee: Dan Ziemba > Encoders::bean doc incorrectly states maps are not supported > > > Key: SPARK-32860 > URL: https://issues.apache.org/jira/browse/SPARK-32860 > Project: Spark > Issue Type: Documentation > Components: SQL >Affects Versions: 2.4.6, 3.0.1, 3.1.0 >Reporter: Dan Ziemba >Assignee: Dan Ziemba >Priority: Trivial > Labels: starter > > The documentation for the bean method in the Encoders class currently states: > {quote}collection types: only array and java.util.List currently, map support > is in progress > {quote} > But map support appears to work properly and has been available since 2.1.0 > according to SPARK-16706. Documentation should be updated to match what is / > is not actually supported (Set, Queue, etc?). -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-33371) Support Python 3.9+ in PySpark
[ https://issues.apache.org/jira/browse/SPARK-33371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17227971#comment-17227971 ] Apache Spark commented on SPARK-33371: -- User 'HyukjinKwon' has created a pull request for this issue: https://github.com/apache/spark/pull/30288 > Support Python 3.9+ in PySpark > -- > > Key: SPARK-33371 > URL: https://issues.apache.org/jira/browse/SPARK-33371 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 3.0.1, 3.1.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Fix For: 3.1.0 > > > Python 3.9 works with PySpark. We should fix setup.py. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org