[jira] [Resolved] (SPARK-47946) Nested field's nullable value could be invalid after extracted using GetStructField
[ https://issues.apache.org/jira/browse/SPARK-47946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Linhong Liu resolved SPARK-47946. - Resolution: Not A Problem > Nested field's nullable value could be invalid after extracted using > GetStructField > --- > > Key: SPARK-47946 > URL: https://issues.apache.org/jira/browse/SPARK-47946 > Project: Spark > Issue Type: Bug > Components: Spark Core, SQL >Affects Versions: 3.4.2 >Reporter: Junyoung Cho >Priority: Major > > I got an error when appending to a table using DataFrameWriterV2. > The error occurred in TableOutputResolver.checkNullability. It occurs when the > data types of the schemas are the same, but the order of the fields is different. > I found that GetStructField.nullable returns an unexpected result. > {code:java} > override def nullable: Boolean = child.nullable || > childSchema(ordinal).nullable {code} > Even if the nested field is not nullable, it returns true when the > parent struct is nullable. > ||Parent nullability||Child nullability||Result|| > |true|true|true| > |true|false|true| > |false|true|true| > |false|false|false| > > I think the logic should be changed to use only the child's nullability, because > both the parent and the child should be nullable for the result to be considered > nullable. > {code:java} > override def nullable: Boolean = childSchema(ordinal).nullable {code} > > I want to check whether the current logic is reasonable, or whether my suggestion > could cause other side effects. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-47946) Nested field's nullable value could be invalid after extracted using GetStructField
[ https://issues.apache.org/jira/browse/SPARK-47946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17846773#comment-17846773 ] Linhong Liu commented on SPARK-47946: - No, it's not an issue. Think about this: ||key||value (nullable=true)|| |a|{"x": 1, "y": 2}| |b|null| |c|{"x": null, "y": 3}| Let's assume `value.y` cannot be null (i.e. nullable = false) and run `select value.y from tbl`. What's the result, and what's the nullability of this column? It should be ||y|| |2| |null| |3| because the extracted column can be null whenever the parent struct is null, even though the field itself is declared non-nullable. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
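A spark-shell sketch of the behavior the comment describes (table and column names are illustrative): a nullable struct with a non-nullable field still yields a nullable extracted column.
{code:scala}
import org.apache.spark.sql.Row
import org.apache.spark.sql.types._

// The struct column is nullable, but its field y is declared non-nullable.
val schema = StructType(Seq(
  StructField("value", StructType(Seq(
    StructField("y", IntegerType, nullable = false)
  )), nullable = true)
))
val df = spark.createDataFrame(
  spark.sparkContext.parallelize(Seq(Row(Row(2)), Row(null))),
  schema)

// GetStructField.nullable = parent.nullable || field.nullable, so y is nullable:
df.select("value.y").printSchema()  // |-- y: integer (nullable = true)
df.select("value.y").show()         // 2, then null for the row whose struct is null
{code}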
[jira] [Commented] (SPARK-44577) INSERT BY NAME returns non-sensical error message
[ https://issues.apache.org/jira/browse/SPARK-44577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17748745#comment-17748745 ] Linhong Liu commented on SPARK-44577: - [~fanjia] could you make a follow-up PR to fix this? The error is at: [https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TableOutputResolver.scala#L243] we should do something like: ``` val pathInfo = if (colPath.isEmpty) { "table" } else { s"struct ${colPath.quoted}" } throw QueryCompilationErrors.incompatibleDataToTableExtraStructFieldsError( tableName, pathInfo, // the changes extraCols ) ``` > INSERT BY NAME returns non-sensical error message > - > > Key: SPARK-44577 > URL: https://issues.apache.org/jira/browse/SPARK-44577 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.4.0 >Reporter: Serge Rielau >Priority: Major > > CREATE TABLE bug(c1 INT); > INSERT INTO bug BY NAME SELECT 1 AS c2; > ==> Multi-part identifier cannot be empty. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-44577) INSERT BY NAME returns non-sensical error message
[ https://issues.apache.org/jira/browse/SPARK-44577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17748746#comment-17748746 ] Linhong Liu commented on SPARK-44577: - cc [~cloud_fan] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-41752) UI improvement for nested SQL executions
Linhong Liu created SPARK-41752: --- Summary: UI improvement for nested SQL executions Key: SPARK-41752 URL: https://issues.apache.org/jira/browse/SPARK-41752 Project: Spark Issue Type: Task Components: SQL, Web UI Affects Versions: 3.4.0 Reporter: Linhong Liu In SPARK-41713, CTAS triggers a sub-execution to perform the data insertion, but the UI displays them as two independent queries, which confuses users. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
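For reference, a minimal CTAS that exhibits the two-queries behavior described above (table name is illustrative):
{code:scala}
// One statement, but the SQL tab shows the outer CREATE TABLE query plus the
// sub-execution that performs the insert, as two seemingly unrelated entries.
spark.sql("CREATE TABLE t USING parquet AS SELECT id FROM range(10)")
{code}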
[jira] [Updated] (SPARK-40292) arrays_zip output unexpected alias column names
[ https://issues.apache.org/jira/browse/SPARK-40292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Linhong Liu updated SPARK-40292: Description: For the below query: {code:sql} with q as ( select named_struct( 'my_array', array(named_struct('x', 1, 'y', 2)) ) as my_struct ) select arrays_zip(my_struct.my_array) from q {code} The latest spark gives the below schema, the field name "my_array" was changed to "0" {code:java} root |-- arrays_zip(my_struct.my_array): array (nullable = true) | |-- element: struct (containsNull = false) | | |-- 0: struct (nullable = true) | | | |-- x: integer (nullable = true) | | | |-- y: integer (nullable = true){code} While Spark 3.1 gives the expected result {code:java} root |-- arrays_zip(my_struct.my_array): array (nullable = true) ||-- element: struct (containsNull = false) |||-- my_array: struct (nullable = true) ||||-- x: integer (nullable = true) ||||-- y: integer (nullable = true) {code} was: For the below query: {code:sql} with q as ( select named_struct( 'my_array', array(named_struct('x', 1, 'y', 2)) ) as my_struct ) select arrays_zip(my_struct.my_array) from q {code} The latest spark gives the below schema, the field name "my_array" was changed to "0" {code:java} root |-- arrays_zip(my_struct.my_array): array (nullable = true) | |-- element: struct (containsNull = false) | | |-- 0: struct (nullable = true) | | | |-- x: integer (nullable = true) | | | |-- y: integer (nullable = true){code} While Spark 3.1 gives the expected result {code:java} root |-- arrays_zip(my_struct.my_array): array (nullable = true) ||-- element: struct (containsNull = false) |||-- my_array: struct (nullable = true) ||||-- x: integer (nullable = true) ||||-- y: integer (nullable = true) {code} > arrays_zip output unexpected alias column names > --- > > Key: SPARK-40292 > URL: https://issues.apache.org/jira/browse/SPARK-40292 > Project: Spark > Issue Type: Task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Linhong Liu >Priority: Major > > For the below query: > {code:sql} > with q as ( > select > named_struct( > 'my_array', array(named_struct('x', 1, 'y', 2)) > ) as my_struct > ) > select > arrays_zip(my_struct.my_array) > from > q {code} > The latest spark gives the below schema, the field name "my_array" was > changed to "0" > {code:java} > root > |-- arrays_zip(my_struct.my_array): array (nullable = true) > | |-- element: struct (containsNull = false) > | | |-- 0: struct (nullable = true) > | | | |-- x: integer (nullable = true) > | | | |-- y: integer (nullable = true){code} > While Spark 3.1 gives the expected result > {code:java} > root > |-- arrays_zip(my_struct.my_array): array (nullable = true) > ||-- element: struct (containsNull = false) > |||-- my_array: struct (nullable = true) > ||||-- x: integer (nullable = true) > ||||-- y: integer (nullable = true) > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-40292) arrays_zip output unexpected alias column names
[ https://issues.apache.org/jira/browse/SPARK-40292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Linhong Liu updated SPARK-40292: Description: For the below query: {code:sql} with q as ( select named_struct( 'my_array', array(named_struct('x', 1, 'y', 2)) ) as my_struct ) select arrays_zip(my_struct.my_array) from q {code} The latest spark gives the below schema, the field name "my_array" was changed to "0" {code:java} root |-- arrays_zip(my_struct.my_array): array (nullable = true) | |-- element: struct (containsNull = false) | | |-- 0: struct (nullable = true) | | | |-- x: integer (nullable = true) | | | |-- y: integer (nullable = true){code} While Spark 3.1 gives the expected result {code:java} root |-- arrays_zip(my_struct.my_array): array (nullable = true) ||-- element: struct (containsNull = false) |||-- my_array: struct (nullable = true) ||||-- x: integer (nullable = true) ||||-- y: integer (nullable = true) {code} was: For the below query: {code:java} with q as ( select named_struct( 'my_array', array(named_struct('x', 1, 'y', 2)) ) as my_struct ) select arrays_zip(my_struct.my_array) from q {code} The latest spark gives the below schema, the field name "my_array" was changed to "0" root |-- arrays_zip(my_struct.my_array): array (nullable = true) ||-- element: struct (containsNull = false) |||-- 0: struct (nullable = true) ||||-- x: integer (nullable = true) ||||-- y: integer (nullable = true) But the Spark 3.1 gives expected result root |-- arrays_zip(my_struct.my_array): array (nullable = true) ||-- element: struct (containsNull = false) |||-- my_array: struct (nullable = true) ||||-- x: integer (nullable = true) ||||-- y: integer (nullable = true) > arrays_zip output unexpected alias column names > --- > > Key: SPARK-40292 > URL: https://issues.apache.org/jira/browse/SPARK-40292 > Project: Spark > Issue Type: Task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Linhong Liu >Priority: Major > > For the below query: > > {code:sql} > with q as ( > select > named_struct( > 'my_array', array(named_struct('x', 1, 'y', 2)) > ) as my_struct > ) > select > arrays_zip(my_struct.my_array) > from > q {code} > The latest spark gives the below schema, the field name "my_array" was > changed to "0" > {code:java} > root > |-- arrays_zip(my_struct.my_array): array (nullable = true) > | |-- element: struct (containsNull = false) > | | |-- 0: struct (nullable = true) > | | | |-- x: integer (nullable = true) > | | | |-- y: integer (nullable = true){code} > While Spark 3.1 gives the expected result > {code:java} > root > |-- arrays_zip(my_struct.my_array): array (nullable = true) > ||-- element: struct (containsNull = false) > |||-- my_array: struct (nullable = true) > ||||-- x: integer (nullable = true) > ||||-- y: integer (nullable = true) > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-40292) arrays_zip output unexpected alias column names
Linhong Liu created SPARK-40292: --- Summary: arrays_zip output unexpected alias column names Key: SPARK-40292 URL: https://issues.apache.org/jira/browse/SPARK-40292 Project: Spark Issue Type: Task Components: SQL Affects Versions: 3.4.0 Reporter: Linhong Liu For the below query: {code:java} with q as ( select named_struct( 'my_array', array(named_struct('x', 1, 'y', 2)) ) as my_struct ) select arrays_zip(my_struct.my_array) from q {code} The latest spark gives the below schema, the field name "my_array" was changed to "0" root |-- arrays_zip(my_struct.my_array): array (nullable = true) ||-- element: struct (containsNull = false) |||-- 0: struct (nullable = true) ||||-- x: integer (nullable = true) ||||-- y: integer (nullable = true) But the Spark 3.1 gives expected result root |-- arrays_zip(my_struct.my_array): array (nullable = true) ||-- element: struct (containsNull = false) |||-- my_array: struct (nullable = true) ||||-- x: integer (nullable = true) ||||-- y: integer (nullable = true) -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-40291) Improve the message for column not in group by clause error
Linhong Liu created SPARK-40291: --- Summary: Improve the message for column not in group by clause error Key: SPARK-40291 URL: https://issues.apache.org/jira/browse/SPARK-40291 Project: Spark Issue Type: Task Components: SQL Affects Versions: 3.4.0 Reporter: Linhong Liu Improve the message for the "column not in group by clause" error to use the new error framework; a query that triggers it is sketched below. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
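A minimal example of the error in question (the exact message depends on the error-framework migration):
{code:scala}
// 'b' is selected but neither grouped nor aggregated, which raises the
// "column not in group by clause" analysis error this ticket improves.
spark.sql("SELECT a, b FROM VALUES (1, 2) AS t(a, b) GROUP BY a").show()
{code}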
[jira] [Created] (SPARK-40213) Incorrect ASCII value for Latin-1 Supplement characters
Linhong Liu created SPARK-40213: --- Summary: Incorrect ASCII value for Latin-1 Supplement characters Key: SPARK-40213 URL: https://issues.apache.org/jira/browse/SPARK-40213 Project: Spark Issue Type: Task Components: SQL Affects Versions: 3.2.2 Reporter: Linhong Liu The `ascii()` built-in function in Spark doesn't support Latin-1 Supplement characters, whose values are in [128, 256). Instead, it produces a wrong value, -62 or -61, for all such chars. But the `chr()` built-in function supports values in [0, 256), and normally `ascii()` should be the inverse of `chr()`. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
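A sketch of the mismatch on affected versions: 'ü' is U+00FC (code point 252), and its UTF-8 lead byte 0xC3 reads as -61 when interpreted as a signed byte, which matches the wrong values reported above.
{code:scala}
spark.sql("SELECT chr(252)").show()    // ü: chr() handles the full [0, 256) range
spark.sql("SELECT ascii('ü')").show()  // -61 on affected versions, instead of 252
{code}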
[jira] [Created] (SPARK-39207) Record SQL text when executing with SparkSession.sql()
Linhong Liu created SPARK-39207: --- Summary: Record SQL text when executing with SparkSession.sql() Key: SPARK-39207 URL: https://issues.apache.org/jira/browse/SPARK-39207 Project: Spark Issue Type: Task Components: SQL Affects Versions: 3.4.0 Reporter: Linhong Liu -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-38550) Use a disk-based store to save more information in live UI to help debug
Linhong Liu created SPARK-38550: --- Summary: Use a disk-based store to save more information in live UI to help debug Key: SPARK-38550 URL: https://issues.apache.org/jira/browse/SPARK-38550 Project: Spark Issue Type: Task Components: Spark Core, SQL Affects Versions: 3.3.0 Reporter: Linhong Liu In Spark, the UI lacks troubleshooting abilities. For example: * AQE plan changes are not available * the plan description of a large plan is truncated This is because the live UI depends on an in-memory KV store, so we always have to worry about stability when adding more information to it. Therefore, it's better to add a disk-based store to save more information. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-38318) regression when replacing a dataset view
Linhong Liu created SPARK-38318: --- Summary: regression when replacing a dataset view Key: SPARK-38318 URL: https://issues.apache.org/jira/browse/SPARK-38318 Project: Spark Issue Type: Task Components: SQL Affects Versions: 3.2.1, 3.2.0, 3.3.0 Reporter: Linhong Liu The below use case works well in 3.1 but fails in 3.2 and master. {code:java} sql("select 1").createOrReplaceTempView("v") sql("select * from v").createOrReplaceTempView("v") // in 3.1 it works well, and the select outputs 1 // in 3.2 it fails with: "AnalysisException: Recursive view v detected (cycle: v -> v)"{code} -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-37202) Temp view didn't collect temp function that registered with catalog API
Linhong Liu created SPARK-37202: --- Summary: Temp view didn't collect temp function that registered with catalog API Key: SPARK-37202 URL: https://issues.apache.org/jira/browse/SPARK-37202 Project: Spark Issue Type: Task Components: SQL Affects Versions: 3.2.0 Reporter: Linhong Liu -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
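The description is empty; based on the title, a hypothetical repro sketch (function and view names are illustrative):
{code:scala}
// Register a temporary function through the catalog API, then reference it
// from a temp view; per the title, the view does not record this function as
// a dependency the way SQL-registered temp functions are recorded.
spark.udf.register("plus_one", (x: Int) => x + 1)
spark.sql("CREATE OR REPLACE TEMP VIEW v AS SELECT plus_one(1) AS c")
spark.sql("SELECT * FROM v").show()
{code}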
[jira] [Created] (SPARK-37067) DateTimeUtils.stringToTimestamp() incorrectly rejects timezone without colon
Linhong Liu created SPARK-37067: --- Summary: DateTimeUtils.stringToTimestamp() incorrectly rejects timezone without colon Key: SPARK-37067 URL: https://issues.apache.org/jira/browse/SPARK-37067 Project: Spark Issue Type: Task Components: SQL Affects Versions: 3.2.0, 3.1.0 Reporter: Linhong Liu A zone id with a format like "+" or "+0730" can be parsed by `ZoneId.of()` but is rejected by Spark's `DateTimeUtils.stringToTimestamp()`. This means we return null for some valid datetime strings, such as `2021-10-11T03:58:03.000+0700`. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
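A sketch of the inconsistency, using the datetime string from the description:
{code:scala}
import java.time.ZoneId

// The JVM accepts the colon-less offset form...
ZoneId.of("+0730")
// ...but on affected versions Spark's string-to-timestamp path rejects it,
// so this cast yields null instead of a timestamp:
spark.sql("SELECT CAST('2021-10-11T03:58:03.000+0700' AS TIMESTAMP)").show()
{code}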
[jira] [Created] (SPARK-36286) Block some invalid datetime string
Linhong Liu created SPARK-36286: --- Summary: Block some invalid datetime string Key: SPARK-36286 URL: https://issues.apache.org/jira/browse/SPARK-36286 Project: Spark Issue Type: Task Components: SQL Affects Versions: 3.3.0 Reporter: Linhong Liu In PR #32959, we found some weird datetime strings that can still be parsed ([details|https://github.com/apache/spark/pull/32959#discussion_r665015489]). We should block them as well. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-36241) support for creating tablewith void column datatype
[ https://issues.apache.org/jira/browse/SPARK-36241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Linhong Liu updated SPARK-36241: Summary: support for creating tablewith void column datatype (was: support for creating table/view with void column datatype) > support for creating tablewith void column datatype > --- > > Key: SPARK-36241 > URL: https://issues.apache.org/jira/browse/SPARK-36241 > Project: Spark > Issue Type: Task > Components: SQL >Affects Versions: 3.3.0 >Reporter: Linhong Liu >Priority: Major > > previously we blocked creating tablewith void column datatype to follow the > hive behavior in PR: > [https://github.com/apache/spark/pull/28833] > > But according to the discussion here: > [https://github.com/apache/spark/pull/28833#discussion_r613003850] > creating a table/view with void datatype is actually useful, so we need to > restore the previous behvior -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-36241) support for creating table with void column datatype
[ https://issues.apache.org/jira/browse/SPARK-36241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Linhong Liu updated SPARK-36241: Description: previously we blocked creating table with void column datatype to follow the hive behavior in PR: [https://github.com/apache/spark/pull/28833] But according to the discussion here: [https://github.com/apache/spark/pull/28833#discussion_r613003850] creating a table/view with void datatype is actually useful, so we need to restore the previous behvior was: previously we blocked creating tablewith void column datatype to follow the hive behavior in PR: [https://github.com/apache/spark/pull/28833] But according to the discussion here: [https://github.com/apache/spark/pull/28833#discussion_r613003850] creating a table/view with void datatype is actually useful, so we need to restore the previous behvior > support for creating table with void column datatype > > > Key: SPARK-36241 > URL: https://issues.apache.org/jira/browse/SPARK-36241 > Project: Spark > Issue Type: Task > Components: SQL >Affects Versions: 3.3.0 >Reporter: Linhong Liu >Priority: Major > > previously we blocked creating table with void column datatype to follow the > hive behavior in PR: > [https://github.com/apache/spark/pull/28833] > > But according to the discussion here: > [https://github.com/apache/spark/pull/28833#discussion_r613003850] > creating a table/view with void datatype is actually useful, so we need to > restore the previous behvior -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-36241) support for creating table/view with void column datatype
[ https://issues.apache.org/jira/browse/SPARK-36241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Linhong Liu updated SPARK-36241: Description: previously we blocked creating tablewith void column datatype to follow the hive behavior in PR: [https://github.com/apache/spark/pull/28833] But according to the discussion here: [https://github.com/apache/spark/pull/28833#discussion_r613003850] creating a table/view with void datatype is actually useful, so we need to restore the previous behvior was: previously we blocked creating table/view with void column datatype to follow the hive behavior in PR: [https://github.com/apache/spark/pull/28833] [https://github.com/apache/spark/pull/29152] But according to the discussion here: [https://github.com/apache/spark/pull/28833#discussion_r613003850] creating a table/view with void datatype is actually useful, so we need to restore the previous behvior > support for creating table/view with void column datatype > - > > Key: SPARK-36241 > URL: https://issues.apache.org/jira/browse/SPARK-36241 > Project: Spark > Issue Type: Task > Components: SQL >Affects Versions: 3.3.0 >Reporter: Linhong Liu >Priority: Major > > previously we blocked creating tablewith void column datatype to follow the > hive behavior in PR: > [https://github.com/apache/spark/pull/28833] > > But according to the discussion here: > [https://github.com/apache/spark/pull/28833#discussion_r613003850] > creating a table/view with void datatype is actually useful, so we need to > restore the previous behvior -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-36241) support for creating table with void column datatype
[ https://issues.apache.org/jira/browse/SPARK-36241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Linhong Liu updated SPARK-36241: Summary: support for creating table with void column datatype (was: support for creating tablewith void column datatype) > support for creating table with void column datatype > > > Key: SPARK-36241 > URL: https://issues.apache.org/jira/browse/SPARK-36241 > Project: Spark > Issue Type: Task > Components: SQL >Affects Versions: 3.3.0 >Reporter: Linhong Liu >Priority: Major > > previously we blocked creating tablewith void column datatype to follow the > hive behavior in PR: > [https://github.com/apache/spark/pull/28833] > > But according to the discussion here: > [https://github.com/apache/spark/pull/28833#discussion_r613003850] > creating a table/view with void datatype is actually useful, so we need to > restore the previous behvior -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-36241) support for creating table/view with void column datatype
Linhong Liu created SPARK-36241: --- Summary: support for creating table/view with void column datatype Key: SPARK-36241 URL: https://issues.apache.org/jira/browse/SPARK-36241 Project: Spark Issue Type: Task Components: SQL Affects Versions: 3.3.0 Reporter: Linhong Liu previously we blocked creating table/view with void column datatype to follow the hive behavior in PR: [https://github.com/apache/spark/pull/28833] [https://github.com/apache/spark/pull/29152] But according to the discussion here: [https://github.com/apache/spark/pull/28833#discussion_r613003850] creating a table/view with void datatype is actually useful, so we need to restore the previous behvior -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
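A sketch of the statements this would re-enable; the explicit VOID column syntax and the parquet source are assumptions, not confirmed by the issue text:
{code:scala}
// A view over a NULL literal naturally produces a void-typed column:
spark.sql("CREATE VIEW v AS SELECT null AS c")
// Restoring the previous behavior would also allow an explicit VOID column in
// DDL (whether a given data source can actually store it is a separate question):
spark.sql("CREATE TABLE t (c VOID) USING parquet")
{code}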
[jira] [Created] (SPARK-36224) Use "void" as the type name of NullType
Linhong Liu created SPARK-36224: --- Summary: Use "void" as the type name of NullType Key: SPARK-36224 URL: https://issues.apache.org/jira/browse/SPARK-36224 Project: Spark Issue Type: Task Components: SQL Affects Versions: 3.3.0 Reporter: Linhong Liu In PR [https://github.com/apache/spark/pull/28833] we added support for parsing "void" as NullType, but we still use "null" as the type name. This leads to some confusing and inconsistent issues. For example, `org.apache.spark.sql.types.DataType.fromDDL(NullType.toDDL)` does not work. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
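A sketch of the round-trip problem (the exact DDL rendering is version-dependent, and `toDDL` is taken from `StructField` here):
{code:scala}
import org.apache.spark.sql.types._

// Before the change, NullType rendered as "null" in DDL, which the DDL parser
// could not read back; with "void" as the type name the round trip works:
val ddl = StructField("c", NullType).toDDL
DataType.fromDDL(ddl)
{code}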
[jira] [Created] (SPARK-36223) TPCDSQueryTestSuite should run with different config set
Linhong Liu created SPARK-36223: --- Summary: TPCDSQueryTestSuite should run with different config set Key: SPARK-36223 URL: https://issues.apache.org/jira/browse/SPARK-36223 Project: Spark Issue Type: Task Components: SQL Affects Versions: 3.3.0 Reporter: Linhong Liu In the current GitHub Actions workflow we run TPCDSQueryTestSuite for the TPC-DS benchmark, but only under the default configuration. Since we have added the `spark.sql.join.forceApplyShuffledHashJoin` config, we can now test all 3 join strategies in TPC-DS to improve coverage. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-36148) Missing validation of regexp_replace inputs
Linhong Liu created SPARK-36148: --- Summary: Missing validation of regexp_replace inputs Key: SPARK-36148 URL: https://issues.apache.org/jira/browse/SPARK-36148 Project: Spark Issue Type: Task Components: SQL Affects Versions: 3.3.0 Reporter: Linhong Liu sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala overrides checkInputDataTypes, but it doesn't call super.checkInputDataTypes, so basic type checking is disabled. {code:java} scala> spark.sql("""select regexp_replace(collect_list(1), "1", "2")""").collect() 21/07/14 20:58:38 ERROR CodeGenerator: failed to compile: org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 72, Column 1: Assignment conversion not possible from type "org.apache.spark.sql.catalyst.util.ArrayData" to type "org.apache.spark.unsafe.types.UTF8String" org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 72, Column 1: Assignment conversion not possible from type "org.apache.spark.sql.catalyst.util.ArrayData" to type "org.apache.spark.unsafe.types.UTF8String" at org.codehaus.janino.UnitCompiler.compileError(UnitCompiler.java:12021) {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
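A fragment sketching the usual fix pattern inside the expression class (this is not the actual patch; `regexpSpecificCheck()` is a stand-in for the existing custom validation):
{code:scala}
import org.apache.spark.sql.catalyst.analysis.TypeCheckResult

// Run the inherited input-type check first so basic type mismatches are
// reported, then apply the expression-specific validation.
override def checkInputDataTypes(): TypeCheckResult = {
  val defaultCheck = super.checkInputDataTypes()
  if (defaultCheck.isFailure) defaultCheck else regexpSpecificCheck()
}
{code}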
[jira] [Created] (SPARK-35984) Add a config to force using ShuffledHashJoin for test purpose
Linhong Liu created SPARK-35984: --- Summary: Add a config to force using ShuffledHashJoin for test purpose Key: SPARK-35984 URL: https://issues.apache.org/jira/browse/SPARK-35984 Project: Spark Issue Type: Task Components: SQL Affects Versions: 3.2.0 Reporter: Linhong Liu In join.sql, we want to cover all 3 join types, but currently `spark.sql.join.preferSortMergeJoin = false` can't guarantee that all joins will use ShuffledHashJoin, so we need another config to force hash join in testing. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
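A sketch of how the two configs named in this and the related ticket would be combined in a test session:
{code:scala}
// Disabling the sort-merge preference alone doesn't pin the strategy; the
// new test-only config forces ShuffledHashJoin so join.sql can cover it.
spark.conf.set("spark.sql.join.preferSortMergeJoin", "false")
spark.conf.set("spark.sql.join.forceApplyShuffledHashJoin", "true")
{code}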
[jira] [Created] (SPARK-35792) View should not capture configs used in `RelationConversions`
Linhong Liu created SPARK-35792: --- Summary: View should not capture configs used in `RelationConversions` Key: SPARK-35792 URL: https://issues.apache.org/jira/browse/SPARK-35792 Project: Spark Issue Type: Task Components: SQL Affects Versions: 3.2.0 Reporter: Linhong Liu RelationConversions is actually an optimization rule, although it's executed in the analysis phase. Views are designed to capture only semantic configs, so we should ignore the configs related to `RelationConversions`. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-35780) Support DATE/TIMESTAMP literals across the full range
Linhong Liu created SPARK-35780: --- Summary: Support DATE/TIMESTAMP literals across the full range Key: SPARK-35780 URL: https://issues.apache.org/jira/browse/SPARK-35780 Project: Spark Issue Type: Task Components: SQL Affects Versions: 3.2.0 Reporter: Linhong Liu DATE/TIMESTAMP literals support years 0000 to 9999. However, internally we support a much larger range: I can add or subtract large intervals from a date/timestamp, and the system will happily process and display large negative and positive dates. Since we obviously cannot put this genie back into the bottle, the only thing we can do is allow matching DATE/TIMESTAMP literals. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
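A sketch of the asymmetry; the expanded-year literal syntax is an assumption about the post-fix behavior:
{code:scala}
// Arithmetic already escapes the four-digit-year range without complaint:
spark.sql("SELECT DATE'9999-12-31' + INTERVAL 1 DAY").show()  // +10000-01-01
// ...so the matching literal should be accepted too (assumed post-fix syntax):
spark.sql("SELECT DATE'+10000-01-01'").show()
{code}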
[jira] [Created] (SPARK-35686) Avoid using auto generated alias when creating view
Linhong Liu created SPARK-35686: --- Summary: Avoid using auto generated alias when creating view Key: SPARK-35686 URL: https://issues.apache.org/jira/browse/SPARK-35686 Project: Spark Issue Type: Task Components: SQL Affects Versions: 3.2.0 Reporter: Linhong Liu If the user creates a view in 2.4 and reads it in 3.2, there will be an incompatible schema issue. The root cause is that we changed the alias auto-generation rule after 2.4. To avoid this happening again, we should have the user explicitly specify the column names. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
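A sketch of the hazard and the suggested remedy (view names are illustrative):
{code:scala}
// The alias for round(rand(), 2) below is auto-generated, and its exact
// spelling changed across Spark versions, which breaks old view metadata:
spark.sql("CREATE VIEW v_bad AS SELECT round(rand(), 2) FROM range(1)")
// An explicit column list is stable across versions:
spark.sql("CREATE VIEW v_good (r) AS SELECT round(rand(), 2) FROM range(1)")
{code}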
[jira] [Created] (SPARK-35685) Prompt recreating the View when there is a schema incompatible change
Linhong Liu created SPARK-35685: --- Summary: Prompt recreating the View when there is a schema incompatible change Key: SPARK-35685 URL: https://issues.apache.org/jira/browse/SPARK-35685 Project: Spark Issue Type: Task Components: SQL Affects Versions: 3.2.0 Reporter: Linhong Liu Prompt the user to recreate the view when there is an incompatible schema change. Something like: "there is an incompatible schema change and the column couldn't be resolved. Please consider recreating the view to fix this: CREATE OR REPLACE VIEW v AS xxx" -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-35440) Add language type to `ExpressionInfo` for UDF
[ https://issues.apache.org/jira/browse/SPARK-35440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Linhong Liu updated SPARK-35440: Description: add "scala", "java", "python", "hive", "built-in" > Add language type to `ExpressionInfo` for UDF > - > > Key: SPARK-35440 > URL: https://issues.apache.org/jira/browse/SPARK-35440 > Project: Spark > Issue Type: Task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Linhong Liu >Priority: Major > > add "scala", "java", "python", "hive", "built-in" -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-35440) Add language type to `ExpressionInfo` for UDF
Linhong Liu created SPARK-35440: --- Summary: Add language type to `ExpressionInfo` for UDF Key: SPARK-35440 URL: https://issues.apache.org/jira/browse/SPARK-35440 Project: Spark Issue Type: Task Components: SQL Affects Versions: 3.2.0 Reporter: Linhong Liu -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-35366) Avoid using deprecated `buildForBatch` and `buildForStreaming`
Linhong Liu created SPARK-35366: --- Summary: Avoid using deprecated `buildForBatch` and `buildForStreaming` Key: SPARK-35366 URL: https://issues.apache.org/jira/browse/SPARK-35366 Project: Spark Issue Type: Task Components: SQL Affects Versions: 3.2 Reporter: Linhong Liu In DSv2 we are still using these deprecated functions; we need to avoid this. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
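A sketch of the replacement API, assuming the DSv2 `Write` interface shape (`writeBuilder` is an assumed in-scope `org.apache.spark.sql.connector.write.WriteBuilder`):
{code:scala}
// Build a single Write and derive the batch/streaming variants from it
// instead of calling the deprecated buildForBatch()/buildForStreaming():
val write = writeBuilder.build()
val batchWrite = write.toBatch
val streamingWrite = write.toStreaming
{code}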
[jira] [Created] (SPARK-35318) View internal properties should be hidden for describe table command
Linhong Liu created SPARK-35318: --- Summary: View internal properties should be hidden for describe table command Key: SPARK-35318 URL: https://issues.apache.org/jira/browse/SPARK-35318 Project: Spark Issue Type: Task Components: SQL Affects Versions: 3.2.0 Reporter: Linhong Liu When creating a view, Spark saves some internal properties as table properties. But these should not be displayed by the DESCRIBE TABLE command, because they should be transparent to the end user. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
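For context, the command in question (view name is illustrative):
{code:scala}
// The internal view properties saved at creation time currently surface in
// the properties section of this output:
spark.sql("DESCRIBE TABLE EXTENDED v").show(truncate = false)
{code}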
[jira] [Created] (SPARK-34504) avoid unnecessary view resolving and remove the `performCheck` flag
Linhong Liu created SPARK-34504: --- Summary: avoid unnecessary view resolving and remove the `performCheck` flag Key: SPARK-34504 URL: https://issues.apache.org/jira/browse/SPARK-34504 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.1.1 Reporter: Linhong Liu In SPARK-34490, I added a `performCheck` flag to skip the analysis check when resolving views, because some of the view resolution is unnecessary. We can avoid that unnecessary view resolution and remove the `performCheck` flag. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-34490) table maybe resolved as a view if the table is dropped
Linhong Liu created SPARK-34490: --- Summary: table maybe resolved as a view if the table is dropped Key: SPARK-34490 URL: https://issues.apache.org/jira/browse/SPARK-34490 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.1.2 Reporter: Linhong Liu see discussion in https://github.com/apache/spark/pull/31550#issuecomment-781977326 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-34260) UnresolvedException when creating temp view twice
Linhong Liu created SPARK-34260: --- Summary: UnresolvedException when creating temp view twice Key: SPARK-34260 URL: https://issues.apache.org/jira/browse/SPARK-34260 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.0.2, 3.1.2 Reporter: Linhong Liu when creating temp view twice, there is an UnresolvedException, queries to reproduce: {code:java} sql("create or replace temp view v as select * from (select * from range(10))") sql("create or replace temp view v as select * from (select * from range(10))") {code} error message: {noformat} org.apache.spark.sql.catalyst.analysis.UnresolvedException: Invalid call to toAttribute on unresolved object, tree: * at org.apache.spark.sql.catalyst.analysis.Star.toAttribute(unresolved.scala:295) at org.apache.spark.sql.catalyst.plans.logical.Project.$anonfun$output$1(basicLogicalOperators.scala:62) at scala.collection.immutable.List.map(List.scala:293) at org.apache.spark.sql.catalyst.plans.logical.Project.output(basicLogicalOperators.scala:62) at org.apache.spark.sql.catalyst.plans.logical.SubqueryAlias.output(basicLogicalOperators.scala:945) at org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$allAttributes$1(QueryPlan.scala:431) at scala.collection.immutable.List.flatMap(List.scala:366) at org.apache.spark.sql.catalyst.plans.QueryPlan.allAttributes$lzycompute(QueryPlan.scala:431) at org.apache.spark.sql.catalyst.plans.QueryPlan.allAttributes(QueryPlan.scala:431) at org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$doCanonicalize$2(QueryPlan.scala:404) at org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$mapExpressions$1(QueryPlan.scala:116) at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:73) at org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpression$1(QueryPlan.scala:116) at org.apache.spark.sql.catalyst.plans.QueryPlan.recursiveTransform$1(QueryPlan.scala:127) at org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$mapExpressions$3(QueryPlan.scala:132) at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286) at scala.collection.immutable.List.foreach(List.scala:431) at scala.collection.TraversableLike.map(TraversableLike.scala:286) at scala.collection.TraversableLike.map$(TraversableLike.scala:279) at scala.collection.immutable.List.map(List.scala:305) at org.apache.spark.sql.catalyst.plans.QueryPlan.recursiveTransform$1(QueryPlan.scala:132) at org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$mapExpressions$4(QueryPlan.scala:137) at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:243) at org.apache.spark.sql.catalyst.plans.QueryPlan.mapExpressions(QueryPlan.scala:137) at org.apache.spark.sql.catalyst.plans.QueryPlan.doCanonicalize(QueryPlan.scala:389) at org.apache.spark.sql.catalyst.plans.QueryPlan.canonicalized$lzycompute(QueryPlan.scala:373) at org.apache.spark.sql.catalyst.plans.QueryPlan.canonicalized(QueryPlan.scala:372) at org.apache.spark.sql.catalyst.plans.QueryPlan.sameResult(QueryPlan.scala:420) at org.apache.spark.sql.execution.command.CreateViewCommand.run(views.scala:118) at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70) at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68) at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:79) at org.apache.spark.sql.Dataset.$anonfun$logicalPlan$1(Dataset.scala:228) at 
org.apache.spark.sql.Dataset.$anonfun$withAction$1(Dataset.scala:3699) at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:103) at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:163) at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:90) at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:772) at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64) at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3697) at org.apache.spark.sql.Dataset.(Dataset.scala:228) at org.apache.spark.sql.Dataset$.$anonfun$ofRows$2(Dataset.scala:99) at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:772) at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:96) at org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:615) at
[jira] [Created] (SPARK-34199) Block `count(table.*)` to follow ANSI standard and other SQL engines
Linhong Liu created SPARK-34199: --- Summary: Block `count(table.*)` to follow ANSI standard and other SQL engines Key: SPARK-34199 URL: https://issues.apache.org/jira/browse/SPARK-34199 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.2.0 Reporter: Linhong Liu In Spark, count(table.*) may produce a very weird result. For example: "select count(*) from (select 1 as a, null as b) t" outputs 1, while "select count(t.*) from (select 1 as a, null as b) t" outputs 0. After checking the ANSI standard: count(*) is always treated as count(1), while count(t.*) is not allowed. What's more, count(t.*) is also not allowed by common databases, e.g. MySQL and Oracle. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-33692) Permanent view shouldn't use current catalog and namespace to lookup function
[ https://issues.apache.org/jira/browse/SPARK-33692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Linhong Liu updated SPARK-33692: Summary: Permanent view shouldn't use current catalog and namespace to lookup function (was: Permanent view shouldn't lookup temp functions) > Permanent view shouldn't use current catalog and namespace to lookup function > - > > Key: SPARK-33692 > URL: https://issues.apache.org/jira/browse/SPARK-33692 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.1 >Reporter: Linhong Liu >Priority: Major > > Reproduce steps: > spark.sql("CREATE FUNCTION udf_plus AS 'udf.UdfPlus10' USING JAR > '/home/linhong.liu/spark-udf_2.12-0.1.0-SNAPSHOT.jar'") > spark.sql("create view v1 as select udf_plus(1)") > spark.sql("select * from v1").show() // output 11 > spark.sql("CREATE TEMPORARY FUNCTION udf_plus AS 'udf.UdfPlus20' USING JAR > '/home/linhong.liu/spark-udf_2.12-0.1.0-SNAPSHOT.jar'") > spark.sql("select * from v1").show() // throw exception > org.apache.spark.sql.AnalysisException: Attribute with name > 'default.udf_plus(1)' is not found in '(udf_plus(1))';; > Project [default.udf_plus(1)#60] > +- SubqueryAlias spark_catalog.default.v1 >+- View (`default`.`v1`, [default.udf_plus(1)#60]) > +- Project [HiveSimpleUDF#udf.UdfPlus20(1) AS udf_plus(1)#61] > +- OneRowRelation -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-33692) Permanent view shouldn't lookup temp functions
Linhong Liu created SPARK-33692: --- Summary: Permanent view shouldn't lookup temp functions Key: SPARK-33692 URL: https://issues.apache.org/jira/browse/SPARK-33692 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.0.1 Reporter: Linhong Liu Reproduce steps: spark.sql("CREATE FUNCTION udf_plus AS 'udf.UdfPlus10' USING JAR '/home/linhong.liu/spark-udf_2.12-0.1.0-SNAPSHOT.jar'") spark.sql("create view v1 as select udf_plus(1)") spark.sql("select * from v1").show() // output 11 spark.sql("CREATE TEMPORARY FUNCTION udf_plus AS 'udf.UdfPlus20' USING JAR '/home/linhong.liu/spark-udf_2.12-0.1.0-SNAPSHOT.jar'") spark.sql("select * from v1").show() // throw exception org.apache.spark.sql.AnalysisException: Attribute with name 'default.udf_plus(1)' is not found in '(udf_plus(1))';; Project [default.udf_plus(1)#60] +- SubqueryAlias spark_catalog.default.v1 +- View (`default`.`v1`, [default.udf_plus(1)#60]) +- Project [HiveSimpleUDF#udf.UdfPlus20(1) AS udf_plus(1)#61] +- OneRowRelation -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-33647) cache table not working for persisted view
Linhong Liu created SPARK-33647: --- Summary: cache table not working for persisted view Key: SPARK-33647 URL: https://issues.apache.org/jira/browse/SPARK-33647 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.0.1 Reporter: Linhong Liu In `CacheManager`, tables (including views) are cached by their logical plan, and `QueryPlan.sameResult` is used to look up the cache. But a persisted view wraps the child plan with a `View` node, which always makes the `sameResult` check return false. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
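A hypothetical repro sketch of the described behavior (names are illustrative):
{code:scala}
// On affected versions the SELECT misses the cache because the View-wrapped
// plan produced when reading the persisted view never passes sameResult:
spark.sql("CREATE VIEW pv AS SELECT id FROM range(10)")
spark.sql("CACHE TABLE pv")
// One would expect an InMemoryRelation in this plan; on affected versions it is absent:
spark.sql("SELECT * FROM pv").queryExecution.withCachedData
{code}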
[jira] [Created] (SPARK-33438) set -v couldn't dump all the conf entries
Linhong Liu created SPARK-33438: --- Summary: set -v couldn't dump all the conf entries Key: SPARK-33438 URL: https://issues.apache.org/jira/browse/SPARK-33438 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.0.1 Reporter: Linhong Liu Since Scala objects are lazily initialized, an object isn't loaded until some code touches it. A SQL conf entry won't be registered if the object defining it is never touched, so "SET -v" can't dump all the defined configs (even though it claims to). -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
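A minimal sketch of the mechanism (the config key and object are illustrative):
{code:scala}
import org.apache.spark.sql.internal.SQLConf

// The entry registers itself when the enclosing object initializes, and a
// Scala object body only runs on first reference, so this entry stays
// invisible to "SET -v" until some code touches MyConfs.
object MyConfs {
  val MY_FLAG = SQLConf.buildConf("spark.sql.hypothetical.flag")
    .booleanConf
    .createWithDefault(false)
}
{code}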
[jira] [Created] (SPARK-32898) totalExecutorRunTimeMs is too big
Linhong Liu created SPARK-32898: --- Summary: totalExecutorRunTimeMs is too big Key: SPARK-32898 URL: https://issues.apache.org/jira/browse/SPARK-32898 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 3.0.1 Reporter: Linhong Liu This might be caused by incorrect calculation of executorRunTimeMs in Executor.scala. The function collectAccumulatorsAndResetStatusOnFailure(taskStartTimeNs) can be called when taskStartTimeNs is not set yet (it is 0). As of now in the master branch, here is the problematic code: [https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/executor/Executor.scala#L470] An exception can be thrown before this line, and the catch branch still updates the metric. However, the query shows as SUCCESSFUL in QPL. Maybe the task is speculative; not sure. submissionTime in LiveExecutionData may have a similar problem: [https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/ui/SQLAppStatusListener.scala#L449] -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
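A simplified model of the suspected bug, not the actual Executor code:
{code:scala}
// taskStartTimeNs defaults to 0, so if the failure path runs before it is
// assigned, the subtraction measures against the JVM's arbitrary nanoTime
// origin and yields an enormous "run time".
var taskStartTimeNs = 0L
// ...suppose the task throws before: taskStartTimeNs = System.nanoTime()
val executorRunTimeMs = (System.nanoTime() - taskStartTimeNs) / 1000000L
{code}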
[jira] [Created] (SPARK-32816) Planner error when aggregating multiple distinct DECIMAL columns
Linhong Liu created SPARK-32816: --- Summary: Planner error when aggregating multiple distinct DECIMAL columns Key: SPARK-32816 URL: https://issues.apache.org/jira/browse/SPARK-32816 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.0.0 Reporter: Linhong Liu Running different DISTINCT decimal aggregations causes a query planner error: {code:java} java.lang.RuntimeException: You hit a query analyzer bug. Please report your query to Spark user mailing list. at scala.sys.package$.error(package.scala:30) at org.apache.spark.sql.execution.SparkStrategies$Aggregation$.apply(SparkStrategies.scala:473) at org.apache.spark.sql.catalyst.planning.QueryPlanner.$anonfun$plan$1(QueryPlanner.scala:67) at scala.collection.Iterator$$anon$11.nextCur(Iterator.scala:484) at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:490) at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:489) at org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:97) at org.apache.spark.sql.execution.SparkStrategies.plan(SparkStrategies.scala:74) at org.apache.spark.sql.catalyst.planning.QueryPlanner.$anonfun$plan$3(QueryPlanner.scala:82) at scala.collection.TraversableOnce.$anonfun$foldLeft$1(TraversableOnce.scala:162) at scala.collection.TraversableOnce.$anonfun$foldLeft$1$adapted(TraversableOnce.scala:162) at scala.collection.Iterator.foreach(Iterator.scala:941) at scala.collection.Iterator.foreach$(Iterator.scala:941) {code} example failing query {code:java} import org.apache.spark.util.Utils // Changing decimal(9, 0) to decimal(8, 0) fixes the problem. Root cause seems to have to do with // UnscaledValue being used in one of the expressions but not the other. val df = spark.range(0, 5, 1, 1).selectExpr( "id", "cast(id as decimal(9, 0)) as ss_ext_list_price") val cacheDir = Utils.createTempDir().getCanonicalPath df.write.parquet(cacheDir) spark.read.parquet(cacheDir).createOrReplaceTempView("test_table") spark.sql(""" select avg(distinct ss_ext_list_price), sum(distinct ss_ext_list_price) from test_table""").explain {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-32761) Planner error when aggregating multiple distinct Constant columns
[ https://issues.apache.org/jira/browse/SPARK-32761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Linhong Liu updated SPARK-32761: Description: SELECT COUNT(DISTINCT 2), COUNT(DISTINCT 2, 3) will trigger this bug. The problematic code is: {code:java} val distinctAggGroups = aggExpressions.filter(_.isDistinct).groupBy { e => val unfoldableChildren = e.aggregateFunction.children.filter(!_.foldable).toSet if (unfoldableChildren.nonEmpty) { // Only expand the unfoldable children unfoldableChildren } else { // If aggregateFunction's children are all foldable // we must expand at least one of the children (here we take the first child), // or If we don't, we will get the wrong result, for example: // count(distinct 1) will be explained to count(1) after the rewrite function. // Generally, the distinct aggregateFunction should not run // foldable TypeCheck for the first child. e.aggregateFunction.children.take(1).toSet } } {code} was: SELECT COUNT(DISTINCT 2), COUNT(DISTINCT 3) will trigger this bug. The problematic code is: {code:java} val distinctAggGroups = aggExpressions.filter(_.isDistinct).groupBy { e => val unfoldableChildren = e.aggregateFunction.children.filter(!_.foldable).toSet if (unfoldableChildren.nonEmpty) { // Only expand the unfoldable children unfoldableChildren } else { // If aggregateFunction's children are all foldable // we must expand at least one of the children (here we take the first child), // or If we don't, we will get the wrong result, for example: // count(distinct 1) will be explained to count(1) after the rewrite function. // Generally, the distinct aggregateFunction should not run // foldable TypeCheck for the first child. e.aggregateFunction.children.take(1).toSet } } {code} > Planner error when aggregating multiple distinct Constant columns > - > > Key: SPARK-32761 > URL: https://issues.apache.org/jira/browse/SPARK-32761 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 >Reporter: Linhong Liu >Priority: Major > > SELECT COUNT(DISTINCT 2), COUNT(DISTINCT 2, 3) will trigger this bug. > The problematic code is: > > {code:java} > val distinctAggGroups = aggExpressions.filter(_.isDistinct).groupBy { e => > val unfoldableChildren = > e.aggregateFunction.children.filter(!_.foldable).toSet > if (unfoldableChildren.nonEmpty) { > // Only expand the unfoldable children > unfoldableChildren > } else { > // If aggregateFunction's children are all foldable > // we must expand at least one of the children (here we take the first > child), > // or If we don't, we will get the wrong result, for example: > // count(distinct 1) will be explained to count(1) after the rewrite > function. > // Generally, the distinct aggregateFunction should not run > // foldable TypeCheck for the first child. > e.aggregateFunction.children.take(1).toSet > } > } > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-32761) Planner error when aggregating multiple distinct Constant columns
Linhong Liu created SPARK-32761: --- Summary: Planner error when aggregating multiple distinct Constant columns Key: SPARK-32761 URL: https://issues.apache.org/jira/browse/SPARK-32761 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.0.0 Reporter: Linhong Liu SELECT COUNT(DISTINCT 2), COUNT(DISTINCT 3) will trigger this bug. The problematic code is: {code:java} val distinctAggGroups = aggExpressions.filter(_.isDistinct).groupBy { e => val unfoldableChildren = e.aggregateFunction.children.filter(!_.foldable).toSet if (unfoldableChildren.nonEmpty) { // Only expand the unfoldable children unfoldableChildren } else { // If aggregateFunction's children are all foldable // we must expand at least one of the children (here we take the first child), // or If we don't, we will get the wrong result, for example: // count(distinct 1) will be explained to count(1) after the rewrite function. // Generally, the distinct aggregateFunction should not run // foldable TypeCheck for the first child. e.aggregateFunction.children.take(1).toSet } } {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-32680) CTAS with V2 catalog wrongly accessed unresolved query
Linhong Liu created SPARK-32680: --- Summary: CTAS with V2 catalog wrongly accessed unresolved query Key: SPARK-32680 URL: https://issues.apache.org/jira/browse/SPARK-32680 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.0.0 Reporter: Linhong Liu Case: {{CREATE TABLE t USING delta AS SELECT * from nonexist}} Expected: throws AnalysisException with "Table or view not found" Actual: throws {{UnresolvedException}} with "org.apache.spark.sql.catalyst.analysis.UnresolvedException: Invalid call to toAttribute on unresolved object, tree: *" -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org