[jira] [Resolved] (SPARK-47946) Nested field's nullable value could be invalid after extracted using GetStructField

2024-05-15 Thread Linhong Liu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Linhong Liu resolved SPARK-47946.
-
Resolution: Not A Problem

> Nested field's nullable value could be invalid after extracted using 
> GetStructField
> ---
>
> Key: SPARK-47946
> URL: https://issues.apache.org/jira/browse/SPARK-47946
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, SQL
>Affects Versions: 3.4.2
>Reporter: Junyoung Cho
>Priority: Major
>
> I got an error when appending to a table using DataFrameWriterV2.
> The error occurred in TableOutputResolver.checkNullability. It occurs when 
> the data types of the schemas are the same, but the order of the fields is 
> different.
> I found that GetStructField.nullable returns an unexpected result.
> {code:java}
> override def nullable: Boolean = child.nullable || 
> childSchema(ordinal).nullable {code}
> Even if the nested field does not have the nullability attribute, it returns 
> true when the parent struct has it.
> ||Parent nullability||Child nullability||Result||
> |true|true|true|
> |true|false|true (unexpected)|
> |false|true|true|
> |false|false|false|
> I think the logic should be changed to return just the child's nullability, 
> because the extracted field should only be considered nullable when the 
> child field itself is nullable.
> {code:java}
> override def nullable: Boolean = childSchema(ordinal).nullable {code}
> I want to check whether the current logic is reasonable, or whether my 
> suggestion could cause other side effects.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-47946) Nested field's nullable value could be invalid after extracted using GetStructField

2024-05-15 Thread Linhong Liu (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-47946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17846773#comment-17846773
 ] 

Linhong Liu commented on SPARK-47946:
-

No, it's not an issue.

Think about this:

||key||value (nullable=true)||
|a|{"x": 1, "y": 2}|
|b|null|
|c|{"x": null, "y": 3}|

Let's assume `value.y` cannot be null (i.e. nullable = false) and run `select 
value.y from tbl`. What's the result, and what's the nullability of this 
column? It should be

||y||
|2|
|null|
|3|

The null for key b comes from the parent struct being null, so the extracted 
column must be reported as nullable whenever the parent is nullable, even 
though `value.y` itself is not. That is exactly why GetStructField.nullable is 
`child.nullable || childSchema(ordinal).nullable`.
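A quick way to observe this in spark-shell (a minimal sketch; the schema and 
data are made up to mirror the table above):

{code:java}
import org.apache.spark.sql.Row
import org.apache.spark.sql.types._

// value is nullable, but value.y is declared non-nullable
val schema = StructType(Seq(
  StructField("key", StringType, nullable = false),
  StructField("value", StructType(Seq(
    StructField("x", IntegerType, nullable = true),
    StructField("y", IntegerType, nullable = false))), nullable = true)))

val df = spark.createDataFrame(
  spark.sparkContext.parallelize(Seq(
    Row("a", Row(1, 2)), Row("b", null), Row("c", Row(null, 3)))),
  schema)

df.select("value.y").printSchema() // y: integer (nullable = true) -- parent wins
df.select("value.y").show()        // 2, null, 3 -- the null comes from row b
{code}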

> Nested field's nullable value could be invalid after extracted using 
> GetStructField
> ---
>
> Key: SPARK-47946
> URL: https://issues.apache.org/jira/browse/SPARK-47946
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, SQL
>Affects Versions: 3.4.2
>Reporter: Junyoung Cho
>Priority: Major
>
> I got an error when appending to a table using DataFrameWriterV2.
> The error occurred in TableOutputResolver.checkNullability. It occurs when 
> the data types of the schemas are the same, but the order of the fields is 
> different.
> I found that GetStructField.nullable returns an unexpected result.
> {code:java}
> override def nullable: Boolean = child.nullable || 
> childSchema(ordinal).nullable {code}
> Even if the nested field does not have the nullability attribute, it returns 
> true when the parent struct has it.
> ||Parent nullability||Child nullability||Result||
> |true|true|true|
> |true|false|true (unexpected)|
> |false|true|true|
> |false|false|false|
> I think the logic should be changed to return just the child's nullability, 
> because the extracted field should only be considered nullable when the 
> child field itself is nullable.
> {code:java}
> override def nullable: Boolean = childSchema(ordinal).nullable {code}
> I want to check whether the current logic is reasonable, or whether my 
> suggestion could cause other side effects.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-44577) INSERT BY NAME returns non-sensical error message

2023-07-28 Thread Linhong Liu (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17748745#comment-17748745
 ] 

Linhong Liu commented on SPARK-44577:
-

[~fanjia] could you make a follow-up PR to fix this?

The error is at: 
[https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TableOutputResolver.scala#L243]

We should do something like:

```
val pathInfo = if (colPath.isEmpty) {
  "table"
} else {
  s"struct ${colPath.quoted}"
}

throw QueryCompilationErrors.incompatibleDataToTableExtraStructFieldsError(
  tableName,
  pathInfo,   // the change
  extraCols
)
```

> INSERT BY NAME returns non-sensical error message
> -
>
> Key: SPARK-44577
> URL: https://issues.apache.org/jira/browse/SPARK-44577
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.4.0
>Reporter: Serge Rielau
>Priority: Major
>
> CREATE TABLE bug(c1 INT);
> INSERT INTO bug BY NAME SELECT 1 AS c2;
> ==> Multi-part identifier cannot be empty.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-44577) INSERT BY NAME returns non-sensical error message

2023-07-28 Thread Linhong Liu (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17748746#comment-17748746
 ] 

Linhong Liu commented on SPARK-44577:
-

cc [~cloud_fan] 

> INSERT BY NAME returns non-sensical error message
> -
>
> Key: SPARK-44577
> URL: https://issues.apache.org/jira/browse/SPARK-44577
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.4.0
>Reporter: Serge Rielau
>Priority: Major
>
> CREATE TABLE bug(c1 INT);
> INSERT INTO bug BY NAME SELECT 1 AS c2;
> ==> Multi-part identifier cannot be empty.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-41752) UI improvement for nested SQL executions

2022-12-28 Thread Linhong Liu (Jira)
Linhong Liu created SPARK-41752:
---

 Summary: UI improvement for nested SQL executions
 Key: SPARK-41752
 URL: https://issues.apache.org/jira/browse/SPARK-41752
 Project: Spark
  Issue Type: Task
  Components: SQL, Web UI
Affects Versions: 3.4.0
Reporter: Linhong Liu


In SPARK-41713, CTAS triggers a sub-execution to perform the data 
insertion. But the UI displays them as two independent queries, which will 
confuse users.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-40292) arrays_zip output unexpected alias column names

2022-08-31 Thread Linhong Liu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Linhong Liu updated SPARK-40292:

Description: 
For the below query:
{code:sql}
with q as (
  select
    named_struct(
      'my_array', array(named_struct('x', 1, 'y', 2))
    ) as my_struct
)
select
  arrays_zip(my_struct.my_array)
from
  q {code}
The latest Spark gives the below schema; the field name "my_array" was changed 
to "0".
{code:java}
root
 |-- arrays_zip(my_struct.my_array): array (nullable = true)
 |    |-- element: struct (containsNull = false)
 |    |    |-- 0: struct (nullable = true)
 |    |    |    |-- x: integer (nullable = true)
 |    |    |    |-- y: integer (nullable = true){code}
While Spark 3.1 gives the expected result:
{code:java}
root
 |-- arrays_zip(my_struct.my_array): array (nullable = true)
 |    |-- element: struct (containsNull = false)
 |    |    |-- my_array: struct (nullable = true)
 |    |    |    |-- x: integer (nullable = true)
 |    |    |    |-- y: integer (nullable = true)
{code}

  was:
For the below query:

 
{code:sql}
with q as (
  select
    named_struct(
      'my_array', array(named_struct('x', 1, 'y', 2))
    ) as my_struct
)
select
  arrays_zip(my_struct.my_array)
from
  q {code}
The latest Spark gives the below schema; the field name "my_array" was changed 
to "0".
{code:java}
root
 |-- arrays_zip(my_struct.my_array): array (nullable = true)
 |    |-- element: struct (containsNull = false)
 |    |    |-- 0: struct (nullable = true)
 |    |    |    |-- x: integer (nullable = true)
 |    |    |    |-- y: integer (nullable = true){code}
While Spark 3.1 gives the expected result:
{code:java}
root
 |-- arrays_zip(my_struct.my_array): array (nullable = true)
 |    |-- element: struct (containsNull = false)
 |    |    |-- my_array: struct (nullable = true)
 |    |    |    |-- x: integer (nullable = true)
 |    |    |    |-- y: integer (nullable = true)
{code}


> arrays_zip output unexpected alias column names
> ---
>
> Key: SPARK-40292
> URL: https://issues.apache.org/jira/browse/SPARK-40292
> Project: Spark
>  Issue Type: Task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Linhong Liu
>Priority: Major
>
> For the below query:
> {code:sql}
> with q as (
>   select
>     named_struct(
>       'my_array', array(named_struct('x', 1, 'y', 2))
>     ) as my_struct
> )
> select
>   arrays_zip(my_struct.my_array)
> from
>   q {code}
> The latest Spark gives the below schema; the field name "my_array" was 
> changed to "0".
> {code:java}
> root
>  |-- arrays_zip(my_struct.my_array): array (nullable = true)
>  |    |-- element: struct (containsNull = false)
>  |    |    |-- 0: struct (nullable = true)
>  |    |    |    |-- x: integer (nullable = true)
>  |    |    |    |-- y: integer (nullable = true){code}
> While Spark 3.1 gives the expected result:
> {code:java}
> root
>  |-- arrays_zip(my_struct.my_array): array (nullable = true)
>  |    |-- element: struct (containsNull = false)
>  |    |    |-- my_array: struct (nullable = true)
>  |    |    |    |-- x: integer (nullable = true)
>  |    |    |    |-- y: integer (nullable = true)
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-40292) arrays_zip output unexpected alias column names

2022-08-31 Thread Linhong Liu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Linhong Liu updated SPARK-40292:

Description: 
For the below query:

 
{code:sql}
with q as (
  select
    named_struct(
      'my_array', array(named_struct('x', 1, 'y', 2))
    ) as my_struct
)
select
  arrays_zip(my_struct.my_array)
from
  q {code}
The latest Spark gives the below schema; the field name "my_array" was changed 
to "0".
{code:java}
root
 |-- arrays_zip(my_struct.my_array): array (nullable = true)
 |    |-- element: struct (containsNull = false)
 |    |    |-- 0: struct (nullable = true)
 |    |    |    |-- x: integer (nullable = true)
 |    |    |    |-- y: integer (nullable = true){code}
While Spark 3.1 gives the expected result:
{code:java}
root
 |-- arrays_zip(my_struct.my_array): array (nullable = true)
 |    |-- element: struct (containsNull = false)
 |    |    |-- my_array: struct (nullable = true)
 |    |    |    |-- x: integer (nullable = true)
 |    |    |    |-- y: integer (nullable = true)
{code}

  was:
For the below query:

 
{code:java}
with q as (
  select
    named_struct(
      'my_array', array(named_struct('x', 1, 'y', 2))
    ) as my_struct
)
select
  arrays_zip(my_struct.my_array)
from
  q {code}
The latest Spark gives the below schema; the field name "my_array" was changed 
to "0"

root
 |-- arrays_zip(my_struct.my_array): array (nullable = true)
 |    |-- element: struct (containsNull = false)
 |    |    |-- 0: struct (nullable = true)
 |    |    |    |-- x: integer (nullable = true)
 |    |    |    |-- y: integer (nullable = true)

But Spark 3.1 gives the expected result:
root
 |-- arrays_zip(my_struct.my_array): array (nullable = true)
 |    |-- element: struct (containsNull = false)
 |    |    |-- my_array: struct (nullable = true)
 |    |    |    |-- x: integer (nullable = true)
 |    |    |    |-- y: integer (nullable = true)


> arrays_zip output unexpected alias column names
> ---
>
> Key: SPARK-40292
> URL: https://issues.apache.org/jira/browse/SPARK-40292
> Project: Spark
>  Issue Type: Task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Linhong Liu
>Priority: Major
>
> For the below query:
>  
> {code:sql}
> with q as (
>   select
>     named_struct(
>       'my_array', array(named_struct('x', 1, 'y', 2))
>     ) as my_struct
> )
> select
>   arrays_zip(my_struct.my_array)
> from
>   q {code}
> The latest Spark gives the below schema; the field name "my_array" was 
> changed to "0".
> {code:java}
> root
>  |-- arrays_zip(my_struct.my_array): array (nullable = true)
>  |    |-- element: struct (containsNull = false)
>  |    |    |-- 0: struct (nullable = true)
>  |    |    |    |-- x: integer (nullable = true)
>  |    |    |    |-- y: integer (nullable = true){code}
> While Spark 3.1 gives the expected result:
> {code:java}
> root
>  |-- arrays_zip(my_struct.my_array): array (nullable = true)
>  |    |-- element: struct (containsNull = false)
>  |    |    |-- my_array: struct (nullable = true)
>  |    |    |    |-- x: integer (nullable = true)
>  |    |    |    |-- y: integer (nullable = true)
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-40292) arrays_zip output unexpected alias column names

2022-08-31 Thread Linhong Liu (Jira)
Linhong Liu created SPARK-40292:
---

 Summary: arrays_zip output unexpected alias column names
 Key: SPARK-40292
 URL: https://issues.apache.org/jira/browse/SPARK-40292
 Project: Spark
  Issue Type: Task
  Components: SQL
Affects Versions: 3.4.0
Reporter: Linhong Liu


For the below query:

 
{code:java}
with q as (
  select
    named_struct(
      'my_array', array(named_struct('x', 1, 'y', 2))
    ) as my_struct
)
select
  arrays_zip(my_struct.my_array)
from
  q {code}
The latest Spark gives the below schema; the field name "my_array" was changed 
to "0"

root
 |-- arrays_zip(my_struct.my_array): array (nullable = true)
 |    |-- element: struct (containsNull = false)
 |    |    |-- 0: struct (nullable = true)
 |    |    |    |-- x: integer (nullable = true)
 |    |    |    |-- y: integer (nullable = true)

But Spark 3.1 gives the expected result:
root
 |-- arrays_zip(my_struct.my_array): array (nullable = true)
 |    |-- element: struct (containsNull = false)
 |    |    |-- my_array: struct (nullable = true)
 |    |    |    |-- x: integer (nullable = true)
 |    |    |    |-- y: integer (nullable = true)



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-40291) Improve the message for column not in group by clause error

2022-08-31 Thread Linhong Liu (Jira)
Linhong Liu created SPARK-40291:
---

 Summary: Improve the message for column not in group by clause 
error
 Key: SPARK-40291
 URL: https://issues.apache.org/jira/browse/SPARK-40291
 Project: Spark
  Issue Type: Task
  Components: SQL
Affects Versions: 3.4.0
Reporter: Linhong Liu


Improve the message for the "column not in group by clause" error to use the 
new error framework.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-40213) Incorrect ASCII value for Latin-1 Supplement characters

2022-08-24 Thread Linhong Liu (Jira)
Linhong Liu created SPARK-40213:
---

 Summary: Incorrect ASCII value for Latin-1 Supplement characters
 Key: SPARK-40213
 URL: https://issues.apache.org/jira/browse/SPARK-40213
 Project: Spark
  Issue Type: Task
  Components: SQL
Affects Versions: 3.2.2
Reporter: Linhong Liu


The `ascii()` built-in function in Spark doesn't support Latin-1 Supplement 
characters, whose values are in [128, 256). Instead, it produces a wrong value, 
-62 or -61, for all such chars. But the `chr()` built-in function supports values 
in [0, 256), and normally `ascii()` should be the inverse of `chr()`.
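For example (a spark-shell sketch; the outputs assume the buggy behavior 
described above):

{code:java}
// 'é' is U+00E9; its UTF-8 encoding starts with byte 0xC3, which is -61 as a
// signed byte -- matching the wrong value reported above.
spark.sql("SELECT ascii('é')").show() // returns -61, expected 233
spark.sql("SELECT chr(233)").show()   // returns 'é', so ascii('é') should be 233
{code}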



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-39207) Record SQL text when executing with SparkSession.sql()

2022-05-17 Thread Linhong Liu (Jira)
Linhong Liu created SPARK-39207:
---

 Summary: Record SQL text when executing with SparkSession.sql()
 Key: SPARK-39207
 URL: https://issues.apache.org/jira/browse/SPARK-39207
 Project: Spark
  Issue Type: Task
  Components: SQL
Affects Versions: 3.4.0
Reporter: Linhong Liu






--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-38550) Use a disk-based store to save more information in live UI to help debug

2022-03-14 Thread Linhong Liu (Jira)
Linhong Liu created SPARK-38550:
---

 Summary: Use a disk-based store to save more information in live 
UI to help debug
 Key: SPARK-38550
 URL: https://issues.apache.org/jira/browse/SPARK-38550
 Project: Spark
  Issue Type: Task
  Components: Spark Core, SQL
Affects Versions: 3.3.0
Reporter: Linhong Liu


In Spark, the UI lacks troubleshooting abilities. For example:

* AQE plan changes are not available
* the plan description of a large plan is truncated

This is because the live UI depends on an in-memory KV store, and we always have 
to worry about stability issues when adding more information to that store. 
Therefore, it's better to add a disk-based store to save more information.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-38318) regression when replacing a dataset view

2022-02-24 Thread Linhong Liu (Jira)
Linhong Liu created SPARK-38318:
---

 Summary: regression when replacing a dataset view
 Key: SPARK-38318
 URL: https://issues.apache.org/jira/browse/SPARK-38318
 Project: Spark
  Issue Type: Task
  Components: SQL
Affects Versions: 3.2.1, 3.2.0, 3.3.0
Reporter: Linhong Liu


The below use case works well in 3.1 but fails in 3.2 and master.
{code:java}
sql("select 1").createOrReplaceTempView("v")
sql("select * from v").createOrReplaceTempView("v")
// in 3.1 it works well, and select will output 1
// in 3.2 it failed with error: "AnalysisException: Recursive view v detected 
(cycle: v -> v)"{code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-37202) Temp view didn't collect temp function that registered with catalog API

2021-11-02 Thread Linhong Liu (Jira)
Linhong Liu created SPARK-37202:
---

 Summary: Temp view didn't collect temp function that registered 
with catalog API
 Key: SPARK-37202
 URL: https://issues.apache.org/jira/browse/SPARK-37202
 Project: Spark
  Issue Type: Task
  Components: SQL
Affects Versions: 3.2.0
Reporter: Linhong Liu






--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-37067) DateTimeUtils.stringToTimestamp() incorrectly rejects timezone without colon

2021-10-19 Thread Linhong Liu (Jira)
Linhong Liu created SPARK-37067:
---

 Summary: DateTimeUtils.stringToTimestamp() incorrectly rejects 
timezone without colon
 Key: SPARK-37067
 URL: https://issues.apache.org/jira/browse/SPARK-37067
 Project: Spark
  Issue Type: Task
  Components: SQL
Affects Versions: 3.2.0, 3.1.0
Reporter: Linhong Liu


A zone id with a format like "+" or "+0730" can be parsed by 
`ZoneId.of()` but will be rejected by Spark's `DateTimeUtils.stringToTimestamp()`. 
It means we will return null for some valid datetime strings, such as 
`2021-10-11T03:58:03.000+0700`.
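A minimal illustration (assuming spark-shell and the behavior described above):

{code:java}
import java.time.ZoneId

ZoneId.of("+0730") // the JDK parses the colon-less offset just fine

// But Spark's string-to-timestamp path rejects it and yields null:
spark.sql("SELECT cast('2021-10-11T03:58:03.000+0700' as timestamp)").show()
{code}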



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-36286) Block some invalid datetime string

2021-07-26 Thread Linhong Liu (Jira)
Linhong Liu created SPARK-36286:
---

 Summary: Block some invalid datetime string
 Key: SPARK-36286
 URL: https://issues.apache.org/jira/browse/SPARK-36286
 Project: Spark
  Issue Type: Task
  Components: SQL
Affects Versions: 3.3.0
Reporter: Linhong Liu


In PR #32959, we found some weird datetime strings that can be parsed 
(details: [https://github.com/apache/spark/pull/32959#discussion_r665015489]).

We should block them as well.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-36241) support for creating tablewith void column datatype

2021-07-22 Thread Linhong Liu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Linhong Liu updated SPARK-36241:

Summary: support for creating tablewith void column datatype  (was: support 
for creating table/view with void column datatype)

> support for creating tablewith void column datatype
> ---
>
> Key: SPARK-36241
> URL: https://issues.apache.org/jira/browse/SPARK-36241
> Project: Spark
>  Issue Type: Task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Linhong Liu
>Priority: Major
>
> previously we blocked creating tablewith void column datatype to follow the 
> hive behavior in PR: 
> [https://github.com/apache/spark/pull/28833]
>  
> But according to the discussion here: 
> [https://github.com/apache/spark/pull/28833#discussion_r613003850]
> creating a table/view with void datatype is actually useful, so we need to 
> restore the previous behavior



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-36241) support for creating table with void column datatype

2021-07-22 Thread Linhong Liu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Linhong Liu updated SPARK-36241:

Description: 
previously we blocked creating table with void column datatype to follow the 
hive behavior in PR: 

[https://github.com/apache/spark/pull/28833]

 

But according to the discussion here: 
[https://github.com/apache/spark/pull/28833#discussion_r613003850]

creating a table/view with void datatype is actually useful, so we need to 
restore the previous behavior

  was:
previously we blocked creating tablewith void column datatype to follow the 
hive behavior in PR: 

[https://github.com/apache/spark/pull/28833]

 

But according to the discussion here: 
[https://github.com/apache/spark/pull/28833#discussion_r613003850]

creating a table/view with void datatype is actually useful, so we need to 
restore the previous behavior


> support for creating table with void column datatype
> 
>
> Key: SPARK-36241
> URL: https://issues.apache.org/jira/browse/SPARK-36241
> Project: Spark
>  Issue Type: Task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Linhong Liu
>Priority: Major
>
> previously we blocked creating table with void column datatype to follow the 
> hive behavior in PR: 
> [https://github.com/apache/spark/pull/28833]
>  
> But according to the discussion here: 
> [https://github.com/apache/spark/pull/28833#discussion_r613003850]
> creating a table/view with void datatype is actually useful, so we need to 
> restore the previous behavior



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-36241) support for creating table/view with void column datatype

2021-07-22 Thread Linhong Liu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Linhong Liu updated SPARK-36241:

Description: 
previously we blocked creating tablewith void column datatype to follow the 
hive behavior in PR: 

[https://github.com/apache/spark/pull/28833]

 

But according to the discussion here: 
[https://github.com/apache/spark/pull/28833#discussion_r613003850]

creating a table/view with void datatype is actually useful, so we need to 
restore the previous behavior

  was:
previously we blocked creating table/view with void column datatype to follow 
the hive behavior in PR: 

[https://github.com/apache/spark/pull/28833]

[https://github.com/apache/spark/pull/29152]

 

But according to the discussion here: 
[https://github.com/apache/spark/pull/28833#discussion_r613003850]

creating a table/view with void datatype is actually useful, so we need to 
restore the previous behavior


> support for creating table/view with void column datatype
> -
>
> Key: SPARK-36241
> URL: https://issues.apache.org/jira/browse/SPARK-36241
> Project: Spark
>  Issue Type: Task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Linhong Liu
>Priority: Major
>
> previously we blocked creating tablewith void column datatype to follow the 
> hive behavior in PR: 
> [https://github.com/apache/spark/pull/28833]
>  
> But according to the discussion here: 
> [https://github.com/apache/spark/pull/28833#discussion_r613003850]
> creating a table/view with void datatype is actually useful, so we need to 
> restore the previous behavior



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-36241) support for creating table with void column datatype

2021-07-22 Thread Linhong Liu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Linhong Liu updated SPARK-36241:

Summary: support for creating table with void column datatype  (was: 
support for creating tablewith void column datatype)

> support for creating table with void column datatype
> 
>
> Key: SPARK-36241
> URL: https://issues.apache.org/jira/browse/SPARK-36241
> Project: Spark
>  Issue Type: Task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Linhong Liu
>Priority: Major
>
> previously we blocked creating tablewith void column datatype to follow the 
> hive behavior in PR: 
> [https://github.com/apache/spark/pull/28833]
>  
> But according to the discussion here: 
> [https://github.com/apache/spark/pull/28833#discussion_r613003850]
> creating a table/view with void datatype is actually useful, so we need to 
> restore the previous behavior



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-36241) support for creating table/view with void column datatype

2021-07-21 Thread Linhong Liu (Jira)
Linhong Liu created SPARK-36241:
---

 Summary: support for creating table/view with void column datatype
 Key: SPARK-36241
 URL: https://issues.apache.org/jira/browse/SPARK-36241
 Project: Spark
  Issue Type: Task
  Components: SQL
Affects Versions: 3.3.0
Reporter: Linhong Liu


previously we blocked creating table/view with void column datatype to follow 
the hive behavior in PR: 

[https://github.com/apache/spark/pull/28833]

[https://github.com/apache/spark/pull/29152]

 

But according to the discussion here: 
[https://github.com/apache/spark/pull/28833#discussion_r613003850]

creating a table/view with void datatype is actually useful, so we need to 
restore the previous behavior



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-36224) Use "void" as the type name of NullType

2021-07-20 Thread Linhong Liu (Jira)
Linhong Liu created SPARK-36224:
---

 Summary: Use "void" as the type name of NullType
 Key: SPARK-36224
 URL: https://issues.apache.org/jira/browse/SPARK-36224
 Project: Spark
  Issue Type: Task
  Components: SQL
Affects Versions: 3.3.0
Reporter: Linhong Liu


In PR [https://github.com/apache/spark/pull/28833] we support parsing "void" 
as NullType, but still use "null" as the type name. This leads to some confusing 
and inconsistent issues. For example:

`org.apache.spark.sql.types.DataType.fromDDL(NullType.toDDL)` is not working



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-36223) TPCDSQueryTestSuite should run with different config set

2021-07-20 Thread Linhong Liu (Jira)
Linhong Liu created SPARK-36223:
---

 Summary: TPCDSQueryTestSuite should run with different config set
 Key: SPARK-36223
 URL: https://issues.apache.org/jira/browse/SPARK-36223
 Project: Spark
  Issue Type: Task
  Components: SQL
Affects Versions: 3.3.0
Reporter: Linhong Liu


In the current GitHub Actions we run TPCDSQueryTestSuite for the TPC-DS 
benchmark, but it's only tested under the default configuration. Since we have 
added the `spark.sql.join.forceApplyShuffledHashJoin` config, we can now test 
all 3 join strategies in TPC-DS to improve the coverage.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-36148) Missing validation of regexp_replace inputs

2021-07-14 Thread Linhong Liu (Jira)
Linhong Liu created SPARK-36148:
---

 Summary: Missing validation of regexp_replace inputs
 Key: SPARK-36148
 URL: https://issues.apache.org/jira/browse/SPARK-36148
 Project: Spark
  Issue Type: Task
  Components: SQL
Affects Versions: 3.3.0
Reporter: Linhong Liu


sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala
 overrides checkInputDataTypes, but it doesn't call super.checkInputDataTypes, 
so basic type checking is disabled.

 
{code:java}
scala> spark.sql("""select regexp_replace(collect_list(1), "1", 
"2")""").collect()
21/07/14 20:58:38 ERROR CodeGenerator: failed to compile: 
org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 72, 
Column 1: Assignment conversion not possible from type 
"org.apache.spark.sql.catalyst.util.ArrayData" to type 
"org.apache.spark.unsafe.types.UTF8String"
org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 72, 
Column 1: Assignment conversion not possible from type 
"org.apache.spark.sql.catalyst.util.ArrayData" to type 
"org.apache.spark.unsafe.types.UTF8String"
  at org.codehaus.janino.UnitCompiler.compileError(UnitCompiler.java:12021)
{code}
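A possible shape of the fix (a sketch only, following the common pattern of 
running the default check before the custom one; this is an assumption, not 
the actual patch):

{code:java}
import org.apache.spark.sql.catalyst.analysis.TypeCheckResult

// Sketch of an override inside the regexp expression class:
override def checkInputDataTypes(): TypeCheckResult = {
  val defaultCheck = super.checkInputDataTypes()
  if (defaultCheck.isFailure) {
    defaultCheck // surface basic type mismatches (e.g. array vs. string)
  } else {
    TypeCheckResult.TypeCheckSuccess // ...plus the existing regexp-specific checks
  }
}
{code}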



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-35984) Add a config to force using ShuffledHashJoin for test purpose

2021-07-01 Thread Linhong Liu (Jira)
Linhong Liu created SPARK-35984:
---

 Summary: Add a config to force using ShuffledHashJoin for test 
purpose
 Key: SPARK-35984
 URL: https://issues.apache.org/jira/browse/SPARK-35984
 Project: Spark
  Issue Type: Task
  Components: SQL
Affects Versions: 3.2.0
Reporter: Linhong Liu


In join.sql, we want to cover all 3 join types, but the problem is that 
currently `spark.sql.join.preferSortMergeJoin = false` can't guarantee that 
all the joins will use ShuffledHashJoin, so we need another config to force 
using hash join in testing.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-35792) View should not capture configs used in `RelationConversions`

2021-06-16 Thread Linhong Liu (Jira)
Linhong Liu created SPARK-35792:
---

 Summary: View should not capture configs used in 
`RelationConversions`
 Key: SPARK-35792
 URL: https://issues.apache.org/jira/browse/SPARK-35792
 Project: Spark
  Issue Type: Task
  Components: SQL
Affects Versions: 3.2.0
Reporter: Linhong Liu


RelationConversions is actually an optimization rule, yet it's executed in the 
analysis phase. A view is designed to capture only semantic configs, so we 
should ignore the configs related to `RelationConversions`.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-35780) Support DATE/TIMESTAMP literals across the full range

2021-06-15 Thread Linhong Liu (Jira)
Linhong Liu created SPARK-35780:
---

 Summary: Support DATE/TIMESTAMP literals across the full range
 Key: SPARK-35780
 URL: https://issues.apache.org/jira/browse/SPARK-35780
 Project: Spark
  Issue Type: Task
  Components: SQL
Affects Versions: 3.2.0
Reporter: Linhong Liu


DATE/TIMESTAMP literals currently support only a limited range of years.
However, internally we support a range that is much larger.
I can add or subtract large intervals from a date/timestamp, and the system will 
happily process and display large negative and positive dates.

Since we obviously cannot put this genie back into the bottle, the only thing we 
can do is allow matching DATE/TIMESTAMP literals.
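An illustration of the mismatch (a hypothetical spark-shell session; the 
wide-year literal syntax shown is an assumption):

{code:java}
// Interval arithmetic happily walks past the literal range...
spark.sql("SELECT date'9999-12-31' + interval 1000 years").show()
// ...but spelling out the resulting value as a literal is rejected today:
spark.sql("SELECT date'+10999-12-31'").show()
{code}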



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-35686) Avoid using auto generated alias when creating view

2021-06-08 Thread Linhong Liu (Jira)
Linhong Liu created SPARK-35686:
---

 Summary: Avoid using auto generated alias when creating view
 Key: SPARK-35686
 URL: https://issues.apache.org/jira/browse/SPARK-35686
 Project: Spark
  Issue Type: Task
  Components: SQL
Affects Versions: 3.2.0
Reporter: Linhong Liu


If the user creates a view in 2.4 and reads it in 3.2, there will be an 
incompatible schema issue. The root cause is that we changed the alias 
auto-generation rule after 2.4. To avoid this happening again, we should make 
the user explicitly specify the column names.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-35685) Prompt recreating the View when there is a schema incompatible change

2021-06-08 Thread Linhong Liu (Jira)
Linhong Liu created SPARK-35685:
---

 Summary: Prompt recreating the View when there is a schema 
incompatible change
 Key: SPARK-35685
 URL: https://issues.apache.org/jira/browse/SPARK-35685
 Project: Spark
  Issue Type: Task
  Components: SQL
Affects Versions: 3.2.0
Reporter: Linhong Liu


Prompt the user to recreate the view when there is an incompatible schema 
change. Something like:

"There is an incompatible schema change and the column couldn't be resolved. 
Please consider recreating the view to fix this: CREATE OR REPLACE VIEW v AS 
xxx"



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-35440) Add language type to `ExpressionInfo` for UDF

2021-05-18 Thread Linhong Liu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Linhong Liu updated SPARK-35440:

Description: add "scala", "java", "python", "hive", "built-in"

> Add language type to `ExpressionInfo` for UDF
> -
>
> Key: SPARK-35440
> URL: https://issues.apache.org/jira/browse/SPARK-35440
> Project: Spark
>  Issue Type: Task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Linhong Liu
>Priority: Major
>
> add "scala", "java", "python", "hive", "built-in"



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-35440) Add language type to `ExpressionInfo` for UDF

2021-05-18 Thread Linhong Liu (Jira)
Linhong Liu created SPARK-35440:
---

 Summary: Add language type to `ExpressionInfo` for UDF
 Key: SPARK-35440
 URL: https://issues.apache.org/jira/browse/SPARK-35440
 Project: Spark
  Issue Type: Task
  Components: SQL
Affects Versions: 3.2.0
Reporter: Linhong Liu






--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-35366) Avoid using deprecated `buildForBatch` and `buildForStreaming`

2021-05-10 Thread Linhong Liu (Jira)
Linhong Liu created SPARK-35366:
---

 Summary: Avoid using deprecated `buildForBatch` and 
`buildForStreaming`
 Key: SPARK-35366
 URL: https://issues.apache.org/jira/browse/SPARK-35366
 Project: Spark
  Issue Type: Task
  Components: SQL
Affects Versions: 3.2
Reporter: Linhong Liu


In DSv2 we are still using these deprecated functions; we need to avoid this.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-35318) View internal properties should be hidden for describe table command

2021-05-05 Thread Linhong Liu (Jira)
Linhong Liu created SPARK-35318:
---

 Summary: View internal properties should be hidden for describe 
table command
 Key: SPARK-35318
 URL: https://issues.apache.org/jira/browse/SPARK-35318
 Project: Spark
  Issue Type: Task
  Components: SQL
Affects Versions: 3.2.0
Reporter: Linhong Liu


When creating a view, Spark saves some internal properties as table 
properties. But these should not be displayed by the DESCRIBE TABLE command 
because they should be transparent to the end user.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-34504) avoid unnecessary view resolving and remove the `performCheck` flag

2021-02-22 Thread Linhong Liu (Jira)
Linhong Liu created SPARK-34504:
---

 Summary: avoid unnecessary view resolving and remove the 
`performCheck` flag
 Key: SPARK-34504
 URL: https://issues.apache.org/jira/browse/SPARK-34504
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.1.1
Reporter: Linhong Liu


In SPARK-34490, I added a `performCheck` flag to skip the analysis check when 
resolving views, because some view resolutions are unnecessary. We can avoid 
these unnecessary view resolutions and remove the `performCheck` flag.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-34490) table maybe resolved as a view if the table is dropped

2021-02-21 Thread Linhong Liu (Jira)
Linhong Liu created SPARK-34490:
---

 Summary: table maybe resolved as a view if the table is dropped
 Key: SPARK-34490
 URL: https://issues.apache.org/jira/browse/SPARK-34490
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.1.2
Reporter: Linhong Liu


see discussion in 
https://github.com/apache/spark/pull/31550#issuecomment-781977326



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-34260) UnresolvedException when creating temp view twice

2021-01-27 Thread Linhong Liu (Jira)
Linhong Liu created SPARK-34260:
---

 Summary: UnresolvedException when creating temp view twice
 Key: SPARK-34260
 URL: https://issues.apache.org/jira/browse/SPARK-34260
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.0.2, 3.1.2
Reporter: Linhong Liu


When creating a temp view twice, there is an UnresolvedException. Queries to 
reproduce:

{code:java}
sql("create or replace temp view v as select * from (select * from range(10))")
sql("create or replace temp view v as select * from (select * from range(10))")
{code}

Error message:

{noformat}
org.apache.spark.sql.catalyst.analysis.UnresolvedException: Invalid call to 
toAttribute on unresolved object, tree: *
at 
org.apache.spark.sql.catalyst.analysis.Star.toAttribute(unresolved.scala:295)
at 
org.apache.spark.sql.catalyst.plans.logical.Project.$anonfun$output$1(basicLogicalOperators.scala:62)
at scala.collection.immutable.List.map(List.scala:293)
at 
org.apache.spark.sql.catalyst.plans.logical.Project.output(basicLogicalOperators.scala:62)
at 
org.apache.spark.sql.catalyst.plans.logical.SubqueryAlias.output(basicLogicalOperators.scala:945)
at 
org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$allAttributes$1(QueryPlan.scala:431)
at scala.collection.immutable.List.flatMap(List.scala:366)
at 
org.apache.spark.sql.catalyst.plans.QueryPlan.allAttributes$lzycompute(QueryPlan.scala:431)
at 
org.apache.spark.sql.catalyst.plans.QueryPlan.allAttributes(QueryPlan.scala:431)
at 
org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$doCanonicalize$2(QueryPlan.scala:404)
at 
org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$mapExpressions$1(QueryPlan.scala:116)
at 
org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:73)
at 
org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpression$1(QueryPlan.scala:116)
at 
org.apache.spark.sql.catalyst.plans.QueryPlan.recursiveTransform$1(QueryPlan.scala:127)
at 
org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$mapExpressions$3(QueryPlan.scala:132)
at 
scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)
at scala.collection.immutable.List.foreach(List.scala:431)
at scala.collection.TraversableLike.map(TraversableLike.scala:286)
at scala.collection.TraversableLike.map$(TraversableLike.scala:279)
at scala.collection.immutable.List.map(List.scala:305)
at 
org.apache.spark.sql.catalyst.plans.QueryPlan.recursiveTransform$1(QueryPlan.scala:132)
at 
org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$mapExpressions$4(QueryPlan.scala:137)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:243)
at 
org.apache.spark.sql.catalyst.plans.QueryPlan.mapExpressions(QueryPlan.scala:137)
at 
org.apache.spark.sql.catalyst.plans.QueryPlan.doCanonicalize(QueryPlan.scala:389)
at 
org.apache.spark.sql.catalyst.plans.QueryPlan.canonicalized$lzycompute(QueryPlan.scala:373)
at 
org.apache.spark.sql.catalyst.plans.QueryPlan.canonicalized(QueryPlan.scala:372)
at 
org.apache.spark.sql.catalyst.plans.QueryPlan.sameResult(QueryPlan.scala:420)
at 
org.apache.spark.sql.execution.command.CreateViewCommand.run(views.scala:118)
at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:79)
at 
org.apache.spark.sql.Dataset.$anonfun$logicalPlan$1(Dataset.scala:228)
at 
org.apache.spark.sql.Dataset.$anonfun$withAction$1(Dataset.scala:3699)
at 
org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:103)
at 
org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:163)
at 
org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:90)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:772)
at 
org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3697)
at org.apache.spark.sql.Dataset.(Dataset.scala:228)
at org.apache.spark.sql.Dataset$.$anonfun$ofRows$2(Dataset.scala:99)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:772)
at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:96)
at 
org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:615)
at ...
{noformat}

[jira] [Created] (SPARK-34199) Block `count(table.*)` to follow ANSI standard and other SQL engines

2021-01-21 Thread Linhong Liu (Jira)
Linhong Liu created SPARK-34199:
---

 Summary: Block `count(table.*)` to follow ANSI standard and other 
SQL engines
 Key: SPARK-34199
 URL: https://issues.apache.org/jira/browse/SPARK-34199
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.2.0
Reporter: Linhong Liu


In Spark, count(table.*) may produce a very weird result, for example:

select count(*) from (select 1 as a, null as b) t;

output: 1

select count(t.*) from (select 1 as a, null as b) t;

output: 0

(count(t.*) is expanded to count(a, b), which only counts rows where every 
column is non-null, hence 0.)

After checking the ANSI standard, count(*) is always treated as count(1), while 
count(t.*) is not allowed. What's more, this is also not allowed by common 
databases, e.g. MySQL and Oracle.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33692) Permanent view shouldn't use current catalog and namespace to lookup function

2020-12-07 Thread Linhong Liu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Linhong Liu updated SPARK-33692:

Summary: Permanent view shouldn't use current catalog and namespace to 
lookup function  (was: Permanent view shouldn't lookup temp functions)

> Permanent view shouldn't use current catalog and namespace to lookup function
> -
>
> Key: SPARK-33692
> URL: https://issues.apache.org/jira/browse/SPARK-33692
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.1
>Reporter: Linhong Liu
>Priority: Major
>
> Reproduce steps:
> spark.sql("CREATE FUNCTION udf_plus AS 'udf.UdfPlus10' USING JAR 
> '/home/linhong.liu/spark-udf_2.12-0.1.0-SNAPSHOT.jar'")
> spark.sql("create view v1 as select udf_plus(1)")
> spark.sql("select * from v1").show() // output 11
> spark.sql("CREATE TEMPORARY FUNCTION udf_plus AS 'udf.UdfPlus20' USING JAR 
> '/home/linhong.liu/spark-udf_2.12-0.1.0-SNAPSHOT.jar'")
> spark.sql("select * from v1").show() // throw exception
> org.apache.spark.sql.AnalysisException: Attribute with name 
> 'default.udf_plus(1)' is not found in '(udf_plus(1))';;
> Project [default.udf_plus(1)#60]
> +- SubqueryAlias spark_catalog.default.v1
>+- View (`default`.`v1`, [default.udf_plus(1)#60])
>   +- Project [HiveSimpleUDF#udf.UdfPlus20(1) AS udf_plus(1)#61]
>  +- OneRowRelation



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-33692) Permanent view shouldn't lookup temp functions

2020-12-07 Thread Linhong Liu (Jira)
Linhong Liu created SPARK-33692:
---

 Summary: Permanent view shouldn't lookup temp functions
 Key: SPARK-33692
 URL: https://issues.apache.org/jira/browse/SPARK-33692
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.0.1
Reporter: Linhong Liu


Reproduce steps:
spark.sql("CREATE FUNCTION udf_plus AS 'udf.UdfPlus10' USING JAR 
'/home/linhong.liu/spark-udf_2.12-0.1.0-SNAPSHOT.jar'")

spark.sql("create view v1 as select udf_plus(1)")
spark.sql("select * from v1").show() // output 11

spark.sql("CREATE TEMPORARY FUNCTION udf_plus AS 'udf.UdfPlus20' USING JAR 
'/home/linhong.liu/spark-udf_2.12-0.1.0-SNAPSHOT.jar'")

spark.sql("select * from v1").show() // throw exception

org.apache.spark.sql.AnalysisException: Attribute with name 
'default.udf_plus(1)' is not found in '(udf_plus(1))';;
Project [default.udf_plus(1)#60]
+- SubqueryAlias spark_catalog.default.v1
   +- View (`default`.`v1`, [default.udf_plus(1)#60])
  +- Project [HiveSimpleUDF#udf.UdfPlus20(1) AS udf_plus(1)#61]
 +- OneRowRelation



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-33647) cache table not working for persisted view

2020-12-03 Thread Linhong Liu (Jira)
Linhong Liu created SPARK-33647:
---

 Summary: cache table not working for persisted view
 Key: SPARK-33647
 URL: https://issues.apache.org/jira/browse/SPARK-33647
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.0.1
Reporter: Linhong Liu


In `CacheManager`, tables (including views) are cached by their logical plans, 
and `QueryPlan.sameResult` is used to look up the cache. But a persisted view 
wraps the child plan with a `View` node, which always makes the `sameResult` 
check return false.
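A hypothetical repro, assuming the behavior described above (the table and 
view names are made up):

{code:java}
sql("CREATE TABLE t AS SELECT 1 AS c")
sql("CREATE VIEW pv AS SELECT * FROM t")
sql("CACHE TABLE pv")
// Expected to read from the cache, but the lookup by sameResult misses
// because the analyzed plan of `pv` is wrapped in a View node:
sql("SELECT * FROM pv").show()
{code}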



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-33438) set -v couldn't dump all the conf entries

2020-11-12 Thread Linhong Liu (Jira)
Linhong Liu created SPARK-33438:
---

 Summary: set -v couldn't dump all the conf entries
 Key: SPARK-33438
 URL: https://issues.apache.org/jira/browse/SPARK-33438
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.0.1
Reporter: Linhong Liu


Since a Scala object is lazily initialized, it won't be loaded until some code 
touches it. A SQL conf entry won't be registered if the object defining it is 
never touched, so "set -v" can't dump all the defined configs (even though it 
claims to).
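A minimal standalone illustration of the lazy-init behavior (not Spark code; 
the names are made up):

{code:java}
object Confs {
  // This body runs only when Confs is first referenced.
  println("registering conf entries")
  val entry: String = "spark.example.flag" // hypothetical conf entry
}

// Nothing has been registered at this point, so a "dump all configs" done
// now would miss `entry`. Touching the object triggers initialization:
Confs.entry
{code}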



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-32898) totalExecutorRunTimeMs is too big

2020-09-15 Thread Linhong Liu (Jira)
Linhong Liu created SPARK-32898:
---

 Summary: totalExecutorRunTimeMs is too big
 Key: SPARK-32898
 URL: https://issues.apache.org/jira/browse/SPARK-32898
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 3.0.1
Reporter: Linhong Liu


This might be because executorRunTimeMs is calculated incorrectly in 
Executor.scala: the function 
collectAccumulatorsAndResetStatusOnFailure(taskStartTimeNs) can be called when 
taskStartTimeNs is not set yet (it is 0).

As of now in master branch, here is the problematic code: 

[https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/executor/Executor.scala#L470]

 

An exception is thrown before this line, and the catch branch still updates the 
metric. However, the query shows as SUCCESSful in QPL. Maybe this task is 
speculative; not sure.
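A sketch of how the metric can blow up (illustrative only; the variable names 
are simplified from the description above):

{code:java}
var taskStartTimeNs = 0L  // never assigned because the task failed early
// The failure path still computes the run time:
val executorRunTimeMs = (System.nanoTime() - taskStartTimeNs) / 1000000L
// System.nanoTime() minus 0 is a huge number, so the reported run time
// becomes absurdly large.
{code}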

 

submissionTime in LiveExecutionData may also have a similar problem.

[https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/ui/SQLAppStatusListener.scala#L449]

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-32816) Planner error when aggregating multiple distinct DECIMAL columns

2020-09-08 Thread Linhong Liu (Jira)
Linhong Liu created SPARK-32816:
---

 Summary: Planner error when aggregating multiple distinct DECIMAL 
columns
 Key: SPARK-32816
 URL: https://issues.apache.org/jira/browse/SPARK-32816
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.0.0
Reporter: Linhong Liu


Running different DISTINCT decimal aggregations causes a query planner error:
{code:java}
java.lang.RuntimeException: You hit a query analyzer bug. Please report your 
query to Spark user mailing list.
at scala.sys.package$.error(package.scala:30)
at 
org.apache.spark.sql.execution.SparkStrategies$Aggregation$.apply(SparkStrategies.scala:473)
at 
org.apache.spark.sql.catalyst.planning.QueryPlanner.$anonfun$plan$1(QueryPlanner.scala:67)
at scala.collection.Iterator$$anon$11.nextCur(Iterator.scala:484)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:490)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:489)
at 
org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:97)
at 
org.apache.spark.sql.execution.SparkStrategies.plan(SparkStrategies.scala:74)
at 
org.apache.spark.sql.catalyst.planning.QueryPlanner.$anonfun$plan$3(QueryPlanner.scala:82)
at 
scala.collection.TraversableOnce.$anonfun$foldLeft$1(TraversableOnce.scala:162)
at 
scala.collection.TraversableOnce.$anonfun$foldLeft$1$adapted(TraversableOnce.scala:162)
at scala.collection.Iterator.foreach(Iterator.scala:941)
at scala.collection.Iterator.foreach$(Iterator.scala:941)
{code}
Example failing query:
{code:java}
import org.apache.spark.util.Utils

// Changing decimal(9, 0) to decimal(8, 0) fixes the problem. Root cause seems
// to have to do with UnscaledValue being used in one of the expressions but
// not the other.
val df = spark.range(0, 5, 1, 1).selectExpr(
  "id",
  "cast(id as decimal(9, 0)) as ss_ext_list_price")
val cacheDir = Utils.createTempDir().getCanonicalPath
df.write.parquet(cacheDir)

spark.read.parquet(cacheDir).createOrReplaceTempView("test_table")

spark.sql("""
select
avg(distinct ss_ext_list_price), sum(distinct ss_ext_list_price)
from test_table""").explain
{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-32761) Planner error when aggregating multiple distinct Constant columns

2020-08-31 Thread Linhong Liu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Linhong Liu updated SPARK-32761:

Description: 
SELECT COUNT(DISTINCT 2), COUNT(DISTINCT 2, 3) will trigger this bug.

The problematic code is:

 
{code:java}
val distinctAggGroups = aggExpressions.filter(_.isDistinct).groupBy { e =>
  val unfoldableChildren = e.aggregateFunction.children.filter(!_.foldable).toSet
  if (unfoldableChildren.nonEmpty) {
    // Only expand the unfoldable children
    unfoldableChildren
  } else {
    // If aggregateFunction's children are all foldable,
    // we must expand at least one of the children (here we take the first child).
    // If we don't, we will get the wrong result, for example:
    // count(distinct 1) will be explained to count(1) after the rewrite function.
    // Generally, the distinct aggregateFunction should not run
    // foldable TypeCheck for the first child.
    e.aggregateFunction.children.take(1).toSet
  }
}
{code}

  was:
SELECT COUNT(DISTINCT 2), COUNT(DISTINCT 3) will trigger this bug.

The problematic code is:

 
{code:java}
val distinctAggGroups = aggExpressions.filter(_.isDistinct).groupBy { e =>
  val unfoldableChildren = e.aggregateFunction.children.filter(!_.foldable).toSet
  if (unfoldableChildren.nonEmpty) {
    // Only expand the unfoldable children
    unfoldableChildren
  } else {
    // If aggregateFunction's children are all foldable,
    // we must expand at least one of the children (here we take the first child).
    // If we don't, we will get the wrong result, for example:
    // count(distinct 1) will be explained to count(1) after the rewrite function.
    // Generally, the distinct aggregateFunction should not run
    // foldable TypeCheck for the first child.
    e.aggregateFunction.children.take(1).toSet
  }
}
{code}


> Planner error when aggregating multiple distinct Constant columns
> -
>
> Key: SPARK-32761
> URL: https://issues.apache.org/jira/browse/SPARK-32761
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Linhong Liu
>Priority: Major
>
> SELECT COUNT(DISTINCT 2), COUNT(DISTINCT 2, 3) will trigger this bug.
> The problematic code is:
>  
> {code:java}
> val distinctAggGroups = aggExpressions.filter(_.isDistinct).groupBy { e =>
>   val unfoldableChildren = e.aggregateFunction.children.filter(!_.foldable).toSet
>   if (unfoldableChildren.nonEmpty) {
>     // Only expand the unfoldable children
>     unfoldableChildren
>   } else {
>     // If aggregateFunction's children are all foldable,
>     // we must expand at least one of the children (here we take the first child).
>     // If we don't, we will get the wrong result, for example:
>     // count(distinct 1) will be explained to count(1) after the rewrite function.
>     // Generally, the distinct aggregateFunction should not run
>     // foldable TypeCheck for the first child.
>     e.aggregateFunction.children.take(1).toSet
>   }
> }
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-32761) Planner error when aggregating multiple distinct Constant columns

2020-08-31 Thread Linhong Liu (Jira)
Linhong Liu created SPARK-32761:
---

 Summary: Planner error when aggregating multiple distinct Constant 
columns
 Key: SPARK-32761
 URL: https://issues.apache.org/jira/browse/SPARK-32761
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.0.0
Reporter: Linhong Liu


SELECT COUNT(DISTINCT 2), COUNT(DISTINCT 3) will trigger this bug.

The problematic code is:

 
{code:java}
val distinctAggGroups = aggExpressions.filter(_.isDistinct).groupBy { e =>
  val unfoldableChildren = e.aggregateFunction.children.filter(!_.foldable).toSet
  if (unfoldableChildren.nonEmpty) {
    // Only expand the unfoldable children
    unfoldableChildren
  } else {
    // If aggregateFunction's children are all foldable,
    // we must expand at least one of the children (here we take the first child).
    // If we don't, we will get the wrong result, for example:
    // count(distinct 1) will be explained to count(1) after the rewrite function.
    // Generally, the distinct aggregateFunction should not run
    // foldable TypeCheck for the first child.
    e.aggregateFunction.children.take(1).toSet
  }
}
{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-32680) CTAS with V2 catalog wrongly accessed unresolved query

2020-08-21 Thread Linhong Liu (Jira)
Linhong Liu created SPARK-32680:
---

 Summary: CTAS with V2 catalog wrongly accessed unresolved query
 Key: SPARK-32680
 URL: https://issues.apache.org/jira/browse/SPARK-32680
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.0.0
Reporter: Linhong Liu


Case:

{{CREATE TABLE t USING delta AS SELECT * FROM nonexist}}

Expected:

throws AnalysisException with "Table or view not found"

Actual:

throws UnresolvedException with 
"org.apache.spark.sql.catalyst.analysis.UnresolvedException: Invalid call to 
toAttribute on unresolved object, tree: *"



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org