[jira] [Assigned] (SPARK-38929) Improve error messages for cast failures in ANSI

2022-04-17 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-38929:


Assignee: (was: Apache Spark)

> Improve error messages for cast failures in ANSI
> 
>
> Key: SPARK-38929
> URL: https://issues.apache.org/jira/browse/SPARK-38929
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Xinyi Yu
>Priority: Major
>
> Improve several error messages for cast failures in ANSI.
> h2. Cast to numeric types
> {code:java}
> java.lang.NumberFormatException: invalid input syntax for type numeric: 1.0. 
> To return NULL instead, use 'try_cast'. ...{code}
> This is confusing as 1.0 is numeric to an average user. Need to mention the 
> specific target type (integer in this case) and put 1.0 in single quotes. If 
> we can mention this is a cast from string to an integer that’s even better.
> *Proposed change*
> {code:java}
> Invalid `int` literal: '1.0'. To return NULL instead, use 'try_cast'.{code}
> h2. Cast to date types
> {code:java}
> java.time.DateTimeException: Cannot cast 2021-09- 2 to DateType.{code}
> Can align with the above change.
> *Proposed change*
> {code:java}
> Invalid `date` literal: '2021-09- 2'. To return NULL instead, use 
> 'try_cast'.{code}
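For context, a minimal sketch showing where these messages surface. It assumes an existing SparkSession named `spark`, ANSI mode enabled, and Spark 3.2+ for TRY_CAST; it is not part of the proposal itself.
{code:scala}
// Minimal sketch, assuming an existing SparkSession named `spark`.
spark.conf.set("spark.sql.ansi.enabled", "true")

spark.sql("SELECT CAST('1.0' AS INT)").show()          // fails with the NumberFormatException above
spark.sql("SELECT CAST('2021-09- 2' AS DATE)").show()  // fails with the DateTimeException above
spark.sql("SELECT TRY_CAST('1.0' AS INT)").show()      // returns NULL instead of failing
{code}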



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-38929) Improve error messages for cast failures in ANSI

2022-04-17 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-38929:


Assignee: Apache Spark

> Improve error messages for cast failures in ANSI
> 
>
> Key: SPARK-38929
> URL: https://issues.apache.org/jira/browse/SPARK-38929
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Xinyi Yu
>Assignee: Apache Spark
>Priority: Major
>
> Improve several error messages for cast failures in ANSI.
> h2. Cast to numeric types
> {code:java}
> java.lang.NumberFormatException: invalid input syntax for type numeric: 1.0. 
> To return NULL instead, use 'try_cast'. ...{code}
> This is confusing as 1.0 is numeric to an average user. Need to mention the 
> specific target type (integer in this case) and put 1.0 in single quotes. If 
> we can mention this is a cast from string to an integer that’s even better.
> *Proposed change*
> {code:java}
> Invalid `int` literal: '1.0'. To return NULL instead, use 'try_cast'.{code}
> h2. Cast to date types
> {code:java}
> java.time.DateTimeException: Cannot cast 2021-09- 2 to DateType.{code}
> Can align with the above change.
> *Proposed change*
> {code:java}
> Invalid `date` literal: '2021-09- 2'. To return NULL instead, use 
> 'try_cast'.{code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-38929) Improve error messages for cast failures in ANSI

2022-04-17 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17523515#comment-17523515
 ] 

Apache Spark commented on SPARK-38929:
--

User 'anchovYu' has created a pull request for this issue:
https://github.com/apache/spark/pull/36241

> Improve error messages for cast failures in ANSI
> 
>
> Key: SPARK-38929
> URL: https://issues.apache.org/jira/browse/SPARK-38929
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Xinyi Yu
>Priority: Major
>
> Improve several error messages for cast failures in ANSI.
> h2. Cast to numeric types
> {code:java}
> java.lang.NumberFormatException: invalid input syntax for type numeric: 1.0. 
> To return NULL instead, use 'try_cast'. ...{code}
> This is confusing as 1.0 is numeric to an average user. Need to mention the 
> specific target type (integer in this case) and put 1.0 in single quotes. If 
> we can mention this is a cast from string to an integer that’s even better.
> *Proposed change*
> {code:java}
> Invalid `int` literal: '1.0'. To return NULL instead, use 'try_cast'.{code}
> h2. Cast to date types
> {code:java}
> java.time.DateTimeException: Cannot cast 2021-09- 2 to DateType.{code}
> Can align with the above change.
> *Proposed change*
> {code:java}
> Invalid `date` literal: '2021-09- 2'. To return NULL instead, use 
> 'try_cast'.{code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-37787) Long running Spark Job(Spark ThriftServer) throw HDFS_DELEGATE_TOKEN not found in cache Exception

2022-04-17 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17523512#comment-17523512
 ] 

Apache Spark commented on SPARK-37787:
--

User 'huangzhir' has created a pull request for this issue:
https://github.com/apache/spark/pull/36240

> Long running Spark Job(Spark ThriftServer) throw HDFS_DELEGATE_TOKEN not 
> found in cache Exception
> -
>
> Key: SPARK-37787
> URL: https://issues.apache.org/jira/browse/SPARK-37787
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.0.0, 3.1.0, 3.2.0
> Environment: spark3 thrift server
>  
> spark-default.conf
> spark.hadoop.fs.hdfs.impl.disable.cache=true
>  
>Reporter: huangzhir
>Priority: Major
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> An *HDFS_DELEGATE_TOKEN not found in cache* exception occurs when accessing the 
> Spark Thrift Server service. The specific exception is as follows:
> [Exception Log | 
> https://raw.githubusercontent.com/huangzhir/Temp/main/image-3.png]
> !https://raw.githubusercontent.com/huangzhir/Temp/main/image-3.png!
> The HadoopDelegationTokenManager throws an exception when renewing the 
> delegation token, as follows:
>  
> We also found the following HadoopDelegationTokenManager log:
> INFO [Credential Renewal Thread] 
> org.apache.spark.deploy.security.HadoopDelegationTokenManager logInfo - 
> *Scheduling renewal in 1921535501304.2 h.*
> [hdfs Exception log in HadoopDelegationTokenManager 
> |https://raw.githubusercontent.com/huangzhir/Temp/main/spark%20thriftserver%20Exceptin.png]
> !https://raw.githubusercontent.com/huangzhir/Temp/main/spark%20thriftserver%20Exceptin.png!
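For scale, a quick back-of-the-envelope check (plain Scala arithmetic, not Spark code) of what the scheduled renewal delay in that log line amounts to:
{code:scala}
// Back-of-the-envelope check of the delay reported by the renewal thread.
val delayHours = 1921535501304.2
val delayYears = delayHours / 24 / 365
println(f"$delayYears%.0f years")  // roughly 219 million years
{code}
A delay of that size means the renewal thread effectively never fires, which would be consistent with the token eventually expiring and later HDFS calls failing with the "not found in cache" error.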



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-37787) Long running Spark Job(Spark ThriftServer) throw HDFS_DELEGATE_TOKEN not found in cache Exception

2022-04-17 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17523511#comment-17523511
 ] 

Apache Spark commented on SPARK-37787:
--

User 'huangzhir' has created a pull request for this issue:
https://github.com/apache/spark/pull/36240

> Long running Spark Job(Spark ThriftServer) throw HDFS_DELEGATE_TOKEN not 
> found in cache Exception
> -
>
> Key: SPARK-37787
> URL: https://issues.apache.org/jira/browse/SPARK-37787
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.0.0, 3.1.0, 3.2.0
> Environment: spark3 thrift server
>  
> spark-default.conf
> spark.hadoop.fs.hdfs.impl.disable.cache=true
>  
>Reporter: huangzhir
>Priority: Major
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> An *HDFS_DELEGATE_TOKEN not found in cache* exception occurs when accessing the 
> Spark Thrift Server service. The specific exception is as follows:
> [Exception Log | 
> https://raw.githubusercontent.com/huangzhir/Temp/main/image-3.png]
> !https://raw.githubusercontent.com/huangzhir/Temp/main/image-3.png!
> The HadoopDelegationTokenManager throws an exception when renewing the 
> delegation token, as follows:
>  
> We also found the following HadoopDelegationTokenManager log:
> INFO [Credential Renewal Thread] 
> org.apache.spark.deploy.security.HadoopDelegationTokenManager logInfo - 
> *Scheduling renewal in 1921535501304.2 h.*
> [hdfs Exception log in HadoopDelegationTokenManager 
> |https://raw.githubusercontent.com/huangzhir/Temp/main/spark%20thriftserver%20Exceptin.png]
> !https://raw.githubusercontent.com/huangzhir/Temp/main/spark%20thriftserver%20Exceptin.png!



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-38929) Improve error messages for cast failures in ANSI

2022-04-17 Thread Xinyi Yu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinyi Yu updated SPARK-38929:
-
Description: 
Improve several error messages for cast failures in ANSI.
h2. Cast to numeric types
{code:java}
java.lang.NumberFormatException: invalid input syntax for type numeric: 1.0. To 
return NULL instead, use 'try_cast'. ...{code}
This is confusing as 1.0 is numeric to an average user. Need to mention the 
specific target type (integer in this case) and put 1.0 in single quotes. If we 
can mention this is a cast from string to an integer that’s even better.

*Proposed change*
{code:java}
Invalid `int` literal: '1.0'. To return NULL instead, use 'try_cast'.{code}
h2. Cast to date types
{code:java}
java.time.DateTimeException: Cannot cast 2021-09- 2 to DateType.{code}
Can align with the above change.

*Proposed change*
{code:java}
Invalid `date` literal: '2021-09- 2'. To return NULL instead, use 
'try_cast'.{code}

  was:
Improve several error messages for cast failures in ANSI.
h2. Cast to numeric types
{code:java}
java.lang.NumberFormatException: invalid input syntax for type numeric: 1.0. To 
return NULL instead, use 'try_cast'. ...{code}
This is confusing as 1.0 is numeric to an average user. Need to mention the 
specific target type (integer in this case) and put 1.0 in single quotes. If we 
can mention this is a cast from string to an integer that’s even better.

*Proposed change*
{code:java}
Invalid `int` value: '1.0'. To return NULL instead, use 'try_cast'.{code}
h2. Cast to date types
{code:java}
java.time.DateTimeException: Cannot cast 2021-09- 2 to DateType.{code}
Can align with the above change.

*Proposed change*
{code:java}
Invalid `date` value: '2021-09- 2'. To return NULL instead, use 
'try_cast'.{code}


> Improve error messages for cast failures in ANSI
> 
>
> Key: SPARK-38929
> URL: https://issues.apache.org/jira/browse/SPARK-38929
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Xinyi Yu
>Priority: Major
>
> Improve several error messages for cast failures in ANSI.
> h2. Cast to numeric types
> {code:java}
> java.lang.NumberFormatException: invalid input syntax for type numeric: 1.0. 
> To return NULL instead, use 'try_cast'. ...{code}
> This is confusing as 1.0 is numeric to an average user. Need to mention the 
> specific target type (integer in this case) and put 1.0 in single quotes. If 
> we can mention this is a cast from string to an integer that’s even better.
> *Proposed change*
> {code:java}
> Invalid `int` literal: '1.0'. To return NULL instead, use 'try_cast'.{code}
> h2. Cast to date types
> {code:java}
> java.time.DateTimeException: Cannot cast 2021-09- 2 to DateType.{code}
> Can align with the above change.
> *Proposed change*
> {code:java}
> Invalid `date` literal: '2021-09- 2'. To return NULL instead, use 
> 'try_cast'.{code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-38929) Improve error messages for cast failures in ANSI

2022-04-17 Thread Xinyi Yu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinyi Yu updated SPARK-38929:
-
Description: 
Improve several error messages for cast failures in ANSI.
h2. Cast to numeric types
{code:java}
java.lang.NumberFormatException: invalid input syntax for type numeric: 1.0. To 
return NULL instead, use 'try_cast'. ...{code}
This is confusing as 1.0 is numeric to an average user. Need to mention the 
specific target type (integer in this case) and put 1.0 in single quotes. If we 
can mention this is a cast from string to an integer that’s even better.

*Proposed change*
{code:java}
Invalid `int` value: '1.0'. To return NULL instead, use 'try_cast'.{code}
h2. Cast to date types
{code:java}
java.time.DateTimeException: Cannot cast 2021-09- 2 to DateType.{code}
Can align with the above change.

*Proposed change*
{code:java}
Invalid `date` value: '2021-09- 2'. To return NULL instead, use 
'try_cast'.{code}

  was:
Improve several error messages for cast failures in ANSI.
h3. Cast to numeric types
{code:java}
java.lang.NumberFormatException: invalid input syntax for type numeric: 1.0. To 
return NULL instead, use 'try_cast'. ...{code}
This is confusing as 1.0 is numeric to an average user. Need to mention the 
specific target type (integer in this case) and put 1.0 in single quotes. If we 
can mention this is a cast from string to an integer that’s even better.
h4. Proposed change
{code:java}
Invalid `int` value: '1.0'. To return NULL instead, use 'try_cast'.{code}
h3. Cast to date types
{code:java}
java.time.DateTimeException: Cannot cast 2021-09- 2 to DateType.{code}
Can align with the above change.
h4. Proposed change
{code:java}
Invalid `date` value: '2021-09- 2'. To return NULL instead, use 
'try_cast'.{code}


> Improve error messages for cast failures in ANSI
> 
>
> Key: SPARK-38929
> URL: https://issues.apache.org/jira/browse/SPARK-38929
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Xinyi Yu
>Priority: Major
>
> Improve several error messages for cast failures in ANSI.
> h2. Cast to numeric types
> {code:java}
> java.lang.NumberFormatException: invalid input syntax for type numeric: 1.0. 
> To return NULL instead, use 'try_cast'. ...{code}
> This is confusing as 1.0 is numeric to an average user. Need to mention the 
> specific target type (integer in this case) and put 1.0 in single quotes. If 
> we can mention this is a cast from string to an integer that’s even better.
> *Proposed change*
> {code:java}
> Invalid `int` value: '1.0'. To return NULL instead, use 'try_cast'.{code}
> h2. Cast to date types
> {code:java}
> java.time.DateTimeException: Cannot cast 2021-09- 2 to DateType.{code}
> Can align with the above change.
> *Proposed change*
> {code:java}
> Invalid `date` value: '2021-09- 2'. To return NULL instead, use 
> 'try_cast'.{code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-38929) Improve error messages for cast failures in ANSI

2022-04-17 Thread Xinyi Yu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinyi Yu updated SPARK-38929:
-
Description: 
Improve several error messages for cast failures in ANSI.
h3. Cast to numeric types
{code:java}
java.lang.NumberFormatException: invalid input syntax for type numeric: 1.0. To 
return NULL instead, use 'try_cast'. ...{code}
This is confusing as 1.0 is numeric to an average user. Need to mention the 
specific target type (integer in this case) and put 1.0 in single quotes. If we 
can mention this is a cast from string to an integer that’s even better.
h4. Proposed change
{code:java}
Invalid `int` value: '1.0'. To return NULL instead, use 'try_cast'.{code}
h3. Cast to date types
{code:java}
java.time.DateTimeException: Cannot cast 2021-09- 2 to DateType.{code}
Can align with the above change.
h4. Proposed change
{code:java}
Invalid `date` value: '2021-09- 2'. To return NULL instead, use 
'try_cast'.{code}

  was:
Improve several error messages for cast failures in ANSI.
h3. Cast to numeric types
{code:java}
java.lang.NumberFormatException: invalid input syntax for type numeric: 1.0. To 
return NULL instead, use 'try_cast'. ...{code}
This is confusing as 1.0 is numeric to an average user. Need to mention the 
specific target type (integer in this case) and put 1.0 in single quotes. If we 
can mention this is a cast from string to an integer that’s even better.
h4. Proposed change

{{Invalid `int` value: '1.0'. To return NULL instead, use 'try_cast'.}}
h3. Cast to date types
{code:java}
java.time.DateTimeException: Cannot cast 2021-09- 2 to DateType.{code}
Can align with the above change.
h4. Proposed change
{code:java}
Invalid `date` value: '2021-09- 2'. To return NULL instead, use 
'try_cast'.{code}


> Improve error messages for cast failures in ANSI
> 
>
> Key: SPARK-38929
> URL: https://issues.apache.org/jira/browse/SPARK-38929
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Xinyi Yu
>Priority: Major
>
> Improve several error messages for cast failures in ANSI.
> h3. Cast to numeric types
> {code:java}
> java.lang.NumberFormatException: invalid input syntax for type numeric: 1.0. 
> To return NULL instead, use 'try_cast'. ...{code}
> This is confusing as 1.0 is numeric to an average user. Need to mention the 
> specific target type (integer in this case) and put 1.0 in single quotes. If 
> we can mention this is a cast from string to an integer that’s even better.
> h4. Proposed change
> {code:java}
> Invalid `int` value: '1.0'. To return NULL instead, use 'try_cast'.{code}
> h3. Cast to date types
> {code:java}
> java.time.DateTimeException: Cannot cast 2021-09- 2 to DateType.{code}
> Can align with the above change.
> h4. Proposed change
> {code:java}
> Invalid `date` value: '2021-09- 2'. To return NULL instead, use 
> 'try_cast'.{code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-38929) Improve error messages for cast failures in ANSI

2022-04-17 Thread Xinyi Yu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinyi Yu updated SPARK-38929:
-
Description: 
Improve several error messages for cast failures in ANSI.
h3. Cast to numeric types
{code:java}
java.lang.NumberFormatException: invalid input syntax for type numeric: 1.0. To 
return NULL instead, use 'try_cast'. ...{code}
This is confusing as 1.0 is numeric to an average user. Need to mention the 
specific target type (integer in this case) and put 1.0 in single quotes. If we 
can mention this is a cast from string to an integer that’s even better.
h4. Proposed change

{{Invalid `int` value: '1.0'. To return NULL instead, use 'try_cast'.}}
h3. Cast to date types
{code:java}
java.time.DateTimeException: Cannot cast 2021-09- 2 to DateType.{code}
Can align with the above change.
h4. Proposed change
{code:java}
Invalid `date` value: '2021-09- 2'. To return NULL instead, use 
'try_cast'.{code}

  was:
Improve several error messages for cast failures in ANSI.
h3. Cast to numeric types
{code:java}
java.lang.NumberFormatException: invalid input syntax for type numeric: 1.0. To 
return NULL instead, use 'try_cast'. ...{code}
This is confusing as 1.0 is numeric to an average user. Need to mention the 
specific target type (integer in this case) and put 1.0 in single quotes. If we 
can mention this is a cast from string to an integer that’s even better.
h4. Proposed change

{{Invalid `int` value: '1.0'. To return NULL instead, use 'try_cast'.}}
h3. Cast to date types
{code:java}
java.time.DateTimeException: Cannot cast 2021-09- 2 to DateType.{code}
Can align with the above change.
h4. Proposed change

{{}}
{code:java}
Invalid `date` value: '2021-09- 2'. To return NULL instead, use 
'try_cast'.{code}
{{}}


> Improve error messages for cast failures in ANSI
> 
>
> Key: SPARK-38929
> URL: https://issues.apache.org/jira/browse/SPARK-38929
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Xinyi Yu
>Priority: Major
>
> Improve several error messages for cast failures in ANSI.
> h3. Cast to numeric types
> {code:java}
> java.lang.NumberFormatException: invalid input syntax for type numeric: 1.0. 
> To return NULL instead, use 'try_cast'. ...{code}
> This is confusing as 1.0 is numeric to an average user. Need to mention the 
> specific target type (integer in this case) and put 1.0 in single quotes. If 
> we can mention this is a cast from string to an integer that’s even better.
> h4. Proposed change
> {{Invalid `int` value: '1.0'. To return NULL instead, use 'try_cast'.}}
> h3. Cast to date types
> {code:java}
> java.time.DateTimeException: Cannot cast 2021-09- 2 to DateType.{code}
> Can align with the above change.
> h4. Proposed change
> {code:java}
> Invalid `date` value: '2021-09- 2'. To return NULL instead, use 
> 'try_cast'.{code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-38929) Improve error messages for cast failures in ANSI

2022-04-17 Thread Xinyi Yu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinyi Yu updated SPARK-38929:
-
Description: 
Improve several error messages for cast failures in ANSI.
h3. Cast to numeric types
{code:java}
java.lang.NumberFormatException: invalid input syntax for type numeric: 1.0. To 
return NULL instead, use 'try_cast'. ...{code}
This is confusing as 1.0 is numeric to an average user. Need to mention the 
specific target type (integer in this case) and put 1.0 in single quotes. If we 
can mention this is a cast from string to an integer that’s even better.
h4. Proposed change

{{Invalid `int` value: '1.0'. To return NULL instead, use 'try_cast'.}}
h3. Cast to date types
{code:java}
java.time.DateTimeException: Cannot cast 2021-09- 2 to DateType.{code}
Can align with the above change.
h4. Proposed change

{{}}
{code:java}
Invalid `date` value: '2021-09- 2'. To return NULL instead, use 
'try_cast'.{code}
{{}}

  was:Improve several error messages for cast failures in ANSI:


> Improve error messages for cast failures in ANSI
> 
>
> Key: SPARK-38929
> URL: https://issues.apache.org/jira/browse/SPARK-38929
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Xinyi Yu
>Priority: Major
>
> Improve several error messages for cast failures in ANSI.
> h3. Cast to numeric types
> {code:java}
> java.lang.NumberFormatException: invalid input syntax for type numeric: 1.0. 
> To return NULL instead, use 'try_cast'. ...{code}
> This is confusing as 1.0 is numeric to an average user. Need to mention the 
> specific target type (integer in this case) and put 1.0 in single quotes. If 
> we can mention this is a cast from string to an integer that’s even better.
> h4. Proposed change
> {{Invalid `int` value: '1.0'. To return NULL instead, use 'try_cast'.}}
> h3. Cast to date types
> {code:java}
> java.time.DateTimeException: Cannot cast 2021-09- 2 to DateType.{code}
> Can align with the above change.
> h4. Proposed change
> {{}}
> {code:java}
> Invalid `date` value: '2021-09- 2'. To return NULL instead, use 
> 'try_cast'.{code}
> {{}}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-38929) Improve error messages for cast failures in ANSI

2022-04-17 Thread Xinyi Yu (Jira)
Xinyi Yu created SPARK-38929:


 Summary: Improve error messages for cast failures in ANSI
 Key: SPARK-38929
 URL: https://issues.apache.org/jira/browse/SPARK-38929
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.3.0
Reporter: Xinyi Yu


Improve several error messages for cast failures in ANSI:



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-38720) Test the error class: CANNOT_CHANGE_DECIMAL_PRECISION

2022-04-17 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17523498#comment-17523498
 ] 

Apache Spark commented on SPARK-38720:
--

User 'panbingkun' has created a pull request for this issue:
https://github.com/apache/spark/pull/36239

> Test the error class: CANNOT_CHANGE_DECIMAL_PRECISION
> -
>
> Key: SPARK-38720
> URL: https://issues.apache.org/jira/browse/SPARK-38720
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Max Gekk
>Priority: Minor
>  Labels: starter
>
> Add at least one test for the error class *CANNOT_CHANGE_DECIMAL_PRECISION* 
> to QueryExecutionErrorsSuite. The test should cover the exception thrown in 
> QueryExecutionErrors:
> {code:scala}
>   def cannotChangeDecimalPrecisionError(
>   value: Decimal, decimalPrecision: Int, decimalScale: Int): 
> ArithmeticException = {
> new SparkArithmeticException(errorClass = 
> "CANNOT_CHANGE_DECIMAL_PRECISION",
>   messageParameters = Array(value.toDebugString,
> decimalPrecision.toString, decimalScale.toString, 
> SQLConf.ANSI_ENABLED.key))
>   }
> {code}
> For example, here is a test for the error class *UNSUPPORTED_FEATURE*: 
> https://github.com/apache/spark/blob/34e3029a43d2a8241f70f2343be8285cb7f231b9/sql/core/src/test/scala/org/apache/spark/sql/errors/QueryCompilationErrorsSuite.scala#L151-L170
> +The test must check:+
> # the entire error message
> # sqlState if it is defined in the error-classes.json file
> # the error class
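For illustration, a hypothetical sketch of such a test as it might look inside QueryExecutionErrorsSuite. The SQL statement, the expected message fragment, and the sqlState below are placeholders, not the final values; they should be replaced with the real entries from error-classes.json.
{code:scala}
// Hypothetical sketch only: expected strings are illustrative placeholders.
test("CANNOT_CHANGE_DECIMAL_PRECISION: value does not fit the target decimal type") {
  withSQLConf(SQLConf.ANSI_ENABLED.key -> "true") {
    val e = intercept[SparkArithmeticException] {
      sql("SELECT CAST('123456789012345.12345' AS DECIMAL(8, 2))").collect()
    }
    assert(e.getErrorClass === "CANNOT_CHANGE_DECIMAL_PRECISION")
    assert(e.getSqlState === "22005")  // placeholder: use the value defined in error-classes.json
    assert(e.getMessage.contains("cannot be represented as Decimal(8, 2)"))
  }
}
{code}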



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-38720) Test the error class: CANNOT_CHANGE_DECIMAL_PRECISION

2022-04-17 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-38720:


Assignee: Apache Spark

> Test the error class: CANNOT_CHANGE_DECIMAL_PRECISION
> -
>
> Key: SPARK-38720
> URL: https://issues.apache.org/jira/browse/SPARK-38720
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Max Gekk
>Assignee: Apache Spark
>Priority: Minor
>  Labels: starter
>
> Add at least one test for the error class *CANNOT_CHANGE_DECIMAL_PRECISION* 
> to QueryExecutionErrorsSuite. The test should cover the exception thrown in 
> QueryExecutionErrors:
> {code:scala}
>   def cannotChangeDecimalPrecisionError(
>   value: Decimal, decimalPrecision: Int, decimalScale: Int): 
> ArithmeticException = {
> new SparkArithmeticException(errorClass = 
> "CANNOT_CHANGE_DECIMAL_PRECISION",
>   messageParameters = Array(value.toDebugString,
> decimalPrecision.toString, decimalScale.toString, 
> SQLConf.ANSI_ENABLED.key))
>   }
> {code}
> For example, here is a test for the error class *UNSUPPORTED_FEATURE*: 
> https://github.com/apache/spark/blob/34e3029a43d2a8241f70f2343be8285cb7f231b9/sql/core/src/test/scala/org/apache/spark/sql/errors/QueryCompilationErrorsSuite.scala#L151-L170
> +The test must check:+
> # the entire error message
> # sqlState if it is defined in the error-classes.json file
> # the error class



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-38720) Test the error class: CANNOT_CHANGE_DECIMAL_PRECISION

2022-04-17 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-38720:


Assignee: (was: Apache Spark)

> Test the error class: CANNOT_CHANGE_DECIMAL_PRECISION
> -
>
> Key: SPARK-38720
> URL: https://issues.apache.org/jira/browse/SPARK-38720
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Max Gekk
>Priority: Minor
>  Labels: starter
>
> Add at least one test for the error class *CANNOT_CHANGE_DECIMAL_PRECISION* 
> to QueryExecutionErrorsSuite. The test should cover the exception thrown in 
> QueryExecutionErrors:
> {code:scala}
>   def cannotChangeDecimalPrecisionError(
>   value: Decimal, decimalPrecision: Int, decimalScale: Int): 
> ArithmeticException = {
> new SparkArithmeticException(errorClass = 
> "CANNOT_CHANGE_DECIMAL_PRECISION",
>   messageParameters = Array(value.toDebugString,
> decimalPrecision.toString, decimalScale.toString, 
> SQLConf.ANSI_ENABLED.key))
>   }
> {code}
> For example, here is a test for the error class *UNSUPPORTED_FEATURE*: 
> https://github.com/apache/spark/blob/34e3029a43d2a8241f70f2343be8285cb7f231b9/sql/core/src/test/scala/org/apache/spark/sql/errors/QueryCompilationErrorsSuite.scala#L151-L170
> +The test must check:+
> # the entire error message
> # sqlState if it is defined in the error-classes.json file
> # the error class



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-38916) Tasks not killed caused by race conditions between killTask() and launchTask()

2022-04-17 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17523493#comment-17523493
 ] 

Apache Spark commented on SPARK-38916:
--

User 'maryannxue' has created a pull request for this issue:
https://github.com/apache/spark/pull/36238

> Tasks not killed caused by race conditions between killTask() and launchTask()
> --
>
> Key: SPARK-38916
> URL: https://issues.apache.org/jira/browse/SPARK-38916
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.2.1
>Reporter: Wei Xue
>Priority: Minor
>
> Sometimes when the scheduler tries to cancel a task right after it launches 
> that task on the executor, the KillTask and LaunchTask events can arrive in 
> reversed order, causing the task to escape the kill-task signal and finish 
> "secretly". Such tasks even show up as un-launched tasks in the Spark UI.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-38916) Tasks not killed caused by race conditions between killTask() and launchTask()

2022-04-17 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-38916:


Assignee: Apache Spark

> Tasks not killed caused by race conditions between killTask() and launchTask()
> --
>
> Key: SPARK-38916
> URL: https://issues.apache.org/jira/browse/SPARK-38916
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.2.1
>Reporter: Wei Xue
>Assignee: Apache Spark
>Priority: Minor
>
> Sometimes when the scheduler tries to cancel a task right after it launches 
> that task on the executor, the KillTask and LaunchTask events can arrive in 
> reversed order, causing the task to escape the kill-task signal and finish 
> "secretly". Such tasks even show up as un-launched tasks in the Spark UI.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-38916) Tasks not killed caused by race conditions between killTask() and launchTask()

2022-04-17 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-38916:


Assignee: (was: Apache Spark)

> Tasks not killed caused by race conditions between killTask() and launchTask()
> --
>
> Key: SPARK-38916
> URL: https://issues.apache.org/jira/browse/SPARK-38916
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.2.1
>Reporter: Wei Xue
>Priority: Minor
>
> Sometimes when the scheduler tries to cancel a task right after it launches 
> that task on the executor, the KillTask and LaunchTask events can arrive in 
> reversed order, causing the task to escape the kill-task signal and finish 
> "secretly". Such tasks even show up as un-launched tasks in the Spark UI.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-38896) Use tryWithResource to recycling KVStoreIterator

2022-04-17 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17523487#comment-17523487
 ] 

Apache Spark commented on SPARK-38896:
--

User 'LuciferYang' has created a pull request for this issue:
https://github.com/apache/spark/pull/36237

> Use tryWithResource to recycling KVStoreIterator
> 
>
> Key: SPARK-38896
> URL: https://issues.apache.org/jira/browse/SPARK-38896
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core, SQL
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Priority: Minor
>
> Use `Utils.tryWithResource` to recycle all `KVStoreIterator` instances opened by 
> RocksDB/LevelDB.
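A minimal sketch of the intended pattern (illustrative only; `Utils` and the kvstore API are Spark-internal, so such code would live inside Spark itself). The point is that the iterator gets closed even if the body throws, instead of relying on callers or `finalize()`.
{code:scala}
// Illustrative sketch of the Utils.tryWithResource pattern with a KVStoreIterator.
import org.apache.spark.util.Utils
import org.apache.spark.util.kvstore.KVStore

def countEntries[T](store: KVStore, klass: Class[T]): Long = {
  Utils.tryWithResource(store.view(klass).closeableIterator()) { iter =>
    var n = 0L
    while (iter.hasNext) { iter.next(); n += 1 }
    n  // the iterator is closed here, even on failure
  }
}
{code}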



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-38928) Skip Pandas UDF test in `QueryCompilationErrorsSuite` if not available

2022-04-17 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-38928.
---
Fix Version/s: 3.3.0
   Resolution: Fixed

Issue resolved by pull request 36236
[https://github.com/apache/spark/pull/36236]

> Skip Pandas UDF test in `QueryCompilationErrorsSuite` if not available
> --
>
> Key: SPARK-38928
> URL: https://issues.apache.org/jira/browse/SPARK-38928
> Project: Spark
>  Issue Type: Test
>  Components: SQL, Tests
>Affects Versions: 3.4.0
>Reporter: William Hyun
>Priority: Major
> Fix For: 3.3.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-38928) Skip Pandas UDF test in `QueryCompilationErrorsSuite` if not available

2022-04-17 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-38928:
-

Assignee: William Hyun

> Skip Pandas UDF test in `QueryCompilationErrorsSuite` if not available
> --
>
> Key: SPARK-38928
> URL: https://issues.apache.org/jira/browse/SPARK-38928
> Project: Spark
>  Issue Type: Test
>  Components: SQL, Tests
>Affects Versions: 3.4.0
>Reporter: William Hyun
>Assignee: William Hyun
>Priority: Major
> Fix For: 3.3.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-38707) Allow user to insert into only certain columns of a table

2022-04-17 Thread morvenhuang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

morvenhuang resolved SPARK-38707.
-
Resolution: Duplicate

Fixed by SPARK-38795.

> Allow user to insert into only certain columns of a table
> -
>
> Key: SPARK-38707
> URL: https://issues.apache.org/jira/browse/SPARK-38707
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.1
>Reporter: morvenhuang
>Priority: Major
>
> When running an INSERT INTO statement, it's quite common that the user wants to 
> insert only certain columns, especially when the remaining columns have default 
> values.
> Currently, Spark allows the user to specify a column list for an INSERT INTO 
> statement only when the column list contains all columns of the table.
> Say we have a MySQL table:
>  
> {code:java}
> CREATE TABLE t1(c1 int(11), c2 int(11)){code}
>  
> This INSERT INTO statement works in Spark, 
>  
> {code:java}
> INSERT INTO t1(c1, c2) values(1, 1){code}
>  
> While this one ends up with an exception:
>  
> {code:java}
> INSERT INTO t1(c1) values(1){code}
>  
> {code:java}
> org.apache.spark.sql.AnalysisException: unknown requires that the data to be 
> inserted have the same number of columns as the target table: target table 
> has 2 column(s) but the inserted data has 1 column(s), including 0 partition 
> column(s) having constant value(s).
>   at 
> org.apache.spark.sql.errors.QueryCompilationErrors$.mismatchedInsertedDataColumnNumberError(QueryCompilationErrors.scala:1261)
>   at 
> org.apache.spark.sql.execution.datasources.PreprocessTableInsertion$.org$apache$spark$sql$execution$datasources$PreprocessTableInsertion$$preprocess(rules.scala:389)
>   at 
> org.apache.spark.sql.execution.datasources.PreprocessTableInsertion$$anonfun$apply$3.applyOrElse(rules.scala:429)
>   at 
> org.apache.spark.sql.execution.datasources.PreprocessTableInsertion$$anonfun$apply$3.applyOrElse(rules.scala:420)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsDownWithPruning$2(AnalysisHelper.scala:170)
>   at 
> org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:83)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsDownWithPruning$1(AnalysisHelper.scala:170)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.allowInvokingTransformsInAnalyzer(AnalysisHelper.scala:323)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsDownWithPruning(AnalysisHelper.scala:168)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsDownWithPruning$(AnalysisHelper.scala:164)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperatorsDownWithPruning(LogicalPlan.scala:30)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsWithPruning(AnalysisHelper.scala:99)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsWithPruning$(AnalysisHelper.scala:96)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperatorsWithPruning(LogicalPlan.scala:30)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperators(AnalysisHelper.scala:76)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperators$(AnalysisHelper.scala:75)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperators(LogicalPlan.scala:30)
>   at 
> org.apache.spark.sql.execution.datasources.PreprocessTableInsertion$.apply(rules.scala:420)
>   at 
> org.apache.spark.sql.execution.datasources.PreprocessTableInsertion$.apply(rules.scala:370)
>   ..{code}
>  
> I can provide a fix for this.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-38896) Use tryWithResource to recycling KVStoreIterator

2022-04-17 Thread Yang Jie (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Jie updated SPARK-38896:
-
Description: Use `Utils.tryWithResource` to recycle all `KVStoreIterator` instances 
opened by RocksDB/LevelDB  (was: Use `Utils.tryWithResource` to recycle all 
`KVStoreIterator` instances opened by RocksDB/LevelDB and remove the `finalize()` 
method from LevelDB/RocksDB)

> Use tryWithResource to recycling KVStoreIterator
> 
>
> Key: SPARK-38896
> URL: https://issues.apache.org/jira/browse/SPARK-38896
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core, SQL
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Priority: Minor
>
> Use `Utils.tryWithResource` to recycle all `KVStoreIterator` instances opened by 
> RocksDB/LevelDB.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-38896) Use tryWithResource to recycling KVStoreIterator

2022-04-17 Thread Yang Jie (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Jie updated SPARK-38896:
-
Summary: Use tryWithResource to recycling KVStoreIterator  (was: Use 
tryWithResource to recycling KVStoreIterator and remove finalize() from 
LevelDB/RocksDB)

> Use tryWithResource to recycling KVStoreIterator
> 
>
> Key: SPARK-38896
> URL: https://issues.apache.org/jira/browse/SPARK-38896
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core, SQL
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Priority: Minor
>
> Use `Utils.tryWithResource` to recycle all `KVStoreIterator` instances opened by 
> RocksDB/LevelDB and remove the `finalize()` method from LevelDB/RocksDB.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-38812) when i clean data ,I hope one rdd spill two rdd according clean data rule

2022-04-17 Thread gaokui (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17523476#comment-17523476
 ] 

gaokui commented on SPARK-38812:


I see SPARK-2373 and SPARK-6664.

Actually, there is a better approach than either of those: compute in a single job, 
not two.

For example:

val intRDD = sc.makeRDD(Array(1, 2, 3, 4, 5, 6))
intRDD.foreachPartition(iter => {
  val (it1, it2) = iter.partition(x => x <= 3)
  saveQualityError(it1)  // cannot use rdd.saveAsTextFile here; a custom write policy
                         // (flush interval and batch size) is needed.
  saveQualityGood(it2)   // same limitation as above.

  // A more serious problem is the "short bucket" effect: when a partition has little
  // good data and a lot of bad data, one write ends up waiting for the other.
})
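For reference, a minimal sketch of the two-job filter approach the issue description wants to avoid (the threshold and output paths are placeholders):
{code:scala}
val intRDD = sc.makeRDD(Array(1, 2, 3, 4, 5, 6))
val good = intRDD.filter(_ <= 3)        // first pass over the data
val bad  = intRDD.filter(_ > 3)         // second pass over the same data
good.saveAsTextFile("/tmp/good")        // job 1
bad.saveAsTextFile("/tmp/bad")          // job 2 -- the input is scanned twice
{code}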

> when i clean data ,I hope one rdd spill two rdd according clean data rule
> -
>
> Key: SPARK-38812
> URL: https://issues.apache.org/jira/browse/SPARK-38812
> Project: Spark
>  Issue Type: New Feature
>  Components: Spark Core
>Affects Versions: 3.2.1
>Reporter: gaokui
>Priority: Major
>
> When I clean data, one RDD is filtered by a single value (> or <), generating 
> two different sets: one is an error data file, the other is an error-free data 
> file.
> Now I use filter, but this requires two Spark DAG jobs, which costs too much.
> What I want is some code like iterator.span(predicate) that returns a tuple 
> (iter1, iter2).
> One dataset would be split into two datasets in a single rule-based data-cleaning 
> pass.
> I hope to compute once, not twice.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-38928) Skip Pandas UDF test in `QueryCompilationErrorsSuite` if not available

2022-04-17 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17523475#comment-17523475
 ] 

Apache Spark commented on SPARK-38928:
--

User 'williamhyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/36236

> Skip Pandas UDF test in `QueryCompilationErrorsSuite` if not available
> --
>
> Key: SPARK-38928
> URL: https://issues.apache.org/jira/browse/SPARK-38928
> Project: Spark
>  Issue Type: Test
>  Components: SQL, Tests
>Affects Versions: 3.4.0
>Reporter: William Hyun
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-38928) Skip Pandas UDF test in `QueryCompilationErrorsSuite` if not available

2022-04-17 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-38928:


Assignee: (was: Apache Spark)

> Skip Pandas UDF test in `QueryCompilationErrorsSuite` if not available
> --
>
> Key: SPARK-38928
> URL: https://issues.apache.org/jira/browse/SPARK-38928
> Project: Spark
>  Issue Type: Test
>  Components: SQL, Tests
>Affects Versions: 3.4.0
>Reporter: William Hyun
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-38928) Skip Pandas UDF test in `QueryCompilationErrorsSuite` if not available

2022-04-17 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-38928:


Assignee: Apache Spark

> Skip Pandas UDF test in `QueryCompilationErrorsSuite` if not available
> --
>
> Key: SPARK-38928
> URL: https://issues.apache.org/jira/browse/SPARK-38928
> Project: Spark
>  Issue Type: Test
>  Components: SQL, Tests
>Affects Versions: 3.4.0
>Reporter: William Hyun
>Assignee: Apache Spark
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-38868) `assert_true` fails unconditionnaly after `left_outer` joins

2022-04-17 Thread Bruce Robbins (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bruce Robbins updated SPARK-38868:
--
Affects Version/s: 3.3.0
   3.4.0

> `assert_true` fails unconditionnaly after `left_outer` joins
> 
>
> Key: SPARK-38868
> URL: https://issues.apache.org/jira/browse/SPARK-38868
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 3.1.1, 3.1.2, 3.2.0, 3.2.1, 3.3.0, 3.4.0
>Reporter: Fabien Dubosson
>Priority: Major
>
> When `assert_true` is used after a `left_outer` join, the assert exception is 
> raised even though all the rows meet the condition. Using an `inner` join 
> does not expose this issue.
>  
> {code:java}
> from pyspark.sql import SparkSession
> from pyspark.sql import functions as sf
> session = SparkSession.builder.getOrCreate()
> entries = session.createDataFrame(
>     [
>         ("a", 1),
>         ("b", 2),
>         ("c", 3),
>     ],
>     ["id", "outcome_id"],
> )
> outcomes = session.createDataFrame(
>     [
>         (1, 12),
>         (2, 34),
>         (3, 32),
>     ],
>     ["outcome_id", "outcome_value"],
> )
> # Inner join works as expected
> (
>     entries.join(outcomes, on="outcome_id", how="inner")
>     .withColumn("valid", sf.assert_true(sf.col("outcome_value") > 10))
>     .filter(sf.col("valid").isNull())
>     .show()
> )
> # Left join fails with «'('outcome_value > 10)' is not true!» even though it 
> is the case
> (
>     entries.join(outcomes, on="outcome_id", how="left_outer")
>     .withColumn("valid", sf.assert_true(sf.col("outcome_value") > 10))
>     .filter(sf.col("valid").isNull())
>     .show()
> ){code}
> Reproduced on `pyspark` versions `3.2.1`, `3.2.0`, `3.1.2` and `3.1.1`. I am 
> not sure whether "native" Spark exposes this issue as well; I don't have 
> the knowledge/setup to test that.
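For reference, a hedged Scala translation of the PySpark repro above (untested here; provided so the behaviour can be checked in "native" Spark as the reporter asks). It assumes a plain local SparkSession.
{code:scala}
// Scala translation of the PySpark repro above (illustrative; behaviour assumed identical).
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

val entries  = Seq(("a", 1), ("b", 2), ("c", 3)).toDF("id", "outcome_id")
val outcomes = Seq((1, 12), (2, 34), (3, 32)).toDF("outcome_id", "outcome_value")

// Inner join works as expected.
entries.join(outcomes, Seq("outcome_id"), "inner")
  .withColumn("valid", assert_true(col("outcome_value") > 10))
  .filter(col("valid").isNull)
  .show()

// Left outer join reportedly raises the assertion even though every row satisfies it.
entries.join(outcomes, Seq("outcome_id"), "left_outer")
  .withColumn("valid", assert_true(col("outcome_value") > 10))
  .filter(col("valid").isNull)
  .show()
{code}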



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-38928) Skip Pandas UDF test in `QueryCompilationErrorsSuite` if not available

2022-04-17 Thread William Hyun (Jira)
William Hyun created SPARK-38928:


 Summary: Skip Pandas UDF test in `QueryCompilationErrorsSuite` if 
not available
 Key: SPARK-38928
 URL: https://issues.apache.org/jira/browse/SPARK-38928
 Project: Spark
  Issue Type: Test
  Components: SQL, Tests
Affects Versions: 3.4.0
Reporter: William Hyun






--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-38812) when i clean data ,I hope one rdd spill two rdd according clean data rule

2022-04-17 Thread gaokui (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

gaokui updated SPARK-38812:
---
Description: 
When I clean data, one RDD is filtered by a single value (> or <), generating two 
different sets: one is an error data file, the other is an error-free data file.

Now I use filter, but this requires two Spark DAG jobs, which costs too much.

What I want is some code like iterator.span(predicate) that returns a tuple 
(iter1, iter2).

One dataset would be split into two datasets in a single rule-based data-cleaning pass.

I hope to compute once, not twice.

  was:
When I clean data, one RDD is filtered by a single value (> or <), generating two 
different sets: one is an error data file, the other is an error-free data file.

Now I use filter, but this requires two Spark DAG jobs, which costs too much.

What I want is some code like iterator.span(predicate) that returns a tuple 
(iter1, iter2).


> when i clean data ,I hope one rdd spill two rdd according clean data rule
> -
>
> Key: SPARK-38812
> URL: https://issues.apache.org/jira/browse/SPARK-38812
> Project: Spark
>  Issue Type: New Feature
>  Components: Spark Core
>Affects Versions: 3.2.1
>Reporter: gaokui
>Priority: Major
>
> When I clean data, one RDD is filtered by a single value (> or <), generating 
> two different sets: one is an error data file, the other is an error-free data 
> file.
> Now I use filter, but this requires two Spark DAG jobs, which costs too much.
> What I want is some code like iterator.span(predicate) that returns a tuple 
> (iter1, iter2).
> One dataset would be split into two datasets in a single rule-based data-cleaning 
> pass.
> I hope to compute once, not twice.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-38810) Upgrade Python black in /dev/reformat-python

2022-04-17 Thread Hyukjin Kwon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17523470#comment-17523470
 ] 

Hyukjin Kwon commented on SPARK-38810:
--

We can pin the version of {{click}} in dev/reformat-python if it needs a lot of 
changes in the code. Feel free to go ahead and pin the version if you're 
interested in this, [~bjornjorgensen].

> Upgrade Python black in /dev/reformat-python
> 
>
> Key: SPARK-38810
> URL: https://issues.apache.org/jira/browse/SPARK-38810
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.4.0
>Reporter: Bjørn Jørgensen
>Priority: Major
>
> There are some problems with Python Black 
> [https://github.com/psf/black/issues/2964|https://github.com/psf/black/issues/2964]
> And we have BLACK_VERSION="21.12b0" hard coded in `/dev/reformat-python`
> python -m black
> Traceback (most recent call last):
>   File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
> return _run_code(code, main_globals, None,
>   File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
> exec(code, run_globals)
>   File "/home/bjorn/.local/lib/python3.10/site-packages/black/__main__.py", 
> line 3, in 
> patched_main()
>   File "/home/bjorn/.local/lib/python3.10/site-packages/black/__init__.py", 
> line 1372, in patched_main
> patch_click()
>   File "/home/bjorn/.local/lib/python3.10/site-packages/black/__init__.py", 
> line 1358, in patch_click
> from click import _unicodefun
> ImportError: cannot import name '_unicodefun' from 'click' 
> (/usr/lib/python3.10/site-packages/click/__init__.py)



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-38904) Low cost DataFrame schema swap util

2022-04-17 Thread Hyukjin Kwon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17523469#comment-17523469
 ] 

Hyukjin Kwon commented on SPARK-38904:
--

It would be great to have such an API, actually. Feel free to go ahead with a PR if 
you're interested in that.

> Low cost DataFrame schema swap util
> ---
>
> Key: SPARK-38904
> URL: https://issues.apache.org/jira/browse/SPARK-38904
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 3.2.1
>Reporter: Rafal Wojdyla
>Priority: Major
>
> This question is related to [https://stackoverflow.com/a/37090151/1661491]. 
> Let's assume I have a pyspark DataFrame with certain schema, and I would like 
> to overwrite that schema with a new schema that I *know* is compatible, I 
> could do:
> {code:python}
> df: DataFrame
> new_schema = ...
> df.rdd.toDF(schema=new_schema)
> {code}
> Unfortunately this triggers computation as described in the link above. Is 
> there a way to do that at the metadata level (or lazy), without eagerly 
> triggering computation or conversions?
> Edit, note:
>  * the schema can be arbitrarily complicated (nested etc)
>  * new schema includes updates to description, nullability and additional 
> metadata (bonus points for updates to the type)
>  * I would like to avoid writing a custom query expression generator, 
> *unless* there's one already built into Spark that can generate query based 
> on the schema/{{{}StructType{}}}
> Copied from: 
> [https://stackoverflow.com/questions/71610435/how-to-overwrite-pyspark-dataframe-schema-without-data-scan]
> See POC of workaround/util in 
> [https://github.com/ravwojdyla/spark-schema-utils]
> Also posted in 
> [https://lists.apache.org/thread/5ds0f7chzp1s3h10tvjm3r96g769rvpj]



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-38917) StreamingQuery.processAllAvailable() blocks forever on queries containing mapGroupsWithState(GroupStateTimeout.ProcessingTimeTimeout())

2022-04-17 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-38917:
-
Priority: Major  (was: Critical)

> StreamingQuery.processAllAvailable() blocks forever on queries containing 
> mapGroupsWithState(GroupStateTimeout.ProcessingTimeTimeout())
> ---
>
> Key: SPARK-38917
> URL: https://issues.apache.org/jira/browse/SPARK-38917
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 3.1.2
>Reporter: Trevor Christman
>Priority: Major
>
> StreamingQuery.processAllAvailable() blocks forever when called on queries 
> containing a mapGroupsWithState operation configured with 
> GroupStateTimeout.ProcessingTimeTimeout().
>  
> I think processAllAvailable() should unblock once all incoming data has been 
> processed AND no existing group state currently has a timeout set.
>  
> Sample code to demonstrate this failure follows:
> {code:java}
> def demoSparkProcessAllAvailableBug() : Unit = {
> val localSpark = SparkSession
>   .builder()
>   .master("local[*]")
>   .appName("demoSparkProcessAllAvailableBug")
>   .config("spark.driver.host", "localhost")
>   .getOrCreate()
> import localSpark.implicits._
> val demoDataStream = MemoryStream[BugDemo.NameNumberData](1, 
> localSpark.sqlContext)
> demoDataStream.addData(BugDemo.NameNumberData("Alice", 1))
> demoDataStream.addData(BugDemo.NameNumberData("Bob", 2))
> demoDataStream.addData(BugDemo.NameNumberData("Alice", 3))
> demoDataStream.addData(BugDemo.NameNumberData("Bob", 4))
> // StreamingQuery.processAllAvailable() is successful when executing 
> against NoTimeout,
> // but blocks forever when executing against ProcessingTimeTimeout
> val timeoutTypes = List(GroupStateTimeout.NoTimeout(), 
> GroupStateTimeout.ProcessingTimeTimeout())
> for (timeoutType <- timeoutTypes) {
>   val totalByName = demoDataStream.toDF()
> .as[BugDemo.NameNumberData]
> .groupByKey(_.Name)
> .mapGroupsWithState(timeoutType)(BugDemo.summateRunningTotal)
>   val totalByNameQuery = totalByName
> .writeStream
> .format("console")
> .outputMode("update")
> .start()
>   println(s"${timeoutType} query starting to processAllAvailable()")
>   totalByNameQuery.processAllAvailable()
>   println(s"${timeoutType} query completed processAllAvailable()")
>   totalByNameQuery.stop()
> }
>   }
> }
> object BugDemo {
>   def summateRunningTotal(name: String, input: Iterator[NameNumberData], 
> groupState: GroupState[RunningTotal]): NameNumberData = {
> var currentTotal: Int = if (groupState.exists) {
>   groupState.get.Total
> } else {
>   0
> }
> for (nameNumberData <- input) {
>   currentTotal += nameNumberData.Number
> }
> groupState.update(RunningTotal(currentTotal))
> NameNumberData(name, currentTotal)
>   }
>   case class NameNumberData(
> Name: String,
> Number: Integer
>   )
>   case class RunningTotal(
> Total: Integer
>   )
> } {code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-38816) Wrong comment in random matrix generator in spark-als algorithm

2022-04-17 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-38816.
--
Fix Version/s: 3.3.0
   3.2.2
   3.1.3
   Resolution: Fixed

Issue resolved by pull request 36228
[https://github.com/apache/spark/pull/36228]

> Wrong comment in random matrix generator in spark-als algorithm 
> 
>
> Key: SPARK-38816
> URL: https://issues.apache.org/jira/browse/SPARK-38816
> Project: Spark
>  Issue Type: Improvement
>  Components: ML
>Affects Versions: 3.1.1, 3.1.2, 3.2.1
>Reporter: Nickolay
>Assignee: Sean R. Owen
>Priority: Minor
> Fix For: 3.3.0, 3.2.2, 3.1.3
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> In the Spark ALS algorithm we need to initialize nonnegative factor matrices 
> for users and items. 
> In ALS:
>  
> {code:java}
> private def initialize[ID](
> inBlocks: RDD[(Int, InBlock[ID])],
> rank: Int,
> seed: Long): RDD[(Int, FactorBlock)] = {
>   // Choose a unit vector uniformly at random from the unit sphere, but from 
> the
>   // "first quadrant" where all elements are nonnegative. This can be done by 
> choosing
>   // elements distributed as Normal(0,1) and taking the absolute value, and 
> then normalizing.
>   // This appears to create factorizations that have a slightly better 
> reconstruction
>   // (<1%) compared picking elements uniformly at random in [0,1].
>   inBlocks.mapPartitions({ iter =>
> iter.map {
>   case (srcBlockId, inBlock) =>
> val random: XORShiftRandom = new XORShiftRandom(byteswap64(seed ^ 
> srcBlockId))
> val factors: Array[Array[Float]] = Array.fill(inBlock.srcIds.length) {
>   val factor = Array.fill(rank)(random.nextGaussian().toFloat)
>   val nrm: Float = blas.snrm2(rank, factor, 1)
>   blas.sscal(rank, 1.0f / nrm, factor, 1)
>   factor
> }
> (srcBlockId, factors)
> }
>   }, preservesPartitioning = true)
> } {code}
> In the comments, the author writes that we are generating a matrix filled 
> with positive numbers. In the code we use random.nextGaussian().toFloat. But 
> if we look at the documentation of the nextGaussian method, we can see that 
> it also returns negative numbers: 
>  
> {code:java}
> /** 
> * @return the next pseudorandom, Gaussian ("normally") distributed
>  * {@code double} value with mean {@code 0.0} and
>  * standard deviation {@code 1.0} from this random number
>  * generator's sequence
>  */
> synchronized public double nextGaussian() {
> // See Knuth, ACP, Section 3.4.1 Algorithm C.
> if (haveNextNextGaussian) {
> haveNextNextGaussian = false;
> return nextNextGaussian;
> } else {
> double v1, v2, s;
> do {
> v1 = 2 * nextDouble() - 1; // between -1 and 1
> v2 = 2 * nextDouble() - 1; // between -1 and 1
> s = v1 * v1 + v2 * v2;
> } while (s >= 1 || s == 0);
> double multiplier = StrictMath.sqrt(-2 * StrictMath.log(s)/s);
> nextNextGaussian = v2 * multiplier;
> haveNextNextGaussian = true;
> return v1 * multiplier;
> }
> }
>  {code}
>  
> The result is a matrix with negative values
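A small illustration of the point (plain Python, not the Spark code above):
L2-normalizing a Gaussian-sampled vector preserves the sign of each entry, so
the result can contain negative values; only taking the absolute value before
normalizing gives the nonnegative "first quadrant" vector the comment
describes.

{code:python}
# Illustrative only: normalizing keeps negative entries; abs() first does not.
import math
import random

rank = 5
factor = [random.gauss(0.0, 1.0) for _ in range(rank)]

norm = math.sqrt(sum(x * x for x in factor))
normalized = [x / norm for x in factor]             # may contain negatives

nonneg = [abs(x) for x in factor]
nonneg_norm = math.sqrt(sum(x * x for x in nonneg))
first_quadrant = [x / nonneg_norm for x in nonneg]  # all entries >= 0

print(any(x < 0 for x in normalized), any(x < 0 for x in first_quadrant))
{code}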



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-38816) Wrong comment in random matrix generator in spark-als algorithm

2022-04-17 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-38816:


Assignee: Sean R. Owen

> Wrong comment in random matrix generator in spark-als algorithm 
> 
>
> Key: SPARK-38816
> URL: https://issues.apache.org/jira/browse/SPARK-38816
> Project: Spark
>  Issue Type: Improvement
>  Components: ML
>Affects Versions: 3.1.1, 3.1.2, 3.2.1
>Reporter: Nickolay
>Assignee: Sean R. Owen
>Priority: Minor
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> In the Spark ALS algorithm we need to initialize nonnegative factor matrices 
> for users and items. 
> In ALS:
>  
> {code:java}
> private def initialize[ID](
> inBlocks: RDD[(Int, InBlock[ID])],
> rank: Int,
> seed: Long): RDD[(Int, FactorBlock)] = {
>   // Choose a unit vector uniformly at random from the unit sphere, but from 
> the
>   // "first quadrant" where all elements are nonnegative. This can be done by 
> choosing
>   // elements distributed as Normal(0,1) and taking the absolute value, and 
> then normalizing.
>   // This appears to create factorizations that have a slightly better 
> reconstruction
>   // (<1%) compared picking elements uniformly at random in [0,1].
>   inBlocks.mapPartitions({ iter =>
> iter.map {
>   case (srcBlockId, inBlock) =>
> val random: XORShiftRandom = new XORShiftRandom(byteswap64(seed ^ 
> srcBlockId))
> val factors: Array[Array[Float]] = Array.fill(inBlock.srcIds.length) {
>   val factor = Array.fill(rank)(random.nextGaussian().toFloat)
>   val nrm: Float = blas.snrm2(rank, factor, 1)
>   blas.sscal(rank, 1.0f / nrm, factor, 1)
>   factor
> }
> (srcBlockId, factors)
> }
>   }, preservesPartitioning = true)
> } {code}
> In the comments, the author writes that we are generating a matrix filled 
> with positive numbers. In the code we use random.nextGaussian().toFloat. But 
> if we look at the documentation of the nextGaussian method, we can see that 
> it also returns negative numbers: 
>  
> {code:java}
> /** 
> * @return the next pseudorandom, Gaussian ("normally") distributed
>  * {@code double} value with mean {@code 0.0} and
>  * standard deviation {@code 1.0} from this random number
>  * generator's sequence
>  */
> synchronized public double nextGaussian() {
> // See Knuth, ACP, Section 3.4.1 Algorithm C.
> if (haveNextNextGaussian) {
> haveNextNextGaussian = false;
> return nextNextGaussian;
> } else {
> double v1, v2, s;
> do {
> v1 = 2 * nextDouble() - 1; // between -1 and 1
> v2 = 2 * nextDouble() - 1; // between -1 and 1
> s = v1 * v1 + v2 * v2;
> } while (s >= 1 || s == 0);
> double multiplier = StrictMath.sqrt(-2 * StrictMath.log(s)/s);
> nextNextGaussian = v2 * multiplier;
> haveNextNextGaussian = true;
> return v1 * multiplier;
> }
> }
>  {code}
>  
> The result is a matrix with negative values



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-38927) Skip NumPy/Pandas tests in `test_rdd.py` if not available

2022-04-17 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-38927:
--
Affects Version/s: 3.3.0
   3.2.2

> Skip NumPy/Pandas tests in `test_rdd.py` if not available
> -
>
> Key: SPARK-38927
> URL: https://issues.apache.org/jira/browse/SPARK-38927
> Project: Spark
>  Issue Type: Test
>  Components: PySpark, Tests
>Affects Versions: 3.3.0, 3.2.2, 3.4.0
>Reporter: William Hyun
>Assignee: William Hyun
>Priority: Major
> Fix For: 3.3.0, 3.2.2
>
>
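The change itself is small; for context, here is a hedged sketch of the usual
guard pattern for optional test dependencies (the names have_numpy/have_pandas
are illustrative, not necessarily what the actual patch uses):

{code:python}
# Illustrative pattern: skip tests when NumPy/pandas are not installed.
import unittest
from importlib.util import find_spec

have_numpy = find_spec("numpy") is not None
have_pandas = find_spec("pandas") is not None

class RDDTests(unittest.TestCase):
    @unittest.skipIf(not have_numpy, "NumPy not installed")
    def test_numpy_dependent(self):
        import numpy as np
        self.assertEqual(int(np.arange(3).sum()), 3)

    @unittest.skipIf(not have_pandas, "pandas not installed")
    def test_pandas_dependent(self):
        import pandas as pd
        self.assertEqual(len(pd.DataFrame({"a": [1, 2]})), 2)

if __name__ == "__main__":
    unittest.main()
{code}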




--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-38927) Skip NumPy/Pandas tests in `test_rdd.py` if not available

2022-04-17 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-38927:
-

Assignee: William Hyun

> Skip NumPy/Pandas tests in `test_rdd.py` if not available
> -
>
> Key: SPARK-38927
> URL: https://issues.apache.org/jira/browse/SPARK-38927
> Project: Spark
>  Issue Type: Test
>  Components: PySpark, Tests
>Affects Versions: 3.4.0
>Reporter: William Hyun
>Assignee: William Hyun
>Priority: Major
> Fix For: 3.3.0, 3.2.2
>
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-38927) Skip NumPy/Pandas tests in `test_rdd.py` if not available

2022-04-17 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-38927.
---
Fix Version/s: 3.3.0
   3.2.2
   Resolution: Fixed

Issue resolved by pull request 36235
[https://github.com/apache/spark/pull/36235]

> Skip NumPy/Pandas tests in `test_rdd.py` if not available
> -
>
> Key: SPARK-38927
> URL: https://issues.apache.org/jira/browse/SPARK-38927
> Project: Spark
>  Issue Type: Test
>  Components: PySpark, Tests
>Affects Versions: 3.4.0
>Reporter: William Hyun
>Priority: Major
> Fix For: 3.3.0, 3.2.2
>
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-38927) Skip NumPy/Pandas tests in `test_rdd.py` if not available

2022-04-17 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17523431#comment-17523431
 ] 

Apache Spark commented on SPARK-38927:
--

User 'williamhyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/36235

> Skip NumPy/Pandas tests in `test_rdd.py` if not available
> -
>
> Key: SPARK-38927
> URL: https://issues.apache.org/jira/browse/SPARK-38927
> Project: Spark
>  Issue Type: Test
>  Components: PySpark, Tests
>Affects Versions: 3.4.0
>Reporter: William Hyun
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-38927) Skip NumPy/Pandas tests in `test_rdd.py` if not available

2022-04-17 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-38927:


Assignee: Apache Spark

> Skip NumPy/Pandas tests in `test_rdd.py` if not available
> -
>
> Key: SPARK-38927
> URL: https://issues.apache.org/jira/browse/SPARK-38927
> Project: Spark
>  Issue Type: Test
>  Components: PySpark, Tests
>Affects Versions: 3.4.0
>Reporter: William Hyun
>Assignee: Apache Spark
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-38927) Skip NumPy/Pandas tests in `test_rdd.py` if not available

2022-04-17 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-38927:


Assignee: (was: Apache Spark)

> Skip NumPy/Pandas tests in `test_rdd.py` if not available
> -
>
> Key: SPARK-38927
> URL: https://issues.apache.org/jira/browse/SPARK-38927
> Project: Spark
>  Issue Type: Test
>  Components: PySpark, Tests
>Affects Versions: 3.4.0
>Reporter: William Hyun
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-38927) Skip NumPy/Pandas tests in `test_rdd.py` if not available

2022-04-17 Thread William Hyun (Jira)
William Hyun created SPARK-38927:


 Summary: Skip NumPy/Pandas tests in `test_rdd.py` if not available
 Key: SPARK-38927
 URL: https://issues.apache.org/jira/browse/SPARK-38927
 Project: Spark
  Issue Type: Test
  Components: PySpark, Tests
Affects Versions: 3.4.0
Reporter: William Hyun






--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-38810) Upgrade Python black in /dev/reformat-python

2022-04-17 Thread Jira


[ 
https://issues.apache.org/jira/browse/SPARK-38810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17523427#comment-17523427
 ] 

Bjørn Jørgensen edited comment on SPARK-38810 at 4/17/22 6:34 PM:
--

You are right, Sean, it's not a Spark bug. But now that Black has been 
updated, and the script /dev/reformat-python only works with 
BLACK_VERSION="21.12b0", it's a problem for those of us who have a newer 
version of Black. 

The title here could be "Make /dev/reformat-python work with BLACK_VERSIONs 
newer than 21.12b0". 

I have changed this from a bug to an improvement. 


was (Author: bjornjorgensen):
You are right Sean, it`s not a Spark bug. But now that Black is updated, and 
the script /dev/reformat-python only work with BLACK_VERSION="21.12b0" it`s a 
problem for us that have a newer version of Black. 

I have changed this from a bug to improvement. 

> Upgrade Python black in /dev/reformat-python
> 
>
> Key: SPARK-38810
> URL: https://issues.apache.org/jira/browse/SPARK-38810
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.4.0
>Reporter: Bjørn Jørgensen
>Priority: Major
>
> There are some problems with Python Black 
> [https://github.com/psf/black/issues/2964|https://github.com/psf/black/issues/2964]
> And we have BLACK_VERSION="21.12b0" hard coded in `/dev/reformat-python`
> python -m black
> Traceback (most recent call last):
>   File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
> return _run_code(code, main_globals, None,
>   File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
> exec(code, run_globals)
>   File "/home/bjorn/.local/lib/python3.10/site-packages/black/__main__.py", 
> line 3, in <module>
> patched_main()
>   File "/home/bjorn/.local/lib/python3.10/site-packages/black/__init__.py", 
> line 1372, in patched_main
> patch_click()
>   File "/home/bjorn/.local/lib/python3.10/site-packages/black/__init__.py", 
> line 1358, in patch_click
> from click import _unicodefun
> ImportError: cannot import name '_unicodefun' from 'click' 
> (/usr/lib/python3.10/site-packages/click/__init__.py)



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-38810) Upgrade Python black in /dev/reformat-python

2022-04-17 Thread Jira


[ 
https://issues.apache.org/jira/browse/SPARK-38810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17523427#comment-17523427
 ] 

Bjørn Jørgensen edited comment on SPARK-38810 at 4/17/22 6:27 PM:
--

You are right, Sean, it's not a Spark bug. But now that Black has been 
updated, and the script /dev/reformat-python only works with 
BLACK_VERSION="21.12b0", it's a problem for those of us who have a newer 
version of Black. 

I have changed this from a bug to an improvement. 


was (Author: bjornjorgensen):
You are right Sean it not a Spark bug. But now that Black is updated. And the 
script /dev/reformat-python only work with BLACK_VERSION="21.12b0" it`s a 
problem for us that have a newer version of Black. 

I have changed this from a bug to improvement. 

> Upgrade Python black in /dev/reformat-python
> 
>
> Key: SPARK-38810
> URL: https://issues.apache.org/jira/browse/SPARK-38810
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.4.0
>Reporter: Bjørn Jørgensen
>Priority: Major
>
> There are some problems with Python Black 
> [https://github.com/psf/black/issues/2964|https://github.com/psf/black/issues/2964]
> And we have BLACK_VERSION="21.12b0" hard coded in `/dev/reformat-python`
> python -m black
> Traceback (most recent call last):
>   File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
> return _run_code(code, main_globals, None,
>   File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
> exec(code, run_globals)
>   File "/home/bjorn/.local/lib/python3.10/site-packages/black/__main__.py", 
> line 3, in <module>
> patched_main()
>   File "/home/bjorn/.local/lib/python3.10/site-packages/black/__init__.py", 
> line 1372, in patched_main
> patch_click()
>   File "/home/bjorn/.local/lib/python3.10/site-packages/black/__init__.py", 
> line 1358, in patch_click
> from click import _unicodefun
> ImportError: cannot import name '_unicodefun' from 'click' 
> (/usr/lib/python3.10/site-packages/click/__init__.py)



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-38810) Upgrade Python black in /dev/reformat-python

2022-04-17 Thread Jira


[ 
https://issues.apache.org/jira/browse/SPARK-38810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17523427#comment-17523427
 ] 

Bjørn Jørgensen commented on SPARK-38810:
-

You are right, Sean, it's not a Spark bug. But now that Black has been 
updated, and the script /dev/reformat-python only works with 
BLACK_VERSION="21.12b0", it's a problem for those of us who have a newer 
version of Black. 

I have changed this from a bug to an improvement. 

> Upgrade Python black in /dev/reformat-python
> 
>
> Key: SPARK-38810
> URL: https://issues.apache.org/jira/browse/SPARK-38810
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.4.0
>Reporter: Bjørn Jørgensen
>Priority: Major
>
> There are some problems with Python Black 
> [https://github.com/psf/black/issues/2964|https://github.com/psf/black/issues/2964]
> And we have BLACK_VERSION="21.12b0" hard coded in `/dev/reformat-python`
> python -m black
> Traceback (most recent call last):
>   File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
> return _run_code(code, main_globals, None,
>   File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
> exec(code, run_globals)
>   File "/home/bjorn/.local/lib/python3.10/site-packages/black/__main__.py", 
> line 3, in <module>
> patched_main()
>   File "/home/bjorn/.local/lib/python3.10/site-packages/black/__init__.py", 
> line 1372, in patched_main
> patch_click()
>   File "/home/bjorn/.local/lib/python3.10/site-packages/black/__init__.py", 
> line 1358, in patch_click
> from click import _unicodefun
> ImportError: cannot import name '_unicodefun' from 'click' 
> (/usr/lib/python3.10/site-packages/click/__init__.py)



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-38924) Update dataTables to 1.10.25 for security issue

2022-04-17 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang reassigned SPARK-38924:
--

Assignee: Sean R. Owen

> Update dataTables to 1.10.25 for security issue
> ---
>
> Key: SPARK-38924
> URL: https://issues.apache.org/jira/browse/SPARK-38924
> Project: Spark
>  Issue Type: Improvement
>  Components: Web UI
>Affects Versions: 3.3.0
>Reporter: Sean R. Owen
>Assignee: Sean R. Owen
>Priority: Minor
>
> https://nvd.nist.gov/vuln/detail/CVE-2020-28458 affects datatables up to 
> 1.10.21 and we're on 1.10.20. It may or may not affect Spark, but updating to 
> 1.10.25 at least should be easy



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-38924) Update dataTables to 1.10.25 for security issue

2022-04-17 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang resolved SPARK-38924.

Fix Version/s: 3.3.0
   Resolution: Fixed

Issue resolved by pull request 36226
[https://github.com/apache/spark/pull/36226]

> Update dataTables to 1.10.25 for security issue
> ---
>
> Key: SPARK-38924
> URL: https://issues.apache.org/jira/browse/SPARK-38924
> Project: Spark
>  Issue Type: Improvement
>  Components: Web UI
>Affects Versions: 3.3.0
>Reporter: Sean R. Owen
>Assignee: Sean R. Owen
>Priority: Minor
> Fix For: 3.3.0
>
>
> https://nvd.nist.gov/vuln/detail/CVE-2020-28458 affects datatables up to 
> 1.10.21 and we're on 1.10.20. It may or may not affect Spark, but updating to 
> 1.10.25 at least should be easy



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-38810) Upgrade Python black in /dev/reformat-python

2022-04-17 Thread Jira


 [ 
https://issues.apache.org/jira/browse/SPARK-38810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bjørn Jørgensen updated SPARK-38810:

Issue Type: Improvement  (was: Bug)

> Upgrade Python black in /dev/reformat-python
> 
>
> Key: SPARK-38810
> URL: https://issues.apache.org/jira/browse/SPARK-38810
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.4.0
>Reporter: Bjørn Jørgensen
>Priority: Major
>
> There are some problems with Python Black 
> [https://github.com/psf/black/issues/2964|https://github.com/psf/black/issues/2964]
> And we have BLACK_VERSION="21.12b0" hard coded in `/dev/reformat-python`
> python -m black
> Traceback (most recent call last):
>   File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
> return _run_code(code, main_globals, None,
>   File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
> exec(code, run_globals)
>   File "/home/bjorn/.local/lib/python3.10/site-packages/black/__main__.py", 
> line 3, in <module>
> patched_main()
>   File "/home/bjorn/.local/lib/python3.10/site-packages/black/__init__.py", 
> line 1372, in patched_main
> patch_click()
>   File "/home/bjorn/.local/lib/python3.10/site-packages/black/__init__.py", 
> line 1358, in patch_click
> from click import _unicodefun
> ImportError: cannot import name '_unicodefun' from 'click' 
> (/usr/lib/python3.10/site-packages/click/__init__.py)



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-38810) Upgrade Python black in /dev/reformat-python

2022-04-17 Thread Jira


[ 
https://issues.apache.org/jira/browse/SPARK-38810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17523425#comment-17523425
 ] 

Bjørn Jørgensen edited comment on SPARK-38810 at 4/17/22 6:21 PM:
--

[~srowen] Yes, I think so. Black has been updated, but the update had some 
problems, and I haven't seen any fix for this yet. 

One thing we can do is to change BLACK_VERSION="21.12b0" to 
BLACK_VERSION=>"21.12b0"




was (Author: bjornjorgensen):
[~srowen] Yes, I think so. Black is updated, but the update did have some 
problems. And I have seen any fix for this yet. 

One thing we can do is to change BLACK_VERSION="21.12b0" to 
BLACK_VERSION=>"21.12b0"



> Upgrade Python black in /dev/reformat-python
> 
>
> Key: SPARK-38810
> URL: https://issues.apache.org/jira/browse/SPARK-38810
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 3.4.0
>Reporter: Bjørn Jørgensen
>Priority: Major
>
> There are some problems with Python Black 
> [https://github.com/psf/black/issues/2964|https://github.com/psf/black/issues/2964]
> And we have BLACK_VERSION="21.12b0" hard coded in `/dev/reformat-python`
> python -m black
> Traceback (most recent call last):
>   File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
> return _run_code(code, main_globals, None,
>   File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
> exec(code, run_globals)
>   File "/home/bjorn/.local/lib/python3.10/site-packages/black/__main__.py", 
> line 3, in <module>
> patched_main()
>   File "/home/bjorn/.local/lib/python3.10/site-packages/black/__init__.py", 
> line 1372, in patched_main
> patch_click()
>   File "/home/bjorn/.local/lib/python3.10/site-packages/black/__init__.py", 
> line 1358, in patch_click
> from click import _unicodefun
> ImportError: cannot import name '_unicodefun' from 'click' 
> (/usr/lib/python3.10/site-packages/click/__init__.py)



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-38810) Upgrade Python black in /dev/reformat-python

2022-04-17 Thread Jira


[ 
https://issues.apache.org/jira/browse/SPARK-38810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17523425#comment-17523425
 ] 

Bjørn Jørgensen commented on SPARK-38810:
-

[~srowen] Yes, I think so. Black has been updated, but the update had some 
problems, and I haven't seen any fix for this yet. 

One thing we can do is to change BLACK_VERSION="21.12b0" to 
BLACK_VERSION=>"21.12b0"



> Upgrade Python black in /dev/reformat-python
> 
>
> Key: SPARK-38810
> URL: https://issues.apache.org/jira/browse/SPARK-38810
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 3.4.0
>Reporter: Bjørn Jørgensen
>Priority: Major
>
> There are some problems with Python Black 
> [https://github.com/psf/black/issues/2964|https://github.com/psf/black/issues/2964]
> And we have BLACK_VERSION="21.12b0" hard coded in `/dev/reformat-python`
> python -m black
> Traceback (most recent call last):
>   File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
> return _run_code(code, main_globals, None,
>   File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
> exec(code, run_globals)
>   File "/home/bjorn/.local/lib/python3.10/site-packages/black/__main__.py", 
> line 3, in <module>
> patched_main()
>   File "/home/bjorn/.local/lib/python3.10/site-packages/black/__init__.py", 
> line 1372, in patched_main
> patch_click()
>   File "/home/bjorn/.local/lib/python3.10/site-packages/black/__init__.py", 
> line 1358, in patch_click
> from click import _unicodefun
> ImportError: cannot import name '_unicodefun' from 'click' 
> (/usr/lib/python3.10/site-packages/click/__init__.py)



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-38409) PrometheusServlet exports gauges with null values

2022-04-17 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17523401#comment-17523401
 ] 

Apache Spark commented on SPARK-38409:
--

User 'jkylling' has created a pull request for this issue:
https://github.com/apache/spark/pull/36234

> PrometheusServlet exports gauges with null values
> -
>
> Key: SPARK-38409
> URL: https://issues.apache.org/jira/browse/SPARK-38409
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.2.1
>Reporter: Jonas
>Priority: Major
>
> [PrometheusServlet|#L63] exports gauges with value null as {{{}gauge_name{} 
> null{}}}. This can for instance happen if one initializes a 
> [DefaultSettableGauge|https://www.javadoc.io/static/io.dropwizard.metrics/metrics-core/4.2.0-beta.3/com/codahale/metrics/DefaultSettableGauge.html]
>  without a default value. Null values are not scraped by Prometheus as null 
> is not a valid floating point value. A null value prevents any metrics from 
> being scraped and the endpoint will be considered to be down by Prometheus. A 
> solution could be to omit listing the gauge if its value is null.
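To make the proposal concrete, here is a hedged sketch of the intended
behavior (the real PrometheusServlet is Scala; this is only an illustration
with hypothetical names): when rendering gauges into the Prometheus text
format, silently skip any gauge whose value is missing or not representable as
a float, so one broken gauge cannot make the whole endpoint unscrapable.

{code:python}
# Illustrative sketch: omit gauges whose value is None or non-numeric instead
# of emitting an invalid line such as "gauge_name null".
def render_gauges(gauges: dict) -> str:
    lines = []
    for name, value in gauges.items():
        try:
            number = float(value)      # None or "null" raises here
        except (TypeError, ValueError):
            continue                   # drop the gauge entirely
        lines.append(f"{name} {number}")
    return "\n".join(lines)

print(render_gauges({"jvm_heap_used": 123.0, "broken_gauge": None}))
{code}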



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-38409) PrometheusServlet exports gauges with null values

2022-04-17 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-38409:


Assignee: (was: Apache Spark)

> PrometheusServlet exports gauges with null values
> -
>
> Key: SPARK-38409
> URL: https://issues.apache.org/jira/browse/SPARK-38409
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.2.1
>Reporter: Jonas
>Priority: Major
>
> [PrometheusServlet|#L63] exports gauges with value null as {{{}gauge_name{} 
> null{}}}. This can for instance happen if one initializes a 
> [DefaultSettableGauge|https://www.javadoc.io/static/io.dropwizard.metrics/metrics-core/4.2.0-beta.3/com/codahale/metrics/DefaultSettableGauge.html]
>  without a default value. Null values are not scraped by Prometheus as null 
> is not a valid floating point value. A null value prevents any metrics from 
> being scraped and the endpoint will be considered to be down by Prometheus. A 
> solution could be to omit listing the gauge if its value is null.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-38409) PrometheusServlet exports gauges with null values

2022-04-17 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17523400#comment-17523400
 ] 

Apache Spark commented on SPARK-38409:
--

User 'jkylling' has created a pull request for this issue:
https://github.com/apache/spark/pull/36234

> PrometheusServlet exports gauges with null values
> -
>
> Key: SPARK-38409
> URL: https://issues.apache.org/jira/browse/SPARK-38409
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.2.1
>Reporter: Jonas
>Priority: Major
>
> [PrometheusServlet|#L63] exports gauges with value null as {{{}gauge_name{} 
> null{}}}. This can for instance happen if one initializes a 
> [DefaultSettableGauge|https://www.javadoc.io/static/io.dropwizard.metrics/metrics-core/4.2.0-beta.3/com/codahale/metrics/DefaultSettableGauge.html]
>  without a default value. Null values are not scraped by Prometheus as null 
> is not a valid floating point value. A null value prevents any metrics from 
> being scraped and the endpoint will be considered to be down by Prometheus. A 
> solution could be to omit listing the gauge if its value is null.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-38409) PrometheusServlet exports gauges with null values

2022-04-17 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-38409:


Assignee: Apache Spark

> PrometheusServlet exports gauges with null values
> -
>
> Key: SPARK-38409
> URL: https://issues.apache.org/jira/browse/SPARK-38409
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.2.1
>Reporter: Jonas
>Assignee: Apache Spark
>Priority: Major
>
> [PrometheusServlet|#L63] exports gauges with value null as {{{}gauge_name{} 
> null{}}}. This can for instance happen if one initializes a 
> [DefaultSettableGauge|https://www.javadoc.io/static/io.dropwizard.metrics/metrics-core/4.2.0-beta.3/com/codahale/metrics/DefaultSettableGauge.html]
>  without a default value. Null values are not scraped by Prometheus as null 
> is not a valid floating point value. A null value prevents any metrics from 
> being scraped and the endpoint will be considered to be down by Prometheus. A 
> solution could be to omit listing the gauge if its value is null.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-38913) Output identifiers in error messages in SQL style

2022-04-17 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk resolved SPARK-38913.
--
Fix Version/s: (was: 3.3.0)
   Resolution: Fixed

Issue resolved by pull request 36210
[https://github.com/apache/spark/pull/36210]

> Output identifiers in error messages in SQL style
> -
>
> Key: SPARK-38913
> URL: https://issues.apache.org/jira/browse/SPARK-38913
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0, 3.4.0
>Reporter: Max Gekk
>Assignee: Max Gekk
>Priority: Major
> Fix For: 3.4.0
>
>
> All identifiers like table names should be printed in SQL style in error 
> messages. For example, table name db.tbl should be highlighted as `db`.`tbl` 
> to make it more visible in error messages.
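As a tiny illustration of the quoting rule (the actual change lives in Spark's
Scala error-message helpers; this sketch only shows the idea): each part of
the identifier is wrapped in backticks, with embedded backticks doubled.

{code:python}
# Illustration only: render a multi-part identifier in SQL style.
def to_sql_id(*parts: str) -> str:
    return ".".join("`" + p.replace("`", "``") + "`" for p in parts)

print(to_sql_id("db", "tbl"))  # `db`.`tbl`
{code}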



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-38485) Non-deterministic UDF executed multiple times when combined with withField

2022-04-17 Thread Tanel Kiis (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17523363#comment-17523363
 ] 

Tanel Kiis commented on SPARK-38485:


Is there then even any point in having non-deterministic methods in Spark? 
Some optimizations are disabled for them to avoid similar situations.

> Non-deterministic UDF executed multiple times when combined with withField
> --
>
> Key: SPARK-38485
> URL: https://issues.apache.org/jira/browse/SPARK-38485
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Tanel Kiis
>Priority: Major
>  Labels: Correctness
>
> When adding fields to a result of a non-deterministic UDF, that returns a 
> struct, then that UDF is executed multiple times (once per field) for each 
> row.
> In this UT df1 passes, but df2 fails with something like:
> "279751724 did not equal -1023188908"
> {code}
>   test("SPARK-X: non-deterministic UDF should be called once when adding 
> fields") {
> val nondeterministicUDF = udf((s: Int) => {
>   val r = Random.nextInt()
>   // Both values should be the same
>   GroupByKey(r, r)
> }).asNondeterministic()
> val df1 = spark.range(5).select(nondeterministicUDF($"id"))
> df1.collect().foreach {
>   row => assert(row.getStruct(0).getInt(0) == row.getStruct(0).getInt(1))
> }
> val df2 = 
> spark.range(5).select(nondeterministicUDF($"id").withField("new", lit(7)))
> df2.collect().foreach {
>   row => assert(row.getStruct(0).getInt(0) == row.getStruct(0).getInt(1))
> }
>   }
> {code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-38826) dropFieldIfAllNull option does not work for empty JSON struct

2022-04-17 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen reassigned SPARK-38826:


Assignee: morvenhuang

> dropFieldIfAllNull option does not work for empty JSON struct
> -
>
> Key: SPARK-38826
> URL: https://issues.apache.org/jira/browse/SPARK-38826
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: 3.2.1
>Reporter: morvenhuang
>Assignee: morvenhuang
>Priority: Trivial
>
> As stated in the doc, 
> {quote}dropFieldIfAllNull
> Whether to ignore column of all null values or empty array/struct during 
> schema inference.
>  
> {quote}
> But when I try this, 
>  
> {code:java}
> String json = "{\"field1\":\"value1\", \"field2\":{}}";
> JavaSparkContext jsc = new JavaSparkContext(spark.sparkContext());
> JavaRDD<String> jrdd = jsc.parallelize(Arrays.asList(json));
> Dataset<Row> df = spark.read().option("dropFieldIfAllNull", 
> "false").json(jrdd);
> df.printSchema();
> {code}
>  
> I get this, 
> {code:java}
> root
>  |-- field1: string (nullable = true){code}
> Notice field2 is still missing even when dropFieldIfAllNull is set to false, 
> so apparently this option does not work for an empty struct.
> This is due to SPARK-8093: the empty struct will be dropped anyway.
> I think we should update the doc; otherwise it would be confusing.
> I can make a patch for this.
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-38826) dropFieldIfAllNull option does not work for empty JSON struct

2022-04-17 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen resolved SPARK-38826.
--
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 36111
[https://github.com/apache/spark/pull/36111]

> dropFieldIfAllNull option does not work for empty JSON struct
> -
>
> Key: SPARK-38826
> URL: https://issues.apache.org/jira/browse/SPARK-38826
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: 3.2.1
>Reporter: morvenhuang
>Assignee: morvenhuang
>Priority: Trivial
> Fix For: 3.4.0
>
>
> As stated in the doc, 
> {quote}dropFieldIfAllNull
> Whether to ignore column of all null values or empty array/struct during 
> schema inference.
>  
> {quote}
> But when I try this, 
>  
> {code:java}
> String json = "{\"field1\":\"value1\", \"field2\":{}}";
> JavaSparkContext jsc = new JavaSparkContext(spark.sparkContext());
> JavaRDD<String> jrdd = jsc.parallelize(Arrays.asList(json));
> Dataset<Row> df = spark.read().option("dropFieldIfAllNull", 
> "false").json(jrdd);
> df.printSchema();
> {code}
>  
> I get this, 
> {code:java}
> root
>  |-- field1: string (nullable = true){code}
> Notice field2 is still missing even when dropFieldIfAllNull is set to false, 
> so apparently this option does not work for an empty struct.
> This is due to SPARK-8093: the empty struct will be dropped anyway.
> I think we should update the doc; otherwise it would be confusing.
> I can make a patch for this.
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-38826) dropFieldIfAllNull option does not work for empty JSON struct

2022-04-17 Thread Sean R. Owen (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17523358#comment-17523358
 ] 

Sean R. Owen commented on SPARK-38826:
--

Yeah I guess we can at least update the docs; I'm unclear on whether the 
behavior is wrong or right

> dropFieldIfAllNull option does not work for empty JSON struct
> -
>
> Key: SPARK-38826
> URL: https://issues.apache.org/jira/browse/SPARK-38826
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: 3.2.1
>Reporter: morvenhuang
>Priority: Trivial
>
> As stated in the doc, 
> {quote}dropFieldIfAllNull
> Whether to ignore column of all null values or empty array/struct during 
> schema inference.
>  
> {quote}
> But when I try this, 
>  
> {code:java}
> String json = "{\"field1\":\"value1\", \"field2\":{}}";
> JavaSparkContext jsc = new JavaSparkContext(spark.sparkContext());
> JavaRDD<String> jrdd = jsc.parallelize(Arrays.asList(json));
> Dataset<Row> df = spark.read().option("dropFieldIfAllNull", 
> "false").json(jrdd);
> df.printSchema();
> {code}
>  
> I get this, 
> {code:java}
> root
>  |-- field1: string (nullable = true){code}
> Notice field2 is still missing even when dropFieldIfAllNull is set to false, 
> so apparently this option does not work for an empty struct.
> This is due to SPARK-8093: the empty struct will be dropped anyway.
> I think we should update the doc; otherwise it would be confusing.
> I can make a patch for this.
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-38640) NPE with unpersisting memory-only RDD with RDD fetching from shuffle service enabled

2022-04-17 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen resolved SPARK-38640.
--
Fix Version/s: 3.3.0
 Assignee: Adam Binford
   Resolution: Fixed

Resolved by https://github.com/apache/spark/pull/35959

> NPE with unpersisting memory-only RDD with RDD fetching from shuffle service 
> enabled
> 
>
> Key: SPARK-38640
> URL: https://issues.apache.org/jira/browse/SPARK-38640
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.2.1
>Reporter: Adam Binford
>Assignee: Adam Binford
>Priority: Major
> Fix For: 3.3.0
>
>
> If you have RDD fetching from shuffle service enabled, memory-only cached 
> RDDs will fail to unpersist.
>  
>  
> {code:java}
> // spark.shuffle.service.fetch.rdd.enabled=true
> val df = spark.range(5)
>   .persist(StorageLevel.MEMORY_ONLY)
> df.count()
> df.unpersist(true)
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-38926) Output types in error messages in SQL style

2022-04-17 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17523305#comment-17523305
 ] 

Apache Spark commented on SPARK-38926:
--

User 'MaxGekk' has created a pull request for this issue:
https://github.com/apache/spark/pull/36233

> Output types in error messages in SQL style
> ---
>
> Key: SPARK-38926
> URL: https://issues.apache.org/jira/browse/SPARK-38926
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0, 3.4.0
>Reporter: Max Gekk
>Assignee: Max Gekk
>Priority: Major
> Fix For: 3.3.0, 3.4.0
>
>
> All types should be printed in SQL style in error messages. For example, the 
> type DateType should be highlighted as DATE to make it more visible in error 
> messages.
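As an illustration of the mapping (using PySpark type objects; the PR itself
works in Scala and may use a different helper), the SQL-style name is
essentially the type's SQL form rather than its class name:

{code:python}
# Illustration: print the SQL-style name (DATE) instead of the class name
# (DateType). simpleString() returns the lower-case SQL form for these types.
from pyspark.sql.types import ArrayType, DateType, DecimalType, IntegerType

for dt in [DateType(), DecimalType(10, 2), ArrayType(IntegerType())]:
    print(type(dt).__name__, "->", dt.simpleString().upper())
# DateType -> DATE
# DecimalType -> DECIMAL(10,2)
# ArrayType -> ARRAY<INT>
{code}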



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-38926) Output types in error messages in SQL style

2022-04-17 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-38926:


Assignee: Apache Spark  (was: Max Gekk)

> Output types in error messages in SQL style
> ---
>
> Key: SPARK-38926
> URL: https://issues.apache.org/jira/browse/SPARK-38926
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0, 3.4.0
>Reporter: Max Gekk
>Assignee: Apache Spark
>Priority: Major
> Fix For: 3.3.0, 3.4.0
>
>
> All types should be printed in SQL style in error messages. For example, the 
> type DateType should be highlighted as DATE to make it more visible in error 
> messages.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-38926) Output types in error messages in SQL style

2022-04-17 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17523304#comment-17523304
 ] 

Apache Spark commented on SPARK-38926:
--

User 'MaxGekk' has created a pull request for this issue:
https://github.com/apache/spark/pull/36233

> Output types in error messages in SQL style
> ---
>
> Key: SPARK-38926
> URL: https://issues.apache.org/jira/browse/SPARK-38926
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0, 3.4.0
>Reporter: Max Gekk
>Assignee: Max Gekk
>Priority: Major
> Fix For: 3.3.0, 3.4.0
>
>
> All types should be printed in SQL style in error messages. For example, the 
> type DateType should be highlighted as DATE to make it more visible in error 
> messages.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-38926) Output types in error messages in SQL style

2022-04-17 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-38926:


Assignee: Max Gekk  (was: Apache Spark)

> Output types in error messages in SQL style
> ---
>
> Key: SPARK-38926
> URL: https://issues.apache.org/jira/browse/SPARK-38926
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0, 3.4.0
>Reporter: Max Gekk
>Assignee: Max Gekk
>Priority: Major
> Fix For: 3.3.0, 3.4.0
>
>
> All types should be printed in SQL style in error messages. For example, the 
> type DateType should be highlighted as DATE to make it more visible in error 
> messages.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-38926) Output types in error messages in SQL style

2022-04-17 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk updated SPARK-38926:
-
Description: All types should be printed in SQL style in error messages. 
For example, the type DateType should be highlighted as DATE to make it more 
visible in error messages.  (was: All identifiers like table names should be 
printed in SQL style in error messages. For example, table name db.tbl should 
be highlighted as `db`.`tbl` to make it more visible in error messages.)

> Output types in error messages in SQL style
> ---
>
> Key: SPARK-38926
> URL: https://issues.apache.org/jira/browse/SPARK-38926
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0, 3.4.0
>Reporter: Max Gekk
>Assignee: Max Gekk
>Priority: Major
> Fix For: 3.3.0, 3.4.0
>
>
> All types should be printed in SQL style in error messages. For example, the 
> type DateType should be highlighted as DATE to make it more visible in error 
> messages.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-38926) Output types in error messages in SQL style

2022-04-17 Thread Max Gekk (Jira)
Max Gekk created SPARK-38926:


 Summary: Output types in error messages in SQL style
 Key: SPARK-38926
 URL: https://issues.apache.org/jira/browse/SPARK-38926
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.3.0, 3.4.0
Reporter: Max Gekk
Assignee: Max Gekk
 Fix For: 3.3.0, 3.4.0


All identifiers like table names should be printed in SQL style in error 
messages. For example, table name db.tbl should be highlighted as `db`.`tbl` to 
make it more visible in error messages.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-38492) Improve the test coverage for PySpark

2022-04-17 Thread pralabhkumar (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17523293#comment-17523293
 ] 

pralabhkumar commented on SPARK-38492:
--

On it. Thx

> Improve the test coverage for PySpark
> -
>
> Key: SPARK-38492
> URL: https://issues.apache.org/jira/browse/SPARK-38492
> Project: Spark
>  Issue Type: Umbrella
>  Components: PySpark, Tests
>Affects Versions: 3.3.0
>Reporter: Haejoon Lee
>Priority: Major
>
> Currently, PySpark test coverage is around 91% according to codecov report: 
> [https://app.codecov.io/gh/apache/spark|https://app.codecov.io/gh/apache/spark]
> Since about 9% of the code is still not covered by tests, I think it would 
> be great to improve our test coverage.
> Of course we might not target 100%, but we should cover as much as possible, 
> up to the level that we can currently reach with CI.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org