[jira] [Assigned] (SPARK-44280) Add convertJavaTimestampToTimestamp in JDBCDialect API

2023-08-01 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-44280:
---

Assignee: Mingkang Li

> Add convertJavaTimestampToTimestamp in JDBCDialect API
> --
>
> Key: SPARK-44280
> URL: https://issues.apache.org/jira/browse/SPARK-44280
> Project: Spark
>  Issue Type: New Feature
>  Components: Spark Core
>Affects Versions: 3.5.0
>Reporter: Mingkang Li
>Assignee: Mingkang Li
>Priority: Major
> Fix For: 3.5.0
>
>
> A new method, {{convertJavaTimestampToTimestamp}}, is introduced to the 
> JDBCDialects API, giving JDBC dialects the ability to override the default 
> Java timestamp conversion behavior. This is particularly beneficial for 
> databases such as PostgreSQL, which use special sentinel values for 
> timestamps representing positive and negative infinity. 
> The pre-existing default conversion can overflow on these special values 
> (i.e., the executor would crash when selecting a column that contains 
> infinity timestamps in PostgreSQL). The new function mitigates such issues, 
> enabling more versatile and robust timestamp value conversions across various 
> JDBC-based connectors.
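A minimal Java sketch of the idea behind such an override, with hypothetical names (the real hook is a method on Spark's JDBCDialect): clamp out-of-range values, such as PostgreSQL's infinity sentinels, to the largest epoch values whose microsecond representation still fits in a signed 64-bit long, so the millis-to-micros conversion cannot overflow.

```java
import java.sql.Timestamp;

public class TimestampClamp {
    // Largest/smallest epoch millis whose microsecond representation
    // still fits in a signed 64-bit long.
    static final long MAX_SAFE_MILLIS = Long.MAX_VALUE / 1000L;
    static final long MIN_SAFE_MILLIS = Long.MIN_VALUE / 1000L;

    // Hypothetical dialect hook: clamp out-of-range values (such as
    // PostgreSQL's +/-infinity sentinels) before converting to micros.
    static long toMicros(Timestamp t) {
        long millis = t.getTime();
        if (millis > MAX_SAFE_MILLIS) millis = MAX_SAFE_MILLIS;
        if (millis < MIN_SAFE_MILLIS) millis = MIN_SAFE_MILLIS;
        // Cannot overflow after clamping (sub-millisecond nanos ignored
        // here for brevity).
        return Math.multiplyExact(millis, 1000L);
    }

    public static void main(String[] args) {
        // Without clamping, millis * 1000 would overflow for this input.
        System.out.println(toMicros(new Timestamp(Long.MAX_VALUE)));
        System.out.println(toMicros(new Timestamp(0L)));
    }
}
```

A real dialect would substitute the database's documented sentinel values rather than a generic clamp; the clamp just demonstrates why the hook must run before the conversion.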



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-44280) Add convertJavaTimestampToTimestamp in JDBCDialect API

2023-08-01 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-44280.
-
Fix Version/s: 3.5.0
   Resolution: Fixed

Issue resolved by pull request 41843
[https://github.com/apache/spark/pull/41843]

> Add convertJavaTimestampToTimestamp in JDBCDialect API
> --
>
> Key: SPARK-44280
> URL: https://issues.apache.org/jira/browse/SPARK-44280
> Project: Spark
>  Issue Type: New Feature
>  Components: Spark Core
>Affects Versions: 3.5.0
>Reporter: Mingkang Li
>Priority: Major
> Fix For: 3.5.0
>
>
> A new method, {{convertJavaTimestampToTimestamp}}, is introduced to the 
> JDBCDialects API, giving JDBC dialects the ability to override the default 
> Java timestamp conversion behavior. This is particularly beneficial for 
> databases such as PostgreSQL, which use special sentinel values for 
> timestamps representing positive and negative infinity. 
> The pre-existing default conversion can overflow on these special values 
> (i.e., the executor would crash when selecting a column that contains 
> infinity timestamps in PostgreSQL). The new function mitigates such issues, 
> enabling more versatile and robust timestamp value conversions across various 
> JDBC-based connectors.






[jira] [Updated] (SPARK-44627) org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils#resultSetToRows produces wrong data

2023-08-01 Thread Min Zhao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Min Zhao updated SPARK-44627:
-
Description: 
When the resultSet contains a timestamp column whose value is null even though 
the column is defined as NOT NULL, the row it generates reuses the value of the 
same column from the previous row.

 

In MySQL, if a datetime column is defined as NOT NULL and a stored value is 
'0000-00-00 00:00:00', the JDBC driver's zeroDateTimeBehavior property makes it 
return null.

table definition:
CREATE TABLE `test_timestamp` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT COMMENT 'primary key id',
`unbind_time` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
PRIMARY KEY (`id`)
)
example:

the value of resultSet

1, 2023-01-01 12:00:00

2, null

 

the value of row

1, 2023-01-01 12:00:00

2, 2023-01-01 12:00:00

 

  was:
When the resultSet contains a timestamp column whose value is null, the row it 
generates reuses the value of the same column from the previous row.

 

In MySQL, if a datetime column is defined as NOT NULL and a stored value is 
'0000-00-00 00:00:00', the JDBC driver's zeroDateTimeBehavior property makes it 
return null.

table definition:
CREATE TABLE `test_timestamp` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT COMMENT 'primary key id',
`unbind_time` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
PRIMARY KEY (`id`)
)
example:

the value of resultSet

1, 2023-01-01 12:00:00

2, null

 

the value of row

1, 2023-01-01 12:00:00

2, 2023-01-01 12:00:00

 


> org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils#resultSetToRows 
> produces wrong data
> -
>
> Key: SPARK-44627
> URL: https://issues.apache.org/jira/browse/SPARK-44627
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.3.2, 3.3.1
>Reporter: Min Zhao
>Priority: Minor
> Attachments: image-2023-08-02-14-01-54-447.png
>
>
> When the resultSet contains a timestamp column whose value is null even 
> though the column is defined as NOT NULL, the row it generates reuses the 
> value of the same column from the previous row.
>  
> In MySQL, if a datetime column is defined as NOT NULL and a stored value is 
> '0000-00-00 00:00:00', the JDBC driver's zeroDateTimeBehavior property makes 
> it return null.
> table definition:
> CREATE TABLE `test_timestamp` (
> `id` int(11) unsigned NOT NULL AUTO_INCREMENT COMMENT 'primary key id',
> `unbind_time` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
> PRIMARY KEY (`id`)
> )
> example:
> the value of resultSet
> 1, 2023-01-01 12:00:00
> 2, null
>  
> the value of row
> 1, 2023-01-01 12:00:00
> 2, 2023-01-01 12:00:00
>  
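The reported behavior can be reproduced in miniature with a mutable buffer that is only written on non-null values, a hypothetical model (names invented for illustration) of the mutable-row reuse inside resultSetToRows:

```java
import java.util.ArrayList;
import java.util.List;

public class NullReuseDemo {
    // Materialize rows from a toy "resultSet" (column 1 is the timestamp).
    // `fixed` toggles between the buggy and the corrected behavior.
    static List<String> materialize(String[][] resultSet, boolean fixed) {
        List<String> out = new ArrayList<>();
        String buffer = null;           // mutable "row" reused across iterations
        for (String[] row : resultSet) {
            String value = row[1];
            if (value != null) {
                buffer = value;         // only written when non-null...
            } else if (fixed) {
                buffer = null;          // ...the fix: also clear it on NULL
            }
            out.add(buffer);            // buggy path leaks the previous value
        }
        return out;
    }

    public static void main(String[] args) {
        String[][] rs = { {"1", "2023-01-01 12:00:00"}, {"2", null} };
        System.out.println(materialize(rs, false)); // buggy: row 2 repeats row 1
        System.out.println(materialize(rs, true));  // fixed: row 2 is null
    }
}
```

This mirrors the reporter's example: the second row's null timestamp comes out as the first row's value unless the null case explicitly resets the buffer.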






[jira] [Updated] (SPARK-44572) Clean up unused installers ASAP

2023-08-01 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng updated SPARK-44572:
--
Summary: Clean up unused installers ASAP  (was: Clean up unused installer 
ASAP)

> Clean up unused installers ASAP
> ---
>
> Key: SPARK-44572
> URL: https://issues.apache.org/jira/browse/SPARK-44572
> Project: Spark
>  Issue Type: Sub-task
>  Components: Project Infra, Tests
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Priority: Major
>







[jira] [Updated] (SPARK-44572) Clean up unused installer ASAP

2023-08-01 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng updated SPARK-44572:
--
Summary: Clean up unused installer ASAP  (was: Clean up unused files ASAP)

> Clean up unused installer ASAP
> --
>
> Key: SPARK-44572
> URL: https://issues.apache.org/jira/browse/SPARK-44572
> Project: Spark
>  Issue Type: Sub-task
>  Components: Project Infra, Tests
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Priority: Major
>







[jira] [Updated] (SPARK-43043) Improve the performance of MapOutputTracker.updateMapOutput

2023-08-01 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-43043:
-
Fix Version/s: 3.5.0
   (was: 3.4.1)

> Improve the performance of MapOutputTracker.updateMapOutput
> ---
>
> Key: SPARK-43043
> URL: https://issues.apache.org/jira/browse/SPARK-43043
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.3.2
>Reporter: Xingbo Jiang
>Assignee: Xingbo Jiang
>Priority: Major
> Fix For: 3.5.0
>
>
> Inside of MapOutputTracker, there is a line of code which does a linear find 
> through a mapStatuses collection: 
> https://github.com/apache/spark/blob/cb48c0e48eeff2b7b51176d0241491300e5aad6f/core/src/main/scala/org/apache/spark/MapOutputTracker.scala#L167
>   (plus a similar search a few lines down at 
> https://github.com/apache/spark/blob/cb48c0e48eeff2b7b51176d0241491300e5aad6f/core/src/main/scala/org/apache/spark/MapOutputTracker.scala#L174)
> This scan is necessary because we only know the mapId of the updated status 
> and not its mapPartitionId.
> We perform this scan once per migrated block, so if a large proportion of all 
> blocks in the map are migrated then we get O(n^2) total runtime across all of 
> the calls.
> I think we might be able to fix this by extending ShuffleStatus to have an 
> OpenHashMap mapping from mapId to mapPartitionId. 
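The proposed fix can be sketched in Java with an auxiliary hash map (a plain HashMap standing in for the OpenHashMap the ticket suggests; names are illustrative, not Spark's actual fields):

```java
import java.util.HashMap;
import java.util.Map;

public class MapIdIndex {
    // Toy statuses array: index = mapPartitionId, value = mapId.
    final long[] mapIds = {100L, 101L, 102L};
    // Auxiliary index from mapId to mapPartitionId, rebuilt whenever
    // mapIds changes, turning the per-block lookup into O(1).
    final Map<Long, Integer> byMapId = new HashMap<>();

    MapIdIndex() {
        for (int i = 0; i < mapIds.length; i++) byMapId.put(mapIds[i], i);
    }

    // The O(n) scan the issue describes (done once per migrated block,
    // hence O(n^2) overall when most blocks migrate).
    int findLinear(long mapId) {
        for (int i = 0; i < mapIds.length; i++) {
            if (mapIds[i] == mapId) return i;
        }
        return -1;
    }

    // The proposed O(1) lookup via the auxiliary map.
    int findIndexed(long mapId) {
        return byMapId.getOrDefault(mapId, -1);
    }
}
```

The trade-off is the memory and maintenance cost of the extra map versus repeated linear scans; with heavy block migration the amortized win is clear.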






[jira] [Assigned] (SPARK-44630) Revert SPARK-43043 Improve the performance of MapOutputTracker.updateMapOutput

2023-08-01 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-44630:


Assignee: Dongjoon Hyun

> Revert SPARK-43043 Improve the performance of MapOutputTracker.updateMapOutput
> --
>
> Key: SPARK-44630
> URL: https://issues.apache.org/jira/browse/SPARK-44630
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.4.1
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>







[jira] [Resolved] (SPARK-44630) Revert SPARK-43043 Improve the performance of MapOutputTracker.updateMapOutput

2023-08-01 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-44630.
--
Fix Version/s: 3.4.2
   Resolution: Fixed

Issue resolved by pull request 42285
[https://github.com/apache/spark/pull/42285]

> Revert SPARK-43043 Improve the performance of MapOutputTracker.updateMapOutput
> --
>
> Key: SPARK-44630
> URL: https://issues.apache.org/jira/browse/SPARK-44630
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.4.1
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
> Fix For: 3.4.2
>
>







[jira] [Updated] (SPARK-44627) org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils#resultSetToRows produces wrong data

2023-08-01 Thread Min Zhao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Min Zhao updated SPARK-44627:
-
Priority: Minor  (was: Major)

> org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils#resultSetToRows 
> produces wrong data
> -
>
> Key: SPARK-44627
> URL: https://issues.apache.org/jira/browse/SPARK-44627
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.3.2, 3.3.1
>Reporter: Min Zhao
>Priority: Minor
> Attachments: image-2023-08-02-14-01-54-447.png
>
>
> When the resultSet contains a timestamp column whose value is null, the row 
> it generates reuses the value of the same column from the previous row.
>  
> In MySQL, if a datetime column is defined as NOT NULL and a stored value is 
> '0000-00-00 00:00:00', the JDBC driver's zeroDateTimeBehavior property makes 
> it return null.
> table definition:
> CREATE TABLE `test_timestamp` (
> `id` int(11) unsigned NOT NULL AUTO_INCREMENT COMMENT 'primary key id',
> `unbind_time` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
> PRIMARY KEY (`id`)
> )
> example:
> the value of resultSet
> 1, 2023-01-01 12:00:00
> 2, null
>  
> the value of row
> 1, 2023-01-01 12:00:00
> 2, 2023-01-01 12:00:00
>  






[jira] [Commented] (SPARK-44627) org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils#resultSetToRows produces wrong data

2023-08-01 Thread Min Zhao (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17750115#comment-17750115
 ] 

Min Zhao commented on SPARK-44627:
--

!image-2023-08-02-14-01-54-447.png!

It only updates isNull to true, but the value is kept from the last row.

> org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils#resultSetToRows 
> produces wrong data
> -
>
> Key: SPARK-44627
> URL: https://issues.apache.org/jira/browse/SPARK-44627
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.3.2, 3.3.1
>Reporter: Min Zhao
>Priority: Major
> Attachments: image-2023-08-02-14-01-54-447.png
>
>
> When the resultSet contains a timestamp column whose value is null, the row 
> it generates reuses the value of the same column from the previous row.
>  
> In MySQL, if a datetime column is defined as NOT NULL and a stored value is 
> '0000-00-00 00:00:00', the JDBC driver's zeroDateTimeBehavior property makes 
> it return null.
> table definition:
> CREATE TABLE `test_timestamp` (
> `id` int(11) unsigned NOT NULL AUTO_INCREMENT COMMENT 'primary key id',
> `unbind_time` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
> PRIMARY KEY (`id`)
> )
> example:
> the value of resultSet
> 1, 2023-01-01 12:00:00
> 2, null
>  
> the value of row
> 1, 2023-01-01 12:00:00
> 2, 2023-01-01 12:00:00
>  






[jira] [Updated] (SPARK-44627) org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils#resultSetToRows produces wrong data

2023-08-01 Thread Min Zhao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Min Zhao updated SPARK-44627:
-
Description: 
When the resultSet contains a timestamp column whose value is null, the row it 
generates reuses the value of the same column from the previous row.

 

In MySQL, if a datetime column is defined as NOT NULL and a stored value is 
'0000-00-00 00:00:00', the JDBC driver's zeroDateTimeBehavior property makes it 
return null.

table definition:
CREATE TABLE `test_timestamp` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT COMMENT 'primary key id',
`unbind_time` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
PRIMARY KEY (`id`)
)
example:

the value of resultSet

1, 2023-01-01 12:00:00

2, null

 

the value of row

1, 2023-01-01 12:00:00

2, 2023-01-01 12:00:00

 

  was:
When the resultSet contains a timestamp column whose value is null, the row it 
generates reuses the value of the same column from the previous row.

 

In MySQL, if a datetime column is defined as NOT NULL and a stored value is 
'0000-00-00 00:00:00', the JDBC driver's zeroDateTimeBehavior property makes it 
return null.

table definition:
CREATE TABLE `test_timestamp` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT COMMENT 'primary key id',
`unbind_time` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
PRIMARY KEY (`id`)
)
example:

the value of resultSet

1, 2023-01-01 12:00:00

2, null

 

the value of row

1, 2023-01-01 12:00:00

2, 2023-01-01 12:00:00

 


> org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils#resultSetToRows 
> produces wrong data
> -
>
> Key: SPARK-44627
> URL: https://issues.apache.org/jira/browse/SPARK-44627
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.3.2, 3.3.1
>Reporter: Min Zhao
>Priority: Major
> Attachments: image-2023-08-02-14-01-54-447.png
>
>
> When the resultSet contains a timestamp column whose value is null, the row 
> it generates reuses the value of the same column from the previous row.
>  
> In MySQL, if a datetime column is defined as NOT NULL and a stored value is 
> '0000-00-00 00:00:00', the JDBC driver's zeroDateTimeBehavior property makes 
> it return null.
> table definition:
> CREATE TABLE `test_timestamp` (
> `id` int(11) unsigned NOT NULL AUTO_INCREMENT COMMENT 'primary key id',
> `unbind_time` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
> PRIMARY KEY (`id`)
> )
> example:
> the value of resultSet
> 1, 2023-01-01 12:00:00
> 2, null
>  
> the value of row
> 1, 2023-01-01 12:00:00
> 2, 2023-01-01 12:00:00
>  






[jira] [Updated] (SPARK-44627) org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils#resultSetToRows produces wrong data

2023-08-01 Thread Min Zhao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Min Zhao updated SPARK-44627:
-
Attachment: image-2023-08-02-14-01-54-447.png

> org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils#resultSetToRows 
> produces wrong data
> -
>
> Key: SPARK-44627
> URL: https://issues.apache.org/jira/browse/SPARK-44627
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.3.2, 3.3.1
>Reporter: Min Zhao
>Priority: Major
> Attachments: image-2023-08-02-14-01-54-447.png
>
>
> When the resultSet contains a timestamp column whose value is null, the row 
> it generates reuses the value of the same column from the previous row.
>  
> In MySQL, if a datetime column is defined as NOT NULL and a stored value is 
> '0000-00-00 00:00:00', the JDBC driver's zeroDateTimeBehavior property makes 
> it return null.
> table definition:
> CREATE TABLE `test_timestamp` (
> `id` int(11) unsigned NOT NULL AUTO_INCREMENT COMMENT 'primary key id',
> `unbind_time` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
> PRIMARY KEY (`id`)
> )
> example:
> the value of resultSet
> 1, 2023-01-01 12:00:00
> 2, null
>  
> the value of row
> 1, 2023-01-01 12:00:00
> 2, 2023-01-01 12:00:00
>  






[jira] [Updated] (SPARK-44627) org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils#resultSetToRows produces wrong data

2023-08-01 Thread Min Zhao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Min Zhao updated SPARK-44627:
-
Description: 
When the resultSet contains a timestamp column whose value is null, the row it 
generates reuses the value of the same column from the previous row.

 

In MySQL, if a datetime column is defined as NOT NULL and a stored value is 
'0000-00-00 00:00:00', the JDBC driver's zeroDateTimeBehavior property makes it 
return null.

table definition:
CREATE TABLE `test_timestamp` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT COMMENT 'primary key id',
`unbind_time` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
PRIMARY KEY (`id`)
)
example:

the value of resultSet

1, 2023-01-01 12:00:00

2, null

 

the value of row

1, 2023-01-01 12:00:00

2, 2023-01-01 12:00:00

 

  was:
When the resultSet contains a timestamp column whose value is null, the row it 
generates reuses the value of the same column from the previous row.

example:

the value of resultSet

1, 2023-01-01 12:00:00

2, null

 

the value of row

1, 2023-01-01 12:00:00

2, 2023-01-01 12:00:00

 


> org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils#resultSetToRows 
> produces wrong data
> -
>
> Key: SPARK-44627
> URL: https://issues.apache.org/jira/browse/SPARK-44627
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.3.2, 3.3.1
>Reporter: Min Zhao
>Priority: Major
> Attachments: image-2023-08-02-14-01-54-447.png
>
>
> When the resultSet contains a timestamp column whose value is null, the row 
> it generates reuses the value of the same column from the previous row.
>  
> In MySQL, if a datetime column is defined as NOT NULL and a stored value is 
> '0000-00-00 00:00:00', the JDBC driver's zeroDateTimeBehavior property makes 
> it return null.
> table definition:
> CREATE TABLE `test_timestamp` (
> `id` int(11) unsigned NOT NULL AUTO_INCREMENT COMMENT 'primary key id',
> `unbind_time` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
> PRIMARY KEY (`id`)
> )
> example:
> the value of resultSet
> 1, 2023-01-01 12:00:00
> 2, null
>  
> the value of row
> 1, 2023-01-01 12:00:00
> 2, 2023-01-01 12:00:00
>  






[jira] [Resolved] (SPARK-44555) Use checkError() to check Exception in command Suite & assign some error class names

2023-08-01 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk resolved SPARK-44555.
--
Fix Version/s: 3.5.0
   4.0.0
   Resolution: Fixed

Issue resolved by pull request 42169
[https://github.com/apache/spark/pull/42169]

> Use checkError() to check Exception in command Suite & assign some error 
> class names
> 
>
> Key: SPARK-44555
> URL: https://issues.apache.org/jira/browse/SPARK-44555
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.5.0, 4.0.0
>Reporter: BingKun Pan
>Assignee: BingKun Pan
>Priority: Minor
> Fix For: 3.5.0, 4.0.0
>
>







[jira] [Assigned] (SPARK-44555) Use checkError() to check Exception in command Suite & assign some error class names

2023-08-01 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk reassigned SPARK-44555:


Assignee: BingKun Pan

> Use checkError() to check Exception in command Suite & assign some error 
> class names
> 
>
> Key: SPARK-44555
> URL: https://issues.apache.org/jira/browse/SPARK-44555
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.5.0, 4.0.0
>Reporter: BingKun Pan
>Assignee: BingKun Pan
>Priority: Minor
>







[jira] [Created] (SPARK-44632) DiskBlockManager should check and be able to handle stale directories

2023-08-01 Thread Kent Yao (Jira)
Kent Yao created SPARK-44632:


 Summary: DiskBlockManager should check and be able to handle stale 
directories
 Key: SPARK-44632
 URL: https://issues.apache.org/jira/browse/SPARK-44632
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 3.4.1, 3.5.0
Reporter: Kent Yao


The subDir in the memory cache could be stale, for example, after a damaged 
disk repair or replacement. This dir could subsequently be accessed by others. 
In particular, the `filename` generated by `RDDBlockId` is unchanged between 
task retries, so Spark will likely attempt to access the same subDir 
repeatedly. Therefore, it is necessary to check whether the subDir exists. If 
it is stale and the hardware has been recovered without its data and 
directories, we recreate the subDir to prevent a FileNotFoundException during 
writing.
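A hedged Java sketch of the proposed check (a hypothetical helper, not Spark's actual DiskBlockManager code): before handing out a block file under a cached sub-directory, verify the directory still exists and recreate it if a disk repair or replacement wiped it.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

public class StaleDirCheck {
    // Resolve a block file, recreating the cached sub-directory if it has
    // gone stale, instead of letting the write fail with
    // FileNotFoundException.
    static File resolveBlockFile(File cachedSubDir, String filename) {
        if (!cachedSubDir.isDirectory() && !cachedSubDir.mkdirs()) {
            throw new IllegalStateException("cannot recreate " + cachedSubDir);
        }
        return new File(cachedSubDir, filename);
    }

    public static void main(String[] args) throws IOException {
        File subDir = Files.createTempDirectory("spark-subdir").toFile();
        subDir.delete();                       // simulate a stale cached entry
        File block = resolveBlockFile(subDir, "rdd_0_0");
        System.out.println(block.getParentFile().isDirectory()); // recreated
    }
}
```

The existence check adds one syscall per resolution, which is cheap relative to the task retry storm a missing directory would otherwise cause.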






[jira] [Created] (SPARK-44631) Remove session-based directory when the isolated session cache is evicted

2023-08-01 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-44631:


 Summary: Remove session-based directory when the isolated session 
cache is evicted
 Key: SPARK-44631
 URL: https://issues.apache.org/jira/browse/SPARK-44631
 Project: Spark
  Issue Type: Task
  Components: Connect
Affects Versions: 3.5.0
Reporter: Hyukjin Kwon


SPARK-44078 added the cache for isolated sessions, and SPARK-44348 added the 
session-based directory for isolation.

 

When the isolated session cache is evicted, we should remove the session-based 
directory so it doesn't fail when the same session is used, see also 
https://github.com/apache/spark/pull/41625#discussion_r1251427466
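One way to sketch "delete the directory on eviction" is an LRU map whose eviction hook removes the session directory. This is an illustrative stand-alone model, not Spark Connect's actual session cache:

```java
import java.io.File;
import java.util.LinkedHashMap;
import java.util.Map;

public class SessionDirCache extends LinkedHashMap<String, File> {
    private final int maxSessions;

    SessionDirCache(int maxSessions) {
        super(16, 0.75f, true);               // access-order LRU
        this.maxSessions = maxSessions;
    }

    // On eviction, delete the session-based directory so that a reused
    // session id starts from a clean slate instead of failing on leftovers.
    @Override
    protected boolean removeEldestEntry(Map.Entry<String, File> eldest) {
        if (size() > maxSessions) {
            deleteRecursively(eldest.getValue());
            return true;
        }
        return false;
    }

    static void deleteRecursively(File f) {
        File[] children = f.listFiles();
        if (children != null) {
            for (File c : children) deleteRecursively(c);
        }
        f.delete();
    }
}
```

The key design point is that cleanup rides on the cache's own eviction callback, so the directory's lifetime can never outlive its cache entry.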






[jira] [Updated] (SPARK-44588) Migrated shuffle blocks are encrypted multiple times when io.encryption is enabled

2023-08-01 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-44588:
--
Fix Version/s: 3.3.3

> Migrated shuffle blocks are encrypted multiple times when io.encryption is 
> enabled 
> ---
>
> Key: SPARK-44588
> URL: https://issues.apache.org/jira/browse/SPARK-44588
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.1.0, 3.1.1, 3.1.2, 3.2.0, 3.1.3, 3.2.1, 3.3.0, 3.2.2, 
> 3.3.1, 3.2.3, 3.2.4, 3.3.2, 3.4.0, 3.4.1
>Reporter: Henry Mai
>Assignee: Henry Mai
>Priority: Critical
> Fix For: 3.3.3, 3.4.2, 3.5.0
>
>
> Shuffle blocks upon migration are wrapped for encryption again when being 
> written out to a file on the receiver side.
>  
> Pull request to fix this: https://github.com/apache/spark/pull/42214
>  
> Details:
> Sender/Read side:
> BlockManagerDecommissioner:run()
>     blocks = bm.migratableResolver.getMigrationBlocks()
>     *dataFile = IndexShuffleBlockResolver:getDataFile(...)*
>    buffer = FileSegmentManagedBuffer(..., dataFile)
>    *^ This reads straight from disk without decryption*
>     blocks.foreach((blockId, buffer) => 
> bm.blockTransferService.uploadBlockSync(..., buffer, ...))
>     -> uploadBlockSync() -> uploadBlock(..., buffer, ...)
>     -> client.uploadStream(UploadBlockStream, buffer, ...)
>  - Notice that there is no decryption here on the sender/read side.
> Receiver/Write side:
> NettyBlockRpcServer:receiveStream() <--- This is the UploadBlockStream handler
>     putBlockDataAsStream()
>     migratableResolver.putShuffleBlockAsStream()
>     *-> file = IndexShuffleBlockResolver:getDataFile(...)*
>     -> tmpFile = (file + . extension)
>     *-> Creates an encrypting writable channel to a tmpFile using 
> serializerManager.wrapStream()*
>     -> onData() writes the data into the channel
>     -> onComplete() renames the tmpFile to the file
>  - Notice:
>  * Both getMigrationBlocks()[read] and putShuffleBlockAsStream()[write] 
> target IndexShuffleBlockResolver:getDataFile()
>  * The read path does not decrypt but the write path encrypts.
>  * As a thought exercise: if this cycle happens more than once (where this 
> receiver is now a sender) even if we assume that the shuffle blocks are 
> initially unencrypted*, then bytes in the file will just have more and more 
> layers of encryption applied to it each time it gets migrated.
>  * *In practice, the shuffle blocks are encrypted on disk to begin with, this 
> is just a thought exercise
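The layering problem can be illustrated with a toy wrap function standing in for serializerManager.wrapStream(): the reader removes exactly one layer, so a block wrapped a second time during migration never decrypts back to the original bytes.

```java
public class DoubleWrapDemo {
    // Toy stand-in for wrapStream(): add one "encryption" layer.
    static String wrap(String bytes) {
        return "ENC(" + bytes + ")";
    }

    // What a reader does: remove exactly one layer.
    static String unwrap(String bytes) {
        return bytes.substring(4, bytes.length() - 1);
    }

    public static void main(String[] args) {
        String plain = "shuffle-block";
        String onDisk = wrap(plain);      // blocks are already encrypted at rest
        // Migration: sender reads the file raw (no unwrap),
        // receiver wraps again when writing it out.
        String migrated = wrap(onDisk);
        // The reader removes one layer and still sees ciphertext,
        // not the block contents.
        System.out.println(unwrap(migrated));
    }
}
```

Each additional migration hop adds another layer, which is why the fix is to write the received bytes without re-wrapping them.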






[jira] [Updated] (SPARK-44588) Migrated shuffle blocks are encrypted multiple times when io.encryption is enabled

2023-08-01 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-44588:
--
Fix Version/s: 3.4.2

> Migrated shuffle blocks are encrypted multiple times when io.encryption is 
> enabled 
> ---
>
> Key: SPARK-44588
> URL: https://issues.apache.org/jira/browse/SPARK-44588
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.1.0, 3.1.1, 3.1.2, 3.2.0, 3.1.3, 3.2.1, 3.3.0, 3.2.2, 
> 3.3.1, 3.2.3, 3.2.4, 3.3.2, 3.4.0, 3.4.1
>Reporter: Henry Mai
>Assignee: Henry Mai
>Priority: Critical
> Fix For: 3.4.2, 3.5.0
>
>
> Shuffle blocks upon migration are wrapped for encryption again when being 
> written out to a file on the receiver side.
>  
> Pull request to fix this: https://github.com/apache/spark/pull/42214
>  
> Details:
> Sender/Read side:
> BlockManagerDecommissioner:run()
>     blocks = bm.migratableResolver.getMigrationBlocks()
>     *dataFile = IndexShuffleBlockResolver:getDataFile(...)*
>    buffer = FileSegmentManagedBuffer(..., dataFile)
>    *^ This reads straight from disk without decryption*
>     blocks.foreach((blockId, buffer) => 
> bm.blockTransferService.uploadBlockSync(..., buffer, ...))
>     -> uploadBlockSync() -> uploadBlock(..., buffer, ...)
>     -> client.uploadStream(UploadBlockStream, buffer, ...)
>  - Notice that there is no decryption here on the sender/read side.
> Receiver/Write side:
> NettyBlockRpcServer:receiveStream() <--- This is the UploadBlockStream handler
>     putBlockDataAsStream()
>     migratableResolver.putShuffleBlockAsStream()
>     *-> file = IndexShuffleBlockResolver:getDataFile(...)*
>     -> tmpFile = (file + . extension)
>     *-> Creates an encrypting writable channel to a tmpFile using 
> serializerManager.wrapStream()*
>     -> onData() writes the data into the channel
>     -> onComplete() renames the tmpFile to the file
>  - Notice:
>  * Both getMigrationBlocks()[read] and putShuffleBlockAsStream()[write] 
> target IndexShuffleBlockResolver:getDataFile()
>  * The read path does not decrypt but the write path encrypts.
>  * As a thought exercise: if this cycle happens more than once (where this 
> receiver is now a sender) even if we assume that the shuffle blocks are 
> initially unencrypted*, then bytes in the file will just have more and more 
> layers of encryption applied to it each time it gets migrated.
>  * *In practice, the shuffle blocks are encrypted on disk to begin with, this 
> is just a thought exercise






[jira] [Updated] (SPARK-44600) Make `repl` module daily test pass

2023-08-01 Thread Yang Jie (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Jie updated SPARK-44600:
-
Description: 
[https://github.com/apache/spark/actions/runs/5727123477/job/15518895421]

 
{code:java}
- SPARK-15236: use Hive catalog *** FAILED ***
18137  isContain was true Interpreter output contained 'Exception':
18138  Welcome to
18139  __
18140   / __/__  ___ _/ /__
18141  _\ \/ _ \/ _ `/ __/  '_/
18142 /___/ .__/\_,_/_/ /_/\_\   version 4.0.0-SNAPSHOT
18143/_/
18144   
18145  Using Scala version 2.12.18 (OpenJDK 64-Bit Server VM, Java 1.8.0_372)
18146  Type in expressions to have them evaluated.
18147  Type :help for more information.
18148  
18149  scala> 
18150  scala> java.lang.NoClassDefFoundError: 
org/sparkproject/guava/cache/CacheBuilder
18151at 
org.apache.spark.sql.catalyst.catalog.SessionCatalog.<init>(SessionCatalog.scala:197)
18152at 
org.apache.spark.sql.internal.BaseSessionStateBuilder.catalog$lzycompute(BaseSessionStateBuilder.scala:153)
18153at 
org.apache.spark.sql.internal.BaseSessionStateBuilder.catalog(BaseSessionStateBuilder.scala:152)
18154at 
org.apache.spark.sql.internal.BaseSessionStateBuilder.v2SessionCatalog$lzycompute(BaseSessionStateBuilder.scala:166)
18155at 
org.apache.spark.sql.internal.BaseSessionStateBuilder.v2SessionCatalog(BaseSessionStateBuilder.scala:166)
18156at 
org.apache.spark.sql.internal.BaseSessionStateBuilder.catalogManager$lzycompute(BaseSessionStateBuilder.scala:168)
18157at 
org.apache.spark.sql.internal.BaseSessionStateBuilder.catalogManager(BaseSessionStateBuilder.scala:168)
18158at 
org.apache.spark.sql.internal.BaseSessionStateBuilder$$anon$1.<init>(BaseSessionStateBuilder.scala:185)
18159at 
org.apache.spark.sql.internal.BaseSessionStateBuilder.analyzer(BaseSessionStateBuilder.scala:185)
18160at 
org.apache.spark.sql.internal.BaseSessionStateBuilder.$anonfun$build$2(BaseSessionStateBuilder.scala:374)
18161at 
org.apache.spark.sql.internal.SessionState.analyzer$lzycompute(SessionState.scala:92)
18162at 
org.apache.spark.sql.internal.SessionState.analyzer(SessionState.scala:92)
18163at 
org.apache.spark.sql.execution.QueryExecution.$anonfun$analyzed$1(QueryExecution.scala:77)
18164at 
org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:138)
18165at 
org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$2(QueryExecution.scala:219)
18166at 
org.apache.spark.sql.execution.QueryExecution$.withInternalError(QueryExecution.scala:546)
18167at 
org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$1(QueryExecution.scala:219)
18168at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:900)
18169at 
org.apache.spark.sql.execution.QueryExecution.executePhase(QueryExecution.scala:218)
18170at 
org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:77)
18171at 
org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:74)
18172at 
org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:66)
18173at org.apache.spark.sql.Dataset$.$anonfun$ofRows$2(Dataset.scala:100)
18174at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:900)
18175at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:98)
18176at 
org.apache.spark.sql.SparkSession.$anonfun$sql$4(SparkSession.scala:691)
18177at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:900)
18178at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:682)
18179at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:713)
18180at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:744)
18181... 100 elided
18182  Caused by: java.lang.ClassNotFoundException: 
org.sparkproject.guava.cache.CacheBuilder
18183at java.net.URLClassLoader.findClass(URLClassLoader.java:387)
18184at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
18185at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
18186at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
18187... 130 more
18188  
18189  scala>  | 
18190  scala> :quit (ReplSuite.scala:83) {code}

> Make `repl` module daily test pass
> --
>
> Key: SPARK-44600
> URL: https://issues.apache.org/jira/browse/SPARK-44600
> Project: Spark
>  Issue Type: Sub-task
>  Components: Tests
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Priority: Major
>
> [https://github.com/apache/spark/actions/runs/5727123477/job/15518895421]
>  
> {code:java}
> - SPARK-15236: use Hive catalog *** FAILED ***
> 18137  isContain was true Interpreter output contained 'Exception':
> 18138  Welcome to
> 18139 

[jira] [Resolved] (SPARK-44607) Remove unused function `containsNestedColumn` from `Filter`

2023-08-01 Thread Yang Jie (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Jie resolved SPARK-44607.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 42239
[https://github.com/apache/spark/pull/42239]

> Remove unused function `containsNestedColumn` from `Filter`
> ---
>
> Key: SPARK-44607
> URL: https://issues.apache.org/jira/browse/SPARK-44607
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Minor
> Fix For: 4.0.0
>
>







[jira] [Assigned] (SPARK-44607) Remove unused function `containsNestedColumn` from `Filter`

2023-08-01 Thread Yang Jie (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Jie reassigned SPARK-44607:


Assignee: Yang Jie

> Remove unused function `containsNestedColumn` from `Filter`
> ---
>
> Key: SPARK-44607
> URL: https://issues.apache.org/jira/browse/SPARK-44607
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Minor
>







[jira] [Created] (SPARK-44630) Revert SPARK-43043 Improve the performance of MapOutputTracker.updateMapOutput

2023-08-01 Thread Dongjoon Hyun (Jira)
Dongjoon Hyun created SPARK-44630:
-

 Summary: Revert SPARK-43043 Improve the performance of 
MapOutputTracker.updateMapOutput
 Key: SPARK-44630
 URL: https://issues.apache.org/jira/browse/SPARK-44630
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 3.4.1
Reporter: Dongjoon Hyun









[jira] [Created] (SPARK-44629) Publish PySpark Test Guidelines webpage

2023-08-01 Thread Amanda Liu (Jira)
Amanda Liu created SPARK-44629:
--

 Summary: Publish PySpark Test Guidelines webpage
 Key: SPARK-44629
 URL: https://issues.apache.org/jira/browse/SPARK-44629
 Project: Spark
  Issue Type: Sub-task
  Components: PySpark
Affects Versions: 3.5.0
Reporter: Amanda Liu









[jira] [Updated] (SPARK-43241) MultiIndex.append not checking names for equality

2023-08-01 Thread Haejoon Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haejoon Lee updated SPARK-43241:

Affects Version/s: 4.0.0
   (was: 3.5.0)

> MultiIndex.append not checking names for equality
> -
>
> Key: SPARK-43241
> URL: https://issues.apache.org/jira/browse/SPARK-43241
> Project: Spark
>  Issue Type: Sub-task
>  Components: Pandas API on Spark
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Priority: Major
>
> To match the behavior with pandas: 
> https://github.com/pandas-dev/pandas/pull/48288






[jira] [Updated] (SPARK-42621) Add `inclusive` parameter for date_range

2023-08-01 Thread Haejoon Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haejoon Lee updated SPARK-42621:

Affects Version/s: 4.0.0
   (was: 3.5.0)

> Add `inclusive` parameter for date_range
> 
>
> Key: SPARK-42621
> URL: https://issues.apache.org/jira/browse/SPARK-42621
> Project: Spark
>  Issue Type: Sub-task
>  Components: Pandas API on Spark
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Priority: Major
>
> See https://github.com/pandas-dev/pandas/issues/40245
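For reference, the pandas-side parameter this ticket tracks behaves as follows (plain pandas >= 1.4, not the Pandas API on Spark):

```python
import pandas as pd

# inclusive controls which boundary dates are kept: "both" (the default),
# "left", "right", or "neither".
left = pd.date_range("2023-01-01", "2023-01-04", inclusive="left")
print(list(left.strftime("%Y-%m-%d")))  # ['2023-01-01', '2023-01-02', '2023-01-03']
```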






[jira] [Updated] (SPARK-42620) Add `inclusive` parameter for (DataFrame|Series).between_time

2023-08-01 Thread Haejoon Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haejoon Lee updated SPARK-42620:

Affects Version/s: 4.0.0
   (was: 3.5.0)

> Add `inclusive` parameter for (DataFrame|Series).between_time
> -
>
> Key: SPARK-42620
> URL: https://issues.apache.org/jira/browse/SPARK-42620
> Project: Spark
>  Issue Type: Sub-task
>  Components: Pandas API on Spark
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Priority: Major
>
> See https://github.com/pandas-dev/pandas/pull/43248






[jira] [Updated] (SPARK-43194) PySpark 3.4.0 cannot convert timestamp-typed objects to pandas with pandas 2.0

2023-08-01 Thread Haejoon Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haejoon Lee updated SPARK-43194:

Affects Version/s: 4.0.0
   (was: 3.4.0)

> PySpark 3.4.0 cannot convert timestamp-typed objects to pandas with pandas 2.0
> --
>
> Key: SPARK-43194
> URL: https://issues.apache.org/jira/browse/SPARK-43194
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 4.0.0
> Environment: {code}
> In [4]: import pandas as pd
> In [5]: pd.__version__
> Out[5]: '2.0.0'
> In [6]: import pyspark as ps
> In [7]: ps.__version__
> Out[7]: '3.4.0'
> {code}
>Reporter: Phillip Cloud
>Priority: Major
>
> {code}
> In [1]: from pyspark.sql import SparkSession
> In [2]: session = SparkSession.builder.appName("test").getOrCreate()
> 23/04/19 09:21:42 WARN Utils: Your hostname, albatross resolves to a loopback 
> address: 127.0.0.2; using 192.168.1.170 instead (on interface enp5s0)
> 23/04/19 09:21:42 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to 
> another address
> Setting default log level to "WARN".
> To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use 
> setLogLevel(newLevel).
> 23/04/19 09:21:42 WARN NativeCodeLoader: Unable to load native-hadoop library 
> for your platform... using builtin-java classes where applicable
> In [3]: session.sql("select now()").toPandas()
> {code}
> Results in:
> {code}
> ...
> TypeError: Casting to unit-less dtype 'datetime64' is not supported. Pass 
> e.g. 'datetime64[ns]' instead.
> {code}






[jira] [Updated] (SPARK-42619) Add `show_counts` parameter for DataFrame.info

2023-08-01 Thread Haejoon Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haejoon Lee updated SPARK-42619:

Affects Version/s: 4.0.0
   (was: 3.5.0)

> Add `show_counts` parameter for DataFrame.info
> --
>
> Key: SPARK-42619
> URL: https://issues.apache.org/jira/browse/SPARK-42619
> Project: Spark
>  Issue Type: Sub-task
>  Components: Pandas API on Spark
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Priority: Major
>
> See https://github.com/pandas-dev/pandas/pull/37999
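For reference, the pandas-side parameter (plain pandas, not the Pandas API on Spark) behaves as follows:

```python
import io
import pandas as pd

df = pd.DataFrame({"a": [1.0, None, 3.0]})
buf = io.StringIO()
# show_counts forces the non-null counts column to be printed even when
# pandas would otherwise omit it for large frames.
df.info(buf=buf, show_counts=True)
print("2 non-null" in buf.getvalue())  # True: column "a" has 2 non-null values
```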






[jira] [Updated] (SPARK-42617) Support `isocalendar`

2023-08-01 Thread Haejoon Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haejoon Lee updated SPARK-42617:

Affects Version/s: 4.0.0
   (was: 3.5.0)

> Support `isocalendar`
> -
>
> Key: SPARK-42617
> URL: https://issues.apache.org/jira/browse/SPARK-42617
> Project: Spark
>  Issue Type: Sub-task
>  Components: Pandas API on Spark
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Priority: Major
>
> We should support `isocalendar` to match pandas behavior 
> (https://pandas.pydata.org/pandas-docs/dev/reference/api/pandas.Series.dt.isocalendar.html)
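The pandas behavior to match looks like this (plain pandas; the ticket is about exposing the same accessor in the Pandas API on Spark):

```python
import pandas as pd

s = pd.Series(pd.to_datetime(["2023-01-02"]))  # a Monday in ISO week 1 of 2023
cal = s.dt.isocalendar()  # DataFrame with UInt32 columns: year, week, day
print(int(cal.loc[0, "year"]), int(cal.loc[0, "week"]), int(cal.loc[0, "day"]))
```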






[jira] [Updated] (SPARK-43271) Match behavior with DataFrame.reindex with specifying `index`.

2023-08-01 Thread Haejoon Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haejoon Lee updated SPARK-43271:

Affects Version/s: 4.0.0
   (was: 3.5.0)

> Match behavior with DataFrame.reindex with specifying `index`.
> --
>
> Key: SPARK-43271
> URL: https://issues.apache.org/jira/browse/SPARK-43271
> Project: Spark
>  Issue Type: Sub-task
>  Components: Pandas API on Spark
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Priority: Major
>
> Re-enable pandas 2.0.0 test in DataFrameTests.test_reindex in proper way.






[jira] [Updated] (SPARK-43451) Enable RollingTests.test_rolling_count for pandas 2.0.0.

2023-08-01 Thread Haejoon Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haejoon Lee updated SPARK-43451:

Affects Version/s: 4.0.0
   (was: 3.5.0)

> Enable RollingTests.test_rolling_count for pandas 2.0.0.
> 
>
> Key: SPARK-43451
> URL: https://issues.apache.org/jira/browse/SPARK-43451
> Project: Spark
>  Issue Type: Sub-task
>  Components: Pandas API on Spark
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Priority: Major
>
> Enable RollingTests.test_rolling_count for pandas 2.0.0.






[jira] [Updated] (SPARK-43282) Investigate DataFrame.sort_values with pandas behavior.

2023-08-01 Thread Haejoon Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haejoon Lee updated SPARK-43282:

Affects Version/s: 4.0.0
   (was: 3.5.0)

> Investigate DataFrame.sort_values with pandas behavior.
> ---
>
> Key: SPARK-43282
> URL: https://issues.apache.org/jira/browse/SPARK-43282
> Project: Spark
>  Issue Type: Sub-task
>  Components: Pandas API on Spark
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Priority: Major
>
> {code:java}
> import pandas as pd
> pdf = pd.DataFrame(
>     {
>         "a": pd.Categorical([1, 2, 3, 1, 2, 3]),
>         "b": pd.Categorical(
>             ["b", "a", "c", "c", "b", "a"], categories=["c", "b", "d", "a"]
>         ),
>     },
> )
> pdf.groupby("a").apply(lambda x: x).sort_values(["a"])
> Traceback (most recent call last):
> ...
> ValueError: 'a' is both an index level and a column label, which is 
> ambiguous. {code}
> We should investigate this issue whether this is intended behavior or just 
> bug in pandas.






[jira] [Updated] (SPARK-43245) Fix DatetimeIndex.microsecond to return 'int32' instead of 'int64' type of Index.

2023-08-01 Thread Haejoon Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haejoon Lee updated SPARK-43245:

Affects Version/s: 4.0.0
   (was: 3.5.0)

> Fix DatetimeIndex.microsecond to return 'int32' instead of 'int64' type of 
> Index.
> -
>
> Key: SPARK-43245
> URL: https://issues.apache.org/jira/browse/SPARK-43245
> Project: Spark
>  Issue Type: Sub-task
>  Components: Pandas API on Spark
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Priority: Major
>
> https://pandas.pydata.org/docs/dev/whatsnew/v2.0.0.html#index-can-now-hold-numpy-numeric-dtypes






[jira] [Updated] (SPARK-43433) Match `GroupBy.nth` behavior with new pandas behavior

2023-08-01 Thread Haejoon Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haejoon Lee updated SPARK-43433:

Affects Version/s: 4.0.0
   (was: 3.5.0)

> Match `GroupBy.nth` behavior with new pandas behavior
> -
>
> Key: SPARK-43433
> URL: https://issues.apache.org/jira/browse/SPARK-43433
> Project: Spark
>  Issue Type: Sub-task
>  Components: Pandas API on Spark
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Priority: Major
>
> Match behavior with 
> https://pandas.pydata.org/docs/dev/whatsnew/v2.0.0.html#dataframegroupby-nth-and-seriesgroupby-nth-now-behave-as-filtrations
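For reference, the new pandas 2.0 behavior this ticket targets (plain pandas; whether the Spark side matches it is exactly what the ticket is about):

```python
import pandas as pd

df = pd.DataFrame({"g": ["a", "a", "b"], "v": [1, 2, 3]})
first = df.groupby("g").nth(0)
# pandas >= 2.0: nth acts as a filtration -- it returns the selected rows
# with their original index and keeps "g" as a regular column, instead of
# reducing to one row per group indexed by the group key.
print(list(first.index))  # [0, 2]
```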






[jira] [Updated] (SPARK-43432) Fix `min_periods` for Rolling to work same as pandas

2023-08-01 Thread Haejoon Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haejoon Lee updated SPARK-43432:

Affects Version/s: 4.0.0
   (was: 3.5.0)

> Fix `min_periods` for Rolling to work same as pandas 
> -
>
> Key: SPARK-43432
> URL: https://issues.apache.org/jira/browse/SPARK-43432
> Project: Spark
>  Issue Type: Sub-task
>  Components: Pandas API on Spark
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Priority: Major
>
> Fix `min_periods` for Rolling to work same as pandas
> https://github.com/pandas-dev/pandas/issues/31302






[jira] [Updated] (SPARK-43291) Match behavior for DataFrame.cov on string DataFrame

2023-08-01 Thread Haejoon Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haejoon Lee updated SPARK-43291:

Affects Version/s: 4.0.0
   (was: 3.5.0)

> Match behavior for DataFrame.cov on string DataFrame
> 
>
> Key: SPARK-43291
> URL: https://issues.apache.org/jira/browse/SPARK-43291
> Project: Spark
>  Issue Type: Sub-task
>  Components: Pandas API on Spark
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Priority: Major
>
> Should enable test below:
> {code:java}
> pdf = pd.DataFrame([("1", "2"), ("0", "3"), ("2", "0"), ("1", "1")], 
> columns=["a", "b"])
> psdf = ps.from_pandas(pdf)
> self.assert_eq(pdf.cov(), psdf.cov()) {code}






[jira] [Updated] (SPARK-43295) Make DataFrameGroupBy.sum support for string type columns

2023-08-01 Thread Haejoon Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haejoon Lee updated SPARK-43295:

Affects Version/s: 4.0.0
   (was: 3.5.0)

> Make DataFrameGroupBy.sum support for string type columns
> -
>
> Key: SPARK-43295
> URL: https://issues.apache.org/jira/browse/SPARK-43295
> Project: Spark
>  Issue Type: Sub-task
>  Components: Pandas API on Spark
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Priority: Major
>
> From pandas 2.0.0, DataFrameGroupBy.sum also works for string type columns:
> {code:java}
> >>> psdf
>    A    B  C      D
> 0  1  3.1  a   True
> 1  2  4.1  b  False
> 2  1  4.1  b  False
> 3  2  3.1  a   True
> >>> psdf.groupby("A").sum().sort_index()
>      B  D
> A
> 1  7.2  1
> 2  7.2  1
> >>> psdf.to_pandas().groupby("A").sum().sort_index()
>      B   C  D
> A
> 1  7.2  ab  1
> 2  7.2  ba  1 {code}






[jira] [Created] (SPARK-44628) Clear some unused codes in "***Errors" and extract some common logic

2023-08-01 Thread BingKun Pan (Jira)
BingKun Pan created SPARK-44628:
---

 Summary: Clear some unused codes in "***Errors" and extract some 
common logic
 Key: SPARK-44628
 URL: https://issues.apache.org/jira/browse/SPARK-44628
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 4.0.0
Reporter: BingKun Pan









[jira] [Updated] (SPARK-43460) Enable OpsOnDiffFramesGroupByTests.test_groupby_different_lengths for pandas 2.0.0.

2023-08-01 Thread Haejoon Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haejoon Lee updated SPARK-43460:

Affects Version/s: 4.0.0
   (was: 3.5.0)

> Enable OpsOnDiffFramesGroupByTests.test_groupby_different_lengths for pandas 
> 2.0.0.
> ---
>
> Key: SPARK-43460
> URL: https://issues.apache.org/jira/browse/SPARK-43460
> Project: Spark
>  Issue Type: Sub-task
>  Components: Pandas API on Spark
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Priority: Major
>
> Enable OpsOnDiffFramesGroupByTests.test_groupby_different_lengths for pandas 
> 2.0.0.






[jira] [Updated] (SPARK-43453) Enable OpsOnDiffFramesEnabledTests.test_concat_column_axis for pandas 2.0.0.

2023-08-01 Thread Haejoon Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haejoon Lee updated SPARK-43453:

Affects Version/s: 4.0.0
   (was: 3.5.0)

> Enable OpsOnDiffFramesEnabledTests.test_concat_column_axis for pandas 2.0.0.
> 
>
> Key: SPARK-43453
> URL: https://issues.apache.org/jira/browse/SPARK-43453
> Project: Spark
>  Issue Type: Sub-task
>  Components: Pandas API on Spark
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Priority: Major
>
> Enable OpsOnDiffFramesEnabledTests.test_concat_column_axis for pandas 2.0.0.






[jira] [Updated] (SPARK-43459) Enable OpsOnDiffFramesGroupByTests.test_groupby_multiindex_columns for pandas 2.0.0.

2023-08-01 Thread Haejoon Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haejoon Lee updated SPARK-43459:

Affects Version/s: 4.0.0
   (was: 3.5.0)

> Enable OpsOnDiffFramesGroupByTests.test_groupby_multiindex_columns for pandas 
> 2.0.0.
> 
>
> Key: SPARK-43459
> URL: https://issues.apache.org/jira/browse/SPARK-43459
> Project: Spark
>  Issue Type: Sub-task
>  Components: Pandas API on Spark
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Priority: Major
>
> Enable OpsOnDiffFramesGroupByTests.test_groupby_multiindex_columns for pandas 
> 2.0.0.






[jira] [Updated] (SPARK-43476) Enable SeriesStringTests.test_string_replace for pandas 2.0.0.

2023-08-01 Thread Haejoon Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haejoon Lee updated SPARK-43476:

Affects Version/s: 4.0.0
   (was: 3.5.0)

> Enable SeriesStringTests.test_string_replace for pandas 2.0.0.
> --
>
> Key: SPARK-43476
> URL: https://issues.apache.org/jira/browse/SPARK-43476
> Project: Spark
>  Issue Type: Sub-task
>  Components: Pandas API on Spark
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Priority: Major
>
> Enable SeriesStringTests.test_string_replace for pandas 2.0.0.






[jira] [Updated] (SPARK-43458) Enable SeriesConversionTests.test_to_latex for pandas 2.0.0.

2023-08-01 Thread Haejoon Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haejoon Lee updated SPARK-43458:

Affects Version/s: 4.0.0
   (was: 3.5.0)

> Enable SeriesConversionTests.test_to_latex for pandas 2.0.0.
> 
>
> Key: SPARK-43458
> URL: https://issues.apache.org/jira/browse/SPARK-43458
> Project: Spark
>  Issue Type: Sub-task
>  Components: Pandas API on Spark
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Priority: Major
>
> Enable SeriesConversionTests.test_to_latex for pandas 2.0.0.






[jira] [Updated] (SPARK-43452) Enable RollingTests.test_groupby_rolling_count for pandas 2.0.0.

2023-08-01 Thread Haejoon Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haejoon Lee updated SPARK-43452:

Affects Version/s: 4.0.0
   (was: 3.5.0)

> Enable RollingTests.test_groupby_rolling_count for pandas 2.0.0.
> 
>
> Key: SPARK-43452
> URL: https://issues.apache.org/jira/browse/SPARK-43452
> Project: Spark
>  Issue Type: Sub-task
>  Components: Pandas API on Spark
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Priority: Major
>
> Enable RollingTests.test_groupby_rolling_count for pandas 2.0.0.






[jira] [Updated] (SPARK-43462) Enable SeriesDateTimeTests.test_date_subtraction for pandas 2.0.0.

2023-08-01 Thread Haejoon Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haejoon Lee updated SPARK-43462:

Affects Version/s: 4.0.0
   (was: 3.5.0)

> Enable SeriesDateTimeTests.test_date_subtraction for pandas 2.0.0.
> --
>
> Key: SPARK-43462
> URL: https://issues.apache.org/jira/browse/SPARK-43462
> Project: Spark
>  Issue Type: Sub-task
>  Components: Pandas API on Spark
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Priority: Major
>
> Enable SeriesDateTimeTests.test_date_subtraction for pandas 2.0.0.






[jira] [Updated] (SPARK-43477) Enable SeriesStringTests.test_string_rsplit for pandas 2.0.0.

2023-08-01 Thread Haejoon Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haejoon Lee updated SPARK-43477:

Affects Version/s: 4.0.0
   (was: 3.5.0)

> Enable SeriesStringTests.test_string_rsplit for pandas 2.0.0.
> -
>
> Key: SPARK-43477
> URL: https://issues.apache.org/jira/browse/SPARK-43477
> Project: Spark
>  Issue Type: Sub-task
>  Components: Pandas API on Spark
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Priority: Major
>
> Enable SeriesStringTests.test_string_rsplit for pandas 2.0.0.






[jira] [Updated] (SPARK-43497) Enable StatsTests.test_cov_corr_meta for pandas 2.0.0.

2023-08-01 Thread Haejoon Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haejoon Lee updated SPARK-43497:

Affects Version/s: 4.0.0
   (was: 3.5.0)

> Enable StatsTests.test_cov_corr_meta for pandas 2.0.0.
> --
>
> Key: SPARK-43497
> URL: https://issues.apache.org/jira/browse/SPARK-43497
> Project: Spark
>  Issue Type: Sub-task
>  Components: Pandas API on Spark
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Priority: Major
>
> Enable StatsTests.test_cov_corr_meta for pandas 2.0.0.






[jira] [Updated] (SPARK-43498) Enable StatsTests.test_axis_on_dataframe for pandas 2.0.0.

2023-08-01 Thread Haejoon Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haejoon Lee updated SPARK-43498:

Affects Version/s: 4.0.0
   (was: 3.5.0)

> Enable StatsTests.test_axis_on_dataframe for pandas 2.0.0.
> --
>
> Key: SPARK-43498
> URL: https://issues.apache.org/jira/browse/SPARK-43498
> Project: Spark
>  Issue Type: Sub-task
>  Components: Pandas API on Spark
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Priority: Major
>
> Enable StatsTests.test_axis_on_dataframe for pandas 2.0.0.






[jira] [Updated] (SPARK-43478) Enable SeriesStringTests.test_string_split for pandas 2.0.0.

2023-08-01 Thread Haejoon Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haejoon Lee updated SPARK-43478:

Affects Version/s: 4.0.0
   (was: 3.5.0)

> Enable SeriesStringTests.test_string_split for pandas 2.0.0.
> 
>
> Key: SPARK-43478
> URL: https://issues.apache.org/jira/browse/SPARK-43478
> Project: Spark
>  Issue Type: Sub-task
>  Components: Pandas API on Spark
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Priority: Major
>
> Enable SeriesStringTests.test_string_split for pandas 2.0.0.






[jira] [Updated] (SPARK-43506) Enable ArrowTests.test_toPandas_empty_columns for pandas 2.0.0.

2023-08-01 Thread Haejoon Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haejoon Lee updated SPARK-43506:

Affects Version/s: 4.0.0
   (was: 3.5.0)

> Enable ArrowTests.test_toPandas_empty_columns for pandas 2.0.0.
> ---
>
> Key: SPARK-43506
> URL: https://issues.apache.org/jira/browse/SPARK-43506
> Project: Spark
>  Issue Type: Sub-task
>  Components: Pandas API on Spark
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Priority: Major
>
> Enable ArrowTests.test_toPandas_empty_columns for pandas 2.0.0.






[jira] [Updated] (SPARK-43499) Enable StatsTests.test_stat_functions_with_no_numeric_columns for pandas 2.0.0.

2023-08-01 Thread Haejoon Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haejoon Lee updated SPARK-43499:

Affects Version/s: 4.0.0
   (was: 3.5.0)

> Enable StatsTests.test_stat_functions_with_no_numeric_columns for pandas 
> 2.0.0.
> ---
>
> Key: SPARK-43499
> URL: https://issues.apache.org/jira/browse/SPARK-43499
> Project: Spark
>  Issue Type: Sub-task
>  Components: Pandas API on Spark
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Priority: Major
>
> Enable StatsTests.test_stat_functions_with_no_numeric_columns for pandas 
> 2.0.0.






[jira] [Updated] (SPARK-43561) Enable DataFrameConversionTests.test_to_latex for pandas 2.0.0.

2023-08-01 Thread Haejoon Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haejoon Lee updated SPARK-43561:

Affects Version/s: 4.0.0
   (was: 3.5.0)

> Enable DataFrameConversionTests.test_to_latex for pandas 2.0.0.
> ---
>
> Key: SPARK-43561
> URL: https://issues.apache.org/jira/browse/SPARK-43561
> Project: Spark
>  Issue Type: Sub-task
>  Components: Pandas API on Spark
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Priority: Major
>
> Enable DataFrameConversionTests.test_to_latex for pandas 2.0.0.






[jira] [Updated] (SPARK-43562) Enable DataFrameTests.test_append for pandas 2.0.0.

2023-08-01 Thread Haejoon Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haejoon Lee updated SPARK-43562:

Affects Version/s: 4.0.0
   (was: 3.5.0)

> Enable DataFrameTests.test_append for pandas 2.0.0.
> ---
>
> Key: SPARK-43562
> URL: https://issues.apache.org/jira/browse/SPARK-43562
> Project: Spark
>  Issue Type: Sub-task
>  Components: Pandas API on Spark
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Priority: Major
>
> Enable DataFrameTests.test_append for pandas 2.0.0.






[jira] [Updated] (SPARK-43533) Enable MultiIndex test for IndexesTests.test_difference

2023-08-01 Thread Haejoon Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haejoon Lee updated SPARK-43533:

Affects Version/s: 4.0.0
   (was: 3.5.0)

> Enable MultiIndex test for IndexesTests.test_difference
> ---
>
> Key: SPARK-43533
> URL: https://issues.apache.org/jira/browse/SPARK-43533
> Project: Spark
>  Issue Type: Sub-task
>  Components: Pandas API on Spark
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Priority: Major
>
> Enable MultiIndex test for IndexesTests.test_difference






[jira] [Updated] (SPARK-43563) Enable CsvTests.test_read_csv_with_squeeze for pandas 2.0.0.

2023-08-01 Thread Haejoon Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haejoon Lee updated SPARK-43563:

Affects Version/s: 4.0.0
   (was: 3.5.0)

> Enable CsvTests.test_read_csv_with_squeeze for pandas 2.0.0.
> 
>
> Key: SPARK-43563
> URL: https://issues.apache.org/jira/browse/SPARK-43563
> Project: Spark
>  Issue Type: Sub-task
>  Components: Pandas API on Spark
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Priority: Major
>
> Enable CsvTests.test_read_csv_with_squeeze for pandas 2.0.0.






[jira] [Updated] (SPARK-43570) Enable DateOpsTests.test_rsub for pandas 2.0.0.

2023-08-01 Thread Haejoon Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haejoon Lee updated SPARK-43570:

Affects Version/s: 4.0.0
   (was: 3.5.0)

> Enable DateOpsTests.test_rsub for pandas 2.0.0.
> ---
>
> Key: SPARK-43570
> URL: https://issues.apache.org/jira/browse/SPARK-43570
> Project: Spark
>  Issue Type: Sub-task
>  Components: Pandas API on Spark
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Priority: Major
>
> Enable DateOpsTests.test_rsub for pandas 2.0.0.






[jira] [Updated] (SPARK-43608) Enable IndexesTests.test_union for pandas 2.0.0.

2023-08-01 Thread Haejoon Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haejoon Lee updated SPARK-43608:

Affects Version/s: 4.0.0
   (was: 3.5.0)

> Enable IndexesTests.test_union for pandas 2.0.0.
> 
>
> Key: SPARK-43608
> URL: https://issues.apache.org/jira/browse/SPARK-43608
> Project: Spark
>  Issue Type: Sub-task
>  Components: Pandas API on Spark
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Priority: Major
>
> Enable IndexesTests.test_union for pandas 2.0.0.






[jira] [Updated] (SPARK-43705) Enable TimedeltaIndexTests.test_properties for pandas 2.0.0.

2023-08-01 Thread Haejoon Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haejoon Lee updated SPARK-43705:

Affects Version/s: 4.0.0
   (was: 3.5.0)

> Enable TimedeltaIndexTests.test_properties for pandas 2.0.0.
> 
>
> Key: SPARK-43705
> URL: https://issues.apache.org/jira/browse/SPARK-43705
> Project: Spark
>  Issue Type: Sub-task
>  Components: Pandas API on Spark
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Priority: Major
>
> Enable TimedeltaIndexTests.test_properties for pandas 2.0.0.






[jira] [Updated] (SPARK-43644) Enable DatetimeIndexTests.test_indexer_between_time for pandas 2.0.0.

2023-08-01 Thread Haejoon Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haejoon Lee updated SPARK-43644:

Affects Version/s: 4.0.0
   (was: 3.5.0)

> Enable DatetimeIndexTests.test_indexer_between_time for pandas 2.0.0.
> -
>
> Key: SPARK-43644
> URL: https://issues.apache.org/jira/browse/SPARK-43644
> Project: Spark
>  Issue Type: Sub-task
>  Components: Pandas API on Spark
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Priority: Major
>
> Enable DatetimeIndexTests.test_indexer_between_time for pandas 2.0.0.






[jira] [Updated] (SPARK-43606) Enable IndexesTests.test_index_basic for pandas 2.0.0.

2023-08-01 Thread Haejoon Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haejoon Lee updated SPARK-43606:

Affects Version/s: 4.0.0
   (was: 3.5.0)

> Enable IndexesTests.test_index_basic for pandas 2.0.0.
> --
>
> Key: SPARK-43606
> URL: https://issues.apache.org/jira/browse/SPARK-43606
> Project: Spark
>  Issue Type: Sub-task
>  Components: Pandas API on Spark
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Priority: Major
>
> Enable IndexesTests.test_index_basic for pandas 2.0.0.






[jira] [Updated] (SPARK-43567) Enable CategoricalIndexTests.test_factorize for pandas 2.0.0.

2023-08-01 Thread Haejoon Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haejoon Lee updated SPARK-43567:

Affects Version/s: 4.0.0
   (was: 3.5.0)

> Enable CategoricalIndexTests.test_factorize for pandas 2.0.0.
> -
>
> Key: SPARK-43567
> URL: https://issues.apache.org/jira/browse/SPARK-43567
> Project: Spark
>  Issue Type: Sub-task
>  Components: Pandas API on Spark
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Priority: Major
>
> Enable CategoricalIndexTests.test_factorize for pandas 2.0.0.






[jira] [Updated] (SPARK-43571) Enable DateOpsTests.test_sub for pandas 2.0.0.

2023-08-01 Thread Haejoon Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haejoon Lee updated SPARK-43571:

Affects Version/s: 4.0.0
   (was: 3.5.0)

> Enable DateOpsTests.test_sub for pandas 2.0.0.
> --
>
> Key: SPARK-43571
> URL: https://issues.apache.org/jira/browse/SPARK-43571
> Project: Spark
>  Issue Type: Sub-task
>  Components: Pandas API on Spark
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Priority: Major
>
> Enable DateOpsTests.test_sub for pandas 2.0.0.






[jira] [Updated] (SPARK-43607) Enable IndexesTests.test_intersection for pandas 2.0.0.

2023-08-01 Thread Haejoon Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haejoon Lee updated SPARK-43607:

Affects Version/s: 4.0.0
   (was: 3.5.0)

> Enable IndexesTests.test_intersection for pandas 2.0.0.
> ---
>
> Key: SPARK-43607
> URL: https://issues.apache.org/jira/browse/SPARK-43607
> Project: Spark
>  Issue Type: Sub-task
>  Components: Pandas API on Spark
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Priority: Major
>
> Enable IndexesTests.test_intersection for pandas 2.0.0.






[jira] [Updated] (SPARK-43568) Enable CategoricalIndexTests.test_categories_setter for pandas 2.0.0.

2023-08-01 Thread Haejoon Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haejoon Lee updated SPARK-43568:

Affects Version/s: 4.0.0
   (was: 3.5.0)

> Enable CategoricalIndexTests.test_categories_setter for pandas 2.0.0.
> -
>
> Key: SPARK-43568
> URL: https://issues.apache.org/jira/browse/SPARK-43568
> Project: Spark
>  Issue Type: Sub-task
>  Components: Pandas API on Spark
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Priority: Major
>
> Enable CategoricalIndexTests.test_categories_setter for pandas 2.0.0.






[jira] [Updated] (SPARK-43633) Enable CategoricalIndexTests.test_remove_categories for pandas 2.0.0.

2023-08-01 Thread Haejoon Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haejoon Lee updated SPARK-43633:

Affects Version/s: 4.0.0
   (was: 3.5.0)

> Enable CategoricalIndexTests.test_remove_categories for pandas 2.0.0.
> -
>
> Key: SPARK-43633
> URL: https://issues.apache.org/jira/browse/SPARK-43633
> Project: Spark
>  Issue Type: Sub-task
>  Components: Pandas API on Spark
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Priority: Major
>
> Enable CategoricalIndexTests.test_remove_categories for pandas 2.0.0.






[jira] [Updated] (SPARK-43811) Enable DataFrameTests.test_reindex for pandas 2.0.0.

2023-08-01 Thread Haejoon Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haejoon Lee updated SPARK-43811:

Affects Version/s: 4.0.0
   (was: 3.5.0)

> Enable DataFrameTests.test_reindex for pandas 2.0.0.
> 
>
> Key: SPARK-43811
> URL: https://issues.apache.org/jira/browse/SPARK-43811
> Project: Spark
>  Issue Type: Sub-task
>  Components: Pandas API on Spark
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Priority: Major
>







[jira] [Updated] (SPARK-43869) Enable GroupBySlowTests for pandas 2.0.0.

2023-08-01 Thread Haejoon Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haejoon Lee updated SPARK-43869:

Affects Version/s: 4.0.0
   (was: 3.5.0)

> Enable GroupBySlowTests for pandas 2.0.0.
> -
>
> Key: SPARK-43869
> URL: https://issues.apache.org/jira/browse/SPARK-43869
> Project: Spark
>  Issue Type: Sub-task
>  Components: Pandas API on Spark, PySpark
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Priority: Major
>
> test list:
>  * test_value_counts
>  * test_split_apply_combine_on_series






[jira] [Updated] (SPARK-43709) Enable NamespaceTests.test_date_range for pandas 2.0.0.

2023-08-01 Thread Haejoon Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haejoon Lee updated SPARK-43709:

Affects Version/s: 4.0.0
   (was: 3.5.0)

> Enable NamespaceTests.test_date_range for pandas 2.0.0.
> ---
>
> Key: SPARK-43709
> URL: https://issues.apache.org/jira/browse/SPARK-43709
> Project: Spark
>  Issue Type: Sub-task
>  Components: Pandas API on Spark
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Priority: Major
>







[jira] [Updated] (SPARK-43812) Enable DataFrameTests.test_all for pandas 2.0.0.

2023-08-01 Thread Haejoon Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haejoon Lee updated SPARK-43812:

Affects Version/s: 4.0.0
   (was: 3.5.0)

> Enable DataFrameTests.test_all for pandas 2.0.0.
> 
>
> Key: SPARK-43812
> URL: https://issues.apache.org/jira/browse/SPARK-43812
> Project: Spark
>  Issue Type: Sub-task
>  Components: Pandas API on Spark
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Priority: Major
>







[jira] [Updated] (SPARK-43871) Enable SeriesDateTimeTests for pandas 2.0.0.

2023-08-01 Thread Haejoon Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haejoon Lee updated SPARK-43871:

Affects Version/s: 4.0.0
   (was: 3.5.0)

> Enable SeriesDateTimeTests for pandas 2.0.0.
> 
>
> Key: SPARK-43871
> URL: https://issues.apache.org/jira/browse/SPARK-43871
> Project: Spark
>  Issue Type: Sub-task
>  Components: Pandas API on Spark, PySpark
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Priority: Major
>
> test list:
>  * test_day
>  * test_dayofweek
>  * test_dayofyear
>  * test_days_in_month
>  * test_daysinmonth
>  * test_hour
>  * test_microsecond
>  * test_minute
>  * test_month
>  * test_quarter
>  * test_second
>  * test_weekday
>  * test_year






[jira] [Updated] (SPARK-43872) Enable DataFramePlotMatplotlibTests for pandas 2.0.0.

2023-08-01 Thread Haejoon Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haejoon Lee updated SPARK-43872:

Affects Version/s: 4.0.0
   (was: 3.5.0)

> Enable DataFramePlotMatplotlibTests for pandas 2.0.0.
> -
>
> Key: SPARK-43872
> URL: https://issues.apache.org/jira/browse/SPARK-43872
> Project: Spark
>  Issue Type: Sub-task
>  Components: Pandas API on Spark, PySpark
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Priority: Major
>
> test list:
>  * test_area_plot
>  * test_area_plot_stacked_false
>  * test_area_plot_y
>  * test_bar_plot
>  * test_bar_with_x_y
>  * test_barh_plot_with_x_y
>  * test_barh_plot
>  * test_line_plot
>  * test_pie_plot
>  * test_scatter_plot
>  * test_hist_plot
>  * test_kde_plot






[jira] [Updated] (SPARK-43873) Enable DataFrameSlowTests for pandas 2.0.0.

2023-08-01 Thread Haejoon Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haejoon Lee updated SPARK-43873:

Affects Version/s: 4.0.0
   (was: 3.5.0)

> Enable DataFrameSlowTests for pandas 2.0.0.
> ---
>
> Key: SPARK-43873
> URL: https://issues.apache.org/jira/browse/SPARK-43873
> Project: Spark
>  Issue Type: Sub-task
>  Components: Pandas API on Spark, PySpark
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Priority: Major
>
> test list:
>  * test_describe
>  * test_between_time
>  * test_product
>  * test_iteritems
>  * test_mad
>  * test_cov
>  * test_quantile






[jira] [Updated] (SPARK-43870) Enable SeriesTests for pandas 2.0.0.

2023-08-01 Thread Haejoon Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haejoon Lee updated SPARK-43870:

Affects Version/s: 4.0.0
   (was: 3.5.0)

> Enable SeriesTests for pandas 2.0.0.
> 
>
> Key: SPARK-43870
> URL: https://issues.apache.org/jira/browse/SPARK-43870
> Project: Spark
>  Issue Type: Sub-task
>  Components: Pandas API on Spark, PySpark
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Priority: Major
>
> test list:
>  * test_value_counts
>  * test_append
>  * test_astype
>  * test_between
>  * test_mad
>  * test_quantile
>  * test_rank
>  * test_between_time
>  * test_iteritems
>  * test_product
>  * test_factorize






[jira] [Updated] (SPARK-43874) Enable GroupByTests for pandas 2.0.0.

2023-08-01 Thread Haejoon Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haejoon Lee updated SPARK-43874:

Affects Version/s: 4.0.0
   (was: 3.5.0)

> Enable GroupByTests for pandas 2.0.0.
> -
>
> Key: SPARK-43874
> URL: https://issues.apache.org/jira/browse/SPARK-43874
> Project: Spark
>  Issue Type: Sub-task
>  Components: Pandas API on Spark, PySpark
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Priority: Major
>
> test list:
>  * test_prod
>  * test_nth
>  * test_mad
>  * test_basic_stat_funcs
>  * test_groupby_multiindex_columns
>  * test_apply_without_shortcut
>  * test_mean
>  * test_apply






[jira] [Updated] (SPARK-43875) Enable CategoricalTests for pandas 2.0.0.

2023-08-01 Thread Haejoon Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haejoon Lee updated SPARK-43875:

Affects Version/s: 4.0.0
   (was: 3.5.0)

> Enable CategoricalTests for pandas 2.0.0.
> -
>
> Key: SPARK-43875
> URL: https://issues.apache.org/jira/browse/SPARK-43875
> Project: Spark
>  Issue Type: Sub-task
>  Components: Pandas API on Spark, PySpark
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Priority: Major
>
> test list:
>  * test_factorize
>  * test_as_ordered_unordered
>  * test_categories_setter
>  * test_remove_categories
>  * test_groupby_apply_without_shortcut






[jira] [Updated] (SPARK-44624) Spark Connect reattachable Execute when initial ExecutePlan didn't reach server

2023-08-01 Thread Juliusz Sompolski (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Juliusz Sompolski updated SPARK-44624:
--
Epic Link: SPARK-43754

> Spark Connect reattachable Execute when initial ExecutePlan didn't reach 
> server
> ---
>
> Key: SPARK-44624
> URL: https://issues.apache.org/jira/browse/SPARK-44624
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.5.0, 4.0.0
>Reporter: Juliusz Sompolski
>Priority: Major
>
> If the ExecutePlan never reached the server, a ReattachExecute will fail with 
> INVALID_HANDLE.OPERATION_NOT_FOUND. In that case, we could try to send 
> ExecutePlan again.
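>
> The fallback described above can be sketched as follows. This is an illustrative sketch only: `FakeServer`, `fetch_with_reattach`, and `OperationNotFound` are hypothetical stand-ins, not the actual Spark Connect client API.

```python
class OperationNotFound(Exception):
    """Stands in for INVALID_HANDLE.OPERATION_NOT_FOUND."""


class FakeServer:
    """Simulates a server that only knows operations whose ExecutePlan arrived."""

    def __init__(self):
        self.operations = set()

    def execute_plan(self, op_id):
        # Registers the operation and starts returning results.
        self.operations.add(op_id)
        return f"results for {op_id}"

    def reattach_execute(self, op_id):
        # Reattaching to an unknown operation fails with OPERATION_NOT_FOUND.
        if op_id not in self.operations:
            raise OperationNotFound(op_id)
        return f"results for {op_id}"


def fetch_with_reattach(server, op_id, initial_sent):
    # If we believe the initial ExecutePlan was sent, try to reattach first;
    # OPERATION_NOT_FOUND means the plan never reached the server, so fall
    # back to sending ExecutePlan again, as the issue proposes.
    if initial_sent:
        try:
            return server.reattach_execute(op_id)
        except OperationNotFound:
            pass
    return server.execute_plan(op_id)
```

With this shape, a lost initial request is transparently retried, while a reattach to a known operation never resends the plan.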






[jira] [Updated] (SPARK-44624) Spark Connect reattachable Execute when initial ExecutePlan didn't reach server

2023-08-01 Thread Juliusz Sompolski (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Juliusz Sompolski updated SPARK-44624:
--
Description: If the ExecutePlan never reached the server, a ReattachExecute 
will fail with INVALID_HANDLE.OPERATION_NOT_FOUND. In that case, we could try 
to send ExecutePlan again.  (was: Even though we empirically observed that the 
error is thrown only from the first next() or hasNext() of the response 
StreamObserver, wrap the initial call in retries as well so we do not depend on 
it, in case it is just a quirk that is not fully dependable.)

> Spark Connect reattachable Execute when initial ExecutePlan didn't reach 
> server
> ---
>
> Key: SPARK-44624
> URL: https://issues.apache.org/jira/browse/SPARK-44624
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.5.0, 4.0.0
>Reporter: Juliusz Sompolski
>Priority: Major
>
> If the ExecutePlan never reached the server, a ReattachExecute will fail with 
> INVALID_HANDLE.OPERATION_NOT_FOUND. In that case, we could try to send 
> ExecutePlan again.






[jira] [Updated] (SPARK-44624) Spark Connect reattachable Execute when initial ExecutePlan didn't reach server

2023-08-01 Thread Juliusz Sompolski (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Juliusz Sompolski updated SPARK-44624:
--
Summary: Spark Connect reattachable Execute when initial ExecutePlan didn't 
reach server  (was: Wrap retries around initial streaming GRPC call in connect)

> Spark Connect reattachable Execute when initial ExecutePlan didn't reach 
> server
> ---
>
> Key: SPARK-44624
> URL: https://issues.apache.org/jira/browse/SPARK-44624
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.5.0, 4.0.0
>Reporter: Juliusz Sompolski
>Priority: Major
>
> Even though we empirically observed that the error is thrown only from the 
> first next() or hasNext() of the response StreamObserver, wrap the initial 
> call in retries as well so we do not depend on it, in case it is just a 
> quirk that is not fully dependable.






[jira] [Created] (SPARK-44627) org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils#resultSetToRows produces wrong data

2023-08-01 Thread Min Zhao (Jira)
Min Zhao created SPARK-44627:


 Summary: 
org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils#resultSetToRows 
produces wrong data
 Key: SPARK-44627
 URL: https://issues.apache.org/jira/browse/SPARK-44627
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 3.3.1, 2.3.2
Reporter: Min Zhao


When the ResultSet contains a timestamp column whose value is null, the row it 
generates reuses the value of the same column from the previous row.

Example:

values in the ResultSet:

1, 2023-01-01 12:00:00
2, null

rows produced:

1, 2023-01-01 12:00:00
2, 2023-01-01 12:00:00
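
A plausible mechanism for this bug (assumed here, since the report does not show the code) is a reused mutable row buffer whose field is only overwritten when the JDBC getter yields a non-null value, so a SQL NULL silently leaves the previous row's value in place. A minimal Python sketch of the broken converter next to a fixed one:

```python
def rows_buggy(result_set):
    # Reuses one mutable buffer and only writes the timestamp field when it
    # is non-null, so a null keeps the previous row's value -- the reported bug.
    row = [None, None]
    out = []
    for rid, ts in result_set:
        row[0] = rid
        if ts is not None:  # bug: no branch clearing the field on null
            row[1] = ts
        out.append(tuple(row))
    return out


def rows_fixed(result_set):
    # Same buffer reuse, but the field is written unconditionally, nulling it
    # out when needed (analogous to checking ResultSet.wasNull() and calling
    # a setNullAt-style setter).
    row = [None, None]
    out = []
    for rid, ts in result_set:
        row[0] = rid
        row[1] = ts  # always write, even when ts is None
        out.append(tuple(row))
    return out
```

Run against the example above, `rows_buggy` reproduces the stale timestamp in the second row, while `rows_fixed` preserves the null.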

 






[jira] [Resolved] (SPARK-42941) Add support for streaming listener in Python

2023-08-01 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-42941.
--
Resolution: Fixed

Issue resolved by pull request 42250
[https://github.com/apache/spark/pull/42250]

> Add support for streaming listener in Python
> 
>
> Key: SPARK-42941
> URL: https://issues.apache.org/jira/browse/SPARK-42941
> Project: Spark
>  Issue Type: Task
>  Components: Connect, Structured Streaming
>Affects Versions: 3.5.0
>Reporter: Raghu Angadi
>Assignee: Wei Liu
>Priority: Major
> Fix For: 3.5.0
>
>
> Add support of streaming listener in Python. 
> This likely requires a design doc to hash out the details. 






[jira] [Commented] (SPARK-42730) Update Spark Standalone Mode - Starting a Cluster Manually

2023-08-01 Thread Hyukjin Kwon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17750044#comment-17750044
 ] 

Hyukjin Kwon commented on SPARK-42730:
--

Please go ahead. Refs: [https://spark.apache.org/contributing.html] , 
[https://spark.apache.org/developer-tools.html]

> Update Spark Standalone Mode - Starting a Cluster Manually
> --
>
> Key: SPARK-42730
> URL: https://issues.apache.org/jira/browse/SPARK-42730
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, Documentation
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Priority: Major
>
> https://spark.apache.org/docs/latest/spark-standalone.html
> Add start-connect-server.sh to this list and cover Spark Connect sessions - 
> other changes needed here.






[jira] [Assigned] (SPARK-44218) Customize diff log in assertDataFrameEqual error message format

2023-08-01 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-44218:


Assignee: Amanda Liu

> Customize diff log in assertDataFrameEqual error message format
> ---
>
> Key: SPARK-44218
> URL: https://issues.apache.org/jira/browse/SPARK-44218
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.5.0
>Reporter: Amanda Liu
>Assignee: Amanda Liu
>Priority: Major
>
> SPIP: 
> https://docs.google.com/document/d/1OkyBn3JbEHkkQgSQ45Lq82esXjr9rm2Vj7Ih_4zycRc/edit#heading=h.f5f0u2riv07v






[jira] [Resolved] (SPARK-44218) Customize diff log in assertDataFrameEqual error message format

2023-08-01 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-44218.
--
Fix Version/s: 3.5.0
   4.0.0
   Resolution: Fixed

Issue resolved by pull request 42196
[https://github.com/apache/spark/pull/42196]

> Customize diff log in assertDataFrameEqual error message format
> ---
>
> Key: SPARK-44218
> URL: https://issues.apache.org/jira/browse/SPARK-44218
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.5.0
>Reporter: Amanda Liu
>Assignee: Amanda Liu
>Priority: Major
> Fix For: 3.5.0, 4.0.0
>
>
> SPIP: 
> https://docs.google.com/document/d/1OkyBn3JbEHkkQgSQ45Lq82esXjr9rm2Vj7Ih_4zycRc/edit#heading=h.f5f0u2riv07v






[jira] [Created] (SPARK-44626) Followup on streaming query termination when client session is timed out for Spark Connect

2023-08-01 Thread Bo Gao (Jira)
Bo Gao created SPARK-44626:
--

 Summary: Followup on streaming query termination when client 
session is timed out for Spark Connect
 Key: SPARK-44626
 URL: https://issues.apache.org/jira/browse/SPARK-44626
 Project: Spark
  Issue Type: Task
  Components: Connect, Structured Streaming
Affects Versions: 3.5.0
Reporter: Bo Gao









[jira] [Assigned] (SPARK-44588) Migrated shuffle blocks are encrypted multiple times when io.encryption is enabled

2023-08-01 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-44588:
-

Assignee: Henry Mai

> Migrated shuffle blocks are encrypted multiple times when io.encryption is 
> enabled 
> ---
>
> Key: SPARK-44588
> URL: https://issues.apache.org/jira/browse/SPARK-44588
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.1.0, 3.1.1, 3.1.2, 3.2.0, 3.1.3, 3.2.1, 3.3.0, 3.2.2, 
> 3.3.1, 3.2.3, 3.2.4, 3.3.2, 3.4.0, 3.4.1
>Reporter: Henry Mai
>Assignee: Henry Mai
>Priority: Critical
> Fix For: 3.5.0
>
>
> Shuffle blocks upon migration are wrapped for encryption again when being 
> written out to a file on the receiver side.
>  
> Pull request to fix this: https://github.com/apache/spark/pull/42214
>  
> Details:
> Sender/Read side:
> BlockManagerDecommissioner:run()
>     blocks = bm.migratableResolver.getMigrationBlocks()
>     *dataFile = IndexShuffleBlockResolver:getDataFile(...)*
>    buffer = FileSegmentManagedBuffer(..., dataFile)
>    *^ This reads straight from disk without decryption*
>     blocks.foreach((blockId, buffer) => 
> bm.blockTransferService.uploadBlockSync(..., buffer, ...))
>     -> uploadBlockSync() -> uploadBlock(..., buffer, ...)
>     -> client.uploadStream(UploadBlockStream, buffer, ...)
>  - Notice that there is no decryption here on the sender/read side.
> Receiver/Write side:
> NettyBlockRpcServer:receiveStream() <--- This is the UploadBlockStream handler
>     putBlockDataAsStream()
>     migratableResolver.putShuffleBlockAsStream()
>     *-> file = IndexShuffleBlockResolver:getDataFile(...)*
>     -> tmpFile = (file + . extension)
>     *-> Creates an encrypting writable channel to a tmpFile using 
> serializerManager.wrapStream()*
>     -> onData() writes the data into the channel
>     -> onComplete() renames the tmpFile to the file
>  - Notice:
>  * Both getMigrationBlocks()[read] and putShuffleBlockAsStream()[write] 
> target IndexShuffleBlockResolver:getDataFile()
>  * The read path does not decrypt but the write path encrypts.
>  * As a thought exercise: if this cycle happens more than once (where this 
> receiver is now a sender), then even if we assume that the shuffle blocks are 
> initially unencrypted*, the bytes in the file accumulate more and more layers 
> of encryption each time they are migrated.
>  * *In practice, the shuffle blocks are encrypted on disk to begin with; this 
> is just a thought exercise.
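The core of the bug traced above: the sender copies already-encrypted bytes off disk verbatim, while the receiver wraps the incoming stream in encryption again, so every migration adds a layer. A toy Python sketch of the layering effect (base64 stands in for the encrypting wrap; it is not encryption and none of this is Spark's actual `SerializerManager` API):

```python
import base64

def wrap_for_encryption(data: bytes) -> bytes:
    # Stand-in for wrapping a write channel with encryption:
    # each call adds one observable "layer".
    return base64.b64encode(data)

def migrate_block(on_disk: bytes) -> bytes:
    # Sender reads the file bytes verbatim (no decryption)...
    raw = on_disk
    # ...and the receiver wraps them again before writing its own file.
    return wrap_for_encryption(raw)

block = wrap_for_encryption(b"shuffle-bytes")  # encrypted once at write time
after_one = migrate_block(block)               # now wrapped twice

# A single unwrap no longer recovers the plaintext after a migration:
assert base64.b64decode(block) == b"shuffle-bytes"
assert base64.b64decode(after_one) != b"shuffle-bytes"
```

After each migration, one more unwrap is needed to get back to the original bytes, which is exactly the mismatch between the read and write paths described above.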






[jira] [Resolved] (SPARK-44588) Migrated shuffle blocks are encrypted multiple times when io.encryption is enabled

2023-08-01 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-44588.
---
Fix Version/s: 3.5.0
   Resolution: Fixed

> Migrated shuffle blocks are encrypted multiple times when io.encryption is 
> enabled 
> ---
>
> Key: SPARK-44588
> URL: https://issues.apache.org/jira/browse/SPARK-44588
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.1.0, 3.1.1, 3.1.2, 3.2.0, 3.1.3, 3.2.1, 3.3.0, 3.2.2, 
> 3.3.1, 3.2.3, 3.2.4, 3.3.2, 3.4.0, 3.4.1
>Reporter: Henry Mai
>Priority: Critical
> Fix For: 3.5.0
>
>
> Shuffle blocks upon migration are wrapped for encryption again when being 
> written out to a file on the receiver side.
>  
> Pull request to fix this: https://github.com/apache/spark/pull/42214
>  
> Details:
> Sender/Read side:
> BlockManagerDecommissioner:run()
>     blocks = bm.migratableResolver.getMigrationBlocks()
>     *dataFile = IndexShuffleBlockResolver:getDataFile(...)*
>    buffer = FileSegmentManagedBuffer(..., dataFile)
>    *^ This reads straight from disk without decryption*
>     blocks.foreach((blockId, buffer) => 
> bm.blockTransferService.uploadBlockSync(..., buffer, ...))
>     -> uploadBlockSync() -> uploadBlock(..., buffer, ...)
>     -> client.uploadStream(UploadBlockStream, buffer, ...)
>  - Notice that there is no decryption here on the sender/read side.
> Receiver/Write side:
> NettyBlockRpcServer:receiveStream() <--- This is the UploadBlockStream handler
>     putBlockDataAsStream()
>     migratableResolver.putShuffleBlockAsStream()
>     *-> file = IndexShuffleBlockResolver:getDataFile(...)*
>     -> tmpFile = (file + . extension)
>     *-> Creates an encrypting writable channel to a tmpFile using 
> serializerManager.wrapStream()*
>     -> onData() writes the data into the channel
>     -> onComplete() renames the tmpFile to the file
>  - Notice:
>  * Both getMigrationBlocks()[read] and putShuffleBlockAsStream()[write] 
> target IndexShuffleBlockResolver:getDataFile()
>  * The read path does not decrypt but the write path encrypts.
>  * As a thought exercise: if this cycle happens more than once (where this 
> receiver is now a sender), then even if we assume that the shuffle blocks are 
> initially unencrypted*, the bytes in the file accumulate more and more layers 
> of encryption each time they are migrated.
>  * *In practice, the shuffle blocks are encrypted on disk to begin with; this 
> is just a thought exercise.






[jira] [Resolved] (SPARK-44563) Upgrade Apache Arrow to 13.0.0

2023-08-01 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-44563.
---
Resolution: Duplicate

> Upgrade Apache Arrow to 13.0.0
> --
>
> Key: SPARK-44563
> URL: https://issues.apache.org/jira/browse/SPARK-44563
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 4.0.0
>Reporter: BingKun Pan
>Priority: Minor
>







[jira] [Closed] (SPARK-44563) Upgrade Apache Arrow to 13.0.0

2023-08-01 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun closed SPARK-44563.
-

> Upgrade Apache Arrow to 13.0.0
> --
>
> Key: SPARK-44563
> URL: https://issues.apache.org/jira/browse/SPARK-44563
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 4.0.0
>Reporter: BingKun Pan
>Priority: Minor
>







[jira] [Created] (SPARK-44625) Spark Connect clean up abandoned executions

2023-08-01 Thread Juliusz Sompolski (Jira)
Juliusz Sompolski created SPARK-44625:
-

 Summary: Spark Connect clean up abandoned executions
 Key: SPARK-44625
 URL: https://issues.apache.org/jira/browse/SPARK-44625
 Project: Spark
  Issue Type: Improvement
  Components: Connect
Affects Versions: 3.5.0, 4.0.0
Reporter: Juliusz Sompolski


With reattachable executions, some executions might get orphaned when 
ReattachExecute and ReleaseExecute never come. Add a mechanism to track such 
executions and to clean them up.
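One common shape for such a mechanism is to record the last client activity per execution and periodically sweep executions idle beyond a timeout. A minimal sketch under that assumption (class and method names are hypothetical, not Spark Connect's actual implementation):

```python
import time

class ExecutionTracker:
    """Toy tracker: records last client activity per execution and
    removes executions not touched within `timeout` seconds."""

    def __init__(self, timeout: float):
        self.timeout = timeout
        self.last_seen = {}  # execution id -> last activity timestamp

    def touch(self, exec_id: str, now: float = None):
        # Called on every client interaction (e.g. ReattachExecute).
        self.last_seen[exec_id] = time.time() if now is None else now

    def sweep(self, now: float = None) -> list:
        # Called periodically; returns and forgets abandoned executions.
        now = time.time() if now is None else now
        orphaned = [e for e, t in self.last_seen.items() if now - t > self.timeout]
        for e in orphaned:
            del self.last_seen[e]  # release resources for the abandoned execution
        return orphaned

tracker = ExecutionTracker(timeout=30.0)
tracker.touch("exec-1", now=0.0)
tracker.touch("exec-2", now=90.0)
abandoned = tracker.sweep(now=100.0)  # exec-1 idle 100s, exec-2 idle only 10s
```

Here `exec-1` is reclaimed while `exec-2`, still within the timeout, survives the sweep.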






[jira] [Assigned] (SPARK-44601) Make `hive-thriftserver` module daily test pass

2023-08-01 Thread Yang Jie (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Jie reassigned SPARK-44601:


Assignee: Yang Jie

> Make `hive-thriftserver` module daily test pass
> ---
>
> Key: SPARK-44601
> URL: https://issues.apache.org/jira/browse/SPARK-44601
> Project: Spark
>  Issue Type: Sub-task
>  Components: Tests
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Major
>
> [https://github.com/LuciferYang/spark/actions/runs/5694334367/job/15435297305]
>  
> {code:java}
> *** RUN ABORTED ***
> 20159  java.lang.NoClassDefFoundError: 
> org/codehaus/jackson/map/type/TypeFactory
> 20160  at org.apache.hadoop.hive.ql.udf.UDFJson.(UDFJson.java:64)
> 20161  at java.lang.Class.forName0(Native Method)
> 20162  at java.lang.Class.forName(Class.java:348)
> 20163  at 
> org.apache.hadoop.hive.ql.udf.generic.GenericUDFBridge.getUdfClassInternal(GenericUDFBridge.java:142)
> 20164  at 
> org.apache.hadoop.hive.ql.udf.generic.GenericUDFBridge.getUdfClass(GenericUDFBridge.java:132)
> 20165  at 
> org.apache.hadoop.hive.ql.exec.FunctionInfo.getFunctionClass(FunctionInfo.java:151)
> 20166  at 
> org.apache.hadoop.hive.ql.exec.Registry.addFunction(Registry.java:519)
> 20167  at 
> org.apache.hadoop.hive.ql.exec.Registry.registerUDF(Registry.java:163)
> 20168  at 
> org.apache.hadoop.hive.ql.exec.Registry.registerUDF(Registry.java:154)
> 20169  at 
> org.apache.hadoop.hive.ql.exec.Registry.registerUDF(Registry.java:147)
> 20170  ...
> 20171  Cause: java.lang.ClassNotFoundException: 
> org.codehaus.jackson.map.type.TypeFactory
> 20172  at java.net.URLClassLoader.findClass(URLClassLoader.java:387)
> 20173  at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
> 20174  at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
> 20175  at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
> 20176  at org.apache.hadoop.hive.ql.udf.UDFJson.(UDFJson.java:64)
> 20177  at java.lang.Class.forName0(Native Method)
> 20178  at java.lang.Class.forName(Class.java:348)
> 20179  at 
> org.apache.hadoop.hive.ql.udf.generic.GenericUDFBridge.getUdfClassInternal(GenericUDFBridge.java:142)
> 20180  at 
> org.apache.hadoop.hive.ql.udf.generic.GenericUDFBridge.getUdfClass(GenericUDFBridge.java:132)
> 20181  at 
> org.apache.hadoop.hive.ql.exec.FunctionInfo.getFunctionClass(FunctionInfo.java:151)
> 20182  ... {code}






[jira] [Resolved] (SPARK-44601) Make `hive-thriftserver` module daily test pass

2023-08-01 Thread Yang Jie (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Jie resolved SPARK-44601.
--
Fix Version/s: 3.5.0
   4.0.0
   Resolution: Fixed

Issue resolved by pull request 42260
[https://github.com/apache/spark/pull/42260]

> Make `hive-thriftserver` module daily test pass
> ---
>
> Key: SPARK-44601
> URL: https://issues.apache.org/jira/browse/SPARK-44601
> Project: Spark
>  Issue Type: Sub-task
>  Components: Tests
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Major
> Fix For: 3.5.0, 4.0.0
>
>
> [https://github.com/LuciferYang/spark/actions/runs/5694334367/job/15435297305]
>  
> {code:java}
> *** RUN ABORTED ***
> 20159  java.lang.NoClassDefFoundError: 
> org/codehaus/jackson/map/type/TypeFactory
> 20160  at org.apache.hadoop.hive.ql.udf.UDFJson.(UDFJson.java:64)
> 20161  at java.lang.Class.forName0(Native Method)
> 20162  at java.lang.Class.forName(Class.java:348)
> 20163  at 
> org.apache.hadoop.hive.ql.udf.generic.GenericUDFBridge.getUdfClassInternal(GenericUDFBridge.java:142)
> 20164  at 
> org.apache.hadoop.hive.ql.udf.generic.GenericUDFBridge.getUdfClass(GenericUDFBridge.java:132)
> 20165  at 
> org.apache.hadoop.hive.ql.exec.FunctionInfo.getFunctionClass(FunctionInfo.java:151)
> 20166  at 
> org.apache.hadoop.hive.ql.exec.Registry.addFunction(Registry.java:519)
> 20167  at 
> org.apache.hadoop.hive.ql.exec.Registry.registerUDF(Registry.java:163)
> 20168  at 
> org.apache.hadoop.hive.ql.exec.Registry.registerUDF(Registry.java:154)
> 20169  at 
> org.apache.hadoop.hive.ql.exec.Registry.registerUDF(Registry.java:147)
> 20170  ...
> 20171  Cause: java.lang.ClassNotFoundException: 
> org.codehaus.jackson.map.type.TypeFactory
> 20172  at java.net.URLClassLoader.findClass(URLClassLoader.java:387)
> 20173  at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
> 20174  at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
> 20175  at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
> 20176  at org.apache.hadoop.hive.ql.udf.UDFJson.(UDFJson.java:64)
> 20177  at java.lang.Class.forName0(Native Method)
> 20178  at java.lang.Class.forName(Class.java:348)
> 20179  at 
> org.apache.hadoop.hive.ql.udf.generic.GenericUDFBridge.getUdfClassInternal(GenericUDFBridge.java:142)
> 20180  at 
> org.apache.hadoop.hive.ql.udf.generic.GenericUDFBridge.getUdfClass(GenericUDFBridge.java:132)
> 20181  at 
> org.apache.hadoop.hive.ql.exec.FunctionInfo.getFunctionClass(FunctionInfo.java:151)
> 20182  ... {code}






[jira] [Created] (SPARK-44624) Wrap retries around initial streaming GRPC call in connect

2023-08-01 Thread Juliusz Sompolski (Jira)
Juliusz Sompolski created SPARK-44624:
-

 Summary: Wrap retries around initial streaming GRPC call in connect
 Key: SPARK-44624
 URL: https://issues.apache.org/jira/browse/SPARK-44624
 Project: Spark
  Issue Type: Improvement
  Components: Connect
Affects Versions: 3.5.0, 4.0.0
Reporter: Juliusz Sompolski


Even though we have empirically observed that the error is thrown only from the 
first next() or hasNext() of the response StreamObserver, wrap the initial call 
in retries as well, so that we do not depend on behavior that may be just a 
quirk and not fully dependable.
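The idea is to retry the call that opens the stream itself, not only the subsequent iteration. A language-agnostic sketch of such a wrapper (generic Python, not the Spark Connect client's actual retry API):

```python
import time

def with_retries(make_call, max_attempts=3, backoff_s=0.0):
    """Retry the initial call (e.g. the RPC that opens a response stream),
    not only the later next()/hasNext() iteration over that stream."""
    attempt = 0
    while True:
        attempt += 1
        try:
            return make_call()
        except ConnectionError:
            if attempt >= max_attempts:
                raise
            time.sleep(backoff_s)

# A fake stream-opening call that fails transiently twice, then succeeds.
attempts = {"n": 0}
def flaky_open_stream():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise ConnectionError("transient failure on initial call")
    return iter([b"chunk-1", b"chunk-2"])

stream = with_retries(flaky_open_stream)
```

With the wrapper in place, a transient failure while establishing the stream is retried instead of surfacing to the caller.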






[jira] [Assigned] (SPARK-44480) Add option for thread pool to perform maintenance for RocksDB/HDFS State Store Providers

2023-08-01 Thread Jungtaek Lim (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jungtaek Lim reassigned SPARK-44480:


Assignee: Eric Marnadi

> Add option for thread pool to perform maintenance for RocksDB/HDFS State 
> Store Providers
> 
>
> Key: SPARK-44480
> URL: https://issues.apache.org/jira/browse/SPARK-44480
> Project: Spark
>  Issue Type: Improvement
>  Components: Structured Streaming
>Affects Versions: 3.5.0
>Reporter: Eric Marnadi
>Assignee: Eric Marnadi
>Priority: Major
>
> Maintenance tasks on the StateStore were being done by a single background 
> thread, which is prone to straggling. With this change, the single background 
> thread instead schedules maintenance tasks on a thread pool.
> Introduce 
> {{spark.sql.streaming.stateStore.enableStateStoreMaintenanceThreadPool}} 
> config so that the user can enable a thread pool for maintenance manually.
> Introduce {{spark.sql.streaming.stateStore.numStateStoreMaintenanceThreads}} 
> config so the thread pool size is configurable.
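The scheduling change described above can be sketched with the standard-library thread pool: the background scheduler only submits tasks, and a pool of workers (sized analogously to `numStateStoreMaintenanceThreads`) runs them concurrently. The provider names and maintenance body here are placeholders, not Spark's actual state store code:

```python
from concurrent.futures import ThreadPoolExecutor

def run_maintenance(provider_id: str) -> str:
    # Placeholder for the snapshot/cleanup work a state store provider does.
    return f"maintained {provider_id}"

providers = [f"provider-{i}" for i in range(8)]

# Single-thread scheduling: one straggling provider delays every later one.
serial = [run_maintenance(p) for p in providers]

# Pooled scheduling: the scheduler submits tasks; workers run them concurrently,
# so a straggler only occupies one of the pool's threads.
with ThreadPoolExecutor(max_workers=4) as pool:
    pooled = list(pool.map(run_maintenance, providers))
```

`pool.map` preserves input order, so the pooled results match the serial ones while the work itself can proceed in parallel.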






[jira] [Resolved] (SPARK-44480) Add option for thread pool to perform maintenance for RocksDB/HDFS State Store Providers

2023-08-01 Thread Jungtaek Lim (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jungtaek Lim resolved SPARK-44480.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 42066
[https://github.com/apache/spark/pull/42066]

> Add option for thread pool to perform maintenance for RocksDB/HDFS State 
> Store Providers
> 
>
> Key: SPARK-44480
> URL: https://issues.apache.org/jira/browse/SPARK-44480
> Project: Spark
>  Issue Type: Improvement
>  Components: Structured Streaming
>Affects Versions: 3.5.0
>Reporter: Eric Marnadi
>Assignee: Eric Marnadi
>Priority: Major
> Fix For: 4.0.0
>
>
> Maintenance tasks on the StateStore were being done by a single background 
> thread, which is prone to straggling. With this change, the single background 
> thread instead schedules maintenance tasks on a thread pool.
> Introduce 
> {{spark.sql.streaming.stateStore.enableStateStoreMaintenanceThreadPool}} 
> config so that the user can enable a thread pool for maintenance manually.
> Introduce {{spark.sql.streaming.stateStore.numStateStoreMaintenanceThreads}} 
> config so the thread pool size is configurable.






[jira] [Assigned] (SPARK-44623) Upgrade commons-lang3 to 3.13.0

2023-08-01 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-44623:
-

Assignee: Dongjoon Hyun

> Upgrade commons-lang3 to 3.13.0
> ---
>
> Key: SPARK-44623
> URL: https://issues.apache.org/jira/browse/SPARK-44623
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>







[jira] [Resolved] (SPARK-44623) Upgrade commons-lang3 to 3.13.0

2023-08-01 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-44623.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 42269
[https://github.com/apache/spark/pull/42269]

> Upgrade commons-lang3 to 3.13.0
> ---
>
> Key: SPARK-44623
> URL: https://issues.apache.org/jira/browse/SPARK-44623
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
> Fix For: 4.0.0
>
>







[jira] [Commented] (SPARK-29497) Cannot assign instance of java.lang.invoke.SerializedLambda to field

2023-08-01 Thread Jira


[ 
https://issues.apache.org/jira/browse/SPARK-29497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17749991#comment-17749991
 ] 

Herman van Hövell commented on SPARK-29497:
---

I have added a check for this to Spark Connect. If someone is brave enough they 
can do the same thing for other UDFs.

> Cannot assign instance of java.lang.invoke.SerializedLambda to field
> 
>
> Key: SPARK-29497
> URL: https://issues.apache.org/jira/browse/SPARK-29497
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.4.3, 3.0.1, 3.2.0
> Environment: Spark 2.4.3 Scala 2.12
> Spark 3.2.0 Scala 2.13.5 (Java 11.0.12)
>Reporter: Rob Russo
>Priority: Major
>
> Note this is for scala 2.12:
> There seems to be an issue in spark with serializing a udf that is created 
> from a function assigned to a class member that references another function 
> assigned to a class member. This is similar to 
> https://issues.apache.org/jira/browse/SPARK-25047 but it looks like the 
> resolution has an issue with this case. After trimming it down to the base 
> issue I came up with the following to reproduce:
>  
>  
> {code:java}
> object TestLambdaShell extends Serializable {
>   val hello: String => String = s => s"hello $s!"  
>   val lambdaTest: String => String = hello( _ )  
>   def functionTest: String => String = hello( _ )
> }
> val hello = udf( TestLambdaShell.hello )
> val functionTest = udf( TestLambdaShell.functionTest )
> val lambdaTest = udf( TestLambdaShell.lambdaTest )
> sc.parallelize(Seq("world"),1).toDF("test").select(hello($"test")).show(1)
> sc.parallelize(Seq("world"),1).toDF("test").select(functionTest($"test")).show(1)
> sc.parallelize(Seq("world"),1).toDF("test").select(lambdaTest($"test")).show(1)
> {code}
>  
> All of which works except the last line which results in an exception on the 
> executors:
>  
> {code:java}
> Caused by: java.lang.ClassCastException: cannot assign instance of 
> java.lang.invoke.SerializedLambda to field 
> $$$82b5b23cea489b2712a1db46c77e458w$TestLambdaShell$.lambdaTest of type 
> scala.Function1 in instance of 
> $$$82b5b23cea489b2712a1db46c77e458w$TestLambdaShell$
>   at 
> java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2133)
>   at java.io.ObjectStreamClass.setObjFieldValues(ObjectStreamClass.java:1305)
>   at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2251)
>   at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2169)
>   at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2027)
>   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1535)
>   at java.io.ObjectInputStream.readArray(ObjectInputStream.java:1933)
>   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1529)
>   at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2245)
>   at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2169)
>   at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2027)
>   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1535)
>   at java.io.ObjectInputStream.readArray(ObjectInputStream.java:1933)
>   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1529)
>   at java.io.ObjectInputStream.readArray(ObjectInputStream.java:1933)
>   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1529)
>   at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2245)
>   at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2169)
>   at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2027)
>   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1535)
>   at java.io.ObjectInputStream.readArray(ObjectInputStream.java:1933)
>   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1529)
>   at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2245)
>   at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2169)
>   at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2027)
>   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1535)
>   at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2245)
>   at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2169)
>   at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2027)
>   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1535)
>   at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2245)
>   at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2169)
>   at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2027)
>   at java.io.ObjectInputStream.r

[jira] [Resolved] (SPARK-44613) Add Encoders.scala to Spark Connect Scala Client

2023-08-01 Thread Jira


 [ 
https://issues.apache.org/jira/browse/SPARK-44613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Herman van Hövell resolved SPARK-44613.
---
Fix Version/s: 3.5.0
   Resolution: Fixed

> Add Encoders.scala to Spark Connect Scala Client
> 
>
> Key: SPARK-44613
> URL: https://issues.apache.org/jira/browse/SPARK-44613
> Project: Spark
>  Issue Type: New Feature
>  Components: Connect
>Affects Versions: 3.4.1
>Reporter: Herman van Hövell
>Assignee: Herman van Hövell
>Priority: Major
> Fix For: 3.5.0
>
>







[jira] [Updated] (SPARK-44616) Hive Generic UDF support no longer supports short-circuiting of argument evaluation

2023-08-01 Thread Andy Grove (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Grove updated SPARK-44616:
---
Description: 
PR [https://github.com/apache/spark/pull/39555] changed DeferredObject to no 
longer contain a function and to contain a value instead. This removes the 
deferred-evaluation capability and means that HiveGenericUDF implementations 
can no longer short-circuit the evaluation of their arguments, which could be a 
performance issue for some users.

Here is a relevant javadoc comment from the Hive source for DeferredObject:

{code:java}
  /**
   * A Defered Object allows us to do lazy-evaluation and short-circuiting.
   * GenericUDF use DeferedObject to pass arguments.
   */
  public static interface DeferredObject {
{code}

 

  was:
PR https://github.com/apache/spark/pull/39555 changed DeferredObject to no 
longer contain a function, and instead contains a value. This removes the 
deferred evaluation capability and means that HiveGenericUDF implementations 
can no longer short-circuit the evaluation of their arguments, which could be a 
performance issue for some users.

Here is a relevant javadoc comment from the Hive source for DeferredObject:

{{{
  /**
   * A Defered Object allows us to do lazy-evaluation and short-circuiting.
   * GenericUDF use DeferedObject to pass arguments.
   */
  public static interface DeferredObject {
}}}


> Hive Generic UDF support no longer supports short-circuiting of argument 
> evaluation
> ---
>
> Key: SPARK-44616
> URL: https://issues.apache.org/jira/browse/SPARK-44616
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Andy Grove
>Priority: Major
>
> PR [https://github.com/apache/spark/pull/39555] changed DeferredObject to no 
> longer contain a function and to contain a value instead. This removes the 
> deferred-evaluation capability and means that HiveGenericUDF implementations 
> can no longer short-circuit the evaluation of their arguments, which could be 
> a performance issue for some users.
> Here is a relevant javadoc comment from the Hive source for DeferredObject:
> {code:java}
>   /**
>* A Defered Object allows us to do lazy-evaluation and short-circuiting.
>* GenericUDF use DeferedObject to pass arguments.
>*/
>   public static interface DeferredObject {
> {code}
>  
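The difference between deferred and eager argument evaluation can be shown with a toy UDF that short-circuits. Thunks (zero-argument callables) play the role of DeferredObject; this is an illustration of the concept, not Hive's or Spark's actual API:

```python
calls = []

def expensive(name):
    """Stands in for an expensive argument expression; records evaluation."""
    calls.append(name)
    return name

def generic_udf(deferred):
    # Like a GenericUDF using DeferredObject: each argument is a thunk,
    # evaluated only when the UDF actually asks for its value.
    if deferred[0]() == "a":
        return "a"          # short-circuit: second argument is never evaluated
    return deferred[1]()

# Deferred evaluation: only the first argument is ever computed.
generic_udf([lambda: expensive("a"), lambda: expensive("b")])
assert calls == ["a"]

# Eager evaluation (values computed up front, as after the PR):
calls.clear()
values = [expensive("a"), expensive("b")]   # both evaluated here, up front
generic_udf([lambda v=v: v for v in values])
assert calls == ["a", "b"]                  # "b" was paid for but never used
```

Under eager evaluation, the cost of the second argument is incurred even though the UDF short-circuits past it, which is the performance concern the ticket raises.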





