[jira] [Updated] (SPARK-37960) A new framework to represent catalyst expressions in DS v2 APIs
[ https://issues.apache.org/jira/browse/SPARK-37960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jiaan.geng updated SPARK-37960: --- Parent: SPARK-38852 Issue Type: Sub-task (was: New Feature) > A new framework to represent catalyst expressions in DS v2 APIs > --- > > Key: SPARK-37960 > URL: https://issues.apache.org/jira/browse/SPARK-37960 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: jiaan.geng >Assignee: jiaan.geng >Priority: Major > Fix For: 3.3.0 > > > Spark needs a new framework to represent catalyst expressions in DS v2 APIs. > CASE ... WHEN ... ELSE ... END is just the first use case. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37867) Compile aggregate functions of built-in JDBC dialect
[ https://issues.apache.org/jira/browse/SPARK-37867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jiaan.geng updated SPARK-37867: --- Parent: SPARK-38852 Issue Type: Sub-task (was: New Feature) > Compile aggregate functions of built-in JDBC dialect > > > Key: SPARK-37867 > URL: https://issues.apache.org/jira/browse/SPARK-37867 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: jiaan.geng >Assignee: jiaan.geng >Priority: Major > Fix For: 3.3.0 > > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37839) DS V2 supports partial aggregate push-down AVG
[ https://issues.apache.org/jira/browse/SPARK-37839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jiaan.geng updated SPARK-37839: --- Parent: SPARK-38852 Issue Type: Sub-task (was: New Feature) > DS V2 supports partial aggregate push-down AVG > -- > > Key: SPARK-37839 > URL: https://issues.apache.org/jira/browse/SPARK-37839 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: jiaan.geng >Assignee: jiaan.geng >Priority: Major > Fix For: 3.3.0 > > > Currently, DS V2 supports complete aggregate push-down of AVG. But supporting > partial aggregate push-down for AVG (pushing SUM and COUNT to the source and finishing the average in Spark) would be very useful. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37839) DS V2 supports partial aggregate push-down AVG
[ https://issues.apache.org/jira/browse/SPARK-37839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jiaan.geng updated SPARK-37839: --- Epic Link: (was: SPARK-38788) > DS V2 supports partial aggregate push-down AVG > -- > > Key: SPARK-37839 > URL: https://issues.apache.org/jira/browse/SPARK-37839 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: jiaan.geng >Assignee: jiaan.geng >Priority: Major > Fix For: 3.3.0 > > > Currently, DS V2 supports complete aggregate push-down of AVG. But supporting > partial aggregate push-down for AVG (pushing SUM and COUNT to the source and finishing the average in Spark) would be very useful. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
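A conceptual sketch of the partial push-down idea (an illustration only, not Spark's internal code): avg(x) can be rewritten into sum(x) and count(x), those partials pushed to the source, and the exact average recovered from the totals.

{code:java}
// Hypothetical per-partition partial result returned by the data source.
case class Partial(sum: Double, count: Long)

// Combining partials recovers the exact average: sum of sums / sum of counts.
val partials = Seq(Partial(100.0, 4), Partial(50.0, 1))
val avg = partials.map(_.sum).sum / partials.map(_.count).sum // 150.0 / 5 = 30.0
{code}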
[jira] [Updated] (SPARK-37527) Translate more standard aggregate functions for pushdown
[ https://issues.apache.org/jira/browse/SPARK-37527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jiaan.geng updated SPARK-37527: --- Parent: SPARK-38852 Issue Type: Sub-task (was: Improvement) > Translate more standard aggregate functions for pushdown > > > Key: SPARK-37527 > URL: https://issues.apache.org/jira/browse/SPARK-37527 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: jiaan.geng >Assignee: jiaan.geng >Priority: Major > Fix For: 3.3.0 > > > Currently, Spark aggregate pushdown will translate some standard aggregate > functions, so that these functions can be compiled into the dialect of a specific database. > After this work, users can override JdbcDialect.compileAggregate to > implement aggregate functions supported by a particular database. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
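A rough sketch of that override point, assuming the Spark 3.3-era API (the dialect object and JDBC URL prefix below are hypothetical, and the AggregateFunc subclasses a real dialect would match on vary across versions):

{code:java}
import org.apache.spark.sql.connector.expressions.aggregate.AggregateFunc
import org.apache.spark.sql.jdbc.{JdbcDialect, JdbcDialects}

object MyDatabaseDialect extends JdbcDialect {
  // Hypothetical JDBC URL prefix, used only to select this dialect.
  override def canHandle(url: String): Boolean = url.startsWith("jdbc:mydb")

  // Delegates to the built-in translation; a real dialect would add cases
  // for aggregate functions that only this particular database supports.
  override def compileAggregate(aggFunction: AggregateFunc): Option[String] =
    super.compileAggregate(aggFunction)
}

JdbcDialects.registerDialect(MyDatabaseDialect)
{code}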
[jira] [Updated] (SPARK-37734) Upgrade h2 from 1.4.195 to 2.0.202
[ https://issues.apache.org/jira/browse/SPARK-37734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jiaan.geng updated SPARK-37734: --- Parent: SPARK-38852 Issue Type: Sub-task (was: Dependency upgrade) > Upgrade h2 from 1.4.195 to 2.0.202 > -- > > Key: SPARK-37734 > URL: https://issues.apache.org/jira/browse/SPARK-37734 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: jiaan.geng >Assignee: jiaan.geng >Priority: Major > Fix For: 3.3.0 > > > Currently, the com.h2database dependency has one known vulnerability, ref: > https://www.tenable.com/cve/CVE-2021-23463 -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37732) Improve the implementation of JDBCV2Suite
[ https://issues.apache.org/jira/browse/SPARK-37732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jiaan.geng updated SPARK-37732: --- Parent: SPARK-38852 Issue Type: Sub-task (was: New Feature) > Improve the implementation of JDBCV2Suite > > > Key: SPARK-37732 > URL: https://issues.apache.org/jira/browse/SPARK-37732 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: jiaan.geng >Assignee: jiaan.geng >Priority: Major > Fix For: 3.3.0 > > > While reading the implementation of JDBCV2Suite, I found that the code can be improved. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37644) Support datasource v2 complete aggregate pushdown
[ https://issues.apache.org/jira/browse/SPARK-37644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jiaan.geng updated SPARK-37644: --- Epic Link: (was: SPARK-38788) > Support datasource v2 complete aggregate pushdown > -- > > Key: SPARK-37644 > URL: https://issues.apache.org/jira/browse/SPARK-37644 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: jiaan.geng >Assignee: jiaan.geng >Priority: Major > Fix For: 3.3.0 > > > Currently, Spark supports aggregate push-down with partial-agg and final-agg. > For some data sources (e.g. JDBC), we can avoid partial-agg and final-agg > by running the aggregate completely on the database. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37644) Support datasource v2 complete aggregate pushdown
[ https://issues.apache.org/jira/browse/SPARK-37644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jiaan.geng updated SPARK-37644: --- Parent: SPARK-38852 Issue Type: Sub-task (was: New Feature) > Support datasource v2 complete aggregate pushdown > -- > > Key: SPARK-37644 > URL: https://issues.apache.org/jira/browse/SPARK-37644 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: jiaan.geng >Assignee: jiaan.geng >Priority: Major > Fix For: 3.3.0 > > > Currently, Spark supports aggregate push-down with partial-agg and final-agg. > For some data sources (e.g. JDBC), we can avoid partial-agg and final-agg > by running the aggregate completely on the database. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
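A hedged sketch of what this enables, assuming a JDBC source (the URL and table below are hypothetical): with complete push-down, a grouped aggregate like this can run entirely in the database, so Spark can skip its own partial and final aggregate steps.

{code:java}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{max, min}

val spark = SparkSession.builder().appName("complete-agg-pushdown").getOrCreate()
import spark.implicits._

val df = spark.read
  .format("jdbc")
  .option("url", "jdbc:h2:mem:testdb") // hypothetical URL
  .option("dbtable", "employee")       // hypothetical table
  .load()

// A candidate for complete push-down: grouped MAX/MIN over a single source.
df.groupBy($"dept").agg(max($"salary"), min($"salary")).show()
{code}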
[jira] [Updated] (SPARK-37483) Support push down top N to JDBC data source V2
[ https://issues.apache.org/jira/browse/SPARK-37483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jiaan.geng updated SPARK-37483: --- Epic Link: (was: SPARK-38788) > Support push down top N to JDBC data source V2 > -- > > Key: SPARK-37483 > URL: https://issues.apache.org/jira/browse/SPARK-37483 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: jiaan.geng >Assignee: jiaan.geng >Priority: Major > Fix For: 3.3.0 > > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37483) Support push down top N to JDBC data source V2
[ https://issues.apache.org/jira/browse/SPARK-37483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jiaan.geng updated SPARK-37483: --- Parent: SPARK-38852 Issue Type: Sub-task (was: New Feature) > Support push down top N to JDBC data source V2 > -- > > Key: SPARK-37483 > URL: https://issues.apache.org/jira/browse/SPARK-37483 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: jiaan.geng >Assignee: jiaan.geng >Priority: Major > Fix For: 3.3.0 > > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37286) Move compileAggregates from JDBCRDD to JdbcDialect
[ https://issues.apache.org/jira/browse/SPARK-37286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jiaan.geng updated SPARK-37286: --- Parent: SPARK-38852 Issue Type: Sub-task (was: Improvement) > Move compileAggregates from JDBCRDD to JdbcDialect > -- > > Key: SPARK-37286 > URL: https://issues.apache.org/jira/browse/SPARK-37286 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: jiaan.geng >Assignee: jiaan.geng >Priority: Major > Fix For: 3.3.0 > > > Currently, the method compileAggregates lives in JDBCRDD. But this is not reasonable, > because the JDBC source knows how to compile aggregate expressions into > its own dialect. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37212) Improve the implementation of aggregate pushdown.
[ https://issues.apache.org/jira/browse/SPARK-37212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jiaan.geng updated SPARK-37212: --- Parent: SPARK-38852 Issue Type: Sub-task (was: Improvement) > Improve the implementation of aggregate pushdown. > > > Key: SPARK-37212 > URL: https://issues.apache.org/jira/browse/SPARK-37212 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: jiaan.geng >Assignee: jiaan.geng >Priority: Major > Fix For: 3.3.0 > > > Spark SQL supports aggregate pushdown for JDBC. While reading the current > implementation, I found some small issues. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-36574) pushDownPredicate=false should prevent push down filters to JDBC data source
[ https://issues.apache.org/jira/browse/SPARK-36574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jiaan.geng updated SPARK-36574: --- Parent: SPARK-38852 Issue Type: Sub-task (was: Bug) > pushDownPredicate=false should prevent push down filters to JDBC data source > > > Key: SPARK-36574 > URL: https://issues.apache.org/jira/browse/SPARK-36574 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: jiaan.geng >Assignee: jiaan.geng >Priority: Major > Fix For: 3.2.0 > > > Spark SQL includes a data source that can read data from other databases > using JDBC. > Spark also supports the case-insensitive option pushDownPredicate. > According to http://spark.apache.org/docs/latest/sql-data-sources-jdbc.html, > if pushDownPredicate is set to false, no filter should be pushed down to the JDBC > data source, and thus all filters should be handled by Spark. > But I find filters are still pushed down to the JDBC data source. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
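A minimal sketch of the reported scenario, assuming a JDBC source (URL, table, and column names are hypothetical): with the documented pushDownPredicate option set to false, the filter below should be evaluated by Spark after a full scan, not compiled into the JDBC WHERE clause.

{code:java}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("pushdown-option-test").getOrCreate()
import spark.implicits._

val df = spark.read
  .format("jdbc")
  .option("url", "jdbc:h2:mem:testdb")  // hypothetical URL
  .option("dbtable", "tableA")          // hypothetical table
  .option("pushDownPredicate", "false") // keep all filters on the Spark side
  .load()
  .filter($"col1" > 10) // per the docs, this should NOT reach the database
{code}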
[jira] [Updated] (SPARK-38852) Better Data Source V2 operator pushdown framework
[ https://issues.apache.org/jira/browse/SPARK-38852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jiaan.geng updated SPARK-38852: --- Description: Currently, Spark supports pushing down Filters and Aggregates to data sources. However, the Data Source V2 operator pushdown framework has the following shortcomings: # Only simple filter and aggregate are supported, which makes it impossible to apply in most scenarios # The incompatibility of SQL syntax makes it impossible to apply in most scenarios # Aggregate push down does not support multiple partitions of data sources # Spark's additional aggregate will cause some overhead # Limit push down is not supported # Top n push down is not supported # Paging push down is not supported was: Currently, Spark supports pushing down Filters and Aggregates to data sources. But, the # Only simple filter and aggregate are supported, which makes it impossible to apply in most scenarios # The incompatibility of SQL syntax makes it impossible to apply in most scenarios # Aggregate push down does not support multiple partitions of data sources # Spark's additional aggregate will cause some overhead # Limit push down is not supported # Top n push down is not supported # Paging push down is not supported > Better Data Source V2 operator pushdown framework > - > > Key: SPARK-38852 > URL: https://issues.apache.org/jira/browse/SPARK-38852 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0 >Reporter: jiaan.geng >Priority: Major > > Currently, Spark supports pushing down Filters and Aggregates to data sources. > However, the Data Source V2 operator pushdown framework has the following > shortcomings: > # Only simple filter and aggregate are supported, which makes it impossible > to apply in most scenarios > # The incompatibility of SQL syntax makes it impossible to apply in most > scenarios > # Aggregate push down does not support multiple partitions of data sources > # Spark's additional aggregate will cause some overhead > # Limit push down is not supported > # Top n push down is not supported > # Paging push down is not supported -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-38852) Better Data Source V2 operator pushdown framework
[ https://issues.apache.org/jira/browse/SPARK-38852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jiaan.geng updated SPARK-38852: --- Description: Currently, Spark supports pushing down Filters and Aggregates to data sources. But, the # Only simple filter and aggregate are supported, which makes it impossible to apply in most scenarios # The incompatibility of SQL syntax makes it impossible to apply in most scenarios # Aggregate push down does not support multiple partitions of data sources # Spark's additional aggregate will cause some overhead # Limit push down is not supported # Top n push down is not supported # Paging push down is not supported > Better Data Source V2 operator pushdown framework > - > > Key: SPARK-38852 > URL: https://issues.apache.org/jira/browse/SPARK-38852 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0 >Reporter: jiaan.geng >Priority: Major > > Currently, Spark supports pushing down Filters and Aggregates to data sources. > But, the > # Only simple filter and aggregate are supported, which makes it impossible > to apply in most scenarios > # The incompatibility of SQL syntax makes it impossible to apply in most > scenarios > # Aggregate push down does not support multiple partitions of data sources > # Spark's additional aggregate will cause some overhead > # Limit push down is not supported > # Top n push down is not supported > # Paging push down is not supported -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-38852) Better Data Source V2 operator pushdown framework
jiaan.geng created SPARK-38852: -- Summary: Better Data Source V2 operator pushdown framework Key: SPARK-38852 URL: https://issues.apache.org/jira/browse/SPARK-38852 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.3.0 Reporter: jiaan.geng -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-38678) Enable RocksDB tests on Apple Silicon on MacOS
[ https://issues.apache.org/jira/browse/SPARK-38678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17520322#comment-17520322 ] Apache Spark commented on SPARK-38678: -- User 'LuciferYang' has created a pull request for this issue: https://github.com/apache/spark/pull/36139 > Enable RocksDB tests on Apple Silicon on MacOS > -- > > Key: SPARK-38678 > URL: https://issues.apache.org/jira/browse/SPARK-38678 > Project: Spark > Issue Type: Sub-task > Components: Tests >Affects Versions: 3.4.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Fix For: 3.4.0 > > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-38851) Refactor `HistoryServerSuite` to add UTs for RocksDB
[ https://issues.apache.org/jira/browse/SPARK-38851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-38851: Assignee: Apache Spark > Refactor `HistoryServerSuite` to add UTs for RocksDB > > > Key: SPARK-38851 > URL: https://issues.apache.org/jira/browse/SPARK-38851 > Project: Spark > Issue Type: Improvement > Components: Tests >Affects Versions: 3.4.0 >Reporter: Yang Jie >Assignee: Apache Spark >Priority: Minor > > HistoryServerSuite currently only tests the LevelDB backend -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-38851) Refactor `HistoryServerSuite` to add UTs for RocksDB
[ https://issues.apache.org/jira/browse/SPARK-38851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-38851: Assignee: (was: Apache Spark) > Refactor `HistoryServerSuite` to add UTs for RocksDB > > > Key: SPARK-38851 > URL: https://issues.apache.org/jira/browse/SPARK-38851 > Project: Spark > Issue Type: Improvement > Components: Tests >Affects Versions: 3.4.0 >Reporter: Yang Jie >Priority: Minor > > HistoryServerSuite currently only tests the LevelDB backend -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-38851) Refactor `HistoryServerSuite` to add UTs for RocksDB
[ https://issues.apache.org/jira/browse/SPARK-38851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17520314#comment-17520314 ] Apache Spark commented on SPARK-38851: -- User 'LuciferYang' has created a pull request for this issue: https://github.com/apache/spark/pull/36138 > Refactor `HistoryServerSuite` to add UTs for RocksDB > > > Key: SPARK-38851 > URL: https://issues.apache.org/jira/browse/SPARK-38851 > Project: Spark > Issue Type: Improvement > Components: Tests >Affects Versions: 3.4.0 >Reporter: Yang Jie >Priority: Minor > > HistoryServerSuite currently only tests the LevelDB backend -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-38761) DS V2 supports push down misc non-aggregate functions
[ https://issues.apache.org/jira/browse/SPARK-38761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-38761: --- Assignee: jiaan.geng > DS V2 supports push down misc non-aggregate functions > - > > Key: SPARK-38761 > URL: https://issues.apache.org/jira/browse/SPARK-38761 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 3.4.0 >Reporter: jiaan.geng >Assignee: jiaan.geng >Priority: Major > > Currently, Spark has a lot of misc non-aggregate functions from the ANSI standard: > abs, > coalesce, > nullif, > when. > DS V2 should support pushing down these misc non-aggregate functions. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
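A hedged sketch of a query that would benefit, assuming a JDBC source (URL, table, and columns are hypothetical): once ABS and COALESCE can be translated, a filter like the one below could be compiled into the JDBC dialect instead of being evaluated row by row in Spark.

{code:java}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{abs, coalesce, lit}

val spark = SparkSession.builder().appName("non-agg-pushdown").getOrCreate()
import spark.implicits._

val df = spark.read
  .format("jdbc")
  .option("url", "jdbc:h2:mem:testdb") // hypothetical URL
  .option("dbtable", "employee")       // hypothetical table
  .load()
  // With translation support, ABS/COALESCE can move into the pushed-down predicate.
  .filter(abs($"salary") > coalesce($"bonus", lit(0)))
{code}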
[jira] [Created] (SPARK-38851) Refactor `HistoryServerSuite` to add UTs for RocksDB
Yang Jie created SPARK-38851: Summary: Refactor `HistoryServerSuite` to add UTs for RocksDB Key: SPARK-38851 URL: https://issues.apache.org/jira/browse/SPARK-38851 Project: Spark Issue Type: Improvement Components: Tests Affects Versions: 3.4.0 Reporter: Yang Jie HistoryServerSuite now only test leveldb backend -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-38761) DS V2 supports push down misc non-aggregate functions
[ https://issues.apache.org/jira/browse/SPARK-38761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-38761. - Fix Version/s: 3.3.0 Resolution: Fixed Issue resolved by pull request 36039 [https://github.com/apache/spark/pull/36039] > DS V2 supports push down misc non-aggregate functions > - > > Key: SPARK-38761 > URL: https://issues.apache.org/jira/browse/SPARK-38761 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 3.4.0 >Reporter: jiaan.geng >Assignee: jiaan.geng >Priority: Major > Fix For: 3.3.0 > > > Currently, Spark has a lot of misc non-aggregate functions from the ANSI standard: > abs, > coalesce, > nullif, > when. > DS V2 should support pushing down these misc non-aggregate functions. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-38837) Implement `dropna` parameter of `SeriesGroupBy.value_counts`
[ https://issues.apache.org/jira/browse/SPARK-38837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-38837: - Fix Version/s: 3.3.0 > Implement `dropna` parameter of `SeriesGroupBy.value_counts` > > > Key: SPARK-38837 > URL: https://issues.apache.org/jira/browse/SPARK-38837 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 3.4.0 >Reporter: Xinrong Meng >Assignee: Xinrong Meng >Priority: Major > Fix For: 3.3.0, 3.4.0 > > > Implement `dropna` parameter of `SeriesGroupBy.value_counts` -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-38837) Implement `dropna` parameter of `SeriesGroupBy.value_counts`
[ https://issues.apache.org/jira/browse/SPARK-38837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-38837: Assignee: Xinrong Meng > Implement `dropna` parameter of `SeriesGroupBy.value_counts` > > > Key: SPARK-38837 > URL: https://issues.apache.org/jira/browse/SPARK-38837 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 3.4.0 >Reporter: Xinrong Meng >Assignee: Xinrong Meng >Priority: Major > > Implement `dropna` parameter of `SeriesGroupBy.value_counts` -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-38837) Implement `dropna` parameter of `SeriesGroupBy.value_counts`
[ https://issues.apache.org/jira/browse/SPARK-38837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-38837. -- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 36093 [https://github.com/apache/spark/pull/36093] > Implement `dropna` parameter of `SeriesGroupBy.value_counts` > > > Key: SPARK-38837 > URL: https://issues.apache.org/jira/browse/SPARK-38837 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 3.4.0 >Reporter: Xinrong Meng >Assignee: Xinrong Meng >Priority: Major > Fix For: 3.4.0 > > > Implement `dropna` parameter of `SeriesGroupBy.value_counts` -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-38846) Teradata's Number is either converted to its floor value or ceiling value despite its fractional part.
[ https://issues.apache.org/jira/browse/SPARK-38846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17520299#comment-17520299 ] eugene commented on SPARK-38846: [~hyukjin.kwon] Thanks, just tried with the latest Spark version (Spark 3.2.1), the issue is still there. > Teradata's Number is either converted to its floor value or ceiling value > despite its fractional part. > -- > > Key: SPARK-38846 > URL: https://issues.apache.org/jira/browse/SPARK-38846 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.0 > Environment: Spark2.3.0 on Yarn > Teradata 16.20.32.59 >Reporter: eugene >Priority: Major > > I'm trying to load data from Teradata; the code in use is: > > sparkSession.read > .format("jdbc") > .options( > Map( > "url" -> "jdbc:teradata://hostname, user=$username, > password=$password", > "MAYBENULL" -> "ON", > "SIP_SUPPORT" -> "ON", > "driver" -> "com.teradata.jdbc.TeraDriver", > "dbtable" -> $table_name > ) > ) > .load() > However, some data lost its fractional part after loading. To be more > precise, the column in Teradata is of the type Number, and after > loading, the data type in Spark is `DecimalType(38,0)`; the scale value is 0, > which means no digits after the decimal point. > Data in Teradata is something like, > id column1 column2 > 1 50.23 100.23 > 2 25.8 20.669 > 3 30.2 19.23 > The resulting Spark `dataframe` is like, > id column1 column2 > 1 50 100 > 2 26 21 > 3 30 19 > The metadata of the table in Teradata is like: > CREATE SET TABLE table_name (id BIGINT, column1 NUMBER, column2 NUMBER) > PRIMARY INDEX (id); > The Spark version is 2.3.0 and Teradata is 16.20.32.59. > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
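One possible workaround sketch (untested against Teradata): the JDBC reader's documented customSchema option can override the inferred DecimalType(38,0) with an explicit nonzero scale so the fractional part survives; the column names below mirror the example table.

{code:java}
// Hedged workaround: force an explicit scale for the NUMBER columns
// instead of the inferred DECIMAL(38,0).
val df = sparkSession.read
  .format("jdbc")
  .options(
    Map(
      "url" -> "jdbc:teradata://hostname, user=$username, password=$password",
      "driver" -> "com.teradata.jdbc.TeraDriver",
      "dbtable" -> "table_name",
      "customSchema" -> "id BIGINT, column1 DECIMAL(38,10), column2 DECIMAL(38,10)"
    )
  )
  .load()
{code}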
[jira] [Commented] (SPARK-38829) New configuration for controlling timestamp inference of Parquet
[ https://issues.apache.org/jira/browse/SPARK-38829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17520297#comment-17520297 ] Apache Spark commented on SPARK-38829: -- User 'sadikovi' has created a pull request for this issue: https://github.com/apache/spark/pull/36137 > New configuration for controlling timestamp inference of Parquet > > > Key: SPARK-38829 > URL: https://issues.apache.org/jira/browse/SPARK-38829 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: Gengliang Wang >Assignee: Ivan Sadikov >Priority: Major > > A new SQL conf which can fall back to the behavior that reads all the Parquet > Timestamp columns as TimestampType. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-38829) New configuration for controlling timestamp inference of Parquet
[ https://issues.apache.org/jira/browse/SPARK-38829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-38829: Assignee: Apache Spark (was: Ivan Sadikov) > New configuration for controlling timestamp inference of Parquet > > > Key: SPARK-38829 > URL: https://issues.apache.org/jira/browse/SPARK-38829 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: Gengliang Wang >Assignee: Apache Spark >Priority: Major > > A new SQL conf which can fall back to the behavior that reads all the Parquet > Timestamp columns as TimestampType. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-38829) New configuration for controlling timestamp inference of Parquet
[ https://issues.apache.org/jira/browse/SPARK-38829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17520296#comment-17520296 ] Ivan Sadikov commented on SPARK-38829: -- I opened [https://github.com/apache/spark/pull/36137] to disable TimestampNTZType support in Parquet in 3.3. > New configuration for controlling timestamp inference of Parquet > > > Key: SPARK-38829 > URL: https://issues.apache.org/jira/browse/SPARK-38829 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: Gengliang Wang >Assignee: Ivan Sadikov >Priority: Major > > A new SQL conf which can fall back to the behavior that reads all the Parquet > Timestamp columns as TimestampType. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-38829) New configuration for controlling timestamp inference of Parquet
[ https://issues.apache.org/jira/browse/SPARK-38829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-38829: Assignee: Ivan Sadikov (was: Apache Spark) > New configuration for controlling timestamp inference of Parquet > > > Key: SPARK-38829 > URL: https://issues.apache.org/jira/browse/SPARK-38829 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: Gengliang Wang >Assignee: Ivan Sadikov >Priority: Major > > A new SQL conf which can fall back to the behavior that reads all the Parquet > Timestamp columns as TimestampType. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-38829) New configuration for controlling timestamp inference of Parquet
[ https://issues.apache.org/jira/browse/SPARK-38829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17520295#comment-17520295 ] Apache Spark commented on SPARK-38829: -- User 'sadikovi' has created a pull request for this issue: https://github.com/apache/spark/pull/36137 > New configuration for controlling timestamp inference of Parquet > > > Key: SPARK-38829 > URL: https://issues.apache.org/jira/browse/SPARK-38829 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: Gengliang Wang >Assignee: Ivan Sadikov >Priority: Major > > A new SQL conf which can fall back to the behavior that reads all the Parquet > Timestamp columns as TimestampType. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-38845) SparkContext init before SparkSession will cause hive table not found
[ https://issues.apache.org/jira/browse/SPARK-38845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17520274#comment-17520274 ] wangshengjie commented on SPARK-38845: -- Duplicate issue, closing this issue > SparkContext init before SparkSession will cause hive table not found > - > > Key: SPARK-38845 > URL: https://issues.apache.org/jira/browse/SPARK-38845 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.2, 3.2.1 >Reporter: wangshengjie >Priority: Major > > If we init SparkContext before SparkSession when using DataFrame to query a > hive table, it will throw a table-not-found exception. > {code:java} > // code placeholder > val sparkContext = new SparkContext() > val sparkSession = SparkSession > .builder > .appName("SparkSession Test") > .enableHiveSupport() > .getOrCreate() > sparkSession.sql("select * from tableA"){code} -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-38845) SparkContext init before SparkSession will cause hive table not found
[ https://issues.apache.org/jira/browse/SPARK-38845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] wangshengjie resolved SPARK-38845. -- Resolution: Duplicate > SparkContext init before SparkSession will cause hive table not found > - > > Key: SPARK-38845 > URL: https://issues.apache.org/jira/browse/SPARK-38845 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.2, 3.2.1 >Reporter: wangshengjie >Priority: Major > > If we init SparkContext before SparkSession when using DataFrame to query a > hive table, it will throw a table-not-found exception. > {code:java} > // code placeholder > val sparkContext = new SparkContext() > val sparkSession = SparkSession > .builder > .appName("SparkSession Test") > .enableHiveSupport() > .getOrCreate() > sparkSession.sql("select * from tableA"){code} -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
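A hedged workaround sketch for this scenario, assuming the SparkContext really must be created first: enableHiveSupport() only sets spark.sql.catalogImplementation=hive, and with a pre-existing context that static conf is read from the SparkContext's conf, so it can be set there explicitly.

{code:java}
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SparkSession

// Set the catalog implementation on the context's conf up front.
val conf = new SparkConf().set("spark.sql.catalogImplementation", "hive")
val sparkContext = new SparkContext(conf)

val sparkSession = SparkSession
  .builder
  .appName("SparkSession Test")
  .enableHiveSupport()
  .getOrCreate()

sparkSession.sql("select * from tableA") // the hive table should now resolve
{code}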
[jira] [Resolved] (SPARK-38708) Upgrade Hive Metastore Client to 3.1.3 for Hive 3.1
[ https://issues.apache.org/jira/browse/SPARK-38708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-38708. --- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 36018 [https://github.com/apache/spark/pull/36018] > Upgrade Hive Metastore Client to 3.1.3 for Hive 3.1 > --- > > Key: SPARK-38708 > URL: https://issues.apache.org/jira/browse/SPARK-38708 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.0 >Reporter: Yuming Wang >Assignee: Yuming Wang >Priority: Major > Fix For: 3.4.0 > > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-38708) Upgrade Hive Metastore Client to 3.1.3 for Hive 3.1
[ https://issues.apache.org/jira/browse/SPARK-38708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-38708: - Assignee: Yuming Wang > Upgrade Hive Metastore Client to 3.1.3 for Hive 3.1 > --- > > Key: SPARK-38708 > URL: https://issues.apache.org/jira/browse/SPARK-38708 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.0 >Reporter: Yuming Wang >Assignee: Yuming Wang >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-38767) Support ignoreCorruptFiles and ignoreMissingFiles in Data Source options
[ https://issues.apache.org/jira/browse/SPARK-38767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-38767: -- Affects Version/s: 3.4.0 (was: 3.2.1) > Support ignoreCorruptFiles and ignoreMissingFiles in Data Source options > > > Key: SPARK-38767 > URL: https://issues.apache.org/jira/browse/SPARK-38767 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.0 >Reporter: Yaohua Zhao >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-38767) Support ignoreCorruptFiles and ignoreMissingFiles in Data Source options
[ https://issues.apache.org/jira/browse/SPARK-38767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-38767: -- Priority: Minor (was: Major) > Support ignoreCorruptFiles and ignoreMissingFiles in Data Source options > > > Key: SPARK-38767 > URL: https://issues.apache.org/jira/browse/SPARK-38767 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.0 >Reporter: Yaohua Zhao >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-38850) Upgrade Kafka to 3.1.1
[ https://issues.apache.org/jira/browse/SPARK-38850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-38850: Assignee: (was: Apache Spark) > Upgrade Kafka to 3.1.1 > -- > > Key: SPARK-38850 > URL: https://issues.apache.org/jira/browse/SPARK-38850 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 3.3.0 >Reporter: Dongjoon Hyun >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-38850) Upgrade Kafka to 3.1.1
[ https://issues.apache.org/jira/browse/SPARK-38850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17520261#comment-17520261 ] Apache Spark commented on SPARK-38850: -- User 'dongjoon-hyun' has created a pull request for this issue: https://github.com/apache/spark/pull/36135 > Upgrade Kafka to 3.1.1 > -- > > Key: SPARK-38850 > URL: https://issues.apache.org/jira/browse/SPARK-38850 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 3.3.0 >Reporter: Dongjoon Hyun >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-38850) Upgrade Kafka to 3.1.1
[ https://issues.apache.org/jira/browse/SPARK-38850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-38850: Assignee: Apache Spark > Upgrade Kafka to 3.1.1 > -- > > Key: SPARK-38850 > URL: https://issues.apache.org/jira/browse/SPARK-38850 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 3.3.0 >Reporter: Dongjoon Hyun >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-38800) Explicitly document the supported pandas version.
[ https://issues.apache.org/jira/browse/SPARK-38800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-38800: -- Fix Version/s: 3.3.0 (was: 3.4.0) > Explicitly document the supported pandas version. > - > > Key: SPARK-38800 > URL: https://issues.apache.org/jira/browse/SPARK-38800 > Project: Spark > Issue Type: Test > Components: Documentation, PySpark >Affects Versions: 3.3.0 >Reporter: Haejoon Lee >Assignee: Haejoon Lee >Priority: Major > Fix For: 3.3.0 > > > pandas has different behavior per version for some APIs. > So, we should explicitly follow one pandas version for one pandas-on-Spark > version and document it. > For example, if some APIs follow the behavior of pandas 1.3 whereas some APIs > follow the behavior of pandas 1.4, users may be confused. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-38850) Upgrade Kafka to 3.1.1
Dongjoon Hyun created SPARK-38850: - Summary: Upgrade Kafka to 3.1.1 Key: SPARK-38850 URL: https://issues.apache.org/jira/browse/SPARK-38850 Project: Spark Issue Type: Bug Components: Build Affects Versions: 3.3.0 Reporter: Dongjoon Hyun -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-38830) Warn corrupted Netty RPC messages
[ https://issues.apache.org/jira/browse/SPARK-38830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-38830: - Assignee: Dongjoon Hyun > Warn corrupted Netty RPC messages > - > > Key: SPARK-38830 > URL: https://issues.apache.org/jira/browse/SPARK-38830 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.3.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-38830) Warn corrupted Netty RPC messages
[ https://issues.apache.org/jira/browse/SPARK-38830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-38830. --- Fix Version/s: 3.3.0 3.2.2 Resolution: Fixed Issue resolved by pull request 36116 [https://github.com/apache/spark/pull/36116] > Warn corrupted Netty RPC messages > - > > Key: SPARK-38830 > URL: https://issues.apache.org/jira/browse/SPARK-38830 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.3.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Fix For: 3.3.0, 3.2.2 > > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-38830) Warn on corrupted block messages
[ https://issues.apache.org/jira/browse/SPARK-38830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-38830: -- Summary: Warn on corrupted block messages (was: Warn corrupted Netty RPC messages) > Warn on corrupted block messages > > > Key: SPARK-38830 > URL: https://issues.apache.org/jira/browse/SPARK-38830 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.3.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Fix For: 3.3.0, 3.2.2 > > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-38830) Warn on corrupted block messages
[ https://issues.apache.org/jira/browse/SPARK-38830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-38830: -- Affects Version/s: 3.2.1 > Warn on corrupted block messages > > > Key: SPARK-38830 > URL: https://issues.apache.org/jira/browse/SPARK-38830 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.2.1, 3.3.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Fix For: 3.3.0, 3.2.2 > > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-36681) Fail to load Snappy codec
[ https://issues.apache.org/jira/browse/SPARK-36681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17520254#comment-17520254 ] Apache Spark commented on SPARK-36681: -- User 'HyukjinKwon' has created a pull request for this issue: https://github.com/apache/spark/pull/36136 > Fail to load Snappy codec > - > > Key: SPARK-36681 > URL: https://issues.apache.org/jira/browse/SPARK-36681 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.2.0 >Reporter: L. C. Hsieh >Assignee: L. C. Hsieh >Priority: Major > Fix For: 3.3.0 > > > snappy-java as a native library should not be relocated in Hadoop shaded > client libraries. Currently we use Hadoop shaded client libraries in Spark. > If we try to use SnappyCodec to write a sequence file, we will encounter the > following error: > {code} > [info] Cause: java.lang.UnsatisfiedLinkError: > org.apache.hadoop.shaded.org.xerial.snappy.SnappyNative.rawCompress(Ljava/nio/ByteBuffer;IILjava/nio/ByteBuffer;I)I > [info] at > org.apache.hadoop.shaded.org.xerial.snappy.SnappyNative.rawCompress(Native > Method) > > [info] at > org.apache.hadoop.shaded.org.xerial.snappy.Snappy.compress(Snappy.java:151) > > > [info] at > org.apache.hadoop.io.compress.snappy.SnappyCompressor.compressDirectBuf(SnappyCompressor.java:282) > [info] at > org.apache.hadoop.io.compress.snappy.SnappyCompressor.compress(SnappyCompressor.java:210) > [info] at > org.apache.hadoop.io.compress.BlockCompressorStream.compress(BlockCompressorStream.java:149) > [info] at > org.apache.hadoop.io.compress.BlockCompressorStream.finish(BlockCompressorStream.java:142) > [info] at > org.apache.hadoop.io.SequenceFile$BlockCompressWriter.writeBuffer(SequenceFile.java:1589) > > [info] at > org.apache.hadoop.io.SequenceFile$BlockCompressWriter.sync(SequenceFile.java:1605) > [info] at > org.apache.hadoop.io.SequenceFile$BlockCompressWriter.close(SequenceFile.java:1629) > > {code} -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-36681) Fail to load Snappy codec
[ https://issues.apache.org/jira/browse/SPARK-36681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17520255#comment-17520255 ] Apache Spark commented on SPARK-36681: -- User 'HyukjinKwon' has created a pull request for this issue: https://github.com/apache/spark/pull/36136 > Fail to load Snappy codec > - > > Key: SPARK-36681 > URL: https://issues.apache.org/jira/browse/SPARK-36681 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.2.0 >Reporter: L. C. Hsieh >Assignee: L. C. Hsieh >Priority: Major > Fix For: 3.3.0 > > > snappy-java as a native library should not be relocated in Hadoop shaded > client libraries. Currently we use Hadoop shaded client libraries in Spark. > If we try to use SnappyCodec to write a sequence file, we will encounter the > following error: > {code} > [info] Cause: java.lang.UnsatisfiedLinkError: > org.apache.hadoop.shaded.org.xerial.snappy.SnappyNative.rawCompress(Ljava/nio/ByteBuffer;IILjava/nio/ByteBuffer;I)I > [info] at > org.apache.hadoop.shaded.org.xerial.snappy.SnappyNative.rawCompress(Native > Method) > > [info] at > org.apache.hadoop.shaded.org.xerial.snappy.Snappy.compress(Snappy.java:151) > > > [info] at > org.apache.hadoop.io.compress.snappy.SnappyCompressor.compressDirectBuf(SnappyCompressor.java:282) > [info] at > org.apache.hadoop.io.compress.snappy.SnappyCompressor.compress(SnappyCompressor.java:210) > [info] at > org.apache.hadoop.io.compress.BlockCompressorStream.compress(BlockCompressorStream.java:149) > [info] at > org.apache.hadoop.io.compress.BlockCompressorStream.finish(BlockCompressorStream.java:142) > [info] at > org.apache.hadoop.io.SequenceFile$BlockCompressWriter.writeBuffer(SequenceFile.java:1589) > > [info] at > org.apache.hadoop.io.SequenceFile$BlockCompressWriter.sync(SequenceFile.java:1605) > [info] at > org.apache.hadoop.io.SequenceFile$BlockCompressWriter.close(SequenceFile.java:1629) > > {code} -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
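A minimal reproduction sketch of the failing path described above (local mode; the output path is hypothetical):

{code:java}
import org.apache.hadoop.io.compress.SnappyCodec
import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("snappy-repro").setMaster("local[1]"))

// Writing a sequence file with SnappyCodec exercises the Hadoop
// BlockCompressorStream path shown in the stack trace above.
sc.parallelize(Seq(("k", 1)))
  .saveAsSequenceFile("/tmp/snappy-out", Some(classOf[SnappyCodec]))
{code}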
[jira] [Commented] (SPARK-38831) How to enable encryption for checkpoint data?
[ https://issues.apache.org/jira/browse/SPARK-38831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17520251#comment-17520251 ] Hyukjin Kwon commented on SPARK-38831: -- [~zoli] Let's interact with the Spark mailing list for a question before filing it here. > How to enable encryption for checkpoint data? > - > > Key: SPARK-38831 > URL: https://issues.apache.org/jira/browse/SPARK-38831 > Project: Spark > Issue Type: Question > Components: Security >Affects Versions: 3.2.1 >Reporter: zoli >Priority: Major > > Setting spark.io.encryption.enabled to true as described here: > [https://spark.apache.org/docs/latest/security.html#local-storage-encryption] has > no effect at all on checkpoints. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-38831) How to enable encryption for checkpoint data?
[ https://issues.apache.org/jira/browse/SPARK-38831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-38831. -- Resolution: Invalid > How to enable encryption for checkpoint data? > - > > Key: SPARK-38831 > URL: https://issues.apache.org/jira/browse/SPARK-38831 > Project: Spark > Issue Type: Question > Components: Security >Affects Versions: 3.2.1 >Reporter: zoli >Priority: Major > > Setting spark.io.encryption.enabled to true as described here: > [https://spark.apache.org/docs/latest/security.html#local-storage-encryption] has > no effect at all on checkpoints. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-38839) Creating a struct with a float inside
[ https://issues.apache.org/jira/browse/SPARK-38839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-38839. -- Resolution: Duplicate > Creating a struct with a float inside > -- > > Key: SPARK-38839 > URL: https://issues.apache.org/jira/browse/SPARK-38839 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 3.2.1 >Reporter: Daniel deCordoba >Priority: Minor > > When creating a dataframe using createDataFrame that contains a float inside > a struct, the float is set to null. This only happens when using a list of > dictionaries as the data; if I use a list of Rows, it works fine: > {code:java} > data = [{"MyStruct": {"MyInt": 10, "MyFloat": 10.1}, "MyFloat": 10.1}] > spark.createDataFrame(data).show() > # +---+--+ > # |MyFloat|MyStruct | > # +---+--+ > # |10.1 |{MyInt -> 10, MyFloat -> null}| > # +---+--+ > data = [Row(MyStruct=Row(MyInt=10, MyFloat=10.1), MyFloat=10.1)] > spark.createDataFrame(data).show() > # +---+--+ > # |MyFloat|MyStruct | > # +---+--+ > # |10.1 |{MyInt -> 10, MyFloat -> 10.1}| > # +---+--+ {code} > Note MyFloat inside MyStruct is set to null in the first example. > Interestingly enough, when I do the same with Row, or if I specify the > schema, then this does not happen (second example). -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-38839) Creating a struct with a float inside
[ https://issues.apache.org/jira/browse/SPARK-38839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17520250#comment-17520250 ] Hyukjin Kwon commented on SPARK-38839: -- This is because by default the inner "{"MyInt": 10, "MyFloat": 10.1}" gets inferred as a map: {code} root |-- MyFloat: double (nullable = true) |-- MyStruct: map (nullable = true) ||-- key: string ||-- value: long (valueContainsNull = true) {code} and since 10.1 is not a long, it becomes {{null}}. You can work around this by setting the {{spark.sql.pyspark.inferNestedDictAsStruct.enabled}} configuration to {{true}} from Spark 3.3.0. > Creating a struct with a float inside > -- > > Key: SPARK-38839 > URL: https://issues.apache.org/jira/browse/SPARK-38839 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 3.2.1 >Reporter: Daniel deCordoba >Priority: Minor > > When creating a dataframe using createDataFrame that contains a float inside > a struct, the float is set to null. This only happens when using a list of > dictionaries as the data; if I use a list of Rows, it works fine: > {code:java} > data = [{"MyStruct": {"MyInt": 10, "MyFloat": 10.1}, "MyFloat": 10.1}] > spark.createDataFrame(data).show() > # +---+--+ > # |MyFloat|MyStruct | > # +---+--+ > # |10.1 |{MyInt -> 10, MyFloat -> null}| > # +---+--+ > data = [Row(MyStruct=Row(MyInt=10, MyFloat=10.1), MyFloat=10.1)] > spark.createDataFrame(data).show() > # +---+--+ > # |MyFloat|MyStruct | > # +---+--+ > # |10.1 |{MyInt -> 10, MyFloat -> 10.1}| > # +---+--+ {code} > Note MyFloat inside MyStruct is set to null in the first example. > Interestingly enough, when I do the same with Row, or if I specify the > schema, then this does not happen (second example). -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-38846) Teradata's Number is either converted to its floor value or ceiling value despite its fractional part.
[ https://issues.apache.org/jira/browse/SPARK-38846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17520249#comment-17520249 ] Hyukjin Kwon commented on SPARK-38846: -- Spark 2.x is EOL. Mind trying Spark 3+ out? > Teradata's Number is either converted to its floor value or ceiling value > despite its fractional part. > -- > > Key: SPARK-38846 > URL: https://issues.apache.org/jira/browse/SPARK-38846 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.0 > Environment: Spark2.3.0 on Yarn > Teradata 16.20.32.59 >Reporter: eugene >Priority: Major > > I'm trying to load data from Teradata; the code in use is: > > sparkSession.read > .format("jdbc") > .options( > Map( > "url" -> "jdbc:teradata://hostname, user=$username, > password=$password", > "MAYBENULL" -> "ON", > "SIP_SUPPORT" -> "ON", > "driver" -> "com.teradata.jdbc.TeraDriver", > "dbtable" -> $table_name > ) > ) > .load() > However, some data lost its fractional part after loading. To be more > precise, the column in Teradata is of the type Number, and after > loading, the data type in Spark is `DecimalType(38,0)`; the scale value is 0, > which means no digits after the decimal point. > Data in Teradata is something like, > id column1 column2 > 1 50.23 100.23 > 2 25.8 20.669 > 3 30.2 19.23 > The resulting Spark `dataframe` is like, > id column1 column2 > 1 50 100 > 2 26 21 > 3 30 19 > The metadata of the table in Teradata is like: > CREATE SET TABLE table_name (id BIGINT, column1 NUMBER, column2 NUMBER) > PRIMARY INDEX (id); > The Spark version is 2.3.0 and Teradata is 16.20.32.59. > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-38849) How to do load balancing of spark thrift server?
[ https://issues.apache.org/jira/browse/SPARK-38849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17520248#comment-17520248 ] Hyukjin Kwon commented on SPARK-38849: -- For questions, let's interact with the Spark mailing list before filing an issue here. > How to do load balancing of spark thrift server ? > - > > Key: SPARK-38849 > URL: https://issues.apache.org/jira/browse/SPARK-38849 > Project: Spark > Issue Type: New Feature > Components: Spark Core >Affects Versions: 3.2.0 >Reporter: ramakrishna chilaka >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-38849) How to do load balancing of spark thrift server ?
[ https://issues.apache.org/jira/browse/SPARK-38849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-38849. -- Resolution: Invalid > How to do load balancing of spark thrift server ? > - > > Key: SPARK-38849 > URL: https://issues.apache.org/jira/browse/SPARK-38849 > Project: Spark > Issue Type: New Feature > Components: Spark Core >Affects Versions: 3.2.0 >Reporter: ramakrishna chilaka >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-38849) How to do load balancing of spark thrift server ?
ramakrishna chilaka created SPARK-38849: --- Summary: How to do load balancing of spark thrift server ? Key: SPARK-38849 URL: https://issues.apache.org/jira/browse/SPARK-38849 Project: Spark Issue Type: New Feature Components: Spark Core Affects Versions: 3.2.0 Reporter: ramakrishna chilaka -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-38807) Error when starting spark shell on Windows system
[ https://issues.apache.org/jira/browse/SPARK-38807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17520187#comment-17520187 ] Apache Spark commented on SPARK-38807: -- User '1104056452' has created a pull request for this issue: https://github.com/apache/spark/pull/36134
> Error when starting spark shell on Windows system
> -
>
> Key: SPARK-38807
> URL: https://issues.apache.org/jira/browse/SPARK-38807
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 3.2.1
> Reporter: Ming Li
> Priority: Major
>
> Using the release version of Spark 3.2.1 and the default configuration, starting the Spark shell on a Windows system fails (Spark 3.1.2 doesn't show this issue).
> Here is the stack trace of the exception:
> {code:java}
> 22/04/06 21:47:45 ERROR SparkContext: Error initializing SparkContext.
> java.lang.reflect.InvocationTargetException
>         at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>         at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>         ...
>         at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
>         at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:955)
>         at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
>         at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
>         at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
>         at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1043)
>         at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1052)
>         at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> Caused by: java.net.URISyntaxException: Illegal character in path at index 30: spark://192.168.X.X:56964/F:\classes
>         at java.net.URI$Parser.fail(URI.java:2845)
>         at java.net.URI$Parser.checkChars(URI.java:3018)
>         at java.net.URI$Parser.parseHierarchical(URI.java:3102)
>         at java.net.URI$Parser.parse(URI.java:3050)
>         at java.net.URI.<init>(URI.java:588)
>         at org.apache.spark.repl.ExecutorClassLoader.<init>(ExecutorClassLoader.scala:57)
>         ... 70 more
> 22/04/06 21:47:45 ERROR Utils: Uncaught exception in thread main
> java.lang.NullPointerException
> ... {code}
-- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-38848) Replace all `@Test(expected = XXException)` with assertThrows
[ https://issues.apache.org/jira/browse/SPARK-38848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-38848: Assignee: (was: Apache Spark) > Replace all `@Test(expected = XXException)` with assertThrows > -- > > Key: SPARK-38848 > URL: https://issues.apache.org/jira/browse/SPARK-38848 > Project: Spark > Issue Type: Improvement > Components: Tests >Affects Versions: 3.4.0 >Reporter: Yang Jie >Priority: Minor > > {{@Test}} no longer has an {{expected}} parameter in JUnit 5; use {{assertThrows}} instead -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-38848) Replace all `@Test(expected = XXException)` with assertThrows
[ https://issues.apache.org/jira/browse/SPARK-38848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17520172#comment-17520172 ] Apache Spark commented on SPARK-38848: -- User 'LuciferYang' has created a pull request for this issue: https://github.com/apache/spark/pull/36133 > Replace all `@Test(expected = XXException)` with assertThrows > -- > > Key: SPARK-38848 > URL: https://issues.apache.org/jira/browse/SPARK-38848 > Project: Spark > Issue Type: Improvement > Components: Tests >Affects Versions: 3.4.0 >Reporter: Yang Jie >Priority: Minor > > {{@Test}} no longer has an {{expected}} parameter in JUnit 5; use {{assertThrows}} instead -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-38848) Replace all `@Test(expected = XXException)` with assertThrows
[ https://issues.apache.org/jira/browse/SPARK-38848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-38848: Assignee: Apache Spark > Replace all `@Test(expected = XXException)` with assertThrows > -- > > Key: SPARK-38848 > URL: https://issues.apache.org/jira/browse/SPARK-38848 > Project: Spark > Issue Type: Improvement > Components: Tests >Affects Versions: 3.4.0 >Reporter: Yang Jie >Assignee: Apache Spark >Priority: Minor > > {{@Test}} no longer has an {{expected}} parameter in JUnit 5; use {{assertThrows}} instead -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-38848) Replace all `@Test(expected = XXException)` with assertThrows
[ https://issues.apache.org/jira/browse/SPARK-38848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Jie updated SPARK-38848: - Description: {{@Test}} no longer has an {{expected}} parameter in JUnit 5; use {{assertThrows}} instead (was: {{@Test}} no longer has {{expected }}parameters in Junit 5, use{{ assertThrows}} instead) > Replace all `@Test(expected = XXException)` with assertThrows > -- > > Key: SPARK-38848 > URL: https://issues.apache.org/jira/browse/SPARK-38848 > Project: Spark > Issue Type: Improvement > Components: Tests >Affects Versions: 3.4.0 >Reporter: Yang Jie >Priority: Minor > > {{@Test}} no longer has an {{expected}} parameter in JUnit 5; use {{assertThrows}} instead -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-38848) Replace all `@Test(expected = XXException)` with assertThrows
Yang Jie created SPARK-38848: Summary: Replace all `@Test(expected = XXException)` with assertThrows Key: SPARK-38848 URL: https://issues.apache.org/jira/browse/SPARK-38848 Project: Spark Issue Type: Improvement Components: Tests Affects Versions: 3.4.0 Reporter: Yang Jie {{@Test}} no longer has an {{expected}} parameter in JUnit 5; use {{assertThrows}} instead -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-38847) Introduce a `viewToSeq` function for `KVUtils`
[ https://issues.apache.org/jira/browse/SPARK-38847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-38847: Assignee: (was: Apache Spark) > Introduce a `viewToSeq` function for `KVUtils` > -- > > Key: SPARK-38847 > URL: https://issues.apache.org/jira/browse/SPARK-38847 > Project: Spark > Issue Type: Improvement > Components: Spark Core, SQL >Affects Versions: 3.4.0 >Reporter: Yang Jie >Priority: Minor > > There are many places in Spark that convert a KVStoreView into a `List`, and this code does not close the underlying `KVStoreIterator`; these resources are mainly recycled by the `finalize()` method implemented in `LevelDB` and `RocksDB`, which makes `KVStoreIterator` resource recycling unpredictable. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-38847) Introduce a `viewToSeq` function for `KVUtils`
[ https://issues.apache.org/jira/browse/SPARK-38847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-38847: Assignee: Apache Spark > Introduce a `viewToSeq` function for `KVUtils` > -- > > Key: SPARK-38847 > URL: https://issues.apache.org/jira/browse/SPARK-38847 > Project: Spark > Issue Type: Improvement > Components: Spark Core, SQL >Affects Versions: 3.4.0 >Reporter: Yang Jie >Assignee: Apache Spark >Priority: Minor > > There are many places in Spark that convert a KVStoreView into a `List`, and this code does not close the underlying `KVStoreIterator`; these resources are mainly recycled by the `finalize()` method implemented in `LevelDB` and `RocksDB`, which makes `KVStoreIterator` resource recycling unpredictable. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-38847) Introduce a `viewToSeq` function for `KVUtils`
[ https://issues.apache.org/jira/browse/SPARK-38847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17520171#comment-17520171 ] Apache Spark commented on SPARK-38847: -- User 'LuciferYang' has created a pull request for this issue: https://github.com/apache/spark/pull/36132 > Introduce a `viewToSeq` function for `KVUtils` > -- > > Key: SPARK-38847 > URL: https://issues.apache.org/jira/browse/SPARK-38847 > Project: Spark > Issue Type: Improvement > Components: Spark Core, SQL >Affects Versions: 3.4.0 >Reporter: Yang Jie >Priority: Minor > > There are many places in Spark that convert a KVStoreView into a `List`, and this code does not close the underlying `KVStoreIterator`; these resources are mainly recycled by the `finalize()` method implemented in `LevelDB` and `RocksDB`, which makes `KVStoreIterator` resource recycling unpredictable. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-38847) Introduce a `viewToSeq` function for `KVUtils`
Yang Jie created SPARK-38847: Summary: Introduce a `viewToSeq` function for `KVUtils` Key: SPARK-38847 URL: https://issues.apache.org/jira/browse/SPARK-38847 Project: Spark Issue Type: Improvement Components: Spark Core, SQL Affects Versions: 3.4.0 Reporter: Yang Jie There are many places in Spark that convert a KVStoreView into a `List`, and this code does not close the underlying `KVStoreIterator`; these resources are mainly recycled by the `finalize()` method implemented in `LevelDB` and `RocksDB`, which makes `KVStoreIterator` resource recycling unpredictable. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-38846) Teradata's Number is either converted to its floor value or ceiling value despite its fractional part.
eugene created SPARK-38846: -- Summary: Teradata's Number is either converted to its floor value or ceiling value despite its fractional part. Key: SPARK-38846 URL: https://issues.apache.org/jira/browse/SPARK-38846 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 2.3.0 Environment: Spark 2.3.0 on Yarn, Teradata 16.20.32.59 Reporter: eugene
I'm trying to load data from Teradata; the code used is:
sparkSession.read
  .format("jdbc")
  .options(
    Map(
      "url" -> "jdbc:teradata://hostname, user=$username, password=$password",
      "MAYBENULL" -> "ON",
      "SIP_SUPPORT" -> "ON",
      "driver" -> "com.teradata.jdbc.TeraDriver",
      "dbtable" -> $table_name
    )
  )
  .load()
However, some data lost its fractional part after loading. To be more precise, the column in Teradata has the type Number, and after loading, the data type in Spark is `DecimalType(38,0)`; the scale is 0, which means no digits after the decimal point.
Data in Teradata is something like:
id  column1  column2
1   50.23    100.23
2   25.8     20.669
3   30.2     19.23
The Spark `dataframe` is like:
id  column1  column2
1   50       100
2   26       21
3   30       19
The metadata of the table in Teradata is:
CREATE SET TABLE table_name (id BIGINT, column1 NUMBER, column2 NUMBER) PRIMARY INDEX (id);
The Spark version is 2.3.0 and Teradata is 16.20.32.59.
-- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-38565) Support Left Semi join in row level runtime filters
[ https://issues.apache.org/jira/browse/SPARK-38565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-38565: Assignee: (was: Apache Spark) > Support Left Semi join in row level runtime filters > --- > > Key: SPARK-38565 > URL: https://issues.apache.org/jira/browse/SPARK-38565 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0 >Reporter: Abhishek Somani >Priority: Major > > Support Left Semi join in the runtime filtering as well. > This is a follow up to https://issues.apache.org/jira/browse/SPARK-32268 once > [https://github.com/apache/spark/pull/35789] is merged. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-38565) Support Left Semi join in row level runtime filters
[ https://issues.apache.org/jira/browse/SPARK-38565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17520167#comment-17520167 ] Apache Spark commented on SPARK-38565: -- User 'wangyum' has created a pull request for this issue: https://github.com/apache/spark/pull/36131 > Support Left Semi join in row level runtime filters > --- > > Key: SPARK-38565 > URL: https://issues.apache.org/jira/browse/SPARK-38565 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0 >Reporter: Abhishek Somani >Priority: Major > > Support Left Semi join in the runtime filtering as well. > This is a follow up to https://issues.apache.org/jira/browse/SPARK-32268 once > [https://github.com/apache/spark/pull/35789] is merged. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-38565) Support Left Semi join in row level runtime filters
[ https://issues.apache.org/jira/browse/SPARK-38565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-38565: Assignee: Apache Spark > Support Left Semi join in row level runtime filters > --- > > Key: SPARK-38565 > URL: https://issues.apache.org/jira/browse/SPARK-38565 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0 >Reporter: Abhishek Somani >Assignee: Apache Spark >Priority: Major > > Support Left Semi join in the runtime filtering as well. > This is a follow up to https://issues.apache.org/jira/browse/SPARK-32268 once > [https://github.com/apache/spark/pull/35789] is merged. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
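[Editorial note] For context on what this sub-task extends: SPARK-32268 added row-level runtime (bloom) filtering behind a SQL configuration in Spark 3.3. A hedged sketch of how it would be exercised once LEFT SEMI joins are supported; the config name comes from that work, and the {{fact}} and {{dim}} tables are illustrative:
{code:python}
# Enable row-level runtime (bloom) filtering, off by default in Spark 3.3.
spark.conf.set("spark.sql.optimizer.runtime.bloomFilter.enabled", "true")

# With the flag on, a selective predicate on the dim side can be turned into
# a bloom filter applied to the fact side; this issue extends that support
# to LEFT SEMI joins.
result = spark.sql("""
    SELECT f.*
    FROM fact f
    LEFT SEMI JOIN dim d
      ON f.key = d.key AND d.category = 'selective'
""")
{code}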
[jira] [Commented] (SPARK-3723) DecisionTree, RandomForest: Add more instrumentation
[ https://issues.apache.org/jira/browse/SPARK-3723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17520162#comment-17520162 ] Apache Spark commented on SPARK-3723: - User 'zero323' has created a pull request for this issue: https://github.com/apache/spark/pull/36130
> DecisionTree, RandomForest: Add more instrumentation
>
>
> Key: SPARK-3723
> URL: https://issues.apache.org/jira/browse/SPARK-3723
> Project: Spark
> Issue Type: Improvement
> Components: ML
> Reporter: Joseph K. Bradley
> Priority: Minor
> Labels: bulk-closed
>
> Some simple instrumentation would help advanced users understand performance and check whether parameters (such as maxMemoryInMB) need to be tuned.
> Most important instrumentation (simple):
> * min, avg, max nodes per group
> * number of groups (passes over data)
> More advanced instrumentation:
> * For each tree (or averaged over trees), training set accuracy after training each level. This would be useful for visualizing learning behavior (to convince oneself that model selection was being done correctly).
-- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-37234) Inline type hints for python/pyspark/mllib/stat/_statistics.py
[ https://issues.apache.org/jira/browse/SPARK-37234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maciej Szymkiewicz resolved SPARK-37234. Fix Version/s: 3.3.0 Resolution: Fixed Issue resolved by pull request 34513 [https://github.com/apache/spark/pull/34513] > Inline type hints for python/pyspark/mllib/stat/_statistics.py > -- > > Key: SPARK-37234 > URL: https://issues.apache.org/jira/browse/SPARK-37234 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.3.0 >Reporter: dch nguyen >Assignee: dch nguyen >Priority: Major > Fix For: 3.3.0 > > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-37234) Inline type hints for python/pyspark/mllib/stat/_statistics.py
[ https://issues.apache.org/jira/browse/SPARK-37234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maciej Szymkiewicz reassigned SPARK-37234: -- Assignee: dch nguyen > Inline type hints for python/pyspark/mllib/stat/_statistics.py > -- > > Key: SPARK-37234 > URL: https://issues.apache.org/jira/browse/SPARK-37234 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.3.0 >Reporter: dch nguyen >Assignee: dch nguyen >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-38108) Use error classes in the compilation errors of UDF/UDAF
[ https://issues.apache.org/jira/browse/SPARK-38108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Gekk reassigned SPARK-38108: Assignee: huangtengfei
> Use error classes in the compilation errors of UDF/UDAF
> ---
>
> Key: SPARK-38108
> URL: https://issues.apache.org/jira/browse/SPARK-38108
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 3.3.0
> Reporter: Max Gekk
> Assignee: huangtengfei
> Priority: Major
>
> Migrate the following errors in QueryCompilationErrors:
> * noHandlerForUDAFError
> * unexpectedEvalTypesForUDFsError
> * usingUntypedScalaUDFError
> * udfClassDoesNotImplementAnyUDFInterfaceError
> * udfClassNotAllowedToImplementMultiUDFInterfacesError
> * udfClassWithTooManyTypeArgumentsError
> to use error classes. Throw an implementation of SparkThrowable. Also write a test for every error in QueryCompilationErrorsSuite.
-- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-38108) Use error classes in the compilation errors of UDF/UDAF
[ https://issues.apache.org/jira/browse/SPARK-38108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Gekk resolved SPARK-38108. -- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 36064 [https://github.com/apache/spark/pull/36064]
> Use error classes in the compilation errors of UDF/UDAF
> ---
>
> Key: SPARK-38108
> URL: https://issues.apache.org/jira/browse/SPARK-38108
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 3.3.0
> Reporter: Max Gekk
> Assignee: huangtengfei
> Priority: Major
> Fix For: 3.4.0
>
> Migrate the following errors in QueryCompilationErrors:
> * noHandlerForUDAFError
> * unexpectedEvalTypesForUDFsError
> * usingUntypedScalaUDFError
> * udfClassDoesNotImplementAnyUDFInterfaceError
> * udfClassNotAllowedToImplementMultiUDFInterfacesError
> * udfClassWithTooManyTypeArgumentsError
> to use error classes. Throw an implementation of SparkThrowable. Also write a test for every error in QueryCompilationErrorsSuite.
-- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-38845) SparkContext init before SparkSession will cause hive table not found
[ https://issues.apache.org/jira/browse/SPARK-38845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-38845: Assignee: (was: Apache Spark)
> SparkContext init before SparkSession will cause hive table not found
> -
>
> Key: SPARK-38845
> URL: https://issues.apache.org/jira/browse/SPARK-38845
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.1.2, 3.2.1
> Reporter: wangshengjie
> Priority: Major
>
> If we init SparkContext before SparkSession when using a DataFrame to query a Hive table, it will throw a table-not-found exception.
> {code:scala}
> // placeholder code
> val sparkContext = new SparkContext()
> val sparkSession = SparkSession
>   .builder
>   .appName("SparkSession Test")
>   .enableHiveSupport()
>   .getOrCreate()
> sparkSession.sql("select * from tableA"){code}
-- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-38845) SparkContext init before SparkSession will cause hive table not found
[ https://issues.apache.org/jira/browse/SPARK-38845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-38845: Assignee: Apache Spark
> SparkContext init before SparkSession will cause hive table not found
> -
>
> Key: SPARK-38845
> URL: https://issues.apache.org/jira/browse/SPARK-38845
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.1.2, 3.2.1
> Reporter: wangshengjie
> Assignee: Apache Spark
> Priority: Major
>
> If we init SparkContext before SparkSession when using a DataFrame to query a Hive table, it will throw a table-not-found exception.
> {code:scala}
> // placeholder code
> val sparkContext = new SparkContext()
> val sparkSession = SparkSession
>   .builder
>   .appName("SparkSession Test")
>   .enableHiveSupport()
>   .getOrCreate()
> sparkSession.sql("select * from tableA"){code}
-- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-38845) SparkContext init before SparkSession will cause hive table not found
[ https://issues.apache.org/jira/browse/SPARK-38845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17520121#comment-17520121 ] Apache Spark commented on SPARK-38845: -- User 'wangshengjie123' has created a pull request for this issue: https://github.com/apache/spark/pull/36129
> SparkContext init before SparkSession will cause hive table not found
> -
>
> Key: SPARK-38845
> URL: https://issues.apache.org/jira/browse/SPARK-38845
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.1.2, 3.2.1
> Reporter: wangshengjie
> Priority: Major
>
> If we init SparkContext before SparkSession when using a DataFrame to query a Hive table, it will throw a table-not-found exception.
> {code:scala}
> // placeholder code
> val sparkContext = new SparkContext()
> val sparkSession = SparkSession
>   .builder
>   .appName("SparkSession Test")
>   .enableHiveSupport()
>   .getOrCreate()
> sparkSession.sql("select * from tableA"){code}
-- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-38845) SparkContext init before SparkSession will cause hive table not found
[ https://issues.apache.org/jira/browse/SPARK-38845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] wangshengjie updated SPARK-38845: - Environment: (was: If we init SparkContext before SparkSession when using a DataFrame to query a Hive table, it will throw a table-not-found exception. {code:scala} // placeholder code val sparkContext = new SparkContext() val sparkSession = SparkSession .builder .appName("SparkSession Test") .enableHiveSupport() .getOrCreate() sparkSession.sql("select * from tableA"){code}) > SparkContext init before SparkSession will cause hive table not found > - > > Key: SPARK-38845 > URL: https://issues.apache.org/jira/browse/SPARK-38845 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.2, 3.2.1 >Reporter: wangshengjie >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-38845) SparkContext init before SparkSession will cause hive table not found
wangshengjie created SPARK-38845: Summary: SparkContext init before SparkSession will cause hive table not found Key: SPARK-38845 URL: https://issues.apache.org/jira/browse/SPARK-38845 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.2.1, 3.1.2 Environment: If we init SparkContext before SparkSession when using a DataFrame to query a Hive table, it will throw a table-not-found exception.
{code:scala}
// placeholder code
val sparkContext = new SparkContext()
val sparkSession = SparkSession
  .builder
  .appName("SparkSession Test")
  .enableHiveSupport()
  .getOrCreate()
sparkSession.sql("select * from tableA"){code}
Reporter: wangshengjie
-- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-38845) SparkContext init before SparkSession will cause hive table not found
[ https://issues.apache.org/jira/browse/SPARK-38845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] wangshengjie updated SPARK-38845: - Description: If we init SparkContext before SparkSession when using a DataFrame to query a Hive table, it will throw a table-not-found exception.
{code:scala}
// placeholder code
val sparkContext = new SparkContext()
val sparkSession = SparkSession
  .builder
  .appName("SparkSession Test")
  .enableHiveSupport()
  .getOrCreate()
sparkSession.sql("select * from tableA"){code}
> SparkContext init before SparkSession will cause hive table not found
> -
>
> Key: SPARK-38845
> URL: https://issues.apache.org/jira/browse/SPARK-38845
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.1.2, 3.2.1
> Reporter: wangshengjie
> Priority: Major
>
> If we init SparkContext before SparkSession when using a DataFrame to query a Hive table, it will throw a table-not-found exception.
> {code:scala}
> // placeholder code
> val sparkContext = new SparkContext()
> val sparkSession = SparkSession
>   .builder
>   .appName("SparkSession Test")
>   .enableHiveSupport()
>   .getOrCreate()
> sparkSession.sql("select * from tableA"){code}
-- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
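[Editorial note] A minimal PySpark sketch of the usual workaround (the same ordering applies in Scala): let the SparkSession create the SparkContext, since the reported failure suggests the static {{spark.sql.catalogImplementation}} setting from {{enableHiveSupport()}} does not take effect on a pre-existing SparkContext:
{code:python}
from pyspark.sql import SparkSession

# Build the Hive-enabled session first; it creates the SparkContext itself.
spark = (SparkSession.builder
         .appName("SparkSession Test")
         .enableHiveSupport()
         .getOrCreate())

sc = spark.sparkContext  # reuse the session's context where one is needed
spark.sql("select * from tableA")  # tableA is the issue's illustrative table
{code}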
[jira] [Commented] (SPARK-19335) Spark should support doing an efficient DataFrame Upsert via JDBC
[ https://issues.apache.org/jira/browse/SPARK-19335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17520107#comment-17520107 ] Chandramouli Viswanathan commented on SPARK-19335: -- Hi, is the issue resolved? If yes, how can I get a sample implementation guide to move forward? Thanks in advance. > Spark should support doing an efficient DataFrame Upsert via JDBC > - > > Key: SPARK-19335 > URL: https://issues.apache.org/jira/browse/SPARK-19335 > Project: Spark > Issue Type: Improvement >Reporter: Ilya Ganelin >Priority: Minor > > Doing a database update, as opposed to an insert, is useful, particularly when working with streaming applications which may require revisions to previously stored data. > Spark DataFrames/DataSets do not currently support an Update feature via the JDBC Writer, allowing only Overwrite or Append. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
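[Editorial note] The issue remains open, so there is no built-in upsert mode; the common workaround is to push the upsert to the database per partition. A sketch assuming a PostgreSQL sink with psycopg2 installed on the executors (neither is implied by the issue, and the table and column names are illustrative):
{code:python}
# NOT a Spark API: upsert each partition through the database's native
# ON CONFLICT syntax instead of df.write.jdbc(), which only appends/overwrites.
def upsert_partition(rows):
    import psycopg2  # assumed available on every executor
    conn = psycopg2.connect(host="dbhost", dbname="mydb",
                            user="user", password="secret")
    with conn, conn.cursor() as cur:  # the connection commits on success
        for row in rows:
            cur.execute(
                "INSERT INTO target (id, val) VALUES (%s, %s) "
                "ON CONFLICT (id) DO UPDATE SET val = EXCLUDED.val",
                (row["id"], row["val"]),
            )
    conn.close()

df.foreachPartition(upsert_partition)  # df is assumed to have columns id, val
{code}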