[jira] [Updated] (SPARK-37960) A new framework to represent catalyst expressions in DS v2 APIs
[ https://issues.apache.org/jira/browse/SPARK-37960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jiaan.geng updated SPARK-37960: --- Parent: SPARK-38852 Issue Type: Sub-task (was: New Feature) > A new framework to represent catalyst expressions in DS v2 APIs > --- > > Key: SPARK-37960 > URL: https://issues.apache.org/jira/browse/SPARK-37960 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: jiaan.geng >Assignee: jiaan.geng >Priority: Major > Fix For: 3.3.0 > > > Spark needs a new framework to represent catalyst expressions in DS v2 APIs. > CASE ... WHEN ... ELSE ... END is just the first use case. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37867) Compile aggregate functions of built-in JDBC dialect
[ https://issues.apache.org/jira/browse/SPARK-37867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jiaan.geng updated SPARK-37867: --- Parent: SPARK-38852 Issue Type: Sub-task (was: New Feature) > Compile aggregate functions of built-in JDBC dialect > > > Key: SPARK-37867 > URL: https://issues.apache.org/jira/browse/SPARK-37867 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: jiaan.geng >Assignee: jiaan.geng >Priority: Major > Fix For: 3.3.0 > > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37839) DS V2 supports partial aggregate push-down AVG
[ https://issues.apache.org/jira/browse/SPARK-37839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jiaan.geng updated SPARK-37839: --- Parent: SPARK-38852 Issue Type: Sub-task (was: New Feature) > DS V2 supports partial aggregate push-down AVG > -- > > Key: SPARK-37839 > URL: https://issues.apache.org/jira/browse/SPARK-37839 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: jiaan.geng >Assignee: jiaan.geng >Priority: Major > Fix For: 3.3.0 > > > Currently, DS V2 supports complete aggregate push-down of AVG. But supporting > partial aggregate push-down for AVG (pushing SUM and COUNT to the source and finishing the average in Spark) would be very useful. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37839) DS V2 supports partial aggregate push-down AVG
[ https://issues.apache.org/jira/browse/SPARK-37839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jiaan.geng updated SPARK-37839: --- Epic Link: (was: SPARK-38788) > DS V2 supports partial aggregate push-down AVG > -- > > Key: SPARK-37839 > URL: https://issues.apache.org/jira/browse/SPARK-37839 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: jiaan.geng >Assignee: jiaan.geng >Priority: Major > Fix For: 3.3.0 > > > Currently, DS V2 supports complete aggregate push-down of AVG. But supporting > partial aggregate push-down for AVG (pushing SUM and COUNT to the source and finishing the average in Spark) would be very useful. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
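A conceptual sketch of the partial push-down idea (an illustration only, not Spark's internal code): avg(x) can be rewritten into sum(x) and count(x), those partials pushed to the source, and the exact average recovered from the totals.

{code:java}
// Hypothetical per-partition partial result returned by the data source.
case class Partial(sum: Double, count: Long)

// Combining partials recovers the exact average: sum of sums / sum of counts.
val partials = Seq(Partial(100.0, 4), Partial(50.0, 1))
val avg = partials.map(_.sum).sum / partials.map(_.count).sum // 150.0 / 5 = 30.0
{code}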
[jira] [Updated] (SPARK-37527) Translate more standard aggregate functions for pushdown
[ https://issues.apache.org/jira/browse/SPARK-37527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jiaan.geng updated SPARK-37527: --- Parent: SPARK-38852 Issue Type: Sub-task (was: Improvement) > Translate more standard aggregate functions for pushdown > > > Key: SPARK-37527 > URL: https://issues.apache.org/jira/browse/SPARK-37527 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: jiaan.geng >Assignee: jiaan.geng >Priority: Major > Fix For: 3.3.0 > > > Currently, Spark aggregate pushdown will translate some standard aggregate > functions, so that these functions can be compiled into the dialect of a specific database. > After this work, users can override JdbcDialect.compileAggregate to > implement aggregate functions supported by a particular database. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
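A rough sketch of that override point, assuming the Spark 3.3-era API (the dialect object and JDBC URL prefix below are hypothetical, and the AggregateFunc subclasses a real dialect would match on vary across versions):

{code:java}
import org.apache.spark.sql.connector.expressions.aggregate.AggregateFunc
import org.apache.spark.sql.jdbc.{JdbcDialect, JdbcDialects}

object MyDatabaseDialect extends JdbcDialect {
  // Hypothetical JDBC URL prefix, used only to select this dialect.
  override def canHandle(url: String): Boolean = url.startsWith("jdbc:mydb")

  // Delegates to the built-in translation; a real dialect would add cases
  // for aggregate functions that only this particular database supports.
  override def compileAggregate(aggFunction: AggregateFunc): Option[String] =
    super.compileAggregate(aggFunction)
}

JdbcDialects.registerDialect(MyDatabaseDialect)
{code}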
[jira] [Updated] (SPARK-37734) Upgrade h2 from 1.4.195 to 2.0.202
[ https://issues.apache.org/jira/browse/SPARK-37734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jiaan.geng updated SPARK-37734: --- Parent: SPARK-38852 Issue Type: Sub-task (was: Dependency upgrade) > Upgrade h2 from 1.4.195 to 2.0.202 > -- > > Key: SPARK-37734 > URL: https://issues.apache.org/jira/browse/SPARK-37734 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: jiaan.geng >Assignee: jiaan.geng >Priority: Major > Fix For: 3.3.0 > > > Currently, the com.h2database dependency has one known vulnerability, ref: > https://www.tenable.com/cve/CVE-2021-23463 -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37732) Improve the implementation of JDBCV2Suite
[ https://issues.apache.org/jira/browse/SPARK-37732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jiaan.geng updated SPARK-37732: --- Parent: SPARK-38852 Issue Type: Sub-task (was: New Feature) > Improve the implementation of JDBCV2Suite > > > Key: SPARK-37732 > URL: https://issues.apache.org/jira/browse/SPARK-37732 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: jiaan.geng >Assignee: jiaan.geng >Priority: Major > Fix For: 3.3.0 > > > While reading the implementation of JDBCV2Suite, I found that the code can be improved. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37644) Support datasource v2 complete aggregate pushdown
[ https://issues.apache.org/jira/browse/SPARK-37644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jiaan.geng updated SPARK-37644: --- Epic Link: (was: SPARK-38788) > Support datasource v2 complete aggregate pushdown > -- > > Key: SPARK-37644 > URL: https://issues.apache.org/jira/browse/SPARK-37644 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: jiaan.geng >Assignee: jiaan.geng >Priority: Major > Fix For: 3.3.0 > > > Currently, Spark supports aggregate push-down with partial-agg and final-agg. > For some data sources (e.g. JDBC), we can avoid partial-agg and final-agg > by running the aggregate completely on the database. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37644) Support datasource v2 complete aggregate pushdown
[ https://issues.apache.org/jira/browse/SPARK-37644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jiaan.geng updated SPARK-37644: --- Parent: SPARK-38852 Issue Type: Sub-task (was: New Feature) > Support datasource v2 complete aggregate pushdown > -- > > Key: SPARK-37644 > URL: https://issues.apache.org/jira/browse/SPARK-37644 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: jiaan.geng >Assignee: jiaan.geng >Priority: Major > Fix For: 3.3.0 > > > Currently, Spark supports aggregate push-down with partial-agg and final-agg. > For some data sources (e.g. JDBC), we can avoid partial-agg and final-agg > by running the aggregate completely on the database. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
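A hedged sketch of what this enables, assuming a JDBC source (the URL and table below are hypothetical): with complete push-down, a grouped aggregate like this can run entirely in the database, so Spark can skip its own partial and final aggregate steps.

{code:java}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{max, min}

val spark = SparkSession.builder().appName("complete-agg-pushdown").getOrCreate()
import spark.implicits._

val df = spark.read
  .format("jdbc")
  .option("url", "jdbc:h2:mem:testdb") // hypothetical URL
  .option("dbtable", "employee")       // hypothetical table
  .load()

// A candidate for complete push-down: grouped MAX/MIN over a single source.
df.groupBy($"dept").agg(max($"salary"), min($"salary")).show()
{code}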
[jira] [Updated] (SPARK-37483) Support push down top N to JDBC data source V2
[ https://issues.apache.org/jira/browse/SPARK-37483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jiaan.geng updated SPARK-37483: --- Epic Link: (was: SPARK-38788) > Support push down top N to JDBC data source V2 > -- > > Key: SPARK-37483 > URL: https://issues.apache.org/jira/browse/SPARK-37483 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: jiaan.geng >Assignee: jiaan.geng >Priority: Major > Fix For: 3.3.0 > > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37483) Support push down top N to JDBC data source V2
[ https://issues.apache.org/jira/browse/SPARK-37483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jiaan.geng updated SPARK-37483: --- Parent: SPARK-38852 Issue Type: Sub-task (was: New Feature) > Support push down top N to JDBC data source V2 > -- > > Key: SPARK-37483 > URL: https://issues.apache.org/jira/browse/SPARK-37483 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: jiaan.geng >Assignee: jiaan.geng >Priority: Major > Fix For: 3.3.0 > > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37286) Move compileAggregates from JDBCRDD to JdbcDialect
[ https://issues.apache.org/jira/browse/SPARK-37286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jiaan.geng updated SPARK-37286: --- Parent: SPARK-38852 Issue Type: Sub-task (was: Improvement) > Move compileAggregates from JDBCRDD to JdbcDialect > -- > > Key: SPARK-37286 > URL: https://issues.apache.org/jira/browse/SPARK-37286 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: jiaan.geng >Assignee: jiaan.geng >Priority: Major > Fix For: 3.3.0 > > > Currently, the method compileAggregates lives in JDBCRDD. But this is not reasonable, > because the JDBC source knows how to compile aggregate expressions into > its own dialect. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37212) Improve the implementation of aggregate pushdown.
[ https://issues.apache.org/jira/browse/SPARK-37212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jiaan.geng updated SPARK-37212: --- Parent: SPARK-38852 Issue Type: Sub-task (was: Improvement) > Improve the implementation of aggregate pushdown. > > > Key: SPARK-37212 > URL: https://issues.apache.org/jira/browse/SPARK-37212 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: jiaan.geng >Assignee: jiaan.geng >Priority: Major > Fix For: 3.3.0 > > > Spark SQL supports aggregate pushdown for JDBC. While reading the current > implementation, I found some small issues. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-36574) pushDownPredicate=false should prevent push down filters to JDBC data source
[ https://issues.apache.org/jira/browse/SPARK-36574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jiaan.geng updated SPARK-36574: --- Parent: SPARK-38852 Issue Type: Sub-task (was: Bug) > pushDownPredicate=false should prevent push down filters to JDBC data source > > > Key: SPARK-36574 > URL: https://issues.apache.org/jira/browse/SPARK-36574 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: jiaan.geng >Assignee: jiaan.geng >Priority: Major > Fix For: 3.2.0 > > > Spark SQL includes a data source that can read data from other databases > using JDBC. > Spark also supports the case-insensitive option pushDownPredicate. > According to http://spark.apache.org/docs/latest/sql-data-sources-jdbc.html, > if pushDownPredicate is set to false, no filter should be pushed down to the JDBC > data source, and thus all filters should be handled by Spark. > But I find filters are still pushed down to the JDBC data source. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
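A minimal sketch of the reported scenario, assuming a JDBC source (URL, table, and column names are hypothetical): with the documented pushDownPredicate option set to false, the filter below should be evaluated by Spark after a full scan, not compiled into the JDBC WHERE clause.

{code:java}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("pushdown-option-test").getOrCreate()
import spark.implicits._

val df = spark.read
  .format("jdbc")
  .option("url", "jdbc:h2:mem:testdb")  // hypothetical URL
  .option("dbtable", "tableA")          // hypothetical table
  .option("pushDownPredicate", "false") // keep all filters on the Spark side
  .load()
  .filter($"col1" > 10) // per the docs, this should NOT reach the database
{code}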
[jira] [Updated] (SPARK-38852) Better Data Source V2 operator pushdown framework
[ https://issues.apache.org/jira/browse/SPARK-38852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jiaan.geng updated SPARK-38852: --- Description: Currently, Spark supports pushing down Filters and Aggregates to data sources. However, the Data Source V2 operator pushdown framework has the following shortcomings: # Only simple filter and aggregate are supported, which makes it impossible to apply in most scenarios # The incompatibility of SQL syntax makes it impossible to apply in most scenarios # Aggregate push down does not support multiple partitions of data sources # Spark's additional aggregate will cause some overhead # Limit push down is not supported # Top n push down is not supported # Paging push down is not supported was: Currently, Spark supports pushing down Filters and Aggregates to data sources. But, the # Only simple filter and aggregate are supported, which makes it impossible to apply in most scenarios # The incompatibility of SQL syntax makes it impossible to apply in most scenarios # Aggregate push down does not support multiple partitions of data sources # Spark's additional aggregate will cause some overhead # Limit push down is not supported # Top n push down is not supported # Paging push down is not supported > Better Data Source V2 operator pushdown framework > - > > Key: SPARK-38852 > URL: https://issues.apache.org/jira/browse/SPARK-38852 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0 >Reporter: jiaan.geng >Priority: Major > > Currently, Spark supports pushing down Filters and Aggregates to data sources. > However, the Data Source V2 operator pushdown framework has the following > shortcomings: > # Only simple filter and aggregate are supported, which makes it impossible > to apply in most scenarios > # The incompatibility of SQL syntax makes it impossible to apply in most > scenarios > # Aggregate push down does not support multiple partitions of data sources > # Spark's additional aggregate will cause some overhead > # Limit push down is not supported > # Top n push down is not supported > # Paging push down is not supported -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-38852) Better Data Source V2 operator pushdown framework
[ https://issues.apache.org/jira/browse/SPARK-38852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jiaan.geng updated SPARK-38852: --- Description: Currently, Spark supports pushing down Filters and Aggregates to data sources. But, the # Only simple filter and aggregate are supported, which makes it impossible to apply in most scenarios # The incompatibility of SQL syntax makes it impossible to apply in most scenarios # Aggregate push down does not support multiple partitions of data sources # Spark's additional aggregate will cause some overhead # Limit push down is not supported # Top n push down is not supported # Paging push down is not supported > Better Data Source V2 operator pushdown framework > - > > Key: SPARK-38852 > URL: https://issues.apache.org/jira/browse/SPARK-38852 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0 >Reporter: jiaan.geng >Priority: Major > > Currently, Spark supports pushing down Filters and Aggregates to data sources. > But, the > # Only simple filter and aggregate are supported, which makes it impossible > to apply in most scenarios > # The incompatibility of SQL syntax makes it impossible to apply in most > scenarios > # Aggregate push down does not support multiple partitions of data sources > # Spark's additional aggregate will cause some overhead > # Limit push down is not supported > # Top n push down is not supported > # Paging push down is not supported -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-38852) Better Data Source V2 operator pushdown framework
jiaan.geng created SPARK-38852: -- Summary: Better Data Source V2 operator pushdown framework Key: SPARK-38852 URL: https://issues.apache.org/jira/browse/SPARK-38852 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.3.0 Reporter: jiaan.geng -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-38678) Enable RocksDB tests on Apple Silicon on MacOS
[ https://issues.apache.org/jira/browse/SPARK-38678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17520322#comment-17520322 ] Apache Spark commented on SPARK-38678: -- User 'LuciferYang' has created a pull request for this issue: https://github.com/apache/spark/pull/36139 > Enable RocksDB tests on Apple Silicon on MacOS > -- > > Key: SPARK-38678 > URL: https://issues.apache.org/jira/browse/SPARK-38678 > Project: Spark > Issue Type: Sub-task > Components: Tests >Affects Versions: 3.4.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Fix For: 3.4.0 > > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-38851) Refactor `HistoryServerSuite` to add UTs for RocksDB
[ https://issues.apache.org/jira/browse/SPARK-38851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-38851: Assignee: Apache Spark > Refactor `HistoryServerSuite` to add UTs for RocksDB > > > Key: SPARK-38851 > URL: https://issues.apache.org/jira/browse/SPARK-38851 > Project: Spark > Issue Type: Improvement > Components: Tests >Affects Versions: 3.4.0 >Reporter: Yang Jie >Assignee: Apache Spark >Priority: Minor > > HistoryServerSuite currently only tests the LevelDB backend -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-38851) Refactor `HistoryServerSuite` to add UTs for RocksDB
[ https://issues.apache.org/jira/browse/SPARK-38851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-38851: Assignee: (was: Apache Spark) > Refactor `HistoryServerSuite` to add UTs for RocksDB > > > Key: SPARK-38851 > URL: https://issues.apache.org/jira/browse/SPARK-38851 > Project: Spark > Issue Type: Improvement > Components: Tests >Affects Versions: 3.4.0 >Reporter: Yang Jie >Priority: Minor > > HistoryServerSuite currently only tests the LevelDB backend -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-38851) Refactor `HistoryServerSuite` to add UTs for RocksDB
[ https://issues.apache.org/jira/browse/SPARK-38851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17520314#comment-17520314 ] Apache Spark commented on SPARK-38851: -- User 'LuciferYang' has created a pull request for this issue: https://github.com/apache/spark/pull/36138 > Refactor `HistoryServerSuite` to add UTs for RocksDB > > > Key: SPARK-38851 > URL: https://issues.apache.org/jira/browse/SPARK-38851 > Project: Spark > Issue Type: Improvement > Components: Tests >Affects Versions: 3.4.0 >Reporter: Yang Jie >Priority: Minor > > HistoryServerSuite currently only tests the LevelDB backend -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-38761) DS V2 supports push down misc non-aggregate functions
[ https://issues.apache.org/jira/browse/SPARK-38761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-38761: --- Assignee: jiaan.geng > DS V2 supports push down misc non-aggregate functions > - > > Key: SPARK-38761 > URL: https://issues.apache.org/jira/browse/SPARK-38761 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 3.4.0 >Reporter: jiaan.geng >Assignee: jiaan.geng >Priority: Major > > Currently, Spark has a lot of misc non-aggregate functions from the ANSI standard: > abs, > coalesce, > nullif, > when. > DS V2 should support pushing down these misc non-aggregate functions. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
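A hedged sketch of a query that would benefit, assuming a JDBC source (URL, table, and columns are hypothetical): once ABS and COALESCE can be translated, a filter like the one below could be compiled into the JDBC dialect instead of being evaluated row by row in Spark.

{code:java}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{abs, coalesce, lit}

val spark = SparkSession.builder().appName("non-agg-pushdown").getOrCreate()
import spark.implicits._

val df = spark.read
  .format("jdbc")
  .option("url", "jdbc:h2:mem:testdb") // hypothetical URL
  .option("dbtable", "employee")       // hypothetical table
  .load()
  // With translation support, ABS/COALESCE can move into the pushed-down predicate.
  .filter(abs($"salary") > coalesce($"bonus", lit(0)))
{code}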
[jira] [Created] (SPARK-38851) Refactor `HistoryServerSuite` to add UTs for RocksDB
Yang Jie created SPARK-38851: Summary: Refactor `HistoryServerSuite` to add UTs for RocksDB Key: SPARK-38851 URL: https://issues.apache.org/jira/browse/SPARK-38851 Project: Spark Issue Type: Improvement Components: Tests Affects Versions: 3.4.0 Reporter: Yang Jie HistoryServerSuite now only test leveldb backend -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-38761) DS V2 supports push down misc non-aggregate functions
[ https://issues.apache.org/jira/browse/SPARK-38761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-38761. - Fix Version/s: 3.3.0 Resolution: Fixed Issue resolved by pull request 36039 [https://github.com/apache/spark/pull/36039] > DS V2 supports push down misc non-aggregate functions > - > > Key: SPARK-38761 > URL: https://issues.apache.org/jira/browse/SPARK-38761 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 3.4.0 >Reporter: jiaan.geng >Assignee: jiaan.geng >Priority: Major > Fix For: 3.3.0 > > > Currently, Spark has a lot of misc non-aggregate functions from the ANSI standard: > abs, > coalesce, > nullif, > when. > DS V2 should support pushing down these misc non-aggregate functions. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-38837) Implement `dropna` parameter of `SeriesGroupBy.value_counts`
[ https://issues.apache.org/jira/browse/SPARK-38837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-38837: - Fix Version/s: 3.3.0 > Implement `dropna` parameter of `SeriesGroupBy.value_counts` > > > Key: SPARK-38837 > URL: https://issues.apache.org/jira/browse/SPARK-38837 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 3.4.0 >Reporter: Xinrong Meng >Assignee: Xinrong Meng >Priority: Major > Fix For: 3.3.0, 3.4.0 > > > Implement `dropna` parameter of `SeriesGroupBy.value_counts` -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-38837) Implement `dropna` parameter of `SeriesGroupBy.value_counts`
[ https://issues.apache.org/jira/browse/SPARK-38837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-38837: Assignee: Xinrong Meng > Implement `dropna` parameter of `SeriesGroupBy.value_counts` > > > Key: SPARK-38837 > URL: https://issues.apache.org/jira/browse/SPARK-38837 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 3.4.0 >Reporter: Xinrong Meng >Assignee: Xinrong Meng >Priority: Major > > Implement `dropna` parameter of `SeriesGroupBy.value_counts` -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-38837) Implement `dropna` parameter of `SeriesGroupBy.value_counts`
[ https://issues.apache.org/jira/browse/SPARK-38837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-38837. -- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 36093 [https://github.com/apache/spark/pull/36093] > Implement `dropna` parameter of `SeriesGroupBy.value_counts` > > > Key: SPARK-38837 > URL: https://issues.apache.org/jira/browse/SPARK-38837 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 3.4.0 >Reporter: Xinrong Meng >Assignee: Xinrong Meng >Priority: Major > Fix For: 3.4.0 > > > Implement `dropna` parameter of `SeriesGroupBy.value_counts` -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-38846) Teradata's Number is either converted to its floor value or ceiling value despite its fractional part.
[ https://issues.apache.org/jira/browse/SPARK-38846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17520299#comment-17520299 ] eugene commented on SPARK-38846: [~hyukjin.kwon] Thanks, just tried with the latest Spark version (Spark 3.2.1), the issue is still there. > Teradata's Number is either converted to its floor value or ceiling value > despite its fractional part. > -- > > Key: SPARK-38846 > URL: https://issues.apache.org/jira/browse/SPARK-38846 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.0 > Environment: Spark2.3.0 on Yarn > Teradata 16.20.32.59 >Reporter: eugene >Priority: Major > > I'm trying to load data from Teradata; the code in use is: > > sparkSession.read > .format("jdbc") > .options( > Map( > "url" -> "jdbc:teradata://hostname, user=$username, > password=$password", > "MAYBENULL" -> "ON", > "SIP_SUPPORT" -> "ON", > "driver" -> "com.teradata.jdbc.TeraDriver", > "dbtable" -> $table_name > ) > ) > .load() > However, some data lost its fractional part after loading. To be more > precise, the column in Teradata is of the type Number, and after > loading, the data type in Spark is `DecimalType(38,0)`; the scale value is 0, > which means no digits after the decimal point. > Data in Teradata is something like, > id column1 column2 > 1 50.23 100.23 > 2 25.8 20.669 > 3 30.2 19.23 > The resulting Spark `dataframe` is like, > id column1 column2 > 1 50 100 > 2 26 21 > 3 30 19 > The metadata of the table in Teradata is like: > CREATE SET TABLE table_name (id BIGINT, column1 NUMBER, column2 NUMBER) > PRIMARY INDEX (id); > The Spark version is 2.3.0 and Teradata is 16.20.32.59. > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
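One possible workaround sketch (untested against Teradata): the JDBC reader's documented customSchema option can override the inferred DecimalType(38,0) with an explicit nonzero scale so the fractional part survives; the column names below mirror the example table.

{code:java}
// Hedged workaround: force an explicit scale for the NUMBER columns
// instead of the inferred DECIMAL(38,0).
val df = sparkSession.read
  .format("jdbc")
  .options(
    Map(
      "url" -> "jdbc:teradata://hostname, user=$username, password=$password",
      "driver" -> "com.teradata.jdbc.TeraDriver",
      "dbtable" -> "table_name",
      "customSchema" -> "id BIGINT, column1 DECIMAL(38,10), column2 DECIMAL(38,10)"
    )
  )
  .load()
{code}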
[jira] [Commented] (SPARK-38829) New configuration for controlling timestamp inference of Parquet
[ https://issues.apache.org/jira/browse/SPARK-38829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17520297#comment-17520297 ] Apache Spark commented on SPARK-38829: -- User 'sadikovi' has created a pull request for this issue: https://github.com/apache/spark/pull/36137 > New configuration for controlling timestamp inference of Parquet > > > Key: SPARK-38829 > URL: https://issues.apache.org/jira/browse/SPARK-38829 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: Gengliang Wang >Assignee: Ivan Sadikov >Priority: Major > > A new SQL conf which can fall back to the behavior that reads all the Parquet > Timestamp columns as TimestampType. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-38829) New configuration for controlling timestamp inference of Parquet
[ https://issues.apache.org/jira/browse/SPARK-38829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-38829: Assignee: Apache Spark (was: Ivan Sadikov) > New configuration for controlling timestamp inference of Parquet > > > Key: SPARK-38829 > URL: https://issues.apache.org/jira/browse/SPARK-38829 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: Gengliang Wang >Assignee: Apache Spark >Priority: Major > > A new SQL conf which can fall back to the behavior that reads all the Parquet > Timestamp columns as TimestampType. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-38829) New configuration for controlling timestamp inference of Parquet
[ https://issues.apache.org/jira/browse/SPARK-38829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17520296#comment-17520296 ] Ivan Sadikov commented on SPARK-38829: -- I opened [https://github.com/apache/spark/pull/36137] to disable TimestampNTZType support in Parquet in 3.3. > New configuration for controlling timestamp inference of Parquet > > > Key: SPARK-38829 > URL: https://issues.apache.org/jira/browse/SPARK-38829 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: Gengliang Wang >Assignee: Ivan Sadikov >Priority: Major > > A new SQL conf which can fall back to the behavior that reads all the Parquet > Timestamp columns as TimestampType. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-38829) New configuration for controlling timestamp inference of Parquet
[ https://issues.apache.org/jira/browse/SPARK-38829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-38829: Assignee: Ivan Sadikov (was: Apache Spark) > New configuration for controlling timestamp inference of Parquet > > > Key: SPARK-38829 > URL: https://issues.apache.org/jira/browse/SPARK-38829 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: Gengliang Wang >Assignee: Ivan Sadikov >Priority: Major > > A new SQL conf which can fall back to the behavior that reads all the Parquet > Timestamp columns as TimestampType. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-38829) New configuration for controlling timestamp inference of Parquet
[ https://issues.apache.org/jira/browse/SPARK-38829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17520295#comment-17520295 ] Apache Spark commented on SPARK-38829: -- User 'sadikovi' has created a pull request for this issue: https://github.com/apache/spark/pull/36137 > New configuration for controlling timestamp inference of Parquet > > > Key: SPARK-38829 > URL: https://issues.apache.org/jira/browse/SPARK-38829 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: Gengliang Wang >Assignee: Ivan Sadikov >Priority: Major > > A new SQL conf which can fall back to the behavior that reads all the Parquet > Timestamp columns as TimestampType. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-38845) SparkContext init before SparkSession will cause hive table not found
[ https://issues.apache.org/jira/browse/SPARK-38845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17520274#comment-17520274 ] wangshengjie commented on SPARK-38845: -- Duplicate issue, closing this issue > SparkContext init before SparkSession will cause hive table not found > - > > Key: SPARK-38845 > URL: https://issues.apache.org/jira/browse/SPARK-38845 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.2, 3.2.1 >Reporter: wangshengjie >Priority: Major > > If we init SparkContext before SparkSession when using DataFrame to query a > hive table, it will throw a table-not-found exception. > {code:java} > // code placeholder > val sparkContext = new SparkContext() > val sparkSession = SparkSession > .builder > .appName("SparkSession Test") > .enableHiveSupport() > .getOrCreate() > sparkSession.sql("select * from tableA"){code} -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-38845) SparkContext init before SparkSession will cause hive table not found
[ https://issues.apache.org/jira/browse/SPARK-38845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] wangshengjie resolved SPARK-38845. -- Resolution: Duplicate > SparkContext init before SparkSession will cause hive table not found > - > > Key: SPARK-38845 > URL: https://issues.apache.org/jira/browse/SPARK-38845 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.2, 3.2.1 >Reporter: wangshengjie >Priority: Major > > If we init SparkContext before SparkSession when using DataFrame to query a > hive table, it will throw a table-not-found exception. > {code:java} > // code placeholder > val sparkContext = new SparkContext() > val sparkSession = SparkSession > .builder > .appName("SparkSession Test") > .enableHiveSupport() > .getOrCreate() > sparkSession.sql("select * from tableA"){code} -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
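A hedged workaround sketch for this scenario, assuming the SparkContext really must be created first: enableHiveSupport() only sets spark.sql.catalogImplementation=hive, and with a pre-existing context that static conf is read from the SparkContext's conf, so it can be set there explicitly.

{code:java}
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SparkSession

// Set the catalog implementation on the context's conf up front.
val conf = new SparkConf().set("spark.sql.catalogImplementation", "hive")
val sparkContext = new SparkContext(conf)

val sparkSession = SparkSession
  .builder
  .appName("SparkSession Test")
  .enableHiveSupport()
  .getOrCreate()

sparkSession.sql("select * from tableA") // the hive table should now resolve
{code}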
[jira] [Resolved] (SPARK-38708) Upgrade Hive Metastore Client to 3.1.3 for Hive 3.1
[ https://issues.apache.org/jira/browse/SPARK-38708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-38708. --- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 36018 [https://github.com/apache/spark/pull/36018] > Upgrade Hive Metastore Client to 3.1.3 for Hive 3.1 > --- > > Key: SPARK-38708 > URL: https://issues.apache.org/jira/browse/SPARK-38708 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.0 >Reporter: Yuming Wang >Assignee: Yuming Wang >Priority: Major > Fix For: 3.4.0 > > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-38708) Upgrade Hive Metastore Client to 3.1.3 for Hive 3.1
[ https://issues.apache.org/jira/browse/SPARK-38708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-38708: - Assignee: Yuming Wang > Upgrade Hive Metastore Client to 3.1.3 for Hive 3.1 > --- > > Key: SPARK-38708 > URL: https://issues.apache.org/jira/browse/SPARK-38708 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.0 >Reporter: Yuming Wang >Assignee: Yuming Wang >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-38767) Support ignoreCorruptFiles and ignoreMissingFiles in Data Source options
[ https://issues.apache.org/jira/browse/SPARK-38767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-38767: -- Affects Version/s: 3.4.0 (was: 3.2.1) > Support ignoreCorruptFiles and ignoreMissingFiles in Data Source options > > > Key: SPARK-38767 > URL: https://issues.apache.org/jira/browse/SPARK-38767 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.0 >Reporter: Yaohua Zhao >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-38767) Support ignoreCorruptFiles and ignoreMissingFiles in Data Source options
[ https://issues.apache.org/jira/browse/SPARK-38767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-38767: -- Priority: Minor (was: Major) > Support ignoreCorruptFiles and ignoreMissingFiles in Data Source options > > > Key: SPARK-38767 > URL: https://issues.apache.org/jira/browse/SPARK-38767 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.0 >Reporter: Yaohua Zhao >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-38850) Upgrade Kafka to 3.1.1
[ https://issues.apache.org/jira/browse/SPARK-38850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-38850: Assignee: (was: Apache Spark) > Upgrade Kafka to 3.1.1 > -- > > Key: SPARK-38850 > URL: https://issues.apache.org/jira/browse/SPARK-38850 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 3.3.0 >Reporter: Dongjoon Hyun >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-38850) Upgrade Kafka to 3.1.1
[ https://issues.apache.org/jira/browse/SPARK-38850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17520261#comment-17520261 ] Apache Spark commented on SPARK-38850: -- User 'dongjoon-hyun' has created a pull request for this issue: https://github.com/apache/spark/pull/36135 > Upgrade Kafka to 3.1.1 > -- > > Key: SPARK-38850 > URL: https://issues.apache.org/jira/browse/SPARK-38850 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 3.3.0 >Reporter: Dongjoon Hyun >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-38850) Upgrade Kafka to 3.1.1
[ https://issues.apache.org/jira/browse/SPARK-38850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-38850: Assignee: Apache Spark > Upgrade Kafka to 3.1.1 > -- > > Key: SPARK-38850 > URL: https://issues.apache.org/jira/browse/SPARK-38850 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 3.3.0 >Reporter: Dongjoon Hyun >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-38800) Explicitly document the supported pandas version.
[ https://issues.apache.org/jira/browse/SPARK-38800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-38800: -- Fix Version/s: 3.3.0 (was: 3.4.0) > Explicitly document the supported pandas version. > - > > Key: SPARK-38800 > URL: https://issues.apache.org/jira/browse/SPARK-38800 > Project: Spark > Issue Type: Test > Components: Documentation, PySpark >Affects Versions: 3.3.0 >Reporter: Haejoon Lee >Assignee: Haejoon Lee >Priority: Major > Fix For: 3.3.0 > > > pandas has different behavior per version for some APIs. > So, we should explicitly follow one pandas version for one pandas-on-Spark > version and document it. > For example, if some APIs follow the behavior of pandas 1.3 whereas some APIs > follow the behavior of pandas 1.4, users may be confused. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-38850) Upgrade Kafka to 3.1.1
Dongjoon Hyun created SPARK-38850: - Summary: Upgrade Kafka to 3.1.1 Key: SPARK-38850 URL: https://issues.apache.org/jira/browse/SPARK-38850 Project: Spark Issue Type: Bug Components: Build Affects Versions: 3.3.0 Reporter: Dongjoon Hyun -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-38830) Warn corrupted Netty RPC messages
[ https://issues.apache.org/jira/browse/SPARK-38830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-38830: - Assignee: Dongjoon Hyun > Warn corrupted Netty RPC messages > - > > Key: SPARK-38830 > URL: https://issues.apache.org/jira/browse/SPARK-38830 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.3.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-38830) Warn corrupted Netty RPC messages
[ https://issues.apache.org/jira/browse/SPARK-38830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-38830. --- Fix Version/s: 3.3.0 3.2.2 Resolution: Fixed Issue resolved by pull request 36116 [https://github.com/apache/spark/pull/36116] > Warn corrupted Netty RPC messages > - > > Key: SPARK-38830 > URL: https://issues.apache.org/jira/browse/SPARK-38830 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.3.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Fix For: 3.3.0, 3.2.2 > > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-38830) Warn on corrupted block messages
[ https://issues.apache.org/jira/browse/SPARK-38830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-38830: -- Summary: Warn on corrupted block messages (was: Warn corrupted Netty RPC messages) > Warn on corrupted block messages > > > Key: SPARK-38830 > URL: https://issues.apache.org/jira/browse/SPARK-38830 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.3.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Fix For: 3.3.0, 3.2.2 > > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-38830) Warn on corrupted block messages
[ https://issues.apache.org/jira/browse/SPARK-38830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-38830: -- Affects Version/s: 3.2.1 > Warn on corrupted block messages > > > Key: SPARK-38830 > URL: https://issues.apache.org/jira/browse/SPARK-38830 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.2.1, 3.3.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Fix For: 3.3.0, 3.2.2 > > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-36681) Fail to load Snappy codec
[ https://issues.apache.org/jira/browse/SPARK-36681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17520254#comment-17520254 ] Apache Spark commented on SPARK-36681: -- User 'HyukjinKwon' has created a pull request for this issue: https://github.com/apache/spark/pull/36136 > Fail to load Snappy codec > - > > Key: SPARK-36681 > URL: https://issues.apache.org/jira/browse/SPARK-36681 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.2.0 >Reporter: L. C. Hsieh >Assignee: L. C. Hsieh >Priority: Major > Fix For: 3.3.0 > > > snappy-java as a native library should not be relocated in Hadoop shaded > client libraries. Currently we use Hadoop shaded client libraries in Spark. > If we try to use SnappyCodec to write a sequence file, we will encounter the > following error: > {code} > [info] Cause: java.lang.UnsatisfiedLinkError: > org.apache.hadoop.shaded.org.xerial.snappy.SnappyNative.rawCompress(Ljava/nio/ByteBuffer;IILjava/nio/ByteBuffer;I)I > [info] at > org.apache.hadoop.shaded.org.xerial.snappy.SnappyNative.rawCompress(Native > Method) > > [info] at > org.apache.hadoop.shaded.org.xerial.snappy.Snappy.compress(Snappy.java:151) > > > [info] at > org.apache.hadoop.io.compress.snappy.SnappyCompressor.compressDirectBuf(SnappyCompressor.java:282) > [info] at > org.apache.hadoop.io.compress.snappy.SnappyCompressor.compress(SnappyCompressor.java:210) > [info] at > org.apache.hadoop.io.compress.BlockCompressorStream.compress(BlockCompressorStream.java:149) > [info] at > org.apache.hadoop.io.compress.BlockCompressorStream.finish(BlockCompressorStream.java:142) > [info] at > org.apache.hadoop.io.SequenceFile$BlockCompressWriter.writeBuffer(SequenceFile.java:1589) > > [info] at > org.apache.hadoop.io.SequenceFile$BlockCompressWriter.sync(SequenceFile.java:1605) > [info] at > org.apache.hadoop.io.SequenceFile$BlockCompressWriter.close(SequenceFile.java:1629) > > {code} -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-36681) Fail to load Snappy codec
[ https://issues.apache.org/jira/browse/SPARK-36681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17520255#comment-17520255 ] Apache Spark commented on SPARK-36681: -- User 'HyukjinKwon' has created a pull request for this issue: https://github.com/apache/spark/pull/36136 > Fail to load Snappy codec > - > > Key: SPARK-36681 > URL: https://issues.apache.org/jira/browse/SPARK-36681 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.2.0 >Reporter: L. C. Hsieh >Assignee: L. C. Hsieh >Priority: Major > Fix For: 3.3.0 > > > snappy-java as a native library should not be relocated in Hadoop shaded > client libraries. Currently we use Hadoop shaded client libraries in Spark. > If we try to use SnappyCodec to write a sequence file, we will encounter the > following error: > {code} > [info] Cause: java.lang.UnsatisfiedLinkError: > org.apache.hadoop.shaded.org.xerial.snappy.SnappyNative.rawCompress(Ljava/nio/ByteBuffer;IILjava/nio/ByteBuffer;I)I > [info] at > org.apache.hadoop.shaded.org.xerial.snappy.SnappyNative.rawCompress(Native > Method) > > [info] at > org.apache.hadoop.shaded.org.xerial.snappy.Snappy.compress(Snappy.java:151) > > > [info] at > org.apache.hadoop.io.compress.snappy.SnappyCompressor.compressDirectBuf(SnappyCompressor.java:282) > [info] at > org.apache.hadoop.io.compress.snappy.SnappyCompressor.compress(SnappyCompressor.java:210) > [info] at > org.apache.hadoop.io.compress.BlockCompressorStream.compress(BlockCompressorStream.java:149) > [info] at > org.apache.hadoop.io.compress.BlockCompressorStream.finish(BlockCompressorStream.java:142) > [info] at > org.apache.hadoop.io.SequenceFile$BlockCompressWriter.writeBuffer(SequenceFile.java:1589) > > [info] at > org.apache.hadoop.io.SequenceFile$BlockCompressWriter.sync(SequenceFile.java:1605) > [info] at > org.apache.hadoop.io.SequenceFile$BlockCompressWriter.close(SequenceFile.java:1629) > > {code} -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
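A minimal reproduction sketch of the failing path described above (local mode; the output path is hypothetical):

{code:java}
import org.apache.hadoop.io.compress.SnappyCodec
import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("snappy-repro").setMaster("local[1]"))

// Writing a sequence file with SnappyCodec exercises the Hadoop
// BlockCompressorStream path shown in the stack trace above.
sc.parallelize(Seq(("k", 1)))
  .saveAsSequenceFile("/tmp/snappy-out", Some(classOf[SnappyCodec]))
{code}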
[jira] [Commented] (SPARK-38831) How to enable encryption for checkpoint data?
[ https://issues.apache.org/jira/browse/SPARK-38831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17520251#comment-17520251 ] Hyukjin Kwon commented on SPARK-38831: -- [~zoli] Let's interact with the Spark mailing list for a question before filing it here. > How to enable encryption for checkpoint data? > - > > Key: SPARK-38831 > URL: https://issues.apache.org/jira/browse/SPARK-38831 > Project: Spark > Issue Type: Question > Components: Security >Affects Versions: 3.2.1 >Reporter: zoli >Priority: Major > > Setting spark.io.encryption.enabled to true as described here: > [https://spark.apache.org/docs/latest/security.html#local-storage-encryption] has > no effect at all on checkpoints. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-38831) How to enable encryption for checkpoint data?
[ https://issues.apache.org/jira/browse/SPARK-38831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-38831. -- Resolution: Invalid > How to enable encryption for checkpoint data? > - > > Key: SPARK-38831 > URL: https://issues.apache.org/jira/browse/SPARK-38831 > Project: Spark > Issue Type: Question > Components: Security >Affects Versions: 3.2.1 >Reporter: zoli >Priority: Major > > Setting spark.io.encryption.enabled to true as described here: > [https://spark.apache.org/docs/latest/security.html#local-storage-encryption] has > no effect at all on checkpoints. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-38839) Creating a struct with a float inside
[ https://issues.apache.org/jira/browse/SPARK-38839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-38839. -- Resolution: Duplicate > Creating a struct with a float inside > -- > > Key: SPARK-38839 > URL: https://issues.apache.org/jira/browse/SPARK-38839 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 3.2.1 >Reporter: Daniel deCordoba >Priority: Minor > > When creating a dataframe using createDataFrame that contains a float inside > a struct, the float is set to null. This only happens when using a list of > dictionaries as the data; if I use a list of Rows, it works fine: > {code:java} > data = [{"MyStruct": {"MyInt": 10, "MyFloat": 10.1}, "MyFloat": 10.1}] > spark.createDataFrame(data).show() > # +---+--+ > # |MyFloat|MyStruct | > # +---+--+ > # |10.1 |{MyInt -> 10, MyFloat -> null}| > # +---+--+ > data = [Row(MyStruct=Row(MyInt=10, MyFloat=10.1), MyFloat=10.1)] > spark.createDataFrame(data).show() > # +---+--+ > # |MyFloat|MyStruct | > # +---+--+ > # |10.1 |{MyInt -> 10, MyFloat -> 10.1}| > # +---+--+ {code} > Note MyFloat inside MyStruct is set to null in the first example. > Interestingly enough, when I do the same with Row, or if I specify the > schema, then this does not happen (second example). -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-38839) Creating a struct with a float inside
[ https://issues.apache.org/jira/browse/SPARK-38839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17520250#comment-17520250 ] Hyukjin Kwon commented on SPARK-38839: -- This is because by default the inner "{"MyInt": 10, "MyFloat": 10.1}" gets inferred as a map: {code} root |-- MyFloat: double (nullable = true) |-- MyStruct: map (nullable = true) ||-- key: string ||-- value: long (valueContainsNull = true) {code} and since 10.1 is not a long, it becomes {{null}}. You can work around this by setting the {{spark.sql.pyspark.inferNestedDictAsStruct.enabled}} configuration to {{true}} from Spark 3.3.0. > Creating a struct with a float inside > -- > > Key: SPARK-38839 > URL: https://issues.apache.org/jira/browse/SPARK-38839 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 3.2.1 >Reporter: Daniel deCordoba >Priority: Minor > > When creating a dataframe using createDataFrame that contains a float inside > a struct, the float is set to null. This only happens when using a list of > dictionaries as the data; if I use a list of Rows, it works fine: > {code:java} > data = [{"MyStruct": {"MyInt": 10, "MyFloat": 10.1}, "MyFloat": 10.1}] > spark.createDataFrame(data).show() > # +---+--+ > # |MyFloat|MyStruct | > # +---+--+ > # |10.1 |{MyInt -> 10, MyFloat -> null}| > # +---+--+ > data = [Row(MyStruct=Row(MyInt=10, MyFloat=10.1), MyFloat=10.1)] > spark.createDataFrame(data).show() > # +---+--+ > # |MyFloat|MyStruct | > # +---+--+ > # |10.1 |{MyInt -> 10, MyFloat -> 10.1}| > # +---+--+ {code} > Note MyFloat inside MyStruct is set to null in the first example. > Interestingly enough, when I do the same with Row, or if I specify the > schema, then this does not happen (second example). -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-38846) Teradata's Number is either converted to its floor value or ceiling value despite its fractional part.
[ https://issues.apache.org/jira/browse/SPARK-38846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17520249#comment-17520249 ] Hyukjin Kwon commented on SPARK-38846: -- Spark 2.x is EOL. Mind trying Spark 3+ out? > Teradata's Number is either converted to its floor value or ceiling value > despite its fractional part. > -- > > Key: SPARK-38846 > URL: https://issues.apache.org/jira/browse/SPARK-38846 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.0 > Environment: Spark2.3.0 on Yarn > Teradata 16.20.32.59 >Reporter: eugene >Priority: Major > > I'm trying to load data from Teradata; the code in use is: > > sparkSession.read > .format("jdbc") > .options( > Map( > "url" -> "jdbc:teradata://hostname, user=$username, > password=$password", > "MAYBENULL" -> "ON", > "SIP_SUPPORT" -> "ON", > "driver" -> "com.teradata.jdbc.TeraDriver", > "dbtable" -> $table_name > ) > ) > .load() > However, some data lost its fractional part after loading. To be more > precise, the column in Teradata is of the type Number, and after > loading, the data type in Spark is `DecimalType(38,0)`; the scale value is 0, > which means no digits after the decimal point. > Data in Teradata is something like, > id column1 column2 > 1 50.23 100.23 > 2 25.8 20.669 > 3 30.2 19.23 > The resulting Spark `dataframe` is like, > id column1 column2 > 1 50 100 > 2 26 21 > 3 30 19 > The metadata of the table in Teradata is like: > CREATE SET TABLE table_name (id BIGINT, column1 NUMBER, column2 NUMBER) > PRIMARY INDEX (id); > The Spark version is 2.3.0 and Teradata is 16.20.32.59. > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-38849) How to do load balancing of spark thrift server?
[ https://issues.apache.org/jira/browse/SPARK-38849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17520248#comment-17520248 ] Hyukjin Kwon commented on SPARK-38849: -- For questions, let's interact with the Spark mailing list before filing an issue here. > How to do load balancing of spark thrift server ? > - > > Key: SPARK-38849 > URL: https://issues.apache.org/jira/browse/SPARK-38849 > Project: Spark > Issue Type: New Feature > Components: Spark Core >Affects Versions: 3.2.0 >Reporter: ramakrishna chilaka >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-38849) How to do load balancing of spark thrift server ?
[ https://issues.apache.org/jira/browse/SPARK-38849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-38849. -- Resolution: Invalid > How to do load balancing of spark thrift server ? > - > > Key: SPARK-38849 > URL: https://issues.apache.org/jira/browse/SPARK-38849 > Project: Spark > Issue Type: New Feature > Components: Spark Core >Affects Versions: 3.2.0 >Reporter: ramakrishna chilaka >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-38849) How to do load balancing of spark thrift server ?
ramakrishna chilaka created SPARK-38849: --- Summary: How to do load balancing of spark thrift server ? Key: SPARK-38849 URL: https://issues.apache.org/jira/browse/SPARK-38849 Project: Spark Issue Type: New Feature Components: Spark Core Affects Versions: 3.2.0 Reporter: ramakrishna chilaka -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-38807) Error when starting spark shell on Windows system
[ https://issues.apache.org/jira/browse/SPARK-38807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17520187#comment-17520187 ] Apache Spark commented on SPARK-38807: -- User '1104056452' has created a pull request for this issue: https://github.com/apache/spark/pull/36134
> Error when starting spark shell on Windows system
> -
>
> Key: SPARK-38807
> URL: https://issues.apache.org/jira/browse/SPARK-38807
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 3.2.1
> Reporter: Ming Li
> Priority: Major
>
> Using the release version of Spark 3.2.1 and the default configuration, starting the Spark shell on a Windows system fails (Spark 3.1.2 doesn't show this issue).
> Here is the stack trace of the exception:
> {code:java}
> 22/04/06 21:47:45 ERROR SparkContext: Error initializing SparkContext.
> java.lang.reflect.InvocationTargetException
>         at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>         at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>         ...
>         at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
>         at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:955)
>         at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
>         at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
>         at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
>         at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1043)
>         at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1052)
>         at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> Caused by: java.net.URISyntaxException: Illegal character in path at index 30: spark://192.168.X.X:56964/F:\classes
>         at java.net.URI$Parser.fail(URI.java:2845)
>         at java.net.URI$Parser.checkChars(URI.java:3018)
>         at java.net.URI$Parser.parseHierarchical(URI.java:3102)
>         at java.net.URI$Parser.parse(URI.java:3050)
>         at java.net.URI.<init>(URI.java:588)
>         at org.apache.spark.repl.ExecutorClassLoader.<init>(ExecutorClassLoader.scala:57)
>         ... 70 more
> 22/04/06 21:47:45 ERROR Utils: Uncaught exception in thread main
> java.lang.NullPointerException
> ... {code}
-- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-38848) Replace all `@Test(expected = XXException)` with assertThrows
[ https://issues.apache.org/jira/browse/SPARK-38848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-38848: Assignee: (was: Apache Spark) > Replace all `@Test(expected = XXException)` with assertThrows > -- > > Key: SPARK-38848 > URL: https://issues.apache.org/jira/browse/SPARK-38848 > Project: Spark > Issue Type: Improvement > Components: Tests >Affects Versions: 3.4.0 >Reporter: Yang Jie >Priority: Minor > > {{@Test}} no longer has an {{expected}} parameter in JUnit 5; use {{assertThrows}} instead -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-38848) Replace all `@Test(expected = XXException)` with assertThrows
[ https://issues.apache.org/jira/browse/SPARK-38848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17520172#comment-17520172 ] Apache Spark commented on SPARK-38848: -- User 'LuciferYang' has created a pull request for this issue: https://github.com/apache/spark/pull/36133 > Replace all `@Test(expected = XXException)` with assertThrows > -- > > Key: SPARK-38848 > URL: https://issues.apache.org/jira/browse/SPARK-38848 > Project: Spark > Issue Type: Improvement > Components: Tests >Affects Versions: 3.4.0 >Reporter: Yang Jie >Priority: Minor > > {{@Test}} no longer has an {{expected}} parameter in JUnit 5; use {{assertThrows}} instead -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-38848) Replace all `@Test(expected = XXException)` with assertThrows
[ https://issues.apache.org/jira/browse/SPARK-38848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-38848: Assignee: Apache Spark > Replace all `@Test(expected = XXException)` with assertThrows > -- > > Key: SPARK-38848 > URL: https://issues.apache.org/jira/browse/SPARK-38848 > Project: Spark > Issue Type: Improvement > Components: Tests >Affects Versions: 3.4.0 >Reporter: Yang Jie >Assignee: Apache Spark >Priority: Minor > > {{@Test}} no longer has an {{expected}} parameter in JUnit 5; use {{assertThrows}} instead -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-38848) Replace all `@Test(expected = XXException)` with assertThrows
[ https://issues.apache.org/jira/browse/SPARK-38848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Jie updated SPARK-38848: - Description: {{@Test}} no longer has an {{expected}} parameter in JUnit 5; use {{assertThrows}} instead (was: {{@Test}} no longer has {{expected }}parameters in Junit 5, use{{ assertThrows}} instead) > Replace all `@Test(expected = XXException)` with assertThrows > -- > > Key: SPARK-38848 > URL: https://issues.apache.org/jira/browse/SPARK-38848 > Project: Spark > Issue Type: Improvement > Components: Tests >Affects Versions: 3.4.0 >Reporter: Yang Jie >Priority: Minor > > {{@Test}} no longer has an {{expected}} parameter in JUnit 5; use {{assertThrows}} instead -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-38848) Replace all `@Test(expected = XXException)` with assertThrows
Yang Jie created SPARK-38848: Summary: Replace all `@Test(expected = XXException)` with assertThrows Key: SPARK-38848 URL: https://issues.apache.org/jira/browse/SPARK-38848 Project: Spark Issue Type: Improvement Components: Tests Affects Versions: 3.4.0 Reporter: Yang Jie {{@Test}} no longer has an {{expected}} parameter in JUnit 5; use {{assertThrows}} instead -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-38847) Introduce a `viewToSeq` function for `KVUtils`
[ https://issues.apache.org/jira/browse/SPARK-38847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-38847: Assignee: (was: Apache Spark) > Introduce a `viewToSeq` function for `KVUtils` > -- > > Key: SPARK-38847 > URL: https://issues.apache.org/jira/browse/SPARK-38847 > Project: Spark > Issue Type: Improvement > Components: Spark Core, SQL >Affects Versions: 3.4.0 >Reporter: Yang Jie >Priority: Minor > > There are many places in Spark that convert a KVStoreView into a `List`, and this code does not close the underlying `KVStoreIterator`; these resources are mainly recycled by the `finalize()` method implemented in `LevelDB` and `RocksDB`, which makes `KVStoreIterator` resource recycling unpredictable. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-38847) Introduce a `viewToSeq` function for `KVUtils`
[ https://issues.apache.org/jira/browse/SPARK-38847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-38847: Assignee: Apache Spark > Introduce a `viewToSeq` function for `KVUtils` > -- > > Key: SPARK-38847 > URL: https://issues.apache.org/jira/browse/SPARK-38847 > Project: Spark > Issue Type: Improvement > Components: Spark Core, SQL >Affects Versions: 3.4.0 >Reporter: Yang Jie >Assignee: Apache Spark >Priority: Minor > > There are many places in Spark that convert a KVStoreView into a `List`, and this code does not close the underlying `KVStoreIterator`; these resources are mainly recycled by the `finalize()` method implemented in `LevelDB` and `RocksDB`, which makes `KVStoreIterator` resource recycling unpredictable. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-38847) Introduce a `viewToSeq` function for `KVUtils`
[ https://issues.apache.org/jira/browse/SPARK-38847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17520171#comment-17520171 ] Apache Spark commented on SPARK-38847: -- User 'LuciferYang' has created a pull request for this issue: https://github.com/apache/spark/pull/36132 > Introduce a `viewToSeq` function for `KVUtils` > -- > > Key: SPARK-38847 > URL: https://issues.apache.org/jira/browse/SPARK-38847 > Project: Spark > Issue Type: Improvement > Components: Spark Core, SQL >Affects Versions: 3.4.0 >Reporter: Yang Jie >Priority: Minor > > There are many places in Spark that convert a KVStoreView into a `List`, and this code does not close the underlying `KVStoreIterator`; these resources are mainly recycled by the `finalize()` method implemented in `LevelDB` and `RocksDB`, which makes `KVStoreIterator` resource recycling unpredictable. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-38847) Introduce a `viewToSeq` function for `KVUtils`
Yang Jie created SPARK-38847: Summary: Introduce a `viewToSeq` function for `KVUtils` Key: SPARK-38847 URL: https://issues.apache.org/jira/browse/SPARK-38847 Project: Spark Issue Type: Improvement Components: Spark Core, SQL Affects Versions: 3.4.0 Reporter: Yang Jie There are many places in Spark that convert a KVStoreView into a `List`, and this code does not close the underlying `KVStoreIterator`; these resources are mainly recycled by the `finalize()` method implemented in `LevelDB` and `RocksDB`, which makes `KVStoreIterator` resource recycling unpredictable. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-38846) Teradata's Number is either converted to its floor value or ceiling value despite its fractional part.
eugene created SPARK-38846: -- Summary: Teradata's Number is either converted to its floor value or ceiling value despite its fractional part. Key: SPARK-38846 URL: https://issues.apache.org/jira/browse/SPARK-38846 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 2.3.0 Environment: Spark 2.3.0 on Yarn, Teradata 16.20.32.59 Reporter: eugene
I'm trying to load data from Teradata; the code used is:
sparkSession.read
  .format("jdbc")
  .options(
    Map(
      "url" -> "jdbc:teradata://hostname, user=$username, password=$password",
      "MAYBENULL" -> "ON",
      "SIP_SUPPORT" -> "ON",
      "driver" -> "com.teradata.jdbc.TeraDriver",
      "dbtable" -> $table_name
    )
  )
  .load()
However, some data lost its fractional part after loading. To be more precise, the column in Teradata has the type Number, and after loading, the data type in Spark is `DecimalType(38,0)`; the scale is 0, which means no digits after the decimal point.
Data in Teradata is something like:
id  column1  column2
1   50.23    100.23
2   25.8     20.669
3   30.2     19.23
The Spark `dataframe` is like:
id  column1  column2
1   50       100
2   26       21
3   30       19
The metadata of the table in Teradata is:
CREATE SET TABLE table_name (id BIGINT, column1 NUMBER, column2 NUMBER) PRIMARY INDEX (id);
The Spark version is 2.3.0 and Teradata is 16.20.32.59.
-- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-38565) Support Left Semi join in row level runtime filters
[ https://issues.apache.org/jira/browse/SPARK-38565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-38565: Assignee: (was: Apache Spark) > Support Left Semi join in row level runtime filters > --- > > Key: SPARK-38565 > URL: https://issues.apache.org/jira/browse/SPARK-38565 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0 >Reporter: Abhishek Somani >Priority: Major > > Support Left Semi join in the runtime filtering as well. > This is a follow up to https://issues.apache.org/jira/browse/SPARK-32268 once > [https://github.com/apache/spark/pull/35789] is merged. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-38565) Support Left Semi join in row level runtime filters
[ https://issues.apache.org/jira/browse/SPARK-38565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17520167#comment-17520167 ] Apache Spark commented on SPARK-38565: -- User 'wangyum' has created a pull request for this issue: https://github.com/apache/spark/pull/36131 > Support Left Semi join in row level runtime filters > --- > > Key: SPARK-38565 > URL: https://issues.apache.org/jira/browse/SPARK-38565 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0 >Reporter: Abhishek Somani >Priority: Major > > Support Left Semi join in the runtime filtering as well. > This is a follow up to https://issues.apache.org/jira/browse/SPARK-32268 once > [https://github.com/apache/spark/pull/35789] is merged. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-38565) Support Left Semi join in row level runtime filters
[ https://issues.apache.org/jira/browse/SPARK-38565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-38565: Assignee: Apache Spark > Support Left Semi join in row level runtime filters > --- > > Key: SPARK-38565 > URL: https://issues.apache.org/jira/browse/SPARK-38565 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0 >Reporter: Abhishek Somani >Assignee: Apache Spark >Priority: Major > > Support Left Semi join in the runtime filtering as well. > This is a follow up to https://issues.apache.org/jira/browse/SPARK-32268 once > [https://github.com/apache/spark/pull/35789] is merged. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
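[Editorial note] For context on what this sub-task extends: SPARK-32268 added row-level runtime (bloom) filtering behind a SQL configuration in Spark 3.3. A hedged sketch of how it would be exercised once LEFT SEMI joins are supported; the config name comes from that work, and the {{fact}} and {{dim}} tables are illustrative:
{code:python}
# Enable row-level runtime (bloom) filtering, off by default in Spark 3.3.
spark.conf.set("spark.sql.optimizer.runtime.bloomFilter.enabled", "true")

# With the flag on, a selective predicate on the dim side can be turned into
# a bloom filter applied to the fact side; this issue extends that support
# to LEFT SEMI joins.
result = spark.sql("""
    SELECT f.*
    FROM fact f
    LEFT SEMI JOIN dim d
      ON f.key = d.key AND d.category = 'selective'
""")
{code}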
[jira] [Commented] (SPARK-3723) DecisionTree, RandomForest: Add more instrumentation
[ https://issues.apache.org/jira/browse/SPARK-3723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17520162#comment-17520162 ] Apache Spark commented on SPARK-3723: - User 'zero323' has created a pull request for this issue: https://github.com/apache/spark/pull/36130
> DecisionTree, RandomForest: Add more instrumentation
>
>
> Key: SPARK-3723
> URL: https://issues.apache.org/jira/browse/SPARK-3723
> Project: Spark
> Issue Type: Improvement
> Components: ML
> Reporter: Joseph K. Bradley
> Priority: Minor
> Labels: bulk-closed
>
> Some simple instrumentation would help advanced users understand performance and check whether parameters (such as maxMemoryInMB) need to be tuned.
> Most important instrumentation (simple):
> * min, avg, max nodes per group
> * number of groups (passes over data)
> More advanced instrumentation:
> * For each tree (or averaged over trees), training set accuracy after training each level. This would be useful for visualizing learning behavior (to convince oneself that model selection was being done correctly).
-- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-37234) Inline type hints for python/pyspark/mllib/stat/_statistics.py
[ https://issues.apache.org/jira/browse/SPARK-37234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maciej Szymkiewicz resolved SPARK-37234. Fix Version/s: 3.3.0 Resolution: Fixed Issue resolved by pull request 34513 [https://github.com/apache/spark/pull/34513] > Inline type hints for python/pyspark/mllib/stat/_statistics.py > -- > > Key: SPARK-37234 > URL: https://issues.apache.org/jira/browse/SPARK-37234 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.3.0 >Reporter: dch nguyen >Assignee: dch nguyen >Priority: Major > Fix For: 3.3.0 > > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-37234) Inline type hints for python/pyspark/mllib/stat/_statistics.py
[ https://issues.apache.org/jira/browse/SPARK-37234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maciej Szymkiewicz reassigned SPARK-37234: -- Assignee: dch nguyen > Inline type hints for python/pyspark/mllib/stat/_statistics.py > -- > > Key: SPARK-37234 > URL: https://issues.apache.org/jira/browse/SPARK-37234 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.3.0 >Reporter: dch nguyen >Assignee: dch nguyen >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-38108) Use error classes in the compilation errors of UDF/UDAF
[ https://issues.apache.org/jira/browse/SPARK-38108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Gekk reassigned SPARK-38108: Assignee: huangtengfei
> Use error classes in the compilation errors of UDF/UDAF
> ---
>
> Key: SPARK-38108
> URL: https://issues.apache.org/jira/browse/SPARK-38108
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 3.3.0
> Reporter: Max Gekk
> Assignee: huangtengfei
> Priority: Major
>
> Migrate the following errors in QueryCompilationErrors:
> * noHandlerForUDAFError
> * unexpectedEvalTypesForUDFsError
> * usingUntypedScalaUDFError
> * udfClassDoesNotImplementAnyUDFInterfaceError
> * udfClassNotAllowedToImplementMultiUDFInterfacesError
> * udfClassWithTooManyTypeArgumentsError
> to use error classes. Throw an implementation of SparkThrowable. Also write a test for every error in QueryCompilationErrorsSuite.
-- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-38108) Use error classes in the compilation errors of UDF/UDAF
[ https://issues.apache.org/jira/browse/SPARK-38108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Gekk resolved SPARK-38108. -- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 36064 [https://github.com/apache/spark/pull/36064]
> Use error classes in the compilation errors of UDF/UDAF
> ---
>
> Key: SPARK-38108
> URL: https://issues.apache.org/jira/browse/SPARK-38108
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 3.3.0
> Reporter: Max Gekk
> Assignee: huangtengfei
> Priority: Major
> Fix For: 3.4.0
>
> Migrate the following errors in QueryCompilationErrors:
> * noHandlerForUDAFError
> * unexpectedEvalTypesForUDFsError
> * usingUntypedScalaUDFError
> * udfClassDoesNotImplementAnyUDFInterfaceError
> * udfClassNotAllowedToImplementMultiUDFInterfacesError
> * udfClassWithTooManyTypeArgumentsError
> to use error classes. Throw an implementation of SparkThrowable. Also write a test for every error in QueryCompilationErrorsSuite.
-- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-38845) SparkContext init before SparkSession will cause hive table not found
[ https://issues.apache.org/jira/browse/SPARK-38845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-38845: Assignee: (was: Apache Spark)
> SparkContext init before SparkSession will cause hive table not found
> -
>
> Key: SPARK-38845
> URL: https://issues.apache.org/jira/browse/SPARK-38845
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.1.2, 3.2.1
> Reporter: wangshengjie
> Priority: Major
>
> If we init SparkContext before SparkSession when using a DataFrame to query a Hive table, it will throw a table-not-found exception.
> {code:scala}
> // placeholder code
> val sparkContext = new SparkContext()
> val sparkSession = SparkSession
>   .builder
>   .appName("SparkSession Test")
>   .enableHiveSupport()
>   .getOrCreate()
> sparkSession.sql("select * from tableA"){code}
-- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-38845) SparkContext init before SparkSession will cause hive table not found
[ https://issues.apache.org/jira/browse/SPARK-38845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-38845: Assignee: Apache Spark
> SparkContext init before SparkSession will cause hive table not found
> -
>
> Key: SPARK-38845
> URL: https://issues.apache.org/jira/browse/SPARK-38845
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.1.2, 3.2.1
> Reporter: wangshengjie
> Assignee: Apache Spark
> Priority: Major
>
> If we init SparkContext before SparkSession when using a DataFrame to query a Hive table, it will throw a table-not-found exception.
> {code:scala}
> // placeholder code
> val sparkContext = new SparkContext()
> val sparkSession = SparkSession
>   .builder
>   .appName("SparkSession Test")
>   .enableHiveSupport()
>   .getOrCreate()
> sparkSession.sql("select * from tableA"){code}
-- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-38845) SparkContext init before SparkSession will cause hive table not found
[ https://issues.apache.org/jira/browse/SPARK-38845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17520121#comment-17520121 ] Apache Spark commented on SPARK-38845: -- User 'wangshengjie123' has created a pull request for this issue: https://github.com/apache/spark/pull/36129
> SparkContext init before SparkSession will cause hive table not found
> -
>
> Key: SPARK-38845
> URL: https://issues.apache.org/jira/browse/SPARK-38845
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.1.2, 3.2.1
> Reporter: wangshengjie
> Priority: Major
>
> If we init SparkContext before SparkSession when using a DataFrame to query a Hive table, it will throw a table-not-found exception.
> {code:scala}
> // placeholder code
> val sparkContext = new SparkContext()
> val sparkSession = SparkSession
>   .builder
>   .appName("SparkSession Test")
>   .enableHiveSupport()
>   .getOrCreate()
> sparkSession.sql("select * from tableA"){code}
-- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-38845) SparkContext init before SparkSession will cause hive table not found
[ https://issues.apache.org/jira/browse/SPARK-38845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] wangshengjie updated SPARK-38845: - Environment: (was: If we init SparkContext before SparkSession when using a DataFrame to query a Hive table, it will throw a table-not-found exception. {code:scala} // placeholder code val sparkContext = new SparkContext() val sparkSession = SparkSession .builder .appName("SparkSession Test") .enableHiveSupport() .getOrCreate() sparkSession.sql("select * from tableA"){code}) > SparkContext init before SparkSession will cause hive table not found > - > > Key: SPARK-38845 > URL: https://issues.apache.org/jira/browse/SPARK-38845 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.2, 3.2.1 >Reporter: wangshengjie >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-38845) SparkContext init before SparkSession will cause hive table not found
wangshengjie created SPARK-38845: Summary: SparkContext init before SparkSession will cause hive table not found Key: SPARK-38845 URL: https://issues.apache.org/jira/browse/SPARK-38845 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.2.1, 3.1.2 Environment: If we init SparkContext before SparkSession when using a DataFrame to query a Hive table, it will throw a table-not-found exception.
{code:scala}
// placeholder code
val sparkContext = new SparkContext()
val sparkSession = SparkSession
  .builder
  .appName("SparkSession Test")
  .enableHiveSupport()
  .getOrCreate()
sparkSession.sql("select * from tableA"){code}
Reporter: wangshengjie
-- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-38845) SparkContext init before SparkSession will cause hive table not found
[ https://issues.apache.org/jira/browse/SPARK-38845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] wangshengjie updated SPARK-38845: - Description: If we init SparkContext before SparkSession when using a DataFrame to query a Hive table, it will throw a table-not-found exception.
{code:scala}
// placeholder code
val sparkContext = new SparkContext()
val sparkSession = SparkSession
  .builder
  .appName("SparkSession Test")
  .enableHiveSupport()
  .getOrCreate()
sparkSession.sql("select * from tableA"){code}
> SparkContext init before SparkSession will cause hive table not found
> -
>
> Key: SPARK-38845
> URL: https://issues.apache.org/jira/browse/SPARK-38845
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.1.2, 3.2.1
> Reporter: wangshengjie
> Priority: Major
>
> If we init SparkContext before SparkSession when using a DataFrame to query a Hive table, it will throw a table-not-found exception.
> {code:scala}
> // placeholder code
> val sparkContext = new SparkContext()
> val sparkSession = SparkSession
>   .builder
>   .appName("SparkSession Test")
>   .enableHiveSupport()
>   .getOrCreate()
> sparkSession.sql("select * from tableA"){code}
-- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
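[Editorial note] A minimal PySpark sketch of the usual workaround (the same ordering applies in Scala): let the SparkSession create the SparkContext, since the reported failure suggests the static {{spark.sql.catalogImplementation}} setting from {{enableHiveSupport()}} does not take effect on a pre-existing SparkContext:
{code:python}
from pyspark.sql import SparkSession

# Build the Hive-enabled session first; it creates the SparkContext itself.
spark = (SparkSession.builder
         .appName("SparkSession Test")
         .enableHiveSupport()
         .getOrCreate())

sc = spark.sparkContext  # reuse the session's context where one is needed
spark.sql("select * from tableA")  # tableA is the issue's illustrative table
{code}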
[jira] [Commented] (SPARK-19335) Spark should support doing an efficient DataFrame Upsert via JDBC
[ https://issues.apache.org/jira/browse/SPARK-19335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17520107#comment-17520107 ] Chandramouli Viswanathan commented on SPARK-19335: -- Hi, is the issue resolved? If yes, how can I get a sample implementation guide to move forward? Thanks in advance. > Spark should support doing an efficient DataFrame Upsert via JDBC > - > > Key: SPARK-19335 > URL: https://issues.apache.org/jira/browse/SPARK-19335 > Project: Spark > Issue Type: Improvement >Reporter: Ilya Ganelin >Priority: Minor > > Doing a database update, as opposed to an insert, is useful, particularly when working with streaming applications which may require revisions to previously stored data. > Spark DataFrames/DataSets do not currently support an Update feature via the JDBC Writer, allowing only Overwrite or Append. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
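[Editorial note] The issue remains open, so there is no built-in upsert mode; the common workaround is to push the upsert to the database per partition. A sketch assuming a PostgreSQL sink with psycopg2 installed on the executors (neither is implied by the issue, and the table and column names are illustrative):
{code:python}
# NOT a Spark API: upsert each partition through the database's native
# ON CONFLICT syntax instead of df.write.jdbc(), which only appends/overwrites.
def upsert_partition(rows):
    import psycopg2  # assumed available on every executor
    conn = psycopg2.connect(host="dbhost", dbname="mydb",
                            user="user", password="secret")
    with conn, conn.cursor() as cur:  # the connection commits on success
        for row in rows:
            cur.execute(
                "INSERT INTO target (id, val) VALUES (%s, %s) "
                "ON CONFLICT (id) DO UPDATE SET val = EXCLUDED.val",
                (row["id"], row["val"]),
            )
    conn.close()

df.foreachPartition(upsert_partition)  # df is assumed to have columns id, val
{code}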