[jira] [Assigned] (SPARK-46090) Support plan fragment level SQL configs in AQE
[ https://issues.apache.org/jira/browse/SPARK-46090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] XiDuo You reassigned SPARK-46090: - Assignee: XiDuo You > Support plan fragment level SQL configs in AQE > --- > > Key: SPARK-46090 > URL: https://issues.apache.org/jira/browse/SPARK-46090 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: XiDuo You >Assignee: XiDuo You >Priority: Major > Labels: pull-request-available > > AQE executes query plan stage by stage, so there is a chance to support plan > fragment level SQL configs. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-46090) Support plan fragment level SQL configs in AQE
[ https://issues.apache.org/jira/browse/SPARK-46090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] XiDuo You resolved SPARK-46090. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 44013 [https://github.com/apache/spark/pull/44013] > Support plan fragment level SQL configs in AQE > --- > > Key: SPARK-46090 > URL: https://issues.apache.org/jira/browse/SPARK-46090 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: XiDuo You >Assignee: XiDuo You >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > AQE executes query plan stage by stage, so there is a chance to support plan > fragment level SQL configs. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
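For context, AQE re-optimizes the remaining plan between query stages, which is what makes plan-fragment-level configs feasible. A minimal spark-shell sketch of today's query-level behavior, using real configs (the per-fragment scoping added by this ticket is internal to AQE, so no new user-facing API is shown here):

{code:scala}
// Today these configs apply to the whole query; SPARK-46090 is about
// letting AQE honor SQL configs at plan-fragment (query stage) granularity.
spark.conf.set("spark.sql.adaptive.enabled", "true")
spark.conf.set("spark.sql.shuffle.partitions", "200")

// AQE executes this join stage by stage, re-planning after each shuffle
// stage materializes, so each fragment could in principle see its own configs.
val joined = spark.range(0, 1000000).join(spark.range(0, 1000), "id")
joined.count()
{code}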
[jira] [Resolved] (SPARK-48406) Upgrade commons-cli to 1.8.0
[ https://issues.apache.org/jira/browse/SPARK-48406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao resolved SPARK-48406. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46727 [https://github.com/apache/spark/pull/46727] > Upgrade commons-cli to 1.8.0 > > > Key: SPARK-48406 > URL: https://issues.apache.org/jira/browse/SPARK-48406 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 4.0.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > * [https://commons.apache.org/proper/commons-cli/changes-report.html#a1.7.0] > * > [https://commons.apache.org/proper/commons-cli/changes-report.html#a1.8.0] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-48406) Upgrade commons-cli to 1.8.0
[ https://issues.apache.org/jira/browse/SPARK-48406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao reassigned SPARK-48406: Assignee: Yang Jie > Upgrade commons-cli to 1.8.0 > > > Key: SPARK-48406 > URL: https://issues.apache.org/jira/browse/SPARK-48406 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 4.0.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Major > Labels: pull-request-available > > * [https://commons.apache.org/proper/commons-cli/changes-report.html#a1.7.0] > * > [https://commons.apache.org/proper/commons-cli/changes-report.html#a1.8.0] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48408) Simplify `date_format` & `from_unixtime`
[ https://issues.apache.org/jira/browse/SPARK-48408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-48408: --- Labels: pull-request-available (was: ) > Simplify `date_format` & `from_unixtime` > > > Key: SPARK-48408 > URL: https://issues.apache.org/jira/browse/SPARK-48408 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Priority: Minor > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-48408) Simplify `date_format` & `from_unixtime`
BingKun Pan created SPARK-48408: --- Summary: Simplify `date_format` & `from_unixtime` Key: SPARK-48408 URL: https://issues.apache.org/jira/browse/SPARK-48408 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 4.0.0 Reporter: BingKun Pan -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48405) Upgrade `commons-compress` to 1.26.2
[ https://issues.apache.org/jira/browse/SPARK-48405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao resolved SPARK-48405. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46725 [https://github.com/apache/spark/pull/46725] > Upgrade `commons-compress` to 1.26.2 > > > Key: SPARK-48405 > URL: https://issues.apache.org/jira/browse/SPARK-48405 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Assignee: BingKun Pan >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-48405) Upgrade `commons-compress` to 1.26.2
[ https://issues.apache.org/jira/browse/SPARK-48405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao reassigned SPARK-48405: Assignee: BingKun Pan > Upgrade `commons-compress` to 1.26.2 > > > Key: SPARK-48405 > URL: https://issues.apache.org/jira/browse/SPARK-48405 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Assignee: BingKun Pan >Priority: Minor > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48407) Teradata: Document Type Conversion rules between Spark SQL and Teradata
[ https://issues.apache.org/jira/browse/SPARK-48407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-48407: --- Labels: pull-request-available (was: ) > Teradata: Document Type Conversion rules between Spark SQL and Teradata > --- > > Key: SPARK-48407 > URL: https://issues.apache.org/jira/browse/SPARK-48407 > Project: Spark > Issue Type: Sub-task > Components: Documentation, SQL >Affects Versions: 4.0.0 >Reporter: Kent Yao >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-48407) Teradata: Document Type Conversion rules between Spark SQL and Teradata
Kent Yao created SPARK-48407: Summary: Teradata: Document Type Conversion rules between Spark SQL and Teradata Key: SPARK-48407 URL: https://issues.apache.org/jira/browse/SPARK-48407 Project: Spark Issue Type: Sub-task Components: Documentation, SQL Affects Versions: 4.0.0 Reporter: Kent Yao -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48406) Upgrade commons-cli to 1.8.0
[ https://issues.apache.org/jira/browse/SPARK-48406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-48406: --- Labels: pull-request-available (was: ) > Upgrade commons-cli to 1.8.0 > > > Key: SPARK-48406 > URL: https://issues.apache.org/jira/browse/SPARK-48406 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 4.0.0 >Reporter: Yang Jie >Priority: Major > Labels: pull-request-available > > * [https://commons.apache.org/proper/commons-cli/changes-report.html#a1.7.0] > * > [https://commons.apache.org/proper/commons-cli/changes-report.html#a1.8.0] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48283) Implement modified Lowercase operation for UTF8_BINARY_LCASE
[ https://issues.apache.org/jira/browse/SPARK-48283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-48283: --- Labels: pull-request-available (was: ) > Implement modified Lowercase operation for UTF8_BINARY_LCASE > > > Key: SPARK-48283 > URL: https://issues.apache.org/jira/browse/SPARK-48283 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-48406) Upgrade commons-cli to 1.8.0
Yang Jie created SPARK-48406: Summary: Upgrade commons-cli to 1.8.0 Key: SPARK-48406 URL: https://issues.apache.org/jira/browse/SPARK-48406 Project: Spark Issue Type: Improvement Components: Build Affects Versions: 4.0.0 Reporter: Yang Jie * [https://commons.apache.org/proper/commons-cli/changes-report.html#a1.7.0] * [https://commons.apache.org/proper/commons-cli/changes-report.html#a1.8.0] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48399) Teradata: ByteType wrongly mapped to Teradata BYTE (binary) type
[ https://issues.apache.org/jira/browse/SPARK-48399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao resolved SPARK-48399. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46715 [https://github.com/apache/spark/pull/46715] > Teradata: ByteType wrongly mapped to Teradata BYTE (binary) type > > > Key: SPARK-48399 > URL: https://issues.apache.org/jira/browse/SPARK-48399 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Kent Yao >Assignee: Kent Yao >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
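For illustration, the change is the kind of type mapping a JdbcDialect provides. A minimal sketch, assuming BYTEINT is the intended numeric target (the authoritative change is in pull request 46715; the helper name below is illustrative, not the actual patch):

{code:scala}
import java.sql.Types
import org.apache.spark.sql.jdbc.JdbcType
import org.apache.spark.sql.types._

// Teradata's BYTE is a binary type, so mapping Spark's ByteType (a signed
// 1-byte integer) to it loses the numeric semantics; BYTEINT preserves them.
def teradataTypeFor(dt: DataType): Option[JdbcType] = dt match {
  case ByteType => Some(JdbcType("BYTEINT", Types.TINYINT))
  case _        => None // defer to the default JDBC mappings
}
{code}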
[jira] [Updated] (SPARK-48405) Upgrade `commons-compress` to 1.26.2
[ https://issues.apache.org/jira/browse/SPARK-48405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-48405: --- Labels: pull-request-available (was: ) > Upgrade `commons-compress` to 1.26.2 > > > Key: SPARK-48405 > URL: https://issues.apache.org/jira/browse/SPARK-48405 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Priority: Minor > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-48405) Upgrade `commons-compress` to 1.26.2
BingKun Pan created SPARK-48405: --- Summary: Upgrade `commons-compress` to 1.26.2 Key: SPARK-48405 URL: https://issues.apache.org/jira/browse/SPARK-48405 Project: Spark Issue Type: Improvement Components: Build Affects Versions: 4.0.0 Reporter: BingKun Pan -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-48404) Driver and Executor support merging and running in a single JVM
melin created SPARK-48404: - Summary: Driver and Executor support merging and running in a single JVM Key: SPARK-48404 URL: https://issues.apache.org/jira/browse/SPARK-48404 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 4.0.0 Reporter: melin Spark is used in data integration scenarios (such as reading data from MySQL and writing it to other data sources), and in many cases such tasks can run with a single unit of concurrency. The Driver and Executor consume resources separately. If the Driver and Executor could be merged into a single JVM, compute costs could be saved, especially when running in the cloud. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
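For comparison, the closest existing behavior is local mode, where the driver and executor already share one JVM; a minimal sketch (the proposal above is about getting a similar cost profile for cluster-submitted, low-concurrency jobs):

{code:scala}
import org.apache.spark.sql.SparkSession

// local[1] runs the driver and a single-threaded executor in one JVM,
// which fits single-concurrency data-integration tasks.
val spark = SparkSession.builder()
  .master("local[1]")
  .appName("single-jvm-integration-job")
  .getOrCreate()
{code}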
[jira] [Commented] (SPARK-33164) SPIP: add SQL support to "SELECT * (EXCEPT someColumn) FROM .." equivalent to DataSet.dropColumn(someColumn)
[ https://issues.apache.org/jira/browse/SPARK-33164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17849126#comment-17849126 ] Jonathan Boarman commented on SPARK-33164: -- There are significant benefits to the *{{EXCEPT}}* feature offered by most large data platforms, including Databricks, Snowflake, BigQuery, DuckDB, etc. The list of vendors that support *{{EXCEPT}}* (increasingly called *{{EXCLUDE}}* to avoid conflicts) is long and growing. As such, migrating projects from those platforms to a pure Spark SQL environment is extremely costly. Further, the "risks" associated with *{{SELECT *}}* do not apply to all scenarios – very importantly, with CTEs these risks are not applicable, since the constraints on column selection are generally made in the first CTE. For example, any subsequent CTE in a chain of CTEs inherits the field selection of the first CTE. On platforms that lack this feature, we face a different risk, caused by heavy duplication, if we are forced to enumerate fields in each and every CTE. This is particularly problematic when joining two CTEs that share a field, such as an *{{ID}}* column. In that situation, the most efficient and risk-free approach is to *{{SELECT * EXCEPT(right.id)}}* from the join of its two dependent CTEs. Any perceived judgment aside, this is a highly relied-upon feature in enterprise environments that depend on such quality-of-life innovations. Clearly these improvements are providing value in those environments, and Spark SQL should be no different in supporting users who have come to rely on them. > SPIP: add SQL support to "SELECT * (EXCEPT someColumn) FROM .." equivalent to > DataSet.dropColumn(someColumn) > > > Key: SPARK-33164 > URL: https://issues.apache.org/jira/browse/SPARK-33164 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.4.5, 2.4.6, 2.4.7, 3.0.0, 3.0.1 >Reporter: Arnaud Nauwynck >Priority: Minor > Original Estimate: 120h > Remaining Estimate: 120h > > *Q1.* What are you trying to do? Articulate your objectives using absolutely > no jargon. > I would like to have the extended SQL syntax "SELECT * EXCEPT someColumn FROM > .." > to be able to select all columns except some in a SELECT clause. > It would be similar to the SQL syntax of some databases, like Google BigQuery > or PostgreSQL. > https://cloud.google.com/bigquery/docs/reference/standard-sql/query-syntax > Google the question "select * EXCEPT one column" and you will see that many > developers have the same problem. > example posts: > https://blog.jooq.org/2018/05/14/selecting-all-columns-except-one-in-postgresql/ > https://www.thetopsites.net/article/53001825.shtml > There are several typical examples where it is very helpful: > use-case 1: > you add a "count ( * ) countCol" column, and then filter on it using for > example "having countCol = 1" > ... and then you want to select all columns EXCEPT this dummy column, which > always is "1" > {noformat} > select * (EXCEPT countCol) > from ( > select count(*) countCol, * >from MyTable >where ... >group by ... having countCol = 1 > ) > {noformat} > > use-case 2: > same with the analytical function "partition over(...) rankCol ... where > rankCol=1" > For example, to get the latest row before a given time, in a time series > table.
> This is "Time-Travel" queries addressed by framework like "DeltaLake" > {noformat} > CREATE table t_updates (update_time timestamp, id string, col1 type1, col2 > type2, ... col42) > pastTime=.. > SELECT * (except rankCol) > FROM ( >SELECT *, > RANK() OVER (PARTITION BY id ORDER BY update_time) rankCol >FROM t_updates >where update_time < pastTime > ) WHERE rankCol = 1 > > {noformat} > > use-case 3: > copy some data from table "t" to corresponding table "t_snapshot", and back > to "t" > {noformat} >CREATE TABLE t (col1 type1, col2 type2, col3 type3, ... col42 type42) ... > >/* create corresponding table: (snap_id string, col1 type1, col2 type2, > col3 type3, ... col42 type42) */ >CREATE TABLE t_snapshot >AS SELECT '' as snap_id, * FROM t WHERE 1=2 >/* insert data from t to some snapshot */ >INSERT INTO t_snapshot >SELECT 'snap1' as snap_id, * from t > >/* select some data from snapshot table (without snap_id column) .. */ >SELECT * (EXCEPT snap_id) FROM t_snapshot where snap_id='snap1' > > {noformat} > > > *Q2.* What problem is this proposal NOT designed to solve? > It is only a SQL syntaxic sugar. > It does not change SQL execution plan or an
[jira] [Commented] (SPARK-48361) Correctness: CSV corrupt record filter with aggregate ignored
[ https://issues.apache.org/jira/browse/SPARK-48361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17849095#comment-17849095 ] Bruce Robbins commented on SPARK-48361: --- I can take a look at the root cause, unless you are already looking at that, in which case I will hold off. > Correctness: CSV corrupt record filter with aggregate ignored > - > > Key: SPARK-48361 > URL: https://issues.apache.org/jira/browse/SPARK-48361 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.5.1 > Environment: Using spark-shell 3.5.1 on an M1 Mac >Reporter: Ted Chester Jenks >Priority: Major > > Using the corrupt record column in CSV parsing for some data cleaning logic, I came > across a correctness bug. > > The following repro can be run with spark-shell 3.5.1. > *Create test.csv with the following content:* > {code:java} > test,1,2,three > four,5,6,seven > 8,9 > ten,11,12,thirteen {code} > > > *In spark-shell:* > {code:java} > import org.apache.spark.sql.types._ > import org.apache.spark.sql.functions._ > > // define a STRING, DOUBLE, DOUBLE, STRING schema for the data > val schema = StructType(List(StructField("column1", StringType, true), > StructField("column2", DoubleType, true), StructField("column3", DoubleType, > true), StructField("column4", StringType, true))) > > // add a column for corrupt records to the schema > val schemaWithCorrupt = StructType(schema.fields :+ > StructField("_corrupt_record", StringType, true)) > > // read the CSV with the schema, headers, permissive parsing, and the corrupt > record column > val df = spark.read.option("header", "true").option("mode", > "PERMISSIVE").option("columnNameOfCorruptRecord", > "_corrupt_record").schema(schemaWithCorrupt).csv("test.csv") > > // define a UDF to count the commas in the corrupt record column > val countCommas = udf((s: String) => if (s != null) s.count(_ == ',') else > -1) > > // add a true/false column for whether the number of commas is 3 > val dfWithJagged = df.withColumn("__is_jagged", > when(col("_corrupt_record").isNull, > false).otherwise(countCommas(col("_corrupt_record")) =!= 3)) > dfWithJagged.show(){code} > *Returns:* > {code:java} > +---+---+---++---+---+ > |column1|column2|column3| column4|_corrupt_record|__is_jagged| > +---+---+---++---+---+ > | four| 5.0| 6.0| seven| NULL| false| > | 8| 9.0| NULL| NULL| 8,9| true| > | ten| 11.0| 12.0|thirteen| NULL| false| > +---+---+---++---+---+ {code} > So far so good...
> > *BUT* > > *If we add an aggregate before we show:* > {code:java} > import org.apache.spark.sql.types._ > import org.apache.spark.sql.functions._ > val schema = StructType(List(StructField("column1", StringType, true), > StructField("column2", DoubleType, true), StructField("column3", DoubleType, > true), StructField("column4", StringType, true))) > val schemaWithCorrupt = StructType(schema.fields :+ > StructField("_corrupt_record", StringType, true)) > val df = spark.read.option("header", "true").option("mode", > "PERMISSIVE").option("columnNameOfCorruptRecord", > "_corrupt_record").schema(schemaWithCorrupt).csv("test.csv") > val countCommas = udf((s: String) => if (s != null) s.count(_ == ',') else > -1) > val dfWithJagged = df.withColumn("__is_jagged", > when(col("_corrupt_record").isNull, > false).otherwise(countCommas(col("_corrupt_record")) =!= 3)) > val dfDropped = dfWithJagged.filter(col("__is_jagged") =!= true) > val groupedSum = > dfDropped.groupBy("column1").agg(sum("column2").alias("sum_column2")) > groupedSum.show(){code} > *We get:* > {code:java} > +---+---+ > |column1|sum_column2| > +---+---+ > | 8| 9.0| > | four| 5.0| > | ten| 11.0| > +---+---+ {code} > > *Which is not correct* > > With the addition of the aggregate, the filter down to rows with 3 commas in > the corrupt record column is ignored. This does not happen with any other > operators I have tried - just aggregates so far. > > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48318) Hash join support for strings with collation (complex types)
[ https://issues.apache.org/jira/browse/SPARK-48318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-48318: --- Labels: pull-request-available (was: ) > Hash join support for strings with collation (complex types) > > > Key: SPARK-48318 > URL: https://issues.apache.org/jira/browse/SPARK-48318 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-41049) Nondeterministic expressions have unstable values if they are children of CodegenFallback expressions
[ https://issues.apache.org/jira/browse/SPARK-41049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-41049: --- Labels: correctness pull-request-available (was: correctness) > Nondeterministic expressions have unstable values if they are children of > CodegenFallback expressions > - > > Key: SPARK-41049 > URL: https://issues.apache.org/jira/browse/SPARK-41049 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.2 >Reporter: Guy Boo >Assignee: Wenchen Fan >Priority: Major > Labels: correctness, pull-request-available > Fix For: 3.4.0 > > > h2. Expectation > For a given row, nondeterministic expressions are expected to have stable > values. > {code:scala} > import org.apache.spark.sql.functions._ > val df = sparkContext.parallelize(1 to 5).toDF("x") > val v1 = rand().*(lit(1)).cast(IntegerType) > df.select(v1, v1).collect{code} > Returns a set like this: > |8777|8777| > |1357|1357| > |3435|3435| > |9204|9204| > |3870|3870| > where both columns always have the same value, but what that value is changes > from row to row. This is different from the following: > {code:scala} > df.select(rand(), rand()).collect{code} > In this case, because the rand() calls are distinct, the values in both > columns should be different. > h2. Problem > This expectation does not appear to hold when any > subsequent expression is a CodegenFallback. This program: > {code:scala} > import org.apache.spark.sql._ > import org.apache.spark.sql.types._ > import org.apache.spark.sql.functions._ > val sparkSession = SparkSession.builder().getOrCreate() > val df = sparkSession.sparkContext.parallelize(1 to 5).toDF("x") > val v1 = rand().*(lit(1)).cast(IntegerType) > val v2 = to_csv(struct(v1.as("a"))) // to_csv is CodegenFallback > df.select(v1, v1, v2, v2).collect {code} > produces output like this: > |8159|8159|8159|{color:#ff0000}2028{color}| > |8320|8320|8320|{color:#ff0000}1640{color}| > |7937|7937|7937|{color:#ff0000}769{color}| > |436|436|436|{color:#ff0000}8924{color}| > |8924|8924|2827|{color:#ff0000}2731{color}| > It is not clear why the first call via the CodegenFallback path should be correct > while subsequent calls aren't. > h2. Workaround > If the nondeterministic expression is moved to a separate, earlier select() > call, so the CodegenFallback instead only refers to a column reference, then > the problem seems to go away. But this workaround may not be reliable if > optimization is ever able to restructure adjacent select()s. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
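A minimal spark-shell sketch of the workaround described above: evaluate the nondeterministic expression in its own, earlier projection so that the CodegenFallback expression (to_csv here) only sees a column reference:

{code:scala}
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types.IntegerType
import spark.implicits._

val df = spark.sparkContext.parallelize(1 to 5).toDF("x")
val v1 = rand().*(lit(1)).cast(IntegerType)

// Materialize the nondeterministic value once...
val stable = df.select(col("x"), v1.as("v1"))

// ...so to_csv only references the already-computed column and the
// per-row values agree across all output columns.
stable.select(col("v1"), col("v1"), to_csv(struct(col("v1").as("a")))).collect
{code}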
[jira] [Created] (SPARK-48403) Fix Upper, Lower, InitCap for UTF8_BINARY_LCASE
Uroš Bojanić created SPARK-48403: Summary: Fix Upper, Lower, InitCap for UTF8_BINARY_LCASE Key: SPARK-48403 URL: https://issues.apache.org/jira/browse/SPARK-48403 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 4.0.0 Reporter: Uroš Bojanić -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48400) Promote `PrometheusServlet` to `DeveloperApi`
[ https://issues.apache.org/jira/browse/SPARK-48400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-48400: --- Labels: pull-request-available (was: ) > Promote `PrometheusServlet` to `DeveloperApi` > - > > Key: SPARK-48400 > URL: https://issues.apache.org/jira/browse/SPARK-48400 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: kubernetes-operator-0.1.0 >Reporter: Zhou JIANG >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
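For context, promotions like this use Spark's org.apache.spark.annotation.DeveloperApi annotation; a minimal sketch of the shape of the change (constructor and body elided; not the actual patch):

{code:scala}
import org.apache.spark.annotation.DeveloperApi

// Marks the class as usable by external projects (here, the Spark
// Kubernetes operator) without the stability guarantees of a public API.
@DeveloperApi
class PrometheusServlet // existing constructor and body unchanged
{code}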
[jira] [Commented] (SPARK-48361) Correctness: CSV corrupt record filter with aggregate ignored
[ https://issues.apache.org/jira/browse/SPARK-48361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17848936#comment-17848936 ] Ted Chester Jenks commented on SPARK-48361: --- Ah yes! Sorry [~bersprockets], I messed up my example, but you are right - I meant {quote}With `=!= true`, the grouping includes `8, 9` (it shouldn't, as you mentioned).{quote} It is fixed in the example now. I know that persisting fixes it - so the bug occurs specifically when the data is not collected/checkpointed/cached. > Correctness: CSV corrupt record filter with aggregate ignored > - > > Key: SPARK-48361 > URL: https://issues.apache.org/jira/browse/SPARK-48361 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.5.1 > Environment: Using spark-shell 3.5.1 on an M1 Mac >Reporter: Ted Chester Jenks >Priority: Major > > Using the corrupt record column in CSV parsing for some data cleaning logic, I came > across a correctness bug. > > The following repro can be run with spark-shell 3.5.1. > *Create test.csv with the following content:* > {code:java} > test,1,2,three > four,5,6,seven > 8,9 > ten,11,12,thirteen {code} > > > *In spark-shell:* > {code:java} > import org.apache.spark.sql.types._ > import org.apache.spark.sql.functions._ > > // define a STRING, DOUBLE, DOUBLE, STRING schema for the data > val schema = StructType(List(StructField("column1", StringType, true), > StructField("column2", DoubleType, true), StructField("column3", DoubleType, > true), StructField("column4", StringType, true))) > > // add a column for corrupt records to the schema > val schemaWithCorrupt = StructType(schema.fields :+ > StructField("_corrupt_record", StringType, true)) > > // read the CSV with the schema, headers, permissive parsing, and the corrupt > record column > val df = spark.read.option("header", "true").option("mode", > "PERMISSIVE").option("columnNameOfCorruptRecord", > "_corrupt_record").schema(schemaWithCorrupt).csv("test.csv") > > // define a UDF to count the commas in the corrupt record column > val countCommas = udf((s: String) => if (s != null) s.count(_ == ',') else > -1) > > // add a true/false column for whether the number of commas is 3 > val dfWithJagged = df.withColumn("__is_jagged", > when(col("_corrupt_record").isNull, > false).otherwise(countCommas(col("_corrupt_record")) =!= 3)) > dfWithJagged.show(){code} > *Returns:* > {code:java} > +---+---+---++---+---+ > |column1|column2|column3| column4|_corrupt_record|__is_jagged| > +---+---+---++---+---+ > | four| 5.0| 6.0| seven| NULL| false| > | 8| 9.0| NULL| NULL| 8,9| true| > | ten| 11.0| 12.0|thirteen| NULL| false| > +---+---+---++---+---+ {code} > So far so good...
> > *BUT* > > *If we add an aggregate before we show:* > {code:java} > import org.apache.spark.sql.types._ > import org.apache.spark.sql.functions._ > val schema = StructType(List(StructField("column1", StringType, true), > StructField("column2", DoubleType, true), StructField("column3", DoubleType, > true), StructField("column4", StringType, true))) > val schemaWithCorrupt = StructType(schema.fields :+ > StructField("_corrupt_record", StringType, true)) > val df = spark.read.option("header", "true").option("mode", > "PERMISSIVE").option("columnNameOfCorruptRecord", > "_corrupt_record").schema(schemaWithCorrupt).csv("test.csv") > val countCommas = udf((s: String) => if (s != null) s.count(_ == ',') else > -1) > val dfWithJagged = df.withColumn("__is_jagged", > when(col("_corrupt_record").isNull, > false).otherwise(countCommas(col("_corrupt_record")) =!= 3)) > val dfDropped = dfWithJagged.filter(col("__is_jagged") =!= true) > val groupedSum = > dfDropped.groupBy("column1").agg(sum("column2").alias("sum_column2")) > groupedSum.show(){code} > *We get:* > {code:java} > +---+---+ > |column1|sum_column2| > +---+---+ > | 8| 9.0| > | four| 5.0| > | ten| 11.0| > +---+---+ {code} > > *Which is not correct* > > With the addition of the aggregate, the filter down to rows with 3 commas in > the corrupt record column is ignored. This does not happen with any other > operators I have tried - just aggregates so far. > > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
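For anyone needing a stopgap, a minimal sketch of the mitigation mentioned in the comment above, continuing the repro from the issue: persist the filtered frame before the aggregate so the corrupt-record filter is applied to materialized data (persist() is the standard API; whether this is the recommended workaround is not settled here):

{code:scala}
// Materializing dfDropped keeps the _corrupt_record filter from being
// rearranged around the aggregate.
val dfPersisted = dfDropped.persist()
dfPersisted.groupBy("column1").agg(sum("column2").alias("sum_column2")).show()
{code}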
[jira] [Updated] (SPARK-48361) Correctness: CSV corrupt record filter with aggregate ignored
[ https://issues.apache.org/jira/browse/SPARK-48361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Chester Jenks updated SPARK-48361: -- Description: Using the corrupt record column in CSV parsing for some data cleaning logic, I came across a correctness bug. The following repro can be run with spark-shell 3.5.1. *Create test.csv with the following content:* {code:java} test,1,2,three four,5,6,seven 8,9 ten,11,12,thirteen {code} *In spark-shell:* {code:java} import org.apache.spark.sql.types._ import org.apache.spark.sql.functions._ // define a STRING, DOUBLE, DOUBLE, STRING schema for the data val schema = StructType(List(StructField("column1", StringType, true), StructField("column2", DoubleType, true), StructField("column3", DoubleType, true), StructField("column4", StringType, true))) // add a column for corrupt records to the schema val schemaWithCorrupt = StructType(schema.fields :+ StructField("_corrupt_record", StringType, true)) // read the CSV with the schema, headers, permissive parsing, and the corrupt record column val df = spark.read.option("header", "true").option("mode", "PERMISSIVE").option("columnNameOfCorruptRecord", "_corrupt_record").schema(schemaWithCorrupt).csv("test.csv") // define a UDF to count the commas in the corrupt record column val countCommas = udf((s: String) => if (s != null) s.count(_ == ',') else -1) // add a true/false column for whether the number of commas is 3 val dfWithJagged = df.withColumn("__is_jagged", when(col("_corrupt_record").isNull, false).otherwise(countCommas(col("_corrupt_record")) =!= 3)) dfWithJagged.show(){code} *Returns:* {code:java} +---+---+---++---+---+ |column1|column2|column3| column4|_corrupt_record|__is_jagged| +---+---+---++---+---+ | four| 5.0| 6.0| seven| NULL| false| | 8| 9.0| NULL| NULL| 8,9| true| | ten| 11.0| 12.0|thirteen| NULL| false| +---+---+---++---+---+ {code} So far so good... *BUT* *If we add an aggregate before we show:* {code:java} import org.apache.spark.sql.types._ import org.apache.spark.sql.functions._ val schema = StructType(List(StructField("column1", StringType, true), StructField("column2", DoubleType, true), StructField("column3", DoubleType, true), StructField("column4", StringType, true))) val schemaWithCorrupt = StructType(schema.fields :+ StructField("_corrupt_record", StringType, true)) val df = spark.read.option("header", "true").option("mode", "PERMISSIVE").option("columnNameOfCorruptRecord", "_corrupt_record").schema(schemaWithCorrupt).csv("test.csv") val countCommas = udf((s: String) => if (s != null) s.count(_ == ',') else -1) val dfWithJagged = df.withColumn("__is_jagged", when(col("_corrupt_record").isNull, false).otherwise(countCommas(col("_corrupt_record")) =!= 3)) val dfDropped = dfWithJagged.filter(col("__is_jagged") =!= true) val groupedSum = dfDropped.groupBy("column1").agg(sum("column2").alias("sum_column2")) groupedSum.show(){code} *We get:* {code:java} +---+---+ |column1|sum_column2| +---+---+ | 8| 9.0| | four| 5.0| | ten| 11.0| +---+---+ {code} *Which is not correct* With the addition of the aggregate, the filter down to rows with 3 commas in the corrupt record column is ignored. This does not happen with any other operators I have tried - just aggregates so far. was: Using the corrupt record column in CSV parsing for some data cleaning logic, I came across a correctness bug. The following repro can be run with spark-shell 3.5.1.
*Create test.csv with the following content:* {code:java} test,1,2,three four,5,6,seven 8,9 ten,11,12,thirteen {code} *In spark-shell:* {code:java} import org.apache.spark.sql.types._ import org.apache.spark.sql.functions._ // define a STRING, DOUBLE, DOUBLE, STRING schema for the data val schema = StructType(List(StructField("column1", StringType, true), StructField("column2", DoubleType, true), StructField("column3", DoubleType, true), StructField("column4", StringType, true))) // add a column for corrupt records to the schema val schemaWithCorrupt = StructType(schema.fields :+ StructField("_corrupt_record", StringType, true)) // read the CSV with the schema, headers, permissive parsing, and the corrupt record column val df = spark.read.option("header", "true").option("mode", "PERMISSIVE").option("columnNameOfCorruptRecord", "_corrupt_record").schema(schemaWithCorrupt).csv("test.csv") // define a UDF to count the commas in the corrupt record column val countCommas = udf((s: String) => if (s != null) s.count(_ == ',') else -1) // add a true/false column for whether the number of commas is 3 val dfWithJagged = df.withColumn("__is_jagged", when(col("_corrupt_reco
[jira] [Commented] (SPARK-45265) Support Hive 4.0 metastore
[ https://issues.apache.org/jira/browse/SPARK-45265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17848924#comment-17848924 ] Ankit Prakash Gupta commented on SPARK-45265: - Since Hive 4.0.0 is already GA, maybe I can help on this one. !image-2024-05-23-17-35-48-426.png! > Support Hive 4.0 metastore > -- > > Key: SPARK-45265 > URL: https://issues.apache.org/jira/browse/SPARK-45265 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Attila Zsolt Piros >Priority: Major > Labels: pull-request-available > Attachments: image-2024-05-23-17-35-48-426.png > > > Although Hive 4.0.0 is still in beta, I would like to work on this, as Hive 4.0.0 > will support the pushdown of partition column filters with > VARCHAR/CHAR types. > For details please see HIVE-26661: Support partition filter for char and > varchar types on Hive metastore -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
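For context, Spark already selects the Hive metastore client version via configuration, so this work would extend the accepted range of these real configs. A minimal sketch; note that "4.0.0" below is the hypothetical outcome of this ticket, not a value current releases accept:

{code:scala}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .enableHiveSupport()
  // Version of the Hive metastore client to use, and where to get its jars.
  .config("spark.sql.hive.metastore.version", "4.0.0") // hypothetical value
  .config("spark.sql.hive.metastore.jars", "maven")
  .getOrCreate()
{code}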
[jira] [Updated] (SPARK-45265) Support Hive 4.0 metastore
[ https://issues.apache.org/jira/browse/SPARK-45265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ankit Prakash Gupta updated SPARK-45265: Attachment: image-2024-05-23-17-35-48-426.png > Support Hive 4.0 metastore > -- > > Key: SPARK-45265 > URL: https://issues.apache.org/jira/browse/SPARK-45265 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Attila Zsolt Piros >Priority: Major > Labels: pull-request-available > Attachments: image-2024-05-23-17-35-48-426.png > > > Although Hive 4.0.0 is still in beta, I would like to work on this, as Hive 4.0.0 > will support the pushdown of partition column filters with > VARCHAR/CHAR types. > For details please see HIVE-26661: Support partition filter for char and > varchar types on Hive metastore -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48395) Fix StructType.treeString for parameterized types
[ https://issues.apache.org/jira/browse/SPARK-48395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng resolved SPARK-48395. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46711 [https://github.com/apache/spark/pull/46711] > Fix StructType.treeString for parameterized types > - > > Key: SPARK-48395 > URL: https://issues.apache.org/jira/browse/SPARK-48395 > Project: Spark > Issue Type: New Feature > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-48401) Spark Connect Go Session Type cannot be used in complex code
David Sisson created SPARK-48401: Summary: Spark Connect Go Session Type cannot be used in complex code Key: SPARK-48401 URL: https://issues.apache.org/jira/browse/SPARK-48401 Project: Spark Issue Type: Improvement Components: Connect Affects Versions: 3.5.1 Reporter: David Sisson The return type of the Spark session builder is an unexported type. This means that code cannot pass a sql.sparkSession around to perform more complex tasks (for instance, a routine that sets up a set of temporary views for later processing): spark, err := sql.SparkSession.Builder.Remote(*remote).Build() -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48399) Teradata: ByteType wrongly mapped to Teradata BYTE (binary) type
[ https://issues.apache.org/jira/browse/SPARK-48399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-48399: --- Labels: pull-request-available (was: ) > Teradata: ByteType wrongly mapped to Teradata BYTE (binary) type > > > Key: SPARK-48399 > URL: https://issues.apache.org/jira/browse/SPARK-48399 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Kent Yao >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-48400) Promote `PrometheusServlet` to `DeveloperApi`
Zhou JIANG created SPARK-48400: -- Summary: Promote `PrometheusServlet` to `DeveloperApi` Key: SPARK-48400 URL: https://issues.apache.org/jira/browse/SPARK-48400 Project: Spark Issue Type: Sub-task Components: Spark Core Affects Versions: kubernetes-operator-0.1.0 Reporter: Zhou JIANG -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-48399) Teradata: ByteType wrongly mapped to Teradata BYTE (binary) type
Kent Yao created SPARK-48399: Summary: Teradata: ByteType wrongly mapped to Teradata BYTE (binary) type Key: SPARK-48399 URL: https://issues.apache.org/jira/browse/SPARK-48399 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 4.0.0 Reporter: Kent Yao -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org