[jira] [Resolved] (SPARK-42724) Upgrade buf to v1.15.1

2023-03-08 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng resolved SPARK-42724.
---
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 40348
[https://github.com/apache/spark/pull/40348]

> Upgrade buf to v1.15.1
> --
>
> Key: SPARK-42724
> URL: https://issues.apache.org/jira/browse/SPARK-42724
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build, Connect
>Affects Versions: 3.4.1
>Reporter: BingKun Pan
>Assignee: BingKun Pan
>Priority: Minor
> Fix For: 3.4.0
>
>







[jira] [Assigned] (SPARK-42724) Upgrade buf to v1.15.1

2023-03-08 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng reassigned SPARK-42724:
-

Assignee: BingKun Pan

> Upgrade buf to v1.15.1
> --
>
> Key: SPARK-42724
> URL: https://issues.apache.org/jira/browse/SPARK-42724
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build, Connect
>Affects Versions: 3.4.1
>Reporter: BingKun Pan
>Assignee: BingKun Pan
>Priority: Minor
>







[jira] [Commented] (SPARK-42626) Add Destructive Iterator for SparkResult

2023-03-08 Thread Tengfei Huang (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17698220#comment-17698220
 ] 

Tengfei Huang commented on SPARK-42626:
---

I will take a look! Thanks

> Add Destructive Iterator for SparkResult
> 
>
> Key: SPARK-42626
> URL: https://issues.apache.org/jira/browse/SPARK-42626
> Project: Spark
>  Issue Type: New Feature
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Herman van Hövell
>Priority: Major
>
> Add a destructive iterator to SparkResult. Instead of keeping everything in 
> memory for the lifetime of the SparkResult object, clean it up as soon as we 
> know we are done with it. We can use this for Dataset.toLocalIterator.
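A destructive iterator releases each buffered batch as soon as iteration moves past it, rather than holding all batches until the whole result is garbage collected. A minimal sketch of the idea in Scala (the Result class here is illustrative, not the actual SparkResult internals):

{code:java}
import scala.collection.mutable

// Illustrative holder of materialized result batches (not the real SparkResult).
class Result[T](private val batches: mutable.Queue[Vector[T]]) {
  // Hands out rows batch by batch, dequeuing (and thus releasing) each batch
  // once iteration moves past it, so at most one batch stays reachable.
  def destructiveIterator: Iterator[T] = new Iterator[T] {
    private var current: Iterator[T] = Iterator.empty
    def hasNext: Boolean = {
      while (!current.hasNext && batches.nonEmpty) {
        current = batches.dequeue().iterator // consumed batch becomes unreachable
      }
      current.hasNext
    }
    def next(): T = {
      if (!hasNext) throw new NoSuchElementException("next on empty iterator")
      current.next()
    }
  }
}
{code}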






[jira] [Commented] (SPARK-42554) Spark Connect Scala Client

2023-03-08 Thread Yang Jie (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17698216#comment-17698216
 ] 

Yang Jie commented on SPARK-42554:
--

Friendly ping [~ivoson]: [~hvanhovell] has created some tickets related to 
Spark Connect here; feel free to pick them up if you are interested.

> Spark Connect Scala Client
> --
>
> Key: SPARK-42554
> URL: https://issues.apache.org/jira/browse/SPARK-42554
> Project: Spark
>  Issue Type: Epic
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Herman van Hövell
>Assignee: Herman van Hövell
>Priority: Major
>
> This is the EPIC to track all the work for the Spark Connect Scala Client.






[jira] [Assigned] (SPARK-42689) Allow ShuffleDriverComponent to declare if shuffle data is reliably stored

2023-03-08 Thread Mridul Muralidharan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mridul Muralidharan reassigned SPARK-42689:
---

Assignee: Mridul Muralidharan

> Allow ShuffleDriverComponent to declare if shuffle data is reliably stored
> --
>
> Key: SPARK-42689
> URL: https://issues.apache.org/jira/browse/SPARK-42689
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 3.5.0
>Reporter: Mridul Muralidharan
>Assignee: Mridul Muralidharan
>Priority: Major
>
> Currently, if there is an executor node loss, we assume the shuffle data on 
> that node is also lost. This is not necessarily the case if there is a 
> shuffle component managing the shuffle data and reliably maintaining it (for 
> example, in a distributed filesystem or in a disaggregated shuffle cluster).
> Downstream projects carry patches to Apache Spark to work around this issue; 
> for example, Apache Celeborn has 
> [this|https://github.com/apache/incubator-celeborn/blob/main/assets/spark-patch/RSS_RDA_spark3.patch].
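A sketch of what a plugin backed by reliable external shuffle storage might declare, assuming the new hook is a boolean on ShuffleDriverComponents as the issue title suggests (the exact method name in the merged API may differ):

{code:java}
import java.util.{Map => JMap}
import org.apache.spark.shuffle.api.ShuffleDriverComponents

class ReliableShuffleDriverComponents extends ShuffleDriverComponents {
  override def initializeApplication(): JMap[String, String] =
    java.util.Collections.emptyMap()

  override def cleanupApplication(): Unit = ()

  // Shuffle blocks live in external (distributed or disaggregated) storage,
  // so losing an executor does not imply losing its shuffle output, and the
  // scheduler does not need to rerun the producing stages.
  override def supportsReliableStorage(): Boolean = true
}
{code}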






[jira] [Resolved] (SPARK-42689) Allow ShuffleDriverComponent to declare if shuffle data is reliably stored

2023-03-08 Thread Mridul Muralidharan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mridul Muralidharan resolved SPARK-42689.
-
Fix Version/s: 3.5.0
   Resolution: Fixed

Issue resolved by pull request 40307
[https://github.com/apache/spark/pull/40307]

> Allow ShuffleDriverComponent to declare if shuffle data is reliably stored
> --
>
> Key: SPARK-42689
> URL: https://issues.apache.org/jira/browse/SPARK-42689
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 3.5.0
>Reporter: Mridul Muralidharan
>Assignee: Mridul Muralidharan
>Priority: Major
> Fix For: 3.5.0
>
>
> Currently, if there is an executor node loss, we assume the shuffle data on 
> that node is also lost. This is not necessarily the case if there is a 
> shuffle component managing the shuffle data and reliably maintaining it (for 
> example, in a distributed filesystem or in a disaggregated shuffle cluster).
> Downstream projects carry patches to Apache Spark to work around this issue; 
> for example, Apache Celeborn has 
> [this|https://github.com/apache/incubator-celeborn/blob/main/assets/spark-patch/RSS_RDA_spark3.patch].






[jira] [Assigned] (SPARK-42690) Implement CSV/JSON parsing functions

2023-03-08 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng reassigned SPARK-42690:
-

Assignee: Yang Jie

> Implement CSV/JSON parsing functions
> ---
>
> Key: SPARK-42690
> URL: https://issues.apache.org/jira/browse/SPARK-42690
> Project: Spark
>  Issue Type: New Feature
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Herman van Hövell
>Assignee: Yang Jie
>Priority: Major
>
> Implement the following two methods in DataFrameReader:
>  
>  
> {code:java}
> /**
>  * Loads a `Dataset[String]` storing JSON objects (<a href="http://jsonlines.org/">JSON
>  * Lines text format or newline-delimited JSON</a>) and returns the result as a `DataFrame`.
>  *
>  * Unless the schema is specified using the `schema` function, this function goes through
>  * the input once to determine the input schema.
>  *
>  * @param jsonDataset input Dataset with one JSON object per record
>  * @since 3.4.0
>  */
> def json(jsonDataset: Dataset[String]): DataFrame
>
> /**
>  * Loads a `Dataset[String]` storing CSV rows and returns the result as a `DataFrame`.
>  *
>  * If the schema is not specified using the `schema` function and the `inferSchema` option
>  * is enabled, this function goes through the input once to determine the input schema.
>  *
>  * If the schema is not specified using the `schema` function and the `inferSchema` option
>  * is disabled, it treats the columns as string types and reads only the first line to
>  * determine the names and the number of fields.
>  *
>  * If `enforceSchema` is set to `false`, only the CSV header in the first line is checked
>  * to conform to the specified or inferred schema.
>  *
>  * @note if the `header` option is set to `true` when calling this API, all lines identical
>  * to the header will be removed if they exist.
>  *
>  * @param csvDataset input Dataset with one CSV row per record
>  * @since 3.4.0
>  */
> def csv(csvDataset: Dataset[String]): DataFrame
> {code}
>  
> For this we need a new message. We cannot use Project because we don't know 
> the schema upfront.
>  
> {code:java}
> message Parse {
>   // (Required) Input relation to Parse. The input is expected to have a
>   // single text column.
>   Relation input = 1;
>
>   // (Required) The expected format of the text.
>   ParseFormat format = 2;
>
>   enum ParseFormat {
>     PARSE_FORMAT_UNSPECIFIED = 0;
>     PARSE_FORMAT_CSV = 1;
>     PARSE_FORMAT_JSON = 2;
>   }
> }
> {code}
>  
>  
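Once implemented, client-side usage would mirror the existing DataFrameReader API. A brief sketch, assuming an active `spark` session:

{code:java}
import spark.implicits._

// One JSON object per record; a single pass over the input infers the schema.
val jsonDS = Seq("""{"name": "a", "age": 1}""").toDS()
val df = spark.read.json(jsonDS)
df.show()
{code}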






[jira] [Resolved] (SPARK-42690) Implement CSV/JSON parsing functions

2023-03-08 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng resolved SPARK-42690.
---
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 40332
[https://github.com/apache/spark/pull/40332]

> Implement CSV/JSON parsing functions
> ---
>
> Key: SPARK-42690
> URL: https://issues.apache.org/jira/browse/SPARK-42690
> Project: Spark
>  Issue Type: New Feature
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Herman van Hövell
>Assignee: Yang Jie
>Priority: Major
> Fix For: 3.4.0
>
>
> Implement the following two methods in DataFrameReader:
>  
>  
> {code:java}
> /**
>  * Loads a `Dataset[String]` storing JSON objects (<a href="http://jsonlines.org/">JSON
>  * Lines text format or newline-delimited JSON</a>) and returns the result as a `DataFrame`.
>  *
>  * Unless the schema is specified using the `schema` function, this function goes through
>  * the input once to determine the input schema.
>  *
>  * @param jsonDataset input Dataset with one JSON object per record
>  * @since 3.4.0
>  */
> def json(jsonDataset: Dataset[String]): DataFrame
>
> /**
>  * Loads a `Dataset[String]` storing CSV rows and returns the result as a `DataFrame`.
>  *
>  * If the schema is not specified using the `schema` function and the `inferSchema` option
>  * is enabled, this function goes through the input once to determine the input schema.
>  *
>  * If the schema is not specified using the `schema` function and the `inferSchema` option
>  * is disabled, it treats the columns as string types and reads only the first line to
>  * determine the names and the number of fields.
>  *
>  * If `enforceSchema` is set to `false`, only the CSV header in the first line is checked
>  * to conform to the specified or inferred schema.
>  *
>  * @note if the `header` option is set to `true` when calling this API, all lines identical
>  * to the header will be removed if they exist.
>  *
>  * @param csvDataset input Dataset with one CSV row per record
>  * @since 3.4.0
>  */
> def csv(csvDataset: Dataset[String]): DataFrame
> {code}
>  
> For this we need a new message. We cannot use Project because we don't know 
> the schema upfront.
>  
> {code:java}
> message Parse {
>   // (Required) Input relation to Parse. The input is expected to have a
>   // single text column.
>   Relation input = 1;
>
>   // (Required) The expected format of the text.
>   ParseFormat format = 2;
>
>   enum ParseFormat {
>     PARSE_FORMAT_UNSPECIFIED = 0;
>     PARSE_FORMAT_CSV = 1;
>     PARSE_FORMAT_JSON = 2;
>   }
> }
> {code}
>  
>  






[jira] [Commented] (SPARK-42725) Make LiteralExpression support array

2023-03-08 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17698207#comment-17698207
 ] 

Apache Spark commented on SPARK-42725:
--

User 'zhengruifeng' has created a pull request for this issue:
https://github.com/apache/spark/pull/40349

> Make LiteralExpression support array
> 
>
> Key: SPARK-42725
> URL: https://issues.apache.org/jira/browse/SPARK-42725
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Priority: Major
>







[jira] [Assigned] (SPARK-42725) Make LiteralExpression support array

2023-03-08 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42725:


Assignee: (was: Apache Spark)

> Make LiteralExpression support array
> 
>
> Key: SPARK-42725
> URL: https://issues.apache.org/jira/browse/SPARK-42725
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Priority: Major
>







[jira] [Assigned] (SPARK-42725) Make LiteralExpression support array

2023-03-08 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42725:


Assignee: Apache Spark

> Make LiteralExpression support array
> 
>
> Key: SPARK-42725
> URL: https://issues.apache.org/jira/browse/SPARK-42725
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Assignee: Apache Spark
>Priority: Major
>







[jira] [Commented] (SPARK-42725) Make LiteralExpression support array

2023-03-08 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17698206#comment-17698206
 ] 

Apache Spark commented on SPARK-42725:
--

User 'zhengruifeng' has created a pull request for this issue:
https://github.com/apache/spark/pull/40349

> Make LiteralExpression support array
> 
>
> Key: SPARK-42725
> URL: https://issues.apache.org/jira/browse/SPARK-42725
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Priority: Major
>







[jira] [Created] (SPARK-42725) Make LiteralExpression support array

2023-03-08 Thread Ruifeng Zheng (Jira)
Ruifeng Zheng created SPARK-42725:
-

 Summary: Make LiteralExpression support array
 Key: SPARK-42725
 URL: https://issues.apache.org/jira/browse/SPARK-42725
 Project: Spark
  Issue Type: Sub-task
  Components: Connect, PySpark
Affects Versions: 3.4.0
Reporter: Ruifeng Zheng
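A sketch of the intended user-visible behavior, inferred from the title (an assumption; the actual change targets the Connect LiteralExpression encoding, and `spark` is assumed to be an active session):

{code:java}
import org.apache.spark.sql.functions.lit

// Array values accepted as literals, so the client can encode them in a
// LiteralExpression instead of rejecting them.
val df = spark.range(1).select(lit(Array(1, 2, 3)).alias("xs"))
df.show()
{code}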









[jira] [Resolved] (SPARK-42701) Add the try_aes_decrypt() function

2023-03-08 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk resolved SPARK-42701.
--
Fix Version/s: 3.5.0
   Resolution: Fixed

Issue resolved by pull request 40340
[https://github.com/apache/spark/pull/40340]

> Add the try_aes_decrypt() function
> --
>
> Key: SPARK-42701
> URL: https://issues.apache.org/jira/browse/SPARK-42701
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Max Gekk
>Assignee: Max Gekk
>Priority: Major
>  Labels: starter
> Fix For: 3.5.0
>
>
> Add the new function try_aes_decrypt(). The function aes_decrypt() fails with 
> an exception when it encounters a column value that it cannot decrypt. So, if 
> a column contains both bad and good input, it is impossible to decrypt even 
> the good input.
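A sketch of the intended behavior, assuming try_aes_decrypt takes the same arguments as aes_decrypt: NULL instead of an exception for undecryptable values, so columns with mixed good and bad input remain queryable.

{code:java}
// 16-byte key; the second value is not valid ciphertext, so the try_ variant
// should yield NULL for it rather than failing the whole query.
spark.sql("""
  SELECT
    try_aes_decrypt(aes_encrypt('Spark', '0000111122223333'), '0000111122223333') AS ok,
    try_aes_decrypt(unhex('DEADBEEF'), '0000111122223333') AS bad_is_null
""").show()
{code}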






[jira] [Assigned] (SPARK-42717) Upgrade mysql-connector-java from 8.0.31 to 8.0.32

2023-03-08 Thread Kent Yao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kent Yao reassigned SPARK-42717:


Assignee: BingKun Pan

> Upgrade mysql-connector-java from 8.0.31 to 8.0.32
> --
>
> Key: SPARK-42717
> URL: https://issues.apache.org/jira/browse/SPARK-42717
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.5.0
>Reporter: BingKun Pan
>Assignee: BingKun Pan
>Priority: Minor
>







[jira] [Resolved] (SPARK-42717) Upgrade mysql-connector-java from 8.0.31 to 8.0.32

2023-03-08 Thread Kent Yao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kent Yao resolved SPARK-42717.
--
Fix Version/s: 3.5.0
   Resolution: Fixed

Issue resolved by pull request 40335
[https://github.com/apache/spark/pull/40335]

> Upgrade mysql-connector-java from 8.0.31 to 8.0.32
> --
>
> Key: SPARK-42717
> URL: https://issues.apache.org/jira/browse/SPARK-42717
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.5.0
>Reporter: BingKun Pan
>Assignee: BingKun Pan
>Priority: Minor
> Fix For: 3.5.0
>
>







[jira] [Resolved] (SPARK-42697) /api/v1/applications returns 0 for duration

2023-03-08 Thread Kent Yao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kent Yao resolved SPARK-42697.
--
Fix Version/s: 3.3.3
   3.2.4
   3.4.0
   Resolution: Fixed

Issue resolved by pull request 40313
[https://github.com/apache/spark/pull/40313]

> /api/v1/applications returns 0 for duration
> --
>
> Key: SPARK-42697
> URL: https://issues.apache.org/jira/browse/SPARK-42697
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 3.1.3, 3.2.3, 3.3.2, 3.4.0
>Reporter: Kent Yao
>Assignee: Kent Yao
>Priority: Major
> Fix For: 3.3.3, 3.2.4, 3.4.0
>
>
> The duration should be the application's total uptime instead.






[jira] [Assigned] (SPARK-42697) /api/v1/applications returns 0 for duration

2023-03-08 Thread Kent Yao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kent Yao reassigned SPARK-42697:


Assignee: Kent Yao

> /api/v1/applications returns 0 for duration
> --
>
> Key: SPARK-42697
> URL: https://issues.apache.org/jira/browse/SPARK-42697
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 3.1.3, 3.2.3, 3.3.2, 3.4.0
>Reporter: Kent Yao
>Assignee: Kent Yao
>Priority: Major
>
> The duration should be the application's total uptime instead.






[jira] [Commented] (SPARK-42703) How to use Fair Scheduler Pools

2023-03-08 Thread LiJie2023 (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17698163#comment-17698163
 ] 

LiJie2023 commented on SPARK-42703:
---

Sorry, this SPARK-42703 was also submitted by me. I haven't gotten the correct 
answer yet.
李杰 (Li Jie)
leedd1...@163.com
 Original message 

[ 
https://issues.apache.org/jira/browse/SPARK-42703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-42703.
-
Resolution: Invalid

How to use Fair Scheduler Pools
---

Key: SPARK-42703
URL: https://issues.apache.org/jira/browse/SPARK-42703
Project: Spark
Issue Type: Question
Components: Scheduler
Affects Versions: 3.2.3
Reporter: LiJie2023
Priority: Major
Attachments: image-2023-03-08-09-53-35-867.png


I have two questions to ask:
# I wrote a demo referring to the official website, but it didn't meet my 
expectations; I don't know if there is a problem with my code. I hope that 
when I use the following fairscheduler.xml, pool1 always performs tasks before 
pool2.
# What is the relationship between "spark.scheduler.mode" and 
"{{{}schedulingMode{}}}" in fairscheduler.xml?
 
 
{code:java}
object MultiJobTest {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
    conf.setAppName("test-pool").setMaster("local[1]")
    conf.set("spark.scheduler.mode", "FAIR")
    conf.set("spark.scheduler.allocation.file", "file:///D:/tmp/input/fairscheduler.xml")
    val sparkContext = new SparkContext(conf)
    val data: RDD[String] = sparkContext.textFile("file:///D:/tmp/input/input.txt")
    val rdd = data.flatMap(_.split(","))
      .map(x => (x(0), x(0)))
    new Thread(() => {
      sparkContext.setLocalProperty("spark.scheduler.pool", "pool1")
      rdd.foreachAsync(x => {
        println("1==start==" + new SimpleDateFormat("HH:mm:ss").format(new Date()))
        Thread.sleep(1)
        println("1==end==" + new SimpleDateFormat("HH:mm:ss").format(new Date()))
      })
    }).start()
    new Thread(() => {
      sparkContext.setLocalProperty("spark.scheduler.pool", "pool2")
      rdd.foreachAsync(x => {
        println("2==start==" + new SimpleDateFormat("HH:mm:ss").format(new Date()))
        Thread.sleep(1)
        println("2==end==" + new SimpleDateFormat("HH:mm:ss").format(new Date()))
      })
    }).start()
    TimeUnit.MINUTES.sleep(2)
    sparkContext.stop()
  }
} {code}
 
fairscheduler.xml
 
{code:java}
<?xml version="1.0"?>
<allocations>
  <pool name="pool1">
    <schedulingMode>FAIR</schedulingMode>
    <weight>100</weight>
    <minShare>0</minShare>
  </pool>
  <pool name="pool2">
    <schedulingMode>FAIR</schedulingMode>
    <weight>1</weight>
    <minShare>0</minShare>
  </pool>
</allocations>
 {code}
 
 
input.txt
 
{code:java}
aa bb {code}
 
 
 





> How to use Fair Scheduler Pools
> ---
>
> Key: SPARK-42703
> URL: https://issues.apache.org/jira/browse/SPARK-42703
> Project: Spark
>  Issue Type: Question
>  Components: Scheduler
>Affects Versions: 3.2.3
>Reporter: LiJie2023
>Priority: Major
> Attachments: image-2023-03-08-09-53-35-867.png
>
>
> I have two questions to ask:
>  # I wrote a demo referring to the official website, but it didn't meet my 
> expectations; I don't know if there is a problem with my code. I hope that 
> when I use the following fairscheduler.xml, pool1 always performs tasks 
> before pool2.
>  # What is the relationship between "spark.scheduler.mode" and 
> "{{{}schedulingMode{}}}" in fairscheduler.xml?
>  
>  
> {code:java}
> object MultiJobTest {
>   def main(args: Array[String]): Unit = {
>     val conf = new SparkConf()
>     conf.setAppName("test-pool").setMaster("local[1]")
>     conf.set("spark.scheduler.mode", "FAIR")
>     conf.set("spark.scheduler.allocation.file", "file:///D:/tmp/input/fairscheduler.xml")
>     val sparkContext = new SparkContext(conf)
>     val data: RDD[String] = sparkContext.textFile("file:///D:/tmp/input/input.txt")
>     val rdd = data.flatMap(_.split(","))
>       .map(x => (x(0), x(0)))
>     new Thread(() => {
>       sparkContext.setLocalProperty("spark.scheduler.pool", "pool1")
>       rdd.foreachAsync(x => {
>         println("1==start==" + new SimpleDateFormat("HH:mm:ss").format(new Date()))
>         Thread.sleep(1)
>         println("1==end==" + new SimpleDateFormat("HH:mm:ss").format(new Date()))
>       })
>     }).start()
>     new Thread(() => {
>       sparkContext.setLocalProperty("spark.scheduler.pool", "pool2")
>       rdd.foreachAsync(x => {
>         println("2==start==" + new SimpleDateFormat("HH:mm:ss").format(new Date()))
>         Thread.sleep(1)
>         println("2==end==" + new SimpleDateFormat("HH:mm:ss").format(new Date()))
>       })
>     }).start()
>     TimeUnit.MINUTES.sleep(2)
>     sparkContext.stop()
>   }
> } {code}
>  
> fairscheduler.xml
>  
> {code:java}
> <?xml version="1.0"?>
> <allocations>
>   <pool name="pool1">
>     <schedulingMode>FAIR</schedulingMode>
>     <weight>100</weight>
>     <minShare>0</minShare>
>   </pool>
>   <pool name="pool2">
>     <schedulingMode>FAIR</schedulingMode>
>     <weight>1</weight>
>     <minShare>0</minShare>
>   </pool>
> </allocations>
>  {code}
>  
>  
> input.txt
>  
> {code:java}
> aa bb {code}
>  
>  
>  





[jira] [Resolved] (SPARK-42723) Support parsing data type json "timestamp_ltz" as TimestampType

2023-03-08 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang resolved SPARK-42723.

Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 40345
[https://github.com/apache/spark/pull/40345]

> Support parsing data type json "timestamp_ltz" as TimestampType
> --
>
> Key: SPARK-42723
> URL: https://issues.apache.org/jira/browse/SPARK-42723
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
> Fix For: 3.4.0
>
>







[jira] [Resolved] (SPARK-42703) How to use Fair Scheduler Pools

2023-03-08 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-42703.
--
Resolution: Invalid

> How to use Fair Scheduler Pools
> ---
>
> Key: SPARK-42703
> URL: https://issues.apache.org/jira/browse/SPARK-42703
> Project: Spark
>  Issue Type: Question
>  Components: Scheduler
>Affects Versions: 3.2.3
>Reporter: LiJie2023
>Priority: Major
> Attachments: image-2023-03-08-09-53-35-867.png
>
>
> I have two questions to ask:
>  # I wrote a demo referring to the official website, but it didn't meet my 
> expectations; I don't know if there is a problem with my code. I hope that 
> when I use the following fairscheduler.xml, pool1 always performs tasks 
> before pool2.
>  # What is the relationship between "spark.scheduler.mode" and 
> "{{{}schedulingMode{}}}" in fairscheduler.xml?
>  
>  
> {code:java}
> object MultiJobTest {
>   def main(args: Array[String]): Unit = {
>     val conf = new SparkConf()
>     conf.setAppName("test-pool").setMaster("local[1]")
>     conf.set("spark.scheduler.mode", "FAIR")
>     conf.set("spark.scheduler.allocation.file", "file:///D:/tmp/input/fairscheduler.xml")
>     val sparkContext = new SparkContext(conf)
>     val data: RDD[String] = sparkContext.textFile("file:///D:/tmp/input/input.txt")
>     val rdd = data.flatMap(_.split(","))
>       .map(x => (x(0), x(0)))
>     new Thread(() => {
>       sparkContext.setLocalProperty("spark.scheduler.pool", "pool1")
>       rdd.foreachAsync(x => {
>         println("1==start==" + new SimpleDateFormat("HH:mm:ss").format(new Date()))
>         Thread.sleep(1)
>         println("1==end==" + new SimpleDateFormat("HH:mm:ss").format(new Date()))
>       })
>     }).start()
>     new Thread(() => {
>       sparkContext.setLocalProperty("spark.scheduler.pool", "pool2")
>       rdd.foreachAsync(x => {
>         println("2==start==" + new SimpleDateFormat("HH:mm:ss").format(new Date()))
>         Thread.sleep(1)
>         println("2==end==" + new SimpleDateFormat("HH:mm:ss").format(new Date()))
>       })
>     }).start()
>     TimeUnit.MINUTES.sleep(2)
>     sparkContext.stop()
>   }
> } {code}
>  
> fairscheduler.xml
>  
> {code:java}
> <?xml version="1.0"?>
> <allocations>
>   <pool name="pool1">
>     <schedulingMode>FAIR</schedulingMode>
>     <weight>100</weight>
>     <minShare>0</minShare>
>   </pool>
>   <pool name="pool2">
>     <schedulingMode>FAIR</schedulingMode>
>     <weight>1</weight>
>     <minShare>0</minShare>
>   </pool>
> </allocations>
>  {code}
>  
>  
> input.txt
>  
> {code:java}
> aa bb {code}
>  
>  
>  






[jira] [Updated] (SPARK-42497) Support of pandas API on Spark for Spark Connect

2023-03-08 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-42497:
-
Summary: Support of pandas API on Spark for Spark Connect  (was: Support of 
pandas API on Spark for Spark Connect.)

> Support of pandas API on Spark for Spark Connect
> 
>
> Key: SPARK-42497
> URL: https://issues.apache.org/jira/browse/SPARK-42497
> Project: Spark
>  Issue Type: Umbrella
>  Components: Connect
>Affects Versions: 3.5.0
>Reporter: Haejoon Lee
>Priority: Major
>
> We should enable `pandas API on Spark` on Spark Connect.






[jira] [Commented] (SPARK-42711) build/sbt usage error messages about java-home

2023-03-08 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17698155#comment-17698155
 ] 

Apache Spark commented on SPARK-42711:
--

User 'liang3zy22' has created a pull request for this issue:
https://github.com/apache/spark/pull/40347

> build/sbt usage error messages about java-home
> --
>
> Key: SPARK-42711
> URL: https://issues.apache.org/jira/browse/SPARK-42711
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 3.3.2
>Reporter: Liang Yan
>Priority: Minor
>
> The build/sbt tool's usage information about java-home is wrong:
>   # java version (default: java from PATH, currently $(java -version 2>&1 | 
> grep version))
>   -java-home          alternate JAVA_HOME






[jira] [Assigned] (SPARK-42711) build/sbt usage error messages about java-home

2023-03-08 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42711:


Assignee: (was: Apache Spark)

> build/sbt usage error messages about java-home
> --
>
> Key: SPARK-42711
> URL: https://issues.apache.org/jira/browse/SPARK-42711
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 3.3.2
>Reporter: Liang Yan
>Priority: Minor
>
> The build/sbt tool's usage information about java-home is wrong:
>   # java version (default: java from PATH, currently $(java -version 2>&1 | 
> grep version))
>   -java-home          alternate JAVA_HOME






[jira] [Assigned] (SPARK-42711) build/sbt usage error messages about java-home

2023-03-08 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42711:


Assignee: Apache Spark

> build/sbt usage error messages about java-home
> --
>
> Key: SPARK-42711
> URL: https://issues.apache.org/jira/browse/SPARK-42711
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 3.3.2
>Reporter: Liang Yan
>Assignee: Apache Spark
>Priority: Minor
>
> The build/sbt tool's usage information about java-home is wrong:
>   # java version (default: java from PATH, currently $(java -version 2>&1 | 
> grep version))
>   -java-home          alternate JAVA_HOME






[jira] [Assigned] (SPARK-42724) Upgrade buf to v1.15.1

2023-03-08 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42724:


Assignee: (was: Apache Spark)

> Upgrade buf to v1.15.1
> --
>
> Key: SPARK-42724
> URL: https://issues.apache.org/jira/browse/SPARK-42724
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build, Connect
>Affects Versions: 3.4.1
>Reporter: BingKun Pan
>Priority: Minor
>







[jira] [Commented] (SPARK-42724) Upgrade buf to v1.15.1

2023-03-08 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17698156#comment-17698156
 ] 

Apache Spark commented on SPARK-42724:
--

User 'panbingkun' has created a pull request for this issue:
https://github.com/apache/spark/pull/40348

> Upgrade buf to v1.15.1
> --
>
> Key: SPARK-42724
> URL: https://issues.apache.org/jira/browse/SPARK-42724
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build, Connect
>Affects Versions: 3.4.1
>Reporter: BingKun Pan
>Priority: Minor
>







[jira] [Assigned] (SPARK-42724) Upgrade buf to v1.15.1

2023-03-08 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42724:


Assignee: Apache Spark

> Upgrade buf to v1.15.1
> --
>
> Key: SPARK-42724
> URL: https://issues.apache.org/jira/browse/SPARK-42724
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build, Connect
>Affects Versions: 3.4.1
>Reporter: BingKun Pan
>Assignee: Apache Spark
>Priority: Minor
>







[jira] [Created] (SPARK-42724) Upgrade buf to v1.15.1

2023-03-08 Thread BingKun Pan (Jira)
BingKun Pan created SPARK-42724:
---

 Summary: Upgrade buf to v1.15.1
 Key: SPARK-42724
 URL: https://issues.apache.org/jira/browse/SPARK-42724
 Project: Spark
  Issue Type: Sub-task
  Components: Build, Connect
Affects Versions: 3.4.1
Reporter: BingKun Pan









[jira] [Updated] (SPARK-42643) Register Java (aggregate) user-defined functions

2023-03-08 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng updated SPARK-42643:
-
Parent: SPARK-41661
Issue Type: Sub-task  (was: Improvement)

> Register Java (aggregate) user-defined functions
> 
>
> Key: SPARK-42643
> URL: https://issues.apache.org/jira/browse/SPARK-42643
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Xinrong Meng
>Assignee: Xinrong Meng
>Priority: Major
> Fix For: 3.4.0
>
>
> Implement `spark.udf.registerJavaFunction`.






[jira] [Resolved] (SPARK-42722) Python Connect def schema() should not cache the schema

2023-03-08 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-42722.
--
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 40343
[https://github.com/apache/spark/pull/40343]

> Python Connect def schema() should not cache the schema 
> 
>
> Key: SPARK-42722
> URL: https://issues.apache.org/jira/browse/SPARK-42722
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Rui Wang
>Assignee: Rui Wang
>Priority: Major
> Fix For: 3.4.0
>
>







[jira] [Updated] (SPARK-42480) Improve the performance of drop partitions

2023-03-08 Thread Chao Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated SPARK-42480:
-
Fix Version/s: 3.4.0

> Improve the performance of drop partitions
> --
>
> Key: SPARK-42480
> URL: https://issues.apache.org/jira/browse/SPARK-42480
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.2
>Reporter: Wechar
>Assignee: Wechar
>Priority: Major
> Fix For: 3.4.0, 3.5.0
>
>
> Currently, to drop the matching partitions, Spark first gets all matching 
> Partition objects from the Hive metastore and uses only the partition values 
> of these Partition objects.
> We can get the matching partition names instead of the partition objects, for 
> the following reasons:
> 1. we can also get partition values from a partition name (like a=1/b=2)
> 2. the byte size of a partition name is much smaller than that of a Partition 
> object, which helps improve the performance of dropping partitions.
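A sketch of reason 1: the values are mechanically recoverable from a partition name such as "a=1/b=2" (the helper name is illustrative, and real partition names may also need Hive-style unescaping):

{code:java}
// Split "a=1/b=2" into its column/value pairs.
def partitionValues(partitionName: String): Map[String, String] =
  partitionName.split("/").map { spec =>
    val Array(column, value) = spec.split("=", 2)
    column -> value
  }.toMap

// partitionValues("a=1/b=2") == Map("a" -> "1", "b" -> "2")
{code}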






[jira] [Updated] (SPARK-42480) Improve the performance of drop partitions

2023-03-08 Thread Chao Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated SPARK-42480:
-
Fix Version/s: (was: 3.5.0)

> Improve the performance of drop partitions
> --
>
> Key: SPARK-42480
> URL: https://issues.apache.org/jira/browse/SPARK-42480
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.2
>Reporter: Wechar
>Assignee: Wechar
>Priority: Major
> Fix For: 3.4.0
>
>
> Currently, to drop the matching partitions, Spark first gets all matching 
> Partition objects from the Hive metastore and uses only the partition values 
> of these Partition objects.
> We can get the matching partition names instead of the partition objects, for 
> the following reasons:
> 1. we can also get partition values from a partition name (like a=1/b=2)
> 2. the byte size of a partition name is much smaller than that of a Partition 
> object, which helps improve the performance of dropping partitions.






[jira] [Assigned] (SPARK-42480) Improve the performance of drop partitions

2023-03-08 Thread Chao Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun reassigned SPARK-42480:


Assignee: Wechar

> Improve the performance of drop partitions
> --
>
> Key: SPARK-42480
> URL: https://issues.apache.org/jira/browse/SPARK-42480
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.2
>Reporter: Wechar
>Assignee: Wechar
>Priority: Major
>
> Currently, to drop the matching partitions, Spark first gets all matching 
> Partition objects from the Hive metastore and uses only the partition values 
> of these Partition objects.
> We can get the matching partition names instead of the partition objects, for 
> the following reasons:
> 1. we can also get partition values from a partition name (like a=1/b=2)
> 2. the byte size of a partition name is much smaller than that of a Partition 
> object, which helps improve the performance of dropping partitions.






[jira] [Resolved] (SPARK-42480) Improve the performance of drop partitions

2023-03-08 Thread Chao Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun resolved SPARK-42480.
--
Fix Version/s: 3.5.0
   Resolution: Fixed

Issue resolved by pull request 40069
[https://github.com/apache/spark/pull/40069]

> Improve the performance of drop partitions
> --
>
> Key: SPARK-42480
> URL: https://issues.apache.org/jira/browse/SPARK-42480
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.2
>Reporter: Wechar
>Assignee: Wechar
>Priority: Major
> Fix For: 3.5.0
>
>
> Currently, to drop the matching partitions, Spark first gets all matching 
> Partition objects from the Hive metastore and uses only the partition values 
> of these Partition objects.
> We can get the matching partition names instead of the partition objects, for 
> the following reasons:
> 1. we can also get partition values from a partition name (like a=1/b=2)
> 2. the byte size of a partition name is much smaller than that of a Partition 
> object, which helps improve the performance of dropping partitions.






[jira] [Commented] (SPARK-42667) Spark Connect: newSession API

2023-03-08 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17698119#comment-17698119
 ] 

Apache Spark commented on SPARK-42667:
--

User 'amaliujia' has created a pull request for this issue:
https://github.com/apache/spark/pull/40346

> Spark Connect: newSession API
> -
>
> Key: SPARK-42667
> URL: https://issues.apache.org/jira/browse/SPARK-42667
> Project: Spark
>  Issue Type: Task
>  Components: Connect
>Affects Versions: 3.4.1
>Reporter: Rui Wang
>Assignee: Rui Wang
>Priority: Major
> Fix For: 3.4.1
>
>







[jira] [Commented] (SPARK-42723) Support parsing data type json "timestamp_ltz" as TimestampType

2023-03-08 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17698096#comment-17698096
 ] 

Apache Spark commented on SPARK-42723:
--

User 'gengliangwang' has created a pull request for this issue:
https://github.com/apache/spark/pull/40345

> Support parsing data type json "timestamp_ltz" as TimestampType
> --
>
> Key: SPARK-42723
> URL: https://issues.apache.org/jira/browse/SPARK-42723
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
>







[jira] [Assigned] (SPARK-42723) Support parsing data type json "timestamp_ltz" as TimestampType

2023-03-08 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42723:


Assignee: Apache Spark  (was: Gengliang Wang)

> Support parsing data type json "timestamp_ltz" as TimestampType
> --
>
> Key: SPARK-42723
> URL: https://issues.apache.org/jira/browse/SPARK-42723
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Gengliang Wang
>Assignee: Apache Spark
>Priority: Major
>







[jira] [Assigned] (SPARK-42723) Support parsing data type json "timestamp_ltz" as TimestampType

2023-03-08 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42723:


Assignee: Gengliang Wang  (was: Apache Spark)

> Support parsing data type json "timestamp_ltz" as TimestampType
> --
>
> Key: SPARK-42723
> URL: https://issues.apache.org/jira/browse/SPARK-42723
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
>







[jira] [Commented] (SPARK-42723) Support parsing data type json "timestamp_ltz" as TimestampType

2023-03-08 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17698095#comment-17698095
 ] 

Apache Spark commented on SPARK-42723:
--

User 'gengliangwang' has created a pull request for this issue:
https://github.com/apache/spark/pull/40345

> Support parsing data type json "timestamp_ltz" as TimestampType
> --
>
> Key: SPARK-42723
> URL: https://issues.apache.org/jira/browse/SPARK-42723
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
>







[jira] [Created] (SPARK-42723) Support parsing data type json "timestamp_ltz" as TimestampType

2023-03-08 Thread Gengliang Wang (Jira)
Gengliang Wang created SPARK-42723:
--

 Summary: Support parsing data type json "timestamp_ltz" as 
TimestampType
 Key: SPARK-42723
 URL: https://issues.apache.org/jira/browse/SPARK-42723
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.4.0
Reporter: Gengliang Wang
Assignee: Gengliang Wang
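A sketch of the intended behavior, inferred from the title (an assumption): the JSON form of the data type, "timestamp_ltz", should parse to TimestampType.

{code:java}
import org.apache.spark.sql.types.{DataType, TimestampType}

// DataType.fromJson takes the JSON representation of a type; for primitive
// types that is a quoted name string.
assert(DataType.fromJson("\"timestamp_ltz\"") == TimestampType)
{code}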









[jira] [Commented] (SPARK-42656) Spark Connect Scala Client Shell Script

2023-03-08 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17698081#comment-17698081
 ] 

Apache Spark commented on SPARK-42656:
--

User 'zhenlineo' has created a pull request for this issue:
https://github.com/apache/spark/pull/40344

> Spark Connect Scala Client Shell Script
> ---
>
> Key: SPARK-42656
> URL: https://issues.apache.org/jira/browse/SPARK-42656
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Zhen Li
>Assignee: Zhen Li
>Priority: Major
> Fix For: 3.4.0
>
>
> Add a shell script that runs the Scala client in a Scala REPL, allowing users 
> to connect to Spark Connect.






[jira] [Commented] (SPARK-42722) Python Connect def schema() should not cache the schema

2023-03-08 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17698073#comment-17698073
 ] 

Apache Spark commented on SPARK-42722:
--

User 'amaliujia' has created a pull request for this issue:
https://github.com/apache/spark/pull/40343

> Python Connect def schema() should not cache the schema 
> 
>
> Key: SPARK-42722
> URL: https://issues.apache.org/jira/browse/SPARK-42722
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Rui Wang
>Assignee: Rui Wang
>Priority: Major
>







[jira] [Assigned] (SPARK-42722) Python Connect def schema() should not cache the schema

2023-03-08 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42722:


Assignee: Apache Spark  (was: Rui Wang)

> Python Connect def schema() should not cache the schema 
> 
>
> Key: SPARK-42722
> URL: https://issues.apache.org/jira/browse/SPARK-42722
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Rui Wang
>Assignee: Apache Spark
>Priority: Major
>







[jira] [Commented] (SPARK-42722) Python Connect def schema() should not cache the schema

2023-03-08 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17698072#comment-17698072
 ] 

Apache Spark commented on SPARK-42722:
--

User 'amaliujia' has created a pull request for this issue:
https://github.com/apache/spark/pull/40343

> Python Connect def schema() should not cache the schema 
> 
>
> Key: SPARK-42722
> URL: https://issues.apache.org/jira/browse/SPARK-42722
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Rui Wang
>Assignee: Rui Wang
>Priority: Major
>







[jira] [Assigned] (SPARK-42722) Python Connect def schema() should not cache the schema

2023-03-08 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42722:


Assignee: Rui Wang  (was: Apache Spark)

> Python Connect def schema() should not cache the schema 
> 
>
> Key: SPARK-42722
> URL: https://issues.apache.org/jira/browse/SPARK-42722
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Rui Wang
>Assignee: Rui Wang
>Priority: Major
>







[jira] [Created] (SPARK-42722) Python Connect def schema() should not cache the schema

2023-03-08 Thread Rui Wang (Jira)
Rui Wang created SPARK-42722:


 Summary: Python Connect def schema() should not cache the schema 
 Key: SPARK-42722
 URL: https://issues.apache.org/jira/browse/SPARK-42722
 Project: Spark
  Issue Type: Improvement
  Components: Connect
Affects Versions: 3.4.0
Reporter: Rui Wang
Assignee: Rui Wang









[jira] [Commented] (SPARK-42721) Add an Interceptor to log RPCs in connect-server

2023-03-08 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17698057#comment-17698057
 ] 

Apache Spark commented on SPARK-42721:
--

User 'rangadi' has created a pull request for this issue:
https://github.com/apache/spark/pull/40342

> Add an Interceptor to log RPCs in connect-server
> 
>
> Key: SPARK-42721
> URL: https://issues.apache.org/jira/browse/SPARK-42721
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.5.0
>Reporter: Raghu Angadi
>Priority: Major
> Fix For: 3.5.0
>
>
> It would be useful to be able to log RPCs to the connect server during 
> development. It makes it simpler to see the flow of messages.
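A minimal sketch of such an interceptor (not necessarily the merged implementation), using the standard gRPC ServerInterceptor hook:

{code:java}
import io.grpc.{Metadata, ServerCall, ServerCallHandler, ServerInterceptor}

// Logs the full method name of every incoming RPC, then delegates to the
// real handler unchanged.
class LoggingInterceptor extends ServerInterceptor {
  override def interceptCall[ReqT, RespT](
      call: ServerCall[ReqT, RespT],
      headers: Metadata,
      next: ServerCallHandler[ReqT, RespT]): ServerCall.Listener[ReqT] = {
    println(s"RPC received: ${call.getMethodDescriptor.getFullMethodName}")
    next.startCall(call, headers)
  }
}
{code}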






[jira] [Assigned] (SPARK-42721) Add an Interceptor to log RPCs in connect-server

2023-03-08 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42721:


Assignee: (was: Apache Spark)

> Add an Interceptor to log RPCs in connect-server
> 
>
> Key: SPARK-42721
> URL: https://issues.apache.org/jira/browse/SPARK-42721
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.5.0
>Reporter: Raghu Angadi
>Priority: Major
> Fix For: 3.5.0
>
>
> It would be useful to be able to log RPCs to the connect server during 
> development. It makes it simpler to see the flow of messages.






[jira] [Assigned] (SPARK-42721) Add an Interceptor to log RPCs in connect-server

2023-03-08 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42721:


Assignee: Apache Spark

> Add an Interceptor to log RPCs in connect-server
> 
>
> Key: SPARK-42721
> URL: https://issues.apache.org/jira/browse/SPARK-42721
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.5.0
>Reporter: Raghu Angadi
>Assignee: Apache Spark
>Priority: Major
> Fix For: 3.5.0
>
>
> It would be useful to be able to log RPCs to the connect server during 
> development. It makes it simpler to see the flow of messages.






[jira] [Commented] (SPARK-42721) Add an Interceptor to log RPCs in connect-server

2023-03-08 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17698055#comment-17698055
 ] 

Apache Spark commented on SPARK-42721:
--

User 'rangadi' has created a pull request for this issue:
https://github.com/apache/spark/pull/40342

> Add an Interceptor to log RPCs in connect-server
> 
>
> Key: SPARK-42721
> URL: https://issues.apache.org/jira/browse/SPARK-42721
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.5.0
>Reporter: Raghu Angadi
>Priority: Major
> Fix For: 3.5.0
>
>
> It would be useful to be able to log RPCs to the connect server during 
> development. It makes it simpler to see the flow of messages.






[jira] [Created] (SPARK-42721) Add an Interceptor to log RPCs in connect-server

2023-03-08 Thread Raghu Angadi (Jira)
Raghu Angadi created SPARK-42721:


 Summary: Add an Interceptor to log RPCs in connect-server
 Key: SPARK-42721
 URL: https://issues.apache.org/jira/browse/SPARK-42721
 Project: Spark
  Issue Type: Improvement
  Components: Connect
Affects Versions: 3.5.0
Reporter: Raghu Angadi
 Fix For: 3.5.0


It would be useful to be able to log RPCs to the Connect server during 
development. It makes it simpler to see the flow of messages. 
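
As a rough illustration, such logging could be done with a standard gRPC 
ServerInterceptor. The sketch below only logs the RPC method name; the class 
name and the logger setup are assumptions for illustration, not the actual 
patch:

{code:scala}
import io.grpc.{Metadata, ServerCall, ServerCallHandler, ServerInterceptor}
import org.slf4j.LoggerFactory

// Hypothetical sketch: log the fully-qualified method name of every RPC
// that reaches the server, then hand off to the normal handler chain.
class LoggingInterceptor extends ServerInterceptor {
  private val log = LoggerFactory.getLogger(classOf[LoggingInterceptor])

  override def interceptCall[ReqT, RespT](
      call: ServerCall[ReqT, RespT],
      headers: Metadata,
      next: ServerCallHandler[ReqT, RespT]): ServerCall.Listener[ReqT] = {
    log.info(s"RPC received: ${call.getMethodDescriptor.getFullMethodName}")
    next.startCall(call, headers)
  }
}
{code}

An interceptor like this would presumably be registered via 
ServerBuilder.intercept(new LoggingInterceptor()) when the Connect server is 
built; the exact registration point is an assumption here.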



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-42689) Allow ShuffleDriverComponent to declare if shuffle data is reliably stored

2023-03-08 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-42689:
--
Affects Version/s: 3.5.0
   (was: 3.1.0)
   (was: 3.2.0)
   (was: 3.3.0)
   (was: 3.4.0)

> Allow ShuffleDriverComponent to declare if shuffle data is reliably stored
> --
>
> Key: SPARK-42689
> URL: https://issues.apache.org/jira/browse/SPARK-42689
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 3.5.0
>Reporter: Mridul Muralidharan
>Priority: Major
>
> Currently, if an executor node is lost, we assume the shuffle data on 
> that node is also lost. This is not necessarily the case if a 
> shuffle component manages the shuffle data and maintains it reliably (for 
> example, in a distributed filesystem or in a disaggregated shuffle cluster).
> Downstream projects carry patches to Apache Spark to work around this 
> issue; for example, Apache Celeborn has 
> [this|https://github.com/apache/incubator-celeborn/blob/main/assets/spark-patch/RSS_RDA_spark3.patch].
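
A minimal sketch of the kind of declaration this ticket asks for is below. 
The method name supportsReliableStorage() is an assumption for illustration, 
not the final API:

{code:scala}
import java.util.{Collections, Map => JMap}
import org.apache.spark.shuffle.api.ShuffleDriverComponents

// Hypothetical sketch: a driver component for a disaggregated shuffle
// service declaring that its shuffle data survives executor loss.
class ReliableShuffleDriverComponents extends ShuffleDriverComponents {
  override def initializeApplication(): JMap[String, String] =
    Collections.emptyMap()

  override def cleanupApplication(): Unit = ()

  // Assumed capability flag: the scheduler could skip recomputing map
  // stages on executor loss when this returns true.
  def supportsReliableStorage(): Boolean = true
}
{code}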



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-42709) Do not rely on __file__

2023-03-08 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-42709.
---
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 40328
[https://github.com/apache/spark/pull/40328]

> Do not rely on __file__
> ---
>
> Key: SPARK-42709
> URL: https://issues.apache.org/jira/browse/SPARK-42709
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
> Fix For: 3.4.0
>
>
> We have a lot of places using __file__, which is actually optional. We 
> shouldn't rely on it.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42709) Do not rely on __file__

2023-03-08 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-42709:
-

Assignee: Hyukjin Kwon

> Do not rely on __file__
> ---
>
> Key: SPARK-42709
> URL: https://issues.apache.org/jira/browse/SPARK-42709
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
>
> We have a lot of places using __file__, which is actually optional. We 
> shouldn't rely on it.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42715) NegativeArraySizeException by too many datas read from ORC file

2023-03-08 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17697999#comment-17697999
 ] 

Apache Spark commented on SPARK-42715:
--

User 'chong0929' has created a pull request for this issue:
https://github.com/apache/spark/pull/40341

> NegativeArraySizeException by too many datas read from ORC file
> ---
>
> Key: SPARK-42715
> URL: https://issues.apache.org/jira/browse/SPARK-42715
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.2
>Reporter: XiaoLong Wu
>Priority: Minor
>
> Should we provide a friendlier exception message explaining how to avoid 
> this exception? For example, when we catch this exception, we could tell the 
> user to reduce the value of spark.sql.orc.columnarReaderBatchSize.
> In the current version, batch reading of ORC files is done by the function 
> OrcColumnarBatchReader.nextBatch(), which depends on 
> [ORC|https://github.com/apache/orc] (version 1.8.2) to complete the data 
> copy. The relevant ORC code is as follows:
> {code:java}
> private static byte[] commonReadByteArrays(
>     InStream stream, IntegerReader lengths, LongColumnVector scratchlcv,
>     BytesColumnVector result, final int batchSize) throws IOException {
>   // Read lengths
>   scratchlcv.isRepeating = result.isRepeating;
>   scratchlcv.noNulls = result.noNulls;
>   scratchlcv.isNull = result.isNull;  // Notice we are replacing the isNull vector here...
>   lengths.nextVector(scratchlcv, scratchlcv.vector, batchSize);
>   int totalLength = 0;
>   if (!scratchlcv.isRepeating) {
>     for (int i = 0; i < batchSize; i++) {
>       if (!scratchlcv.isNull[i]) {
>         totalLength += (int) scratchlcv.vector[i];
>       }
>     }
>   } else {
>     if (!scratchlcv.isNull[0]) {
>       totalLength = (int) (batchSize * scratchlcv.vector[0]);
>     }
>   }
>   // Read all the strings for this batch
>   byte[] allBytes = new byte[totalLength];
>   int offset = 0;
>   int len = totalLength;
>   while (len > 0) {
>     int bytesRead = stream.read(allBytes, offset, len);
>     if (bytesRead < 0) {
>       throw new EOFException("Can't finish byte read from " + stream);
>     }
>     len -= bytesRead;
>     offset += bytesRead;
>   }
>   return allBytes;
> } {code}
> As shown above, the long length values are summed into totalLength, which is 
> used as the data size. If the total size exceeds Integer.MAX_VALUE, the int 
> conversion overflows to a negative value and the following exception is 
> thrown:
> {code:java}
> Caused by: java.lang.NegativeArraySizeException
>     at org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.commonReadByteArrays(TreeReaderFactory.java:1998)
>     at org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.readOrcByteArrays(TreeReaderFactory.java:2021)
>     at org.apache.orc.impl.TreeReaderFactory$StringDirectTreeReader.nextVector(TreeReaderFactory.java:2119)
>     at org.apache.orc.impl.TreeReaderFactory$StringTreeReader.nextVector(TreeReaderFactory.java:1962)
>     at org.apache.orc.impl.reader.tree.StructBatchReader.readBatchColumn(StructBatchReader.java:65)
>     at org.apache.orc.impl.reader.tree.StructBatchReader.nextBatchForLevel(StructBatchReader.java:100)
>     at org.apache.orc.impl.reader.tree.StructBatchReader.nextBatch(StructBatchReader.java:77)
>     at org.apache.orc.impl.RecordReaderImpl.nextBatch(RecordReaderImpl.java:1371)
>     at org.apache.spark.sql.execution.datasources.orc.OrcColumnarBatchReader.nextBatch(OrcColumnarBatchReader.java:197)
>     at org.apache.spark.sql.execution.datasources.orc.OrcColumnarBatchReader.nextKeyValue(OrcColumnarBatchReader.java:99)
>     at org.apache.spark.sql.execution.datasources.RecordReaderIterator.hasNext(RecordReaderIterator.scala:39)
>     at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:116)
>     at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:274)
>     ... 20 more {code}
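
The overflow itself is easy to reproduce outside ORC; a minimal, 
self-contained illustration (not Spark or ORC code, values chosen only to 
exceed Int.MaxValue):

{code:scala}
// With a large batch size and long strings the summed length exceeds
// Int.MaxValue (2147483647), so the conversion to Int wraps negative.
val batchSize = 4096
val avgLength = 600000L                          // bytes per value
val totalLength = (batchSize * avgLength).toInt  // 2457600000 wraps to -1837367296
// Allocating with a negative size throws java.lang.NegativeArraySizeException:
val allBytes = new Array[Byte](totalLength)
{code}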



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42715) NegativeArraySizeException by too many datas read from ORC file

2023-03-08 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42715:


Assignee: Apache Spark

> NegativeArraySizeException by too many datas read from ORC file
> ---
>
> Key: SPARK-42715
> URL: https://issues.apache.org/jira/browse/SPARK-42715
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.2
>Reporter: XiaoLong Wu
>Assignee: Apache Spark
>Priority: Minor
>
> Should we provide a friendlier exception message explaining how to avoid 
> this exception? For example, when we catch this exception, we could tell the 
> user to reduce the value of spark.sql.orc.columnarReaderBatchSize.
> In the current version, batch reading of ORC files is done by the function 
> OrcColumnarBatchReader.nextBatch(), which depends on 
> [ORC|https://github.com/apache/orc] (version 1.8.2) to complete the data 
> copy. The relevant ORC code is as follows:
> {code:java}
> private static byte[] commonReadByteArrays(
>     InStream stream, IntegerReader lengths, LongColumnVector scratchlcv,
>     BytesColumnVector result, final int batchSize) throws IOException {
>   // Read lengths
>   scratchlcv.isRepeating = result.isRepeating;
>   scratchlcv.noNulls = result.noNulls;
>   scratchlcv.isNull = result.isNull;  // Notice we are replacing the isNull vector here...
>   lengths.nextVector(scratchlcv, scratchlcv.vector, batchSize);
>   int totalLength = 0;
>   if (!scratchlcv.isRepeating) {
>     for (int i = 0; i < batchSize; i++) {
>       if (!scratchlcv.isNull[i]) {
>         totalLength += (int) scratchlcv.vector[i];
>       }
>     }
>   } else {
>     if (!scratchlcv.isNull[0]) {
>       totalLength = (int) (batchSize * scratchlcv.vector[0]);
>     }
>   }
>   // Read all the strings for this batch
>   byte[] allBytes = new byte[totalLength];
>   int offset = 0;
>   int len = totalLength;
>   while (len > 0) {
>     int bytesRead = stream.read(allBytes, offset, len);
>     if (bytesRead < 0) {
>       throw new EOFException("Can't finish byte read from " + stream);
>     }
>     len -= bytesRead;
>     offset += bytesRead;
>   }
>   return allBytes;
> } {code}
> As shown above, the long length values are summed into totalLength, which is 
> used as the data size. If the total size exceeds Integer.MAX_VALUE, the int 
> conversion overflows to a negative value and the following exception is 
> thrown:
> {code:java}
> Caused by: java.lang.NegativeArraySizeException
>     at org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.commonReadByteArrays(TreeReaderFactory.java:1998)
>     at org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.readOrcByteArrays(TreeReaderFactory.java:2021)
>     at org.apache.orc.impl.TreeReaderFactory$StringDirectTreeReader.nextVector(TreeReaderFactory.java:2119)
>     at org.apache.orc.impl.TreeReaderFactory$StringTreeReader.nextVector(TreeReaderFactory.java:1962)
>     at org.apache.orc.impl.reader.tree.StructBatchReader.readBatchColumn(StructBatchReader.java:65)
>     at org.apache.orc.impl.reader.tree.StructBatchReader.nextBatchForLevel(StructBatchReader.java:100)
>     at org.apache.orc.impl.reader.tree.StructBatchReader.nextBatch(StructBatchReader.java:77)
>     at org.apache.orc.impl.RecordReaderImpl.nextBatch(RecordReaderImpl.java:1371)
>     at org.apache.spark.sql.execution.datasources.orc.OrcColumnarBatchReader.nextBatch(OrcColumnarBatchReader.java:197)
>     at org.apache.spark.sql.execution.datasources.orc.OrcColumnarBatchReader.nextKeyValue(OrcColumnarBatchReader.java:99)
>     at org.apache.spark.sql.execution.datasources.RecordReaderIterator.hasNext(RecordReaderIterator.scala:39)
>     at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:116)
>     at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:274)
>     ... 20 more {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42715) NegativeArraySizeException by too many datas read from ORC file

2023-03-08 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42715:


Assignee: (was: Apache Spark)

> NegativeArraySizeException by too many datas read from ORC file
> ---
>
> Key: SPARK-42715
> URL: https://issues.apache.org/jira/browse/SPARK-42715
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.2
>Reporter: XiaoLong Wu
>Priority: Minor
>
> Should we provide a friendlier exception message explaining how to avoid 
> this exception? For example, when we catch this exception, we could tell the 
> user to reduce the value of spark.sql.orc.columnarReaderBatchSize.
> In the current version, batch reading of ORC files is done by the function 
> OrcColumnarBatchReader.nextBatch(), which depends on 
> [ORC|https://github.com/apache/orc] (version 1.8.2) to complete the data 
> copy. The relevant ORC code is as follows:
> {code:java}
> private static byte[] commonReadByteArrays(
>     InStream stream, IntegerReader lengths, LongColumnVector scratchlcv,
>     BytesColumnVector result, final int batchSize) throws IOException {
>   // Read lengths
>   scratchlcv.isRepeating = result.isRepeating;
>   scratchlcv.noNulls = result.noNulls;
>   scratchlcv.isNull = result.isNull;  // Notice we are replacing the isNull vector here...
>   lengths.nextVector(scratchlcv, scratchlcv.vector, batchSize);
>   int totalLength = 0;
>   if (!scratchlcv.isRepeating) {
>     for (int i = 0; i < batchSize; i++) {
>       if (!scratchlcv.isNull[i]) {
>         totalLength += (int) scratchlcv.vector[i];
>       }
>     }
>   } else {
>     if (!scratchlcv.isNull[0]) {
>       totalLength = (int) (batchSize * scratchlcv.vector[0]);
>     }
>   }
>   // Read all the strings for this batch
>   byte[] allBytes = new byte[totalLength];
>   int offset = 0;
>   int len = totalLength;
>   while (len > 0) {
>     int bytesRead = stream.read(allBytes, offset, len);
>     if (bytesRead < 0) {
>       throw new EOFException("Can't finish byte read from " + stream);
>     }
>     len -= bytesRead;
>     offset += bytesRead;
>   }
>   return allBytes;
> } {code}
> As shown above, the long length values are summed into totalLength, which is 
> used as the data size. If the total size exceeds Integer.MAX_VALUE, the int 
> conversion overflows to a negative value and the following exception is 
> thrown:
> {code:java}
> Caused by: java.lang.NegativeArraySizeException
>     at org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.commonReadByteArrays(TreeReaderFactory.java:1998)
>     at org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.readOrcByteArrays(TreeReaderFactory.java:2021)
>     at org.apache.orc.impl.TreeReaderFactory$StringDirectTreeReader.nextVector(TreeReaderFactory.java:2119)
>     at org.apache.orc.impl.TreeReaderFactory$StringTreeReader.nextVector(TreeReaderFactory.java:1962)
>     at org.apache.orc.impl.reader.tree.StructBatchReader.readBatchColumn(StructBatchReader.java:65)
>     at org.apache.orc.impl.reader.tree.StructBatchReader.nextBatchForLevel(StructBatchReader.java:100)
>     at org.apache.orc.impl.reader.tree.StructBatchReader.nextBatch(StructBatchReader.java:77)
>     at org.apache.orc.impl.RecordReaderImpl.nextBatch(RecordReaderImpl.java:1371)
>     at org.apache.spark.sql.execution.datasources.orc.OrcColumnarBatchReader.nextBatch(OrcColumnarBatchReader.java:197)
>     at org.apache.spark.sql.execution.datasources.orc.OrcColumnarBatchReader.nextKeyValue(OrcColumnarBatchReader.java:99)
>     at org.apache.spark.sql.execution.datasources.RecordReaderIterator.hasNext(RecordReaderIterator.scala:39)
>     at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:116)
>     at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:274)
>     ... 20 more {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-42720) Refactor the withSequenceColumn

2023-03-08 Thread Haejoon Lee (Jira)
Haejoon Lee created SPARK-42720:
---

 Summary: Refactor the withSequenceColumn
 Key: SPARK-42720
 URL: https://issues.apache.org/jira/browse/SPARK-42720
 Project: Spark
  Issue Type: Sub-task
  Components: Pandas API on Spark
Affects Versions: 3.5.0
Reporter: Haejoon Lee






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42701) Add the try_aes_decrypt() function

2023-03-08 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42701:


Assignee: Max Gekk  (was: Apache Spark)

> Add the try_aes_decrypt() function
> --
>
> Key: SPARK-42701
> URL: https://issues.apache.org/jira/browse/SPARK-42701
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Max Gekk
>Assignee: Max Gekk
>Priority: Major
>  Labels: starter
>
> Add a new function try_aes_decrypt(). The function aes_decrypt() fails with 
> an exception when it encounters a column value that it cannot decrypt. So, 
> if a column contains both bad and good input, it is impossible to decrypt 
> even the good input.
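
Assuming try_aes_decrypt() mirrors aes_decrypt()'s signature and returns NULL 
where aes_decrypt() would throw, usage could look like the sketch below (the 
key and inline data are made up for illustration):

{code:scala}
// Mixed good and bad ciphertext in one column: rows that cannot be
// decrypted should yield NULL instead of failing the whole query.
// Assumes an active SparkSession named `spark`.
spark.sql("""
  SELECT try_aes_decrypt(value, '0000111122223333') AS decrypted
  FROM VALUES
    (aes_encrypt('Spark', '0000111122223333')),
    (CAST('not-a-ciphertext' AS BINARY))
  AS t(value)
""").show()
{code}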



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42701) Add the try_aes_decrypt() function

2023-03-08 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42701:


Assignee: Apache Spark  (was: Max Gekk)

> Add the try_aes_decrypt() function
> --
>
> Key: SPARK-42701
> URL: https://issues.apache.org/jira/browse/SPARK-42701
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Max Gekk
>Assignee: Apache Spark
>Priority: Major
>  Labels: starter
>
> Add a new function try_aes_decrypt(). The function aes_decrypt() fails with 
> an exception when it encounters a column value that it cannot decrypt. So, 
> if a column contains both bad and good input, it is impossible to decrypt 
> even the good input.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42701) Add the try_aes_decrypt() function

2023-03-08 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17697923#comment-17697923
 ] 

Apache Spark commented on SPARK-42701:
--

User 'MaxGekk' has created a pull request for this issue:
https://github.com/apache/spark/pull/40340

> Add the try_aes_decrypt() function
> --
>
> Key: SPARK-42701
> URL: https://issues.apache.org/jira/browse/SPARK-42701
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Max Gekk
>Assignee: Max Gekk
>Priority: Major
>  Labels: starter
>
> Add a new function try_aes_decrypt(). The function aes_decrypt() fails with 
> an exception when it encounters a column value that it cannot decrypt. So, 
> if a column contains both bad and good input, it is impossible to decrypt 
> even the good input.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42701) Add the try_aes_decrypt() function

2023-03-08 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk reassigned SPARK-42701:


Assignee: Max Gekk

> Add the try_aes_decrypt() function
> --
>
> Key: SPARK-42701
> URL: https://issues.apache.org/jira/browse/SPARK-42701
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Max Gekk
>Assignee: Max Gekk
>Priority: Major
>  Labels: starter
>
> Add a new function try_aes_decrypt(). The function aes_decrypt() fails with 
> an exception when it encounters a column value that it cannot decrypt. So, 
> if a column contains both bad and good input, it is impossible to decrypt 
> even the good input.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42701) Add the try_aes_decrypt() function

2023-03-08 Thread Max Gekk (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17697903#comment-17697903
 ] 

Max Gekk commented on SPARK-42701:
--

I am working on this.

> Add the try_aes_decrypt() function
> --
>
> Key: SPARK-42701
> URL: https://issues.apache.org/jira/browse/SPARK-42701
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Max Gekk
>Assignee: Max Gekk
>Priority: Major
>  Labels: starter
>
> Add a new function try_aes_decrypt(). The function aes_decrypt() fails with 
> an exception when it encounters a column value that it cannot decrypt. So, 
> if a column contains both bad and good input, it is impossible to decrypt 
> even the good input.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42719) `MapOutputTracker#getMapLocation` should respect `spark.shuffle.reduceLocality.enabled`

2023-03-08 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42719:


Assignee: (was: Apache Spark)

> `MapOutputTracker#getMapLocation` should respect  
> `spark.shuffle.reduceLocality.enabled`
> 
>
> Key: SPARK-42719
> URL: https://issues.apache.org/jira/browse/SPARK-42719
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.5.0
>Reporter: He Qi
>Priority: Major
>
> Discussed at [https://github.com/apache/spark/pull/40307].
> Conceptually, {{getPreferredLocations}} in {{ShuffledRowRDD}} should return 
> {{Nil}} right away when {{spark.shuffle.reduceLocality.enabled = false}}.
> This logic is pushed into MapOutputTracker, though - and while 
> {{getPreferredLocationsForShuffle}} honors 
> {{spark.shuffle.reduceLocality.enabled}}, {{getMapLocation}} does not.
> So the fix is to make {{getMapLocation}} honor the parameter.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42719) `MapOutputTracker#getMapLocation` should respect `spark.shuffle.reduceLocality.enabled`

2023-03-08 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17697891#comment-17697891
 ] 

Apache Spark commented on SPARK-42719:
--

User 'jerqi' has created a pull request for this issue:
https://github.com/apache/spark/pull/40339

> `MapOutputTracker#getMapLocation` should respect  
> `spark.shuffle.reduceLocality.enabled`
> 
>
> Key: SPARK-42719
> URL: https://issues.apache.org/jira/browse/SPARK-42719
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.5.0
>Reporter: He Qi
>Priority: Major
>
> Discussed at [https://github.com/apache/spark/pull/40307].
> Conceptually, {{getPreferredLocations}} in {{ShuffledRowRDD}} should return 
> {{Nil}} right away when {{spark.shuffle.reduceLocality.enabled = false}}.
> This logic is pushed into MapOutputTracker, though - and while 
> {{getPreferredLocationsForShuffle}} honors 
> {{spark.shuffle.reduceLocality.enabled}}, {{getMapLocation}} does not.
> So the fix is to make {{getMapLocation}} honor the parameter.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42719) `MapOutputTracker#getMapLocation` should respect `spark.shuffle.reduceLocality.enabled`

2023-03-08 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42719:


Assignee: Apache Spark

> `MapOutputTracker#getMapLocation` should respect  
> `spark.shuffle.reduceLocality.enabled`
> 
>
> Key: SPARK-42719
> URL: https://issues.apache.org/jira/browse/SPARK-42719
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.5.0
>Reporter: He Qi
>Assignee: Apache Spark
>Priority: Major
>
> Discussed at [https://github.com/apache/spark/pull/40307].
> Conceptually, {{getPreferredLocations}} in {{ShuffledRowRDD}} should return 
> {{Nil}} right away when {{spark.shuffle.reduceLocality.enabled = false}}.
> This logic is pushed into MapOutputTracker, though - and while 
> {{getPreferredLocationsForShuffle}} honors 
> {{spark.shuffle.reduceLocality.enabled}}, {{getMapLocation}} does not.
> So the fix is to make {{getMapLocation}} honor the parameter.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42684) v2 catalog should not allow column default value by default

2023-03-08 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-42684:
---

Assignee: Wenchen Fan

> v2 catalog should not allow column default value by default
> ---
>
> Key: SPARK-42684
> URL: https://issues.apache.org/jira/browse/SPARK-42684
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Wenchen Fan
>Assignee: Wenchen Fan
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-42684) v2 catalog should not allow column default value by default

2023-03-08 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-42684.
-
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 40299
[https://github.com/apache/spark/pull/40299]

> v2 catalog should not allow column default value by default
> ---
>
> Key: SPARK-42684
> URL: https://issues.apache.org/jira/browse/SPARK-42684
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Wenchen Fan
>Assignee: Wenchen Fan
>Priority: Major
> Fix For: 3.4.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-42719) `MapOutputTracker#getMapLocation` should respect `spark.shuffle.reduceLocality.enabled`

2023-03-08 Thread He Qi (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Qi updated SPARK-42719:
--
Summary: `MapOutputTracker#getMapLocation` should respect  
`spark.shuffle.reduceLocality.enabled`  (was: 
`MapOutputTracker#getPreferredLocations` should respect  
`spark.shuffle.reduceLocality.enabled`)

> `MapOutputTracker#getMapLocation` should respect  
> `spark.shuffle.reduceLocality.enabled`
> 
>
> Key: SPARK-42719
> URL: https://issues.apache.org/jira/browse/SPARK-42719
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.5.0
>Reporter: He Qi
>Priority: Major
>
> Discussed at [https://github.com/apache/spark/pull/40307].
> Conceptually, {{getPreferredLocations}} in {{ShuffledRowRDD}} should return 
> {{Nil}} right away when {{spark.shuffle.reduceLocality.enabled = false}}.
> This logic is pushed into MapOutputTracker, though - and while 
> {{getPreferredLocationsForShuffle}} honors 
> {{spark.shuffle.reduceLocality.enabled}}, {{getMapLocation}} does not.
> So the fix is to make {{getMapLocation}} honor the parameter.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-42719) `Map#getPreferredLocations` should respect `spark.shuffle.reduceLocality.enabled`

2023-03-08 Thread He Qi (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Qi updated SPARK-42719:
--
Summary: `Map#getPreferredLocations` should respect  
`spark.shuffle.reduceLocality.enabled`  (was: 
`ShuffledRowRdd#getPreferredLocations` should respect to 
`spark.shuffle.reduceLocality.enabled`)

> `Map#getPreferredLocations` should respect  
> `spark.shuffle.reduceLocality.enabled`
> --
>
> Key: SPARK-42719
> URL: https://issues.apache.org/jira/browse/SPARK-42719
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.5.0
>Reporter: He Qi
>Priority: Major
>
> Discussed at [https://github.com/apache/spark/pull/40307].
> Conceptually, {{getPreferredLocations}} in {{ShuffledRowRDD}} should return 
> {{Nil}} right away when {{spark.shuffle.reduceLocality.enabled = false}}.
> This logic is pushed into MapOutputTracker, though - and while 
> {{getPreferredLocationsForShuffle}} honors 
> {{spark.shuffle.reduceLocality.enabled}}, {{getMapLocation}} does not.
> So the fix is to make {{getMapLocation}} honor the parameter.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-42719) `MapOutputTracker#getPreferredLocations` should respect `spark.shuffle.reduceLocality.enabled`

2023-03-08 Thread He Qi (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Qi updated SPARK-42719:
--
Summary: `MapOutputTracker#getPreferredLocations` should respect  
`spark.shuffle.reduceLocality.enabled`  (was: `Map#getPreferredLocations` 
should respect  `spark.shuffle.reduceLocality.enabled`)

> `MapOutputTracker#getPreferredLocations` should respect  
> `spark.shuffle.reduceLocality.enabled`
> ---
>
> Key: SPARK-42719
> URL: https://issues.apache.org/jira/browse/SPARK-42719
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.5.0
>Reporter: He Qi
>Priority: Major
>
> Discussed at [https://github.com/apache/spark/pull/40307].
> Conceptually, {{getPreferredLocations}} in {{ShuffledRowRDD}} should return 
> {{Nil}} right away when {{spark.shuffle.reduceLocality.enabled = false}}.
> This logic is pushed into MapOutputTracker, though - and while 
> {{getPreferredLocationsForShuffle}} honors 
> {{spark.shuffle.reduceLocality.enabled}}, {{getMapLocation}} does not.
> So the fix is to make {{getMapLocation}} honor the parameter.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42718) Upgrade rocksdbjni to 7.10.2

2023-03-08 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42718:


Assignee: Apache Spark

> Upgrade rocksdbjni to 7.10.2
> 
>
> Key: SPARK-42718
> URL: https://issues.apache.org/jira/browse/SPARK-42718
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.5.0
>Reporter: Yang Jie
>Assignee: Apache Spark
>Priority: Major
>
> https://github.com/facebook/rocksdb/releases/tag/v7.10.2



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42718) Upgrade rocksdbjni to 7.10.2

2023-03-08 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42718:


Assignee: (was: Apache Spark)

> Upgrade rocksdbjni to 7.10.2
> 
>
> Key: SPARK-42718
> URL: https://issues.apache.org/jira/browse/SPARK-42718
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.5.0
>Reporter: Yang Jie
>Priority: Major
>
> https://github.com/facebook/rocksdb/releases/tag/v7.10.2



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42718) Upgrade rocksdbjni to 7.10.2

2023-03-08 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17697875#comment-17697875
 ] 

Apache Spark commented on SPARK-42718:
--

User 'LuciferYang' has created a pull request for this issue:
https://github.com/apache/spark/pull/40337

> Upgrade rocksdbjni to 7.10.2
> 
>
> Key: SPARK-42718
> URL: https://issues.apache.org/jira/browse/SPARK-42718
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.5.0
>Reporter: Yang Jie
>Priority: Major
>
> https://github.com/facebook/rocksdb/releases/tag/v7.10.2



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-42719) `ShuffledRowRdd#getPreferredLocations` should respect to `spark.shuffle.reduceLocality.enabled`

2023-03-08 Thread He Qi (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Qi updated SPARK-42719:
--
Description: 
Discussed at [https://github.com/apache/spark/pull/40307].

Conceptually, {{getPreferredLocations}} in {{ShuffledRowRDD}} should return 
{{Nil}} right away when {{spark.shuffle.reduceLocality.enabled = false}}.

This logic is pushed into MapOutputTracker, though - and while 
{{getPreferredLocationsForShuffle}} honors 
{{spark.shuffle.reduceLocality.enabled}}, {{getMapLocation}} does not.

So the fix is to make {{getMapLocation}} honor the parameter.
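
A minimal sketch of that guard, with simplified, hypothetical names (the real 
method lives on the MapOutputTracker and would read the flag from SparkConf):

{code:scala}
// Mirror getPreferredLocationsForShuffle: return no locations when
// reduce-side locality is disabled, instead of computing map locations.
def getMapLocation(
    reduceLocalityEnabled: Boolean,
    computeLocations: () => Seq[String]): Seq[String] = {
  if (!reduceLocalityEnabled) Nil else computeLocations()
}
{code}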

> `ShuffledRowRdd#getPreferredLocations` should respect to 
> `spark.shuffle.reduceLocality.enabled`
> ---
>
> Key: SPARK-42719
> URL: https://issues.apache.org/jira/browse/SPARK-42719
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.5.0
>Reporter: He Qi
>Priority: Major
>
> Discussed at [https://github.com/apache/spark/pull/40307].
> Conceptually, {{getPreferredLocations}} in {{ShuffledRowRDD}} should return 
> {{Nil}} right away when {{spark.shuffle.reduceLocality.enabled = false}}.
> This logic is pushed into MapOutputTracker, though - and while 
> {{getPreferredLocationsForShuffle}} honors 
> {{spark.shuffle.reduceLocality.enabled}}, {{getMapLocation}} does not.
> So the fix is to make {{getMapLocation}} honor the parameter.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-42719) `ShuffledRowRdd#getPreferredLocations` should respect to `spark.shuffle.reduceLocality.enabled`

2023-03-08 Thread He Qi (Jira)
He Qi created SPARK-42719:
-

 Summary: `ShuffledRowRdd#getPreferredLocations` should respect to 
`spark.shuffle.reduceLocality.enabled`
 Key: SPARK-42719
 URL: https://issues.apache.org/jira/browse/SPARK-42719
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 3.5.0
Reporter: He Qi






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-42706) Document the Spark SQL error classes in user-facing documentation.

2023-03-08 Thread Haejoon Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haejoon Lee updated SPARK-42706:

Summary: Document the Spark SQL error classes in user-facing documentation. 
 (was: List the error class to user-facing documentation.)

> Document the Spark SQL error classes in user-facing documentation.
> --
>
> Key: SPARK-42706
> URL: https://issues.apache.org/jira/browse/SPARK-42706
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, SQL
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Priority: Major
>
> We need to add a list of error classes to the user-facing documentation.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42706) List the error class to user-facing documentation.

2023-03-08 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42706:


Assignee: (was: Apache Spark)

> List the error class to user-facing documentation.
> --
>
> Key: SPARK-42706
> URL: https://issues.apache.org/jira/browse/SPARK-42706
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, SQL
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Priority: Major
>
> We need to add a list of error classes to the user-facing documentation.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42706) List the error class to user-facing documentation.

2023-03-08 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42706:


Assignee: Apache Spark

> List the error class to user-facing documentation.
> --
>
> Key: SPARK-42706
> URL: https://issues.apache.org/jira/browse/SPARK-42706
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, SQL
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Assignee: Apache Spark
>Priority: Major
>
> We need to add a list of error classes to the user-facing documentation.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42706) List the error class to user-facing documentation.

2023-03-08 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17697863#comment-17697863
 ] 

Apache Spark commented on SPARK-42706:
--

User 'itholic' has created a pull request for this issue:
https://github.com/apache/spark/pull/40336

> List the error class to user-facing documentation.
> --
>
> Key: SPARK-42706
> URL: https://issues.apache.org/jira/browse/SPARK-42706
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, SQL
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Priority: Major
>
> We need to add a list of error classes to the user-facing documentation.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-42718) Upgrade rocksdbjni to 7.10.2

2023-03-08 Thread Yang Jie (Jira)
Yang Jie created SPARK-42718:


 Summary: Upgrade rocksdbjni to 7.10.2
 Key: SPARK-42718
 URL: https://issues.apache.org/jira/browse/SPARK-42718
 Project: Spark
  Issue Type: Improvement
  Components: Build
Affects Versions: 3.5.0
Reporter: Yang Jie


https://github.com/facebook/rocksdb/releases/tag/v7.10.2



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42717) Upgrade mysql-connector-java from 8.0.31 to 8.0.32

2023-03-08 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17697850#comment-17697850
 ] 

Apache Spark commented on SPARK-42717:
--

User 'panbingkun' has created a pull request for this issue:
https://github.com/apache/spark/pull/40335

> Upgrade mysql-connector-java from 8.0.31 to 8.0.32
> --
>
> Key: SPARK-42717
> URL: https://issues.apache.org/jira/browse/SPARK-42717
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.5.0
>Reporter: BingKun Pan
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42717) Upgrade mysql-connector-java from 8.0.31 to 8.0.32

2023-03-08 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17697851#comment-17697851
 ] 

Apache Spark commented on SPARK-42717:
--

User 'panbingkun' has created a pull request for this issue:
https://github.com/apache/spark/pull/40335

> Upgrade mysql-connector-java from 8.0.31 to 8.0.32
> --
>
> Key: SPARK-42717
> URL: https://issues.apache.org/jira/browse/SPARK-42717
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.5.0
>Reporter: BingKun Pan
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42717) Upgrade mysql-connector-java from 8.0.31 to 8.0.32

2023-03-08 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42717:


Assignee: Apache Spark

> Upgrade mysql-connector-java from 8.0.31 to 8.0.32
> --
>
> Key: SPARK-42717
> URL: https://issues.apache.org/jira/browse/SPARK-42717
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.5.0
>Reporter: BingKun Pan
>Assignee: Apache Spark
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42717) Upgrade mysql-connector-java from 8.0.31 to 8.0.32

2023-03-08 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42717:


Assignee: (was: Apache Spark)

> Upgrade mysql-connector-java from 8.0.31 to 8.0.32
> --
>
> Key: SPARK-42717
> URL: https://issues.apache.org/jira/browse/SPARK-42717
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.5.0
>Reporter: BingKun Pan
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-42717) Upgrade mysql-connector-java from 8.0.31 to 8.0.32

2023-03-08 Thread BingKun Pan (Jira)
BingKun Pan created SPARK-42717:
---

 Summary: Upgrade mysql-connector-java from 8.0.31 to 8.0.32
 Key: SPARK-42717
 URL: https://issues.apache.org/jira/browse/SPARK-42717
 Project: Spark
  Issue Type: Improvement
  Components: Build
Affects Versions: 3.5.0
Reporter: BingKun Pan






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42716) DataSourceV2 cannot report KeyGroupedPartitioning with multiple keys per partition

2023-03-08 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17697847#comment-17697847
 ] 

Apache Spark commented on SPARK-42716:
--

User 'EnricoMi' has created a pull request for this issue:
https://github.com/apache/spark/pull/40334

> DataSourceV2 cannot report KeyGroupedPartitioning with multiple keys per 
> partition
> --
>
> Key: SPARK-42716
> URL: https://issues.apache.org/jira/browse/SPARK-42716
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.0, 3.3.1, 3.3.2, 3.4.0, 3.4.1
>Reporter: Enrico Minack
>Priority: Major
>
> From Spark 3.0.0 until 3.2.3, a DataSourceV2 could report its partitioning 
> as {{KeyGroupedPartitioning}} via {{SupportsReportPartitioning}}, even if 
> multiple keys belong to a partition.
> Since SPARK-37377, the partition information reported through 
> {{SupportsReportPartitioning}} is considered by Catalyst only if all 
> partitions implement {{HasPartitionKey}}. But this limits the number of 
> keys per partition to 1.
> Spark should continue to support the more general situation of 
> {{KeyGroupedPartitioning}} with multiple keys per partition, like 
> {{HashPartitioning}}.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42716) DataSourceV2 cannot report KeyGroupedPartitioning with multiple keys per partition

2023-03-08 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42716:


Assignee: (was: Apache Spark)

> DataSourceV2 cannot report KeyGroupedPartitioning with multiple keys per 
> partition
> --
>
> Key: SPARK-42716
> URL: https://issues.apache.org/jira/browse/SPARK-42716
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.0, 3.3.1, 3.3.2, 3.4.0, 3.4.1
>Reporter: Enrico Minack
>Priority: Major
>
> From Spark 3.0.0 until 3.2.3, a DataSourceV2 could report its partitioning 
> as {{KeyGroupedPartitioning}} via {{SupportsReportPartitioning}}, even if 
> multiple keys belong to a partition.
> Since SPARK-37377, the partition information reported through 
> {{SupportsReportPartitioning}} is considered by Catalyst only if all 
> partitions implement {{HasPartitionKey}}. But this limits the number of 
> keys per partition to 1.
> Spark should continue to support the more general situation of 
> {{KeyGroupedPartitioning}} with multiple keys per partition, like 
> {{HashPartitioning}}.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42716) DataSourceV2 cannot report KeyGroupedPartitioning with multiple keys per partition

2023-03-08 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42716:


Assignee: Apache Spark

> DataSourceV2 cannot report KeyGroupedPartitioning with multiple keys per 
> partition
> --
>
> Key: SPARK-42716
> URL: https://issues.apache.org/jira/browse/SPARK-42716
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.0, 3.3.1, 3.3.2, 3.4.0, 3.4.1
>Reporter: Enrico Minack
>Assignee: Apache Spark
>Priority: Major
>
> From Spark 3.0.0 until 3.2.3, a DataSourceV2 could report its partitioning 
> as {{KeyGroupedPartitioning}} via {{SupportsReportPartitioning}}, even if 
> multiple keys belong to a partition.
> Since SPARK-37377, the partition information reported through 
> {{SupportsReportPartitioning}} is considered by Catalyst only if all 
> partitions implement {{HasPartitionKey}}. But this limits the number of 
> keys per partition to 1.
> Spark should continue to support the more general situation of 
> {{KeyGroupedPartitioning}} with multiple keys per partition, like 
> {{HashPartitioning}}.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42716) DataSourceV2 cannot report KeyGroupedPartitioning with multiple keys per partition

2023-03-08 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17697845#comment-17697845
 ] 

Apache Spark commented on SPARK-42716:
--

User 'EnricoMi' has created a pull request for this issue:
https://github.com/apache/spark/pull/40334

> DataSourceV2 cannot report KeyGroupedPartitioning with multiple keys per 
> partition
> --
>
> Key: SPARK-42716
> URL: https://issues.apache.org/jira/browse/SPARK-42716
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.0, 3.3.1, 3.3.2, 3.4.0, 3.4.1
>Reporter: Enrico Minack
>Priority: Major
>
> From Spark 3.0.0 until 3.2.3, a DataSourceV2 could report its partitioning 
> as {{KeyGroupedPartitioning}} via {{SupportsReportPartitioning}}, even if 
> multiple keys belong to a partition.
> Since SPARK-37377, the partition information reported through 
> {{SupportsReportPartitioning}} is considered by Catalyst only if all 
> partitions implement {{HasPartitionKey}}. But this limits the number of 
> keys per partition to 1.
> Spark should continue to support the more general situation of 
> {{KeyGroupedPartitioning}} with multiple keys per partition, like 
> {{HashPartitioning}}.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-42716) DataSourceV2 cannot report KeyGroupedPartitioning with multiple keys per partition

2023-03-08 Thread Enrico Minack (Jira)
Enrico Minack created SPARK-42716:
-

 Summary: DataSourceV2 cannot report KeyGroupedPartitioning with 
multiple keys per partition
 Key: SPARK-42716
 URL: https://issues.apache.org/jira/browse/SPARK-42716
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.3.2, 3.3.1, 3.3.0, 3.4.0, 3.4.1
Reporter: Enrico Minack


From Spark 3.0.0 until 3.2.3, a DataSourceV2 could report its partitioning as 
{{KeyGroupedPartitioning}} via {{SupportsReportPartitioning}}, even if 
multiple keys belong to a partition.

Since SPARK-37377, the partition information reported through 
{{SupportsReportPartitioning}} is considered by Catalyst only if all 
partitions implement {{HasPartitionKey}}. But this limits the number of keys 
per partition to 1.

Spark should continue to support the more general situation of 
{{KeyGroupedPartitioning}} with multiple keys per partition, like 
{{HashPartitioning}}.
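
For reference, a scan reports this partitioning roughly as follows - a sketch 
against the public connector API, where the trait name MyScan and the column 
and partition count are illustrative:

{code:scala}
import org.apache.spark.sql.connector.expressions.{Expression, Expressions}
import org.apache.spark.sql.connector.read.partitioning.{KeyGroupedPartitioning, Partitioning}
import org.apache.spark.sql.connector.read.{Scan, SupportsReportPartitioning}

// Hypothetical sketch: a scan advertising that its 10 input partitions
// are grouped by column "id"; several key values may share one partition.
trait MyScan extends Scan with SupportsReportPartitioning {
  override def outputPartitioning(): Partitioning =
    new KeyGroupedPartitioning(Array[Expression](Expressions.identity("id")), 10)
}
{code}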



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-42713) Add '__getattr__' and '__getitem__' of DataFrame and Column to API reference

2023-03-08 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-42713.
--
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 40331
[https://github.com/apache/spark/pull/40331]

> Add '__getattr__' and '__getitem__' of DataFrame and Column to API reference
> 
>
> Key: SPARK-42713
> URL: https://issues.apache.org/jira/browse/SPARK-42713
> Project: Spark
>  Issue Type: Documentation
>  Components: Documentation, PySpark
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Minor
> Fix For: 3.4.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42713) Add '__getattr__' and '__getitem__' of DataFrame and Column to API reference

2023-03-08 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-42713:


Assignee: Ruifeng Zheng

> Add '__getattr__' and '__getitem__' of DataFrame and Column to API reference
> 
>
> Key: SPARK-42713
> URL: https://issues.apache.org/jira/browse/SPARK-42713
> Project: Spark
>  Issue Type: Documentation
>  Components: Documentation, PySpark
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-42266) Local mode should work with IPython

2023-03-08 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-42266.
--
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 40327
[https://github.com/apache/spark/pull/40327]

> Local mode should work with IPython
> ---
>
> Key: SPARK-42266
> URL: https://issues.apache.org/jira/browse/SPARK-42266
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Assignee: Hyukjin Kwon
>Priority: Major
> Fix For: 3.4.0
>
>
> {code:java}
> (spark_dev) ➜  spark git:(master) bin/pyspark --remote "local[*]"
> Python 3.9.15 (main, Nov 24 2022, 08:28:41) 
> Type 'copyright', 'credits' or 'license' for more information
> IPython 8.9.0 -- An enhanced Interactive Python. Type '?' for help.
> /Users/ruifeng.zheng/Dev/spark/python/pyspark/shell.py:45: UserWarning: 
> Failed to initialize Spark session.
>   warnings.warn("Failed to initialize Spark session.")
> Traceback (most recent call last):
>   File "/Users/ruifeng.zheng/Dev/spark/python/pyspark/shell.py", line 40, in 
> 
> spark = SparkSession.builder.getOrCreate()
>   File "/Users/ruifeng.zheng/Dev/spark/python/pyspark/sql/session.py", line 
> 429, in getOrCreate
> from pyspark.sql.connect.session import SparkSession as RemoteSparkSession
>   File 
> "/Users/ruifeng.zheng/Dev/spark/python/pyspark/sql/connect/__init__.py", line 
> 21, in 
> from pyspark.sql.connect.dataframe import DataFrame  # noqa: F401
>   File 
> "/Users/ruifeng.zheng/Dev/spark/python/pyspark/sql/connect/dataframe.py", 
> line 35, in 
> import pandas
>   File "/Users/ruifeng.zheng/Dev/spark/python/pyspark/pandas/__init__.py", 
> line 29, in 
> from pyspark.pandas.missing.general_functions import 
> MissingPandasLikeGeneralFunctions
>   File "/Users/ruifeng.zheng/Dev/spark/python/pyspark/pandas/__init__.py", 
> line 34, in <module>
> require_minimum_pandas_version()
>   File "/Users/ruifeng.zheng/Dev/spark/python/pyspark/sql/pandas/utils.py", 
> line 37, in require_minimum_pandas_version
> if LooseVersion(pandas.__version__) < 
> LooseVersion(minimum_pandas_version):
> AttributeError: partially initialized module 'pandas' has no attribute 
> '__version__' (most likely due to a circular import)
> [TerminalIPythonApp] WARNING | Unknown error in handling PYTHONSTARTUP file 
> /Users/ruifeng.zheng/Dev/spark//python/pyspark/shell.py:
> ---
> AttributeError                            Traceback (most recent call last)
> File ~/Dev/spark/python/pyspark/shell.py:40
>  38 try:
>  39 # Creates pyspark.sql.connect.SparkSession.
> ---> 40 spark = SparkSession.builder.getOrCreate()
>  41 except Exception:
> File ~/Dev/spark/python/pyspark/sql/session.py:429, in 
> SparkSession.Builder.getOrCreate(self)
> 428 with SparkContext._lock:
> --> 429 from pyspark.sql.connect.session import SparkSession as 
> RemoteSparkSession
> 431 if (
> 432 SparkContext._active_spark_context is None
> 433 and SparkSession._instantiatedSession is None
> 434 ):
> File ~/Dev/spark/python/pyspark/sql/connect/__init__.py:21
>  18 """Currently Spark Connect is very experimental and the APIs to 
> interact with
>  19 Spark through this API are can be changed at any time without 
> warning."""
> ---> 21 from pyspark.sql.connect.dataframe import DataFrame  # noqa: F401
>  22 from pyspark.sql.pandas.utils import (
>  23 require_minimum_pandas_version,
>  24 require_minimum_pyarrow_version,
>  25 require_minimum_grpc_version,
>  26 )
> File ~/Dev/spark/python/pyspark/sql/connect/dataframe.py:35
>  34 import random
> ---> 35 import pandas
>  36 import json
> File ~/Dev/spark/python/pyspark/pandas/__init__.py:29
>  27 from typing import Any
> ---> 29 from pyspark.pandas.missing.general_functions import 
> MissingPandasLikeGeneralFunctions
>  30 from pyspark.pandas.missing.scalars import MissingPandasLikeScalars
> File ~/Dev/spark/python/pyspark/pandas/__init__.py:34
>  33 try:
> ---> 34 require_minimum_pandas_version()
>  35 require_minimum_pyarrow_version()
> File ~/Dev/spark/python/pyspark/sql/pandas/utils.py:37, in 
> require_minimum_pandas_version()
>  34 raise ImportError(
>  35 "Pandas >= %s must be installed; however, " "it was not 
> found." % minimum_pandas_version
>  36 ) from raised_error
> ---> 37 if LooseVersion(pandas.__version__) < 
> LooseVersion(minimum_pandas_version):
>  38 raise ImportError(
>  39 "Pandas >= %s must be installed; however, "
>  40 "your version was %s." % (minimum_pandas_version, 
> pandas.__version__)
>  41 )
> AttributeError: partially initialized module 'pandas' has no attribute 
> '__version__' (most likely due to a circular import){code}

[jira] [Assigned] (SPARK-42266) Local mode should work with IPython

2023-03-08 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-42266:


Assignee: Hyukjin Kwon

> Local mode should work with IPython
> ---
>
> Key: SPARK-42266
> URL: https://issues.apache.org/jira/browse/SPARK-42266
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Assignee: Hyukjin Kwon
>Priority: Major
>
> {code:java}
> (spark_dev) ➜  spark git:(master) bin/pyspark --remote "local[*]"
> Python 3.9.15 (main, Nov 24 2022, 08:28:41) 
> Type 'copyright', 'credits' or 'license' for more information
> IPython 8.9.0 -- An enhanced Interactive Python. Type '?' for help.
> /Users/ruifeng.zheng/Dev/spark/python/pyspark/shell.py:45: UserWarning: 
> Failed to initialize Spark session.
>   warnings.warn("Failed to initialize Spark session.")
> Traceback (most recent call last):
>   File "/Users/ruifeng.zheng/Dev/spark/python/pyspark/shell.py", line 40, in 
> <module>
> spark = SparkSession.builder.getOrCreate()
>   File "/Users/ruifeng.zheng/Dev/spark/python/pyspark/sql/session.py", line 
> 429, in getOrCreate
> from pyspark.sql.connect.session import SparkSession as RemoteSparkSession
>   File 
> "/Users/ruifeng.zheng/Dev/spark/python/pyspark/sql/connect/__init__.py", line 
> 21, in <module>
> from pyspark.sql.connect.dataframe import DataFrame  # noqa: F401
>   File 
> "/Users/ruifeng.zheng/Dev/spark/python/pyspark/sql/connect/dataframe.py", 
> line 35, in <module>
> import pandas
>   File "/Users/ruifeng.zheng/Dev/spark/python/pyspark/pandas/__init__.py", 
> line 29, in <module>
> from pyspark.pandas.missing.general_functions import 
> MissingPandasLikeGeneralFunctions
>   File "/Users/ruifeng.zheng/Dev/spark/python/pyspark/pandas/__init__.py", 
> line 34, in <module>
> require_minimum_pandas_version()
>   File "/Users/ruifeng.zheng/Dev/spark/python/pyspark/sql/pandas/utils.py", 
> line 37, in require_minimum_pandas_version
> if LooseVersion(pandas.__version__) < 
> LooseVersion(minimum_pandas_version):
> AttributeError: partially initialized module 'pandas' has no attribute 
> '__version__' (most likely due to a circular import)
> [TerminalIPythonApp] WARNING | Unknown error in handling PYTHONSTARTUP file 
> /Users/ruifeng.zheng/Dev/spark//python/pyspark/shell.py:
> ---
> AttributeError                            Traceback (most recent call last)
> File ~/Dev/spark/python/pyspark/shell.py:40
>  38 try:
>  39 # Creates pyspark.sql.connect.SparkSession.
> ---> 40 spark = SparkSession.builder.getOrCreate()
>  41 except Exception:
> File ~/Dev/spark/python/pyspark/sql/session.py:429, in 
> SparkSession.Builder.getOrCreate(self)
> 428 with SparkContext._lock:
> --> 429 from pyspark.sql.connect.session import SparkSession as 
> RemoteSparkSession
> 431 if (
> 432 SparkContext._active_spark_context is None
> 433 and SparkSession._instantiatedSession is None
> 434 ):
> File ~/Dev/spark/python/pyspark/sql/connect/__init__.py:21
>  18 """Currently Spark Connect is very experimental and the APIs to 
> interact with
>  19 Spark through this API are can be changed at any time without 
> warning."""
> ---> 21 from pyspark.sql.connect.dataframe import DataFrame  # noqa: F401
>  22 from pyspark.sql.pandas.utils import (
>  23 require_minimum_pandas_version,
>  24 require_minimum_pyarrow_version,
>  25 require_minimum_grpc_version,
>  26 )
> File ~/Dev/spark/python/pyspark/sql/connect/dataframe.py:35
>  34 import random
> ---> 35 import pandas
>  36 import json
> File ~/Dev/spark/python/pyspark/pandas/__init__.py:29
>  27 from typing import Any
> ---> 29 from pyspark.pandas.missing.general_functions import 
> MissingPandasLikeGeneralFunctions
>  30 from pyspark.pandas.missing.scalars import MissingPandasLikeScalars
> File ~/Dev/spark/python/pyspark/pandas/__init__.py:34
>  33 try:
> ---> 34 require_minimum_pandas_version()
>  35 require_minimum_pyarrow_version()
> File ~/Dev/spark/python/pyspark/sql/pandas/utils.py:37, in 
> require_minimum_pandas_version()
>  34 raise ImportError(
>  35 "Pandas >= %s must be installed; however, " "it was not 
> found." % minimum_pandas_version
>  36 ) from raised_error
> ---> 37 if LooseVersion(pandas.__version__) < 
> LooseVersion(minimum_pandas_version):
>  38 raise ImportError(
>  39 "Pandas >= %s must be installed; however, "
>  40 "your version was %s." % (minimum_pandas_version, 
> pandas.__version__)
>  41 )
> AttributeError: partially initialized module 'pandas' has no attribute 
> '__version__' (most likely due to a circular import){code}
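
The file paths in the traceback show the bare "import pandas" resolving to Spark's own pyspark/pandas package rather than the real pandas, i.e. a sys.path shadowing problem rather than a broken pandas install. A generic way to check which file a module name would load (a diagnostic sketch only, not part of the fix in the PR):

{code:python}
import importlib.util

# Shows which file `import pandas` would actually load; if the origin
# points into .../pyspark/pandas/, the name is being shadowed.
spec = importlib.util.find_spec("pandas")
print(spec.origin if spec else "pandas not found")
{code}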

[jira] [Resolved] (SPARK-42712) Improve docstring of mapInPandas and mapInArrow

2023-03-08 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-42712.
--
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 40330
[https://github.com/apache/spark/pull/40330]

> Improve docstring of mapInPandas and mapInArrow
> ---
>
> Key: SPARK-42712
> URL: https://issues.apache.org/jira/browse/SPARK-42712
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Xinrong Meng
>Assignee: Xinrong Meng
>Priority: Major
> Fix For: 3.4.0
>
>
> We'd better call out that they are not scalar.
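
A hedged illustration of the point (a sketch, not the PR's actual docstring wording): mapInPandas hands the user function an iterator of pandas DataFrame batches and expects an iterator of pandas DataFrames back, so nothing about it is scalar:

{code:python}
from typing import Iterator

import pandas as pd
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.range(10)

# The function consumes an iterator of pandas DataFrames (one per batch)
# and yields pandas DataFrames; it never receives scalar values or rows.
def double(batches: Iterator[pd.DataFrame]) -> Iterator[pd.DataFrame]:
    for pdf in batches:
        yield pdf.assign(id=pdf.id * 2)

df.mapInPandas(double, schema="id long").show()
{code}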



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42712) Improve docstring of mapInPandas and mapInArrow

2023-03-08 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-42712:


Assignee: Xinrong Meng

> Improve docstring of mapInPandas and mapInArrow
> ---
>
> Key: SPARK-42712
> URL: https://issues.apache.org/jira/browse/SPARK-42712
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Xinrong Meng
>Assignee: Xinrong Meng
>Priority: Major
>
> We'd better call out that they are not scalar.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-42715) NegativeArraySizeException by too much data read from ORC file

2023-03-08 Thread XiaoLong Wu (Jira)
XiaoLong Wu created SPARK-42715:
---

 Summary: NegativeArraySizeException by too much data read from ORC file
 Key: SPARK-42715
 URL: https://issues.apache.org/jira/browse/SPARK-42715
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.3.2
Reporter: XiaoLong Wu


Should we provide a friendlier exception message about how to avoid this 
exception? For example, when we catch this exception, tell the user that 
reducing the value of spark.sql.orc.columnarReaderBatchSize can avoid it.

In the current version, batch reading of ORC files is done by the function 
OrcColumnarBatchReader.nextBatch(), which depends on 
[ORC|https://github.com/apache/orc] (version 1.8.2) to complete the data 
copy. The relevant ORC code is as follows:
{code:java}
private static byte[] commonReadByteArrays(InStream stream, IntegerReader 
lengths,
LongColumnVector scratchlcv,
BytesColumnVector result, final int batchSize) throws IOException {
  // Read lengths
  scratchlcv.isRepeating = result.isRepeating;
  scratchlcv.noNulls = result.noNulls;
  scratchlcv.isNull = result.isNull;  // Notice we are replacing the isNull 
vector here...
  lengths.nextVector(scratchlcv, scratchlcv.vector, batchSize);
  int totalLength = 0;
  if (!scratchlcv.isRepeating) {
for (int i = 0; i < batchSize; i++) {
  if (!scratchlcv.isNull[i]) {
totalLength += (int) scratchlcv.vector[i];
  }
}
  } else {
if (!scratchlcv.isNull[0]) {
  totalLength = (int) (batchSize * scratchlcv.vector[0]);
}
  }

  // Read all the strings for this batch
  byte[] allBytes = new byte[totalLength];
  int offset = 0;
  int len = totalLength;
  while (len > 0) {
int bytesRead = stream.read(allBytes, offset, len);
if (bytesRead < 0) {
  throw new EOFException("Can't finish byte read from " + stream);
}
len -= bytesRead;
offset += bytesRead;
  }

  return allBytes;
} {code}
As shown above, totalLength is an int used to accumulate the total data 
size. If the data size exceeds Integer.MAX_VALUE, the int accumulation (and 
the (int) casts of the long length values) overflows to a negative value, and 
allocating the byte array throws the following exception:
{code:java}
Caused by: java.lang.NegativeArraySizeException
    at 
org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.commonReadByteArrays(TreeReaderFactory.java:1998)
    at 
org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.readOrcByteArrays(TreeReaderFactory.java:2021)
    at 
org.apache.orc.impl.TreeReaderFactory$StringDirectTreeReader.nextVector(TreeReaderFactory.java:2119)
    at 
org.apache.orc.impl.TreeReaderFactory$StringTreeReader.nextVector(TreeReaderFactory.java:1962)
    at 
org.apache.orc.impl.reader.tree.StructBatchReader.readBatchColumn(StructBatchReader.java:65)
    at 
org.apache.orc.impl.reader.tree.StructBatchReader.nextBatchForLevel(StructBatchReader.java:100)
    at 
org.apache.orc.impl.reader.tree.StructBatchReader.nextBatch(StructBatchReader.java:77)
    at 
org.apache.orc.impl.RecordReaderImpl.nextBatch(RecordReaderImpl.java:1371)
    at 
org.apache.spark.sql.execution.datasources.orc.OrcColumnarBatchReader.nextBatch(OrcColumnarBatchReader.java:197)
    at 
org.apache.spark.sql.execution.datasources.orc.OrcColumnarBatchReader.nextKeyValue(OrcColumnarBatchReader.java:99)
    at 
org.apache.spark.sql.execution.datasources.RecordReaderIterator.hasNext(RecordReaderIterator.scala:39)
    at 
org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:116)
    at 
org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:274)
    ... 20 more {code}
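
The arithmetic of the failure is easy to reproduce outside ORC. A minimal sketch emulating Java's 32-bit int wraparound (the row count and string lengths are made-up numbers):

{code:python}
# Hypothetical batch: 4096 rows whose string lengths average 800,000 bytes
# sum past Integer.MAX_VALUE (2**31 - 1 = 2147483647).
total_length = 4096 * 800_000                    # 3_276_800_000

# Emulate Java int overflow by wrapping into the signed 32-bit range.
wrapped = (total_length + 2**31) % 2**32 - 2**31
print(wrapped)                                   # -1018167296

# Allocating new byte[totalLength] with this negative size is what raises
# NegativeArraySizeException in commonReadByteArrays.
{code}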



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] (SPARK-42623) parameter markers not blocked in DDL

2023-03-08 Thread zzzzming95 (Jira)


[ https://issues.apache.org/jira/browse/SPARK-42623 ]


zzzzming95 deleted comment on SPARK-42623:


was (Author: zing):
i can try to fix this issue

> parameter markers not blocked in DDL
> 
>
> Key: SPARK-42623
> URL: https://issues.apache.org/jira/browse/SPARK-42623
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.4.0
>Reporter: Serge Rielau
>Priority: Major
>
> The parameterized query code does not block DDL statements from referencing 
> parameter markers. For example:
>  
> {code:java}
> scala> spark.sql(sqlText = "CREATE VIEW v1 AS SELECT current_timestamp() + 
> :later as stamp, :x * :x AS square", args = Map("later" -> "INTERVAL'3' 
> HOUR", "x" -> "15.0")).show()
> ++
> ||
> ++
> ++
> {code}
> It appears we have some protection that fails us when the view is invoked:
>  
> {code:java}
> scala> spark.sql(sqlText = "SELECT * FROM v1", args = Map("later" -> 
> "INTERVAL'3' HOUR", "x" -> "15.0")).show()
> org.apache.spark.sql.AnalysisException: [UNBOUND_SQL_PARAMETER] Found the 
> unbound parameter: `later`. Please, fix `args` and provide a mapping of the 
> parameter to a SQL literal.; line 1 pos 29
> {code}
> Right now I think the affected statements are:
> * DEFAULT definitions
> * VIEW definitions
> but any other future standard expression popping up is at risk, such as SQL 
> functions or GENERATED COLUMN.
> CREATE TABLE AS is debatable, since it executes the query at definition 
> time only.
> For simplicity I propose to block the feature from ANY DDL statement (CREATE, 
> ALTER).
>  
>  
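
A toy sketch of the proposed guard (all names invented; this is not Spark's analyzer API): walk a DDL statement's plan and fail at definition time if any parameter marker appears under it:

{code:python}
from dataclasses import dataclass, field
from typing import List

@dataclass
class Node:
    kind: str                       # e.g. "CreateView", "Parameter"
    children: List["Node"] = field(default_factory=list)

DDL_KINDS = {"CreateView", "CreateTable", "AlterTable"}

def contains_parameter(node: Node) -> bool:
    return node.kind == "Parameter" or any(
        contains_parameter(c) for c in node.children)

def check_ddl(plan: Node) -> None:
    if plan.kind in DDL_KINDS and contains_parameter(plan):
        raise ValueError("parameter markers are not allowed in DDL")

# The problematic CREATE VIEW from the report, as a toy plan tree:
view = Node("CreateView", [Node("Add", [Node("Func"), Node("Parameter")])])
try:
    check_ddl(view)
except ValueError as e:
    print(e)  # fails at definition time, not when the view is queried
{code}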



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org


