[jira] [Commented] (SPARK-40460) Streaming metrics is zero when select _metadata

2022-09-18 Thread Jungtaek Lim (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17606433#comment-17606433
 ] 

Jungtaek Lim commented on SPARK-40460:
--

[~yaohua]
Just to clarify: the streaming metadata column for DSv1 seems to have been 
introduced in Spark 3.3 (https://issues.apache.org/jira/browse/SPARK-38323). 
Do I understand correctly? If so, the Affects Versions don't seem to be correct. 

> Streaming metrics is zero when select _metadata
> ---
>
> Key: SPARK-40460
> URL: https://issues.apache.org/jira/browse/SPARK-40460
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 3.2.0, 3.2.1, 3.3.0, 3.2.2
>Reporter: Yaohua Zhao
>Assignee: Yaohua Zhao
>Priority: Major
> Fix For: 3.4.0
>
>
> Streaming metrics report all 0 (`processedRowsPerSecond`, etc.) when 
> selecting the `_metadata` column, because the logical plan from the batch 
> and the actual planned logical plan are mismatched: 
> [https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/ProgressReporter.scala#L348]
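
A minimal reproduction sketch of the issue above (the source path and the sink are illustrative assumptions, not taken from the ticket):

{code:python}
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Select the hidden _metadata column alongside the data.
df = (spark.readStream.format("text").load("/tmp/in")
      .select("value", "_metadata"))

q = df.writeStream.format("noop").start()
q.processAllAvailable()

# On affected versions, rates such as processedRowsPerSecond stay at 0
# in the reported progress even though rows were processed.
print(q.lastProgress["processedRowsPerSecond"])
{code}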






[jira] [Resolved] (SPARK-40460) Streaming metrics is zero when select _metadata

2022-09-18 Thread Jungtaek Lim (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jungtaek Lim resolved SPARK-40460.
--
Fix Version/s: 3.4.0
 Assignee: Yaohua Zhao
   Resolution: Fixed

Issue resolved via https://github.com/apache/spark/pull/37905

> Streaming metrics is zero when select _metadata
> ---
>
> Key: SPARK-40460
> URL: https://issues.apache.org/jira/browse/SPARK-40460
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 3.2.0, 3.2.1, 3.3.0, 3.2.2
>Reporter: Yaohua Zhao
>Assignee: Yaohua Zhao
>Priority: Major
> Fix For: 3.4.0
>
>
> Streaming metrics report all 0 (`processedRowsPerSecond`, etc.) when 
> selecting the `_metadata` column, because the logical plan from the batch 
> and the actual planned logical plan are mismatched: 
> [https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/ProgressReporter.scala#L348]






[jira] [Resolved] (SPARK-40482) Revert SPARK-24544 Print actual failure cause when look up function failed

2022-09-18 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-40482.
-
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 37896
[https://github.com/apache/spark/pull/37896]

> Revert SPARK-24544 Print actual failure cause when look up function failed
> --
>
> Key: SPARK-40482
> URL: https://issues.apache.org/jira/browse/SPARK-40482
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Dongjoon Hyun
>Priority: Trivial
> Fix For: 3.4.0
>
>







[jira] [Assigned] (SPARK-40482) Revert SPARK-24544 Print actual failure cause when look up function failed

2022-09-18 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-40482:
---

Assignee: Wenchen Fan

> Revert SPARK-24544 Print actual failure cause when look up function failed
> --
>
> Key: SPARK-40482
> URL: https://issues.apache.org/jira/browse/SPARK-40482
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Dongjoon Hyun
>Assignee: Wenchen Fan
>Priority: Trivial
> Fix For: 3.4.0
>
>







[jira] [Commented] (SPARK-40367) Total size of serialized results of 3730 tasks (64.0 GB) is bigger than spark.driver.maxResultSize (64.0 GB)

2022-09-18 Thread Senthil Kumar (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17606414#comment-17606414
 ] 

Senthil Kumar commented on SPARK-40367:
---

Hi [~jackyjfhu] 

 

Check whether you are sending more bytes/rows to the driver than 
"spark.driver.maxResultSize" allows. If so, keep increasing 
"spark.driver.maxResultSize" until the issue goes away, but be careful that it 
does not exceed the driver memory.

 

_Note: driver-memory > spark.driver.maxResultSize > rows/bytes sent to driver_
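
For example, a sketch of bumping the limit at session creation (the values are placeholders, not recommendations; note that spark.driver.memory normally has to be set via spark-submit before the driver JVM starts and is shown here only to illustrate the relationship):

{code:python}
from pyspark.sql import SparkSession

# Keep: driver memory (8g) > spark.driver.maxResultSize (4g) > result size.
spark = (SparkSession.builder
         .config("spark.driver.memory", "8g")
         .config("spark.driver.maxResultSize", "4g")
         .getOrCreate())
{code}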

>  Total size of serialized results of 3730 tasks (64.0 GB) is bigger than 
> spark.driver.maxResultSize (64.0 GB)
> -
>
> Key: SPARK-40367
> URL: https://issues.apache.org/jira/browse/SPARK-40367
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.2
>Reporter: jackyjfhu
>Priority: Blocker
>
>  I use this code:
> spark.sql("xx").selectExpr(spark.table(target).columns:_*).write.mode("overwrite").insertInto(target)
> and I get this error:
>  
> Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: 
> Total size of serialized results of 3730 tasks (64.0 GB) is bigger than 
> spark.driver.maxResultSize (64.0 GB)
>     at 
> org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1609)
>     at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1597)
>     at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1596)
>     at 
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>     at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
>     at 
> org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1596)
>     at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:831)
>     at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:831)
>     at scala.Option.foreach(Option.scala:257)
>     at 
> org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:831)
>     at 
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1830)
>     at 
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1779)
>     at 
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1768)
>     at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
>     at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:642)
>     at org.apache.spark.SparkContext.runJob(SparkContext.scala:2034)
>     at org.apache.spark.SparkContext.runJob(SparkContext.scala:2055)
>     at org.apache.spark.SparkContext.runJob(SparkContext.scala:2074)
>     at org.apache.spark.SparkContext.runJob(SparkContext.scala:2099)
>     at org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:939)
>     at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
>     at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
>     at org.apache.spark.rdd.RDD.withScope(RDD.scala:363)
>     at org.apache.spark.rdd.RDD.collect(RDD.scala:938)
>     at 
> org.apache.spark.sql.execution.SparkPlan.executeCollectIterator(SparkPlan.scala:304)
>     at 
> org.apache.spark.sql.execution.exchange.BroadcastExchangeExec$$anonfun$relationFuture$1$$anonfun$apply$1.apply(BroadcastExchangeExec.scala:76)
>     at 
> org.apache.spark.sql.execution.exchange.BroadcastExchangeExec$$anonfun$relationFuture$1$$anonfun$apply$1.apply(BroadcastExchangeExec.scala:73)
>     at 
> org.apache.spark.sql.execution.SQLExecution$.withExecutionId(SQLExecution.scala:97)
>     at 
> org.apache.spark.sql.execution.exchange.BroadcastExchangeExec$$anonfun$relationFuture$1.apply(BroadcastExchangeExec.scala:72)
>     at 
> org.apache.spark.sql.execution.exchange.BroadcastExchangeExec$$anonfun$relationFuture$1.apply(BroadcastExchangeExec.scala:72)
>     at 
> scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
>     at 
> scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
>     at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  
> --conf spark.driver.maxResultSize=64g
> --conf spark.sql.broadcastTimeout=36000
> --conf spark.sql.autoBroadcastJoinThreshold=204857600 
> --conf spark.memory.offHeap.enabled=true
> --conf spark.memory.offHeap.size=4g
> --num-executors 500
> 

[jira] [Updated] (SPARK-40474) Infer columns with mixed date and timestamp as String in CSV schema inference

2022-09-18 Thread Xiaonan Yang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaonan Yang updated SPARK-40474:
-
Description: 
In SPARK-39469 (https://issues.apache.org/jira/browse/SPARK-39469), we 
introduced support for the date type in CSV schema inference. The schema 
inference behavior on date-time columns is now:
 * For a column only containing dates, we will infer it as Date type
 * For a column only containing timestamps, we will infer it as Timestamp type
 * For a column containing a mixture of dates and timestamps, we will infer it 
as Timestamp type

However, we found that we were too ambitious about the last scenario: 
supporting it introduced a lot of complexity in the code and raised 
significant performance concerns. Thus, we want to simplify the behavior of 
the last scenario to:
 * For a column containing a mixture of dates and timestamps, we will infer it 
as String type

  was:
In this ticket, we introduced support for the date type in CSV schema 
inference. The schema inference behavior on date-time columns is now:
 * For a column only containing dates, we will infer it as Date type
 * For a column only containing timestamps, we will infer it as Timestamp type
 * For a column containing a mixture of dates and timestamps, we will infer it 
as Timestamp type

However, we found that we were too ambitious about the last scenario: 
supporting it introduced a lot of complexity in the code and raised 
significant performance concerns. Thus, we want to simplify the behavior of 
the last scenario to:
 * For a column containing a mixture of dates and timestamps, we will infer it 
as String type


> Infer columns with mixed date and timestamp as String in CSV schema inference
> -
>
> Key: SPARK-40474
> URL: https://issues.apache.org/jira/browse/SPARK-40474
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Xiaonan Yang
>Priority: Major
>
> In SPARK-39469 (https://issues.apache.org/jira/browse/SPARK-39469), we 
> introduced support for the date type in CSV schema inference. The schema 
> inference behavior on date-time columns is now:
>  * For a column only containing dates, we will infer it as Date type
>  * For a column only containing timestamps, we will infer it as Timestamp type
>  * For a column containing a mixture of dates and timestamps, we will infer 
> it as Timestamp type
> However, we found that we were too ambitious about the last scenario: 
> supporting it introduced a lot of complexity in the code and raised 
> significant performance concerns. Thus, we want to simplify the behavior of 
> the last scenario to:
>  * For a column containing a mixture of dates and timestamps, we will infer 
> it as String type
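
A rough sketch of the proposed behavior (the path and sample data are illustrative; exact inference options can differ by version):

{code:python}
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# One CSV column mixing date-only and timestamp values.
spark.createDataFrame(
    [("2022-09-18",), ("2022-09-18 10:00:00",)], "c string"
).write.mode("overwrite").text("/tmp/mixed_csv")

df = spark.read.option("inferSchema", "true").csv("/tmp/mixed_csv")

# Today the mixed column can be inferred as timestamp; under this
# proposal it would be inferred as string.
df.printSchema()
{code}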






[jira] [Assigned] (SPARK-40483) Add `CONNECT` label

2022-09-18 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40483:


Assignee: Apache Spark

> Add `CONNECT` label
> ---
>
> Key: SPARK-40483
> URL: https://issues.apache.org/jira/browse/SPARK-40483
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, Project Infra
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Assignee: Apache Spark
>Priority: Minor
>







[jira] [Assigned] (SPARK-40483) Add `CONNECT` label

2022-09-18 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40483:


Assignee: (was: Apache Spark)

> Add `CONNECT` label
> ---
>
> Key: SPARK-40483
> URL: https://issues.apache.org/jira/browse/SPARK-40483
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, Project Infra
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Priority: Minor
>







[jira] [Commented] (SPARK-40483) Add `CONNECT` label

2022-09-18 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17606404#comment-17606404
 ] 

Apache Spark commented on SPARK-40483:
--

User 'HyukjinKwon' has created a pull request for this issue:
https://github.com/apache/spark/pull/37925

> Add `CONNECT` label
> ---
>
> Key: SPARK-40483
> URL: https://issues.apache.org/jira/browse/SPARK-40483
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, Project Infra
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Priority: Minor
>







[jira] [Updated] (SPARK-40483) Add `CONNECT` label

2022-09-18 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-40483:
-
Parent: SPARK-39375
Issue Type: Sub-task  (was: Improvement)

> Add `CONNECT` label
> ---
>
> Key: SPARK-40483
> URL: https://issues.apache.org/jira/browse/SPARK-40483
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, Project Infra
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Priority: Minor
>







[jira] [Created] (SPARK-40483) Add `CONNECT` label

2022-09-18 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-40483:


 Summary: Add `CONNECT` label
 Key: SPARK-40483
 URL: https://issues.apache.org/jira/browse/SPARK-40483
 Project: Spark
  Issue Type: Improvement
  Components: Connect, Project Infra
Affects Versions: 3.4.0
Reporter: Hyukjin Kwon









[jira] [Commented] (SPARK-40472) Improve pyspark.sql.function example experience

2022-09-18 Thread Hyukjin Kwon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17606398#comment-17606398
 ] 

Hyukjin Kwon commented on SPARK-40472:
--

I was thinking about this too, but maybe it's fine as is: we can assume that 
users visiting a package's page would likely know that they need to import the 
package they are viewing.

> Improve pyspark.sql.function example experience
> ---
>
> Key: SPARK-40472
> URL: https://issues.apache.org/jira/browse/SPARK-40472
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: deshanxiao
>Priority: Minor
>
> There are many examples in pyspark.sql.functions:
> {code:java}
>     Examples
>     
>     >>> df = spark.range(1)
>     >>> df.select(lit(5).alias('height'), df.id).show()
>     +--+---+
>     |height| id|
>     +--+---+
>     |     5|  0|
>     +--+---+ {code}
> We can add import statements so that users can run the examples directly.
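
For example, the runnable form of the docstring example above would start with the imports (a sketch; doctests normally provide `spark`, so it is created explicitly here):

{code:python}
from pyspark.sql import SparkSession
from pyspark.sql.functions import lit

spark = SparkSession.builder.getOrCreate()

df = spark.range(1)
df.select(lit(5).alias('height'), df.id).show()
# +------+---+
# |height| id|
# +------+---+
# |     5|  0|
# +------+---+
{code}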






[jira] [Updated] (SPARK-40474) Infer columns with mixed date and timestamp as String in CSV schema inference

2022-09-18 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-40474:
-
Fix Version/s: (was: 3.4.0)

> Infer columns with mixed date and timestamp as String in CSV schema inference
> -
>
> Key: SPARK-40474
> URL: https://issues.apache.org/jira/browse/SPARK-40474
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Xiaonan Yang
>Priority: Major
>
> In this ticket, we introduced support for the date type in CSV schema 
> inference. The schema inference behavior on date-time columns is now:
>  * For a column only containing dates, we will infer it as Date type
>  * For a column only containing timestamps, we will infer it as Timestamp type
>  * For a column containing a mixture of dates and timestamps, we will infer 
> it as Timestamp type
> However, we found that we were too ambitious about the last scenario: 
> supporting it introduced a lot of complexity in the code and raised 
> significant performance concerns. Thus, we want to simplify the behavior of 
> the last scenario to:
>  * For a column containing a mixture of dates and timestamps, we will infer 
> it as String type






[jira] [Resolved] (SPARK-40404) Fix the wrong description related to `spark.shuffle.service.db` in the document

2022-09-18 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-40404.
---
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 37853
[https://github.com/apache/spark/pull/37853]

> Fix the wrong description related to `spark.shuffle.service.db` in the 
> document
> ---
>
> Key: SPARK-40404
> URL: https://issues.apache.org/jira/browse/SPARK-40404
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Minor
> Fix For: 3.4.0
>
>
> From the context of the PR for SPARK-17321, YarnShuffleService persists data 
> into LevelDB/RocksDB when YARN NM recovery is enabled. This behavior is not 
> controlled by `spark.shuffle.service.db.enabled` and is not always enabled, 
> so the description of `spark.shuffle.service.db.enabled` in 
> `spark-standalone.md` is misleading.
>  
>  






[jira] [Assigned] (SPARK-40404) Fix the wrong description related to `spark.shuffle.service.db` in the document

2022-09-18 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-40404:
-

Assignee: Yang Jie

> Fix the wrong description related to `spark.shuffle.service.db` in the 
> document
> ---
>
> Key: SPARK-40404
> URL: https://issues.apache.org/jira/browse/SPARK-40404
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Minor
>
> From the context of the PR for SPARK-17321, YarnShuffleService persists data 
> into LevelDB/RocksDB when YARN NM recovery is enabled. This behavior is not 
> controlled by `spark.shuffle.service.db.enabled` and is not always enabled, 
> so the description of `spark.shuffle.service.db.enabled` in 
> `spark-standalone.md` is misleading.
>  
>  






[jira] [Assigned] (SPARK-40482) Revert SPARK-24544 Print actual failure cause when look up function failed

2022-09-18 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40482:


Assignee: (was: Apache Spark)

> Revert SPARK-24544 Print actual failure cause when look up function failed
> --
>
> Key: SPARK-40482
> URL: https://issues.apache.org/jira/browse/SPARK-40482
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Dongjoon Hyun
>Priority: Trivial
>







[jira] [Assigned] (SPARK-40482) Revert SPARK-24544 Print actual failure cause when look up function failed

2022-09-18 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40482:


Assignee: Apache Spark

> Revert SPARK-24544 Print actual failure cause when look up function failed
> --
>
> Key: SPARK-40482
> URL: https://issues.apache.org/jira/browse/SPARK-40482
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Dongjoon Hyun
>Assignee: Apache Spark
>Priority: Trivial
>







[jira] [Commented] (SPARK-40482) Revert SPARK-24544 Print actual failure cause when look up function failed

2022-09-18 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17606383#comment-17606383
 ] 

Apache Spark commented on SPARK-40482:
--

User 'cloud-fan' has created a pull request for this issue:
https://github.com/apache/spark/pull/37896

> Revert SPARK-24544 Print actual failure cause when look up function failed
> --
>
> Key: SPARK-40482
> URL: https://issues.apache.org/jira/browse/SPARK-40482
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Dongjoon Hyun
>Priority: Trivial
>







[jira] [Created] (SPARK-40482) Revert SPARK-24544 Print actual failure cause when look up function failed

2022-09-18 Thread Dongjoon Hyun (Jira)
Dongjoon Hyun created SPARK-40482:
-

 Summary: Revert SPARK-24544 Print actual failure cause when look 
up function failed
 Key: SPARK-40482
 URL: https://issues.apache.org/jira/browse/SPARK-40482
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.4.0
Reporter: Dongjoon Hyun









[jira] [Assigned] (SPARK-40424) Refactor ChromeUIHistoryServerSuite to test rocksdb

2022-09-18 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-40424:
-

Assignee: Yang Jie

> Refactor ChromeUIHistoryServerSuite to test rocksdb
> ---
>
> Key: SPARK-40424
> URL: https://issues.apache.org/jira/browse/SPARK-40424
> Project: Spark
>  Issue Type: Improvement
>  Components: Tests
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Minor
>
> ChromeUIHistoryServerSuite only tests the LevelDB backend now.






[jira] [Resolved] (SPARK-40424) Refactor ChromeUIHistoryServerSuite to test rocksdb

2022-09-18 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-40424.
---
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 37878
[https://github.com/apache/spark/pull/37878]

> Refactor ChromeUIHistoryServerSuite to test rocksdb
> ---
>
> Key: SPARK-40424
> URL: https://issues.apache.org/jira/browse/SPARK-40424
> Project: Spark
>  Issue Type: Improvement
>  Components: Tests
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Minor
> Fix For: 3.4.0
>
>
> ChromeUIHistoryServerSuite only tests the LevelDB backend now.






[jira] [Updated] (SPARK-40468) Column pruning is not handled correctly in CSV when _corrupt_record is used

2022-09-18 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-40468:
--
Labels: correctness  (was: )

> Column pruning is not handled correctly in CSV when _corrupt_record is used
> ---
>
> Key: SPARK-40468
> URL: https://issues.apache.org/jira/browse/SPARK-40468
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.0, 3.2.2, 3.4.0
>Reporter: Ivan Sadikov
>Assignee: Ivan Sadikov
>Priority: Major
>  Labels: correctness
> Fix For: 3.4.0, 3.3.2
>
>
> I have found that, depending on the name of the corrupt record column in 
> CSV, the field is populated incorrectly. Here is an example:
> {code:java}
> // /tmp/file.csv contains:
> // 1,a
> val df = spark.read
>   .schema("c1 int, c2 string, x string, _corrupt_record string")
>   .csv("file:/tmp/file.csv")
>   .withColumn("x", lit("A"))
> 
> Result:
> +---+---+---+---------------+
> |c1 |c2 |x  |_corrupt_record|
> +---+---+---+---------------+
> |1  |a  |A  |1,a            |
> +---+---+---+---------------+{code}
>  
> However, if you rename the {{_corrupt_record}} column to something else, the 
> result is different:
> {code:java}
> val df = spark.read
>   .option("columnNameCorruptRecord", "corrupt_record")
>   .schema("c1 int, c2 string, x string, corrupt_record string")
>   .csv("file:/tmp/file.csv")
>   .withColumn("x", lit("A"))
> 
> Result:
> +---+---+---+--------------+
> |c1 |c2 |x  |corrupt_record|
> +---+---+---+--------------+
> |1  |a  |A  |null          |
> +---+---+---+--------------+{code}
>  
> This is due to an inconsistency in CSVFileFormat: when column pruning is 
> enabled, we check the SQLConf option for the corrupt record column name, but 
> the CSV reader relies on the {{columnNameCorruptRecord}} option instead.
> Also, this disables column pruning, which used to work in Spark versions 
> prior to 
> https://github.com/apache/spark/commit/959694271e30879c944d7fd5de2740571012460a.






[jira] [Commented] (SPARK-40481) Ignore stage fetch failure caused by decommissioned executor

2022-09-18 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17606359#comment-17606359
 ] 

Apache Spark commented on SPARK-40481:
--

User 'warrenzhu25' has created a pull request for this issue:
https://github.com/apache/spark/pull/37924

> Ignore stage fetch failure caused by decommissioned executor
> 
>
> Key: SPARK-40481
> URL: https://issues.apache.org/jira/browse/SPARK-40481
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.3.0
>Reporter: Zhongwei Zhu
>Priority: Minor
>
> When executor decommission is enabled, there can be many stage failures 
> caused by FetchFailed from decommissioned executors, which in turn can fail 
> the whole job. It would be better not to count such failures against 
> `spark.stage.maxConsecutiveAttempts`.
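
For context, a sketch of the settings involved (values are illustrative; the proposed exclusion itself is not a config in this ticket):

{code:python}
from pyspark.sql import SparkSession

# With decommissioning enabled, a FetchFailed caused by a decommissioned
# executor currently still counts toward the consecutive-attempt limit.
spark = (SparkSession.builder
         .config("spark.decommission.enabled", "true")
         .config("spark.stage.maxConsecutiveAttempts", "4")  # default: 4
         .getOrCreate())
{code}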






[jira] [Assigned] (SPARK-40481) Ignore stage fetch failure caused by decommissioned executor

2022-09-18 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40481:


Assignee: Apache Spark

> Ignore stage fetch failure caused by decommissioned executor
> 
>
> Key: SPARK-40481
> URL: https://issues.apache.org/jira/browse/SPARK-40481
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.3.0
>Reporter: Zhongwei Zhu
>Assignee: Apache Spark
>Priority: Minor
>
> When executor decommission is enabled, there can be many stage failures 
> caused by FetchFailed from decommissioned executors, which in turn can fail 
> the whole job. It would be better not to count such failures against 
> `spark.stage.maxConsecutiveAttempts`.






[jira] [Commented] (SPARK-40481) Ignore stage fetch failure caused by decommissioned executor

2022-09-18 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17606358#comment-17606358
 ] 

Apache Spark commented on SPARK-40481:
--

User 'warrenzhu25' has created a pull request for this issue:
https://github.com/apache/spark/pull/37924

> Ignore stage fetch failure caused by decommissioned executor
> 
>
> Key: SPARK-40481
> URL: https://issues.apache.org/jira/browse/SPARK-40481
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.3.0
>Reporter: Zhongwei Zhu
>Priority: Minor
>
> When executor decommission is enabled, there can be many stage failures 
> caused by FetchFailed from decommissioned executors, which in turn can fail 
> the whole job. It would be better not to count such failures against 
> `spark.stage.maxConsecutiveAttempts`.






[jira] [Assigned] (SPARK-40481) Ignore stage fetch failure caused by decommissioned executor

2022-09-18 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40481:


Assignee: (was: Apache Spark)

> Ignore stage fetch failure caused by decommissioned executor
> 
>
> Key: SPARK-40481
> URL: https://issues.apache.org/jira/browse/SPARK-40481
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.3.0
>Reporter: Zhongwei Zhu
>Priority: Minor
>
> When executor decommission is enabled, there can be many stage failures 
> caused by FetchFailed from decommissioned executors, which in turn can fail 
> the whole job. It would be better not to count such failures against 
> `spark.stage.maxConsecutiveAttempts`.






[jira] [Created] (SPARK-40481) Ignore stage fetch failure caused by decommissioned executor

2022-09-18 Thread Zhongwei Zhu (Jira)
Zhongwei Zhu created SPARK-40481:


 Summary: Ignore stage fetch failure caused by decommissioned 
executor
 Key: SPARK-40481
 URL: https://issues.apache.org/jira/browse/SPARK-40481
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 3.3.0
Reporter: Zhongwei Zhu


When executor decommission is enabled, there can be many stage failures caused 
by FetchFailed from decommissioned executors, which in turn can fail the whole 
job. It would be better not to count such failures against 
`spark.stage.maxConsecutiveAttempts`.






[jira] [Commented] (SPARK-40334) Implement `GroupBy.prod`.

2022-09-18 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17606345#comment-17606345
 ] 

Apache Spark commented on SPARK-40334:
--

User 'ayudovin' has created a pull request for this issue:
https://github.com/apache/spark/pull/37923

> Implement `GroupBy.prod`.
> -
>
> Key: SPARK-40334
> URL: https://issues.apache.org/jira/browse/SPARK-40334
> Project: Spark
>  Issue Type: Sub-task
>  Components: Pandas API on Spark
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Assignee: Artsiom Yudovin
>Priority: Major
>
> We should implement `GroupBy.prod` to increase pandas API coverage.
> pandas docs: 
> https://pandas.pydata.org/docs/reference/api/pandas.core.groupby.GroupBy.prod.html
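
A sketch of the target API once implemented, mirroring the pandas semantics linked above (the sample data is illustrative):

{code:python}
import pyspark.pandas as ps

psdf = ps.DataFrame({"A": [1, 1, 2], "B": [2.0, 3.0, 4.0]})

# Product of each group's values, matching pandas' GroupBy.prod:
#      B
# A
# 1  6.0
# 2  4.0
psdf.groupby("A").prod()
{code}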






[jira] [Assigned] (SPARK-40334) Implement `GroupBy.prod`.

2022-09-18 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40334:


Assignee: Apache Spark  (was: Artsiom Yudovin)

> Implement `GroupBy.prod`.
> -
>
> Key: SPARK-40334
> URL: https://issues.apache.org/jira/browse/SPARK-40334
> Project: Spark
>  Issue Type: Sub-task
>  Components: Pandas API on Spark
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Assignee: Apache Spark
>Priority: Major
>
> We should implement `GroupBy.prod` to increase pandas API coverage.
> pandas docs: 
> https://pandas.pydata.org/docs/reference/api/pandas.core.groupby.GroupBy.prod.html






[jira] [Assigned] (SPARK-40334) Implement `GroupBy.prod`.

2022-09-18 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40334:


Assignee: Artsiom Yudovin  (was: Apache Spark)

> Implement `GroupBy.prod`.
> -
>
> Key: SPARK-40334
> URL: https://issues.apache.org/jira/browse/SPARK-40334
> Project: Spark
>  Issue Type: Sub-task
>  Components: Pandas API on Spark
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Assignee: Artsiom Yudovin
>Priority: Major
>
> We should implement `GroupBy.prod` to increase pandas API coverage.
> pandas docs: 
> https://pandas.pydata.org/docs/reference/api/pandas.core.groupby.GroupBy.prod.html






[jira] [Commented] (SPARK-40480) Remove push-based shuffle data after query finished

2022-09-18 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17606314#comment-17606314
 ] 

Apache Spark commented on SPARK-40480:
--

User 'wankunde' has created a pull request for this issue:
https://github.com/apache/spark/pull/37922

> Remove push-based shuffle data after query finished
> ---
>
> Key: SPARK-40480
> URL: https://issues.apache.org/jira/browse/SPARK-40480
> Project: Spark
>  Issue Type: Bug
>  Components: Shuffle
>Affects Versions: 3.3.0
>Reporter: Wan Kun
>Priority: Major
>
> Currently Spark cleans up ordinary shuffle data files but not push-based 
> shuffle files.
> In our production cluster, the push-based shuffle service creates too many 
> shuffle merge data files because there are several Spark Thrift Servers.
> Could we clean up the merged data files after the query finishes?
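
For context, a sketch of how push-based shuffle is typically enabled (it requires the external shuffle service; values are illustrative):

{code:python}
from pyspark.sql import SparkSession

# Push-based shuffle merges map outputs on the shuffle service; those
# merged files are what this ticket proposes to remove once a query ends.
spark = (SparkSession.builder
         .config("spark.shuffle.service.enabled", "true")
         .config("spark.shuffle.push.enabled", "true")
         .getOrCreate())
{code}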






[jira] [Assigned] (SPARK-40480) Remove push-based shuffle data after query finished

2022-09-18 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40480:


Assignee: (was: Apache Spark)

> Remove push-based shuffle data after query finished
> ---
>
> Key: SPARK-40480
> URL: https://issues.apache.org/jira/browse/SPARK-40480
> Project: Spark
>  Issue Type: Bug
>  Components: Shuffle
>Affects Versions: 3.3.0
>Reporter: Wan Kun
>Priority: Major
>
> Currently Spark cleans up ordinary shuffle data files but not push-based 
> shuffle files.
> In our production cluster, the push-based shuffle service creates too many 
> shuffle merge data files because there are several Spark Thrift Servers.
> Could we clean up the merged data files after the query finishes?






[jira] [Commented] (SPARK-40480) Remove push-based shuffle data after query finished

2022-09-18 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17606313#comment-17606313
 ] 

Apache Spark commented on SPARK-40480:
--

User 'wankunde' has created a pull request for this issue:
https://github.com/apache/spark/pull/37922

> Remove push-based shuffle data after query finished
> ---
>
> Key: SPARK-40480
> URL: https://issues.apache.org/jira/browse/SPARK-40480
> Project: Spark
>  Issue Type: Bug
>  Components: Shuffle
>Affects Versions: 3.3.0
>Reporter: Wan Kun
>Priority: Major
>
> Currently Spark cleans up ordinary shuffle data files but not push-based 
> shuffle files.
> In our production cluster, the push-based shuffle service creates too many 
> shuffle merge data files because there are several Spark Thrift Servers.
> Could we clean up the merged data files after the query finishes?






[jira] [Assigned] (SPARK-40480) Remove push-based shuffle data after query finished

2022-09-18 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40480:


Assignee: Apache Spark

> Remove push-based shuffle data after query finished
> ---
>
> Key: SPARK-40480
> URL: https://issues.apache.org/jira/browse/SPARK-40480
> Project: Spark
>  Issue Type: Bug
>  Components: Shuffle
>Affects Versions: 3.3.0
>Reporter: Wan Kun
>Assignee: Apache Spark
>Priority: Major
>
> Now spark will only cleanup shuffle data files except push-based shuffle 
> files.
> In our production cluster, push-based shuffle service will create too many 
> shuffle merge data files as there are several spark thrift server.
> Could we cleanup the merged data files after the query finished?






[jira] [Created] (SPARK-40480) Remove push-based shuffle data after query finished

2022-09-18 Thread Wan Kun (Jira)
Wan Kun created SPARK-40480:
---

 Summary: Remove push-based shuffle data after query finished
 Key: SPARK-40480
 URL: https://issues.apache.org/jira/browse/SPARK-40480
 Project: Spark
  Issue Type: Bug
  Components: Shuffle
Affects Versions: 3.3.0
Reporter: Wan Kun


Currently Spark cleans up ordinary shuffle data files but not push-based 
shuffle files.
In our production cluster, the push-based shuffle service creates too many 
shuffle merge data files because there are several Spark Thrift Servers.
Could we clean up the merged data files after the query finishes?






[jira] [Commented] (SPARK-40479) Migrate unexpected input type error to an error class

2022-09-18 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17606256#comment-17606256
 ] 

Apache Spark commented on SPARK-40479:
--

User 'MaxGekk' has created a pull request for this issue:
https://github.com/apache/spark/pull/37921

> Migrate unexpected input type error to an error class
> -
>
> Key: SPARK-40479
> URL: https://issues.apache.org/jira/browse/SPARK-40479
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Max Gekk
>Priority: Major
>
> Migrate the function ExpectsInputTypes.checkInputDataTypes to 
> DataTypeMismatch and introduce a new error class.
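
A sketch of the intended user-visible effect (the function used and the exact error-class name are assumptions for illustration):

{code:python}
from pyspark.sql import SparkSession
from pyspark.sql.functions import expr

spark = SparkSession.builder.getOrCreate()

try:
    # date_add expects a date as its first argument, not a bigint.
    spark.range(1).select(expr("date_add(id, 1)"))
except Exception as e:
    # After the migration this should carry an error class such as
    # DATATYPE_MISMATCH.UNEXPECTED_INPUT_TYPE rather than a free-form message.
    print(e)
{code}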






[jira] [Commented] (SPARK-40479) Migrate unexpected input type error to an error class

2022-09-18 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17606257#comment-17606257
 ] 

Apache Spark commented on SPARK-40479:
--

User 'MaxGekk' has created a pull request for this issue:
https://github.com/apache/spark/pull/37921

> Migrate unexpected input type error to an error class
> -
>
> Key: SPARK-40479
> URL: https://issues.apache.org/jira/browse/SPARK-40479
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Max Gekk
>Priority: Major
>
> Migrate the function ExpectsInputTypes.checkInputDataTypes to 
> DataTypeMismatch and introduce a new error class.






[jira] [Assigned] (SPARK-40479) Migrate unexpected input type error to an error class

2022-09-18 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40479:


Assignee: Apache Spark

> Migrate unexpected input type error to an error class
> -
>
> Key: SPARK-40479
> URL: https://issues.apache.org/jira/browse/SPARK-40479
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Max Gekk
>Assignee: Apache Spark
>Priority: Major
>
> Migrate the function ExpectsInputTypes.checkInputDataTypes to 
> DataTypeMismatch and introduce a new error class.






[jira] [Assigned] (SPARK-40479) Migrate unexpected input type error to an error class

2022-09-18 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40479:


Assignee: (was: Apache Spark)

> Migrate unexpected input type error to an error class
> -
>
> Key: SPARK-40479
> URL: https://issues.apache.org/jira/browse/SPARK-40479
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Max Gekk
>Priority: Major
>
> Migrate the function ExpectsInputTypes.checkInputDataTypes to 
> DataTypeMismatch and introduce a new error class.






[jira] [Created] (SPARK-40479) Migrate unexpected input type error to an error class

2022-09-18 Thread Max Gekk (Jira)
Max Gekk created SPARK-40479:


 Summary: Migrate unexpected input type error to an error class
 Key: SPARK-40479
 URL: https://issues.apache.org/jira/browse/SPARK-40479
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.4.0
Reporter: Max Gekk


Migrate the function ExpectsInputTypes.checkInputDataTypes to DataTypeMismatch 
and introduce a new error class.






[jira] [Resolved] (SPARK-39512) Document the Spark Docker container release process

2022-09-18 Thread Yuming Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang resolved SPARK-39512.
-
Fix Version/s: 3.4.0
 Assignee: Holden Karau
   Resolution: Fixed

> Document the Spark Docker container release process
> ---
>
> Key: SPARK-39512
> URL: https://issues.apache.org/jira/browse/SPARK-39512
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: 3.3.0
>Reporter: Holden Karau
>Assignee: Holden Karau
>Priority: Minor
> Fix For: 3.4.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org