[jira] [Created] (SPARK-38681) Support nested generic case classes

2022-03-29 Thread Emil Ejbyfeldt (Jira)
Emil Ejbyfeldt created SPARK-38681:
--

 Summary: Support nested generic case classes
 Key: SPARK-38681
 URL: https://issues.apache.org/jira/browse/SPARK-38681
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.3.0, 3.4.0
Reporter: Emil Ejbyfeldt


Spark fails to derive schemas when using nested case classes with generic
parameters.

Example

{code:java}
case class GenericData[A](
genericField: A)
{code}

This will derive a correct schema for `GenericData[Int]`, but if the classes
are nested, e.g.

{code:java}
case class NestedGeneric[T](
  generic: GenericData[T])
{code}

it will fail to derive a schema for `NestedGeneric[Int]`.
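A minimal way to reproduce, using the case classes above (a sketch, assuming a local SparkSession; the exact exception and message vary by version):

{code:scala}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

// Works: the type parameter is concrete at the call site.
Seq(GenericData(1)).toDS().printSchema()

// Fails: schema derivation cannot resolve the type parameter of the
// nested generic field and throws during encoder resolution.
Seq(NestedGeneric(GenericData(1))).toDS().printSchema()
{code}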






[jira] [Created] (SPARK-38682) Complex calculations with lead to driver oom

2022-03-29 Thread JacobZheng (Jira)
JacobZheng created SPARK-38682:
--

 Summary: Complex calculations with  lead to driver oom
 Key: SPARK-38682
 URL: https://issues.apache.org/jira/browse/SPARK-38682
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.2.0
Reporter: JacobZheng


My Spark job works fine in version 3.0.1. After I upgraded to 3.2, the driver
hangs at runtime due to OOM. The dump file shows that the stageMetrics in
SQLAppStatusListener are taking up a lot of memory. I'm wondering if it's
related to the SPARK-33016 change, or if the execution plan change has created
more stages, causing the driver to run out of memory, or some other reason.
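Not a fix, but a possible mitigation while this is investigated, assuming the retained UI/listener state is what grows (a sketch; the values are illustrative, and both settings default to 1000):

{code:scala}
import org.apache.spark.sql.SparkSession

// Retain less per-stage and per-execution UI state on the driver,
// trading UI history for driver memory.
val spark = SparkSession.builder()
  .config("spark.ui.retainedStages", "100")
  .config("spark.sql.ui.retainedExecutions", "100")
  .getOrCreate()
{code}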







[jira] [Updated] (SPARK-38682) Complex calculations with lead to driver oom

2022-03-29 Thread JacobZheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

JacobZheng updated SPARK-38682:
---
Attachment: screenshot-1.png

> Complex calculations with  lead to driver oom
> -
>
> Key: SPARK-38682
> URL: https://issues.apache.org/jira/browse/SPARK-38682
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: JacobZheng
>Priority: Major
> Attachments: 20220329164645.jpg, screenshot-1.png
>
>
> My Spark job works fine in version 3.0.1. After I upgraded to 3.2, the driver
> hangs at runtime due to OOM. The dump file shows that the stageMetrics in
> SQLAppStatusListener are taking up a lot of memory. I'm wondering if it's
> related to the SPARK-33016 change, or if the execution plan change has
> created more stages, causing the driver to run out of memory, or some other
> reason.






[jira] [Updated] (SPARK-38682) Complex calculations with lead to driver oom

2022-03-29 Thread JacobZheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

JacobZheng updated SPARK-38682:
---
Attachment: 20220329164645.jpg

> Complex calculations with  lead to driver oom
> -
>
> Key: SPARK-38682
> URL: https://issues.apache.org/jira/browse/SPARK-38682
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: JacobZheng
>Priority: Major
> Attachments: 20220329164645.jpg, screenshot-1.png
>
>
> My Spark job works fine in version 3.0.1. After I upgraded to 3.2, the driver
> hangs at runtime due to OOM. The dump file shows that the stageMetrics in
> SQLAppStatusListener are taking up a lot of memory. I'm wondering if it's
> related to the SPARK-33016 change, or if the execution plan change has
> created more stages, causing the driver to run out of memory, or some other
> reason.






[jira] [Updated] (SPARK-38682) Complex calculations with lead to driver oom

2022-03-29 Thread JacobZheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

JacobZheng updated SPARK-38682:
---
Description: 
My Spark job works fine in version 3.0.1. After I upgraded to 3.2, the driver
hangs at runtime due to OOM. The dump file shows that the stageMetrics in
SQLAppStatusListener are taking up a lot of memory. I'm wondering if it's
related to the SPARK-33016 change, or if the execution plan change has created
more stages, causing the driver to run out of memory, or some other reason.
 !screenshot-1.png! 


  was:
My Spark job works fine in version 3.0.1. After I upgraded to 3.2, the driver
hangs at runtime due to OOM. The dump file shows that the stageMetrics in
SQLAppStatusListener are taking up a lot of memory. I'm wondering if it's
related to the SPARK-33016 change, or if the execution plan change has created
more stages, causing the driver to run out of memory, or some other reason.



> Complex calculations with  lead to driver oom
> -
>
> Key: SPARK-38682
> URL: https://issues.apache.org/jira/browse/SPARK-38682
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: JacobZheng
>Priority: Major
> Attachments: 20220329164645.jpg, screenshot-1.png
>
>
> My Spark job works fine in version 3.0.1. After I upgraded to 3.2, the driver
> hangs at runtime due to OOM. The dump file shows that the stageMetrics in
> SQLAppStatusListener are taking up a lot of memory. I'm wondering if it's
> related to the SPARK-33016 change, or if the execution plan change has
> created more stages, causing the driver to run out of memory, or some other
> reason.
>  !screenshot-1.png! 






[jira] [Updated] (SPARK-38682) Complex calculations with lead to driver oom

2022-03-29 Thread JacobZheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

JacobZheng updated SPARK-38682:
---
Attachment: (was: 20220329164645.jpg)

> Complex calculations with  lead to driver oom
> -
>
> Key: SPARK-38682
> URL: https://issues.apache.org/jira/browse/SPARK-38682
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: JacobZheng
>Priority: Major
> Attachments: screenshot-1.png
>
>
> My Spark job works fine in version 3.0.1. After I upgraded to 3.2, the driver
> hangs at runtime due to OOM. The dump file shows that the stageMetrics in
> SQLAppStatusListener are taking up a lot of memory. I'm wondering if it's
> related to the SPARK-33016 change, or if the execution plan change has
> created more stages, causing the driver to run out of memory, or some other
> reason.
>  !screenshot-1.png! 






[jira] [Created] (SPARK-38683) It is unnecessary to release the ShuffleManagedBufferIterator or ShuffleChunkManagedBufferIterator or ManagedBufferIterator buffers when the client channel's connection

2022-03-29 Thread weixiuli (Jira)
weixiuli created SPARK-38683:


 Summary: It is unnecessary to release the 
ShuffleManagedBufferIterator or ShuffleChunkManagedBufferIterator or 
ManagedBufferIterator buffers when the client channel's connection is terminated
 Key: SPARK-38683
 URL: https://issues.apache.org/jira/browse/SPARK-38683
 Project: Spark
  Issue Type: Bug
  Components: Shuffle
Affects Versions: 3.2.1, 3.2.0, 3.1.2, 3.1.1, 3.1.0
Reporter: weixiuli


 It is unnecessary to release the ShuffleManagedBufferIterator,
ShuffleChunkManagedBufferIterator, or ManagedBufferIterator buffers when the
client channel's connection is terminated; skipping the release reduces I/O
operations and improves performance for the External Shuffle Service.






[jira] [Created] (SPARK-38684) Stream-stream outer join has a possible correctness issue due to weakly read consistent on outer iterators

2022-03-29 Thread Jungtaek Lim (Jira)
Jungtaek Lim created SPARK-38684:


 Summary: Stream-stream outer join has a possible correctness issue 
due to weakly read consistent on outer iterators
 Key: SPARK-38684
 URL: https://issues.apache.org/jira/browse/SPARK-38684
 Project: Spark
  Issue Type: Bug
  Components: Structured Streaming
Affects Versions: 3.2.1, 3.3.0
Reporter: Jungtaek Lim


We figured out that stream-stream join has the same issue as SPARK-38320 on
the appended iterators. Since the root cause is the same as in SPARK-38320,
this is only reproducible with the RocksDB state store provider; but even with
the HDFS-backed state store provider, the behavior is not guaranteed by the
interface contract, so it may depend on the JVM vendor, version, etc.

I can easily construct a “data loss” scenario in the state store.

The conditions are:
 * Use a stream-stream time-interval outer join

 ** a left outer join has the issue on the left side, a right outer join on
the right side, and a full outer join on both sides

 * At batch N, produce non-late row(s) on the problematic side

 * In the same batch (batch N), some row(s) on the problematic side must be
evicted by the watermark condition

When these conditions are fulfilled, keyToNumValues goes out of sync between
the state and the iterator in the eviction phase. If a row is evicted for the
grouping key (updating keyToNumValues), the eviction phase “overwrites”
keyToNumValues in the state with the value it calculates.

Since the eviction phase “does not know” about the new rows (keyToNumValues is
out of sync), this effectively discards all rows added to the state in batch N.
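For reference, a minimal shape of a query that meets these conditions (a sketch; the sources, column names, and intervals are hypothetical, not the exact reproduction):

{code:scala}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.expr

val spark = SparkSession.builder().master("local[*]").getOrCreate()

// Any streaming source with an event-time column works; rate is used
// here only to keep the sketch self-contained.
val left = spark.readStream.format("rate").load()
  .selectExpr("value AS leftId", "timestamp AS leftTime")
  .withWatermark("leftTime", "10 seconds")
val right = spark.readStream.format("rate").load()
  .selectExpr("value AS rightId", "timestamp AS rightTime")
  .withWatermark("rightTime", "10 seconds")

// Stream-stream time-interval LEFT OUTER join: per the conditions above,
// the problematic side here is the left side.
val joined = left.join(
  right,
  expr("leftId = rightId AND " +
    "rightTime BETWEEN leftTime AND leftTime + INTERVAL 5 SECONDS"),
  "leftOuter")
{code}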






[jira] [Commented] (SPARK-38684) Stream-stream outer join has a possible correctness issue due to weakly read consistent on outer iterators

2022-03-29 Thread Jungtaek Lim (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17513949#comment-17513949
 ] 

Jungtaek Lim commented on SPARK-38684:
--

Will submit a PR soon.

> Stream-stream outer join has a possible correctness issue due to weakly read 
> consistent on outer iterators
> --
>
> Key: SPARK-38684
> URL: https://issues.apache.org/jira/browse/SPARK-38684
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 3.2.1, 3.3.0
>Reporter: Jungtaek Lim
>Priority: Blocker
>  Labels: correctness
>
> We figured out that stream-stream join has the same issue as SPARK-38320 on
> the appended iterators. Since the root cause is the same as in SPARK-38320,
> this is only reproducible with the RocksDB state store provider; but even
> with the HDFS-backed state store provider, the behavior is not guaranteed by
> the interface contract, so it may depend on the JVM vendor, version, etc.
> I can easily construct a “data loss” scenario in the state store.
> The conditions are:
>  * Use a stream-stream time-interval outer join
>  ** a left outer join has the issue on the left side, a right outer join on
> the right side, and a full outer join on both sides
>  * At batch N, produce non-late row(s) on the problematic side
>  * In the same batch (batch N), some row(s) on the problematic side must be
> evicted by the watermark condition
> When these conditions are fulfilled, keyToNumValues goes out of sync between
> the state and the iterator in the eviction phase. If a row is evicted for the
> grouping key (updating keyToNumValues), the eviction phase “overwrites”
> keyToNumValues in the state with the value it calculates.
> Since the eviction phase “does not know” about the new rows (keyToNumValues
> is out of sync), this effectively discards all rows added to the state in
> batch N.






[jira] [Commented] (SPARK-38561) Add doc for "Customized Kubernetes Schedulers"

2022-03-29 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17513962#comment-17513962
 ] 

Apache Spark commented on SPARK-38561:
--

User 'Yikun' has created a pull request for this issue:
https://github.com/apache/spark/pull/36001

> Add doc for "Customized Kubernetes Schedulers"
> --
>
> Key: SPARK-38561
> URL: https://issues.apache.org/jira/browse/SPARK-38561
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, Kubernetes
>Affects Versions: 3.3.0
>Reporter: Yikun Jiang
>Assignee: Yikun Jiang
>Priority: Major
> Fix For: 3.3.0
>
>







[jira] [Assigned] (SPARK-38683) It is unnecessary to release the ShuffleManagedBufferIterator or ShuffleChunkManagedBufferIterator or ManagedBufferIterator buffers when the client channel's connection

2022-03-29 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-38683:


Assignee: Apache Spark

> It is unnecessary to release the ShuffleManagedBufferIterator or 
> ShuffleChunkManagedBufferIterator or ManagedBufferIterator buffers when the 
> client channel's connection is terminated
> --
>
> Key: SPARK-38683
> URL: https://issues.apache.org/jira/browse/SPARK-38683
> Project: Spark
>  Issue Type: Bug
>  Components: Shuffle
>Affects Versions: 3.1.0, 3.1.1, 3.1.2, 3.2.0, 3.2.1
>Reporter: weixiuli
>Assignee: Apache Spark
>Priority: Major
>
>  It is unnecessary to release the ShuffleManagedBufferIterator,
> ShuffleChunkManagedBufferIterator, or ManagedBufferIterator buffers when the
> client channel's connection is terminated; skipping the release reduces I/O
> operations and improves performance for the External Shuffle Service.






[jira] [Assigned] (SPARK-38683) It is unnecessary to release the ShuffleManagedBufferIterator or ShuffleChunkManagedBufferIterator or ManagedBufferIterator buffers when the client channel's connection

2022-03-29 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-38683:


Assignee: (was: Apache Spark)

> It is unnecessary to release the ShuffleManagedBufferIterator or 
> ShuffleChunkManagedBufferIterator or ManagedBufferIterator buffers when the 
> client channel's connection is terminated
> --
>
> Key: SPARK-38683
> URL: https://issues.apache.org/jira/browse/SPARK-38683
> Project: Spark
>  Issue Type: Bug
>  Components: Shuffle
>Affects Versions: 3.1.0, 3.1.1, 3.1.2, 3.2.0, 3.2.1
>Reporter: weixiuli
>Priority: Major
>
>  It is unnecessary to release the ShuffleManagedBufferIterator,
> ShuffleChunkManagedBufferIterator, or ManagedBufferIterator buffers when the
> client channel's connection is terminated; skipping the release reduces I/O
> operations and improves performance for the External Shuffle Service.






[jira] [Commented] (SPARK-38683) It is unnecessary to release the ShuffleManagedBufferIterator or ShuffleChunkManagedBufferIterator or ManagedBufferIterator buffers when the client channel's connectio

2022-03-29 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17513963#comment-17513963
 ] 

Apache Spark commented on SPARK-38683:
--

User 'weixiuli' has created a pull request for this issue:
https://github.com/apache/spark/pull/36000

> It is unnecessary to release the ShuffleManagedBufferIterator or 
> ShuffleChunkManagedBufferIterator or ManagedBufferIterator buffers when the 
> client channel's connection is terminated
> --
>
> Key: SPARK-38683
> URL: https://issues.apache.org/jira/browse/SPARK-38683
> Project: Spark
>  Issue Type: Bug
>  Components: Shuffle
>Affects Versions: 3.1.0, 3.1.1, 3.1.2, 3.2.0, 3.2.1
>Reporter: weixiuli
>Priority: Major
>
>  It is unnecessary to release the ShuffleManagedBufferIterator,
> ShuffleChunkManagedBufferIterator, or ManagedBufferIterator buffers when the
> client channel's connection is terminated; skipping the release reduces I/O
> operations and improves performance for the External Shuffle Service.






[jira] [Assigned] (SPARK-38684) Stream-stream outer join has a possible correctness issue due to weakly read consistent on outer iterators

2022-03-29 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-38684:


Assignee: (was: Apache Spark)

> Stream-stream outer join has a possible correctness issue due to weakly read 
> consistent on outer iterators
> --
>
> Key: SPARK-38684
> URL: https://issues.apache.org/jira/browse/SPARK-38684
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 3.2.1, 3.3.0
>Reporter: Jungtaek Lim
>Priority: Blocker
>  Labels: correctness
>
> We figured out that stream-stream join has the same issue as SPARK-38320 on
> the appended iterators. Since the root cause is the same as in SPARK-38320,
> this is only reproducible with the RocksDB state store provider; but even
> with the HDFS-backed state store provider, the behavior is not guaranteed by
> the interface contract, so it may depend on the JVM vendor, version, etc.
> I can easily construct a “data loss” scenario in the state store.
> The conditions are:
>  * Use a stream-stream time-interval outer join
>  ** a left outer join has the issue on the left side, a right outer join on
> the right side, and a full outer join on both sides
>  * At batch N, produce non-late row(s) on the problematic side
>  * In the same batch (batch N), some row(s) on the problematic side must be
> evicted by the watermark condition
> When these conditions are fulfilled, keyToNumValues goes out of sync between
> the state and the iterator in the eviction phase. If a row is evicted for the
> grouping key (updating keyToNumValues), the eviction phase “overwrites”
> keyToNumValues in the state with the value it calculates.
> Since the eviction phase “does not know” about the new rows (keyToNumValues
> is out of sync), this effectively discards all rows added to the state in
> batch N.






[jira] [Commented] (SPARK-38684) Stream-stream outer join has a possible correctness issue due to weakly read consistent on outer iterators

2022-03-29 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17513994#comment-17513994
 ] 

Apache Spark commented on SPARK-38684:
--

User 'HeartSaVioR' has created a pull request for this issue:
https://github.com/apache/spark/pull/36002

> Stream-stream outer join has a possible correctness issue due to weakly read 
> consistent on outer iterators
> --
>
> Key: SPARK-38684
> URL: https://issues.apache.org/jira/browse/SPARK-38684
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 3.2.1, 3.3.0
>Reporter: Jungtaek Lim
>Priority: Blocker
>  Labels: correctness
>
> We figured out that stream-stream join has the same issue as SPARK-38320 on
> the appended iterators. Since the root cause is the same as in SPARK-38320,
> this is only reproducible with the RocksDB state store provider; but even
> with the HDFS-backed state store provider, the behavior is not guaranteed by
> the interface contract, so it may depend on the JVM vendor, version, etc.
> I can easily construct a “data loss” scenario in the state store.
> The conditions are:
>  * Use a stream-stream time-interval outer join
>  ** a left outer join has the issue on the left side, a right outer join on
> the right side, and a full outer join on both sides
>  * At batch N, produce non-late row(s) on the problematic side
>  * In the same batch (batch N), some row(s) on the problematic side must be
> evicted by the watermark condition
> When these conditions are fulfilled, keyToNumValues goes out of sync between
> the state and the iterator in the eviction phase. If a row is evicted for the
> grouping key (updating keyToNumValues), the eviction phase “overwrites”
> keyToNumValues in the state with the value it calculates.
> Since the eviction phase “does not know” about the new rows (keyToNumValues
> is out of sync), this effectively discards all rows added to the state in
> batch N.






[jira] [Assigned] (SPARK-38684) Stream-stream outer join has a possible correctness issue due to weakly read consistent on outer iterators

2022-03-29 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-38684:


Assignee: Apache Spark

> Stream-stream outer join has a possible correctness issue due to weakly read 
> consistent on outer iterators
> --
>
> Key: SPARK-38684
> URL: https://issues.apache.org/jira/browse/SPARK-38684
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 3.2.1, 3.3.0
>Reporter: Jungtaek Lim
>Assignee: Apache Spark
>Priority: Blocker
>  Labels: correctness
>
> We figured out that stream-stream join has the same issue as SPARK-38320 on
> the appended iterators. Since the root cause is the same as in SPARK-38320,
> this is only reproducible with the RocksDB state store provider; but even
> with the HDFS-backed state store provider, the behavior is not guaranteed by
> the interface contract, so it may depend on the JVM vendor, version, etc.
> I can easily construct a “data loss” scenario in the state store.
> The conditions are:
>  * Use a stream-stream time-interval outer join
>  ** a left outer join has the issue on the left side, a right outer join on
> the right side, and a full outer join on both sides
>  * At batch N, produce non-late row(s) on the problematic side
>  * In the same batch (batch N), some row(s) on the problematic side must be
> evicted by the watermark condition
> When these conditions are fulfilled, keyToNumValues goes out of sync between
> the state and the iterator in the eviction phase. If a row is evicted for the
> grouping key (updating keyToNumValues), the eviction phase “overwrites”
> keyToNumValues in the state with the value it calculates.
> Since the eviction phase “does not know” about the new rows (keyToNumValues
> is out of sync), this effectively discards all rows added to the state in
> batch N.






[jira] [Updated] (SPARK-38682) Complex calculations with lead to driver oom

2022-03-29 Thread JacobZheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

JacobZheng updated SPARK-38682:
---
Description: 
My Spark job works fine in version 3.0.1. After I upgraded to 3.2, the driver
hangs at runtime due to OOM. The dump file shows that the stageMetrics in
SQLAppStatusListener are taking up a lot of memory. I'm wondering if it's
related to the SPARK-33016 change, or if the execution plan change has created
more tasks, causing the driver to run out of memory, or some other reason.
 !screenshot-1.png! 


  was:
My Spark job works fine in version 3.0.1. After I upgraded to 3.2, the driver
hangs at runtime due to OOM. The dump file shows that the stageMetrics in
SQLAppStatusListener are taking up a lot of memory. I'm wondering if it's
related to the SPARK-33016 change, or if the execution plan change has created
more stages, causing the driver to run out of memory, or some other reason.
 !screenshot-1.png! 



> Complex calculations with  lead to driver oom
> -
>
> Key: SPARK-38682
> URL: https://issues.apache.org/jira/browse/SPARK-38682
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: JacobZheng
>Priority: Major
> Attachments: screenshot-1.png
>
>
> My Spark job works fine in version 3.0.1. After I upgraded to 3.2, the driver
> hangs at runtime due to OOM. The dump file shows that the stageMetrics in
> SQLAppStatusListener are taking up a lot of memory. I'm wondering if it's
> related to the SPARK-33016 change, or if the execution plan change has
> created more tasks, causing the driver to run out of memory, or some other
> reason.
>  !screenshot-1.png! 






[jira] [Resolved] (SPARK-38670) Add offset commit time to streaming query listener

2022-03-29 Thread Jungtaek Lim (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jungtaek Lim resolved SPARK-38670.
--
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 35985
[https://github.com/apache/spark/pull/35985]

> Add offset commit time to streaming query listener
> --
>
> Key: SPARK-38670
> URL: https://issues.apache.org/jira/browse/SPARK-38670
> Project: Spark
>  Issue Type: Improvement
>  Components: Structured Streaming
>Affects Versions: 3.2.1
>Reporter: Boyang Jerry Peng
>Assignee: Boyang Jerry Peng
>Priority: Major
> Fix For: 3.4.0
>
>
> A major portion of the batch duration is committing offsets at the end of
> the micro-batch. The timing for this operation is missing from the
> durationMs metrics. Let's add this metric to get a more complete picture of
> where the time goes during the processing of a micro-batch.
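For context, durationMs is exposed per micro-batch through the progress reporting API; a sketch of reading it (assuming an active SparkSession `spark`; the exact key name of the new offset-commit entry is defined in the linked PR):

{code:scala}
import org.apache.spark.sql.streaming.StreamingQueryListener
import org.apache.spark.sql.streaming.StreamingQueryListener._

// Log every durationMs entry per micro-batch; the new offset-commit timing
// would show up here next to existing keys such as "addBatch" and "walCommit".
spark.streams.addListener(new StreamingQueryListener {
  override def onQueryStarted(event: QueryStartedEvent): Unit = ()
  override def onQueryProgress(event: QueryProgressEvent): Unit =
    event.progress.durationMs.forEach((k, v) => println(s"$k = $v ms"))
  override def onQueryTerminated(event: QueryTerminatedEvent): Unit = ()
})
{code}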






[jira] [Assigned] (SPARK-38670) Add offset commit time to streaming query listener

2022-03-29 Thread Jungtaek Lim (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jungtaek Lim reassigned SPARK-38670:


Assignee: Boyang Jerry Peng

> Add offset commit time to streaming query listener
> --
>
> Key: SPARK-38670
> URL: https://issues.apache.org/jira/browse/SPARK-38670
> Project: Spark
>  Issue Type: Improvement
>  Components: Structured Streaming
>Affects Versions: 3.2.1
>Reporter: Boyang Jerry Peng
>Assignee: Boyang Jerry Peng
>Priority: Major
>
> A major portion of the batch duration is committing offsets at the end of
> the micro-batch. The timing for this operation is missing from the
> durationMs metrics. Let's add this metric to get a more complete picture of
> where the time goes during the processing of a micro-batch.






[jira] [Created] (SPARK-38685) Improve the implementation of percentile_cont

2022-03-29 Thread jiaan.geng (Jira)
jiaan.geng created SPARK-38685:
--

 Summary: Improve the implementation of percentile_cont
 Key: SPARK-38685
 URL: https://issues.apache.org/jira/browse/SPARK-38685
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.3.0
Reporter: jiaan.geng









[jira] [Assigned] (SPARK-38685) Improve the implementation of percentile_cont

2022-03-29 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-38685:


Assignee: Apache Spark

> Improve the implementation of percentile_cont
> 
>
> Key: SPARK-38685
> URL: https://issues.apache.org/jira/browse/SPARK-38685
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: jiaan.geng
>Assignee: Apache Spark
>Priority: Major
>







[jira] [Assigned] (SPARK-38685) Improve the implementation of percentile_cont

2022-03-29 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-38685:


Assignee: (was: Apache Spark)

> Improve the implementation of percentile_cont
> 
>
> Key: SPARK-38685
> URL: https://issues.apache.org/jira/browse/SPARK-38685
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: jiaan.geng
>Priority: Major
>







[jira] [Commented] (SPARK-38685) Improve the implementation of percentile_cont

2022-03-29 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17514062#comment-17514062
 ] 

Apache Spark commented on SPARK-38685:
--

User 'beliefer' has created a pull request for this issue:
https://github.com/apache/spark/pull/36003

> Improve the implementation of percentile_cont
> 
>
> Key: SPARK-38685
> URL: https://issues.apache.org/jira/browse/SPARK-38685
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: jiaan.geng
>Priority: Major
>







[jira] [Commented] (SPARK-38685) Improve the implementation of percentile_cont

2022-03-29 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17514060#comment-17514060
 ] 

Apache Spark commented on SPARK-38685:
--

User 'beliefer' has created a pull request for this issue:
https://github.com/apache/spark/pull/36003

> Improve the implementation of percentile_cont
> 
>
> Key: SPARK-38685
> URL: https://issues.apache.org/jira/browse/SPARK-38685
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: jiaan.geng
>Priority: Major
>







[jira] [Commented] (SPARK-38681) Support nested generic case classes

2022-03-29 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17514064#comment-17514064
 ] 

Apache Spark commented on SPARK-38681:
--

User 'eejbyfeldt' has created a pull request for this issue:
https://github.com/apache/spark/pull/36004

> Support nested generic case classes
> ---
>
> Key: SPARK-38681
> URL: https://issues.apache.org/jira/browse/SPARK-38681
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.0, 3.4.0
>Reporter: Emil Ejbyfeldt
>Priority: Major
>
> Spark fails to derive schemas when using nested case classes with generic
> parameters.
> Example
> {code:java}
> case class GenericData[A](
> genericField: A)
> {code}
> This will derive a correct schema for `GenericData[Int]`, but if the classes
> are nested, e.g.
> {code:java}
> case class NestedGeneric[T](
>   generic: GenericData[T])
> {code}
> it will fail to derive a schema for `NestedGeneric[Int]`.






[jira] [Assigned] (SPARK-38681) Support nested generic case classes

2022-03-29 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-38681:


Assignee: (was: Apache Spark)

> Support nested generic case classes
> ---
>
> Key: SPARK-38681
> URL: https://issues.apache.org/jira/browse/SPARK-38681
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.0, 3.4.0
>Reporter: Emil Ejbyfeldt
>Priority: Major
>
> Spark fails to derive schemas when using nested case classes with generic
> parameters.
> Example
> {code:java}
> case class GenericData[A](
> genericField: A)
> {code}
> This will derive a correct schema for `GenericData[Int]`, but if the classes
> are nested, e.g.
> {code:java}
> case class NestedGeneric[T](
>   generic: GenericData[T])
> {code}
> it will fail to derive a schema for `NestedGeneric[Int]`.






[jira] [Assigned] (SPARK-38681) Support nested generic case classes

2022-03-29 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-38681:


Assignee: Apache Spark

> Support nested generic case classes
> ---
>
> Key: SPARK-38681
> URL: https://issues.apache.org/jira/browse/SPARK-38681
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.0, 3.4.0
>Reporter: Emil Ejbyfeldt
>Assignee: Apache Spark
>Priority: Major
>
> Spark fails to derive schemas when using nested case classes with generic
> parameters.
> Example
> {code:java}
> case class GenericData[A](
> genericField: A)
> {code}
> This will derive a correct schema for `GenericData[Int]`, but if the classes
> are nested, e.g.
> {code:java}
> case class NestedGeneric[T](
>   generic: GenericData[T])
> {code}
> it will fail to derive a schema for `NestedGeneric[Int]`.






[jira] [Resolved] (SPARK-38674) Remove useless deduplicate in SubqueryBroadcastExec

2022-03-29 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-38674.
-
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 35989
[https://github.com/apache/spark/pull/35989]

> Remove useless deduplicate in SubqueryBroadcastExec
> ---
>
> Key: SPARK-38674
> URL: https://issues.apache.org/jira/browse/SPARK-38674
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Yuming Wang
>Assignee: Yuming Wang
>Priority: Major
> Fix For: 3.4.0
>
>
> Distinct performance:
> https://github.com/apache/spark/pull/29642#discussion_r511606498






[jira] [Assigned] (SPARK-38674) Remove useless deduplicate in SubqueryBroadcastExec

2022-03-29 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-38674:
---

Assignee: Yuming Wang

> Remove useless deduplicate in SubqueryBroadcastExec
> ---
>
> Key: SPARK-38674
> URL: https://issues.apache.org/jira/browse/SPARK-38674
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Yuming Wang
>Assignee: Yuming Wang
>Priority: Major
>
> Distinct performance:
> https://github.com/apache/spark/pull/29642#discussion_r511606498






[jira] [Assigned] (SPARK-38506) Push partial aggregation through join

2022-03-29 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-38506:


Assignee: Apache Spark

> Push partial aggregation through join
> -
>
> Key: SPARK-38506
> URL: https://issues.apache.org/jira/browse/SPARK-38506
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Yuming Wang
>Assignee: Apache Spark
>Priority: Major
>
> Please see 
> https://docs.teradata.com/r/Teradata-VantageTM-SQL-Request-and-Transaction-Processing/March-2019/Join-Planning-and-Optimization/Partial-GROUP-BY-Block-Optimization
>  for more details.
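The rewrite, expressed by hand on a toy query (a sketch with hypothetical tables t1(k, ...) and t2(k, v), assuming an active SparkSession `spark`; the proposed rule would apply it automatically when beneficial):

{code:scala}
import org.apache.spark.sql.functions.sum

val t1 = spark.table("t1")
val t2 = spark.table("t2")

// Original shape: aggregate after the join.
val original = t1.join(t2, "k").groupBy("k").agg(sum("v"))

// Equivalent shape with a partial aggregation pushed below the join,
// which shrinks the join input when t2 has many rows per key.
val pushed = t1
  .join(t2.groupBy("k").agg(sum("v").as("sv")), "k")
  .groupBy("k")
  .agg(sum("sv"))
{code}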






[jira] [Assigned] (SPARK-38506) Push partial aggregation through join

2022-03-29 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-38506:


Assignee: (was: Apache Spark)

> Push partial aggregation through join
> -
>
> Key: SPARK-38506
> URL: https://issues.apache.org/jira/browse/SPARK-38506
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Yuming Wang
>Priority: Major
>
> Please see 
> https://docs.teradata.com/r/Teradata-VantageTM-SQL-Request-and-Transaction-Processing/March-2019/Join-Planning-and-Optimization/Partial-GROUP-BY-Block-Optimization
>  for more details.






[jira] [Commented] (SPARK-38506) Push partial aggregation through join

2022-03-29 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17514133#comment-17514133
 ] 

Apache Spark commented on SPARK-38506:
--

User 'wangyum' has created a pull request for this issue:
https://github.com/apache/spark/pull/36005

> Push partial aggregation through join
> -
>
> Key: SPARK-38506
> URL: https://issues.apache.org/jira/browse/SPARK-38506
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Yuming Wang
>Priority: Major
>
> Please see 
> https://docs.teradata.com/r/Teradata-VantageTM-SQL-Request-and-Transaction-Processing/March-2019/Join-Planning-and-Optimization/Partial-GROUP-BY-Block-Optimization
>  for more details.






[jira] [Resolved] (SPARK-38562) Add doc for Volcano scheduler

2022-03-29 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-38562.
---
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 35870
[https://github.com/apache/spark/pull/35870]

> Add doc for Volcano scheduler
> -
>
> Key: SPARK-38562
> URL: https://issues.apache.org/jira/browse/SPARK-38562
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, Kubernetes
>Affects Versions: 3.3.0
>Reporter: Yikun Jiang
>Assignee: Yikun Jiang
>Priority: Major
> Fix For: 3.4.0
>
>







[jira] [Assigned] (SPARK-38562) Add doc for Volcano scheduler

2022-03-29 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-38562:
-

Assignee: Yikun Jiang

> Add doc for Volcano scheduler
> -
>
> Key: SPARK-38562
> URL: https://issues.apache.org/jira/browse/SPARK-38562
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, Kubernetes
>Affects Versions: 3.3.0
>Reporter: Yikun Jiang
>Assignee: Yikun Jiang
>Priority: Major
>







[jira] [Updated] (SPARK-38562) Add doc for Volcano scheduler

2022-03-29 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-38562:
--
Fix Version/s: 3.3.0
   (was: 3.4.0)

> Add doc for Volcano scheduler
> -
>
> Key: SPARK-38562
> URL: https://issues.apache.org/jira/browse/SPARK-38562
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, Kubernetes
>Affects Versions: 3.3.0
>Reporter: Yikun Jiang
>Assignee: Yikun Jiang
>Priority: Major
> Fix For: 3.3.0
>
>







[jira] [Created] (SPARK-38686) Implement `keep` parameter of `(Index/MultiIndex).drop_duplicates`

2022-03-29 Thread Xinrong Meng (Jira)
Xinrong Meng created SPARK-38686:


 Summary: Implement `keep` parameter of 
`(Index/MultiIndex).drop_duplicates`
 Key: SPARK-38686
 URL: https://issues.apache.org/jira/browse/SPARK-38686
 Project: Spark
  Issue Type: Improvement
  Components: PySpark
Affects Versions: 3.4.0
Reporter: Xinrong Meng


Implement `keep` parameter of `(Index/MultiIndex).drop_duplicates`






[jira] [Resolved] (SPARK-37982) Use error classes in the execution errors related to unsupported input type

2022-03-29 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk resolved SPARK-37982.
--
Fix Version/s: 3.4.0
   (was: 3.3.0)
   Resolution: Fixed

Issue resolved by pull request 35274
[https://github.com/apache/spark/pull/35274]

> Use error classes in the execution errors related to unsupported input type
> ---
>
> Key: SPARK-37982
> URL: https://issues.apache.org/jira/browse/SPARK-37982
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: leesf
>Assignee: leesf
>Priority: Major
> Fix For: 3.4.0
>
>







[jira] [Assigned] (SPARK-37982) Use error classes in the execution errors related to unsupported input type

2022-03-29 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk reassigned SPARK-37982:


Assignee: leesf

> Use error classes in the execution errors related to unsupported input type
> ---
>
> Key: SPARK-37982
> URL: https://issues.apache.org/jira/browse/SPARK-37982
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: leesf
>Assignee: leesf
>Priority: Major
> Fix For: 3.3.0
>
>







[jira] [Commented] (SPARK-38686) Implement `keep` parameter of `(Index/MultiIndex).drop_duplicates`

2022-03-29 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17514275#comment-17514275
 ] 

Apache Spark commented on SPARK-38686:
--

User 'xinrong-databricks' has created a pull request for this issue:
https://github.com/apache/spark/pull/36006

> Implement `keep` parameter of `(Index/MultiIndex).drop_duplicates`
> --
>
> Key: SPARK-38686
> URL: https://issues.apache.org/jira/browse/SPARK-38686
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.4.0
>Reporter: Xinrong Meng
>Priority: Major
>
> Implement `keep` parameter of `(Index/MultiIndex).drop_duplicates`






[jira] [Assigned] (SPARK-38686) Implement `keep` parameter of `(Index/MultiIndex).drop_duplicates`

2022-03-29 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-38686:


Assignee: (was: Apache Spark)

> Implement `keep` parameter of `(Index/MultiIndex).drop_duplicates`
> --
>
> Key: SPARK-38686
> URL: https://issues.apache.org/jira/browse/SPARK-38686
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.4.0
>Reporter: Xinrong Meng
>Priority: Major
>
> Implement `keep` parameter of `(Index/MultiIndex).drop_duplicates`






[jira] [Assigned] (SPARK-38686) Implement `keep` parameter of `(Index/MultiIndex).drop_duplicates`

2022-03-29 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-38686:


Assignee: Apache Spark

> Implement `keep` parameter of `(Index/MultiIndex).drop_duplicates`
> --
>
> Key: SPARK-38686
> URL: https://issues.apache.org/jira/browse/SPARK-38686
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.4.0
>Reporter: Xinrong Meng
>Assignee: Apache Spark
>Priority: Major
>
> Implement `keep` parameter of `(Index/MultiIndex).drop_duplicates`






[jira] [Updated] (SPARK-38687) Use error classes in the compilation errors of generators

2022-03-29 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk updated SPARK-38687:
-
Affects Version/s: 3.4.0
   (was: 3.3.0)

> Use error classes in the compilation errors of generators
> -
>
> Key: SPARK-38687
> URL: https://issues.apache.org/jira/browse/SPARK-38687
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Max Gekk
>Priority: Major
>
> Migrate the following errors in QueryCompilationErrors:
> * windowSpecificationNotDefinedError
> * windowAggregateFunctionWithFilterNotSupportedError
> * windowFunctionInsideAggregateFunctionNotAllowedError
> * expressionWithoutWindowExpressionError
> * expressionWithMultiWindowExpressionsError
> * windowFunctionNotAllowedError
> * cannotSpecifyWindowFrameError
> * windowFrameNotMatchRequiredFrameError
> * windowFunctionWithWindowFrameNotOrderedError
> * multiTimeWindowExpressionsNotSupportedError
> * sessionWindowGapDurationDataTypeError
> * invalidLiteralForWindowDurationError
> * emptyWindowExpressionError
> * foundDifferentWindowFunctionTypeError
> to use error classes. Throw an implementation of SparkThrowable. Also write
> a test for every error in QueryCompilationErrorsSuite.
> *Feel free to split this into sub-tasks.*






[jira] [Created] (SPARK-38687) Use error classes in the compilation errors of generators

2022-03-29 Thread Max Gekk (Jira)
Max Gekk created SPARK-38687:


 Summary: Use error classes in the compilation errors of generators
 Key: SPARK-38687
 URL: https://issues.apache.org/jira/browse/SPARK-38687
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.3.0
Reporter: Max Gekk


Migrate the following errors in QueryCompilationErrors:
* windowSpecificationNotDefinedError
* windowAggregateFunctionWithFilterNotSupportedError
* windowFunctionInsideAggregateFunctionNotAllowedError
* expressionWithoutWindowExpressionError
* expressionWithMultiWindowExpressionsError
* windowFunctionNotAllowedError
* cannotSpecifyWindowFrameError
* windowFrameNotMatchRequiredFrameError
* windowFunctionWithWindowFrameNotOrderedError
* multiTimeWindowExpressionsNotSupportedError
* sessionWindowGapDurationDataTypeError
* invalidLiteralForWindowDurationError
* emptyWindowExpressionError
* foundDifferentWindowFunctionTypeError

to use error classes. Throw an implementation of SparkThrowable. Also write a
test for every error in QueryCompilationErrorsSuite.

*Feel free to split this into sub-tasks.*
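For context, the migration pattern looks roughly like this (a sketch; the method, message, and error class name are illustrative, not the actual Spark source):

{code:scala}
import org.apache.spark.sql.AnalysisException

// Before: a hard-coded message string.
def someErrorBefore(detail: String): Throwable =
  new AnalysisException(s"Operation is not allowed: $detail")

// After: an error class registered in error-classes.json, so the thrown
// exception implements SparkThrowable and carries a stable error class.
def someErrorAfter(detail: String): Throwable =
  new AnalysisException(
    errorClass = "SOME_ERROR_CLASS", // hypothetical class name
    messageParameters = Array(detail))
{code}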






[jira] [Updated] (SPARK-38687) Use error classes in the compilation errors of generators

2022-03-29 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk updated SPARK-38687:
-
Description: 
Migrate the following errors in QueryCompilationErrors:
* nestedGeneratorError
* moreThanOneGeneratorError
* generatorOutsideSelectError
* generatorNotExpectedError

to use error classes. Throw an implementation of SparkThrowable. Also write a
test for every error in QueryCompilationErrorsSuite.

  was:
Migrate the following errors in QueryCompilationErrors:
* windowSpecificationNotDefinedError
* windowAggregateFunctionWithFilterNotSupportedError
* windowFunctionInsideAggregateFunctionNotAllowedError
* expressionWithoutWindowExpressionError
* expressionWithMultiWindowExpressionsError
* windowFunctionNotAllowedError
* cannotSpecifyWindowFrameError
* windowFrameNotMatchRequiredFrameError
* windowFunctionWithWindowFrameNotOrderedError
* multiTimeWindowExpressionsNotSupportedError
* sessionWindowGapDurationDataTypeError
* invalidLiteralForWindowDurationError
* emptyWindowExpressionError
* foundDifferentWindowFunctionTypeError

to use error classes. Throw an implementation of SparkThrowable. Also write a
test for every error in QueryCompilationErrorsSuite.

*Feel free to split this into sub-tasks.*


> Use error classes in the compilation errors of generators
> -
>
> Key: SPARK-38687
> URL: https://issues.apache.org/jira/browse/SPARK-38687
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Max Gekk
>Priority: Major
>
> Migrate the following errors in QueryCompilationErrors:
> * nestedGeneratorError
> * moreThanOneGeneratorError
> * generatorOutsideSelectError
> * generatorNotExpectedError
> to use error classes. Throw an implementation of SparkThrowable. Also write
> a test for every error in QueryCompilationErrorsSuite.






[jira] [Updated] (SPARK-38688) Use error classes in the compilation errors of deserializer

2022-03-29 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk updated SPARK-38688:
-
Description: 
Migrate the following errors in QueryCompilationErrors:
* dataTypeMismatchForDeserializerError
* fieldNumberMismatchForDeserializerError

to use error classes. Throw an implementation of SparkThrowable. Also write a
test for every error in QueryCompilationErrorsSuite.

  was:
Migrate the following errors in QueryCompilationErrors:
* nestedGeneratorError
* moreThanOneGeneratorError
* generatorOutsideSelectError
* generatorNotExpectedError

to use error classes. Throw an implementation of SparkThrowable. Also write a
test for every error in QueryCompilationErrorsSuite.


> Use error classes in the compilation errors of deserializer
> ---
>
> Key: SPARK-38688
> URL: https://issues.apache.org/jira/browse/SPARK-38688
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Max Gekk
>Priority: Major
>
> Migrate the following errors in QueryCompilationErrors:
> * dataTypeMismatchForDeserializerError
> * fieldNumberMismatchForDeserializerError
> to use error classes. Throw an implementation of SparkThrowable. Also write
> a test for every error in QueryCompilationErrorsSuite.






[jira] [Created] (SPARK-38688) Use error classes in the compilation errors of deserializer

2022-03-29 Thread Max Gekk (Jira)
Max Gekk created SPARK-38688:


 Summary: Use error classes in the compilation errors of 
deserializer
 Key: SPARK-38688
 URL: https://issues.apache.org/jira/browse/SPARK-38688
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.4.0
Reporter: Max Gekk


Migrate the following errors in QueryCompilationErrors:
* nestedGeneratorError
* moreThanOneGeneratorError
* generatorOutsideSelectError
* generatorNotExpectedError

onto error classes. Throw an implementation of SparkThrowable. Also write a 
test for every error in QueryCompilationErrorsSuite.
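As a sketch of one such companion suite test, assuming ScalaTest's intercept, 
the getErrorClass accessor on SparkThrowable, and an illustrative error class 
name:

{code:scala}
test("UNSUPPORTED_GENERATOR: nested generators are rejected") {
  // A nested generator hits nestedGeneratorError during analysis.
  val e = intercept[AnalysisException] {
    sql("SELECT explode(explode(array(array(1, 2))))")
  }
  // Assert on the stable error class rather than on the message text.
  assert(e.getErrorClass === "UNSUPPORTED_GENERATOR")
}
{code}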



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-38689) Use error classes in the compilation errors of not allowed DESC PARTITION

2022-03-29 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk updated SPARK-38689:
-
Affects Version/s: 3.4.0
   (was: 3.3.0)

> Use error classes in the compilation errors of not allowed DESC PARTITION
> -
>
> Key: SPARK-38689
> URL: https://issues.apache.org/jira/browse/SPARK-38689
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Max Gekk
>Priority: Major
>
> Migrate the following errors in QueryCompilationErrors:
> * unsupportedIfNotExistsError
> * nonPartitionColError
> * missingStaticPartitionColumn
> * alterV2TableSetLocationWithPartitionNotSupportedError
> * invalidPartitionSpecError
> * partitionNotSpecifyLocationUriError
> * describeDoesNotSupportPartitionForV2TablesError
> * tableDoesNotSupportPartitionManagementError
> * tableDoesNotSupportAtomicPartitionManagementError
> * alterTableRecoverPartitionsNotSupportedForV2TablesError
> * partitionColumnNotSpecifiedError
> * invalidPartitionColumnError
> * multiplePartitionColumnValuesSpecifiedError
> * cannotUseDataTypeForPartitionColumnError
> * cannotUseAllColumnsForPartitionColumnsError
> * partitionColumnNotFoundInSchemaError
> * mismatchedTablePartitionColumnError
> onto error classes. Throw an implementation of SparkThrowable. Also write 
> a test for every error in QueryCompilationErrorsSuite.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-38689) Use error classes in the compilation errors of not allowed DESC PARTITION

2022-03-29 Thread Max Gekk (Jira)
Max Gekk created SPARK-38689:


 Summary: Use error classes in the compilation errors of not 
allowed DESC PARTITION
 Key: SPARK-38689
 URL: https://issues.apache.org/jira/browse/SPARK-38689
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.3.0
Reporter: Max Gekk


Migrate the following errors in QueryCompilationErrors:
* unsupportedIfNotExistsError
* nonPartitionColError
* missingStaticPartitionColumn
* alterV2TableSetLocationWithPartitionNotSupportedError
* invalidPartitionSpecError
* partitionNotSpecifyLocationUriError
* describeDoesNotSupportPartitionForV2TablesError
* tableDoesNotSupportPartitionManagementError
* tableDoesNotSupportAtomicPartitionManagementError
* alterTableRecoverPartitionsNotSupportedForV2TablesError
* partitionColumnNotSpecifiedError
* invalidPartitionColumnError
* multiplePartitionColumnValuesSpecifiedError
* cannotUseDataTypeForPartitionColumnError
* cannotUseAllColumnsForPartitionColumnsError
* partitionColumnNotFoundInSchemaError
* mismatchedTablePartitionColumnError

onto error classes. Throw an implementation of SparkThrowable. Also write a 
test for every error in QueryCompilationErrorsSuite.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-38689) Use error classes in the compilation errors of not allowed DESC PARTITION

2022-03-29 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk updated SPARK-38689:
-
Description: 
Migrate the following errors in QueryCompilationErrors:
* descPartitionNotAllowedOnTempView
* descPartitionNotAllowedOnView

onto error classes. Throw an implementation of SparkThrowable. Also write a 
test for every error in QueryCompilationErrorsSuite.

  was:
Migrate the following errors in QueryCompilationErrors:
* unsupportedIfNotExistsError
* nonPartitionColError
* missingStaticPartitionColumn
* alterV2TableSetLocationWithPartitionNotSupportedError
* invalidPartitionSpecError
* partitionNotSpecifyLocationUriError
* describeDoesNotSupportPartitionForV2TablesError
* tableDoesNotSupportPartitionManagementError
* tableDoesNotSupportAtomicPartitionManagementError
* alterTableRecoverPartitionsNotSupportedForV2TablesError
* partitionColumnNotSpecifiedError
* invalidPartitionColumnError
* multiplePartitionColumnValuesSpecifiedError
* cannotUseDataTypeForPartitionColumnError
* cannotUseAllColumnsForPartitionColumnsError
* partitionColumnNotFoundInSchemaError
* mismatchedTablePartitionColumnError

onto error classes. Throw an implementation of SparkThrowable. Also write a 
test for every error in QueryCompilationErrorsSuite.


> Use error classes in the compilation errors of not allowed DESC PARTITION
> -
>
> Key: SPARK-38689
> URL: https://issues.apache.org/jira/browse/SPARK-38689
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Max Gekk
>Priority: Major
>
> Migrate the following errors in QueryCompilationErrors:
> * descPartitionNotAllowedOnTempView
> * descPartitionNotAllowedOnView
> onto error classes. Throw an implementation of SparkThrowable. Also write 
> a test for every error in QueryCompilationErrorsSuite.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-38689) Use error classes in the compilation errors of not allowed DESC PARTITION

2022-03-29 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk updated SPARK-38689:
-
Description: 
Migrate the following errors in QueryCompilationErrors:
* descPartitionNotAllowedOnTempView
* descPartitionNotAllowedOnView
* descPartitionNotAllowedOnViewError

onto error classes. Throw an implementation of SparkThrowable. Also write a 
test for every error in QueryCompilationErrorsSuite.

  was:
Migrate the following errors in QueryCompilationErrors:
* descPartitionNotAllowedOnTempView
* descPartitionNotAllowedOnView

onto error classes. Throw an implementation of SparkThrowable. Also write a 
test for every error in QueryCompilationErrorsSuite.


> Use error classes in the compilation errors of not allowed DESC PARTITION
> -
>
> Key: SPARK-38689
> URL: https://issues.apache.org/jira/browse/SPARK-38689
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Max Gekk
>Priority: Major
>
> Migrate the following errors in QueryCompilationErrors:
> * descPartitionNotAllowedOnTempView
> * descPartitionNotAllowedOnView
> * descPartitionNotAllowedOnViewError
> onto error classes. Throw an implementation of SparkThrowable. Also write 
> a test for every error in QueryCompilationErrorsSuite.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-38690) Use error classes in the compilation errors of SHOW CREATE TABLE

2022-03-29 Thread Max Gekk (Jira)
Max Gekk created SPARK-38690:


 Summary: Use error classes in the compilation errors of SHOW 
CREATE TABLE
 Key: SPARK-38690
 URL: https://issues.apache.org/jira/browse/SPARK-38690
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.4.0
Reporter: Max Gekk


Migrate the following errors in QueryCompilationErrors:
* descPartitionNotAllowedOnTempView
* descPartitionNotAllowedOnView
* descPartitionNotAllowedOnViewError

onto error classes. Throw an implementation of SparkThrowable. Also write a 
test for every error in QueryCompilationErrorsSuite.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-38690) Use error classes in the compilation errors of SHOW CREATE TABLE

2022-03-29 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk updated SPARK-38690:
-
Description: 
Migrate the following errors in QueryCompilationErrors:
* showCreateTableAsSerdeNotSupportedForV2TablesError
* showCreateTableNotSupportedOnTempView
* showCreateTableFailToExecuteUnsupportedFeatureError
* showCreateTableNotSupportTransactionalHiveTableError
* showCreateTableFailToExecuteUnsupportedConfError
* showCreateTableAsSerdeNotAllowedOnSparkDataSourceTableError
* showCreateTableOrViewFailToExecuteUnsupportedFeatureError

onto error classes. Throw an implementation of SparkThrowable. Also write a 
test for every error in QueryCompilationErrorsSuite.

  was:
Migrate the following errors in QueryCompilationErrors:
* descPartitionNotAllowedOnTempView
* descPartitionNotAllowedOnView
* descPartitionNotAllowedOnViewError

onto error classes. Throw an implementation of SparkThrowable. Also write a 
test for every error in QueryCompilationErrorsSuite.


> Use error classes in the compilation errors of SHOW CREATE TABLE
> 
>
> Key: SPARK-38690
> URL: https://issues.apache.org/jira/browse/SPARK-38690
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Max Gekk
>Priority: Major
>
> Migrate the following errors in QueryCompilationErrors:
> * showCreateTableAsSerdeNotSupportedForV2TablesError
> * showCreateTableNotSupportedOnTempView
> * showCreateTableFailToExecuteUnsupportedFeatureError
> * showCreateTableNotSupportTransactionalHiveTableError
> * showCreateTableFailToExecuteUnsupportedConfError
> * showCreateTableAsSerdeNotAllowedOnSparkDataSourceTableError
> * showCreateTableOrViewFailToExecuteUnsupportedFeatureError
> onto error classes. Throw an implementation of SparkThrowable. Also write 
> a test for every error in QueryCompilationErrorsSuite.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-38691) Use error classes in the compilation errors of column/attr resolving

2022-03-29 Thread Max Gekk (Jira)
Max Gekk created SPARK-38691:


 Summary: Use error classes in the compilation errors of 
column/attr resolving
 Key: SPARK-38691
 URL: https://issues.apache.org/jira/browse/SPARK-38691
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.4.0
Reporter: Max Gekk


Migrate the following errors in QueryCompilationErrors:
* showCreateTableAsSerdeNotSupportedForV2TablesError
* showCreateTableNotSupportedOnTempView
* showCreateTableFailToExecuteUnsupportedFeatureError
* showCreateTableNotSupportTransactionalHiveTableError
* showCreateTableFailToExecuteUnsupportedConfError
* showCreateTableAsSerdeNotAllowedOnSparkDataSourceTableError
* showCreateTableOrViewFailToExecuteUnsupportedFeatureError

onto error classes. Throw an implementation of SparkThrowable. Also write a 
test for every error in QueryCompilationErrorsSuite.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-38691) Use error classes in the compilation errors of column/attr resolving

2022-03-29 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk updated SPARK-38691:
-
Description: 
Migrate the following errors in QueryCompilationErrors:
* cannotResolveUserSpecifiedColumnsError
* cannotResolveStarExpandGivenInputColumnsError
* cannotResolveAttributeError
* cannotResolveColumnGivenInputColumnsError
* cannotResolveColumnNameAmongAttributesError
* cannotResolveColumnNameAmongFieldsError

onto error classes. Throw an implementation of SparkThrowable. Also write a 
test for every error in QueryCompilationErrorsSuite.

  was:
Migrate the following errors in QueryCompilationErrors:
* showCreateTableAsSerdeNotSupportedForV2TablesError
* showCreateTableNotSupportedOnTempView
* showCreateTableFailToExecuteUnsupportedFeatureError
* showCreateTableNotSupportTransactionalHiveTableError
* showCreateTableFailToExecuteUnsupportedConfError
* showCreateTableAsSerdeNotAllowedOnSparkDataSourceTableError
* showCreateTableOrViewFailToExecuteUnsupportedFeatureError

onto error classes. Throw an implementation of SparkThrowable. Also write a 
test for every error in QueryCompilationErrorsSuite.


> Use error classes in the compilation errors of column/attr resolving
> 
>
> Key: SPARK-38691
> URL: https://issues.apache.org/jira/browse/SPARK-38691
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Max Gekk
>Priority: Major
>
> Migrate the following errors in QueryCompilationErrors:
> * cannotResolveUserSpecifiedColumnsError
> * cannotResolveStarExpandGivenInputColumnsError
> * cannotResolveAttributeError
> * cannotResolveColumnGivenInputColumnsError
> * cannotResolveColumnNameAmongAttributesError
> * cannotResolveColumnNameAmongFieldsError
> onto error classes. Throw an implementation of SparkThrowable. Also write 
> a test for every error in QueryCompilationErrorsSuite.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-38692) Use error classes in the compilation errors of function args

2022-03-29 Thread Max Gekk (Jira)
Max Gekk created SPARK-38692:


 Summary: Use error classes in the compilation errors of function 
args
 Key: SPARK-38692
 URL: https://issues.apache.org/jira/browse/SPARK-38692
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.4.0
Reporter: Max Gekk


Migrate the following errors in QueryCompilationErrors:
* cannotResolveUserSpecifiedColumnsError
* cannotResolveStarExpandGivenInputColumnsError
* cannotResolveAttributeError
* cannotResolveColumnGivenInputColumnsError
* cannotResolveColumnNameAmongAttributesError
* cannotResolveColumnNameAmongFieldsError

onto error classes. Throw an implementation of SparkThrowable. Also write a 
test for every error in QueryCompilationErrorsSuite.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-38692) Use error classes in the compilation errors of function args

2022-03-29 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk updated SPARK-38692:
-
Description: 
Migrate the following errors in QueryCompilationErrors:
* invalidFunctionArgumentsError
* invalidFunctionArgumentNumberError
* functionAcceptsOnlyOneArgumentError
* secondArgumentNotDoubleLiteralError
* functionCannotProcessInputError
* v2FunctionInvalidInputTypeLengthError
* secondArgumentInFunctionIsNotBooleanLiteralError

onto error classes. Throw an implementation of SparkThrowable. Also write a 
test for every error in QueryCompilationErrorsSuite.

  was:
Migrate the following errors in QueryCompilationErrors:
* cannotResolveUserSpecifiedColumnsError
* cannotResolveStarExpandGivenInputColumnsError
* cannotResolveAttributeError
* cannotResolveColumnGivenInputColumnsError
* cannotResolveColumnNameAmongAttributesError
* cannotResolveColumnNameAmongFieldsError

onto error classes. Throw an implementation of SparkThrowable. Also write a 
test for every error in QueryCompilationErrorsSuite.


> Use error classes in the compilation errors of function args
> 
>
> Key: SPARK-38692
> URL: https://issues.apache.org/jira/browse/SPARK-38692
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Max Gekk
>Priority: Major
>
> Migrate the following errors in QueryCompilationErrors:
> * invalidFunctionArgumentsError
> * invalidFunctionArgumentNumberError
> * functionAcceptsOnlyOneArgumentError
> * secondArgumentNotDoubleLiteralError
> * functionCannotProcessInputError
> * v2FunctionInvalidInputTypeLengthError
> * secondArgumentInFunctionIsNotBooleanLiteralError
> onto error classes. Throw an implementation of SparkThrowable. Also write 
> a test for every error in QueryCompilationErrorsSuite.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-38693) Spark does not use SessionManager

2022-03-29 Thread Brad Solomon (Jira)
Brad Solomon created SPARK-38693:


 Summary: Spark does not use SessionManager
 Key: SPARK-38693
 URL: https://issues.apache.org/jira/browse/SPARK-38693
 Project: Spark
  Issue Type: Bug
  Components: Web UI
Affects Versions: 3.2.1
Reporter: Brad Solomon


Spark's failure to use a `SessionManager` causes 
`java.lang.IllegalStateException: No SessionManager` that prevents Spark UI 
from being used with 
[`org.keycloak.adapters.servlet.KeycloakOIDCFilter`](https://www.keycloak.org/docs/latest/securing_apps/#_servlet_filter_adapter)
 as the `spark.ui.filters` class.

 

Sample logs:

 

```

spark_1 | 22/03/29 18:43:24 INFO KeycloakDeployment: Loaded URLs from 
http://REDACTED/auth/realms/master/.well-known/openid-configuration
spark_1 | 22/03/29 18:43:24 WARN HttpChannel: /
spark_1 | java.lang.IllegalStateException: No SessionManager

```

 

Configuration:

 

```

spark.ui.filters=org.keycloak.adapters.servlet.KeycloakOIDCFilter
spark.acls.enable=true
spark.admin.acls=*
spark.ui.view.acls=*
spark.org.keycloak.adapters.servlet.KeycloakOIDCFilter.param.keycloak.config.file=/opt/bitnami/spark/conf/spark-keycloak.json

```

 

This exception emanates from Jetty:

 

[https://github.com/eclipse/jetty.project/blob/ae5c8e34e7dd4f5cce5f649e48469ba3bbc51d91/jetty-server/src/main/java/org/eclipse/jetty/server/Request.java#L1524]

 

It appears that Spark's `ServletContextHandler` has the ability to use a 
`SessionManager` but doesn't. This seems to be a blocker that prevents 
integration with Keycloak entirely.
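If a fix follows the direction suggested here, it would mean enabling session 
support where Spark builds its Jetty contexts. A rough sketch against the 
Jetty 9.4 servlet API, not the actual Spark code:

{code:scala}
import org.eclipse.jetty.servlet.ServletContextHandler

// ServletContextHandler.SESSIONS installs a SessionHandler on the context,
// so a servlet filter such as KeycloakOIDCFilter can call
// request.getSession() without IllegalStateException("No SessionManager").
val contextHandler = new ServletContextHandler(ServletContextHandler.SESSIONS)
contextHandler.setContextPath("/")
{code}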



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-38693) Spark does not use SessionManager

2022-03-29 Thread Brad Solomon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brad Solomon updated SPARK-38693:
-
Description: 
Spark's failure to use a `SessionManager` causes 
`java.lang.IllegalStateException: No SessionManager` that prevents Spark UI 
from being used with 
[org.keycloak.adapters.servlet.KeycloakOIDCFilter|https://www.keycloak.org/docs/latest/securing_apps/#_servlet_filter_adapter]
 as the `spark.ui.filters` class.

 

Sample logs:

 
{code:java}
spark_1 | 22/03/29 18:43:24 INFO KeycloakDeployment: Loaded URLs from 
http://REDACTED/auth/realms/master/.well-known/openid-configuration
spark_1 | 22/03/29 18:43:24 WARN HttpChannel: /
spark_1 | java.lang.IllegalStateException: No SessionManager{code}
 

Configuration:

 

 
{code:java}
spark.ui.filters=org.keycloak.adapters.servlet.KeycloakOIDCFilter
spark.acls.enable=true
spark.admin.acls=*
spark.ui.view.acls=*
spark.org.keycloak.adapters.servlet.KeycloakOIDCFilter.param.keycloak.config.file=/opt/bitnami/spark/conf/spark-keycloak.json
 
{code}
 

This exception emanates from Jetty:

 

[https://github.com/eclipse/jetty.project/blob/ae5c8e34e7dd4f5cce5f649e48469ba3bbc51d91/jetty-server/src/main/java/org/eclipse/jetty/server/Request.java#L1524]

 

It appears that Spark's `ServletContextHandler` has the ability to use a 
`SessionManager` but doesn't. This seems to be a blocker that prevents 
integration with Keycloak entirely.

  was:
Spark's failure to use a `SessionManager` causes 
`java.lang.IllegalStateException: No SessionManager` that prevents Spark UI 
from being used with 
[`org.keycloak.adapters.servlet.KeycloakOIDCFilter`](https://www.keycloak.org/docs/latest/securing_apps/#_servlet_filter_adapter)
 as the `spark.ui.filters` class.

 

Sample logs:

 

```

spark_1 | 22/03/29 18:43:24 INFO KeycloakDeployment: Loaded URLs from 
http://REDACTED/auth/realms/master/.well-known/openid-configuration
spark_1 | 22/03/29 18:43:24 WARN HttpChannel: /
spark_1 | java.lang.IllegalStateException: No SessionManager

```

 

Configuration:

 

```

spark.ui.filters=org.keycloak.adapters.servlet.KeycloakOIDCFilter
spark.acls.enable=true
spark.admin.acls=*
spark.ui.view.acls=*
spark.org.keycloak.adapters.servlet.KeycloakOIDCFilter.param.keycloak.config.file=/opt/bitnami/spark/conf/spark-keycloak.json

```

 

This exception emanates from Jetty:

 

[https://github.com/eclipse/jetty.project/blob/ae5c8e34e7dd4f5cce5f649e48469ba3bbc51d91/jetty-server/src/main/java/org/eclipse/jetty/server/Request.java#L1524]

 

It appears that Spark's `ServletContextHandler` has the ability to use a 
`SessionManager` but doesn't. This seems to be a blocker that prevents 
integration with Keycloak entirely.


> Spark does not use SessionManager
> -
>
> Key: SPARK-38693
> URL: https://issues.apache.org/jira/browse/SPARK-38693
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 3.2.1
>Reporter: Brad Solomon
>Priority: Blocker
>
> Spark's failure to use a `SessionManager` causes 
> `java.lang.IllegalStateException: No SessionManager` that prevents Spark UI 
> from being used with 
> [org.keycloak.adapters.servlet.KeycloakOIDCFilter|https://www.keycloak.org/docs/latest/securing_apps/#_servlet_filter_adapter]
>  as the `spark.ui.filters` class.
>  
> Sample logs:
>  
> {code:java}
> spark_1 | 22/03/29 18:43:24 INFO KeycloakDeployment: Loaded URLs from 
> http://REDACTED/auth/realms/master/.well-known/openid-configuration
> spark_1 | 22/03/29 18:43:24 WARN HttpChannel: /
> spark_1 | java.lang.IllegalStateException: No SessionManager{code}
>  
> Configuration:
>  
>  
> {code:java}
> spark.ui.filters=org.keycloak.adapters.servlet.KeycloakOIDCFilter
> spark.acls.enable=true
> spark.admin.acls=*
> spark.ui.view.acls=*
> spark.org.keycloak.adapters.servlet.KeycloakOIDCFilter.param.keycloak.config.file=/opt/bitnami/spark/conf/spark-keycloak.json
>  
> {code}
>  
> This exception emanates from Jetty:
>  
> [https://github.com/eclipse/jetty.project/blob/ae5c8e34e7dd4f5cce5f649e48469ba3bbc51d91/jetty-server/src/main/java/org/eclipse/jetty/server/Request.java#L1524]
>  
> It appears that Spark's `ServletContextHandler` has the ability to use a 
> `SessionManager` but doesn't. This seems to be a blocker that prevents 
> integration with Keycloak entirely.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-38693) Spark does not use SessionManager

2022-03-29 Thread Brad Solomon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brad Solomon updated SPARK-38693:
-
Description: 
Spark's failure to use a `SessionManager` causes 
`java.lang.IllegalStateException: No SessionManager` that prevents Spark UI 
from being used with 
[org.keycloak.adapters.servlet.KeycloakOIDCFilter|https://www.keycloak.org/docs/latest/securing_apps/#_servlet_filter_adapter]
 as the `spark.ui.filters` class.

 

Sample logs:

 
{code:java}
spark_1 | 22/03/29 18:43:24 INFO KeycloakDeployment: Loaded URLs from 
http://REDACTED/auth/realms/master/.well-known/openid-configuration
spark_1 | 22/03/29 18:43:24 WARN HttpChannel: /
spark_1 | java.lang.IllegalStateException: No SessionManager{code}
 

Configuration:

 

 
{code:java}
spark.ui.filters=org.keycloak.adapters.servlet.KeycloakOIDCFilter
spark.acls.enable=true
spark.admin.acls=*
spark.ui.view.acls=*
spark.org.keycloak.adapters.servlet.KeycloakOIDCFilter.param.keycloak.config.file=/opt/bitnami/spark/conf/spark-keycloak.json
 
{code}
 

This exception emanates from Jetty:

 

[https://github.com/eclipse/jetty.project/blob/ae5c8e34e7dd4f5cce5f649e48469ba3bbc51d91/jetty-server/src/main/java/org/eclipse/jetty/server/Request.java#L1524]

 

It appears that Spark's `ServletContextHandler` has the ability to use a 
`SessionManager` but doesn't. This seems to be a blocker that prevents 
integration with Keycloak entirely.

  was:
Spark's failure to use a `SessionManager` causes 
`java.lang.IllegalStateException: No SessionManager` that prevents Spark UI 
from being used with 
[org.keycloak.adapters.servlet.KeycloakOIDCFilter|https://www.keycloak.org/docs/latest/securing_apps/#_servlet_filter_adapter]
 as the `spark.ui.filters` class.

 

Sample logs:

 
{code:java}
spark_1 | 22/03/29 18:43:24 INFO KeycloakDeployment: Loaded URLs from 
http://REDACTED/auth/realms/master/.well-known/openid-configuration
spark_1 | 22/03/29 18:43:24 WARN HttpChannel: /
spark_1 | java.lang.IllegalStateException: No SessionManager{code}
 

Configuration:

 

 
{code:java}
spark.ui.filters=org.keycloak.adapters.servlet.KeycloakOIDCFilter
spark.acls.enable=true
spark.admin.acls=*
spark.ui.view.acls=*
spark.org.keycloak.adapters.servlet.KeycloakOIDCFilter.param.keycloak.config.file=/opt/bitnami/spark/conf/spark-keycloak.json
 
{code}
 

This exception emanates from Jetty:

 

[https://github.com/eclipse/jetty.project/blob/ae5c8e34e7dd4f5cce5f649e48469ba3bbc51d91/jetty-server/src/main/java/org/eclipse/jetty/server/Request.java#L1524]

 

It appears that Spark's `ServletContextHandler` has the ability to use a 
`SessionManager` but doesn't. This seems to be a blocker that prevents 
integration with Keycloak entirely.


> Spark does not use SessionManager
> -
>
> Key: SPARK-38693
> URL: https://issues.apache.org/jira/browse/SPARK-38693
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 3.2.1
>Reporter: Brad Solomon
>Priority: Blocker
>
> Spark's failure to use a `SessionManager` causes 
> `java.lang.IllegalStateException: No SessionManager` that prevents Spark UI 
> from being used with 
> [org.keycloak.adapters.servlet.KeycloakOIDCFilter|https://www.keycloak.org/docs/latest/securing_apps/#_servlet_filter_adapter]
>  as the `spark.ui.filters` class.
>  
> Sample logs:
>  
> {code:java}
> spark_1 | 22/03/29 18:43:24 INFO KeycloakDeployment: Loaded URLs from 
> http://REDACTED/auth/realms/master/.well-known/openid-configuration
> spark_1 | 22/03/29 18:43:24 WARN HttpChannel: /
> spark_1 | java.lang.IllegalStateException: No SessionManager{code}
>  
> Configuration:
>  
>  
> {code:java}
> spark.ui.filters=org.keycloak.adapters.servlet.KeycloakOIDCFilter
> spark.acls.enable=true
> spark.admin.acls=*
> spark.ui.view.acls=*
> spark.org.keycloak.adapters.servlet.KeycloakOIDCFilter.param.keycloak.config.file=/opt/bitnami/spark/conf/spark-keycloak.json
>  
> {code}
>  
> This exception emanates from Jetty:
>  
> [https://github.com/eclipse/jetty.project/blob/ae5c8e34e7dd4f5cce5f649e48469ba3bbc51d91/jetty-server/src/main/java/org/eclipse/jetty/server/Request.java#L1524]
>  
> It appears that Spark's `ServletContextHandler` has the ability to use a 
> `SessionManager` but doesn't. This seems to be a blocker that prevents 
> integration with Keycloak entirely.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-38693) Spark does not use SessionManager

2022-03-29 Thread Brad Solomon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brad Solomon updated SPARK-38693:
-
Description: 
Spark's failure to use a `SessionManager` causes 
`java.lang.IllegalStateException: No SessionManager` that prevents Spark UI 
from being used with 
[org.keycloak.adapters.servlet.KeycloakOIDCFilter|https://www.keycloak.org/docs/latest/securing_apps/#_servlet_filter_adapter]
 as the `spark.ui.filters` class.

 

Sample logs:

 
{code:java}
spark_1 | 22/03/29 18:43:24 INFO KeycloakDeployment: Loaded URLs from 
http://REDACTED/auth/realms/master/.well-known/openid-configuration
spark_1 | 22/03/29 18:43:24 WARN HttpChannel: /
spark_1 | java.lang.IllegalStateException: No SessionManager{code}
 

Configuration:

 

 
{code:java}
spark.ui.filters=org.keycloak.adapters.servlet.KeycloakOIDCFilter
spark.acls.enable=true
spark.admin.acls=*
spark.ui.view.acls=*
spark.org.keycloak.adapters.servlet.KeycloakOIDCFilter.param.keycloak.config.file=/opt/bitnami/spark/conf/spark-keycloak.json
 
{code}
 

The `spark-keycloak.json` file above contains configuration generated in the 
Keycloak admin console. We can see that Spark gets as far as allowing the 
KeycloakOIDCFilter class to read this file and initiate communication with 
Keycloak.

 

This IllegalStateException emanates from Jetty:

 

[https://github.com/eclipse/jetty.project/blob/ae5c8e34e7dd4f5cce5f649e48469ba3bbc51d91/jetty-server/src/main/java/org/eclipse/jetty/server/Request.java#L1524]

 

It appears that Spark's `ServletContextHandler` has the ability to use a 
`SessionManager` but doesn't. This seems to be a blocker that prevents 
integration with Keycloak entirely.

  was:
Spark's failure to use a `SessionManager` causes 
`java.lang.IllegalStateException: No SessionManager` that prevents Spark UI 
from being used with 
[org.keycloak.adapters.servlet.KeycloakOIDCFilter|https://www.keycloak.org/docs/latest/securing_apps/#_servlet_filter_adapter]
 as the `spark.ui.filters` class.

 

Sample logs:

 
{code:java}
spark_1 | 22/03/29 18:43:24 INFO KeycloakDeployment: Loaded URLs from 
http://REDACTED/auth/realms/master/.well-known/openid-configuration
spark_1 | 22/03/29 18:43:24 WARN HttpChannel: /
spark_1 | java.lang.IllegalStateException: No SessionManager{code}
 

Configuration:

 

 
{code:java}
spark.ui.filters=org.keycloak.adapters.servlet.KeycloakOIDCFilter
spark.acls.enable=true
spark.admin.acls=*
spark.ui.view.acls=*
spark.org.keycloak.adapters.servlet.KeycloakOIDCFilter.param.keycloak.config.file=/opt/bitnami/spark/conf/spark-keycloak.json
 
{code}
 

This exception emanates from Jetty:

 

[https://github.com/eclipse/jetty.project/blob/ae5c8e34e7dd4f5cce5f649e48469ba3bbc51d91/jetty-server/src/main/java/org/eclipse/jetty/server/Request.java#L1524]

 

It appears that Spark's `ServletContextHandler` has the ability to use a 
`SessionManager` but doesn't. This seems to be a blocker that prevents 
integration with Keycloak entirely.


> Spark does not use SessionManager
> -
>
> Key: SPARK-38693
> URL: https://issues.apache.org/jira/browse/SPARK-38693
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 3.2.1
>Reporter: Brad Solomon
>Priority: Blocker
>
> Spark's failure to use a `SessionManager` causes 
> `java.lang.IllegalStateException: No SessionManager` that prevents Spark UI 
> from being used with 
> [org.keycloak.adapters.servlet.KeycloakOIDCFilter|https://www.keycloak.org/docs/latest/securing_apps/#_servlet_filter_adapter]
>  as the `spark.ui.filters` class.
>  
> Sample logs:
>  
> {code:java}
> spark_1 | 22/03/29 18:43:24 INFO KeycloakDeployment: Loaded URLs from 
> http://REDACTED/auth/realms/master/.well-known/openid-configuration
> spark_1 | 22/03/29 18:43:24 WARN HttpChannel: /
> spark_1 | java.lang.IllegalStateException: No SessionManager{code}
>  
> Configuration:
>  
>  
> {code:java}
> spark.ui.filters=org.keycloak.adapters.servlet.KeycloakOIDCFilter
> spark.acls.enable=true
> spark.admin.acls=*
> spark.ui.view.acls=*
> spark.org.keycloak.adapters.servlet.KeycloakOIDCFilter.param.keycloak.config.file=/opt/bitnami/spark/conf/spark-keycloak.json
>  
> {code}
>  
> The `spark-keycloak.json` file above contains configuration generated in the 
> Keycloak admin console. We can see that Spark gets as far as allowing the 
> KeycloakOIDCFilter class to read this file and initiate communication with 
> Keycloak.
>  
> This IllegalStateException emanates from Jetty:
>  
> [https://github.com/eclipse/jetty.project/blob/ae5c8e34e7dd4f5cce5f649e48469ba3bbc51d91/jetty-server/src/main/java/org/eclipse/jetty/server/Request.java#L1524]
>  
> It appears that Spark's `ServletContextHandler` has the ability to use a 
> `SessionManager` but doesn't. This seems to be a blocker that prevents 
> integration with Keycloak entirely.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

---

[jira] [Updated] (SPARK-35803) Spark SQL does not support creating views using DataSource v2 based data sources

2022-03-29 Thread David Rabinowitz (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Rabinowitz updated SPARK-35803:
-
Issue Type: Bug  (was: New Feature)

> Spark SQL does not support creating views using DataSource v2 based data 
> sources
> 
>
> Key: SPARK-35803
> URL: https://issues.apache.org/jira/browse/SPARK-35803
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.8, 3.1.2
>Reporter: David Rabinowitz
>Assignee: Pablo Langa Blanco
>Priority: Major
> Fix For: 3.3.0
>
>
> When a temporary view is created in Spark SQL using an external data source, 
> Spark then tries to create the relevant relation using 
> DataSource.resolveRelation() method. Unlike DataFrameReader.load(), 
> resolveRelation() does not check if the provided DataSource implements the 
> DataSourceV2 interface and instead tries to use the RelationProvider trait in 
> order to generate the Relation.
> Furthermore, DataSourceV2Relation is not a subclass of BaseRelation, so it 
> cannot be used in resolveRelation().
> Lastly, I tried to implement the RelationProvider trait in my Java 
> implementation of DataSourceV2, but the pattern match inside 
> resolveRelation() did not detect it as a RelationProvider.
>  
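To make the two code paths concrete, here is a sketch assuming a DataSourceV2 
implementation registered under the hypothetical name com.example.MySourceV2:

{code:scala}
// SQL path: goes through DataSource.resolveRelation(), which only knows the
// RelationProvider/BaseRelation contract, so a pure DataSourceV2 source
// fails here.
spark.sql("CREATE TEMPORARY VIEW t USING com.example.MySourceV2")

// DataFrameReader path: load() recognizes DataSourceV2 and builds a
// DataSourceV2Relation, so the same source works here.
spark.read.format("com.example.MySourceV2").load().createOrReplaceTempView("t")
{code}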



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-38652) K8S IT Test DepsTestsSuite blocks with PathIOException in hadoop-aws-3.3.2

2022-03-29 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17514343#comment-17514343
 ] 

Dongjoon Hyun commented on SPARK-38652:
---

Any update, [~dcoliversun]?

> K8S IT Test DepsTestsSuite blocks with PathIOException in hadoop-aws-3.3.2
> --
>
> Key: SPARK-38652
> URL: https://issues.apache.org/jira/browse/SPARK-38652
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes, Tests
>Affects Versions: 3.3.0
>Reporter: qian
>Priority: Major
>
> DepsTestsSuite in the K8s IT tests is blocked by a PathIOException in 
> hadoop-aws-3.3.2. The exception message is as follows:
> {code:java}
> Exception in thread "main" org.apache.spark.SparkException: Uploading file 
> /Users/hengzhen.sq/IdeaProjects/spark/dist/examples/jars/spark-examples_2.12-3.4.0-SNAPSHOT.jar
>  failed...
> at 
> org.apache.spark.deploy.k8s.KubernetesUtils$.uploadFileUri(KubernetesUtils.scala:332)
> 
> at 
> org.apache.spark.deploy.k8s.KubernetesUtils$.$anonfun$uploadAndTransformFileUris$1(KubernetesUtils.scala:277)
> 
> at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286) 
>
> at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)   
>  
> at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)  
>   
> at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
> at scala.collection.TraversableLike.map(TraversableLike.scala:286)
> at scala.collection.TraversableLike.map$(TraversableLike.scala:279)
> at scala.collection.AbstractTraversable.map(Traversable.scala:108)
> at 
> org.apache.spark.deploy.k8s.KubernetesUtils$.uploadAndTransformFileUris(KubernetesUtils.scala:275)
> 
> at 
> org.apache.spark.deploy.k8s.features.BasicDriverFeatureStep.$anonfun$getAdditionalPodSystemProperties$1(BasicDriverFeatureStep.scala:187)
>
> at scala.collection.immutable.List.foreach(List.scala:431)
> at 
> org.apache.spark.deploy.k8s.features.BasicDriverFeatureStep.getAdditionalPodSystemProperties(BasicDriverFeatureStep.scala:178)
> 
> at 
> org.apache.spark.deploy.k8s.submit.KubernetesDriverBuilder.$anonfun$buildFromFeatures$5(KubernetesDriverBuilder.scala:86)
> at 
> scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:126)
> 
> at 
> scala.collection.LinearSeqOptimized.foldLeft$(LinearSeqOptimized.scala:122)   
>  
> at scala.collection.immutable.List.foldLeft(List.scala:91)
> at 
> org.apache.spark.deploy.k8s.submit.KubernetesDriverBuilder.buildFromFeatures(KubernetesDriverBuilder.scala:84)
> 
> at 
> org.apache.spark.deploy.k8s.submit.Client.run(KubernetesClientApplication.scala:104)
> 
> at 
> org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.$anonfun$run$5(KubernetesClientApplication.scala:248)
> 
> at 
> org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.$anonfun$run$5$adapted(KubernetesClientApplication.scala:242)
> at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2738)
> at 
> org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.run(KubernetesClientApplication.scala:242)
> 
> at 
> org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.start(KubernetesClientApplication.scala:214)
> 
> at 
> org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:958)
> 
> at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180) 
>
> at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
> at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
> at 
> org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1046)  
>   
> at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1055)
> at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)Caused by: 
> org.apache.spark.SparkException: Error uploading file 
> spark-examples_2.12-3.4.0-SNAPSHOT.jar
> at 
> org.apache.spark.deploy.k8s.KubernetesUtils$.uploadFileToHadoopCompatibleFS(KubernetesUtils.scala:355)
> 
> at 
> org.apache.spark.deploy.k8s.KubernetesUtils$.uploadFileUri(KubernetesUtils.scala:328)
> 
> ... 30 more
> Caused by: org.apache.hadoop.fs.PathIOException: `Cannot get relative path 
> for 
> URI:file:///Users/hengzhen.sq/IdeaProjects/spark/dist/examples/jars/spark-examples_2.12-3.4.0-SNAPSHOT.jar':
>  Input/output error
> at 
> org.apache.hadoop.fs.s3a.impl.CopyFromLocalOperation.getFinalPath(CopyFromLocalOperation.java:365)
> 
> at 
> org.apache.hadoop.fs.s3a.impl.CopyFromLocalOperation.uploadSourceFromFS(CopyFromLocalOpera

[jira] [Commented] (SPARK-33349) ExecutorPodsWatchSnapshotSource: Kubernetes client has been closed

2022-03-29 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17514364#comment-17514364
 ] 

Dongjoon Hyun commented on SPARK-33349:
---

Hi, all. Could you try the latest release, since this area is moving rapidly?
- The latest Apache Spark is 3.2.1 with `kubernetes-client 5.4.1`.
- In addition, Apache Spark 3.3.0 is currently being tested for a release 
candidate with `kubernetes-client 5.12.1`.

> ExecutorPodsWatchSnapshotSource: Kubernetes client has been closed
> --
>
> Key: SPARK-33349
> URL: https://issues.apache.org/jira/browse/SPARK-33349
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 3.0.1, 3.0.2, 3.1.0
>Reporter: Nicola Bova
>Priority: Critical
>
> I launch my spark application with the 
> [spark-on-kubernetes-operator|https://github.com/GoogleCloudPlatform/spark-on-k8s-operator]
>  with the following yaml file:
> {code:yaml}
> apiVersion: sparkoperator.k8s.io/v1beta2
> kind: SparkApplication
> metadata:
>    name: spark-kafka-streamer-test
>    namespace: kafka2hdfs
> spec: 
>    type: Scala
>    mode: cluster
>    image: /spark:3.0.2-SNAPSHOT-2.12-0.1.0
>    imagePullPolicy: Always
>    timeToLiveSeconds: 259200
>    mainClass: path.to.my.class.KafkaStreamer
>    mainApplicationFile: spark-kafka-streamer_2.12-spark300-assembly.jar
>    sparkVersion: 3.0.1
>    restartPolicy:
>  type: Always
>    sparkConf:
>  "spark.kafka.consumer.cache.capacity": "8192"
>  "spark.kubernetes.memoryOverheadFactor": "0.3"
>    deps:
>    jars:
>  - my
>  - jar
>  - list
>    hadoopConfigMap: hdfs-config
>    driver:
>  cores: 4
>  memory: 12g
>  labels:
>    version: 3.0.1
>  serviceAccount: default
>  javaOptions: 
> "-Dlog4j.configuration=file:///opt/spark/log4j/log4j.properties"
>   executor:
>  instances: 4
>     cores: 4
>     memory: 16g
>     labels:
>   version: 3.0.1
>     javaOptions: 
> "-Dlog4j.configuration=file:///opt/spark/log4j/log4j.properties"
> {code}
>  I have tried with both Spark `3.0.1` and `3.0.2-SNAPSHOT` with the ["Restart 
> the watcher when we receive a version changed from 
> k8s"|https://github.com/apache/spark/pull/29533] patch.
> This is the driver log:
> {code}
> 20/11/04 12:16:02 WARN NativeCodeLoader: Unable to load native-hadoop library 
> for your platform... using builtin-java classes where applicable
> ... // my app log, it's a structured streaming app reading from kafka and 
> writing to hdfs
> 20/11/04 13:12:12 WARN ExecutorPodsWatchSnapshotSource: Kubernetes client has 
> been closed (this is expected if the application is shutting down.)
> io.fabric8.kubernetes.client.KubernetesClientException: too old resource 
> version: 1574101276 (1574213896)
>  at 
> io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager$1.onMessage(WatchConnectionManager.java:259)
>  at okhttp3.internal.ws.RealWebSocket.onReadMessage(RealWebSocket.java:323)
>  at 
> okhttp3.internal.ws.WebSocketReader.readMessageFrame(WebSocketReader.java:219)
>  at 
> okhttp3.internal.ws.WebSocketReader.processNextFrame(WebSocketReader.java:105)
>  at okhttp3.internal.ws.RealWebSocket.loopReader(RealWebSocket.java:274)
>  at okhttp3.internal.ws.RealWebSocket$2.onResponse(RealWebSocket.java:214)
>  at okhttp3.RealCall$AsyncCall.execute(RealCall.java:203)
>  at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
>  at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown 
> Source)
>  at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown 
> Source)
>  at java.base/java.lang.Thread.run(Unknown Source)
> {code}
> The error above appears after roughly 50 minutes.
> After the exception above, no more logs are produced and the app hangs.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-38652) K8S IT Test DepsTestsSuite blocks with PathIOException in hadoop-aws-3.3.2

2022-03-29 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17514379#comment-17514379
 ] 

Dongjoon Hyun commented on SPARK-38652:
---

BTW, [~dcoliversun], the K8s IT itself doesn't fail on either the Apache Spark 
`master` branch or `branch-3.3` in my environment. Do you mean the test case 
fails when you do `spark-submit`?
{code}
$ build/sbt -Psparkr -Pkubernetes -Pvolcano -Pkubernetes-integration-tests 
-Dtest.exclude.tags=minikube 
-Dspark.kubernetes.test.deployMode=docker-for-desktop 
"kubernetes-integration-tests/test"
...
[info] KubernetesSuite:
[info] - Run SparkPi with no resources (8 seconds, 527 milliseconds)
[info] - Run SparkPi with no resources & statefulset allocation (8 seconds, 323 
milliseconds)
[info] - Run SparkPi with a very long application name. (8 seconds, 386 
milliseconds)
[info] - Use SparkLauncher.NO_RESOURCE (8 seconds, 425 milliseconds)
[info] - Run SparkPi with a master URL without a scheme. (8 seconds, 385 
milliseconds)
[info] - Run SparkPi with an argument. (8 seconds, 328 milliseconds)
[info] - Run SparkPi with custom labels, annotations, and environment 
variables. (8 seconds, 384 milliseconds)
[info] - All pods have the same service account by default (8 seconds, 342 
milliseconds)
[info] - Run extraJVMOptions check on driver (4 seconds, 327 milliseconds)
[info] - Run SparkRemoteFileTest using a remote data file (8 seconds, 429 
milliseconds)
...
{code}

> K8S IT Test DepsTestsSuite blocks with PathIOException in hadoop-aws-3.3.2
> --
>
> Key: SPARK-38652
> URL: https://issues.apache.org/jira/browse/SPARK-38652
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes, Tests
>Affects Versions: 3.3.0
>Reporter: qian
>Priority: Major
>
> DepsTestsSuite in the K8s IT tests is blocked by a PathIOException in 
> hadoop-aws-3.3.2. The exception message is as follows:
> {code:java}
> Exception in thread "main" org.apache.spark.SparkException: Uploading file 
> /Users/hengzhen.sq/IdeaProjects/spark/dist/examples/jars/spark-examples_2.12-3.4.0-SNAPSHOT.jar
>  failed...
> at 
> org.apache.spark.deploy.k8s.KubernetesUtils$.uploadFileUri(KubernetesUtils.scala:332)
> 
> at 
> org.apache.spark.deploy.k8s.KubernetesUtils$.$anonfun$uploadAndTransformFileUris$1(KubernetesUtils.scala:277)
> 
> at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286) 
>
> at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)   
>  
> at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)  
>   
> at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
> at scala.collection.TraversableLike.map(TraversableLike.scala:286)
> at scala.collection.TraversableLike.map$(TraversableLike.scala:279)
> at scala.collection.AbstractTraversable.map(Traversable.scala:108)
> at 
> org.apache.spark.deploy.k8s.KubernetesUtils$.uploadAndTransformFileUris(KubernetesUtils.scala:275)
> 
> at 
> org.apache.spark.deploy.k8s.features.BasicDriverFeatureStep.$anonfun$getAdditionalPodSystemProperties$1(BasicDriverFeatureStep.scala:187)
>
> at scala.collection.immutable.List.foreach(List.scala:431)
> at 
> org.apache.spark.deploy.k8s.features.BasicDriverFeatureStep.getAdditionalPodSystemProperties(BasicDriverFeatureStep.scala:178)
> 
> at 
> org.apache.spark.deploy.k8s.submit.KubernetesDriverBuilder.$anonfun$buildFromFeatures$5(KubernetesDriverBuilder.scala:86)
> at 
> scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:126)
> 
> at 
> scala.collection.LinearSeqOptimized.foldLeft$(LinearSeqOptimized.scala:122)   
>  
> at scala.collection.immutable.List.foldLeft(List.scala:91)
> at 
> org.apache.spark.deploy.k8s.submit.KubernetesDriverBuilder.buildFromFeatures(KubernetesDriverBuilder.scala:84)
> 
> at 
> org.apache.spark.deploy.k8s.submit.Client.run(KubernetesClientApplication.scala:104)
> 
> at 
> org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.$anonfun$run$5(KubernetesClientApplication.scala:248)
> 
> at 
> org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.$anonfun$run$5$adapted(KubernetesClientApplication.scala:242)
> at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2738)
> at 
> org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.run(KubernetesClientApplication.scala:242)
> 
> at 
> org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.start(KubernetesClientApplication.scala:214)
> 
> at 
> org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:958)
> 
> at org.apache.spark.deploy.SparkSub

[jira] [Commented] (SPARK-38652) K8S IT Test DepsTestsSuite blocks with PathIOException in hadoop-aws-3.3.2

2022-03-29 Thread qian (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17514390#comment-17514390
 ] 

qian commented on SPARK-38652:
--

[~dongjoon] Hi. DepsTestsSuite has tests as follows:
 * Launcher client dependencies
 * SPARK-33615: Launcher client archives
 * SPARK-33748: Launcher python client respecting PYSPARK_PYTHON
 * ...

The spark-submit command is used by these tests, so I think that is why 
DepsTestsSuite blocks.

Could you please check whether these tests run? Maybe the `-Dtest.exclude.tags` 
option doesn't need the `minikube` value.

> K8S IT Test DepsTestsSuite blocks with PathIOException in hadoop-aws-3.3.2
> --
>
> Key: SPARK-38652
> URL: https://issues.apache.org/jira/browse/SPARK-38652
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes, Tests
>Affects Versions: 3.3.0
>Reporter: qian
>Priority: Major
>
> DepsTestsSuite in the K8s IT tests is blocked by a PathIOException in 
> hadoop-aws-3.3.2. The exception message is as follows:
> {code:java}
> Exception in thread "main" org.apache.spark.SparkException: Uploading file 
> /Users/hengzhen.sq/IdeaProjects/spark/dist/examples/jars/spark-examples_2.12-3.4.0-SNAPSHOT.jar
>  failed...
> at 
> org.apache.spark.deploy.k8s.KubernetesUtils$.uploadFileUri(KubernetesUtils.scala:332)
> 
> at 
> org.apache.spark.deploy.k8s.KubernetesUtils$.$anonfun$uploadAndTransformFileUris$1(KubernetesUtils.scala:277)
> 
> at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286) 
>
> at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)   
>  
> at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)  
>   
> at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
> at scala.collection.TraversableLike.map(TraversableLike.scala:286)
> at scala.collection.TraversableLike.map$(TraversableLike.scala:279)
> at scala.collection.AbstractTraversable.map(Traversable.scala:108)
> at 
> org.apache.spark.deploy.k8s.KubernetesUtils$.uploadAndTransformFileUris(KubernetesUtils.scala:275)
> 
> at 
> org.apache.spark.deploy.k8s.features.BasicDriverFeatureStep.$anonfun$getAdditionalPodSystemProperties$1(BasicDriverFeatureStep.scala:187)
>
> at scala.collection.immutable.List.foreach(List.scala:431)
> at 
> org.apache.spark.deploy.k8s.features.BasicDriverFeatureStep.getAdditionalPodSystemProperties(BasicDriverFeatureStep.scala:178)
> 
> at 
> org.apache.spark.deploy.k8s.submit.KubernetesDriverBuilder.$anonfun$buildFromFeatures$5(KubernetesDriverBuilder.scala:86)
> at 
> scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:126)
> 
> at 
> scala.collection.LinearSeqOptimized.foldLeft$(LinearSeqOptimized.scala:122)   
>  
> at scala.collection.immutable.List.foldLeft(List.scala:91)
> at 
> org.apache.spark.deploy.k8s.submit.KubernetesDriverBuilder.buildFromFeatures(KubernetesDriverBuilder.scala:84)
> 
> at 
> org.apache.spark.deploy.k8s.submit.Client.run(KubernetesClientApplication.scala:104)
> 
> at 
> org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.$anonfun$run$5(KubernetesClientApplication.scala:248)
> 
> at 
> org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.$anonfun$run$5$adapted(KubernetesClientApplication.scala:242)
> at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2738)
> at 
> org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.run(KubernetesClientApplication.scala:242)
> 
> at 
> org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.start(KubernetesClientApplication.scala:214)
> 
> at 
> org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:958)
> 
> at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180) 
>
> at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
> at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
> at 
> org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1046)  
>   
> at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1055)
> at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)Caused by: 
> org.apache.spark.SparkException: Error uploading file 
> spark-examples_2.12-3.4.0-SNAPSHOT.jar
> at 
> org.apache.spark.deploy.k8s.KubernetesUtils$.uploadFileToHadoopCompatibleFS(KubernetesUtils.scala:355)
> 
> at 
> org.apache.spark.deploy.k8s.KubernetesUtils$.uploadFileUri(KubernetesUtils.scala:328)
> 
> ... 30 more
> Caused by: org.apache.hadoop.fs.PathIOException: `Cannot get relative path 
> for 
> URI:file

[jira] [Commented] (SPARK-38652) K8S IT Test DepsTestsSuite blocks with PathIOException in hadoop-aws-3.3.2

2022-03-29 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17514398#comment-17514398
 ] 

Dongjoon Hyun commented on SPARK-38652:
---

Got it, [~dcoliversun].

> K8S IT Test DepsTestsSuite blocks with PathIOException in hadoop-aws-3.3.2
> --
>
> Key: SPARK-38652
> URL: https://issues.apache.org/jira/browse/SPARK-38652
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes, Tests
>Affects Versions: 3.3.0
>Reporter: qian
>Priority: Major
>
> DepsTestsSuite in the K8s IT tests is blocked by a PathIOException in 
> hadoop-aws-3.3.2. The exception message is as follows:
> {code:java}
> Exception in thread "main" org.apache.spark.SparkException: Uploading file 
> /Users/hengzhen.sq/IdeaProjects/spark/dist/examples/jars/spark-examples_2.12-3.4.0-SNAPSHOT.jar
>  failed...
> at 
> org.apache.spark.deploy.k8s.KubernetesUtils$.uploadFileUri(KubernetesUtils.scala:332)
> 
> at 
> org.apache.spark.deploy.k8s.KubernetesUtils$.$anonfun$uploadAndTransformFileUris$1(KubernetesUtils.scala:277)
> 
> at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286) 
>
> at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)   
>  
> at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)  
>   
> at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
> at scala.collection.TraversableLike.map(TraversableLike.scala:286)
> at scala.collection.TraversableLike.map$(TraversableLike.scala:279)
> at scala.collection.AbstractTraversable.map(Traversable.scala:108)
> at 
> org.apache.spark.deploy.k8s.KubernetesUtils$.uploadAndTransformFileUris(KubernetesUtils.scala:275)
> 
> at 
> org.apache.spark.deploy.k8s.features.BasicDriverFeatureStep.$anonfun$getAdditionalPodSystemProperties$1(BasicDriverFeatureStep.scala:187)
>
> at scala.collection.immutable.List.foreach(List.scala:431)
> at 
> org.apache.spark.deploy.k8s.features.BasicDriverFeatureStep.getAdditionalPodSystemProperties(BasicDriverFeatureStep.scala:178)
> 
> at 
> org.apache.spark.deploy.k8s.submit.KubernetesDriverBuilder.$anonfun$buildFromFeatures$5(KubernetesDriverBuilder.scala:86)
> at 
> scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:126)
> 
> at 
> scala.collection.LinearSeqOptimized.foldLeft$(LinearSeqOptimized.scala:122)   
>  
> at scala.collection.immutable.List.foldLeft(List.scala:91)
> at 
> org.apache.spark.deploy.k8s.submit.KubernetesDriverBuilder.buildFromFeatures(KubernetesDriverBuilder.scala:84)
> 
> at 
> org.apache.spark.deploy.k8s.submit.Client.run(KubernetesClientApplication.scala:104)
> 
> at 
> org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.$anonfun$run$5(KubernetesClientApplication.scala:248)
> 
> at 
> org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.$anonfun$run$5$adapted(KubernetesClientApplication.scala:242)
> at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2738)
> at 
> org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.run(KubernetesClientApplication.scala:242)
> 
> at 
> org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.start(KubernetesClientApplication.scala:214)
> 
> at 
> org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:958)
> 
> at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180) 
>
> at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
> at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
> at 
> org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1046)  
>   
> at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1055)
> at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)Caused by: 
> org.apache.spark.SparkException: Error uploading file 
> spark-examples_2.12-3.4.0-SNAPSHOT.jar
> at 
> org.apache.spark.deploy.k8s.KubernetesUtils$.uploadFileToHadoopCompatibleFS(KubernetesUtils.scala:355)
> 
> at 
> org.apache.spark.deploy.k8s.KubernetesUtils$.uploadFileUri(KubernetesUtils.scala:328)
> 
> ... 30 more
> Caused by: org.apache.hadoop.fs.PathIOException: `Cannot get relative path 
> for 
> URI:file:///Users/hengzhen.sq/IdeaProjects/spark/dist/examples/jars/spark-examples_2.12-3.4.0-SNAPSHOT.jar':
>  Input/output error
> at 
> org.apache.hadoop.fs.s3a.impl.CopyFromLocalOperation.getFinalPath(CopyFromLocalOperation.java:365)
> 
> at 
> org.apache.hadoop.fs.s3a.impl.CopyFromLocalOperation.uploadSourceFromFS(CopyFromLocalOperation

[jira] [Updated] (SPARK-38320) (flat)MapGroupsWithState can timeout groups which just received inputs in the same microbatch

2022-03-29 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-38320:
--
Labels: correctness  (was: )

> (flat)MapGroupsWithState can timeout groups which just received inputs in the 
> same microbatch
> -
>
> Key: SPARK-38320
> URL: https://issues.apache.org/jira/browse/SPARK-38320
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 3.2.1
>Reporter: Alex Balikov
>Assignee: Alex Balikov
>Priority: Major
>  Labels: correctness
> Fix For: 3.3.0, 3.2.2
>
>
> We have identified an issue where the RocksDB state store iterator will not 
> pick up store updates made after its creation. As a result of this, the 
> _timeoutProcessorIter_ in
> [https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/FlatMapGroupsWithStateExec.scala]
> will not pick up state changes made during _newDataProcessorIter_ input 
> processing. The user-observed behavior is that a group state may receive 
> input records and also be called with a timeout in the same micro-batch. This 
> contradicts the public documentation for GroupState -
> [https://spark.apache.org/docs/latest/api/scala/org/apache/spark/sql/streaming/GroupState.html]
>  * The timeout is reset every time the function is called on a group, that 
> is, when the group has new data, or the group has timed out. So the user has 
> to set the timeout duration every time the function is called, otherwise, 
> there will not be any timeout set.
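
To make the quoted contract concrete, here is a minimal, hypothetical sketch against the Java `GroupState` API (the key and value types, the counting logic, and the ten-minute duration are all illustrative, not taken from the report):

{code:java}
import java.util.Iterator;

import org.apache.spark.api.java.function.MapGroupsWithStateFunction;
import org.apache.spark.sql.streaming.GroupState;

// Illustrative state function: per the quoted contract, the timeout must be
// re-armed on every invocation, and a group that received input in a
// micro-batch should not also be handed a timeout in that same micro-batch.
MapGroupsWithStateFunction<String, Long, Long, Long> updateState =
    (key, values, state) -> {
      if (state.hasTimedOut()) {
        // Timeout path: should be reached only when no new data arrived
        // for this group.
        long last = state.exists() ? state.get() : 0L;
        state.remove();
        return last;
      }
      long total = state.exists() ? state.get() : 0L;
      while (values.hasNext()) {
        total += values.next();
      }
      state.update(total);
      // Must be set on every call; otherwise no timeout is scheduled.
      state.setTimeoutDuration("10 minutes");
      return total;
    };
{code}

Such a function would be registered via `mapGroupsWithState(updateState, Encoders.LONG(), Encoders.LONG(), GroupStateTimeout.ProcessingTimeTimeout())` on a `KeyValueGroupedDataset<String, Long>`; the bug above means the timed-out branch could fire for a group whose data branch already ran in the same micro-batch.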



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-38694) Simplify Java UT code with Junit `assertThrows`

2022-03-29 Thread Yang Jie (Jira)
Yang Jie created SPARK-38694:


 Summary: Simplify Java UT code with Junit `assertThrows`
 Key: SPARK-38694
 URL: https://issues.apache.org/jira/browse/SPARK-38694
 Project: Spark
  Issue Type: Improvement
  Components: Tests
Affects Versions: 3.4.0
Reporter: Yang Jie


There are some code patterns in Java UTs:
{code:java}
 @Test
  public void testAuthReplay() throws Exception {
    try {
      doSomeOperation();
      fail("Should have failed");
    } catch (Exception e) {
      assertTrue(checkException(e));
    }
  }
{code}
or 

 
{code:java}
@Test(expected = SomeException.class)
  public void testAuthReplay() throws Exception {
    try {
      doSomeOperation();
      fail("Should have failed");
    } catch (Exception e) {
      assertTrue(checkException(e));
      throw e;
    }
  } {code}
We can use JUnit's `assertThrows` to simplify these patterns, as sketched below.
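
A minimal sketch of the simplified form, assuming JUnit 4.13+ (where `org.junit.Assert.assertThrows` is available); `doSomeOperation`, `checkException`, and `SomeException` are the placeholder names from the patterns above:

{code:java}
import static org.junit.Assert.assertThrows;
import static org.junit.Assert.assertTrue;

// Same checks as the patterns above, without the manual try/fail/catch
// scaffolding; assertThrows returns the caught exception for inspection.
@Test
public void testAuthReplay() {
  SomeException e = assertThrows(SomeException.class, () -> doSomeOperation());
  assertTrue(checkException(e));
}
{code}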

 

 



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-38694) Simplify Java UT code with Junit `assertThrows`

2022-03-29 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17514412#comment-17514412
 ] 

Apache Spark commented on SPARK-38694:
--

User 'LuciferYang' has created a pull request for this issue:
https://github.com/apache/spark/pull/36008

> Simplify Java UT code with Junit `assertThrows`
> ---
>
> Key: SPARK-38694
> URL: https://issues.apache.org/jira/browse/SPARK-38694
> Project: Spark
>  Issue Type: Improvement
>  Components: Tests
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Priority: Minor
>
> There are some code patterns in Java UTs:
> {code:java}
>  @Test
>   public void testAuthReplay() throws Exception {
>     try {
>       doSomeOperation();
>       fail("Should have failed");
>     } catch (Exception e) {
>       assertTrue(checkException(e));
>     }
>   }
> {code}
> or 
>  
> {code:java}
> @Test(expected = SomeException.class)
>   public void testAuthReplay() throws Exception {
>     try {
>       doSomeOperation();
>       fail("Should have failed");
>     } catch (Exception e) {
>       assertTrue(checkException(e));
>       throw e;
>     }
>   } {code}
> We can use JUnit's `assertThrows` to simplify these patterns.
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-38694) Simplify Java UT code with Junit `assertThrows`

2022-03-29 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-38694:


Assignee: Apache Spark

> Simplify Java UT code with Junit `assertThrows`
> ---
>
> Key: SPARK-38694
> URL: https://issues.apache.org/jira/browse/SPARK-38694
> Project: Spark
>  Issue Type: Improvement
>  Components: Tests
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Assignee: Apache Spark
>Priority: Minor
>
> There are some code patterns in Java UTs:
> {code:java}
>  @Test
>   public void testAuthReplay() throws Exception {
>     try {
>       doSomeOperation();
>       fail("Should have failed");
>     } catch (Exception e) {
>       assertTrue(checkException(e));
>     }
>   }
> {code}
> or 
>  
> {code:java}
> @Test(expected = SomeException.class)
>   public void testAuthReplay() throws Exception {
>     try {
>       doSomeOperation();
>       fail("Should have failed");
>     } catch (Exception e) {
>       assertTrue(checkException(e));
>       throw e;
>     }
>   } {code}
> We can use JUnit's `assertThrows` to simplify these patterns.
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-38694) Simplify Java UT code with Junit `assertThrows`

2022-03-29 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17514413#comment-17514413
 ] 

Apache Spark commented on SPARK-38694:
--

User 'LuciferYang' has created a pull request for this issue:
https://github.com/apache/spark/pull/36008

> Simplify Java UT code with Junit `assertThrows`
> ---
>
> Key: SPARK-38694
> URL: https://issues.apache.org/jira/browse/SPARK-38694
> Project: Spark
>  Issue Type: Improvement
>  Components: Tests
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Priority: Minor
>
> There are some code patterns in Java UTs:
> {code:java}
>  @Test
>   public void testAuthReplay() throws Exception {
>     try {
>       doSomeOperation();
>       fail("Should have failed");
>     } catch (Exception e) {
>       assertTrue(checkException(e));
>     }
>   }
> {code}
> or 
>  
> {code:java}
> @Test(expected = SomeException.class)
>   public void testAuthReplay() throws Exception {
>     try {
>       doSomeOperation();
>       fail("Should have failed");
>     } catch (Exception e) {
>       assertTrue(checkException(e));
>       throw e;
>     }
>   } {code}
> We can use JUnit's `assertThrows` to simplify these patterns.
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-38694) Simplify Java UT code with Junit `assertThrows`

2022-03-29 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-38694:


Assignee: (was: Apache Spark)

> Simplify Java UT code with Junit `assertThrows`
> ---
>
> Key: SPARK-38694
> URL: https://issues.apache.org/jira/browse/SPARK-38694
> Project: Spark
>  Issue Type: Improvement
>  Components: Tests
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Priority: Minor
>
> There are some code patterns in Java UTs:
> {code:java}
>  @Test
>   public void testAuthReplay() throws Exception {
>     try {
>       doSomeOperation();
>       fail("Should have failed");
>     } catch (Exception e) {
>       assertTrue(checkException(e));
>     }
>   }
> {code}
> or 
>  
> {code:java}
> @Test(expected = SomeException.class)
>   public void testAuthReplay() throws Exception {
>     try {
>       doSomeOperation();
>       fail("Should have failed");
>     } catch (Exception e) {
>       assertTrue(checkException(e));
>       throw e;
>     }
>   } {code}
> We can use JUnit's `assertThrows` to simplify these patterns.
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-38349) No need to filter events when sessionwindow gapDuration greater than 0

2022-03-29 Thread Jungtaek Lim (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jungtaek Lim reassigned SPARK-38349:


Assignee: nyingping

> No need to filter events when sessionwindow gapDuration greater than 0
> --
>
> Key: SPARK-38349
> URL: https://issues.apache.org/jira/browse/SPARK-38349
> Project: Spark
>  Issue Type: Improvement
>  Components: Structured Streaming
>Affects Versions: 3.2.1
>Reporter: nyingping
>Assignee: nyingping
>Priority: Trivial
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-38349) No need to filter events when sessionwindow gapDuration greater than 0

2022-03-29 Thread Jungtaek Lim (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jungtaek Lim resolved SPARK-38349.
--
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 35680
[https://github.com/apache/spark/pull/35680]

> No need to filter events when sessionwindow gapDuration greater than 0
> --
>
> Key: SPARK-38349
> URL: https://issues.apache.org/jira/browse/SPARK-38349
> Project: Spark
>  Issue Type: Improvement
>  Components: Structured Streaming
>Affects Versions: 3.2.1
>Reporter: nyingping
>Assignee: nyingping
>Priority: Trivial
> Fix For: 3.4.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-38652) K8S IT Test DepsTestsSuite blocks with PathIOException in hadoop-aws-3.3.2

2022-03-29 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17514431#comment-17514431
 ] 

Dongjoon Hyun commented on SPARK-38652:
---

I also confirmed this regression and am raising this issue to blocker. Thank 
you, [~dcoliversun].

> K8S IT Test DepsTestsSuite blocks with PathIOException in hadoop-aws-3.3.2
> --
>
> Key: SPARK-38652
> URL: https://issues.apache.org/jira/browse/SPARK-38652
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes, Tests
>Affects Versions: 3.3.0
>Reporter: qian
>Priority: Major
>
> DepsTestsSuite in the K8s IT tests is blocked with a PathIOException in 
> hadoop-aws-3.3.2. The exception message is as follows:
> {code:java}
> Exception in thread "main" org.apache.spark.SparkException: Uploading file 
> /Users/hengzhen.sq/IdeaProjects/spark/dist/examples/jars/spark-examples_2.12-3.4.0-SNAPSHOT.jar
>  failed...
> at 
> org.apache.spark.deploy.k8s.KubernetesUtils$.uploadFileUri(KubernetesUtils.scala:332)
> 
> at 
> org.apache.spark.deploy.k8s.KubernetesUtils$.$anonfun$uploadAndTransformFileUris$1(KubernetesUtils.scala:277)
> 
> at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286) 
>
> at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)   
>  
> at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)  
>   
> at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
> at scala.collection.TraversableLike.map(TraversableLike.scala:286)
> at scala.collection.TraversableLike.map$(TraversableLike.scala:279)
> at scala.collection.AbstractTraversable.map(Traversable.scala:108)
> at 
> org.apache.spark.deploy.k8s.KubernetesUtils$.uploadAndTransformFileUris(KubernetesUtils.scala:275)
> 
> at 
> org.apache.spark.deploy.k8s.features.BasicDriverFeatureStep.$anonfun$getAdditionalPodSystemProperties$1(BasicDriverFeatureStep.scala:187)
>
> at scala.collection.immutable.List.foreach(List.scala:431)
> at 
> org.apache.spark.deploy.k8s.features.BasicDriverFeatureStep.getAdditionalPodSystemProperties(BasicDriverFeatureStep.scala:178)
> 
> at 
> org.apache.spark.deploy.k8s.submit.KubernetesDriverBuilder.$anonfun$buildFromFeatures$5(KubernetesDriverBuilder.scala:86)
> at 
> scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:126)
> 
> at 
> scala.collection.LinearSeqOptimized.foldLeft$(LinearSeqOptimized.scala:122)   
>  
> at scala.collection.immutable.List.foldLeft(List.scala:91)
> at 
> org.apache.spark.deploy.k8s.submit.KubernetesDriverBuilder.buildFromFeatures(KubernetesDriverBuilder.scala:84)
> 
> at 
> org.apache.spark.deploy.k8s.submit.Client.run(KubernetesClientApplication.scala:104)
> 
> at 
> org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.$anonfun$run$5(KubernetesClientApplication.scala:248)
> 
> at 
> org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.$anonfun$run$5$adapted(KubernetesClientApplication.scala:242)
> at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2738)
> at 
> org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.run(KubernetesClientApplication.scala:242)
> 
> at 
> org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.start(KubernetesClientApplication.scala:214)
> 
> at 
> org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:958)
> 
> at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180) 
>
> at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
> at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
> at 
> org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1046)  
>   
> at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1055)
> at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)Caused by: 
> org.apache.spark.SparkException: Error uploading file 
> spark-examples_2.12-3.4.0-SNAPSHOT.jar
> at 
> org.apache.spark.deploy.k8s.KubernetesUtils$.uploadFileToHadoopCompatibleFS(KubernetesUtils.scala:355)
> 
> at 
> org.apache.spark.deploy.k8s.KubernetesUtils$.uploadFileUri(KubernetesUtils.scala:328)
> 
> ... 30 more
> Caused by: org.apache.hadoop.fs.PathIOException: `Cannot get relative path 
> for 
> URI:file:///Users/hengzhen.sq/IdeaProjects/spark/dist/examples/jars/spark-examples_2.12-3.4.0-SNAPSHOT.jar':
>  Input/output error
> at 
> org.apache.hadoop.fs.s3a.impl.CopyFromLocalOperation.getFinalPath(CopyFromLocalOperation.java:365)
> 
> at 
> org.apache.hadoop.fs.s

[jira] [Updated] (SPARK-38652) K8S IT Test DepsTestsSuite blocks with PathIOException in hadoop-aws-3.3.2

2022-03-29 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-38652:
--
Priority: Blocker  (was: Major)

> K8S IT Test DepsTestsSuite blocks with PathIOException in hadoop-aws-3.3.2
> --
>
> Key: SPARK-38652
> URL: https://issues.apache.org/jira/browse/SPARK-38652
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes, Tests
>Affects Versions: 3.3.0
>Reporter: qian
>Priority: Blocker
>
> DepsTestsSuite in the K8s IT tests is blocked with a PathIOException in 
> hadoop-aws-3.3.2. The exception message is as follows:
> {code:java}
> Exception in thread "main" org.apache.spark.SparkException: Uploading file 
> /Users/hengzhen.sq/IdeaProjects/spark/dist/examples/jars/spark-examples_2.12-3.4.0-SNAPSHOT.jar
>  failed...
> at 
> org.apache.spark.deploy.k8s.KubernetesUtils$.uploadFileUri(KubernetesUtils.scala:332)
> 
> at 
> org.apache.spark.deploy.k8s.KubernetesUtils$.$anonfun$uploadAndTransformFileUris$1(KubernetesUtils.scala:277)
> 
> at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286) 
>
> at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)   
>  
> at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)  
>   
> at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
> at scala.collection.TraversableLike.map(TraversableLike.scala:286)
> at scala.collection.TraversableLike.map$(TraversableLike.scala:279)
> at scala.collection.AbstractTraversable.map(Traversable.scala:108)
> at 
> org.apache.spark.deploy.k8s.KubernetesUtils$.uploadAndTransformFileUris(KubernetesUtils.scala:275)
> 
> at 
> org.apache.spark.deploy.k8s.features.BasicDriverFeatureStep.$anonfun$getAdditionalPodSystemProperties$1(BasicDriverFeatureStep.scala:187)
>
> at scala.collection.immutable.List.foreach(List.scala:431)
> at 
> org.apache.spark.deploy.k8s.features.BasicDriverFeatureStep.getAdditionalPodSystemProperties(BasicDriverFeatureStep.scala:178)
> 
> at 
> org.apache.spark.deploy.k8s.submit.KubernetesDriverBuilder.$anonfun$buildFromFeatures$5(KubernetesDriverBuilder.scala:86)
> at 
> scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:126)
> 
> at 
> scala.collection.LinearSeqOptimized.foldLeft$(LinearSeqOptimized.scala:122)   
>  
> at scala.collection.immutable.List.foldLeft(List.scala:91)
> at 
> org.apache.spark.deploy.k8s.submit.KubernetesDriverBuilder.buildFromFeatures(KubernetesDriverBuilder.scala:84)
> 
> at 
> org.apache.spark.deploy.k8s.submit.Client.run(KubernetesClientApplication.scala:104)
> 
> at 
> org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.$anonfun$run$5(KubernetesClientApplication.scala:248)
> 
> at 
> org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.$anonfun$run$5$adapted(KubernetesClientApplication.scala:242)
> at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2738)
> at 
> org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.run(KubernetesClientApplication.scala:242)
> 
> at 
> org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.start(KubernetesClientApplication.scala:214)
> 
> at 
> org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:958)
> 
> at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180) 
>
> at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
> at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
> at 
> org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1046)  
>   
> at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1055)
> at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)Caused by: 
> org.apache.spark.SparkException: Error uploading file 
> spark-examples_2.12-3.4.0-SNAPSHOT.jar
> at 
> org.apache.spark.deploy.k8s.KubernetesUtils$.uploadFileToHadoopCompatibleFS(KubernetesUtils.scala:355)
> 
> at 
> org.apache.spark.deploy.k8s.KubernetesUtils$.uploadFileUri(KubernetesUtils.scala:328)
> 
> ... 30 more
> Caused by: org.apache.hadoop.fs.PathIOException: `Cannot get relative path 
> for 
> URI:file:///Users/hengzhen.sq/IdeaProjects/spark/dist/examples/jars/spark-examples_2.12-3.4.0-SNAPSHOT.jar':
>  Input/output error
> at 
> org.apache.hadoop.fs.s3a.impl.CopyFromLocalOperation.getFinalPath(CopyFromLocalOperation.java:365)
> 
> at 
> org.apache.hadoop.fs.s3a.impl.CopyFromLocalOperation.uploadSourceFromFS(CopyFromLocalOperation.java:226)
> 
> at 
> org.apache.had

[jira] [Updated] (SPARK-38652) K8S IT Test DepsTestsSuite blocks with PathIOException in hadoop-aws-3.3.2

2022-03-29 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-38652:
--
Component/s: (was: Tests)

> K8S IT Test DepsTestsSuite blocks with PathIOException in hadoop-aws-3.3.2
> --
>
> Key: SPARK-38652
> URL: https://issues.apache.org/jira/browse/SPARK-38652
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 3.3.0
>Reporter: qian
>Priority: Blocker
>
> DepsTestsSuite in the K8s IT tests is blocked with a PathIOException in 
> hadoop-aws-3.3.2. The exception message is as follows:
> {code:java}
> Exception in thread "main" org.apache.spark.SparkException: Uploading file 
> /Users/hengzhen.sq/IdeaProjects/spark/dist/examples/jars/spark-examples_2.12-3.4.0-SNAPSHOT.jar
>  failed...
> at 
> org.apache.spark.deploy.k8s.KubernetesUtils$.uploadFileUri(KubernetesUtils.scala:332)
> 
> at 
> org.apache.spark.deploy.k8s.KubernetesUtils$.$anonfun$uploadAndTransformFileUris$1(KubernetesUtils.scala:277)
> 
> at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286) 
>
> at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)   
>  
> at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)  
>   
> at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
> at scala.collection.TraversableLike.map(TraversableLike.scala:286)
> at scala.collection.TraversableLike.map$(TraversableLike.scala:279)
> at scala.collection.AbstractTraversable.map(Traversable.scala:108)
> at 
> org.apache.spark.deploy.k8s.KubernetesUtils$.uploadAndTransformFileUris(KubernetesUtils.scala:275)
> 
> at 
> org.apache.spark.deploy.k8s.features.BasicDriverFeatureStep.$anonfun$getAdditionalPodSystemProperties$1(BasicDriverFeatureStep.scala:187)
>
> at scala.collection.immutable.List.foreach(List.scala:431)
> at 
> org.apache.spark.deploy.k8s.features.BasicDriverFeatureStep.getAdditionalPodSystemProperties(BasicDriverFeatureStep.scala:178)
> 
> at 
> org.apache.spark.deploy.k8s.submit.KubernetesDriverBuilder.$anonfun$buildFromFeatures$5(KubernetesDriverBuilder.scala:86)
> at 
> scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:126)
> 
> at 
> scala.collection.LinearSeqOptimized.foldLeft$(LinearSeqOptimized.scala:122)   
>  
> at scala.collection.immutable.List.foldLeft(List.scala:91)
> at 
> org.apache.spark.deploy.k8s.submit.KubernetesDriverBuilder.buildFromFeatures(KubernetesDriverBuilder.scala:84)
> 
> at 
> org.apache.spark.deploy.k8s.submit.Client.run(KubernetesClientApplication.scala:104)
> 
> at 
> org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.$anonfun$run$5(KubernetesClientApplication.scala:248)
> 
> at 
> org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.$anonfun$run$5$adapted(KubernetesClientApplication.scala:242)
> at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2738)
> at 
> org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.run(KubernetesClientApplication.scala:242)
> 
> at 
> org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.start(KubernetesClientApplication.scala:214)
> 
> at 
> org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:958)
> 
> at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180) 
>
> at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
> at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
> at 
> org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1046)  
>   
> at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1055)
> at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)Caused by: 
> org.apache.spark.SparkException: Error uploading file 
> spark-examples_2.12-3.4.0-SNAPSHOT.jar
> at 
> org.apache.spark.deploy.k8s.KubernetesUtils$.uploadFileToHadoopCompatibleFS(KubernetesUtils.scala:355)
> 
> at 
> org.apache.spark.deploy.k8s.KubernetesUtils$.uploadFileUri(KubernetesUtils.scala:328)
> 
> ... 30 more
> Caused by: org.apache.hadoop.fs.PathIOException: `Cannot get relative path 
> for 
> URI:file:///Users/hengzhen.sq/IdeaProjects/spark/dist/examples/jars/spark-examples_2.12-3.4.0-SNAPSHOT.jar':
>  Input/output error
> at 
> org.apache.hadoop.fs.s3a.impl.CopyFromLocalOperation.getFinalPath(CopyFromLocalOperation.java:365)
> 
> at 
> org.apache.hadoop.fs.s3a.impl.CopyFromLocalOperation.uploadSourceFromFS(CopyFromLocalOperation.java:226)
> 
> at 
> org.apache.hadoop.fs.s3

[jira] [Resolved] (SPARK-38605) Retrying on file manager operation in HDFSMetadataLog

2022-03-29 Thread L. C. Hsieh (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

L. C. Hsieh resolved SPARK-38605.
-
Resolution: Won't Fix

> Retrying on file manager operation in HDFSMetadataLog
> -
>
> Key: SPARK-38605
> URL: https://issues.apache.org/jira/browse/SPARK-38605
> Project: Spark
>  Issue Type: Improvement
>  Components: Structured Streaming
>Affects Versions: 3.4.0
>Reporter: L. C. Hsieh
>Priority: Major
>
> Currently HDFSMetadataLog uses CheckpointFileManager for file operations 
> such as opening a metadata file. These operations are easily affected by 
> network blips, which cause the streaming query to fail. Although we can 
> restart the streaming query, it takes more time to recover.
> Such file operations should be made resilient to these situations by retrying.
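
A minimal sketch of the idea (a hypothetical helper, not an actual HDFSMetadataLog API; the attempt count and backoff are illustrative):

{code:java}
import java.io.IOException;
import java.util.concurrent.Callable;

// Hypothetical helper: retry a flaky I/O operation a bounded number of
// times with linear backoff before surfacing the last failure.
static <T> T withRetry(int maxAttempts, long baseWaitMs, Callable<T> op)
    throws Exception {
  IOException last = null;
  for (int attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return op.call();
    } catch (IOException e) {
      last = e; // likely a transient network blip; back off and retry
      Thread.sleep(baseWaitMs * attempt);
    }
  }
  throw last;
}
{code}

Opening a metadata file would then be wrapped as, e.g., `withRetry(3, 1000L, () -> fileManager.open(path))` (names hypothetical).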



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org