[jira] [Assigned] (SPARK-33354) New explicit cast syntax rules in ANSI mode

2020-11-04 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33354:


Assignee: Apache Spark  (was: Gengliang Wang)

> New explicit cast syntax rules in ANSI mode
> ---
>
> Key: SPARK-33354
> URL: https://issues.apache.org/jira/browse/SPARK-33354
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Gengliang Wang
>Assignee: Apache Spark
>Priority: Major
>
> In section 6.13 of the ANSI SQL standard, there are syntax rules for valid
> combinations of the source and target data types.
> To make Spark's ANSI mode more ANSI SQL compatible, I propose to disallow the
> following castings in ANSI mode:
> {code:java}
> TimeStamp <=> Boolean
> Date <=> Boolean
> Numeric <=> Timestamp
> Numeric <=> Date
> Numeric <=> Binary
> String <=> Array
> String <=> Map
> String <=> Struct
> {code}
> The following castings are considered invalid in the ANSI SQL standard, but they
> are quite straightforward. Let's allow them for now:
> {code:java}
> Numeric <=> Boolean
> String <=> Boolean
> String <=> Binary
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33354) New explicit cast syntax rules in ANSI mode

2020-11-04 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17226558#comment-17226558
 ] 

Apache Spark commented on SPARK-33354:
--

User 'gengliangwang' has created a pull request for this issue:
https://github.com/apache/spark/pull/30260

> New explicit cast syntax rules in ANSI mode
> ---
>
> Key: SPARK-33354
> URL: https://issues.apache.org/jira/browse/SPARK-33354
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
>
> In section 6.13 of the ANSI SQL standard, there are syntax rules for valid
> combinations of the source and target data types.
> To make Spark's ANSI mode more ANSI SQL compatible, I propose to disallow the
> following castings in ANSI mode:
> {code:java}
> TimeStamp <=> Boolean
> Date <=> Boolean
> Numeric <=> Timestamp
> Numeric <=> Date
> Numeric <=> Binary
> String <=> Array
> String <=> Map
> String <=> Struct
> {code}
> The following castings are considered invalid in the ANSI SQL standard, but they
> are quite straightforward. Let's allow them for now:
> {code:java}
> Numeric <=> Boolean
> String <=> Boolean
> String <=> Binary
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33354) New explicit cast syntax rules in ANSI mode

2020-11-04 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33354:


Assignee: Gengliang Wang  (was: Apache Spark)

> New explicit cast syntax rules in ANSI mode
> ---
>
> Key: SPARK-33354
> URL: https://issues.apache.org/jira/browse/SPARK-33354
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
>
> In section 6.13 of the ANSI SQL standard, there are syntax rules for valid
> combinations of the source and target data types.
> To make Spark's ANSI mode more ANSI SQL compatible, I propose to disallow the
> following castings in ANSI mode:
> {code:java}
> TimeStamp <=> Boolean
> Date <=> Boolean
> Numeric <=> Timestamp
> Numeric <=> Date
> Numeric <=> Binary
> String <=> Array
> String <=> Map
> String <=> Struct
> {code}
> The following castings are considered invalid in the ANSI SQL standard, but they
> are quite straightforward. Let's allow them for now:
> {code:java}
> Numeric <=> Boolean
> String <=> Boolean
> String <=> Binary
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-33354) New explicit cast syntax rules in ANSI mode

2020-11-04 Thread Gengliang Wang (Jira)
Gengliang Wang created SPARK-33354:
--

 Summary: New explicit cast syntax rules in ANSI mode
 Key: SPARK-33354
 URL: https://issues.apache.org/jira/browse/SPARK-33354
 Project: Spark
  Issue Type: New Feature
  Components: SQL
Affects Versions: 3.1.0
Reporter: Gengliang Wang
Assignee: Gengliang Wang


In section 6.13 of the ANSI SQL standard, there are syntax rules for valid
combinations of the source and target data types.
To make Spark's ANSI mode more ANSI SQL compatible, I propose to disallow the
following castings in ANSI mode:

{code:java}
TimeStamp <=> Boolean
Date <=> Boolean
Numeric <=> Timestamp
Numeric <=> Date
Numeric <=> Binary
String <=> Array
String <=> Map
String <=> Struct
{code}


The following castings are considered invalid in the ANSI SQL standard, but they
are quite straightforward. Let's allow them for now:

{code:java}
Numeric <=> Boolean
String <=> Boolean
String <=> Binary
{code}
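
As a rough, hedged illustration of the proposed behavior (not taken from this ticket; the config key is the existing ANSI-mode flag, and the exact error depends on the Spark version), a disallowed cast would be rejected while an explicitly allowed one keeps working:

{code:java}
// Sketch only: spark.sql.ansi.enabled is Spark's ANSI-mode switch.
spark.conf.set("spark.sql.ansi.enabled", true)

spark.sql("SELECT CAST(1 AS BOOLEAN)").show() // Numeric <=> Boolean: still allowed
spark.sql("SELECT CAST(1 AS DATE)").show()    // Numeric <=> Date: expected to fail analysis under the new rules
{code}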




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33277) Python/Pandas UDF right after off-heap vectorized reader could cause executor crash.

2020-11-04 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33277:


Assignee: Apache Spark

> Python/Pandas UDF right after off-heap vectorized reader could cause executor 
> crash.
> 
>
> Key: SPARK-33277
> URL: https://issues.apache.org/jira/browse/SPARK-33277
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark, SQL
>Affects Versions: 2.4.7, 3.0.1
>Reporter: Takuya Ueshin
>Assignee: Apache Spark
>Priority: Major
>
> Python/Pandas UDF right after off-heap vectorized reader could cause executor 
> crash.
> E.g.:
> {code:java}
> from pyspark.sql.functions import udf
> from pyspark.sql.types import LongType
>
> # `path` is assumed to point at any writable location
> spark.range(0, 10, 1, 1).write.parquet(path)
> spark.conf.set("spark.sql.columnVector.offheap.enabled", True)
> def f(x):
>     return 0
>
> fUdf = udf(f, LongType())
> spark.read.parquet(path).select(fUdf('id')).head()
> {code}
> This is because the Python evaluation consumes the parent iterator in a
> separate thread, and it consumes more data from the parent even after the task
> ends and the parent is closed. If an off-heap column vector exists in the
> parent iterator, it could cause a segmentation fault which crashes the executor.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33277) Python/Pandas UDF right after off-heap vectorized reader could cause executor crash.

2020-11-04 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33277:


Assignee: (was: Apache Spark)

> Python/Pandas UDF right after off-heap vectorized reader could cause executor 
> crash.
> 
>
> Key: SPARK-33277
> URL: https://issues.apache.org/jira/browse/SPARK-33277
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark, SQL
>Affects Versions: 2.4.7, 3.0.1
>Reporter: Takuya Ueshin
>Priority: Major
>
> Python/Pandas UDF right after off-heap vectorized reader could cause executor 
> crash.
> E.g.:
> {code:java}
> from pyspark.sql.functions import udf
> from pyspark.sql.types import LongType
>
> # `path` is assumed to point at any writable location
> spark.range(0, 10, 1, 1).write.parquet(path)
> spark.conf.set("spark.sql.columnVector.offheap.enabled", True)
> def f(x):
>     return 0
>
> fUdf = udf(f, LongType())
> spark.read.parquet(path).select(fUdf('id')).head()
> {code}
> This is because the Python evaluation consumes the parent iterator in a
> separate thread, and it consumes more data from the parent even after the task
> ends and the parent is closed. If an off-heap column vector exists in the
> parent iterator, it could cause a segmentation fault which crashes the executor.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Reopened] (SPARK-33277) Python/Pandas UDF right after off-heap vectorized reader could cause executor crash.

2020-11-04 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reopened SPARK-33277:
--
  Assignee: (was: Takuya Ueshin)

Reverted in:

master: 
https://github.com/apache/spark/commit/d530ed0ea8bdba09fba6dcd51f8e4f7745781c2e
branch-3.0: 
https://github.com/apache/spark/commit/74d8eacbe9cdc0b25a177543eb48ac54bd065cbb
branch-2.4: 
https://github.com/apache/spark/commit/c342bcd4c4ba68506ca6b459bd3a9c688d2aecfa

> Python/Pandas UDF right after off-heap vectorized reader could cause executor 
> crash.
> 
>
> Key: SPARK-33277
> URL: https://issues.apache.org/jira/browse/SPARK-33277
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark, SQL
>Affects Versions: 2.4.7, 3.0.1
>Reporter: Takuya Ueshin
>Priority: Major
>
> Python/Pandas UDF right after off-heap vectorized reader could cause executor 
> crash.
> E.g.:
> {code:java}
> from pyspark.sql.functions import udf
> from pyspark.sql.types import LongType
>
> # `path` is assumed to point at any writable location
> spark.range(0, 10, 1, 1).write.parquet(path)
> spark.conf.set("spark.sql.columnVector.offheap.enabled", True)
> def f(x):
>     return 0
>
> fUdf = udf(f, LongType())
> spark.read.parquet(path).select(fUdf('id')).head()
> {code}
> This is because the Python evaluation consumes the parent iterator in a
> separate thread, and it consumes more data from the parent even after the task
> ends and the parent is closed. If an off-heap column vector exists in the
> parent iterator, it could cause a segmentation fault which crashes the executor.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33277) Python/Pandas UDF right after off-heap vectorized reader could cause executor crash.

2020-11-04 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-33277:
-
Fix Version/s: (was: 3.0.2)
   (was: 2.4.8)
   (was: 3.1.0)

> Python/Pandas UDF right after off-heap vectorized reader could cause executor 
> crash.
> 
>
> Key: SPARK-33277
> URL: https://issues.apache.org/jira/browse/SPARK-33277
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark, SQL
>Affects Versions: 2.4.7, 3.0.1
>Reporter: Takuya Ueshin
>Assignee: Takuya Ueshin
>Priority: Major
>
> Python/Pandas UDF right after off-heap vectorized reader could cause executor 
> crash.
> E.g.:
> {code:java}
> from pyspark.sql.functions import udf
> from pyspark.sql.types import LongType
>
> # `path` is assumed to point at any writable location
> spark.range(0, 10, 1, 1).write.parquet(path)
> spark.conf.set("spark.sql.columnVector.offheap.enabled", True)
> def f(x):
>     return 0
>
> fUdf = udf(f, LongType())
> spark.read.parquet(path).select(fUdf('id')).head()
> {code}
> This is because the Python evaluation consumes the parent iterator in a
> separate thread, and it consumes more data from the parent even after the task
> ends and the parent is closed. If an off-heap column vector exists in the
> parent iterator, it could cause a segmentation fault which crashes the executor.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-33331) Limit the number of pending blocks in memory and store blocks that collide

2020-11-04 Thread wuyi (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17226531#comment-17226531
 ] 

wuyi edited comment on SPARK-33331 at 11/5/20, 7:12 AM:


I like the idea to cache the blocks of the worst case instead of throwing them
away, as long as we have the memory threshold (either memory size or block
number). And we can always fall back to the original way whenever we set the
threshold to 0.

 

Another problem may be, when should the client retry the block after we have 
the memory cache? Shall we retry it immediately or wait for a few seconds 
regarding the number of deferred blocks?


was (Author: ngone51):
I like the idea to cache the blocks of the worst case instead of throwing them
away, as long as we have the memory threshold (either memory size or block
number). And we can always fall back to the original way whenever they set the
threshold to 0.

 

Another problem may be, when should the client retry the block after we have 
the memory cache? Shall we retry it immediately or wait for a few seconds 
regarding the number of deferred blocks?

> Limit the number of pending blocks in memory and store blocks that collide
> --
>
> Key: SPARK-33331
> URL: https://issues.apache.org/jira/browse/SPARK-33331
> Project: Spark
>  Issue Type: Sub-task
>  Components: Shuffle
>Affects Versions: 3.1.0
>Reporter: Chandni Singh
>Priority: Major
>
> This jira addresses the below two points:
>  1. In {{RemoteBlockPushResolver}}, bytes that cannot be merged immediately 
> are stored in memory. The stream callback maintains a list of 
> {{deferredBufs}}. When a block cannot be merged it is added to this list. 
> Currently, there isn't a limit on the number of pending blocks. We can limit 
> the number of pending blocks in memory. There has been a discussion around 
> this here:
> [https://github.com/apache/spark/pull/30062#discussion_r514026014]
> 2. When a stream doesn't get an opportunity to merge, then 
> {{RemoteBlockPushResolver}} ignores the data from that stream. Another 
> approach is to store the data of the stream in {{AppShufflePartitionInfo}} 
> when it reaches the worst-case scenario. This may increase the memory usage 
> of the shuffle service though. However, given the limit introduced in point 1, we
> can try this out.
>  More information can be found in this discussion:
>  [https://github.com/apache/spark/pull/30062#discussion_r517524546]
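
As a purely illustrative sketch of point 1 above (not the actual {{RemoteBlockPushResolver}} code; the name {{maxDeferredBlocks}} is hypothetical), capping the deferred buffers could look roughly like this:

{code:java}
import java.nio.ByteBuffer
import scala.collection.mutable.ListBuffer

// Illustrative only: a bounded list of deferred block buffers.
// A limit of 0 means "never defer", i.e. keep the current behavior.
class DeferredBlocks(maxDeferredBlocks: Int) {
  private val deferredBufs = ListBuffer.empty[ByteBuffer]

  /** Returns true if the block was buffered, false if the limit was reached. */
  def tryDefer(buf: ByteBuffer): Boolean = synchronized {
    if (deferredBufs.size >= maxDeferredBlocks) {
      false // caller drops the block or lets the client retry later
    } else {
      deferredBufs += buf
      true
    }
  }
}
{code}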



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-33282) Replace Probot Autolabeler with Github Action

2020-11-04 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-33282.
--
Fix Version/s: 3.1.0
   Resolution: Fixed

Issue resolved by pull request 30244
[https://github.com/apache/spark/pull/30244]

> Replace Probot Autolabeler with Github Action
> -
>
> Key: SPARK-33282
> URL: https://issues.apache.org/jira/browse/SPARK-33282
> Project: Spark
>  Issue Type: Task
>  Components: Project Infra
>Affects Versions: 3.0.1
>Reporter: Kyle Bendickson
>Assignee: Kyle Bendickson
>Priority: Major
> Fix For: 3.1.0
>
>
> The Probot Autolabeler that we were using in both the Iceberg and the Spark 
> repo is no longer working. I've confirmed that with the developer, github user
> [at]mithro, who has indicated that the Probot Autolabeler is end of life and 
> will not be maintained moving forward.
> PRs have not been labeled for a few weeks now.
>  
> As I'm already interfacing with ASF Infra to have the probot permissions 
> revoked from the Iceberg repo, and I've already submitted a patch to switch 
> Iceberg to the standard github labeler action, I figured I would go ahead and 
> volunteer myself to switch the Spark repo as well.
> I will have a patch to switch to the new github labeler open within a few 
> days.
>  
> Also thank you [~blue] (or [~holden]) for shepherding this! I didn't exactly 
> ask, but it was understood in our group meeting for Iceberg that I'd be 
> converting our labeler there so I figured I'd tackle the spark issue while 
> I'm getting my hands into the labeling configs anyway =)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33282) Replace Probot Autolabeler with Github Action

2020-11-04 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-33282:


Assignee: Kyle Bendickson

> Replace Probot Autolabeler with Github Action
> -
>
> Key: SPARK-33282
> URL: https://issues.apache.org/jira/browse/SPARK-33282
> Project: Spark
>  Issue Type: Task
>  Components: Project Infra
>Affects Versions: 3.0.1
>Reporter: Kyle Bendickson
>Assignee: Kyle Bendickson
>Priority: Major
>
> The Probot Autolabeler that we were using in both the Iceberg and the Spark 
> repo is no longer working. I've confirmed that with the developer, github user
> [at]mithro, who has indicated that the Probot Autolabeler is end of life and 
> will not be maintained moving forward.
> PRs have not been labeled for a few weeks now.
>  
> As I'm already interfacing with ASF Infra to have the probot permissions 
> revoked from the Iceberg repo, and I've already submitted a patch to switch 
> Iceberg to the standard github labeler action, I figured I would go ahead and 
> volunteer myself to switch the Spark repo as well.
> I will have a patch to switch to the new github labeler open within a few 
> days.
>  
> Also thank you [~blue] (or [~holden]) for shepherding this! I didn't exactly 
> ask, but it was understood in our group meeting for Iceberg that I'd be 
> converting our labeler there so I figured I'd tackle the spark issue while 
> I'm getting my hands into the labeling configs anyway =)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-33331) Limit the number of pending blocks in memory and store blocks that collide

2020-11-04 Thread wuyi (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17226531#comment-17226531
 ] 

wuyi edited comment on SPARK-33331 at 11/5/20, 7:09 AM:


I like the idea to cache the blocks of the worst case instead of throwing them
away, as long as we have the memory threshold (either memory size or block
number). And we can always fall back to the original way whenever they set the
threshold to 0.

 

Another problem may be, when should the client retry the block after we have 
the memory cache? Shall we retry it immediately or wait for a few seconds 
regarding the number of deferred blocks?


was (Author: ngone51):
I like the idea to cache the blocks of the worst case instead of throwing them
away, as long as we have the memory threshold (either memory size or block
number). And users actually can fall back to the original way whenever they set
the threshold to 0.

 

Another problem may be, when should the client retry the block after we have 
the memory cache? Shall we retry it immediately or wait for a few seconds 
regarding the number of deferred blocks?

> Limit the number of pending blocks in memory and store blocks that collide
> --
>
> Key: SPARK-33331
> URL: https://issues.apache.org/jira/browse/SPARK-33331
> Project: Spark
>  Issue Type: Sub-task
>  Components: Shuffle
>Affects Versions: 3.1.0
>Reporter: Chandni Singh
>Priority: Major
>
> This jira addresses the below two points:
>  1. In {{RemoteBlockPushResolver}}, bytes that cannot be merged immediately 
> are stored in memory. The stream callback maintains a list of 
> {{deferredBufs}}. When a block cannot be merged it is added to this list. 
> Currently, there isn't a limit on the number of pending blocks. We can limit 
> the number of pending blocks in memory. There has been a discussion around 
> this here:
> [https://github.com/apache/spark/pull/30062#discussion_r514026014]
> 2. When a stream doesn't get an opportunity to merge, then 
> {{RemoteBlockPushResolver}} ignores the data from that stream. Another 
> approach is to store the data of the stream in {{AppShufflePartitionInfo}} 
> when it reaches the worst-case scenario. This may increase the memory usage 
> of the shuffle service though. However, given the limit introduced in point 1, we
> can try this out.
>  More information can be found in this discussion:
>  [https://github.com/apache/spark/pull/30062#discussion_r517524546]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-33331) Limit the number of pending blocks in memory and store blocks that collide

2020-11-04 Thread wuyi (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17226531#comment-17226531
 ] 

wuyi edited comment on SPARK-33331 at 11/5/20, 6:59 AM:


I like the idea to cache the blocks of the worst case instead of throwing them
away, as long as we have the memory threshold (either memory size or block
number). And users actually can fall back to the original way whenever they set
the threshold to 0.

 

Another problem may be, when should the client retry the block after we have 
the memory cache? Shall we retry it immediately or wait for a few seconds 
regarding the number of deferred blocks?


was (Author: ngone51):
I like the idea to cache the blocks of the worst case instead of throwing them
away, as long as we have the memory threshold (either memory size or block
number). And users actually can fall back to the original way whenever they set
the threshold to 0.

 

Another problem may be, when should the client retry the block after we have 
the memory cache? Shall we retry it immediately or wait a little bit regarding 
the number of deferred blocks?

> Limit the number of pending blocks in memory and store blocks that collide
> --
>
> Key: SPARK-33331
> URL: https://issues.apache.org/jira/browse/SPARK-33331
> Project: Spark
>  Issue Type: Sub-task
>  Components: Shuffle
>Affects Versions: 3.1.0
>Reporter: Chandni Singh
>Priority: Major
>
> This jira addresses the below two points:
>  1. In {{RemoteBlockPushResolver}}, bytes that cannot be merged immediately 
> are stored in memory. The stream callback maintains a list of 
> {{deferredBufs}}. When a block cannot be merged it is added to this list. 
> Currently, there isn't a limit on the number of pending blocks. We can limit 
> the number of pending blocks in memory. There has been a discussion around 
> this here:
> [https://github.com/apache/spark/pull/30062#discussion_r514026014]
> 2. When a stream doesn't get an opportunity to merge, then 
> {{RemoteBlockPushResolver}} ignores the data from that stream. Another 
> approach is to store the data of the stream in {{AppShufflePartitionInfo}} 
> when it reaches the worst-case scenario. This may increase the memory usage 
> of the shuffle service though. However, given the limit introduced in point 1, we
> can try this out.
>  More information can be found in this discussion:
>  [https://github.com/apache/spark/pull/30062#discussion_r517524546]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33331) Limit the number of pending blocks in memory and store blocks that collide

2020-11-04 Thread wuyi (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17226531#comment-17226531
 ] 

wuyi commented on SPARK-33331:
--

I like the idea to cache the blocks of the worst case instead of throwing them
away, as long as we have the memory threshold (either memory size or block
number). And users actually can fall back to the original way whenever they set
the threshold to 0.

 

Another problem may be, when should the client retry the block after we have 
the memory cache? Shall we retry it immediately or wait a little bit regarding 
the number of deferred blocks?

> Limit the number of pending blocks in memory and store blocks that collide
> --
>
> Key: SPARK-33331
> URL: https://issues.apache.org/jira/browse/SPARK-33331
> Project: Spark
>  Issue Type: Sub-task
>  Components: Shuffle
>Affects Versions: 3.1.0
>Reporter: Chandni Singh
>Priority: Major
>
> This jira addresses the below two points:
>  1. In {{RemoteBlockPushResolver}}, bytes that cannot be merged immediately 
> are stored in memory. The stream callback maintains a list of 
> {{deferredBufs}}. When a block cannot be merged it is added to this list. 
> Currently, there isn't a limit on the number of pending blocks. We can limit 
> the number of pending blocks in memory. There has been a discussion around 
> this here:
> [https://github.com/apache/spark/pull/30062#discussion_r514026014]
> 2. When a stream doesn't get an opportunity to merge, then 
> {{RemoteBlockPushResolver}} ignores the data from that stream. Another 
> approach is to store the data of the stream in {{AppShufflePartitionInfo}} 
> when it reaches the worst-case scenario. This may increase the memory usage 
> of the shuffle service though. However, given the limit introduced in point 1, we
> can try this out.
>  More information can be found in this discussion:
>  [https://github.com/apache/spark/pull/30062#discussion_r517524546]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33239) Use pre-built image at GitHub Action SparkR job

2020-11-04 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-33239:
--
Fix Version/s: 3.0.2

> Use pre-built image at GitHub Action SparkR job
> ---
>
> Key: SPARK-33239
> URL: https://issues.apache.org/jira/browse/SPARK-33239
> Project: Spark
>  Issue Type: Improvement
>  Components: Project Infra
>Affects Versions: 3.1.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
> Fix For: 3.0.2, 3.1.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33353) Cache dependencies for Coursier with new sbt in GitHub Actions

2020-11-04 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17226494#comment-17226494
 ] 

Apache Spark commented on SPARK-33353:
--

User 'sarutak' has created a pull request for this issue:
https://github.com/apache/spark/pull/30259

> Cache dependencies for Coursier with new sbt in GitHub Actions
> --
>
> Key: SPARK-33353
> URL: https://issues.apache.org/jira/browse/SPARK-33353
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 3.1.0
>Reporter: Kousuke Saruta
>Assignee: Kousuke Saruta
>Priority: Minor
>
> SPARK-33226 upgraded sbt to 1.4.1.
> As of 1.3.0, sbt uses Coursier as the dependency resolver / fetcher.
> So let's change the dependency cache configuration for the GitHub Actions job.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33353) Cache dependencies for Coursier with new sbt in GitHub Actions

2020-11-04 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33353:


Assignee: Kousuke Saruta  (was: Apache Spark)

> Cache dependencies for Coursier with new sbt in GitHub Actions
> --
>
> Key: SPARK-33353
> URL: https://issues.apache.org/jira/browse/SPARK-33353
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 3.1.0
>Reporter: Kousuke Saruta
>Assignee: Kousuke Saruta
>Priority: Minor
>
> SPARK-33226 upgraded sbt to 1.4.1.
> As of 1.3.0, sbt uses Coursier as the dependency resolver / fetcher.
> So let's change the dependency cache configuration for the GitHub Actions job.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33353) Cache dependencies for Coursier with new sbt in GitHub Actions

2020-11-04 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33353:


Assignee: Apache Spark  (was: Kousuke Saruta)

> Cache dependencies for Coursier with new sbt in GitHub Actions
> --
>
> Key: SPARK-33353
> URL: https://issues.apache.org/jira/browse/SPARK-33353
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 3.1.0
>Reporter: Kousuke Saruta
>Assignee: Apache Spark
>Priority: Minor
>
> SPARK-33226 upgraded sbt to 1.4.1.
> As of 1.3.0, sbt uses Coursier as the dependency resolver / fetcher.
> So let's change the dependency cache configuration for the GitHub Actions job.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-33353) Cache dependencies for Coursier with new sbt in GitHub Actions

2020-11-04 Thread Kousuke Saruta (Jira)
Kousuke Saruta created SPARK-33353:
--

 Summary: Cache dependencies for Coursier with new sbt in GitHub 
Actions
 Key: SPARK-33353
 URL: https://issues.apache.org/jira/browse/SPARK-33353
 Project: Spark
  Issue Type: Bug
  Components: Build
Affects Versions: 3.1.0
Reporter: Kousuke Saruta
Assignee: Kousuke Saruta


SPARK-33226 upgraded sbt to 1.4.1.
As of 1.3.0, sbt uses Coursier as the dependency resolver / fetcher.
So let's change the dependency cache configuration for the GitHub Actions job.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33239) Use pre-built image at GitHub Action SparkR job

2020-11-04 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17226490#comment-17226490
 ] 

Apache Spark commented on SPARK-33239:
--

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/30258

> Use pre-built image at GitHub Action SparkR job
> ---
>
> Key: SPARK-33239
> URL: https://issues.apache.org/jira/browse/SPARK-33239
> Project: Spark
>  Issue Type: Improvement
>  Components: Project Infra
>Affects Versions: 3.1.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
> Fix For: 3.1.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33316) Support nullable Avro schemas for non-nullable data in Avro writing

2020-11-04 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang reassigned SPARK-33316:
--

Assignee: Bo Zhang

> Support nullable Avro schemas for non-nullable data in Avro writing
> ---
>
> Key: SPARK-33316
> URL: https://issues.apache.org/jira/browse/SPARK-33316
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.0, 3.0.0, 3.0.1
>Reporter: Bo Zhang
>Assignee: Bo Zhang
>Priority: Major
> Fix For: 3.1.0
>
>
> Currently, when users try to use nullable Avro schemas for non-nullable data
> in Avro writing, Spark will throw an IncompatibleSchemaException.
> There are some cases when users do not have full control over the nullability 
> of the data, or the nullability of the Avro schemas they have to use. We 
> should support nullable Avro schemas for non-nullable data in Avro writing 
> for better usability.
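
For context, a minimal sketch of the scenario (the schema string and output path below are made up for illustration): the {{id}} column produced by {{range}} is non-nullable, while the user-supplied Avro schema declares it nullable, which is the combination described above:

{code:java}
// Hedged sketch; "/tmp/avro_nullable_out" is an arbitrary path.
val nullableAvroSchema =
  """{"type":"record","name":"topLevelRecord",
    |"fields":[{"name":"id","type":["long","null"]}]}""".stripMargin

spark.range(3).toDF("id")                    // non-nullable bigint column
  .write.format("avro")
  .option("avroSchema", nullableAvroSchema)  // nullable Avro schema for non-nullable data
  .save("/tmp/avro_nullable_out")
{code}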



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33162) Use pre-built image at GitHub Action PySpark jobs

2020-11-04 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-33162:
--
Fix Version/s: 3.0.2

> Use pre-built image at GitHub Action PySpark jobs
> -
>
> Key: SPARK-33162
> URL: https://issues.apache.org/jira/browse/SPARK-33162
> Project: Spark
>  Issue Type: Improvement
>  Components: Project Infra
>Affects Versions: 3.1.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
> Fix For: 3.0.2, 3.1.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-33316) Support nullable Avro schemas for non-nullable data in Avro writing

2020-11-04 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang resolved SPARK-33316.

Fix Version/s: 3.1.0
   Resolution: Fixed

Issue resolved by pull request 30224
[https://github.com/apache/spark/pull/30224]

> Support nullable Avro schemas for non-nullable data in Avro writing
> ---
>
> Key: SPARK-33316
> URL: https://issues.apache.org/jira/browse/SPARK-33316
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.0, 3.0.0, 3.0.1
>Reporter: Bo Zhang
>Priority: Major
> Fix For: 3.1.0
>
>
> Currently, when users try to use nullable Avro schemas for non-nullable data
> in Avro writing, Spark will throw an IncompatibleSchemaException.
> There are some cases when users do not have full control over the nullability 
> of the data, or the nullability of the Avro schemas they have to use. We 
> should support nullable Avro schemas for non-nullable data in Avro writing 
> for better usability.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33351) WithColumn should add a column with specific position

2020-11-04 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17226482#comment-17226482
 ] 

Apache Spark commented on SPARK-33351:
--

User 'Karl-WangSK' has created a pull request for this issue:
https://github.com/apache/spark/pull/30257

> WithColumn should add a column with specific position
> -
>
> Key: SPARK-33351
> URL: https://issues.apache.org/jira/browse/SPARK-33351
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: karl wang
>Priority: Major
>
> In `DataSet`, `withColumn` usually adds a new column at the end of the DataFrame.
> But sometimes users want to add a new column at a specific position.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33351) WithColumn should add a column with specific position

2020-11-04 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33351:


Assignee: Apache Spark

> WithColumn should add a column with specific position
> -
>
> Key: SPARK-33351
> URL: https://issues.apache.org/jira/browse/SPARK-33351
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: karl wang
>Assignee: Apache Spark
>Priority: Major
>
> In `DataSet`, `withColumn` usually adds a new column at the end of the DataFrame.
> But sometimes users want to add a new column at a specific position.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33351) WithColumn should add a column with specific position

2020-11-04 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17226483#comment-17226483
 ] 

Apache Spark commented on SPARK-33351:
--

User 'Karl-WangSK' has created a pull request for this issue:
https://github.com/apache/spark/pull/30257

> WithColumn should add a column with specific position
> -
>
> Key: SPARK-33351
> URL: https://issues.apache.org/jira/browse/SPARK-33351
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: karl wang
>Priority: Major
>
> In `DataSet`, `withColumn` usually adds a new column at the end of the DataFrame.
> But sometimes users want to add a new column at a specific position.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33351) WithColumn should add a column with specific position

2020-11-04 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33351:


Assignee: (was: Apache Spark)

> WithColumn should add a column with specific position
> -
>
> Key: SPARK-33351
> URL: https://issues.apache.org/jira/browse/SPARK-33351
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: karl wang
>Priority: Major
>
> In `DataSet`, `withColumn` usually adds a new column at the end of the DataFrame.
> But sometimes users want to add a new column at a specific position.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33290) REFRESH TABLE should invalidate cache even though the table itself may not be cached

2020-11-04 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17226476#comment-17226476
 ] 

Apache Spark commented on SPARK-33290:
--

User 'sunchao' has created a pull request for this issue:
https://github.com/apache/spark/pull/30256

> REFRESH TABLE should invalidate cache even though the table itself may not be 
> cached
> 
>
> Key: SPARK-33290
> URL: https://issues.apache.org/jira/browse/SPARK-33290
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.7, 3.0.1
>Reporter: Chao Sun
>Assignee: Chao Sun
>Priority: Major
>  Labels: correctness
> Fix For: 2.4.8, 3.0.2, 3.1.0
>
>
> For the following example:
> {code}
> CREATE TABLE t ...;
> CREATE VIEW t1 AS SELECT * FROM t;
> REFRESH TABLE t
> {code}
> If t is cached, t1 will be invalidated. However, if t is not cached as above,
> the REFRESH command won't invalidate view t1. This could lead to incorrect
> results if the view is used later.
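
To make the consequence concrete, a hedged continuation of the example above (same table and view names; the comment describes the behavior before this fix):

{code:java}
// Sketch only: t is NOT cached, but the view t1 is.
spark.sql("CACHE TABLE t1")
// ... the data files of table t are replaced outside this session ...
spark.sql("REFRESH TABLE t")
spark.table("t1").show() // before the fix, this may still serve t1's stale cached rows
{code}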



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33290) REFRESH TABLE should invalidate cache even though the table itself may not be cached

2020-11-04 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17226475#comment-17226475
 ] 

Apache Spark commented on SPARK-33290:
--

User 'sunchao' has created a pull request for this issue:
https://github.com/apache/spark/pull/30256

> REFRESH TABLE should invalidate cache even though the table itself may not be 
> cached
> 
>
> Key: SPARK-33290
> URL: https://issues.apache.org/jira/browse/SPARK-33290
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.7, 3.0.1
>Reporter: Chao Sun
>Assignee: Chao Sun
>Priority: Major
>  Labels: correctness
> Fix For: 2.4.8, 3.0.2, 3.1.0
>
>
> For the following example:
> {code}
> CREATE TABLE t ...;
> CREATE VIEW t1 AS SELECT * FROM t;
> REFRESH TABLE t
> {code}
> If t is cached, t1 will be invalidated. However, if t is not cached as above,
> the REFRESH command won't invalidate view t1. This could lead to incorrect
> results if the view is used later.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33352) Fix procedure-like declaration compilation warning in Scala 2.13

2020-11-04 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17226473#comment-17226473
 ] 

Apache Spark commented on SPARK-33352:
--

User 'LuciferYang' has created a pull request for this issue:
https://github.com/apache/spark/pull/30255

> Fix procedure-like declaration compilation warning in Scala 2.13
> 
>
> Key: SPARK-33352
> URL: https://issues.apache.org/jira/browse/SPARK-33352
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build
>Affects Versions: 3.1.0
>Reporter: Yang Jie
>Priority: Minor
>
> Similar to spark-29291, just to track Spark 3.1.0.
> There are two similar compilation warnings about procedure-like declaration 
> in Scala 2.13.3:
>  
> {code:java}
> [WARNING] [Warn] 
> /spark/core/src/main/scala/org/apache/spark/HeartbeatReceiver.scala:70: 
> procedure syntax is deprecated for constructors: add `=`, as in method 
> definition
> [WARNING] [Warn] 
> /spark/core/src/main/scala/org/apache/spark/storage/BlockManagerDecommissioner.scala:211:
>  procedure syntax is deprecated: instead, add `: Unit =` to explicitly 
> declare `run`'s return type
> {code}
>  
> For constructors, the definition should be `this(...) = \{ }` rather than
> `this(...) \{ }`; for methods without a declared return type, the definition
> should be `def methodName(...): Unit = {}` rather than `def methodName(...) {}`.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33352) Fix procedure-like declaration compilation warning in Scala 2.13

2020-11-04 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17226472#comment-17226472
 ] 

Apache Spark commented on SPARK-33352:
--

User 'LuciferYang' has created a pull request for this issue:
https://github.com/apache/spark/pull/30255

> Fix procedure-like declaration compilation warning in Scala 2.13
> 
>
> Key: SPARK-33352
> URL: https://issues.apache.org/jira/browse/SPARK-33352
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build
>Affects Versions: 3.1.0
>Reporter: Yang Jie
>Priority: Minor
>
> Similar to spark-29291, just to track Spark 3.1.0.
> There are two similar compilation warnings about procedure-like declaration 
> in Scala 2.13.3:
>  
> {code:java}
> [WARNING] [Warn] 
> /spark/core/src/main/scala/org/apache/spark/HeartbeatReceiver.scala:70: 
> procedure syntax is deprecated for constructors: add `=`, as in method 
> definition
> [WARNING] [Warn] 
> /spark/core/src/main/scala/org/apache/spark/storage/BlockManagerDecommissioner.scala:211:
>  procedure syntax is deprecated: instead, add `: Unit =` to explicitly 
> declare `run`'s return type
> {code}
>  
> For constructors, the definition should be `this(...) = \{ }` rather than
> `this(...) \{ }`; for methods without a declared return type, the definition
> should be `def methodName(...): Unit = {}` rather than `def methodName(...) {}`.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33352) Fix procedure-like declaration compilation warning in Scala 2.13

2020-11-04 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33352:


Assignee: Apache Spark

> Fix procedure-like declaration compilation warning in Scala 2.13
> 
>
> Key: SPARK-33352
> URL: https://issues.apache.org/jira/browse/SPARK-33352
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build
>Affects Versions: 3.1.0
>Reporter: Yang Jie
>Assignee: Apache Spark
>Priority: Minor
>
> Similar to spark-29291, just to track Spark 3.1.0.
> There are two similar compilation warnings about procedure-like declaration 
> in Scala 2.13.3:
>  
> {code:java}
> [WARNING] [Warn] 
> /spark/core/src/main/scala/org/apache/spark/HeartbeatReceiver.scala:70: 
> procedure syntax is deprecated for constructors: add `=`, as in method 
> definition
> [WARNING] [Warn] 
> /spark/core/src/main/scala/org/apache/spark/storage/BlockManagerDecommissioner.scala:211:
>  procedure syntax is deprecated: instead, add `: Unit =` to explicitly 
> declare `run`'s return type
> {code}
>  
> For constructors, the definition should be `this(...) = \{ }` rather than
> `this(...) \{ }`; for methods without a declared return type, the definition
> should be `def methodName(...): Unit = {}` rather than `def methodName(...) {}`.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33352) Fix procedure-like declaration compilation warning in Scala 2.13

2020-11-04 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33352:


Assignee: (was: Apache Spark)

> Fix procedure-like declaration compilation warning in Scala 2.13
> 
>
> Key: SPARK-33352
> URL: https://issues.apache.org/jira/browse/SPARK-33352
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build
>Affects Versions: 3.1.0
>Reporter: Yang Jie
>Priority: Minor
>
> Similar to spark-29291, just to track Spark 3.1.0.
> There are two similar compilation warnings about procedure-like declaration 
> in Scala 2.13.3:
>  
> {code:java}
> [WARNING] [Warn] 
> /spark/core/src/main/scala/org/apache/spark/HeartbeatReceiver.scala:70: 
> procedure syntax is deprecated for constructors: add `=`, as in method 
> definition
> [WARNING] [Warn] 
> /spark/core/src/main/scala/org/apache/spark/storage/BlockManagerDecommissioner.scala:211:
>  procedure syntax is deprecated: instead, add `: Unit =` to explicitly 
> declare `run`'s return type
> {code}
>  
> For constructors, the definition should be `this(...) = \{ }` rather than
> `this(...) \{ }`; for methods without a declared return type, the definition
> should be `def methodName(...): Unit = {}` rather than `def methodName(...) {}`.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-33314) Avro reader drops rows

2020-11-04 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-33314.
--
Fix Version/s: 3.1.0
   Resolution: Fixed

Issue resolved by pull request 30221
[https://github.com/apache/spark/pull/30221]

> Avro reader drops rows
> --
>
> Key: SPARK-33314
> URL: https://issues.apache.org/jira/browse/SPARK-33314
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Bruce Robbins
>Assignee: Bruce Robbins
>Priority: Blocker
>  Labels: correctness
> Fix For: 3.1.0
>
>
> Under certain circumstances, the V1 Avro reader drops rows. For example:
> {noformat}
> scala> val df = spark.range(0, 25).toDF("index")
> df: org.apache.spark.sql.DataFrame = [index: bigint]
> scala> df.write.mode("overwrite").format("avro").save("index_avro")
> scala> val loaded = spark.read.format("avro").load("index_avro")
> loaded: org.apache.spark.sql.DataFrame = [index: bigint]
> scala> loaded.collect.size
> res1: Int = 25
> scala> loaded.orderBy("index").collect.size
> res2: Int = 17   <== expected 25
> scala> 
> loaded.orderBy("index").write.mode("overwrite").format("parquet").save("index_as_parquet")
> scala> spark.read.parquet("index_as_parquet").count
> res4: Long = 17
> scala>
> {noformat}
> SPARK-32346 slightly refactored the AvroFileFormat and 
> AvroPartitionReaderFactory to use a new iterator-like trait called 
> AvroUtils#RowReader. RowReader#hasNextRow consumes a raw input record and 
> stores the deserialized row for the next call to RowReader#nextRow. 
> Unfortunately, sometimes hasNextRow is called twice before nextRow is called, 
> resulting in a lost row (see 
> [BypassMergeSortShuffleWriter#write|https://github.com/apache/spark/blob/69c27f49acf2fe6fbc8335bde2aac4afd4188678/core/src/main/java/org/apache/spark/shuffle/sort/BypassMergeSortShuffleWriter.java#L132],
>  which calls records.hasNext once before calling it again 
> [here|https://github.com/apache/spark/blob/69c27f49acf2fe6fbc8335bde2aac4afd4188678/core/src/main/java/org/apache/spark/shuffle/sort/BypassMergeSortShuffleWriter.java#L155]).
> RowReader consumes the Avro record in hasNextRow, rather than nextRow, 
> because AvroDeserializer#deserialize potentially filters out the record.
> Two possible fixes that I thought of:
> 1) keep state in RowReader such that multiple calls to RowReader#hasNextRow 
> with no intervening call to RowReader#nextRow avoids consuming more than 1 
> Avro record. This requires no changes to any code that extends RowReader, 
> just RowReader itself.
>  2) Move record consumption to RowReader#nextRow (such that RowReader#nextRow 
> could potentially return None) and wrap any iterator that extends RowReader 
> with a new iterator created by flatMap. This last iterator will filter out 
> the Nones and extract rows from the Somes. This requires changes to 
> AvroFileFormat and AvroPartitionReaderFactory as well as RowReader.
> The first one seems simplest and most straightforward, and doesn't require
> changes to AvroFileFormat and AvroPartitionReaderFactory, only to 
> AvroUtils#RowReader. So I propose this.
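
A hedged sketch of option 1 (names are illustrative, not the actual {{AvroUtils#RowReader}} code): hasNextRow only consumes a new record when no row is already staged, so repeated calls without an intervening nextRow no longer lose rows:

{code:java}
// Illustrative only: a staging slot makes hasNextRow idempotent.
trait StagedRowReader[T] {
  private var staged: Option[T] = None

  /** Consumes raw records until one passes the filter, or returns None at EOF. */
  protected def readNextRow(): Option[T]

  def hasNextRow: Boolean = {
    if (staged.isEmpty) staged = readNextRow()
    staged.isDefined
  }

  def nextRow(): T = {
    if (!hasNextRow) throw new NoSuchElementException("no more rows")
    val row = staged.get
    staged = None
    row
  }
}
{code}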



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33314) Avro reader drops rows

2020-11-04 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-33314:


Assignee: Bruce Robbins

> Avro reader drops rows
> --
>
> Key: SPARK-33314
> URL: https://issues.apache.org/jira/browse/SPARK-33314
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Bruce Robbins
>Assignee: Bruce Robbins
>Priority: Blocker
>  Labels: correctness
>
> Under certain circumstances, the V1 Avro reader drops rows. For example:
> {noformat}
> scala> val df = spark.range(0, 25).toDF("index")
> df: org.apache.spark.sql.DataFrame = [index: bigint]
> scala> df.write.mode("overwrite").format("avro").save("index_avro")
> scala> val loaded = spark.read.format("avro").load("index_avro")
> loaded: org.apache.spark.sql.DataFrame = [index: bigint]
> scala> loaded.collect.size
> res1: Int = 25
> scala> loaded.orderBy("index").collect.size
> res2: Int = 17   <== expected 25
> scala> 
> loaded.orderBy("index").write.mode("overwrite").format("parquet").save("index_as_parquet")
> scala> spark.read.parquet("index_as_parquet").count
> res4: Long = 17
> scala>
> {noformat}
> SPARK-32346 slightly refactored the AvroFileFormat and 
> AvroPartitionReaderFactory to use a new iterator-like trait called 
> AvroUtils#RowReader. RowReader#hasNextRow consumes a raw input record and 
> stores the deserialized row for the next call to RowReader#nextRow. 
> Unfortunately, sometimes hasNextRow is called twice before nextRow is called, 
> resulting in a lost row (see 
> [BypassMergeSortShuffleWriter#write|https://github.com/apache/spark/blob/69c27f49acf2fe6fbc8335bde2aac4afd4188678/core/src/main/java/org/apache/spark/shuffle/sort/BypassMergeSortShuffleWriter.java#L132],
>  which calls records.hasNext once before calling it again 
> [here|https://github.com/apache/spark/blob/69c27f49acf2fe6fbc8335bde2aac4afd4188678/core/src/main/java/org/apache/spark/shuffle/sort/BypassMergeSortShuffleWriter.java#L155]).
> RowReader consumes the Avro record in hasNextRow, rather than nextRow, 
> because AvroDeserializer#deserialize potentially filters out the record.
> Two possible fixes that I thought of:
> 1) keep state in RowReader such that multiple calls to RowReader#hasNextRow 
> with no intervening call to RowReader#nextRow avoids consuming more than 1 
> Avro record. This requires no changes to any code that extends RowReader, 
> just RowReader itself.
>  2) Move record consumption to RowReader#nextRow (such that RowReader#nextRow 
> could potentially return None) and wrap any iterator that extends RowReader 
> with a new iterator created by flatMap. This last iterator will filter out 
> the Nones and extract rows from the Somes. This requires changes to 
> AvroFileFormat and AvroPartitionReaderFactory as well as RowReader.
> The first one seems simplest and most straightforward, and doesn't require
> changes to AvroFileFormat and AvroPartitionReaderFactory, only to 
> AvroUtils#RowReader. So I propose this.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-33352) Fix procedure-like declaration compilation warning in Scala 2.13

2020-11-04 Thread Yang Jie (Jira)
Yang Jie created SPARK-33352:


 Summary: Fix procedure-like declaration compilation warning in 
Scala 2.13
 Key: SPARK-33352
 URL: https://issues.apache.org/jira/browse/SPARK-33352
 Project: Spark
  Issue Type: Sub-task
  Components: Build
Affects Versions: 3.1.0
Reporter: Yang Jie


Similar to spark-29291, just to track Spark 3.1.0.

There are two similar compilation warnings about procedure-like declaration in 
Scala 2.13.3:

 
{code:java}
[WARNING] [Warn] 
/spark/core/src/main/scala/org/apache/spark/HeartbeatReceiver.scala:70: 
procedure syntax is deprecated for constructors: add `=`, as in method 
definition

[WARNING] [Warn] 
/spark/core/src/main/scala/org/apache/spark/storage/BlockManagerDecommissioner.scala:211:
 procedure syntax is deprecated: instead, add `: Unit =` to explicitly declare 
`run`'s return type
{code}
 

For constructors, the definition should be `this(...) = \{ }` rather than
`this(...) \{ }`; for methods without a declared return type, the definition
should be `def methodName(...): Unit = {}` rather than `def methodName(...) {}`.
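
A minimal before/after illustration of both warnings (class and member names are made up):

{code:java}
class Worker(name: String) {
  // Deprecated procedure syntax:  def this(name: String, id: Int) { this(name) }
  // Scala 2.13-friendly form:     add `=`
  def this(name: String, id: Int) = { this(name) }

  // Deprecated procedure syntax:  def run() { println(name) }
  // Scala 2.13-friendly form:     add `: Unit =`
  def run(): Unit = { println(name) }
}
{code}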



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33351) WithColumn should add a column with specific position

2020-11-04 Thread karl wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

karl wang updated SPARK-33351:
--
Description: 
In `Dataset`, withColumn usually adds the new column at the end of the DataFrame.

But sometimes users want to add the new column at a specific position.
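
Today the usual workaround is to add the column with withColumn and then re-select the columns in the desired order; a minimal sketch (the column names and data are illustrative):

{code:scala}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.lit

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

val df = Seq((1, "a", 10), (2, "b", 20)).toDF("id", "name", "age")

// withColumn appends "flag" at the end; re-select to place it after "id".
val withFlag = df.withColumn("flag", lit(true))
val reordered = withFlag.select("id", "flag", "name", "age")
reordered.printSchema() // id, flag, name, age
{code}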

> WithColumn should add a column with specific position
> -
>
> Key: SPARK-33351
> URL: https://issues.apache.org/jira/browse/SPARK-33351
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: karl wang
>Priority: Major
>
> In `Dataset`, withColumn usually adds the new column at the end of the DataFrame.
> But sometimes users want to add the new column at a specific position.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-33351) WithColumn should add a column with specific position

2020-11-04 Thread karl wang (Jira)
karl wang created SPARK-33351:
-

 Summary: WithColumn should add a column with specific position
 Key: SPARK-33351
 URL: https://issues.apache.org/jira/browse/SPARK-33351
 Project: Spark
  Issue Type: New Feature
  Components: SQL
Affects Versions: 3.1.0
Reporter: karl wang






--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33350) Add support to DiskBlockManager to create merge directory and to get the local shuffle merged data

2020-11-04 Thread Chandni Singh (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chandni Singh updated SPARK-33350:
--
Summary: Add support to DiskBlockManager to create merge directory and to 
get the local shuffle merged data  (was: Add support to DiskBlockManager to 
create merge directory and the ability to get the shuffle merged data)

> Add support to DiskBlockManager to create merge directory and to get the 
> local shuffle merged data
> --
>
> Key: SPARK-33350
> URL: https://issues.apache.org/jira/browse/SPARK-33350
> Project: Spark
>  Issue Type: Sub-task
>  Components: Shuffle
>Affects Versions: 3.1.0
>Reporter: Chandni Singh
>Priority: Major
>
> DiskBlockManager should be able to create the {{merge_manager}} directory, 
> where the push-based merged shuffle files are written, and also create 
> sub-dirs under it. 
> It should also be able to serve the local merged shuffle data/index/meta 
> files.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-33350) Add support to DiskBlockManager to create merge directory and the ability to get the shuffle merged data

2020-11-04 Thread Chandni Singh (Jira)
Chandni Singh created SPARK-33350:
-

 Summary: Add support to DiskBlockManager to create merge directory 
and the ability to get the shuffle merged data
 Key: SPARK-33350
 URL: https://issues.apache.org/jira/browse/SPARK-33350
 Project: Spark
  Issue Type: Sub-task
  Components: Shuffle
Affects Versions: 3.1.0
Reporter: Chandni Singh


DiskBlockManager should be able to create the {{merge_manager}} directory, 
where the push-based merged shuffle files are written, and also create sub-dirs 
under it. 

It should also be able to serve the local merged shuffle data/index/meta files.
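
Purely as an illustration of that layout (the directory names and sub-dir count are assumptions, not the actual DiskBlockManager code), creating such a structure amounts to:

{code:scala}
import java.io.File

// Create a "merge_manager" directory with hashed sub-directories under each
// configured local dir; merged shuffle data/index/meta files would then live
// inside one of the sub-directories.
val localDirs = Seq(new File("/tmp/spark-local-0"), new File("/tmp/spark-local-1"))
val subDirsPerLocalDir = 64

localDirs.foreach { localDir =>
  val mergeDir = new File(localDir, "merge_manager")
  (0 until subDirsPerLocalDir).foreach { i =>
    new File(mergeDir, "%02x".format(i)).mkdirs()
  }
}
{code}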



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-33343) Fix the build with sbt to copy hadoop-client-runtime.jar

2020-11-04 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-33343.
---
Fix Version/s: 3.1.0
   Resolution: Fixed

Issue resolved by pull request 30250
[https://github.com/apache/spark/pull/30250]

> Fix the build with sbt to copy hadoop-client-runtime.jar
> 
>
> Key: SPARK-33343
> URL: https://issues.apache.org/jira/browse/SPARK-33343
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 3.1.0
>Reporter: Kousuke Saruta
>Assignee: Kousuke Saruta
>Priority: Major
> Fix For: 3.1.0
>
>
> With the current master, spark-shell doesn't work if Spark is built with sbt 
> package.
> This is because hadoop-client-runtime.jar isn't copied to 
> assembly/target/scala-2.12/jars.
> {code}
> $ bin/spark-shell
> Exception in thread "main" java.lang.NoClassDefFoundError: 
> org/apache/hadoop/shaded/com/ctc/wstx/io/InputBootstrapper
>   at 
> org.apache.spark.deploy.SparkHadoopUtil$.newConfiguration(SparkHadoopUtil.scala:426)
>   at 
> org.apache.spark.deploy.SparkSubmit.$anonfun$prepareSubmitEnvironment$2(SparkSubmit.scala:342)
>   at scala.Option.getOrElse(Option.scala:189)
>   at 
> org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:342)
>   at 
> org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:877)
>   at 
> org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
>   at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
>   at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
>   at 
> org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1013)
>   at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1022)
>   at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> Caused by: java.lang.ClassNotFoundException: 
> org.apache.hadoop.shaded.com.ctc.wstx.io.InputBootstrapper
>   at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
>   at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
>   ... 11 more
> {code}
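
The fix itself lives in Spark's sbt build definition; purely to illustrate what "copying the runtime jars into the assembly jars directory" amounts to, a generic sbt task might look like the sketch below (the task name and destination path are assumptions, not the actual change):

{code:scala}
// build.sbt sketch, illustrative only.
lazy val copyRuntimeJars =
  taskKey[Unit]("Copy runtime dependency jars into the assembly jars dir")

copyRuntimeJars := {
  val dest = target.value / "scala-2.12" / "jars"
  IO.createDirectory(dest)
  (Runtime / dependencyClasspath).value
    .map(_.data)                          // Attributed[File] -> File
    .filter(_.getName.endsWith(".jar"))   // includes hadoop-client-runtime.jar
    .foreach(jar => IO.copyFile(jar, dest / jar.getName))
}
{code}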



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-31711) Register the executor source with the metrics system when running in local mode.

2020-11-04 Thread Thomas Graves (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Graves resolved SPARK-31711.
---
Fix Version/s: 3.1.0
   Resolution: Fixed

> Register the executor source with the metrics system when running in local 
> mode.
> 
>
> Key: SPARK-31711
> URL: https://issues.apache.org/jira/browse/SPARK-31711
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.0.0
>Reporter: Luca Canali
>Assignee: Luca Canali
>Priority: Minor
> Fix For: 3.1.0
>
>
> The Apache Spark metrics system provides many useful insights on the Spark 
> workload. In particular, the executor source metrics 
> (https://github.com/apache/spark/blob/master/docs/monitoring.md#component-instance--executor)
>  provide detailed info, including the number of active tasks, some I/O 
> metrics, and task metrics details. Executor source metrics, unlike other 
> sources (for example the ExecutorMetrics source), are not yet available when 
> running in local mode.
> This JIRA proposes to register the executor source with the Spark metrics 
> system when running in local mode, as this can be very useful when testing 
> and troubleshooting Spark workloads.
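
Once the executor source is registered in local mode, its metrics can be surfaced through any configured sink. A minimal sketch using the console sink (the keys follow Spark's spark.metrics.conf.* convention; the polling period is an arbitrary choice):

{code:scala}
import org.apache.spark.sql.SparkSession

// Local-mode session with a console metrics sink; with the executor source
// registered, its gauges (active tasks, I/O, task metrics) show up in the
// periodic console dumps alongside the other sources.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("executor-source-demo")
  .config("spark.metrics.conf.*.sink.console.class",
    "org.apache.spark.metrics.sink.ConsoleSink")
  .config("spark.metrics.conf.*.sink.console.period", "10")
  .config("spark.metrics.conf.*.sink.console.unit", "seconds")
  .getOrCreate()

spark.range(0, 1000000).selectExpr("sum(id)").show()
{code}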



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-31711) Register the executor source with the metrics system when running in local mode.

2020-11-04 Thread Thomas Graves (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Graves reassigned SPARK-31711:
-

Assignee: Luca Canali

> Register the executor source with the metrics system when running in local 
> mode.
> 
>
> Key: SPARK-31711
> URL: https://issues.apache.org/jira/browse/SPARK-31711
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.0.0
>Reporter: Luca Canali
>Assignee: Luca Canali
>Priority: Minor
>
> The Apache Spark metrics system provides many useful insights on the Spark 
> workload. In particular, the executor source metrics 
> (https://github.com/apache/spark/blob/master/docs/monitoring.md#component-instance--executor)
>  provide detailed info, including the number of active tasks, some I/O 
> metrics, and task metrics details. Executor source metrics, unlike other 
> sources (for example the ExecutorMetrics source), are not yet available when 
> running in local mode.
> This JIRA proposes to register the executor source with the Spark metrics 
> system when running in local mode, as this can be very useful when testing 
> and troubleshooting Spark workloads.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33282) Replace Probot Autolabeler with Github Action

2020-11-04 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17226414#comment-17226414
 ] 

Apache Spark commented on SPARK-33282:
--

User 'kbendick' has created a pull request for this issue:
https://github.com/apache/spark/pull/30254

> Replace Probot Autolabeler with Github Action
> -
>
> Key: SPARK-33282
> URL: https://issues.apache.org/jira/browse/SPARK-33282
> Project: Spark
>  Issue Type: Task
>  Components: Project Infra
>Affects Versions: 3.0.1
>Reporter: Kyle Bendickson
>Priority: Major
>
> The Probot Autolabeler that we were using in both the Iceberg and the Spark 
> repos is no longer working. I've confirmed that with the developer, github user 
> [at]mithro, who has indicated that the Probot Autolabeler is end of life and 
> will not be maintained moving forward.
> PRs have not been labeled for a few weeks now.
>  
> As I'm already interfacing with ASF Infra to have the probot permissions 
> revoked from the Iceberg repo, and I've already submitted a patch to switch 
> Iceberg to the standard github labeler action, I figured I would go ahead and 
> volunteer myself to switch the Spark repo as well.
> I will have a patch to switch to the new github labeler open within a few 
> days.
>  
> Also thank you [~blue] (or [~holden]) for shepherding this! I didn't exactly 
> ask, but it was understood in our group meeting for Iceberg that I'd be 
> converting our labeler there so I figured I'd tackle the spark issue while 
> I'm getting my hands into the labeling configs anyway =)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33282) Replace Probot Autolabeler with Github Action

2020-11-04 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17226413#comment-17226413
 ] 

Apache Spark commented on SPARK-33282:
--

User 'kbendick' has created a pull request for this issue:
https://github.com/apache/spark/pull/30254

> Replace Probot Autolabeler with Github Action
> -
>
> Key: SPARK-33282
> URL: https://issues.apache.org/jira/browse/SPARK-33282
> Project: Spark
>  Issue Type: Task
>  Components: Project Infra
>Affects Versions: 3.0.1
>Reporter: Kyle Bendickson
>Priority: Major
>
> The Probot Autolabeler that we were using in both the Iceberg and the Spark 
> repos is no longer working. I've confirmed that with the developer, github user 
> [at]mithro, who has indicated that the Probot Autolabeler is end of life and 
> will not be maintained moving forward.
> PRs have not been labeled for a few weeks now.
>  
> As I'm already interfacing with ASF Infra to have the probot permissions 
> revoked from the Iceberg repo, and I've already submitted a patch to switch 
> Iceberg to the standard github labeler action, I figured I would go ahead and 
> volunteer myself to switch the Spark repo as well.
> I will have a patch to switch to the new github labeler open within a few 
> days.
>  
> Also thank you [~blue] (or [~holden]) for shepherding this! I didn't exactly 
> ask, but it was understood in our group meeting for Iceberg that I'd be 
> converting our labeler there so I figured I'd tackle the spark issue while 
> I'm getting my hands into the labeling configs anyway =)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33162) Use pre-built image at GitHub Action PySpark jobs

2020-11-04 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17226408#comment-17226408
 ] 

Apache Spark commented on SPARK-33162:
--

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/30253

> Use pre-built image at GitHub Action PySpark jobs
> -
>
> Key: SPARK-33162
> URL: https://issues.apache.org/jira/browse/SPARK-33162
> Project: Spark
>  Issue Type: Improvement
>  Components: Project Infra
>Affects Versions: 3.1.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
> Fix For: 3.1.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33162) Use pre-built image at GitHub Action PySpark jobs

2020-11-04 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17226407#comment-17226407
 ] 

Apache Spark commented on SPARK-33162:
--

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/30253

> Use pre-built image at GitHub Action PySpark jobs
> -
>
> Key: SPARK-33162
> URL: https://issues.apache.org/jira/browse/SPARK-33162
> Project: Spark
>  Issue Type: Improvement
>  Components: Project Infra
>Affects Versions: 3.1.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
> Fix For: 3.1.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33349) ExecutorPodsWatchSnapshotSource: Kubernetes client has been closed

2020-11-04 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17226395#comment-17226395
 ] 

Dongjoon Hyun commented on SPARK-33349:
---

I converted this to a subtask of SPARK-33005 to give more visibility.

> ExecutorPodsWatchSnapshotSource: Kubernetes client has been closed
> --
>
> Key: SPARK-33349
> URL: https://issues.apache.org/jira/browse/SPARK-33349
> Project: Spark
>  Issue Type: Sub-task
>  Components: Kubernetes
>Affects Versions: 3.0.1, 3.0.2
>Reporter: Nicola Bova
>Priority: Critical
>
> I launch my spark application with the 
> [spark-on-kubernetes-operator|https://github.com/GoogleCloudPlatform/spark-on-k8s-operator]
>  with the following yaml file:
> {code:yaml}
> apiVersion: sparkoperator.k8s.io/v1beta2
> kind: SparkApplication
> metadata:
>    name: spark-kafka-streamer-test
>    namespace: kafka2hdfs
> spec: 
>    type: Scala
>    mode: cluster
>    image: /spark:3.0.2-SNAPSHOT-2.12-0.1.0
>    imagePullPolicy: Always
>    timeToLiveSeconds: 259200
>    mainClass: path.to.my.class.KafkaStreamer
>    mainApplicationFile: spark-kafka-streamer_2.12-spark300-assembly.jar
>    sparkVersion: 3.0.1
>    restartPolicy:
>  type: Always
>    sparkConf:
>  "spark.kafka.consumer.cache.capacity": "8192"
>  "spark.kubernetes.memoryOverheadFactor": "0.3"
>    deps:
>    jars:
>  - my
>  - jar
>  - list
>    hadoopConfigMap: hdfs-config
>    driver:
>  cores: 4
>  memory: 12g
>  labels:
>    version: 3.0.1
>  serviceAccount: default
>  javaOptions: 
> "-Dlog4j.configuration=file:///opt/spark/log4j/log4j.properties"
>   executor:
>  instances: 4
>     cores: 4
>     memory: 16g
>     labels:
>   version: 3.0.1
>     javaOptions: 
> "-Dlog4j.configuration=file:///opt/spark/log4j/log4j.properties"
> {code}
>  I have tried with both Spark `3.0.1` and `3.0.2-SNAPSHOT` with the ["Restart 
> the watcher when we receive a version changed from 
> k8s"|https://github.com/apache/spark/pull/29533] patch.
> This is the driver log:
> {code}
> 20/11/04 12:16:02 WARN NativeCodeLoader: Unable to load native-hadoop library 
> for your platform... using builtin-java classes where applicable
> ... // my app log, it's a structured streaming app reading from kafka and 
> writing to hdfs
> 20/11/04 13:12:12 WARN ExecutorPodsWatchSnapshotSource: Kubernetes client has 
> been closed (this is expected if the application is shutting down.)
> io.fabric8.kubernetes.client.KubernetesClientException: too old resource 
> version: 1574101276 (1574213896)
>  at 
> io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager$1.onMessage(WatchConnectionManager.java:259)
>  at okhttp3.internal.ws.RealWebSocket.onReadMessage(RealWebSocket.java:323)
>  at 
> okhttp3.internal.ws.WebSocketReader.readMessageFrame(WebSocketReader.java:219)
>  at 
> okhttp3.internal.ws.WebSocketReader.processNextFrame(WebSocketReader.java:105)
>  at okhttp3.internal.ws.RealWebSocket.loopReader(RealWebSocket.java:274)
>  at okhttp3.internal.ws.RealWebSocket$2.onResponse(RealWebSocket.java:214)
>  at okhttp3.RealCall$AsyncCall.execute(RealCall.java:203)
>  at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
>  at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown 
> Source)
>  at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown 
> Source)
>  at java.base/java.lang.Thread.run(Unknown Source)
> {code}
> The error above appears after roughly 50 minutes.
> After the exception above, no more logs are produced and the app hangs.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33349) ExecutorPodsWatchSnapshotSource: Kubernetes client has been closed

2020-11-04 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-33349:
--
Affects Version/s: 3.1.0

> ExecutorPodsWatchSnapshotSource: Kubernetes client has been closed
> --
>
> Key: SPARK-33349
> URL: https://issues.apache.org/jira/browse/SPARK-33349
> Project: Spark
>  Issue Type: Sub-task
>  Components: Kubernetes
>Affects Versions: 3.0.1, 3.0.2, 3.1.0
>Reporter: Nicola Bova
>Priority: Critical
>
> I launch my spark application with the 
> [spark-on-kubernetes-operator|https://github.com/GoogleCloudPlatform/spark-on-k8s-operator]
>  with the following yaml file:
> {code:yaml}
> apiVersion: sparkoperator.k8s.io/v1beta2
> kind: SparkApplication
> metadata:
>    name: spark-kafka-streamer-test
>    namespace: kafka2hdfs
> spec: 
>    type: Scala
>    mode: cluster
>    image: /spark:3.0.2-SNAPSHOT-2.12-0.1.0
>    imagePullPolicy: Always
>    timeToLiveSeconds: 259200
>    mainClass: path.to.my.class.KafkaStreamer
>    mainApplicationFile: spark-kafka-streamer_2.12-spark300-assembly.jar
>    sparkVersion: 3.0.1
>    restartPolicy:
>  type: Always
>    sparkConf:
>  "spark.kafka.consumer.cache.capacity": "8192"
>  "spark.kubernetes.memoryOverheadFactor": "0.3"
>    deps:
>    jars:
>  - my
>  - jar
>  - list
>    hadoopConfigMap: hdfs-config
>    driver:
>  cores: 4
>  memory: 12g
>  labels:
>    version: 3.0.1
>  serviceAccount: default
>  javaOptions: 
> "-Dlog4j.configuration=file:///opt/spark/log4j/log4j.properties"
>   executor:
>  instances: 4
>     cores: 4
>     memory: 16g
>     labels:
>   version: 3.0.1
>     javaOptions: 
> "-Dlog4j.configuration=file:///opt/spark/log4j/log4j.properties"
> {code}
>  I have tried with both Spark `3.0.1` and `3.0.2-SNAPSHOT` with the ["Restart 
> the watcher when we receive a version changed from 
> k8s"|https://github.com/apache/spark/pull/29533] patch.
> This is the driver log:
> {code}
> 20/11/04 12:16:02 WARN NativeCodeLoader: Unable to load native-hadoop library 
> for your platform... using builtin-java classes where applicable
> ... // my app log, it's a structured streaming app reading from kafka and 
> writing to hdfs
> 20/11/04 13:12:12 WARN ExecutorPodsWatchSnapshotSource: Kubernetes client has 
> been closed (this is expected if the application is shutting down.)
> io.fabric8.kubernetes.client.KubernetesClientException: too old resource 
> version: 1574101276 (1574213896)
>  at 
> io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager$1.onMessage(WatchConnectionManager.java:259)
>  at okhttp3.internal.ws.RealWebSocket.onReadMessage(RealWebSocket.java:323)
>  at 
> okhttp3.internal.ws.WebSocketReader.readMessageFrame(WebSocketReader.java:219)
>  at 
> okhttp3.internal.ws.WebSocketReader.processNextFrame(WebSocketReader.java:105)
>  at okhttp3.internal.ws.RealWebSocket.loopReader(RealWebSocket.java:274)
>  at okhttp3.internal.ws.RealWebSocket$2.onResponse(RealWebSocket.java:214)
>  at okhttp3.RealCall$AsyncCall.execute(RealCall.java:203)
>  at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
>  at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown 
> Source)
>  at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown 
> Source)
>  at java.base/java.lang.Thread.run(Unknown Source)
> {code}
> The error above appears after roughly 50 minutes.
> After the exception above, no more logs are produced and the app hangs.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33349) ExecutorPodsWatchSnapshotSource: Kubernetes client has been closed

2020-11-04 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-33349:
--
Parent: SPARK-33005
Issue Type: Sub-task  (was: Bug)

> ExecutorPodsWatchSnapshotSource: Kubernetes client has been closed
> --
>
> Key: SPARK-33349
> URL: https://issues.apache.org/jira/browse/SPARK-33349
> Project: Spark
>  Issue Type: Sub-task
>  Components: Kubernetes
>Affects Versions: 3.0.1, 3.0.2
>Reporter: Nicola Bova
>Priority: Critical
>
> I launch my spark application with the 
> [spark-on-kubernetes-operator|https://github.com/GoogleCloudPlatform/spark-on-k8s-operator]
>  with the following yaml file:
> {code:yaml}
> apiVersion: sparkoperator.k8s.io/v1beta2
> kind: SparkApplication
> metadata:
>    name: spark-kafka-streamer-test
>    namespace: kafka2hdfs
> spec: 
>    type: Scala
>    mode: cluster
>    image: /spark:3.0.2-SNAPSHOT-2.12-0.1.0
>    imagePullPolicy: Always
>    timeToLiveSeconds: 259200
>    mainClass: path.to.my.class.KafkaStreamer
>    mainApplicationFile: spark-kafka-streamer_2.12-spark300-assembly.jar
>    sparkVersion: 3.0.1
>    restartPolicy:
>  type: Always
>    sparkConf:
>  "spark.kafka.consumer.cache.capacity": "8192"
>  "spark.kubernetes.memoryOverheadFactor": "0.3"
>    deps:
>    jars:
>  - my
>  - jar
>  - list
>    hadoopConfigMap: hdfs-config
>    driver:
>  cores: 4
>  memory: 12g
>  labels:
>    version: 3.0.1
>  serviceAccount: default
>  javaOptions: 
> "-Dlog4j.configuration=file:///opt/spark/log4j/log4j.properties"
>   executor:
>  instances: 4
>     cores: 4
>     memory: 16g
>     labels:
>   version: 3.0.1
>     javaOptions: 
> "-Dlog4j.configuration=file:///opt/spark/log4j/log4j.properties"
> {code}
>  I have tried with both Spark `3.0.1` and `3.0.2-SNAPSHOT` with the ["Restart 
> the watcher when we receive a version changed from 
> k8s"|https://github.com/apache/spark/pull/29533] patch.
> This is the driver log:
> {code}
> 20/11/04 12:16:02 WARN NativeCodeLoader: Unable to load native-hadoop library 
> for your platform... using builtin-java classes where applicable
> ... // my app log, it's a structured streaming app reading from kafka and 
> writing to hdfs
> 20/11/04 13:12:12 WARN ExecutorPodsWatchSnapshotSource: Kubernetes client has 
> been closed (this is expected if the application is shutting down.)
> io.fabric8.kubernetes.client.KubernetesClientException: too old resource 
> version: 1574101276 (1574213896)
>  at 
> io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager$1.onMessage(WatchConnectionManager.java:259)
>  at okhttp3.internal.ws.RealWebSocket.onReadMessage(RealWebSocket.java:323)
>  at 
> okhttp3.internal.ws.WebSocketReader.readMessageFrame(WebSocketReader.java:219)
>  at 
> okhttp3.internal.ws.WebSocketReader.processNextFrame(WebSocketReader.java:105)
>  at okhttp3.internal.ws.RealWebSocket.loopReader(RealWebSocket.java:274)
>  at okhttp3.internal.ws.RealWebSocket$2.onResponse(RealWebSocket.java:214)
>  at okhttp3.RealCall$AsyncCall.execute(RealCall.java:203)
>  at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
>  at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown 
> Source)
>  at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown 
> Source)
>  at java.base/java.lang.Thread.run(Unknown Source)
> {code}
> The error above appears after roughly 50 minutes.
> After the exception above, no more logs are produced and the app hangs.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33349) ExecutorPodsWatchSnapshotSource: Kubernetes client has been closed

2020-11-04 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17226393#comment-17226393
 ] 

Dongjoon Hyun commented on SPARK-33349:
---

Thanks, [~jkleckner]. It looks like a breaking change.
{code:java}
Note Minor breaking changes:

- PR #2424 (#2414) slightly changes the API by adding the new WatchAndWaitable 
"combiner" interface.
Most projects shouldn't require any additional changes.
{code}

> ExecutorPodsWatchSnapshotSource: Kubernetes client has been closed
> --
>
> Key: SPARK-33349
> URL: https://issues.apache.org/jira/browse/SPARK-33349
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 3.0.1, 3.0.2
>Reporter: Nicola Bova
>Priority: Critical
>
> I launch my spark application with the 
> [spark-on-kubernetes-operator|https://github.com/GoogleCloudPlatform/spark-on-k8s-operator]
>  with the following yaml file:
> {code:yaml}
> apiVersion: sparkoperator.k8s.io/v1beta2
> kind: SparkApplication
> metadata:
>    name: spark-kafka-streamer-test
>    namespace: kafka2hdfs
> spec: 
>    type: Scala
>    mode: cluster
>    image: /spark:3.0.2-SNAPSHOT-2.12-0.1.0
>    imagePullPolicy: Always
>    timeToLiveSeconds: 259200
>    mainClass: path.to.my.class.KafkaStreamer
>    mainApplicationFile: spark-kafka-streamer_2.12-spark300-assembly.jar
>    sparkVersion: 3.0.1
>    restartPolicy:
>  type: Always
>    sparkConf:
>  "spark.kafka.consumer.cache.capacity": "8192"
>  "spark.kubernetes.memoryOverheadFactor": "0.3"
>    deps:
>    jars:
>  - my
>  - jar
>  - list
>    hadoopConfigMap: hdfs-config
>    driver:
>  cores: 4
>  memory: 12g
>  labels:
>    version: 3.0.1
>  serviceAccount: default
>  javaOptions: 
> "-Dlog4j.configuration=file:///opt/spark/log4j/log4j.properties"
>   executor:
>  instances: 4
>     cores: 4
>     memory: 16g
>     labels:
>   version: 3.0.1
>     javaOptions: 
> "-Dlog4j.configuration=file:///opt/spark/log4j/log4j.properties"
> {code}
>  I have tried with both Spark `3.0.1` and `3.0.2-SNAPSHOT` with the ["Restart 
> the watcher when we receive a version changed from 
> k8s"|https://github.com/apache/spark/pull/29533] patch.
> This is the driver log:
> {code}
> 20/11/04 12:16:02 WARN NativeCodeLoader: Unable to load native-hadoop library 
> for your platform... using builtin-java classes where applicable
> ... // my app log, it's a structured streaming app reading from kafka and 
> writing to hdfs
> 20/11/04 13:12:12 WARN ExecutorPodsWatchSnapshotSource: Kubernetes client has 
> been closed (this is expected if the application is shutting down.)
> io.fabric8.kubernetes.client.KubernetesClientException: too old resource 
> version: 1574101276 (1574213896)
>  at 
> io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager$1.onMessage(WatchConnectionManager.java:259)
>  at okhttp3.internal.ws.RealWebSocket.onReadMessage(RealWebSocket.java:323)
>  at 
> okhttp3.internal.ws.WebSocketReader.readMessageFrame(WebSocketReader.java:219)
>  at 
> okhttp3.internal.ws.WebSocketReader.processNextFrame(WebSocketReader.java:105)
>  at okhttp3.internal.ws.RealWebSocket.loopReader(RealWebSocket.java:274)
>  at okhttp3.internal.ws.RealWebSocket$2.onResponse(RealWebSocket.java:214)
>  at okhttp3.RealCall$AsyncCall.execute(RealCall.java:203)
>  at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
>  at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown 
> Source)
>  at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown 
> Source)
>  at java.base/java.lang.Thread.run(Unknown Source)
> {code}
> The error above appears after roughly 50 minutes.
> After the exception above, no more logs are produced and the app hangs.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33331) Limit the number of pending blocks in memory and store blocks that collide

2020-11-04 Thread Chandni Singh (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-1?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chandni Singh updated SPARK-1:
--
Description: 
This jira addresses the below two points:
 1. In {{RemoteBlockPushResolver}}, bytes that cannot be merged immediately are 
stored in memory. The stream callback maintains a list of {{deferredBufs}}. 
When a block cannot be merged it is added to this list. Currently, there isn't 
a limit on the number of pending blocks. We can limit the number of pending 
blocks in memory. There has been a discussion around this here:
[https://github.com/apache/spark/pull/30062#discussion_r514026014]

2. When a stream doesn't get an opportunity to merge, then 
{{RemoteBlockPushResolver}} ignores the data from that stream. Another approach 
is to store the data of the stream in {{AppShufflePartitionInfo}} when it 
reaches the worst-case scenario. This may increase the memory usage of the 
shuffle service though. However, given a limit introduced with 1 we can try 
this out.
 More information can be found in this discussion:
 [https://github.com/apache/spark/pull/30062#discussion_r517524546]

  was:
This jira addresses the below two points:
 1. In {{RemoteBlockPushResolver}}, bytes that cannot be merged immediately are 
stored in memory. The stream callback maintains a list of {{deferredBufs}}. 
When a block cannot be merged it is added to this list. Currently, there isn't 
a limit on the number of pending blocks. We can limit the number of pending 
blocks in memory. There has been a discussion around this here:
 [https://github.com/apache/spark/pull/30062#discussion_r514026014
]

2. When a stream doesn't get an opportunity to merge, then 
{{RemoteBlockPushResolver}} ignores the data from that stream. Another approach 
is to store the data of the stream in {{AppShufflePartitionInfo}} when it 
reaches the worst-case scenario. This may increase the memory usage of the 
shuffle service though. However, given a limit introduced with 1 we can try 
this out.
 More information can be found in this discussion:
 [https://github.com/apache/spark/pull/30062#discussion_r517524546]


> Limit the number of pending blocks in memory and store blocks that collide
> --
>
> Key: SPARK-1
> URL: https://issues.apache.org/jira/browse/SPARK-1
> Project: Spark
>  Issue Type: Sub-task
>  Components: Shuffle
>Affects Versions: 3.1.0
>Reporter: Chandni Singh
>Priority: Major
>
> This jira addresses the below two points:
>  1. In {{RemoteBlockPushResolver}}, bytes that cannot be merged immediately 
> are stored in memory. The stream callback maintains a list of 
> {{deferredBufs}}. When a block cannot be merged it is added to this list. 
> Currently, there isn't a limit on the number of pending blocks. We can limit 
> the number of pending blocks in memory. There has been a discussion around 
> this here:
> [https://github.com/apache/spark/pull/30062#discussion_r514026014]
> 2. When a stream doesn't get an opportunity to merge, then 
> {{RemoteBlockPushResolver}} ignores the data from that stream. Another 
> approach is to store the data of the stream in {{AppShufflePartitionInfo}} 
> when it reaches the worst-case scenario. This may increase the memory usage 
> of the shuffle service though. However, given a limit introduced with 1 we 
> can try this out.
>  More information can be found in this discussion:
>  [https://github.com/apache/spark/pull/30062#discussion_r517524546]
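
A minimal sketch of point 1, bounding the deferred buffers kept per stream (the class, field, and method names are assumptions for illustration, not the actual RemoteBlockPushResolver code):

{code:scala}
import java.nio.ByteBuffer
import scala.collection.mutable.ListBuffer

// Illustrative only: cap how many unmerged block chunks a push stream may
// keep in memory; once the cap is hit the caller must ignore (or, per point 2,
// store) the colliding data instead of deferring it.
class DeferredBlockBuffer(maxDeferredBlocks: Int) {
  private val deferredBufs = ListBuffer.empty[ByteBuffer]

  /** Returns true if the chunk was deferred, false if the cap was reached. */
  def tryDefer(chunk: ByteBuffer): Boolean = {
    if (deferredBufs.size >= maxDeferredBlocks) {
      false
    } else {
      deferredBufs += chunk
      true
    }
  }

  /** Drains the deferred chunks in arrival order for merging. */
  def drain(): Seq[ByteBuffer] = {
    val out = deferredBufs.toList
    deferredBufs.clear()
    out
  }
}
{code}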



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33331) Limit the number of pending blocks in memory and store blocks that collide

2020-11-04 Thread Chandni Singh (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-1?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chandni Singh updated SPARK-1:
--
Description: 
This jira addresses the below two points:
 1. In {{RemoteBlockPushResolver}}, bytes that cannot be merged immediately are 
stored in memory. The stream callback maintains a list of {{deferredBufs}}. 
When a block cannot be merged it is added to this list. Currently, there isn't 
a limit on the number of pending blocks. We can limit the number of pending 
blocks in memory. There has been a discussion around this here:
 [https://github.com/apache/spark/pull/30062#discussion_r514026014
]

2. When a stream doesn't get an opportunity to merge, then 
{{RemoteBlockPushResolver}} ignores the data from that stream. Another approach 
is to store the data of the stream in {{AppShufflePartitionInfo}} when it 
reaches the worst-case scenario. This may increase the memory usage of the 
shuffle service though. However, given a limit introduced with 1 we can try 
this out.
 More information can be found in this discussion:
 [https://github.com/apache/spark/pull/30062#discussion_r517524546]

  was:
1. In {{RemoteBlockPushResolver}},  bytes that cannot be merged immediately are 
stored in memory. The stream callback maintains a list of {{deferredBufs}}. 
When a block cannot be merged it is added to this list. Currently, there isn't 
a limit on the number of pending blocks. There has been a discussion around 
this here:
https://github.com/apache/spark/pull/30062#discussion_r514026014

2. When a stream doesn't get an opportunity to merge, then 
{{RemoteBlockPushResolver}} ignores the data from that stream. Another approach 
is to store the data of the stream in {{AppShufflePartitionInfo}} when it 
reaches the worst-case scenario. This may increase the memory usage of the 
shuffle service though. However, given a limit introduced with 1 we can try 
this out.



> Limit the number of pending blocks in memory and store blocks that collide
> --
>
> Key: SPARK-1
> URL: https://issues.apache.org/jira/browse/SPARK-1
> Project: Spark
>  Issue Type: Sub-task
>  Components: Shuffle
>Affects Versions: 3.1.0
>Reporter: Chandni Singh
>Priority: Major
>
> This jira addresses the below two points:
>  1. In {{RemoteBlockPushResolver}}, bytes that cannot be merged immediately 
> are stored in memory. The stream callback maintains a list of 
> {{deferredBufs}}. When a block cannot be merged it is added to this list. 
> Currently, there isn't a limit on the number of pending blocks. We can limit 
> the number of pending blocks in memory. There has been a discussion around 
> this here:
>  [https://github.com/apache/spark/pull/30062#discussion_r514026014
> ]
> 2. When a stream doesn't get an opportunity to merge, then 
> {{RemoteBlockPushResolver}} ignores the data from that stream. Another 
> approach is to store the data of the stream in {{AppShufflePartitionInfo}} 
> when it reaches the worst-case scenario. This may increase the memory usage 
> of the shuffle service though. However, given a limit introduced with 1 we 
> can try this out.
>  More information can be found in this discussion:
>  [https://github.com/apache/spark/pull/30062#discussion_r517524546]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33331) Limit the number of pending blocks in memory and store blocks that collide

2020-11-04 Thread Chandni Singh (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-1?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chandni Singh updated SPARK-1:
--
Summary: Limit the number of pending blocks in memory and store blocks that 
collide  (was: Limit the number of pending blocks in memory when 
RemoteBlockPushResolver defers a block)

> Limit the number of pending blocks in memory and store blocks that collide
> --
>
> Key: SPARK-1
> URL: https://issues.apache.org/jira/browse/SPARK-1
> Project: Spark
>  Issue Type: Sub-task
>  Components: Shuffle
>Affects Versions: 3.1.0
>Reporter: Chandni Singh
>Priority: Major
>
> 1. In {{RemoteBlockPushResolver}},  bytes that cannot be merged immediately 
> are stored in memory. The stream callback maintains a list of 
> {{deferredBufs}}. When a block cannot be merged it is added to this list. 
> Currently, there isn't a limit on the number of pending blocks. There has 
> been a discussion around this here:
> https://github.com/apache/spark/pull/30062#discussion_r514026014
> 2. When a stream doesn't get an opportunity to merge, then 
> {{RemoteBlockPushResolver}} ignores the data from that stream. Another 
> approach is to store the data of the stream in {{AppShufflePartitionInfo}} 
> when it reaches the worst-case scenario. This may increase the memory usage 
> of the shuffle service though. However, given a limit introduced with 1 we 
> can try this out.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33331) Limit the number of pending blocks in memory when RemoteBlockPushResolver defers a block

2020-11-04 Thread Chandni Singh (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-1?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chandni Singh updated SPARK-1:
--
Description: 
1. In {{RemoteBlockPushResolver}},  bytes that cannot be merged immediately are 
stored in memory. The stream callback maintains a list of {{deferredBufs}}. 
When a block cannot be merged it is added to this list. Currently, there isn't 
a limit on the number of pending blocks. There has been a discussion around 
this here:
https://github.com/apache/spark/pull/30062#discussion_r514026014

2. When a stream doesn't get an opportunity to merge, then 
{{RemoteBlockPushResolver}} ignores the data from that stream. Another approach 
is to store the data of the stream in {{AppShufflePartitionInfo}} when it 
reaches the worst-case scenario. This may increase the memory usage of the 
shuffle service though. However, given a limit introduced with 1 we can try 
this out.


  was:
This is to address the comment here:
https://github.com/apache/spark/pull/30062#discussion_r514026014


> Limit the number of pending blocks in memory when RemoteBlockPushResolver 
> defers a block
> 
>
> Key: SPARK-1
> URL: https://issues.apache.org/jira/browse/SPARK-1
> Project: Spark
>  Issue Type: Sub-task
>  Components: Shuffle
>Affects Versions: 3.1.0
>Reporter: Chandni Singh
>Priority: Major
>
> 1. In {{RemoteBlockPushResolver}},  bytes that cannot be merged immediately 
> are stored in memory. The stream callback maintains a list of 
> {{deferredBufs}}. When a block cannot be merged it is added to this list. 
> Currently, there isn't a limit on the number of pending blocks. There has 
> been a discussion around this here:
> https://github.com/apache/spark/pull/30062#discussion_r514026014
> 2. When a stream doesn't get an opportunity to merge, then 
> {{RemoteBlockPushResolver}} ignores the data from that stream. Another 
> approach is to store the data of the stream in {{AppShufflePartitionInfo}} 
> when it reaches the worst-case scenario. This may increase the memory usage 
> of the shuffle service though. However, given a limit introduced with 1 we 
> can try this out.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33349) ExecutorPodsWatchSnapshotSource: Kubernetes client has been closed

2020-11-04 Thread Jim Kleckner (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17226362#comment-17226362
 ] 

Jim Kleckner commented on SPARK-33349:
--

This fabric8 kubernetes-client issue/MR looks relevant:

Repeated "too old resource version" exception with 
BaseOperation.waitUntilCondition(). #2414
 * 
[https://github.com/fabric8io/kubernetes-client/issues/2414|https://github.com/fabric8io/kubernetes-client/issues/2414]
 * 
[https://github.com/fabric8io/kubernetes-client/pull/2424|https://github.com/fabric8io/kubernetes-client/pull/2424]

 

This is released in 
[https://github.com/fabric8io/kubernetes-client/releases/tag/v4.12.0|https://github.com/fabric8io/kubernetes-client/releases/tag/v4.12.0]
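
If Spark picks up that fix, the change amounts to bumping the fabric8 client dependency to 4.12.0 or later; an sbt-style sketch, illustrative only (Spark's actual build manages this version in the Maven parent pom):

{code:scala}
// Illustrative dependency bump, not the actual Spark build change.
libraryDependencies += "io.fabric8" % "kubernetes-client" % "4.12.0"
{code}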


 

> ExecutorPodsWatchSnapshotSource: Kubernetes client has been closed
> --
>
> Key: SPARK-33349
> URL: https://issues.apache.org/jira/browse/SPARK-33349
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 3.0.1, 3.0.2
>Reporter: Nicola Bova
>Priority: Critical
>
> I launch my spark application with the 
> [spark-on-kubernetes-operator|https://github.com/GoogleCloudPlatform/spark-on-k8s-operator]
>  with the following yaml file:
> {code:yaml}
> apiVersion: sparkoperator.k8s.io/v1beta2
> kind: SparkApplication
> metadata:
>    name: spark-kafka-streamer-test
>    namespace: kafka2hdfs
> spec: 
>    type: Scala
>    mode: cluster
>    image: /spark:3.0.2-SNAPSHOT-2.12-0.1.0
>    imagePullPolicy: Always
>    timeToLiveSeconds: 259200
>    mainClass: path.to.my.class.KafkaStreamer
>    mainApplicationFile: spark-kafka-streamer_2.12-spark300-assembly.jar
>    sparkVersion: 3.0.1
>    restartPolicy:
>  type: Always
>    sparkConf:
>  "spark.kafka.consumer.cache.capacity": "8192"
>  "spark.kubernetes.memoryOverheadFactor": "0.3"
>    deps:
>    jars:
>  - my
>  - jar
>  - list
>    hadoopConfigMap: hdfs-config
>    driver:
>  cores: 4
>  memory: 12g
>  labels:
>    version: 3.0.1
>  serviceAccount: default
>  javaOptions: 
> "-Dlog4j.configuration=file:///opt/spark/log4j/log4j.properties"
>   executor:
>  instances: 4
>     cores: 4
>     memory: 16g
>     labels:
>   version: 3.0.1
>     javaOptions: 
> "-Dlog4j.configuration=file:///opt/spark/log4j/log4j.properties"
> {code}
>  I have tried with both Spark `3.0.1` and `3.0.2-SNAPSHOT` with the ["Restart 
> the watcher when we receive a version changed from 
> k8s"|https://github.com/apache/spark/pull/29533] patch.
> This is the driver log:
> {code}
> 20/11/04 12:16:02 WARN NativeCodeLoader: Unable to load native-hadoop library 
> for your platform... using builtin-java classes where applicable
> ... // my app log, it's a structured streaming app reading from kafka and 
> writing to hdfs
> 20/11/04 13:12:12 WARN ExecutorPodsWatchSnapshotSource: Kubernetes client has 
> been closed (this is expected if the application is shutting down.)
> io.fabric8.kubernetes.client.KubernetesClientException: too old resource 
> version: 1574101276 (1574213896)
>  at 
> io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager$1.onMessage(WatchConnectionManager.java:259)
>  at okhttp3.internal.ws.RealWebSocket.onReadMessage(RealWebSocket.java:323)
>  at 
> okhttp3.internal.ws.WebSocketReader.readMessageFrame(WebSocketReader.java:219)
>  at 
> okhttp3.internal.ws.WebSocketReader.processNextFrame(WebSocketReader.java:105)
>  at okhttp3.internal.ws.RealWebSocket.loopReader(RealWebSocket.java:274)
>  at okhttp3.internal.ws.RealWebSocket$2.onResponse(RealWebSocket.java:214)
>  at okhttp3.RealCall$AsyncCall.execute(RealCall.java:203)
>  at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
>  at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown 
> Source)
>  at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown 
> Source)
>  at java.base/java.lang.Thread.run(Unknown Source)
> {code}
> The error above appears after roughly 50 minutes.
> After the exception above, no more logs are produced and the app hangs.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33349) ExecutorPodsWatchSnapshotSource: Kubernetes client has been closed

2020-11-04 Thread Nicola Bova (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicola Bova updated SPARK-33349:

Shepherd: Dongjoon Hyun

> ExecutorPodsWatchSnapshotSource: Kubernetes client has been closed
> --
>
> Key: SPARK-33349
> URL: https://issues.apache.org/jira/browse/SPARK-33349
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 3.0.1, 3.0.2
>Reporter: Nicola Bova
>Priority: Critical
>
> I launch my spark application with the 
> [spark-on-kubernetes-operator|https://github.com/GoogleCloudPlatform/spark-on-k8s-operator]
>  with the following yaml file:
> {code:yaml}
> apiVersion: sparkoperator.k8s.io/v1beta2
> kind: SparkApplication
> metadata:
>    name: spark-kafka-streamer-test
>    namespace: kafka2hdfs
> spec: 
>    type: Scala
>    mode: cluster
>    image: /spark:3.0.2-SNAPSHOT-2.12-0.1.0
>    imagePullPolicy: Always
>    timeToLiveSeconds: 259200
>    mainClass: path.to.my.class.KafkaStreamer
>    mainApplicationFile: spark-kafka-streamer_2.12-spark300-assembly.jar
>    sparkVersion: 3.0.1
>    restartPolicy:
>  type: Always
>    sparkConf:
>  "spark.kafka.consumer.cache.capacity": "8192"
>  "spark.kubernetes.memoryOverheadFactor": "0.3"
>    deps:
>    jars:
>  - my
>  - jar
>  - list
>    hadoopConfigMap: hdfs-config
>    driver:
>  cores: 4
>  memory: 12g
>  labels:
>    version: 3.0.1
>  serviceAccount: default
>  javaOptions: 
> "-Dlog4j.configuration=file:///opt/spark/log4j/log4j.properties"
>   executor:
>  instances: 4
>     cores: 4
>     memory: 16g
>     labels:
>   version: 3.0.1
>     javaOptions: 
> "-Dlog4j.configuration=file:///opt/spark/log4j/log4j.properties"
> {code}
>  I have tried with both Spark `3.0.1` and `3.0.2-SNAPSHOT` with the ["Restart 
> the watcher when we receive a version changed from 
> k8s"|https://github.com/apache/spark/pull/29533] patch.
> This is the driver log:
> {code}
> 20/11/04 12:16:02 WARN NativeCodeLoader: Unable to load native-hadoop library 
> for your platform... using builtin-java classes where applicable
> ... // my app log, it's a structured streaming app reading from kafka and 
> writing to hdfs
> 20/11/04 13:12:12 WARN ExecutorPodsWatchSnapshotSource: Kubernetes client has 
> been closed (this is expected if the application is shutting down.)
> io.fabric8.kubernetes.client.KubernetesClientException: too old resource 
> version: 1574101276 (1574213896)
>  at 
> io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager$1.onMessage(WatchConnectionManager.java:259)
>  at okhttp3.internal.ws.RealWebSocket.onReadMessage(RealWebSocket.java:323)
>  at 
> okhttp3.internal.ws.WebSocketReader.readMessageFrame(WebSocketReader.java:219)
>  at 
> okhttp3.internal.ws.WebSocketReader.processNextFrame(WebSocketReader.java:105)
>  at okhttp3.internal.ws.RealWebSocket.loopReader(RealWebSocket.java:274)
>  at okhttp3.internal.ws.RealWebSocket$2.onResponse(RealWebSocket.java:214)
>  at okhttp3.RealCall$AsyncCall.execute(RealCall.java:203)
>  at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
>  at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown 
> Source)
>  at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown 
> Source)
>  at java.base/java.lang.Thread.run(Unknown Source)
> {code}
> The error above appears after roughly 50 minutes.
> After the exception above, no more logs are produced and the app hangs.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33349) ExecutorPodsWatchSnapshotSource: Kubernetes client has been closed

2020-11-04 Thread Nicola Bova (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicola Bova updated SPARK-33349:

Description: 
I launch my spark application with the 
[spark-on-kubernetes-operator|https://github.com/GoogleCloudPlatform/spark-on-k8s-operator]
 with the following yaml file:

{code:yaml}
apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
   name: spark-kafka-streamer-test
   namespace: kafka2hdfs
spec: 
   type: Scala
   mode: cluster
   image: /spark:3.0.2-SNAPSHOT-2.12-0.1.0
   imagePullPolicy: Always
   timeToLiveSeconds: 259200
   mainClass: path.to.my.class.KafkaStreamer
   mainApplicationFile: spark-kafka-streamer_2.12-spark300-assembly.jar
   sparkVersion: 3.0.1
   restartPolicy:
 type: Always
   sparkConf:
 "spark.kafka.consumer.cache.capacity": "8192"
 "spark.kubernetes.memoryOverheadFactor": "0.3"
   deps:
   jars:
 - my
 - jar
 - list
   hadoopConfigMap: hdfs-config

   driver:
 cores: 4
 memory: 12g
 labels:
   version: 3.0.1
 serviceAccount: default
 javaOptions: 
"-Dlog4j.configuration=file:///opt/spark/log4j/log4j.properties"

  executor:
 instances: 4
    cores: 4
    memory: 16g
    labels:
  version: 3.0.1
    javaOptions: 
"-Dlog4j.configuration=file:///opt/spark/log4j/log4j.properties"
{code}

 I have tried with both Spark `3.0.1` and `3.0.2-SNAPSHOT` with the ["Restart 
the watcher when we receive a version changed from 
k8s"|https://github.com/apache/spark/pull/29533] patch.

This is the driver log:

{code}
20/11/04 12:16:02 WARN NativeCodeLoader: Unable to load native-hadoop library 
for your platform... using builtin-java classes where applicable

... // my app log, it's a structured streaming app reading from kafka and 
writing to hdfs

20/11/04 13:12:12 WARN ExecutorPodsWatchSnapshotSource: Kubernetes client has 
been closed (this is expected if the application is shutting down.)
io.fabric8.kubernetes.client.KubernetesClientException: too old resource 
version: 1574101276 (1574213896)
 at 
io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager$1.onMessage(WatchConnectionManager.java:259)
 at okhttp3.internal.ws.RealWebSocket.onReadMessage(RealWebSocket.java:323)
 at 
okhttp3.internal.ws.WebSocketReader.readMessageFrame(WebSocketReader.java:219)
 at 
okhttp3.internal.ws.WebSocketReader.processNextFrame(WebSocketReader.java:105)
 at okhttp3.internal.ws.RealWebSocket.loopReader(RealWebSocket.java:274)
 at okhttp3.internal.ws.RealWebSocket$2.onResponse(RealWebSocket.java:214)
 at okhttp3.RealCall$AsyncCall.execute(RealCall.java:203)
 at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
 at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
 at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
 at java.base/java.lang.Thread.run(Unknown Source)
{code}

The error above appears after roughly 50 minutes.

After the exception above, no more logs are produced and the app hangs.

  was:
I launch my spark application with the 
[spark-on-kubernetes-operator|https://github.com/GoogleCloudPlatform/spark-on-k8s-operator]
 with the following yaml file:

{code:yaml}
apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
   name: spark-kafka-streamer-test
   namespace: kafka2hdfs
spec: 
   type: Scala
   mode: cluster
   image: /spark:3.0.2-SNAPSHOT-2.12-0.1.0
   imagePullPolicy: Always
   timeToLiveSeconds: 259200
   mainClass: path.to.my.class.KafkaStreamer
   mainApplicationFile: spark-kafka-streamer_2.12-spark300-assembly.jar
   sparkVersion: 3.0.1
   restartPolicy:
 type: Always
   sparkConf:
 "spark.kafka.consumer.cache.capacity": "8192"
 "spark.kubernetes.memoryOverheadFactor": "0.3"
   deps:
   jars:
 - my
 - jar
 - list
   hadoopConfigMap: hdfs-config

   driver:
 cores: 4
 memory: 12g
 labels:
   version: 3.0.1
 serviceAccount: default
 javaOptions: 
"-Dlog4j.configuration=file:///opt/spark/log4j/log4j.properties"

  executor:
 instances: 4
    cores: 4
    memory: 16g
    labels:
  version: 3.0.1
    javaOptions: 
"-Dlog4j.configuration=file:///opt/spark/log4j/log4j.properties"
{code}

 

This is the driver log:

{code}
20/11/04 12:16:02 WARN NativeCodeLoader: Unable to load native-hadoop library 
for your platform... using builtin-java classes where applicable

... // my app log, it's a structured streaming app reading from kafka and 
writing to hdfs

20/11/04 13:12:12 WARN ExecutorPodsWatchSnapshotSource: Kubernetes client has 
been closed (this is expected if the application is shutting down.)
io.fabric8.kubernetes.client.KubernetesClientException: too old resource 
version: 1574101276 (1574213896)
 at 
io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager$1.onMessage(WatchConnectionManager.java:259)
 at okhttp3.internal.ws.RealWebSocket.onReadMessage(RealW

[jira] [Updated] (SPARK-33349) ExecutorPodsWatchSnapshotSource: Kubernetes client has been closed

2020-11-04 Thread Nicola Bova (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicola Bova updated SPARK-33349:

Description: 
I launch my spark application with the 
[spark-on-kubernetes-operator|https://github.com/GoogleCloudPlatform/spark-on-k8s-operator]
 with the following yaml file:

{code:yaml}
apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
  name: spark-kafka-streamer-test
  namespace: kafka2hdfs
spec:
  type: Scala
  mode: cluster
  image: /spark:3.0.2-SNAPSHOT-2.12-0.1.0
  imagePullPolicy: Always
  timeToLiveSeconds: 259200
  mainClass: path.to.my.class.KafkaStreamer
  mainApplicationFile: spark-kafka-streamer_2.12-spark300-assembly.jar
  sparkVersion: 3.0.1
  restartPolicy:
    type: Always
  sparkConf:
    "spark.kafka.consumer.cache.capacity": "8192"
    "spark.kubernetes.memoryOverheadFactor": "0.3"
  deps:
    jars:
      - my
      - jar
      - list
  hadoopConfigMap: hdfs-config
  driver:
    cores: 4
    memory: 12g
    labels:
      version: 3.0.1
    serviceAccount: default
    javaOptions: "-Dlog4j.configuration=file:///opt/spark/log4j/log4j.properties"
  executor:
    instances: 4
    cores: 4
    memory: 16g
    labels:
      version: 3.0.1
    javaOptions: "-Dlog4j.configuration=file:///opt/spark/log4j/log4j.properties"
{code}

 

This is the driver log:

{code}
20/11/04 12:16:02 WARN NativeCodeLoader: Unable to load native-hadoop library 
for your platform... using builtin-java classes where applicable

... // my app log, it's a structured streaming app reading from kafka and 
writing to hdfs

20/11/04 13:12:12 WARN ExecutorPodsWatchSnapshotSource: Kubernetes client has 
been closed (this is expected if the application is shutting down.)
io.fabric8.kubernetes.client.KubernetesClientException: too old resource 
version: 1574101276 (1574213896)
 at 
io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager$1.onMessage(WatchConnectionManager.java:259)
 at okhttp3.internal.ws.RealWebSocket.onReadMessage(RealWebSocket.java:323)
 at 
okhttp3.internal.ws.WebSocketReader.readMessageFrame(WebSocketReader.java:219)
 at 
okhttp3.internal.ws.WebSocketReader.processNextFrame(WebSocketReader.java:105)
 at okhttp3.internal.ws.RealWebSocket.loopReader(RealWebSocket.java:274)
 at okhttp3.internal.ws.RealWebSocket$2.onResponse(RealWebSocket.java:214)
 at okhttp3.RealCall$AsyncCall.execute(RealCall.java:203)
 at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
 at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
 at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
 at java.base/java.lang.Thread.run(Unknown Source)
{code}

The error above appears after roughly 50 minutes.

After the exception above, no more logs are produced and the app hangs.

  was:
I launch my spark application with the 
[spark-on-kubernetes-operator](https://github.com/GoogleCloudPlatform/spark-on-k8s-operator)
 with the following yaml file:

{code:yaml}
apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
   name: spark-kafka-streamer-test
   namespace: kafka2hdfs
spec: 
   type: Scala
   mode: cluster
   image: /spark:3.0.2-SNAPSHOT-2.12-0.1.0
   imagePullPolicy: Always
   timeToLiveSeconds: 259200
   mainClass: path.to.my.class.KafkaStreamer
   mainApplicationFile: spark-kafka-streamer_2.12-spark300-assembly.jar
   sparkVersion: 3.0.1
   restartPolicy:
 type: Always
   sparkConf:
 "spark.kafka.consumer.cache.capacity": "8192"
 "spark.kubernetes.memoryOverheadFactor": "0.3"
   deps:
   jars:
 - my
 - jar
 - list
   hadoopConfigMap: hdfs-config

   driver:
 cores: 4
 memory: 12g
 labels:
   version: 3.0.1
 serviceAccount: default
 javaOptions: 
"-Dlog4j.configuration=file:///opt/spark/log4j/log4j.properties"

  executor:
 instances: 4
    cores: 4
    memory: 16g
    labels:
  version: 3.0.1
    javaOptions: 
"-Dlog4j.configuration=file:///opt/spark/log4j/log4j.properties"
{code}

 

This is the driver log:

{code}
20/11/04 12:16:02 WARN NativeCodeLoader: Unable to load native-hadoop library 
for your platform... using builtin-java classes where applicable

... // my app log, it's a structured streaming app reading from kafka and 
writing to hdfs

20/11/04 13:12:12 WARN ExecutorPodsWatchSnapshotSource: Kubernetes client has 
been closed (this is expected if the application is shutting down.)
io.fabric8.kubernetes.client.KubernetesClientException: too old resource 
version: 1574101276 (1574213896)
 at 
io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager$1.onMessage(WatchConnectionManager.java:259)
 at okhttp3.internal.ws.RealWebSocket.onReadMessage(RealWebSocket.java:323)
 at 
okhttp3.internal.ws.WebSocketReader.readMessageFrame(WebSocketReader.java:219)
 at 
okhttp3.internal.ws.WebSocketReader.processNextFrame(WebSocketReader.java:10

[jira] [Updated] (SPARK-33349) ExecutorPodsWatchSnapshotSource: Kubernetes client has been closed

2020-11-04 Thread Nicola Bova (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicola Bova updated SPARK-33349:

Description: 
I launch my spark application with the 
[spark-on-kubernetes-operator](https://github.com/GoogleCloudPlatform/spark-on-k8s-operator)
 with the following yaml file:

{code:yaml}
apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
   name: spark-kafka-streamer-test
   namespace: kafka2hdfs
spec: 
   type: Scala
   mode: cluster
   image: /spark:3.0.2-SNAPSHOT-2.12-0.1.0
   imagePullPolicy: Always
   timeToLiveSeconds: 259200
   mainClass: path.to.my.class.KafkaStreamer
   mainApplicationFile: spark-kafka-streamer_2.12-spark300-assembly.jar
   sparkVersion: 3.0.1
   restartPolicy:
 type: Always
   sparkConf:
 "spark.kafka.consumer.cache.capacity": "8192"
 "spark.kubernetes.memoryOverheadFactor": "0.3"
   deps:
   jars:
 - my
 - jar
 - list
   hadoopConfigMap: hdfs-config

   driver:
 cores: 4
 memory: 12g
 labels:
   version: 3.0.1
 serviceAccount: default
 javaOptions: 
"-Dlog4j.configuration=file:///opt/spark/log4j/log4j.properties"

  executor:
 instances: 4
    cores: 4
    memory: 16g
    labels:
  version: 3.0.1
    javaOptions: 
"-Dlog4j.configuration=file:///opt/spark/log4j/log4j.properties"
{code}

 

This is the driver log:

{code}
20/11/04 12:16:02 WARN NativeCodeLoader: Unable to load native-hadoop library 
for your platform... using builtin-java classes where applicable

... // my app log, it's a structured streaming app reading from kafka and 
writing to hdfs

20/11/04 13:12:12 WARN ExecutorPodsWatchSnapshotSource: Kubernetes client has 
been closed (this is expected if the application is shutting down.)
io.fabric8.kubernetes.client.KubernetesClientException: too old resource 
version: 1574101276 (1574213896)
 at 
io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager$1.onMessage(WatchConnectionManager.java:259)
 at okhttp3.internal.ws.RealWebSocket.onReadMessage(RealWebSocket.java:323)
 at 
okhttp3.internal.ws.WebSocketReader.readMessageFrame(WebSocketReader.java:219)
 at 
okhttp3.internal.ws.WebSocketReader.processNextFrame(WebSocketReader.java:105)
 at okhttp3.internal.ws.RealWebSocket.loopReader(RealWebSocket.java:274)
 at okhttp3.internal.ws.RealWebSocket$2.onResponse(RealWebSocket.java:214)
 at okhttp3.RealCall$AsyncCall.execute(RealCall.java:203)
 at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
 at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
 at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
 at java.base/java.lang.Thread.run(Unknown Source)
{code}

The error above appears after roughly 50 minutes.

After the exception above, no more logs are produced and the app hangs.

  was:
I launch my spark application with the 
[spark-on-kubernetes-operator](https://github.com/GoogleCloudPlatform/spark-on-k8s-operator)
 with the following yaml file:

{code:yaml}
apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
   name: spark-kafka-streamer-test
   namespace: kafka2hdfs
spec: 
   type: Scala
   mode: cluster
   image: /spark:3.0.2-SNAPSHOT-2.12-0.1.0
   imagePullPolicy: Always
   timeToLiveSeconds: 259200
   mainClass: path.to.my.class.KafkaStreamer
   mainApplicationFile: spark-kafka-streamer_2.12-spark300-assembly.jar
   sparkVersion: 3.0.1
   restartPolicy:
 type: Always
   sparkConf:
 "spark.kafka.consumer.cache.capacity": "8192"
 "spark.kubernetes.memoryOverheadFactor": "0.3"
   deps:
   jars:
 - my
 - jar
 - list
   hadoopConfigMap: hdfs-config

   driver:
 cores: 4
 memory: 12g
 labels:
   version: 3.0.1
 serviceAccount: default
 javaOptions: 
"-Dlog4j.configuration=file:///opt/spark/log4j/log4j.properties"

  executor:
 instances: 4
    cores: 4
    memory: 16g
    labels:
  version: 3.0.1
    javaOptions: 
"-Dlog4j.configuration=file:///opt/spark/log4j/log4j.properties"
{code}

 

This is the driver log:

```

20/11/04 12:16:02 WARN NativeCodeLoader: Unable to load native-hadoop library 
for your platform... using builtin-java classes where applicable

... // my app log, it's a structured streaming app reading from kafka and 
writing to hdfs

20/11/04 13:12:12 WARN ExecutorPodsWatchSnapshotSource: Kubernetes client has 
been closed (this is expected if the application is shutting down.)
io.fabric8.kubernetes.client.KubernetesClientException: too old resource 
version: 1574101276 (1574213896)
 at 
io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager$1.onMessage(WatchConnectionManager.java:259)
 at okhttp3.internal.ws.RealWebSocket.onReadMessage(RealWebSocket.java:323)
 at 
okhttp3.internal.ws.WebSocketReader.readMessageFrame(WebSocketReader.java:219)
 at 
okhttp3.internal.ws.WebSocketReader.processNextFrame(WebSocketReader.java:105

[jira] [Updated] (SPARK-33349) ExecutorPodsWatchSnapshotSource: Kubernetes client has been closed

2020-11-04 Thread Nicola Bova (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicola Bova updated SPARK-33349:

Description: 
I launch my spark application with the 
[spark-on-kubernetes-operator](https://github.com/GoogleCloudPlatform/spark-on-k8s-operator)
 with the following yaml file:

```
apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
   name: spark-kafka-streamer-test
   namespace: kafka2hdfs
spec: 
   type: Scala
   mode: cluster
   image: /spark:3.0.2-SNAPSHOT-2.12-0.1.0
   imagePullPolicy: Always
   timeToLiveSeconds: 259200
   mainClass: path.to.my.class.KafkaStreamer
   mainApplicationFile: spark-kafka-streamer_2.12-spark300-assembly.jar
   sparkVersion: 3.0.1
   restartPolicy:
 type: Always
   sparkConf:
 "spark.kafka.consumer.cache.capacity": "8192"
 "spark.kubernetes.memoryOverheadFactor": "0.3"
   deps:
   jars:
 - my
 - jar
 - list
   hadoopConfigMap: hdfs-config

   driver:
 cores: 4
 memory: 12g
 labels:
   version: 3.0.1
 serviceAccount: default
 javaOptions: 
"-Dlog4j.configuration=file:///opt/spark/log4j/log4j.properties"

  executor:
 instances: 4
    cores: 4
    memory: 16g
    labels:
  version: 3.0.1
    javaOptions: 
"-Dlog4j.configuration=file:///opt/spark/log4j/log4j.properties"

```

 

This is the driver log:

```

20/11/04 12:16:02 WARN NativeCodeLoader: Unable to load native-hadoop library 
for your platform... using builtin-java classes where applicable

... // my app log, it's a structured streaming app reading from kafka and 
writing to hdfs

20/11/04 13:12:12 WARN ExecutorPodsWatchSnapshotSource: Kubernetes client has 
been closed (this is expected if the application is shutting down.)
io.fabric8.kubernetes.client.KubernetesClientException: too old resource 
version: 1574101276 (1574213896)
 at 
io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager$1.onMessage(WatchConnectionManager.java:259)
 at okhttp3.internal.ws.RealWebSocket.onReadMessage(RealWebSocket.java:323)
 at 
okhttp3.internal.ws.WebSocketReader.readMessageFrame(WebSocketReader.java:219)
 at 
okhttp3.internal.ws.WebSocketReader.processNextFrame(WebSocketReader.java:105)
 at okhttp3.internal.ws.RealWebSocket.loopReader(RealWebSocket.java:274)
 at okhttp3.internal.ws.RealWebSocket$2.onResponse(RealWebSocket.java:214)
 at okhttp3.RealCall$AsyncCall.execute(RealCall.java:203)
 at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
 at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
 at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
 at java.base/java.lang.Thread.run(Unknown Source)

```

The error above appears after roughly 50 minutes.

After the exception above, no more logs are produced and the app hangs.

  was:
I launch my spark application with the 
[spark-on-kubernetes-operator]([https://github.com/GoogleCloudPlatform/spark-on-k8s-operator)]
 with the following yaml file:

```
apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
   name: spark-kafka-streamer-test
   namespace: kafka2hdfs
spec: 
   type: Scala
   mode: cluster
   image: /spark:3.0.2-SNAPSHOT-2.12-0.1.0
   imagePullPolicy: Always
   timeToLiveSeconds: 259200
   mainClass: path.to.my.class.KafkaStreamer
   mainApplicationFile: spark-kafka-streamer_2.12-spark300-assembly.jar
   sparkVersion: 3.0.1
   restartPolicy:
 type: Always
   sparkConf:
 "spark.kafka.consumer.cache.capacity": "8192"
 "spark.kubernetes.memoryOverheadFactor": "0.3"
   deps:
   jars:
 - my
 - jar
 - list
   hadoopConfigMap: hdfs-config

   driver:
 cores: 4
 memory: 12g
 labels:
   version: 3.0.1
 serviceAccount: default
 javaOptions: 
"-Dlog4j.configuration=file:///opt/spark/log4j/log4j.properties"

  executor:
 instances: 4
    cores: 4
    memory: 16g
    labels:
  version: 3.0.1
    javaOptions: 
"-Dlog4j.configuration=file:///opt/spark/log4j/log4j.properties"

```

 

This is the driver log:

```

20/11/04 12:16:02 WARN NativeCodeLoader: Unable to load native-hadoop library 
for your platform... using builtin-java classes where applicable

... // my app log, it's a structured streaming app reading from kafka and 
writing to hdfs

20/11/04 13:12:12 WARN ExecutorPodsWatchSnapshotSource: Kubernetes client has 
been closed (this is expected if the application is shutting down.)
io.fabric8.kubernetes.client.KubernetesClientException: too old resource 
version: 1574101276 (1574213896)
 at 
io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager$1.onMessage(WatchConnectionManager.java:259)
 at okhttp3.internal.ws.RealWebSocket.onReadMessage(RealWebSocket.java:323)
 at 
okhttp3.internal.ws.WebSocketReader.readMessageFrame(WebSocketReader.java:219)
 at 
okhttp3.internal.ws.WebSocketReader.processNextFrame(WebSocketReader.java:105)
 at okhttp3.internal

[jira] [Updated] (SPARK-33349) ExecutorPodsWatchSnapshotSource: Kubernetes client has been closed

2020-11-04 Thread Nicola Bova (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicola Bova updated SPARK-33349:

Description: 
I launch my spark application with the 
[spark-on-kubernetes-operator](https://github.com/GoogleCloudPlatform/spark-on-k8s-operator)
 with the following yaml file:

{code:yaml}
apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
   name: spark-kafka-streamer-test
   namespace: kafka2hdfs
spec: 
   type: Scala
   mode: cluster
   image: /spark:3.0.2-SNAPSHOT-2.12-0.1.0
   imagePullPolicy: Always
   timeToLiveSeconds: 259200
   mainClass: path.to.my.class.KafkaStreamer
   mainApplicationFile: spark-kafka-streamer_2.12-spark300-assembly.jar
   sparkVersion: 3.0.1
   restartPolicy:
 type: Always
   sparkConf:
 "spark.kafka.consumer.cache.capacity": "8192"
 "spark.kubernetes.memoryOverheadFactor": "0.3"
   deps:
   jars:
 - my
 - jar
 - list
   hadoopConfigMap: hdfs-config

   driver:
 cores: 4
 memory: 12g
 labels:
   version: 3.0.1
 serviceAccount: default
 javaOptions: 
"-Dlog4j.configuration=file:///opt/spark/log4j/log4j.properties"

  executor:
 instances: 4
    cores: 4
    memory: 16g
    labels:
  version: 3.0.1
    javaOptions: 
"-Dlog4j.configuration=file:///opt/spark/log4j/log4j.properties"
{code}

 

This is the driver log:

```

20/11/04 12:16:02 WARN NativeCodeLoader: Unable to load native-hadoop library 
for your platform... using builtin-java classes where applicable

... // my app log, it's a structured streaming app reading from kafka and 
writing to hdfs

20/11/04 13:12:12 WARN ExecutorPodsWatchSnapshotSource: Kubernetes client has 
been closed (this is expected if the application is shutting down.)
io.fabric8.kubernetes.client.KubernetesClientException: too old resource 
version: 1574101276 (1574213896)
 at 
io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager$1.onMessage(WatchConnectionManager.java:259)
 at okhttp3.internal.ws.RealWebSocket.onReadMessage(RealWebSocket.java:323)
 at 
okhttp3.internal.ws.WebSocketReader.readMessageFrame(WebSocketReader.java:219)
 at 
okhttp3.internal.ws.WebSocketReader.processNextFrame(WebSocketReader.java:105)
 at okhttp3.internal.ws.RealWebSocket.loopReader(RealWebSocket.java:274)
 at okhttp3.internal.ws.RealWebSocket$2.onResponse(RealWebSocket.java:214)
 at okhttp3.RealCall$AsyncCall.execute(RealCall.java:203)
 at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
 at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
 at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
 at java.base/java.lang.Thread.run(Unknown Source)

```

The error above appears after roughly 50 minutes.

After the exception above, no more logs are produced and the app hangs.

  was:
I launch my spark application with the 
[spark-on-kubernetes-operator](https://github.com/GoogleCloudPlatform/spark-on-k8s-operator)
 with the following yaml file:

```
apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
   name: spark-kafka-streamer-test
   namespace: kafka2hdfs
spec: 
   type: Scala
   mode: cluster
   image: /spark:3.0.2-SNAPSHOT-2.12-0.1.0
   imagePullPolicy: Always
   timeToLiveSeconds: 259200
   mainClass: path.to.my.class.KafkaStreamer
   mainApplicationFile: spark-kafka-streamer_2.12-spark300-assembly.jar
   sparkVersion: 3.0.1
   restartPolicy:
 type: Always
   sparkConf:
 "spark.kafka.consumer.cache.capacity": "8192"
 "spark.kubernetes.memoryOverheadFactor": "0.3"
   deps:
   jars:
 - my
 - jar
 - list
   hadoopConfigMap: hdfs-config

   driver:
 cores: 4
 memory: 12g
 labels:
   version: 3.0.1
 serviceAccount: default
 javaOptions: 
"-Dlog4j.configuration=file:///opt/spark/log4j/log4j.properties"

  executor:
 instances: 4
    cores: 4
    memory: 16g
    labels:
  version: 3.0.1
    javaOptions: 
"-Dlog4j.configuration=file:///opt/spark/log4j/log4j.properties"

```

 

This is the driver log:

```

20/11/04 12:16:02 WARN NativeCodeLoader: Unable to load native-hadoop library 
for your platform... using builtin-java classes where applicable

... // my app log, it's a structured streaming app reading from kafka and 
writing to hdfs

20/11/04 13:12:12 WARN ExecutorPodsWatchSnapshotSource: Kubernetes client has 
been closed (this is expected if the application is shutting down.)
io.fabric8.kubernetes.client.KubernetesClientException: too old resource 
version: 1574101276 (1574213896)
 at 
io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager$1.onMessage(WatchConnectionManager.java:259)
 at okhttp3.internal.ws.RealWebSocket.onReadMessage(RealWebSocket.java:323)
 at 
okhttp3.internal.ws.WebSocketReader.readMessageFrame(WebSocketReader.java:219)
 at 
okhttp3.internal.ws.WebSocketReader.processNextFrame(WebSocketReader.java:105)
 at okhttp3.

[jira] [Updated] (SPARK-33349) ExecutorPodsWatchSnapshotSource: Kubernetes client has been closed

2020-11-04 Thread Nicola Bova (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicola Bova updated SPARK-33349:

Description: 
I launch my spark application with the 
[spark-on-kubernetes-operator]([https://github.com/GoogleCloudPlatform/spark-on-k8s-operator)]
 with the following yaml file:

```
apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
   name: spark-kafka-streamer-test
   namespace: kafka2hdfs
spec: 
   type: Scala
   mode: cluster
   image: /spark:3.0.2-SNAPSHOT-2.12-0.1.0
   imagePullPolicy: Always
   timeToLiveSeconds: 259200
   mainClass: path.to.my.class.KafkaStreamer
   mainApplicationFile: spark-kafka-streamer_2.12-spark300-assembly.jar
   sparkVersion: 3.0.1
   restartPolicy:
 type: Always
   sparkConf:
 "spark.kafka.consumer.cache.capacity": "8192"
 "spark.kubernetes.memoryOverheadFactor": "0.3"
   deps:
   jars:
 - my
 - jar
 - list
   hadoopConfigMap: hdfs-config

   driver:
 cores: 4
 memory: 12g
 labels:
   version: 3.0.1
 serviceAccount: default
 javaOptions: 
"-Dlog4j.configuration=file:///opt/spark/log4j/log4j.properties"

  executor:
 instances: 4
    cores: 4
    memory: 16g
    labels:
  version: 3.0.1
    javaOptions: 
"-Dlog4j.configuration=file:///opt/spark/log4j/log4j.properties"

```

 

This is the driver log:

```

20/11/04 12:16:02 WARN NativeCodeLoader: Unable to load native-hadoop library 
for your platform... using builtin-java classes where applicable

... // my app log, it's a structured streaming app reading from kafka and 
writing to hdfs

20/11/04 13:12:12 WARN ExecutorPodsWatchSnapshotSource: Kubernetes client has 
been closed (this is expected if the application is shutting down.)
io.fabric8.kubernetes.client.KubernetesClientException: too old resource 
version: 1574101276 (1574213896)
 at 
io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager$1.onMessage(WatchConnectionManager.java:259)
 at okhttp3.internal.ws.RealWebSocket.onReadMessage(RealWebSocket.java:323)
 at 
okhttp3.internal.ws.WebSocketReader.readMessageFrame(WebSocketReader.java:219)
 at 
okhttp3.internal.ws.WebSocketReader.processNextFrame(WebSocketReader.java:105)
 at okhttp3.internal.ws.RealWebSocket.loopReader(RealWebSocket.java:274)
 at okhttp3.internal.ws.RealWebSocket$2.onResponse(RealWebSocket.java:214)
 at okhttp3.RealCall$AsyncCall.execute(RealCall.java:203)
 at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
 at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
 at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
 at java.base/java.lang.Thread.run(Unknown Source)

```

The error above appears after roughly 50 minutes.

After the exception above, no more logs are produced and the app hangs.

  was:
I launch my spark application with the 
[spark-on-kubernetes-operator]([https://github.com/GoogleCloudPlatform/spark-on-k8s-operator)]
 with the following yaml file:

```

apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
   name: spark-kafka-streamer-test
   namespace: kafka2hdfs
spec: 
   type: Scala
   mode: cluster
   image: /spark:3.0.2-SNAPSHOT-2.12-0.1.0
   imagePullPolicy: Always
   timeToLiveSeconds: 259200
   mainClass: path.to.my.class.KafkaStreamer
   mainApplicationFile: spark-kafka-streamer_2.12-spark300-assembly.jar
   sparkVersion: 3.0.1
   restartPolicy:
 type: Always
   sparkConf:
 "spark.kafka.consumer.cache.capacity": "8192"
 "spark.kubernetes.memoryOverheadFactor": "0.3"
   deps:
   jars:
 - my
 - jar
 - list
   hadoopConfigMap: hdfs-config

   driver:
 cores: 4
 memory: 12g
 labels:
   version: 3.0.1
 serviceAccount: default
 javaOptions: 
"-Dlog4j.configuration=file:///opt/spark/log4j/log4j.properties"

  executor:
 instances: 4
    cores: 4
    memory: 16g
    labels:
  version: 3.0.1
    javaOptions: 
"-Dlog4j.configuration=file:///opt/spark/log4j/log4j.properties"

```

 

This is the driver log:

```

20/11/04 12:16:02 WARN NativeCodeLoader: Unable to load native-hadoop library 
for your platform... using builtin-java classes where applicable

... // my app log, it's a structured streaming app reading from kafka and 
writing to hdfs

20/11/04 13:12:12 WARN ExecutorPodsWatchSnapshotSource: Kubernetes client has 
been closed (this is expected if the application is shutting down.)
io.fabric8.kubernetes.client.KubernetesClientException: too old resource 
version: 1574101276 (1574213896)
 at 
io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager$1.onMessage(WatchConnectionManager.java:259)
 at okhttp3.internal.ws.RealWebSocket.onReadMessage(RealWebSocket.java:323)
 at 
okhttp3.internal.ws.WebSocketReader.readMessageFrame(WebSocketReader.java:219)
 at 
okhttp3.internal.ws.WebSocketReader.processNextFrame(WebSocketReader.java:105)
 at okhttp3.inter

[jira] [Created] (SPARK-33349) ExecutorPodsWatchSnapshotSource: Kubernetes client has been closed

2020-11-04 Thread Nicola Bova (Jira)
Nicola Bova created SPARK-33349:
---

 Summary: ExecutorPodsWatchSnapshotSource: Kubernetes client has 
been closed
 Key: SPARK-33349
 URL: https://issues.apache.org/jira/browse/SPARK-33349
 Project: Spark
  Issue Type: Bug
  Components: Kubernetes
Affects Versions: 3.0.1, 3.0.2
Reporter: Nicola Bova


I launch my spark application with the 
[spark-on-kubernetes-operator](https://github.com/GoogleCloudPlatform/spark-on-k8s-operator)
 with the following yaml file:

```

apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
  name: spark-kafka-streamer-test
  namespace: kafka2hdfs
spec:
  type: Scala
  mode: cluster
  image: /spark:3.0.2-SNAPSHOT-2.12-0.1.0
  imagePullPolicy: Always
  timeToLiveSeconds: 259200
  mainClass: path.to.my.class.KafkaStreamer
  mainApplicationFile: spark-kafka-streamer_2.12-spark300-assembly.jar
  sparkVersion: 3.0.1
  restartPolicy:
    type: Always
  sparkConf:
    "spark.kafka.consumer.cache.capacity": "8192"
    "spark.kubernetes.memoryOverheadFactor": "0.3"
  deps:
    jars:
      - my
      - jar
      - list
  hadoopConfigMap: hdfs-config
  driver:
    cores: 4
    memory: 12g
    labels:
      version: 3.0.1
    serviceAccount: default
    javaOptions: "-Dlog4j.configuration=file:///opt/spark/log4j/log4j.properties"
  executor:
    instances: 4
    cores: 4
    memory: 16g
    labels:
      version: 3.0.1
    javaOptions: "-Dlog4j.configuration=file:///opt/spark/log4j/log4j.properties"
```

 

This is the driver log:

```

20/11/04 12:16:02 WARN NativeCodeLoader: Unable to load native-hadoop library 
for your platform... using builtin-java classes where applicable

... // my app log, it's a structured streaming app reading from kafka and 
writing to hdfs

20/11/04 13:12:12 WARN ExecutorPodsWatchSnapshotSource: Kubernetes client has 
been closed (this is expected if the application is shutting down.)
io.fabric8.kubernetes.client.KubernetesClientException: too old resource 
version: 1574101276 (1574213896)
 at 
io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager$1.onMessage(WatchConnectionManager.java:259)
 at okhttp3.internal.ws.RealWebSocket.onReadMessage(RealWebSocket.java:323)
 at 
okhttp3.internal.ws.WebSocketReader.readMessageFrame(WebSocketReader.java:219)
 at 
okhttp3.internal.ws.WebSocketReader.processNextFrame(WebSocketReader.java:105)
 at okhttp3.internal.ws.RealWebSocket.loopReader(RealWebSocket.java:274)
 at okhttp3.internal.ws.RealWebSocket$2.onResponse(RealWebSocket.java:214)
 at okhttp3.RealCall$AsyncCall.execute(RealCall.java:203)
 at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
 at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
 at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
 at java.base/java.lang.Thread.run(Unknown Source)

```

The error above appears after roughly 50 minutes.

After the exception above, no more logs are produced and the app hangs.
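For background: "too old resource version" is the fabric8 client surfacing an 
HTTP 410 from the Kubernetes API server, meaning the watch's resourceVersion has 
been compacted away, so the watch is closed and has to be re-established. The 
Scala sketch below is an illustration only, not Spark's actual 
ExecutorPodsWatchSnapshotSource code, and it assumes the fabric8 
kubernetes-client 4.x Watcher API: it simply re-subscribes whenever the watch 
closes abnormally.

{code:scala}
import io.fabric8.kubernetes.api.model.Pod
import io.fabric8.kubernetes.client.{KubernetesClient, KubernetesClientException, Watcher}

object ResilientPodWatch {
  // Start a pod watch and re-subscribe when it is closed abnormally,
  // e.g. after an HTTP 410 "too old resource version" response.
  def watchPods(client: KubernetesClient, namespace: String): Unit = {
    client.pods().inNamespace(namespace).watch(new Watcher[Pod] {
      override def eventReceived(action: Watcher.Action, pod: Pod): Unit =
        println(s"$action ${pod.getMetadata.getName}")

      override def onClose(cause: KubernetesClientException): Unit =
        if (cause != null) watchPods(client, namespace) // abnormal close: open a fresh watch
    })
  }
}
{code}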



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-33348) Use scala.jdk.CollectionConverters replace scala.collection.JavaConverters

2020-11-04 Thread Yang Jie (Jira)
Yang Jie created SPARK-33348:


 Summary: Use scala.jdk.CollectionConverters replace 
scala.collection.JavaConverters
 Key: SPARK-33348
 URL: https://issues.apache.org/jira/browse/SPARK-33348
 Project: Spark
  Issue Type: Sub-task
  Components: Build
Affects Versions: 3.1.0
Reporter: Yang Jie


`scala.collection.JavaConverters` is deprecated in Scala 2.13 and produces many 
compilation warnings, so we should use `scala.jdk.CollectionConverters` to 
replace it.

However, `scala.jdk.CollectionConverters` is only available in Scala 2.13.
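For illustration, a minimal sketch of the migration (the only change is the 
import; the `.asScala`/`.asJava` extension methods keep the same names):

{code:scala}
// Scala 2.13+: replaces the deprecated `import scala.collection.JavaConverters._`
import scala.jdk.CollectionConverters._

object ConvertersExample {
  def main(args: Array[String]): Unit = {
    val javaList  = java.util.Arrays.asList("a", "b", "c")
    val scalaSeq  = javaList.asScala.toSeq // Java -> Scala view, materialized
    val backAgain = scalaSeq.asJava        // Scala -> Java wrapper
    println(s"$scalaSeq / $backAgain")
  }
}
{code}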



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33343) Fix the build with sbt to copy hadoop-client-runtime.jar

2020-11-04 Thread Kousuke Saruta (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kousuke Saruta updated SPARK-33343:
---
Description: 
With the current master, spark-shell doesn't work if it's built with sbt 
package.
This is because hadoop-client-runtime.jar isn't copied to 
assembly/target/scala-2.12/jars.
{code}
$ bin/spark-shell
Exception in thread "main" java.lang.NoClassDefFoundError: 
org/apache/hadoop/shaded/com/ctc/wstx/io/InputBootstrapper
at 
org.apache.spark.deploy.SparkHadoopUtil$.newConfiguration(SparkHadoopUtil.scala:426)
at 
org.apache.spark.deploy.SparkSubmit.$anonfun$prepareSubmitEnvironment$2(SparkSubmit.scala:342)
at scala.Option.getOrElse(Option.scala:189)
at 
org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:342)
at 
org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:877)
at 
org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
at 
org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1013)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1022)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: 
org.apache.hadoop.shaded.com.ctc.wstx.io.InputBootstrapper
at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
... 11 more
{code}

  was:
With the current master, spark-shell doesn't work if it's built with sbt.
It's due to hadoop-client-runtime.jar isn't copied to 
assembly/target/scala-2.12/jars.
{code}
$ bin/spark-shell
Exception in thread "main" java.lang.NoClassDefFoundError: 
org/apache/hadoop/shaded/com/ctc/wstx/io/InputBootstrapper
at 
org.apache.spark.deploy.SparkHadoopUtil$.newConfiguration(SparkHadoopUtil.scala:426)
at 
org.apache.spark.deploy.SparkSubmit.$anonfun$prepareSubmitEnvironment$2(SparkSubmit.scala:342)
at scala.Option.getOrElse(Option.scala:189)
at 
org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:342)
at 
org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:877)
at 
org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
at 
org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1013)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1022)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: 
org.apache.hadoop.shaded.com.ctc.wstx.io.InputBootstrapper
at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
... 11 more
{code}


> Fix the build with sbt to copy hadoop-client-runtime.jar
> 
>
> Key: SPARK-33343
> URL: https://issues.apache.org/jira/browse/SPARK-33343
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 3.1.0
>Reporter: Kousuke Saruta
>Assignee: Kousuke Saruta
>Priority: Major
>
> With the current master, spark-shell doesn't work if it's built with sbt 
> package.
> It's due to hadoop-client-runtime.jar isn't copied to 
> assembly/target/scala-2.12/jars.
> {code}
> $ bin/spark-shell
> Exception in thread "main" java.lang.NoClassDefFoundError: 
> org/apache/hadoop/shaded/com/ctc/wstx/io/InputBootstrapper
>   at 
> org.apache.spark.deploy.SparkHadoopUtil$.newConfiguration(SparkHadoopUtil.scala:426)
>   at 
> org.apache.spark.deploy.SparkSubmit.$anonfun$prepareSubmitEnvironment$2(SparkSubmit.scala:342)
>   at scala.Option.getOrElse(Option.scala:189)
>   at 
> org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:342)
>   at 
> org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:877)
>   at 
> org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
>   at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
>   at org.apache.spark.deploy

[jira] [Updated] (SPARK-33343) Fix the build with sbt to copy hadoop-client-runtime.jar

2020-11-04 Thread Kousuke Saruta (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kousuke Saruta updated SPARK-33343:
---
Priority: Major  (was: Critical)

> Fix the build with sbt to copy hadoop-client-runtime.jar
> 
>
> Key: SPARK-33343
> URL: https://issues.apache.org/jira/browse/SPARK-33343
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 3.1.0
>Reporter: Kousuke Saruta
>Assignee: Kousuke Saruta
>Priority: Major
>
> With the current master, spark-shell doesn't work if it's built with sbt.
> It's due to hadoop-client-runtime.jar isn't copied to 
> assembly/target/scala-2.12/jars.
> {code}
> $ bin/spark-shell
> Exception in thread "main" java.lang.NoClassDefFoundError: 
> org/apache/hadoop/shaded/com/ctc/wstx/io/InputBootstrapper
>   at 
> org.apache.spark.deploy.SparkHadoopUtil$.newConfiguration(SparkHadoopUtil.scala:426)
>   at 
> org.apache.spark.deploy.SparkSubmit.$anonfun$prepareSubmitEnvironment$2(SparkSubmit.scala:342)
>   at scala.Option.getOrElse(Option.scala:189)
>   at 
> org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:342)
>   at 
> org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:877)
>   at 
> org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
>   at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
>   at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
>   at 
> org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1013)
>   at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1022)
>   at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> Caused by: java.lang.ClassNotFoundException: 
> org.apache.hadoop.shaded.com.ctc.wstx.io.InputBootstrapper
>   at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
>   at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
>   ... 11 more
> {code}
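As an illustration of the general direction only (a generic sbt 1.x sketch, not 
Spark's actual SparkBuild.scala logic; the task name `copyRuntimeJars` is 
hypothetical), one way to make sure every runtime dependency jar, including 
hadoop-client-runtime-*.jar, lands in the directory that spark-shell's launcher 
scans:

{code:scala}
// build.sbt sketch (assumptions noted above)
lazy val copyRuntimeJars = taskKey[Unit]("Copy runtime dependency jars next to the assembly jars")

copyRuntimeJars := {
  val jarsDir = baseDirectory.value / "assembly" / "target" / "scala-2.12" / "jars"
  IO.createDirectory(jarsDir)
  (Runtime / dependencyClasspath).value
    .map(_.data)                        // Attributed[File] -> File
    .filter(_.getName.endsWith(".jar")) // keep only jars, e.g. hadoop-client-runtime-*.jar
    .foreach(jar => IO.copyFile(jar, jarsDir / jar.getName))
}
{code}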



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-33338) GROUP BY using literal map should not fail

2020-11-04 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-33338.
---
Fix Version/s: 2.4.8
   3.0.2
   3.1.0
   Resolution: Fixed

Issue resolved by pull request 30246
[https://github.com/apache/spark/pull/30246]

> GROUP BY using literal map should not fail
> --
>
> Key: SPARK-33338
> URL: https://issues.apache.org/jira/browse/SPARK-33338
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.2, 2.1.3, 2.2.3, 2.3.4, 2.4.7, 3.0.1, 3.1.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
> Fix For: 3.1.0, 3.0.2, 2.4.8
>
>
> Apache Spark 2.x ~ 3.0.1 raises `RuntimeException` for the following queries.
> *SQL*
> {code}
> CREATE TABLE t USING ORC AS SELECT map('k1', 'v1') m, 'k1' k
> SELECT map('k1', 'v1')[k] FROM t GROUP BY 1
> SELECT map('k1', 'v1')[k] FROM t GROUP BY map('k1', 'v1')[k]
> SELECT map('k1', 'v1')[k] a FROM t GROUP BY a
> {code}
> *ERROR*
> {code}
> Caused by: java.lang.RuntimeException: Couldn't find k#3 in [keys: [k1], 
> values: [v1][k#3]#6]
>   at scala.sys.package$.error(package.scala:27)
>   at 
> org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1$$anonfun$applyOrElse$1.apply(BoundAttribute.scala:85)
>   at 
> org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1$$anonfun$applyOrElse$1.apply(BoundAttribute.scala:79)
>   at 
> org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:52)
> {code}
> This is a regression from Apache Spark 1.6.x.
> {code}
> scala> sc.version
> res1: String = 1.6.3
> scala> sqlContext.sql("SELECT map('k1', 'v1')[k] FROM t GROUP BY map('k1', 
> 'v1')[k]").show
> +---+
> |_c0|
> +---+
> | v1|
> +---+
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33347) Clean up useless variables in MutableApplicationInfo

2020-11-04 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33347:


Assignee: (was: Apache Spark)

> Clean up useless variables in MutableApplicationInfo
> 
>
> Key: SPARK-33347
> URL: https://issues.apache.org/jira/browse/SPARK-33347
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.1.0
>Reporter: Yang Jie
>Priority: Major
>
>  
> {code:java}
> private class MutableApplicationInfo {
>   var id: String = null
>   var name: String = null
>   var coresGranted: Option[Int] = None
>   var maxCores: Option[Int] = None
>   var coresPerExecutor: Option[Int] = None
>   var memoryPerExecutorMB: Option[Int] = None
>   def toView(): ApplicationInfoWrapper = {
> val apiInfo = ApplicationInfo(id, name, coresGranted, maxCores, 
> coresPerExecutor,
>   memoryPerExecutorMB, Nil)
> new ApplicationInfoWrapper(apiInfo, List(attempt.toView()))
>   }
> }
> {code}
>  
> coresGranted, maxCores, coresPerExecutor and memoryPerExecutorMB always None 
> and never reassign



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33347) Clean up useless variables in MutableApplicationInfo

2020-11-04 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17226265#comment-17226265
 ] 

Apache Spark commented on SPARK-33347:
--

User 'LuciferYang' has created a pull request for this issue:
https://github.com/apache/spark/pull/30251

> Clean up useless variables in MutableApplicationInfo
> 
>
> Key: SPARK-33347
> URL: https://issues.apache.org/jira/browse/SPARK-33347
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.1.0
>Reporter: Yang Jie
>Priority: Major
>
>  
> {code:java}
> private class MutableApplicationInfo {
>   var id: String = null
>   var name: String = null
>   var coresGranted: Option[Int] = None
>   var maxCores: Option[Int] = None
>   var coresPerExecutor: Option[Int] = None
>   var memoryPerExecutorMB: Option[Int] = None
>   def toView(): ApplicationInfoWrapper = {
> val apiInfo = ApplicationInfo(id, name, coresGranted, maxCores, 
> coresPerExecutor,
>   memoryPerExecutorMB, Nil)
> new ApplicationInfoWrapper(apiInfo, List(attempt.toView()))
>   }
> }
> {code}
>  
> coresGranted, maxCores, coresPerExecutor and memoryPerExecutorMB always None 
> and never reassign



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33347) Clean up useless variables in MutableApplicationInfo

2020-11-04 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33347:


Assignee: Apache Spark

> Clean up useless variables in MutableApplicationInfo
> 
>
> Key: SPARK-33347
> URL: https://issues.apache.org/jira/browse/SPARK-33347
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.1.0
>Reporter: Yang Jie
>Assignee: Apache Spark
>Priority: Major
>
>  
> {code:java}
> private class MutableApplicationInfo {
>   var id: String = null
>   var name: String = null
>   var coresGranted: Option[Int] = None
>   var maxCores: Option[Int] = None
>   var coresPerExecutor: Option[Int] = None
>   var memoryPerExecutorMB: Option[Int] = None
>   def toView(): ApplicationInfoWrapper = {
> val apiInfo = ApplicationInfo(id, name, coresGranted, maxCores, 
> coresPerExecutor,
>   memoryPerExecutorMB, Nil)
> new ApplicationInfoWrapper(apiInfo, List(attempt.toView()))
>   }
> }
> {code}
>  
> coresGranted, maxCores, coresPerExecutor and memoryPerExecutorMB always None 
> and never reassign



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33347) Clean up useless variables in MutableApplicationInfo

2020-11-04 Thread Yang Jie (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Jie updated SPARK-33347:
-
Description: 
 
{code:java}
private class MutableApplicationInfo {
  var id: String = null
  var name: String = null
  var coresGranted: Option[Int] = None
  var maxCores: Option[Int] = None
  var coresPerExecutor: Option[Int] = None
  var memoryPerExecutorMB: Option[Int] = None

  def toView(): ApplicationInfoWrapper = {
val apiInfo = ApplicationInfo(id, name, coresGranted, maxCores, 
coresPerExecutor,
  memoryPerExecutorMB, Nil)
new ApplicationInfoWrapper(apiInfo, List(attempt.toView()))
  }

}
{code}
 

coresGranted, maxCores, coresPerExecutor and memoryPerExecutorMB are always None 
and are never reassigned.

  was:
 
{code:java}
private class MutableApplicationInfo {
  var id: String = null
  var name: String = null
  var coresGranted: Option[Int] = None
  var maxCores: Option[Int] = None
  var coresPerExecutor: Option[Int] = None
  var memoryPerExecutorMB: Option[Int] = None

  def toView(): ApplicationInfoWrapper = {
val apiInfo = ApplicationInfo(id, name, coresGranted, maxCores, 
coresPerExecutor,
  memoryPerExecutorMB, Nil)
new ApplicationInfoWrapper(apiInfo, List(attempt.toView()))
  }

}
{code}
 

coresGranted, maxCores, coresPerExecutor and memoryPerExecutorMB always None 
and no place to reassign


> Clean up useless variables in MutableApplicationInfo
> 
>
> Key: SPARK-33347
> URL: https://issues.apache.org/jira/browse/SPARK-33347
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.1.0
>Reporter: Yang Jie
>Priority: Major
>
>  
> {code:java}
> private class MutableApplicationInfo {
>   var id: String = null
>   var name: String = null
>   var coresGranted: Option[Int] = None
>   var maxCores: Option[Int] = None
>   var coresPerExecutor: Option[Int] = None
>   var memoryPerExecutorMB: Option[Int] = None
>   def toView(): ApplicationInfoWrapper = {
> val apiInfo = ApplicationInfo(id, name, coresGranted, maxCores, 
> coresPerExecutor,
>   memoryPerExecutorMB, Nil)
> new ApplicationInfoWrapper(apiInfo, List(attempt.toView()))
>   }
> }
> {code}
>  
> coresGranted, maxCores, coresPerExecutor and memoryPerExecutorMB always None 
> and never reassign



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33347) Clean up useless variables in MutableApplicationInfo

2020-11-04 Thread Yang Jie (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Jie updated SPARK-33347:
-
Description: 
 
{code:java}
private class MutableApplicationInfo {
  var id: String = null
  var name: String = null
  var coresGranted: Option[Int] = None
  var maxCores: Option[Int] = None
  var coresPerExecutor: Option[Int] = None
  var memoryPerExecutorMB: Option[Int] = None

  def toView(): ApplicationInfoWrapper = {
val apiInfo = ApplicationInfo(id, name, coresGranted, maxCores, 
coresPerExecutor,
  memoryPerExecutorMB, Nil)
new ApplicationInfoWrapper(apiInfo, List(attempt.toView()))
  }

}
{code}
 

coresGranted, maxCores, coresPerExecutor and memoryPerExecutorMB always None 
and no place to reassign

  was:
 
{code:java}
private class MutableApplicationInfo {
  var id: String = null
  var name: String = null
  var coresGranted: Option[Int] = None
  var maxCores: Option[Int] = None
  var coresPerExecutor: Option[Int] = None
  var memoryPerExecutorMB: Option[Int] = None

  def toView(): ApplicationInfoWrapper = {
val apiInfo = ApplicationInfo(id, name, coresGranted, maxCores, 
coresPerExecutor,
  memoryPerExecutorMB, Nil)
new ApplicationInfoWrapper(apiInfo, List(attempt.toView()))
  }

}
{code}
 

coresGranted, maxCores, coresPerExecutor and memoryPerExecutorMB always None no 
place to reassign


> Clean up useless variables in MutableApplicationInfo
> 
>
> Key: SPARK-33347
> URL: https://issues.apache.org/jira/browse/SPARK-33347
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.1.0
>Reporter: Yang Jie
>Priority: Major
>
>  
> {code:java}
> private class MutableApplicationInfo {
>   var id: String = null
>   var name: String = null
>   var coresGranted: Option[Int] = None
>   var maxCores: Option[Int] = None
>   var coresPerExecutor: Option[Int] = None
>   var memoryPerExecutorMB: Option[Int] = None
>   def toView(): ApplicationInfoWrapper = {
> val apiInfo = ApplicationInfo(id, name, coresGranted, maxCores, 
> coresPerExecutor,
>   memoryPerExecutorMB, Nil)
> new ApplicationInfoWrapper(apiInfo, List(attempt.toView()))
>   }
> }
> {code}
>  
> coresGranted, maxCores, coresPerExecutor and memoryPerExecutorMB always None 
> and no place to reassign



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33347) Clean up useless variables in MutableApplicationInfo

2020-11-04 Thread Yang Jie (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Jie updated SPARK-33347:
-
Description: 
 
{code:java}
private class MutableApplicationInfo {
  var id: String = null
  var name: String = null
  var coresGranted: Option[Int] = None
  var maxCores: Option[Int] = None
  var coresPerExecutor: Option[Int] = None
  var memoryPerExecutorMB: Option[Int] = None

  def toView(): ApplicationInfoWrapper = {
val apiInfo = ApplicationInfo(id, name, coresGranted, maxCores, 
coresPerExecutor,
  memoryPerExecutorMB, Nil)
new ApplicationInfoWrapper(apiInfo, List(attempt.toView()))
  }

}
{code}
 

coresGranted, maxCores, coresPerExecutor and memoryPerExecutorMB always None no 
place to reassign

  was:
 
{code:java}
private class MutableApplicationInfo {
  var id: String = null
  var name: String = null
  var coresGranted: Option[Int] = None
  var maxCores: Option[Int] = None
  var coresPerExecutor: Option[Int] = None
  var memoryPerExecutorMB: Option[Int] = None

  def toView(): ApplicationInfoWrapper = {
val apiInfo = ApplicationInfo(id, name, coresGranted, maxCores, 
coresPerExecutor,
  memoryPerExecutorMB, Nil)
new ApplicationInfoWrapper(apiInfo, List(attempt.toView()))
  }

}
{code}
 

coresGranted, maxCores, coresPerExecutor and memoryPerExecutorMB alway None no 
place to reassign


> Clean up useless variables in MutableApplicationInfo
> 
>
> Key: SPARK-33347
> URL: https://issues.apache.org/jira/browse/SPARK-33347
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.1.0
>Reporter: Yang Jie
>Priority: Major
>
>  
> {code:java}
> private class MutableApplicationInfo {
>   var id: String = null
>   var name: String = null
>   var coresGranted: Option[Int] = None
>   var maxCores: Option[Int] = None
>   var coresPerExecutor: Option[Int] = None
>   var memoryPerExecutorMB: Option[Int] = None
>   def toView(): ApplicationInfoWrapper = {
> val apiInfo = ApplicationInfo(id, name, coresGranted, maxCores, 
> coresPerExecutor,
>   memoryPerExecutorMB, Nil)
> new ApplicationInfoWrapper(apiInfo, List(attempt.toView()))
>   }
> }
> {code}
>  
> coresGranted, maxCores, coresPerExecutor and memoryPerExecutorMB always None 
> no place to reassign



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-33347) Clean up useless variables in MutableApplicationInfo

2020-11-04 Thread Yang Jie (Jira)
Yang Jie created SPARK-33347:


 Summary: Clean up useless variables in MutableApplicationInfo
 Key: SPARK-33347
 URL: https://issues.apache.org/jira/browse/SPARK-33347
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 3.1.0
Reporter: Yang Jie


 
{code:java}
private class MutableApplicationInfo {
  var id: String = null
  var name: String = null
  var coresGranted: Option[Int] = None
  var maxCores: Option[Int] = None
  var coresPerExecutor: Option[Int] = None
  var memoryPerExecutorMB: Option[Int] = None

  def toView(): ApplicationInfoWrapper = {
val apiInfo = ApplicationInfo(id, name, coresGranted, maxCores, 
coresPerExecutor,
  memoryPerExecutorMB, Nil)
new ApplicationInfoWrapper(apiInfo, List(attempt.toView()))
  }

}
{code}
 

coresGranted, maxCores, coresPerExecutor and memoryPerExecutorMB are always None 
and are never reassigned anywhere.
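A possible shape of the cleanup, as a sketch only (`attempt`, `ApplicationInfo` 
and `ApplicationInfoWrapper` are assumed to come from the surrounding listener 
code, exactly as in the snippet above):

{code:scala}
private class MutableApplicationInfo {
  var id: String = null
  var name: String = null

  def toView(): ApplicationInfoWrapper = {
    // the four Option fields were always None, so pass None directly
    val apiInfo = ApplicationInfo(id, name, None, None, None, None, Nil)
    new ApplicationInfoWrapper(apiInfo, List(attempt.toView()))
  }
}
{code}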



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33346) Change the never changed var to val

2020-11-04 Thread Yang Jie (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17226255#comment-17226255
 ] 

Yang Jie commented on SPARK-33346:
--

The case above still can't be declared as a val, and it also throws an illegal 
access error at runtime now.

> Change the never changed var to val
> ---
>
> Key: SPARK-33346
> URL: https://issues.apache.org/jira/browse/SPARK-33346
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core, SQL
>Affects Versions: 3.1.0
>Reporter: Yang Jie
>Priority: Minor
>
> Some local variables are declared as "var", but they are never reassigned and 
> should be declared as "val".
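An illustrative example of the kind of change being proposed (not taken from 
the Spark code base):

{code:scala}
def buildLabel(parts: Seq[String]): String = {
  // before: `var sep = ","` even though sep is never reassigned
  val sep = "," // after: a val expresses the intent without changing behavior
  parts.mkString(sep)
}
{code}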



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33285) Too many "Auto-application to `()` is deprecated." related compilation warnings

2020-11-04 Thread Guillaume Martres (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17226251#comment-17226251
 ] 

Guillaume Martres commented on SPARK-33285:
---

Note that Scala 2.13 has a configurable warning mechanism, making it possible 
to hide some warnings: [https://github.com/scala/scala/pull/8373]. This can be 
combined with {{-Xfatal-warnings}} to enforce a warning-free build without 
actually fixing all warnings.
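A minimal sbt sketch of that combination, assuming Scala 2.13.2+ where 
{{-Wconf}} is available (the message filter below is only illustrative):

{code:scala}
// build.sbt sketch: keep -Xfatal-warnings but silence only the auto-application deprecation
scalacOptions ++= Seq(
  "-Xfatal-warnings",
  "-Wconf:msg=Auto-application:s" // 's' silences warnings whose message matches the regex
)
{code}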

> Too many "Auto-application to `()` is deprecated."  related compilation 
> warnings
> 
>
> Key: SPARK-33285
> URL: https://issues.apache.org/jira/browse/SPARK-33285
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build
>Affects Versions: 3.1.0
>Reporter: Yang Jie
>Priority: Minor
>
> There are too many  "Auto-application to `()` is deprecated." related 
> compilation warnings when compiling with Scala 2.13, such as
> {code:java}
> [WARNING] [Warn] 
> /spark-src/core/src/test/scala/org/apache/spark/PartitioningSuite.scala:246: 
> Auto-application to `()` is deprecated. Supply the empty argument list `()` 
> explicitly to invoke method stdev,
> or remove the empty argument list from its definition (Java-defined methods 
> are exempt).
> In Scala 3, an unapplied method like this will be eta-expanded into a 
> function.
> {code}
> A lot of them, but it's easy to fix.
> If there is a definition as follows:
> {code:java}
> class Foo {
>def bar(): Unit = {}
> }
> val foo = new Foo{code}
> Should be
> {code:java}
> foo.bar()
> {code}
> not
> {code:java}
> foo.bar {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33285) Too many "Auto-application to `()` is deprecated." related compilation warnings

2020-11-04 Thread Yang Jie (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17226250#comment-17226250
 ] 

Yang Jie commented on SPARK-33285:
--

{quote}Replacing {{'foo}} by {{Symbol("foo")}} will get rid of the warning and 
is compatible with all Scala versions.
{quote}
 

[~smarter] , You're right. :)
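For illustration, a minimal sketch of the replacement mentioned above:

{code:scala}
object SymbolMigration {
  val oldStyle = 'assertInvariants          // symbol literal, deprecated in Scala 2.13
  val newStyle = Symbol("assertInvariants") // compiles cleanly on both 2.12 and 2.13
}
{code}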

> Too many "Auto-application to `()` is deprecated."  related compilation 
> warnings
> 
>
> Key: SPARK-33285
> URL: https://issues.apache.org/jira/browse/SPARK-33285
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build
>Affects Versions: 3.1.0
>Reporter: Yang Jie
>Priority: Minor
>
> There are too many  "Auto-application to `()` is deprecated." related 
> compilation warnings when compiling with Scala 2.13, such as
> {code:java}
> [WARNING] [Warn] 
> /spark-src/core/src/test/scala/org/apache/spark/PartitioningSuite.scala:246: 
> Auto-application to `()` is deprecated. Supply the empty argument list `()` 
> explicitly to invoke method stdev,
> or remove the empty argument list from its definition (Java-defined methods 
> are exempt).
> In Scala 3, an unapplied method like this will be eta-expanded into a 
> function.
> {code}
> A lot of them, but it's easy to fix.
> If there is a definition as follows:
> {code:java}
> class Foo {
>def bar(): Unit = {}
> }
> val foo = new Foo{code}
> Should be
> {code:java}
> foo.bar()
> {code}
> not
> {code:java}
> foo.bar {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33285) Too many "Auto-application to `()` is deprecated." related compilation warnings

2020-11-04 Thread Yang Jie (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17226248#comment-17226248
 ] 

Yang Jie commented on SPARK-33285:
--

[~srowen] Yes, the "Auto-application to `()` is deprecated." warnings will cover 
up other warnings because there are so many of them, but some of those other 
warnings are already known, and I've added other JIRAs for them.

> Too many "Auto-application to `()` is deprecated."  related compilation 
> warnings
> 
>
> Key: SPARK-33285
> URL: https://issues.apache.org/jira/browse/SPARK-33285
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build
>Affects Versions: 3.1.0
>Reporter: Yang Jie
>Priority: Minor
>
> There are too many  "Auto-application to `()` is deprecated." related 
> compilation warnings when compiling with Scala 2.13, such as
> {code:java}
> [WARNING] [Warn] 
> /spark-src/core/src/test/scala/org/apache/spark/PartitioningSuite.scala:246: 
> Auto-application to `()` is deprecated. Supply the empty argument list `()` 
> explicitly to invoke method stdev,
> or remove the empty argument list from its definition (Java-defined methods 
> are exempt).
> In Scala 3, an unapplied method like this will be eta-expanded into a 
> function.
> {code}
> A lot of them, but it's easy to fix.
> If there is a definition as follows:
> {code:java}
> class Foo {
>def bar(): Unit = {}
> }
> val foo = new Foo{code}
> Should be
> {code:java}
> foo.bar()
> {code}
> not
> {code:java}
> foo.bar {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-29392) Remove use of deprecated symbol literal " 'name " syntax in favor Symbol("name")

2020-11-04 Thread Yang Jie (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17226244#comment-17226244
 ] 

Yang Jie commented on SPARK-29392:
--

OK

> Remove use of deprecated symbol literal " 'name " syntax in favor 
> Symbol("name")
> 
>
> Key: SPARK-29392
> URL: https://issues.apache.org/jira/browse/SPARK-29392
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core, SQL, Tests
>Affects Versions: 3.0.0
>Reporter: Sean R. Owen
>Assignee: Sean R. Owen
>Priority: Minor
> Fix For: 3.0.0
>
>
> Example:
> {code}
> [WARNING] [Warn] 
> /Users/seanowen/Documents/spark_2.13/core/src/test/scala/org/apache/spark/memory/UnifiedMemoryManagerSuite.scala:308:
>  symbol literal is deprecated; use Symbol("assertInvariants") instead
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-33341) Remove unnecessary semicolons

2020-11-04 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-33341.
--
Resolution: Won't Fix

> Remove unnecessary semicolons
> -
>
> Key: SPARK-33341
> URL: https://issues.apache.org/jira/browse/SPARK-33341
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core, SQL
>Affects Versions: 3.1.0
>Reporter: Yang Jie
>Priority: Minor
>
> There are some unnecessary semicolons in the Spark code that Scala doesn't 
> need. To unify the style, we should remove them.
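> For illustration, a minimal before/after sketch (a hypothetical snippet, not 
> taken from the Spark code base):
> {code:scala}
> // before: the trailing semicolons compile but are redundant in Scala
> def withSemicolons(): Unit = {
>   val count = 10;
>   println(count);
> }
> // after: same behaviour, unified style
> def withoutSemicolons(): Unit = {
>   val count = 10
>   println(count)
> }
> {code}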



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-29392) Remove use of deprecated symbol literal " 'name " syntax in favor Symbol("name")

2020-11-04 Thread Sean R. Owen (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17226230#comment-17226230
 ] 

Sean R. Owen commented on SPARK-29392:
--

For this one, we already started it, I suppose, so it's OK to finish.

> Remove use of deprecated symbol literal " 'name " syntax in favor 
> Symbol("name")
> 
>
> Key: SPARK-29392
> URL: https://issues.apache.org/jira/browse/SPARK-29392
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core, SQL, Tests
>Affects Versions: 3.0.0
>Reporter: Sean R. Owen
>Assignee: Sean R. Owen
>Priority: Minor
> Fix For: 3.0.0
>
>
> Example:
> {code}
> [WARNING] [Warn] 
> /Users/seanowen/Documents/spark_2.13/core/src/test/scala/org/apache/spark/memory/UnifiedMemoryManagerSuite.scala:308:
>  symbol literal is deprecated; use Symbol("assertInvariants") instead
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-33346) Change the never changed var to val

2020-11-04 Thread Yang Jie (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17226228#comment-17226228
 ] 

Yang Jie edited comment on SPARK-33346 at 11/4/20, 3:25 PM:


There is some existing code with a comment on it, as follows:
{code:java}
// They also should have been val's. We use var's because there is a Scala 
compiler bug that
// would throw illegal access error at runtime if they are declared as val's.
protected var grow = (newCapacity: Int) => {
  _oldValues = _values
  _values = new Array[V](newCapacity)
}

protected var move = (oldPos: Int, newPos: Int) => {
  _values(newPos) = _oldValues(oldPos)
}
{code}
We need to test whether the current Scala versions (2.12 and 2.13) still have 
this problem; no additional information was found in the original PR.
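For reference, a minimal self-contained sketch of the {{val}} form that would 
need to be verified against both Scala 2.12 and 2.13 (the class and field names 
below are simplified stand-ins for the quoted Spark code, not the actual 
OpenHashMap implementation):
{code:scala}
import scala.reflect.ClassTag

// Simplified stand-in for the class holding the quoted grow/move fields.
class GrowableValues[V: ClassTag](initialCapacity: Int) {
  private var _values = new Array[V](initialCapacity)
  private var _oldValues: Array[V] = _values

  // Declared as vals here; the open question is whether the
  // "illegal access error at runtime" mentioned in the comment still occurs.
  protected val grow = (newCapacity: Int) => {
    _oldValues = _values
    _values = new Array[V](newCapacity)
  }

  protected val move = (oldPos: Int, newPos: Int) => {
    _values(newPos) = _oldValues(oldPos)
  }
}
{code}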


was (Author: luciferyang):
There are some code and comments as follow:
{code:java}
// They also should have been val's. We use var's because there is a Scala 
compiler bug that
// would throw illegal access error at runtime if they are declared as val's.
protected var grow = (newCapacity: Int) => {
  _oldValues = _values
  _values = new Array[V](newCapacity)
}

protected var move = (oldPos: Int, newPos: Int) => {
  _values(newPos) = _oldValues(oldPos)
}
{code}
Need to test whether the current version of Scala(2.12 & 2.13) still has this 
problem

> Change the never changed var to val
> ---
>
> Key: SPARK-33346
> URL: https://issues.apache.org/jira/browse/SPARK-33346
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core, SQL
>Affects Versions: 3.1.0
>Reporter: Yang Jie
>Priority: Minor
>
> Some local variables are declared as "var", but they are never reassigned and 
> should be declared as "val".



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33346) Change the never changed var to val

2020-11-04 Thread Yang Jie (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17226228#comment-17226228
 ] 

Yang Jie commented on SPARK-33346:
--

There is some existing code with a comment on it, as follows:
{code:java}
// They also should have been val's. We use var's because there is a Scala 
compiler bug that
// would throw illegal access error at runtime if they are declared as val's.
protected var grow = (newCapacity: Int) => {
  _oldValues = _values
  _values = new Array[V](newCapacity)
}

protected var move = (oldPos: Int, newPos: Int) => {
  _values(newPos) = _oldValues(oldPos)
}
{code}
We need to test whether the current Scala versions (2.12 and 2.13) still have 
this problem.

> Change the never changed var to val
> ---
>
> Key: SPARK-33346
> URL: https://issues.apache.org/jira/browse/SPARK-33346
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core, SQL
>Affects Versions: 3.1.0
>Reporter: Yang Jie
>Priority: Minor
>
> Some local variables are declared as "var", but they are never reassigned and 
> should be declared as "val".



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33285) Too many "Auto-application to `()` is deprecated." related compilation warnings

2020-11-04 Thread Sean R. Owen (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17226189#comment-17226189
 ] 

Sean R. Owen commented on SPARK-33285:
--

I think it's OK to fix the Symbol issue (we already started that; it's 
separate). For this one, it's such a big change right now that I'm neutral. If 
it's making it hard to spot other real warnings that need fixing, then maybe.

> Too many "Auto-application to `()` is deprecated."  related compilation 
> warnings
> 
>
> Key: SPARK-33285
> URL: https://issues.apache.org/jira/browse/SPARK-33285
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build
>Affects Versions: 3.1.0
>Reporter: Yang Jie
>Priority: Minor
>
> There are too many "Auto-application to `()` is deprecated." related 
> compilation warnings when compiling with Scala 2.13, for example:
> {code:java}
> [WARNING] [Warn] 
> /spark-src/core/src/test/scala/org/apache/spark/PartitioningSuite.scala:246: 
> Auto-application to `()` is deprecated. Supply the empty argument list `()` 
> explicitly to invoke method stdev,
> or remove the empty argument list from its definition (Java-defined methods 
> are exempt).
> In Scala 3, an unapplied method like this will be eta-expanded into a 
> function.
> {code}
> There are a lot of them, but they are easy to fix.
> If there is a definition as follows:
> {code:java}
> class Foo {
>   def bar(): Unit = {}
> }
> val foo = new Foo
> {code}
> the call should be
> {code:java}
> foo.bar()
> {code}
> not
> {code:java}
> foo.bar
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-33346) Change the never changed var to val

2020-11-04 Thread Yang Jie (Jira)
Yang Jie created SPARK-33346:


 Summary: Change the never changed var to val
 Key: SPARK-33346
 URL: https://issues.apache.org/jira/browse/SPARK-33346
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core, SQL
Affects Versions: 3.1.0
Reporter: Yang Jie


Some local variables are declared as "var", but they are never reassigned and 
should be declared as "val".
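For illustration, a minimal sketch of the intended kind of change (a 
hypothetical example, not taken from the Spark code base):
{code:scala}
def sumOfSquares(xs: Seq[Int]): Int = {
  // before: declared as a var even though it is never reassigned
  // var squares = xs.map(x => x * x)

  // after: val states the intent and lets the compiler enforce it
  val squares = xs.map(x => x * x)
  squares.sum
}
{code}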



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-33345) Batch fix compilation warnings about "Widening conversion from XXX to XXX is deprecated"

2020-11-04 Thread Yang Jie (Jira)
Yang Jie created SPARK-33345:


 Summary:  Batch fix compilation warnings about "Widening 
conversion from XXX to XXX is deprecated"
 Key: SPARK-33345
 URL: https://issues.apache.org/jira/browse/SPARK-33345
 Project: Spark
  Issue Type: Sub-task
  Components: Build
Affects Versions: 3.1.0
Reporter: Yang Jie


There is a batch of compilation warnings in Scala 2.13 as follows:
{code:java}
[WARNING] [Warn] 
/spark/core/src/main/scala/org/apache/spark/input/FixedLengthBinaryInputFormat.scala:77:
 Widening conversion from Long to Double is deprecated because it loses 
precision. Write `.toDouble` instead.
{code}
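For illustration, a minimal sketch of the kind of fix the warning asks for (the 
values below are hypothetical, not the FixedLengthBinaryInputFormat code 
itself):
{code:scala}
// large enough that the value cannot be represented exactly as a Double
val bytesRead: Long = 9007199254740993L

// before: implicit widening from Long to Double, deprecated in Scala 2.13
// val ratio: Double = bytesRead / 1.5

// after: make the conversion explicit, as the warning suggests
val ratio: Double = bytesRead.toDouble / 1.5
{code}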



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-33344) Fix compilation warnings of "multiarg infix syntax looks like a tuple and will be deprecated" in Scala 2.13

2020-11-04 Thread Yang Jie (Jira)
Yang Jie created SPARK-33344:


 Summary: Fix compilation warnings of "multiarg infix syntax looks 
like a tuple and will be deprecated" in Scala 2.13
 Key: SPARK-33344
 URL: https://issues.apache.org/jira/browse/SPARK-33344
 Project: Spark
  Issue Type: Sub-task
  Components: Build
Affects Versions: 3.1.0
Reporter: Yang Jie


There is a batch of compilation warnings in Scala 2.13 as follows:
{code:java}
[WARNING] [Warn] 
/spark/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala:656: 
multiarg infix syntax looks like a tuple and will be deprecated
{code}
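For illustration, a minimal sketch of the pattern behind this warning and two 
non-deprecated rewrites (the buffer and argument names are hypothetical, 
modelled on the typical {{buffer += (a, b)}} usage rather than the actual 
SparkSubmit code):
{code:scala}
import scala.collection.mutable.ArrayBuffer

val childArgs = new ArrayBuffer[String]()

// multiarg infix syntax: parsed as a single call with two arguments,
// but it looks like a tuple and is deprecated in Scala 2.13
// childArgs += ("--class", "org.example.Main")

// equivalent, non-deprecated alternatives
childArgs += "--class" += "org.example.Main"
childArgs ++= Seq("--class", "org.example.Main")
{code}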



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33343) Fix the build with sbt to copy hadoop-client-runtime.jar

2020-11-04 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17226080#comment-17226080
 ] 

Apache Spark commented on SPARK-33343:
--

User 'sarutak' has created a pull request for this issue:
https://github.com/apache/spark/pull/30250

> Fix the build with sbt to copy hadoop-client-runtime.jar
> 
>
> Key: SPARK-33343
> URL: https://issues.apache.org/jira/browse/SPARK-33343
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 3.1.0
>Reporter: Kousuke Saruta
>Assignee: Kousuke Saruta
>Priority: Critical
>
> With the current master, spark-shell doesn't work if it's built with sbt.
> This is because hadoop-client-runtime.jar isn't copied to 
> assembly/target/scala-2.12/jars.
> {code}
> $ bin/spark-shell
> Exception in thread "main" java.lang.NoClassDefFoundError: 
> org/apache/hadoop/shaded/com/ctc/wstx/io/InputBootstrapper
>   at 
> org.apache.spark.deploy.SparkHadoopUtil$.newConfiguration(SparkHadoopUtil.scala:426)
>   at 
> org.apache.spark.deploy.SparkSubmit.$anonfun$prepareSubmitEnvironment$2(SparkSubmit.scala:342)
>   at scala.Option.getOrElse(Option.scala:189)
>   at 
> org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:342)
>   at 
> org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:877)
>   at 
> org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
>   at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
>   at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
>   at 
> org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1013)
>   at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1022)
>   at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> Caused by: java.lang.ClassNotFoundException: 
> org.apache.hadoop.shaded.com.ctc.wstx.io.InputBootstrapper
>   at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
>   at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
>   ... 11 more
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33343) Fix the build with sbt to copy hadoop-client-runtime.jar

2020-11-04 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33343:


Assignee: Apache Spark  (was: Kousuke Saruta)

> Fix the build with sbt to copy hadoop-client-runtime.jar
> 
>
> Key: SPARK-33343
> URL: https://issues.apache.org/jira/browse/SPARK-33343
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 3.1.0
>Reporter: Kousuke Saruta
>Assignee: Apache Spark
>Priority: Critical
>
> With the current master, spark-shell doesn't work if it's built with sbt.
> This is because hadoop-client-runtime.jar isn't copied to 
> assembly/target/scala-2.12/jars.
> {code}
> $ bin/spark-shell
> Exception in thread "main" java.lang.NoClassDefFoundError: 
> org/apache/hadoop/shaded/com/ctc/wstx/io/InputBootstrapper
>   at 
> org.apache.spark.deploy.SparkHadoopUtil$.newConfiguration(SparkHadoopUtil.scala:426)
>   at 
> org.apache.spark.deploy.SparkSubmit.$anonfun$prepareSubmitEnvironment$2(SparkSubmit.scala:342)
>   at scala.Option.getOrElse(Option.scala:189)
>   at 
> org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:342)
>   at 
> org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:877)
>   at 
> org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
>   at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
>   at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
>   at 
> org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1013)
>   at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1022)
>   at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> Caused by: java.lang.ClassNotFoundException: 
> org.apache.hadoop.shaded.com.ctc.wstx.io.InputBootstrapper
>   at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
>   at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
>   ... 11 more
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33343) Fix the build with sbt to copy hadoop-client-runtime.jar

2020-11-04 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33343:


Assignee: Kousuke Saruta  (was: Apache Spark)

> Fix the build with sbt to copy hadoop-client-runtime.jar
> 
>
> Key: SPARK-33343
> URL: https://issues.apache.org/jira/browse/SPARK-33343
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 3.1.0
>Reporter: Kousuke Saruta
>Assignee: Kousuke Saruta
>Priority: Critical
>
> With the current master, spark-shell doesn't work if it's built with sbt.
> This is because hadoop-client-runtime.jar isn't copied to 
> assembly/target/scala-2.12/jars.
> {code}
> $ bin/spark-shell
> Exception in thread "main" java.lang.NoClassDefFoundError: 
> org/apache/hadoop/shaded/com/ctc/wstx/io/InputBootstrapper
>   at 
> org.apache.spark.deploy.SparkHadoopUtil$.newConfiguration(SparkHadoopUtil.scala:426)
>   at 
> org.apache.spark.deploy.SparkSubmit.$anonfun$prepareSubmitEnvironment$2(SparkSubmit.scala:342)
>   at scala.Option.getOrElse(Option.scala:189)
>   at 
> org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:342)
>   at 
> org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:877)
>   at 
> org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
>   at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
>   at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
>   at 
> org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1013)
>   at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1022)
>   at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> Caused by: java.lang.ClassNotFoundException: 
> org.apache.hadoop.shaded.com.ctc.wstx.io.InputBootstrapper
>   at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
>   at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
>   ... 11 more
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33343) Fix the build with sbt to copy hadoop-client-runtime.jar

2020-11-04 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17226079#comment-17226079
 ] 

Apache Spark commented on SPARK-33343:
--

User 'sarutak' has created a pull request for this issue:
https://github.com/apache/spark/pull/30250

> Fix the build with sbt to copy hadoop-client-runtime.jar
> 
>
> Key: SPARK-33343
> URL: https://issues.apache.org/jira/browse/SPARK-33343
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 3.1.0
>Reporter: Kousuke Saruta
>Assignee: Kousuke Saruta
>Priority: Critical
>
> With the current master, spark-shell doesn't work if it's built with sbt.
> This is because hadoop-client-runtime.jar isn't copied to 
> assembly/target/scala-2.12/jars.
> {code}
> $ bin/spark-shell
> Exception in thread "main" java.lang.NoClassDefFoundError: 
> org/apache/hadoop/shaded/com/ctc/wstx/io/InputBootstrapper
>   at 
> org.apache.spark.deploy.SparkHadoopUtil$.newConfiguration(SparkHadoopUtil.scala:426)
>   at 
> org.apache.spark.deploy.SparkSubmit.$anonfun$prepareSubmitEnvironment$2(SparkSubmit.scala:342)
>   at scala.Option.getOrElse(Option.scala:189)
>   at 
> org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:342)
>   at 
> org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:877)
>   at 
> org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
>   at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
>   at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
>   at 
> org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1013)
>   at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1022)
>   at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> Caused by: java.lang.ClassNotFoundException: 
> org.apache.hadoop.shaded.com.ctc.wstx.io.InputBootstrapper
>   at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
>   at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
>   ... 11 more
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-33343) Fix the build with sbt to copy hadoop-client-runtime.jar

2020-11-04 Thread Kousuke Saruta (Jira)
Kousuke Saruta created SPARK-33343:
--

 Summary: Fix the build with sbt to copy hadoop-client-runtime.jar
 Key: SPARK-33343
 URL: https://issues.apache.org/jira/browse/SPARK-33343
 Project: Spark
  Issue Type: Bug
  Components: Build
Affects Versions: 3.1.0
Reporter: Kousuke Saruta
Assignee: Kousuke Saruta


With the current master, spark-shell doesn't work if it's built with sbt.
This is because hadoop-client-runtime.jar isn't copied to 
assembly/target/scala-2.12/jars.
{code}
$ bin/spark-shell
Exception in thread "main" java.lang.NoClassDefFoundError: 
org/apache/hadoop/shaded/com/ctc/wstx/io/InputBootstrapper
at 
org.apache.spark.deploy.SparkHadoopUtil$.newConfiguration(SparkHadoopUtil.scala:426)
at 
org.apache.spark.deploy.SparkSubmit.$anonfun$prepareSubmitEnvironment$2(SparkSubmit.scala:342)
at scala.Option.getOrElse(Option.scala:189)
at 
org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:342)
at 
org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:877)
at 
org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
at 
org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1013)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1022)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: 
org.apache.hadoop.shaded.com.ctc.wstx.io.InputBootstrapper
at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
... 11 more
{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-23086) Spark SQL cannot support high concurrency for lock in HiveMetastoreCatalog

2020-11-04 Thread gaofeng (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-23086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17226069#comment-17226069
 ] 

gaofeng commented on SPARK-23086:
-

I also encountered this problem in a production environment. It's an emergency; 
please help. Thank you very much! :)

> Spark SQL cannot support high concurrency for lock in HiveMetastoreCatalog
> --
>
> Key: SPARK-23086
> URL: https://issues.apache.org/jira/browse/SPARK-23086
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.1.1
> Environment: * Spark 2.2.1
>Reporter: pin_zhang
>Priority: Major
>  Labels: bulk-closed
>
> * Hive metastore is mysql
> * Set hive.server2.thrift.max.worker.threads=500
> create table test (id string ) partitioned by (index int) stored as  
> parquet;
> insert into test  partition (index=1) values('id1');
>  * 100 Clients run SQL“select * from table” on table
>  * Many clients (97%) blocked at HiveExternalCatalog.withClient
>  * Is synchronization expected when we only run queries against tables?
> "pool-21-thread-65" #1178 prio=5 os_prio=0 tid=0x2aaac8e06800 nid=0x1e70 
> waiting for monitor entry [0x4e19a000]
>java.lang.Thread.State: BLOCKED (on object monitor)
>   at 
> org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:97)
>   - waiting to lock <0xc06a3ba8> (a 
> org.apache.spark.sql.hive.HiveExternalCatalog)
>   at 
> org.apache.spark.sql.hive.HiveExternalCatalog.getTable(HiveExternalCatalog.scala:674)
>   at 
> org.apache.spark.sql.catalyst.catalog.SessionCatalog.lookupRelation(SessionCatalog.scala:667)
>   - locked <0xc41ab748> (a 
> org.apache.spark.sql.hive.HiveSessionCatalog)
>   at 
> org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.org$apache$spark$sql$catalyst$analysis$Analyzer$ResolveRelations$$lookupTableFromCatalog(Analyzer.scala:646)
>   at 
> org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.resolveRelation(Analyzer.scala:601)
>   at 
> org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$$anonfun$apply$8.applyOrElse(Analyzer.scala:631)
>   at 
> org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$$anonfun$apply$8.applyOrElse(Analyzer.scala:624)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$resolveOperators$1.apply(LogicalPlan.scala:62)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$resolveOperators$1.apply(LogicalPlan.scala:62)
>   at 
> org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperators(LogicalPlan.scala:61)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$1.apply(LogicalPlan.scala:59)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$1.apply(LogicalPlan.scala:59)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperators(LogicalPlan.scala:59)
>   at 
> org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.apply(Analyzer.scala:624)
>   at 
> org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.apply(Analyzer.scala:570)
>   at 
> org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$1.apply(RuleExecutor.scala:85)
>   at 
> org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$1.apply(RuleExecutor.scala:82)
>   at 
> scala.collection.LinearSeqOptimized$class.foldLeft(LinearSeqOptimized.scala:124)
>   at scala.collection.immutable.List.foldLeft(List.scala:84)
>   at 
> org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1.apply(RuleExecutor.scala:82)
>   at 
> org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1.apply(RuleExecutor.scala:74)
>   at scala.collection.immutable.List.foreach(List.scala:381)
>   at 
> org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:74)
>   at 
> org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:69)
>   - locked <0xff491c48> (a 
> org.apache.spark.sql.execution.QueryExecution)
>   at 
> org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:67)
>   at 
> org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryE

[jira] [Commented] (SPARK-29392) Remove use of deprecated symbol literal " 'name " syntax in favor Symbol("name")

2020-11-04 Thread Yang Jie (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17226070#comment-17226070
 ] 

Yang Jie commented on SPARK-29392:
--

I can take the time to fix them module by module, but is this the right time, 
or should I wait until Scala 2.13 becomes the default option?

> Remove use of deprecated symbol literal " 'name " syntax in favor 
> Symbol("name")
> 
>
> Key: SPARK-29392
> URL: https://issues.apache.org/jira/browse/SPARK-29392
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core, SQL, Tests
>Affects Versions: 3.0.0
>Reporter: Sean R. Owen
>Assignee: Sean R. Owen
>Priority: Minor
> Fix For: 3.0.0
>
>
> Example:
> {code}
> [WARNING] [Warn] 
> /Users/seanowen/Documents/spark_2.13/core/src/test/scala/org/apache/spark/memory/UnifiedMemoryManagerSuite.scala:308:
>  symbol literal is deprecated; use Symbol("assertInvariants") instead
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-23086) Spark SQL cannot support high concurrency for lock in HiveMetastoreCatalog

2020-11-04 Thread gaofeng (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-23086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17226067#comment-17226067
 ] 

gaofeng commented on SPARK-23086:
-

Hi, how can this problem be resolved?

> Spark SQL cannot support high concurrency for lock in HiveMetastoreCatalog
> --
>
> Key: SPARK-23086
> URL: https://issues.apache.org/jira/browse/SPARK-23086
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.1.1
> Environment: * Spark 2.2.1
>Reporter: pin_zhang
>Priority: Major
>  Labels: bulk-closed
>
> * Hive metastore is mysql
> * Set hive.server2.thrift.max.worker.threads=500
> create table test (id string ) partitioned by (index int) stored as  
> parquet;
> insert into test  partition (index=1) values('id1');
>  * 100 Clients run SQL“select * from table” on table
>  * Many clients (97%) blocked at HiveExternalCatalog.withClient
>  * Is synchronization expected when we only run queries against tables?
> "pool-21-thread-65" #1178 prio=5 os_prio=0 tid=0x2aaac8e06800 nid=0x1e70 
> waiting for monitor entry [0x4e19a000]
>java.lang.Thread.State: BLOCKED (on object monitor)
>   at 
> org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:97)
>   - waiting to lock <0xc06a3ba8> (a 
> org.apache.spark.sql.hive.HiveExternalCatalog)
>   at 
> org.apache.spark.sql.hive.HiveExternalCatalog.getTable(HiveExternalCatalog.scala:674)
>   at 
> org.apache.spark.sql.catalyst.catalog.SessionCatalog.lookupRelation(SessionCatalog.scala:667)
>   - locked <0xc41ab748> (a 
> org.apache.spark.sql.hive.HiveSessionCatalog)
>   at 
> org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.org$apache$spark$sql$catalyst$analysis$Analyzer$ResolveRelations$$lookupTableFromCatalog(Analyzer.scala:646)
>   at 
> org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.resolveRelation(Analyzer.scala:601)
>   at 
> org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$$anonfun$apply$8.applyOrElse(Analyzer.scala:631)
>   at 
> org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$$anonfun$apply$8.applyOrElse(Analyzer.scala:624)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$resolveOperators$1.apply(LogicalPlan.scala:62)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$resolveOperators$1.apply(LogicalPlan.scala:62)
>   at 
> org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperators(LogicalPlan.scala:61)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$1.apply(LogicalPlan.scala:59)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$1.apply(LogicalPlan.scala:59)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperators(LogicalPlan.scala:59)
>   at 
> org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.apply(Analyzer.scala:624)
>   at 
> org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.apply(Analyzer.scala:570)
>   at 
> org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$1.apply(RuleExecutor.scala:85)
>   at 
> org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$1.apply(RuleExecutor.scala:82)
>   at 
> scala.collection.LinearSeqOptimized$class.foldLeft(LinearSeqOptimized.scala:124)
>   at scala.collection.immutable.List.foldLeft(List.scala:84)
>   at 
> org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1.apply(RuleExecutor.scala:82)
>   at 
> org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1.apply(RuleExecutor.scala:74)
>   at scala.collection.immutable.List.foreach(List.scala:381)
>   at 
> org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:74)
>   at 
> org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:69)
>   - locked <0xff491c48> (a 
> org.apache.spark.sql.execution.QueryExecution)
>   at 
> org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:67)
>   at 
> org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:50)
>   at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:67)
>  

[jira] [Updated] (SPARK-33256) Update contribution guide about NumPy documentation style

2020-11-04 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-33256:
-
Description: 
We should document that PySpark uses NumPy documentation style.
See also https://github.com/apache/spark/pull/30181#discussion_r517314341

  was:We should document that PySpark uses NumPy documentation style.


> Update contribution guide about NumPy documentation style
> -
>
> Key: SPARK-33256
> URL: https://issues.apache.org/jira/browse/SPARK-33256
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, PySpark
>Affects Versions: 3.1.0
>Reporter: Hyukjin Kwon
>Priority: Major
>
> We should document that PySpark uses NumPy documentation style.
> See also https://github.com/apache/spark/pull/30181#discussion_r517314341



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33285) Too many "Auto-application to `()` is deprecated." related compilation warnings

2020-11-04 Thread Guillaume Martres (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17226040#comment-17226040
 ] 

Guillaume Martres commented on SPARK-33285:
---

{quote}
Similarly, there are many "symbol literal is deprecated" warnings too, but this 
can only be fixed after Scala 2.12 is no longer supported
{quote}

 

Replacing {{'foo}} by {{Symbol("foo")}} will get rid of the warning and is 
compatible with all Scala versions.
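For example, a minimal sketch of the substitution (the symbol name is 
hypothetical, not taken from a Spark test):
{code:scala}
// deprecated symbol literal syntax, warns on Scala 2.13:
// val column = 'value

// equivalent form, accepted without warnings by both Scala 2.12 and 2.13:
val column = Symbol("value")
{code}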

> Too many "Auto-application to `()` is deprecated."  related compilation 
> warnings
> 
>
> Key: SPARK-33285
> URL: https://issues.apache.org/jira/browse/SPARK-33285
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build
>Affects Versions: 3.1.0
>Reporter: Yang Jie
>Priority: Minor
>
> There are too many "Auto-application to `()` is deprecated." related 
> compilation warnings when compiling with Scala 2.13, for example:
> {code:java}
> [WARNING] [Warn] 
> /spark-src/core/src/test/scala/org/apache/spark/PartitioningSuite.scala:246: 
> Auto-application to `()` is deprecated. Supply the empty argument list `()` 
> explicitly to invoke method stdev,
> or remove the empty argument list from its definition (Java-defined methods 
> are exempt).
> In Scala 3, an unapplied method like this will be eta-expanded into a 
> function.
> {code}
> There are a lot of them, but they are easy to fix.
> If there is a definition as follows:
> {code:java}
> class Foo {
>   def bar(): Unit = {}
> }
> val foo = new Foo
> {code}
> the call should be
> {code:java}
> foo.bar()
> {code}
> not
> {code:java}
> foo.bar
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-32894) Timestamp cast in exernal orc table

2020-11-04 Thread Yang Jie (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17226016#comment-17226016
 ] 

Yang Jie commented on SPARK-32894:
--

What is the data generation process? It seems that the data type is 
inconsistent with the table's schema.

> Timestamp cast in exernal orc table
> ---
>
> Key: SPARK-32894
> URL: https://issues.apache.org/jira/browse/SPARK-32894
> Project: Spark
>  Issue Type: Bug
>  Components: Java API
>Affects Versions: 3.0.0
> Environment: Spark 3.0.0
> Java 1.8
> Hadoop 3.3.0
> Hive 3.1.2
> Python 3.7 (from pyspark)
>Reporter: Grigory Skvortsov
>Priority: Major
>
> I have an external Hive table stored as ORC. I want to work with a timestamp 
> column in my table using PySpark.
> For example, I try this:
>  spark.sql('select id, time_ from mydb.table1').show()
>  
>  Py4JJavaError: An error occurred while calling o2877.showString.
> : org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 
> in stage 4.0 failed 4 times, most recent failure: Lost task 0.3 in stage 4.0 
> (TID 19, 172.29.14.241, executor 1): java.lang.ClassCastException: 
> org.apache.spark.unsafe.types.UTF8String cannot be cast to java.lang.Long
> at scala.runtime.BoxesRunTime.unboxToLong(BoxesRunTime.java:107)
> at 
> org.apache.spark.sql.catalyst.expressions.MutableLong.update(SpecificInternalRow.scala:148)
> at 
> org.apache.spark.sql.catalyst.expressions.SpecificInternalRow.update(SpecificInternalRow.scala:228)
> at 
> org.apache.spark.sql.hive.HiveInspectors.$anonfun$unwrapperFor$53(HiveInspectors.scala:730)
> at 
> org.apache.spark.sql.hive.HiveInspectors.$anonfun$unwrapperFor$53$adapted(HiveInspectors.scala:730)
> at 
> org.apache.spark.sql.hive.orc.OrcFileFormat$.$anonfun$unwrapOrcStructs$4(OrcFileFormat.scala:351)
> at scala.collection.Iterator$$anon$10.next(Iterator.scala:459)
> at scala.collection.Iterator$$anon$10.next(Iterator.scala:459)
> at 
> org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.next(FileScanRDD.scala:96)
> at scala.collection.Iterator$$anon$10.next(Iterator.scala:459)
> at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown
>  Source)
> at 
> org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
> at 
> org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:729)
> at 
> org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:340)
> at 
> org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:872)
> at 
> org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:872)
> at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:313)
> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
> at org.apache.spark.scheduler.Task.run(Task.scala:127)
> at 
> org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:444)
> at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1377)
> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:447)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> Driver stacktrace:
> at 
> org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:2023)
> at 
> org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:1972)
> at 
> org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:1971)
> at 
> scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
> at 
> scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
> at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
> at 
> org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1971)
> at 
> org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1(DAGScheduler.scala:950)
> at 
> org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1$adapted(DAGScheduler.scala:950)
> at scala.Option.foreach(Option.scala:407)
> at 
> org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:950)
> at 
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2203)
> at 
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.o

[jira] [Commented] (SPARK-33325) Spark executors pod are not shutting down when losing driver connection

2020-11-04 Thread Hadrien Kohl (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17226009#comment-17226009
 ] 

Hadrien Kohl commented on SPARK-33325:
--

It looks like one thread gets stuck here on the awaitTermination call:

[https://github.com/apache/spark/blob/2b147c4cd50da32fe2b4167f97c8142102a0510d/core/src/main/scala/org/apache/spark/rpc/netty/MessageLoop.scala#L52-L61]
{code:java}
  def stop(): Unit = {
synchronized {
  if (!stopped) {
setActive(MessageLoop.PoisonPill)
threadpool.shutdown()
stopped = true
  }
}
threadpool.awaitTermination(Long.MaxValue, TimeUnit.MILLISECONDS)
  }
{code}
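A minimal standalone sketch of why that call can block forever ({{threadpool}} 
here is a hypothetical pool, not the Spark dispatcher itself): {{shutdown()}} 
only stops new tasks from being accepted, and {{awaitTermination}} with 
{{Long.MaxValue}} waits indefinitely for running tasks that never return.
{code:scala}
import java.util.concurrent.{Executors, TimeUnit}

val threadpool = Executors.newFixedThreadPool(1)
// a task that never completes, standing in for a message loop that is stuck
threadpool.execute(() => while (true) Thread.sleep(1000))

threadpool.shutdown() // already-running tasks are allowed to keep going
// effectively blocks forever, because the task above never returns
threadpool.awaitTermination(Long.MaxValue, TimeUnit.MILLISECONDS)
{code}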

> Spark executors pod are not shutting down when losing driver connection
> ---
>
> Key: SPARK-33325
> URL: https://issues.apache.org/jira/browse/SPARK-33325
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 3.0.1
>Reporter: Hadrien Kohl
>Priority: Major
>
> In situations where the executors lose contact with the driver, the Java 
> process does not die. I am looking at what on the Kubernetes cluster could 
> prevent proper clean-up.
> The Spark driver is started in its own pod in client mode (a PySpark shell 
> started by Jupyter). It works fine most of the time, but if the driver process 
> crashes (OOM or a kill signal, for instance) the executor complains about the 
> connection being reset by peer and then hangs.
> Here's the log from an executor pod that hangs:
> {code:java}
> 20/11/03 07:35:30 WARN TransportChannelHandler: Exception in connection from 
> /10.17.0.152:37161
> java.io.IOException: Connection reset by peer
>   at java.base/sun.nio.ch.FileDispatcherImpl.read0(Native Method)
>   at java.base/sun.nio.ch.SocketDispatcher.read(Unknown Source)
>   at java.base/sun.nio.ch.IOUtil.readIntoNativeBuffer(Unknown Source)
>   at java.base/sun.nio.ch.IOUtil.read(Unknown Source)
>   at java.base/sun.nio.ch.IOUtil.read(Unknown Source)
>   at java.base/sun.nio.ch.SocketChannelImpl.read(Unknown Source)
>   at io.netty.buffer.PooledByteBuf.setBytes(PooledByteBuf.java:253)
>   at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:1133)
>   at 
> io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:350)
>   at 
> io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:148)
>   at 
> io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:714)
>   at 
> io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:650)
>   at 
> io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:576)
>   at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:493)
>   at 
> io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
>   at 
> io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
>   at 
> io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
>   at java.base/java.lang.Thread.run(Unknown Source)
> 20/11/03 07:35:30 ERROR CoarseGrainedExecutorBackend: Executor self-exiting 
> due to : Driver 10.17.0.152:37161 disassociated! Shutting down.
> 20/11/03 07:35:31 INFO MemoryStore: MemoryStore cleared
> 20/11/03 07:35:31 INFO BlockManager: BlockManager stopped
> {code}
> When I start a shell in the pod, I can see the processes are still running: 
> {code:java}
> UID  PIDPPID  CSZ   RSS PSR STIME TTY  TIME CMD
> 185  125   0  0  5045  3968   2 10:07 pts/000:00:00 /bin/bash
> 185  166 125  0  9019  3364   1 10:39 pts/000:00:00  \_ ps 
> -AF --forest
> 1851   0  0  1130   768   0 07:34 ?00:00:00 
> /usr/bin/tini -s -- /opt/java/openjdk/
> 185   14   1  0 1935527 493976 3 07:34 ?   00:00:21 
> /opt/java/openjdk/bin/java -Dspark.dri
> {code}
> Here's the full command used to start the executor: 
> {code:java}
> /opt/java/openjdk/
> bin/java -Dspark.driver.port=37161 -Xms4g -Xmx4g -cp :/opt/spark/jars/*: 
> org.apache.spark.executor.CoarseG
> rainedExecutorBackend --driver-url 
> spark://CoarseGrainedScheduler@10.17.0.152:37161 --executor-id 1 --core
> s 1 --app-id spark-application-1604388891044 --hostname 10.17.2.151
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org


