[jira] [Assigned] (SPARK-33354) New explicit cast syntax rules in ANSI mode
[ https://issues.apache.org/jira/browse/SPARK-33354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-33354: Assignee: Apache Spark (was: Gengliang Wang) > New explicit cast syntax rules in ANSI mode > --- > > Key: SPARK-33354 > URL: https://issues.apache.org/jira/browse/SPARK-33354 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 3.1.0 >Reporter: Gengliang Wang >Assignee: Apache Spark >Priority: Major > > In section 6.13 of the ANSI SQL standard, there are syntax rules for valid > combinations of the source and target data types. > To make Spark's ANSI mode more ANSI SQL compatible, I propose to disallow the > following castings in ANSI mode: > {code:java} > Timestamp <=> Boolean > Date <=> Boolean > Numeric <=> Timestamp > Numeric <=> Date > Numeric <=> Binary > String <=> Array > String <=> Map > String <=> Struct > {code} > The following castings are considered invalid in the ANSI SQL standard, but they > are quite straightforward. Let's allow them for now: > {code:java} > Numeric <=> Boolean > String <=> Boolean > String <=> Binary > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-33354) New explicit cast syntax rules in ANSI mode
[ https://issues.apache.org/jira/browse/SPARK-33354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17226558#comment-17226558 ] Apache Spark commented on SPARK-33354: -- User 'gengliangwang' has created a pull request for this issue: https://github.com/apache/spark/pull/30260 > New explicit cast syntax rules in ANSI mode > --- > > Key: SPARK-33354 > URL: https://issues.apache.org/jira/browse/SPARK-33354 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 3.1.0 >Reporter: Gengliang Wang >Assignee: Gengliang Wang >Priority: Major > > In section 6.13 of the ANSI SQL standard, there are syntax rules for valid > combinations of the source and target data types. > To make Spark's ANSI mode more ANSI SQL compatible, I propose to disallow the > following castings in ANSI mode: > {code:java} > Timestamp <=> Boolean > Date <=> Boolean > Numeric <=> Timestamp > Numeric <=> Date > Numeric <=> Binary > String <=> Array > String <=> Map > String <=> Struct > {code} > The following castings are considered invalid in the ANSI SQL standard, but they > are quite straightforward. Let's allow them for now: > {code:java} > Numeric <=> Boolean > String <=> Boolean > String <=> Binary > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-33354) New explicit cast syntax rules in ANSI mode
[ https://issues.apache.org/jira/browse/SPARK-33354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-33354: Assignee: Gengliang Wang (was: Apache Spark) > New explicit cast syntax rules in ANSI mode > --- > > Key: SPARK-33354 > URL: https://issues.apache.org/jira/browse/SPARK-33354 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 3.1.0 >Reporter: Gengliang Wang >Assignee: Gengliang Wang >Priority: Major > > In section 6.13 of the ANSI SQL standard, there are syntax rules for valid > combinations of the source and target data types. > To make Spark's ANSI mode more ANSI SQL compatible, I propose to disallow the > following castings in ANSI mode: > {code:java} > Timestamp <=> Boolean > Date <=> Boolean > Numeric <=> Timestamp > Numeric <=> Date > Numeric <=> Binary > String <=> Array > String <=> Map > String <=> Struct > {code} > The following castings are considered invalid in the ANSI SQL standard, but they > are quite straightforward. Let's allow them for now: > {code:java} > Numeric <=> Boolean > String <=> Boolean > String <=> Binary > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-33354) New explicit cast syntax rules in ANSI mode
Gengliang Wang created SPARK-33354: -- Summary: New explicit cast syntax rules in ANSI mode Key: SPARK-33354 URL: https://issues.apache.org/jira/browse/SPARK-33354 Project: Spark Issue Type: New Feature Components: SQL Affects Versions: 3.1.0 Reporter: Gengliang Wang Assignee: Gengliang Wang In section 6.13 of the ANSI SQL standard, there are syntax rules for valid combinations of the source and target data types. To make Spark's ANSI mode more ANSI SQL compatible, I propose to disallow the following castings in ANSI mode: {code:java} Timestamp <=> Boolean Date <=> Boolean Numeric <=> Timestamp Numeric <=> Date Numeric <=> Binary String <=> Array String <=> Map String <=> Struct {code} The following castings are considered invalid in the ANSI SQL standard, but they are quite straightforward. Let's allow them for now: {code:java} Numeric <=> Boolean String <=> Boolean String <=> Binary {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
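The proposed rule table above can be sketched as a small predicate. This is purely illustrative: the type names and the `can_explicit_cast` helper are hypothetical stand-ins, not Spark's internal cast-checking API.

```python
# Illustrative encoding of the proposed ANSI-mode explicit-cast rules.
# The "<=>" pairs from the issue are symmetric, so each pair is stored
# as a frozenset and checked in both directions.

DISALLOWED = {
    frozenset(p)
    for p in [
        ("timestamp", "boolean"),
        ("date", "boolean"),
        ("numeric", "timestamp"),
        ("numeric", "date"),
        ("numeric", "binary"),
        ("string", "array"),
        ("string", "map"),
        ("string", "struct"),
    ]
}

def can_explicit_cast(source: str, target: str, ansi_mode: bool = True) -> bool:
    """Return False for the pairs the proposal disallows under ANSI mode."""
    if not ansi_mode:
        return True  # legacy (non-ANSI) behavior: the cast stays allowed
    return frozenset((source, target)) not in DISALLOWED
```

Note that the pairs kept as allowed (Numeric/String with Boolean, String with Binary) simply never appear in the disallowed set, so they pass in either mode.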
[jira] [Assigned] (SPARK-33277) Python/Pandas UDF right after off-heap vectorized reader could cause executor crash.
[ https://issues.apache.org/jira/browse/SPARK-33277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-33277: Assignee: Apache Spark > Python/Pandas UDF right after off-heap vectorized reader could cause executor > crash. > > > Key: SPARK-33277 > URL: https://issues.apache.org/jira/browse/SPARK-33277 > Project: Spark > Issue Type: Bug > Components: PySpark, SQL >Affects Versions: 2.4.7, 3.0.1 >Reporter: Takuya Ueshin >Assignee: Apache Spark >Priority: Major > > Python/Pandas UDF right after off-heap vectorized reader could cause executor > crash. > E.g.: > {code:python} > from pyspark.sql.functions import udf > from pyspark.sql.types import LongType > path = "/tmp/spark-33277"  # any scratch path > spark.range(0, 10, 1, 1).write.parquet(path) > spark.conf.set("spark.sql.columnVector.offheap.enabled", True) > def f(x): > return 0 > fUdf = udf(f, LongType()) > spark.read.parquet(path).select(fUdf('id')).head() > {code} > This is because the Python evaluation consumes the parent iterator in a > separate thread, and it consumes more data from the parent even after the task > ends and the parent is closed. If an off-heap column vector exists in the > parent iterator, it could cause a segmentation fault, which crashes the executor. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-33277) Python/Pandas UDF right after off-heap vectorized reader could cause executor crash.
[ https://issues.apache.org/jira/browse/SPARK-33277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-33277: Assignee: (was: Apache Spark) > Python/Pandas UDF right after off-heap vectorized reader could cause executor > crash. > > > Key: SPARK-33277 > URL: https://issues.apache.org/jira/browse/SPARK-33277 > Project: Spark > Issue Type: Bug > Components: PySpark, SQL >Affects Versions: 2.4.7, 3.0.1 >Reporter: Takuya Ueshin >Priority: Major > > Python/Pandas UDF right after off-heap vectorized reader could cause executor > crash. > E.g.: > {code:python} > from pyspark.sql.functions import udf > from pyspark.sql.types import LongType > path = "/tmp/spark-33277"  # any scratch path > spark.range(0, 10, 1, 1).write.parquet(path) > spark.conf.set("spark.sql.columnVector.offheap.enabled", True) > def f(x): > return 0 > fUdf = udf(f, LongType()) > spark.read.parquet(path).select(fUdf('id')).head() > {code} > This is because the Python evaluation consumes the parent iterator in a > separate thread, and it consumes more data from the parent even after the task > ends and the parent is closed. If an off-heap column vector exists in the > parent iterator, it could cause a segmentation fault, which crashes the executor. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Reopened] (SPARK-33277) Python/Pandas UDF right after off-heap vectorized reader could cause executor crash.
[ https://issues.apache.org/jira/browse/SPARK-33277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reopened SPARK-33277: -- Assignee: (was: Takuya Ueshin) Reverted in: master: https://github.com/apache/spark/commit/d530ed0ea8bdba09fba6dcd51f8e4f7745781c2e branch-3.0: https://github.com/apache/spark/commit/74d8eacbe9cdc0b25a177543eb48ac54bd065cbb branch-2.4: https://github.com/apache/spark/commit/c342bcd4c4ba68506ca6b459bd3a9c688d2aecfa > Python/Pandas UDF right after off-heap vectorized reader could cause executor > crash. > > > Key: SPARK-33277 > URL: https://issues.apache.org/jira/browse/SPARK-33277 > Project: Spark > Issue Type: Bug > Components: PySpark, SQL >Affects Versions: 2.4.7, 3.0.1 >Reporter: Takuya Ueshin >Priority: Major > > Python/Pandas UDF right after off-heap vectorized reader could cause executor > crash. > E.g.: > {code:python} > from pyspark.sql.functions import udf > from pyspark.sql.types import LongType > path = "/tmp/spark-33277"  # any scratch path > spark.range(0, 10, 1, 1).write.parquet(path) > spark.conf.set("spark.sql.columnVector.offheap.enabled", True) > def f(x): > return 0 > fUdf = udf(f, LongType()) > spark.read.parquet(path).select(fUdf('id')).head() > {code} > This is because the Python evaluation consumes the parent iterator in a > separate thread, and it consumes more data from the parent even after the task > ends and the parent is closed. If an off-heap column vector exists in the > parent iterator, it could cause a segmentation fault, which crashes the executor. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-33277) Python/Pandas UDF right after off-heap vectorized reader could cause executor crash.
[ https://issues.apache.org/jira/browse/SPARK-33277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-33277: - Fix Version/s: (was: 3.0.2) (was: 2.4.8) (was: 3.1.0) > Python/Pandas UDF right after off-heap vectorized reader could cause executor > crash. > > > Key: SPARK-33277 > URL: https://issues.apache.org/jira/browse/SPARK-33277 > Project: Spark > Issue Type: Bug > Components: PySpark, SQL >Affects Versions: 2.4.7, 3.0.1 >Reporter: Takuya Ueshin >Assignee: Takuya Ueshin >Priority: Major > > Python/Pandas UDF right after off-heap vectorized reader could cause executor > crash. > E.g.: > {code:python} > from pyspark.sql.functions import udf > from pyspark.sql.types import LongType > path = "/tmp/spark-33277"  # any scratch path > spark.range(0, 10, 1, 1).write.parquet(path) > spark.conf.set("spark.sql.columnVector.offheap.enabled", True) > def f(x): > return 0 > fUdf = udf(f, LongType()) > spark.read.parquet(path).select(fUdf('id')).head() > {code} > This is because the Python evaluation consumes the parent iterator in a > separate thread, and it consumes more data from the parent even after the task > ends and the parent is closed. If an off-heap column vector exists in the > parent iterator, it could cause a segmentation fault, which crashes the executor. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
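The failure mode described in the issue can be sketched language-agnostically: a consumer keeps pulling from a parent source after the task has ended and the source is closed. The class and names below are hypothetical, not Spark internals; with an on-heap buffer a late read is merely a logical bug, while with an off-heap (manually freed) buffer the same read touches freed memory and can segfault.

```python
class OffHeapLikeSource:
    """Stands in for an iterator backed by an off-heap column vector."""

    def __init__(self, n: int):
        self.data = list(range(n))
        self.closed = False

    def __iter__(self):
        i = 0
        while True:
            if self.closed:
                # Real off-heap code has no such guard: the native buffer is
                # simply freed, and the late read crashes the process instead
                # of raising a catchable error.
                raise RuntimeError("read after close")
            if i >= len(self.data):
                return
            yield self.data[i]
            i += 1

    def close(self):
        self.closed = True
        self.data = []  # analogous to freeing the native buffer

src = OffHeapLikeSource(10)
it = iter(src)
first = [next(it) for _ in range(3)]  # the task consumes part of the data
src.close()                           # the task ends and "frees" the buffer
try:
    next(it)                          # the Python-eval thread's late read
    late_read_ok = True
except RuntimeError:
    late_read_ok = False
```

In the real bug the late read happens from the separate Python-evaluation thread; here it is sequenced deterministically to show why consuming past the close is unsafe.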
[jira] [Comment Edited] (SPARK-33331) Limit the number of pending blocks in memory and store blocks that collide
[ https://issues.apache.org/jira/browse/SPARK-33331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17226531#comment-17226531 ] wuyi edited comment on SPARK-33331 at 11/5/20, 7:12 AM: I like the idea of caching the blocks in the worst case instead of throwing them away, as long as we have the memory threshold (either memory size or block number). And we can always fall back to the original way whenever we set the threshold to 0. Another problem may be: when should the client retry the block after we have the memory cache? Shall we retry it immediately, or wait for a few seconds depending on the number of deferred blocks? was (Author: ngone51): I like the idea of caching the blocks in the worst case instead of throwing them away, as long as we have the memory threshold (either memory size or block number). And we can always fall back to the original way whenever they set the threshold to 0. Another problem may be: when should the client retry the block after we have the memory cache? Shall we retry it immediately, or wait for a few seconds depending on the number of deferred blocks? > Limit the number of pending blocks in memory and store blocks that collide > -- > > Key: SPARK-33331 > URL: https://issues.apache.org/jira/browse/SPARK-33331 > Project: Spark > Issue Type: Sub-task > Components: Shuffle >Affects Versions: 3.1.0 >Reporter: Chandni Singh >Priority: Major > > This jira addresses the two points below: > 1. In {{RemoteBlockPushResolver}}, bytes that cannot be merged immediately > are stored in memory. The stream callback maintains a list of > {{deferredBufs}}. When a block cannot be merged, it is added to this list. > Currently, there isn't a limit on the number of pending blocks. We can limit > the number of pending blocks in memory. There has been a discussion around > this here: > [https://github.com/apache/spark/pull/30062#discussion_r514026014] > 2. When a stream doesn't get an opportunity to merge, then > {{RemoteBlockPushResolver}} ignores the data from that stream. Another > approach is to store the data of the stream in {{AppShufflePartitionInfo}} > when it reaches the worst-case scenario. This may increase the memory usage > of the shuffle service though. However, given the limit introduced in 1, we > can try this out. > More information can be found in this discussion: > [https://github.com/apache/spark/pull/30062#discussion_r517524546] -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-33282) Replace Probot Autolabeler with Github Action
[ https://issues.apache.org/jira/browse/SPARK-33282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-33282. -- Fix Version/s: 3.1.0 Resolution: Fixed Issue resolved by pull request 30244 [https://github.com/apache/spark/pull/30244] > Replace Probot Autolabeler with Github Action > - > > Key: SPARK-33282 > URL: https://issues.apache.org/jira/browse/SPARK-33282 > Project: Spark > Issue Type: Task > Components: Project Infra >Affects Versions: 3.0.1 >Reporter: Kyle Bendickson >Assignee: Kyle Bendickson >Priority: Major > Fix For: 3.1.0 > > > The Probot Autolabeler that we were using in both the Iceberg and the Spark > repo is no longer working. I've confirmed that with the developer, GitHub user > [at]mithro, who has indicated that the Probot Autolabeler is end of life and > will not be maintained moving forward. > PRs have not been labeled for a few weeks now. > > As I'm already interfacing with ASF Infra to have the probot permissions > revoked from the Iceberg repo, and I've already submitted a patch to switch > Iceberg to the standard GitHub labeler action, I figured I would go ahead and > volunteer myself to switch the Spark repo as well. > I will have a patch to switch to the new GitHub labeler open within a few > days. > > Also thank you [~blue] (or [~holden]) for shepherding this! I didn't exactly > ask, but it was understood in our group meeting for Iceberg that I'd be > converting our labeler there, so I figured I'd tackle the Spark issue while > I'm getting my hands into the labeling configs anyway =) -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-33282) Replace Probot Autolabeler with Github Action
[ https://issues.apache.org/jira/browse/SPARK-33282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-33282: Assignee: Kyle Bendickson > Replace Probot Autolabeler with Github Action > - > > Key: SPARK-33282 > URL: https://issues.apache.org/jira/browse/SPARK-33282 > Project: Spark > Issue Type: Task > Components: Project Infra >Affects Versions: 3.0.1 >Reporter: Kyle Bendickson >Assignee: Kyle Bendickson >Priority: Major > > The Probot Autolabeler that we were using in both the Iceberg and the Spark > repo is no longer working. I've confirmed that with the developer, GitHub user > [at]mithro, who has indicated that the Probot Autolabeler is end of life and > will not be maintained moving forward. > PRs have not been labeled for a few weeks now. > > As I'm already interfacing with ASF Infra to have the probot permissions > revoked from the Iceberg repo, and I've already submitted a patch to switch > Iceberg to the standard GitHub labeler action, I figured I would go ahead and > volunteer myself to switch the Spark repo as well. > I will have a patch to switch to the new GitHub labeler open within a few > days. > > Also thank you [~blue] (or [~holden]) for shepherding this! I didn't exactly > ask, but it was understood in our group meeting for Iceberg that I'd be > converting our labeler there, so I figured I'd tackle the Spark issue while > I'm getting my hands into the labeling configs anyway =) -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-33331) Limit the number of pending blocks in memory and store blocks that collide
[ https://issues.apache.org/jira/browse/SPARK-33331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17226531#comment-17226531 ] wuyi edited comment on SPARK-33331 at 11/5/20, 7:09 AM: I like the idea of caching the blocks in the worst case instead of throwing them away, as long as we have the memory threshold (either memory size or block number). And we can always fall back to the original way whenever they set the threshold to 0. Another problem may be: when should the client retry the block after we have the memory cache? Shall we retry it immediately, or wait for a few seconds depending on the number of deferred blocks? was (Author: ngone51): I like the idea of caching the blocks in the worst case instead of throwing them away, as long as we have the memory threshold (either memory size or block number). And users can actually fall back to the original way whenever they set the threshold to 0. Another problem may be: when should the client retry the block after we have the memory cache? Shall we retry it immediately, or wait for a few seconds depending on the number of deferred blocks? > Limit the number of pending blocks in memory and store blocks that collide > -- > > Key: SPARK-33331 > URL: https://issues.apache.org/jira/browse/SPARK-33331 > Project: Spark > Issue Type: Sub-task > Components: Shuffle >Affects Versions: 3.1.0 >Reporter: Chandni Singh >Priority: Major > > This jira addresses the two points below: > 1. In {{RemoteBlockPushResolver}}, bytes that cannot be merged immediately > are stored in memory. The stream callback maintains a list of > {{deferredBufs}}. When a block cannot be merged, it is added to this list. > Currently, there isn't a limit on the number of pending blocks. We can limit > the number of pending blocks in memory. There has been a discussion around > this here: > [https://github.com/apache/spark/pull/30062#discussion_r514026014] > 2. When a stream doesn't get an opportunity to merge, then > {{RemoteBlockPushResolver}} ignores the data from that stream. Another > approach is to store the data of the stream in {{AppShufflePartitionInfo}} > when it reaches the worst-case scenario. This may increase the memory usage > of the shuffle service though. However, given the limit introduced in 1, we > can try this out. > More information can be found in this discussion: > [https://github.com/apache/spark/pull/30062#discussion_r517524546] -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-33331) Limit the number of pending blocks in memory and store blocks that collide
[ https://issues.apache.org/jira/browse/SPARK-33331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17226531#comment-17226531 ] wuyi edited comment on SPARK-33331 at 11/5/20, 6:59 AM: I like the idea of caching the blocks in the worst case instead of throwing them away, as long as we have the memory threshold (either memory size or block number). And users can actually fall back to the original way whenever they set the threshold to 0. Another problem may be: when should the client retry the block after we have the memory cache? Shall we retry it immediately, or wait for a few seconds depending on the number of deferred blocks? was (Author: ngone51): I like the idea of caching the blocks in the worst case instead of throwing them away, as long as we have the memory threshold (either memory size or block number). And users can actually fall back to the original way whenever they set the threshold to 0. Another problem may be: when should the client retry the block after we have the memory cache? Shall we retry it immediately, or wait a little bit depending on the number of deferred blocks? > Limit the number of pending blocks in memory and store blocks that collide > -- > > Key: SPARK-33331 > URL: https://issues.apache.org/jira/browse/SPARK-33331 > Project: Spark > Issue Type: Sub-task > Components: Shuffle >Affects Versions: 3.1.0 >Reporter: Chandni Singh >Priority: Major > > This jira addresses the two points below: > 1. In {{RemoteBlockPushResolver}}, bytes that cannot be merged immediately > are stored in memory. The stream callback maintains a list of > {{deferredBufs}}. When a block cannot be merged, it is added to this list. > Currently, there isn't a limit on the number of pending blocks. We can limit > the number of pending blocks in memory. There has been a discussion around > this here: > [https://github.com/apache/spark/pull/30062#discussion_r514026014] > 2. When a stream doesn't get an opportunity to merge, then > {{RemoteBlockPushResolver}} ignores the data from that stream. Another > approach is to store the data of the stream in {{AppShufflePartitionInfo}} > when it reaches the worst-case scenario. This may increase the memory usage > of the shuffle service though. However, given the limit introduced in 1, we > can try this out. > More information can be found in this discussion: > [https://github.com/apache/spark/pull/30062#discussion_r517524546] -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-33331) Limit the number of pending blocks in memory and store blocks that collide
[ https://issues.apache.org/jira/browse/SPARK-33331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17226531#comment-17226531 ] wuyi commented on SPARK-33331: -- I like the idea of caching the blocks in the worst case instead of throwing them away, as long as we have the memory threshold (either memory size or block number). And users can actually fall back to the original way whenever they set the threshold to 0. Another problem may be: when should the client retry the block after we have the memory cache? Shall we retry it immediately, or wait a little bit depending on the number of deferred blocks? > Limit the number of pending blocks in memory and store blocks that collide > -- > > Key: SPARK-33331 > URL: https://issues.apache.org/jira/browse/SPARK-33331 > Project: Spark > Issue Type: Sub-task > Components: Shuffle >Affects Versions: 3.1.0 >Reporter: Chandni Singh >Priority: Major > > This jira addresses the two points below: > 1. In {{RemoteBlockPushResolver}}, bytes that cannot be merged immediately > are stored in memory. The stream callback maintains a list of > {{deferredBufs}}. When a block cannot be merged, it is added to this list. > Currently, there isn't a limit on the number of pending blocks. We can limit > the number of pending blocks in memory. There has been a discussion around > this here: > [https://github.com/apache/spark/pull/30062#discussion_r514026014] > 2. When a stream doesn't get an opportunity to merge, then > {{RemoteBlockPushResolver}} ignores the data from that stream. Another > approach is to store the data of the stream in {{AppShufflePartitionInfo}} > when it reaches the worst-case scenario. This may increase the memory usage > of the shuffle service though. However, given the limit introduced in 1, we > can try this out. > More information can be found in this discussion: > [https://github.com/apache/spark/pull/30062#discussion_r517524546] -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
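The bounded deferred-block list discussed in the issue can be sketched as follows. This is an illustrative sketch only: `DeferredBlocks` and its methods are hypothetical names, not the `RemoteBlockPushResolver` API, and a threshold of 0 models the fallback to the original no-deferral behavior.

```python
from collections import deque

class DeferredBlocks:
    """Keep blocks that could not be merged immediately, up to a threshold."""

    def __init__(self, max_pending: int):
        self.max_pending = max_pending  # 0 => original behavior: defer nothing
        self.pending = deque()
        self.dropped = 0                # blocks the client must retry later

    def offer(self, block: bytes) -> bool:
        """Try to defer a block; False means the client should retry it."""
        if len(self.pending) >= self.max_pending:
            self.dropped += 1
            return False
        self.pending.append(block)
        return True

    def drain(self):
        """Hand back all deferred blocks for merging, oldest first."""
        while self.pending:
            yield self.pending.popleft()
```

A usage sketch: with `max_pending=2`, a third colliding block is rejected and counted, which is the point at which the client-side retry policy (immediate vs. delayed) from the comment above would kick in.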
[jira] [Updated] (SPARK-33239) Use pre-built image at GitHub Action SparkR job
[ https://issues.apache.org/jira/browse/SPARK-33239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-33239: -- Fix Version/s: 3.0.2 > Use pre-built image at GitHub Action SparkR job > --- > > Key: SPARK-33239 > URL: https://issues.apache.org/jira/browse/SPARK-33239 > Project: Spark > Issue Type: Improvement > Components: Project Infra >Affects Versions: 3.1.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Fix For: 3.0.2, 3.1.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-33353) Cache dependencies for Coursier with new sbt in GitHub Actions
[ https://issues.apache.org/jira/browse/SPARK-33353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17226494#comment-17226494 ] Apache Spark commented on SPARK-33353: -- User 'sarutak' has created a pull request for this issue: https://github.com/apache/spark/pull/30259 > Cache dependencies for Coursier with new sbt in GitHub Actions > -- > > Key: SPARK-33353 > URL: https://issues.apache.org/jira/browse/SPARK-33353 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 3.1.0 >Reporter: Kousuke Saruta >Assignee: Kousuke Saruta >Priority: Minor > > SPARK-33226 upgraded sbt to 1.4.1. > As of 1.3.0, sbt uses Coursier as the dependency resolver / fetcher. > So let's change the dependency cache configuration for the GitHub Actions job. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-33353) Cache dependencies for Coursier with new sbt in GitHub Actions
[ https://issues.apache.org/jira/browse/SPARK-33353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-33353: Assignee: Kousuke Saruta (was: Apache Spark) > Cache dependencies for Coursier with new sbt in GitHub Actions > -- > > Key: SPARK-33353 > URL: https://issues.apache.org/jira/browse/SPARK-33353 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 3.1.0 >Reporter: Kousuke Saruta >Assignee: Kousuke Saruta >Priority: Minor > > SPARK-33226 upgraded sbt to 1.4.1. > As of 1.3.0, sbt uses Coursier as the dependency resolver / fetcher. > So let's change the dependency cache configuration for the GitHub Actions job. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-33353) Cache dependencies for Coursier with new sbt in GitHub Actions
[ https://issues.apache.org/jira/browse/SPARK-33353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-33353: Assignee: Apache Spark (was: Kousuke Saruta) > Cache dependencies for Coursier with new sbt in GitHub Actions > -- > > Key: SPARK-33353 > URL: https://issues.apache.org/jira/browse/SPARK-33353 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 3.1.0 >Reporter: Kousuke Saruta >Assignee: Apache Spark >Priority: Minor > > SPARK-33226 upgraded sbt to 1.4.1. > As of 1.3.0, sbt uses Coursier as the dependency resolver / fetcher. > So let's change the dependency cache configuration for the GitHub Actions job. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-33353) Cache dependencies for Coursier with new sbt in GitHub Actions
Kousuke Saruta created SPARK-33353: -- Summary: Cache dependencies for Coursier with new sbt in GitHub Actions Key: SPARK-33353 URL: https://issues.apache.org/jira/browse/SPARK-33353 Project: Spark Issue Type: Bug Components: Build Affects Versions: 3.1.0 Reporter: Kousuke Saruta Assignee: Kousuke Saruta SPARK-33226 upgraded sbt to 1.4.1. As of 1.3.0, sbt uses Coursier as the dependency resolver / fetcher. So let's change the dependency cache configuration for the GitHub Actions job. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-33239) Use pre-built image at GitHub Action SparkR job
[ https://issues.apache.org/jira/browse/SPARK-33239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17226490#comment-17226490 ] Apache Spark commented on SPARK-33239: -- User 'dongjoon-hyun' has created a pull request for this issue: https://github.com/apache/spark/pull/30258 > Use pre-built image at GitHub Action SparkR job > --- > > Key: SPARK-33239 > URL: https://issues.apache.org/jira/browse/SPARK-33239 > Project: Spark > Issue Type: Improvement > Components: Project Infra >Affects Versions: 3.1.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Fix For: 3.1.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-33316) Support nullable Avro schemas for non-nullable data in Avro writing
[ https://issues.apache.org/jira/browse/SPARK-33316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gengliang Wang reassigned SPARK-33316: -- Assignee: Bo Zhang > Support nullable Avro schemas for non-nullable data in Avro writing > --- > > Key: SPARK-33316 > URL: https://issues.apache.org/jira/browse/SPARK-33316 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.4.0, 3.0.0, 3.0.1 >Reporter: Bo Zhang >Assignee: Bo Zhang >Priority: Major > Fix For: 3.1.0 > > > Currently, when users try to use nullable Avro schemas for non-nullable data > in Avro writing, Spark will throw an IncompatibleSchemaException. > There are some cases when users do not have full control over the nullability > of the data, or the nullability of the Avro schemas they have to use. We > should support nullable Avro schemas for non-nullable data in Avro writing > for better usability. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-33162) Use pre-built image at GitHub Action PySpark jobs
[ https://issues.apache.org/jira/browse/SPARK-33162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-33162: -- Fix Version/s: 3.0.2 > Use pre-built image at GitHub Action PySpark jobs > - > > Key: SPARK-33162 > URL: https://issues.apache.org/jira/browse/SPARK-33162 > Project: Spark > Issue Type: Improvement > Components: Project Infra >Affects Versions: 3.1.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Fix For: 3.0.2, 3.1.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-33316) Support nullable Avro schemas for non-nullable data in Avro writing
[ https://issues.apache.org/jira/browse/SPARK-33316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gengliang Wang resolved SPARK-33316. Fix Version/s: 3.1.0 Resolution: Fixed Issue resolved by pull request 30224 [https://github.com/apache/spark/pull/30224] > Support nullable Avro schemas for non-nullable data in Avro writing > --- > > Key: SPARK-33316 > URL: https://issues.apache.org/jira/browse/SPARK-33316 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.4.0, 3.0.0, 3.0.1 >Reporter: Bo Zhang >Priority: Major > Fix For: 3.1.0 > > > Currently when users try to use nullable Avro schemas for non-nullable data > in Avro writing, Spark will throw an IncompatibleSchemaException. > There are some cases when users do not have full control over the nullability > of the data, or the nullability of the Avro schemas they have to use. We > should support nullable Avro schemas for non-nullable data in Avro writing > for better usability. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
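The compatibility rule requested in SPARK-33316 can be illustrated outside Spark with a plain-Python sketch. This is a conceptual model only (the helper name `is_compatible` and the schema representation are assumptions for illustration, not Spark's or Avro's actual API): a nullable Avro schema is a union with `"null"`, and data that is never null is trivially valid against such a union.

```python
# Conceptual sketch of nullable-schema compatibility (hypothetical helper,
# not Spark's implementation). An Avro-style nullable schema is modeled as
# a union list containing "null".

def is_compatible(data_nullable: bool, avro_schema) -> bool:
    """Return True if data with the given nullability fits the Avro schema."""
    avro_nullable = isinstance(avro_schema, list) and "null" in avro_schema
    # Non-nullable data fits both nullable and non-nullable schemas, since
    # every non-null value is also a valid member of a union-with-null.
    # Nullable data still requires a nullable schema.
    return avro_nullable or not data_nullable

# Non-nullable data, nullable schema: should be allowed (the change proposed).
print(is_compatible(False, ["null", "long"]))  # True
# Nullable data, non-nullable schema: still rejected.
print(is_compatible(True, "long"))             # False
```

Under this model, the previously thrown IncompatibleSchemaException corresponds to the first case returning False, which the change relaxes.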
[jira] [Commented] (SPARK-33351) WithColumn should add a column with specific position
[ https://issues.apache.org/jira/browse/SPARK-33351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17226482#comment-17226482 ] Apache Spark commented on SPARK-33351: -- User 'Karl-WangSK' has created a pull request for this issue: https://github.com/apache/spark/pull/30257 > WithColumn should add a column with specific position > - > > Key: SPARK-33351 > URL: https://issues.apache.org/jira/browse/SPARK-33351 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 3.1.0 >Reporter: karl wang >Priority: Major > > In `Dataset`, `withColumn` usually adds a new column at the end of the DataFrame. > But sometimes users want to add a new column at a specific position. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-33351) WithColumn should add a column with specific position
[ https://issues.apache.org/jira/browse/SPARK-33351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-33351: Assignee: Apache Spark > WithColumn should add a column with specific position > - > > Key: SPARK-33351 > URL: https://issues.apache.org/jira/browse/SPARK-33351 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 3.1.0 >Reporter: karl wang >Assignee: Apache Spark >Priority: Major > > In `Dataset`, `withColumn` usually adds a new column at the end of the DataFrame. > But sometimes users want to add a new column at a specific position. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-33351) WithColumn should add a column with specific position
[ https://issues.apache.org/jira/browse/SPARK-33351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-33351: Assignee: (was: Apache Spark) > WithColumn should add a column with specific position > - > > Key: SPARK-33351 > URL: https://issues.apache.org/jira/browse/SPARK-33351 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 3.1.0 >Reporter: karl wang >Priority: Major > > In `Dataset`, `withColumn` usually adds a new column at the end of the DataFrame. > But sometimes users want to add a new column at a specific position. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
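The feature requested in SPARK-33351 amounts to "append the column, then reorder the projection". A minimal sketch of that reordering, using plain Python lists to model a DataFrame's column order (the function name `with_column_at` is hypothetical, not the API the PR proposes):

```python
# Conceptual sketch (not Spark's actual API): inserting a new column at a
# given index in an existing column order.

def with_column_at(columns, new_col, position):
    """Return a new column order with new_col inserted at the given index."""
    if not 0 <= position <= len(columns):
        raise IndexError(f"position {position} out of range")
    # Equivalent to appending new_col and then selecting columns in the
    # desired order, which is how a DataFrame engine could implement it.
    return columns[:position] + [new_col] + columns[position:]

cols = ["id", "name", "score"]
print(with_column_at(cols, "age", 1))  # ['id', 'age', 'name', 'score']
```

In Spark itself the same effect is achievable today with `withColumn` followed by a `select` that lists the columns in the desired order; the ticket asks for a direct API.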
[jira] [Commented] (SPARK-33290) REFRESH TABLE should invalidate cache even though the table itself may not be cached
[ https://issues.apache.org/jira/browse/SPARK-33290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17226476#comment-17226476 ] Apache Spark commented on SPARK-33290: -- User 'sunchao' has created a pull request for this issue: https://github.com/apache/spark/pull/30256 > REFRESH TABLE should invalidate cache even though the table itself may not be > cached > > > Key: SPARK-33290 > URL: https://issues.apache.org/jira/browse/SPARK-33290 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.7, 3.0.1 >Reporter: Chao Sun >Assignee: Chao Sun >Priority: Major > Labels: correctness > Fix For: 2.4.8, 3.0.2, 3.1.0 > > > For the following example: > {code} > CREATE TABLE t ...; > CREATE VIEW t1 AS SELECT * FROM t; > REFRESH TABLE t > {code} > If t is cached, t1 will be invalidated. However if t is not cached as above, > the REFRESH command won't invalidate view t1. This could lead to incorrect > result if the view is used later. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
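The invalidation behavior described in SPARK-33290 can be modeled with a toy dependency-tracking cache. This is a deliberately simplified sketch (the class and method names are hypothetical; Spark's real logic lives in its CacheManager): the key point is that `refresh("t")` must drop cached plans that *reference* `t`, even when `t` itself has no cache entry.

```python
# Toy model of the bug fix: a cache keyed by relation name, where each cached
# entry records which tables it references.

class ToyCache:
    def __init__(self):
        self.cached = {}    # name -> materialized result
        self.depends = {}   # name -> set of referenced tables

    def cache(self, name, deps, result):
        self.cached[name] = result
        self.depends[name] = set(deps)

    def refresh(self, table):
        # Correct behavior: invalidate every entry that references `table`,
        # whether or not `table` itself is cached.
        stale = [n for n, deps in self.depends.items() if table in deps]
        for n in stale:
            self.cached.pop(n, None)
            self.depends.pop(n, None)

c = ToyCache()
c.cache("t1", deps={"t"}, result=[1, 2, 3])  # view t1 = SELECT * FROM t
c.refresh("t")                               # t is NOT cached, only t1 is
print("t1" in c.cached)  # False: t1 was invalidated anyway
```

The pre-fix behavior corresponds to `refresh` short-circuiting when `table` has no cache entry, which would leave the stale `t1` result in place and produce the incorrect results the ticket describes.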
[jira] [Commented] (SPARK-33352) Fix procedure-like declaration compilation warning in Scala 2.13
[ https://issues.apache.org/jira/browse/SPARK-33352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17226473#comment-17226473 ] Apache Spark commented on SPARK-33352: -- User 'LuciferYang' has created a pull request for this issue: https://github.com/apache/spark/pull/30255 > Fix procedure-like declaration compilation warning in Scala 2.13 > > > Key: SPARK-33352 > URL: https://issues.apache.org/jira/browse/SPARK-33352 > Project: Spark > Issue Type: Sub-task > Components: Build >Affects Versions: 3.1.0 >Reporter: Yang Jie >Priority: Minor > > Similar to SPARK-29291, just to track Spark 3.1.0. > There are two similar compilation warnings about procedure-like declaration > in Scala 2.13.3: > > {code:java} > [WARNING] [Warn] > /spark/core/src/main/scala/org/apache/spark/HeartbeatReceiver.scala:70: > procedure syntax is deprecated for constructors: add `=`, as in method > definition > [WARNING] [Warn] > /spark/core/src/main/scala/org/apache/spark/storage/BlockManagerDecommissioner.scala:211: > procedure syntax is deprecated: instead, add `: Unit =` to explicitly > declare `run`'s return type > {code} > > For constructors, the definition should be `this(...) = \{ }`, not > `this(...) \{ }`; for methods without a return type, the definition should be > `def methodName(...): Unit = {}`, not `def methodName(...) {}` -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-33352) Fix procedure-like declaration compilation warning in Scala 2.13
[ https://issues.apache.org/jira/browse/SPARK-33352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-33352: Assignee: Apache Spark > Fix procedure-like declaration compilation warning in Scala 2.13 > > > Key: SPARK-33352 > URL: https://issues.apache.org/jira/browse/SPARK-33352 > Project: Spark > Issue Type: Sub-task > Components: Build >Affects Versions: 3.1.0 >Reporter: Yang Jie >Assignee: Apache Spark >Priority: Minor > > Similar to SPARK-29291, just to track Spark 3.1.0. > There are two similar compilation warnings about procedure-like declaration > in Scala 2.13.3: > > {code:java} > [WARNING] [Warn] > /spark/core/src/main/scala/org/apache/spark/HeartbeatReceiver.scala:70: > procedure syntax is deprecated for constructors: add `=`, as in method > definition > [WARNING] [Warn] > /spark/core/src/main/scala/org/apache/spark/storage/BlockManagerDecommissioner.scala:211: > procedure syntax is deprecated: instead, add `: Unit =` to explicitly > declare `run`'s return type > {code} > > For constructors, the definition should be `this(...) = \{ }`, not > `this(...) \{ }`; for methods without a return type, the definition should be > `def methodName(...): Unit = {}`, not `def methodName(...) {}` -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-33352) Fix procedure-like declaration compilation warning in Scala 2.13
[ https://issues.apache.org/jira/browse/SPARK-33352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-33352: Assignee: (was: Apache Spark) > Fix procedure-like declaration compilation warning in Scala 2.13 > > > Key: SPARK-33352 > URL: https://issues.apache.org/jira/browse/SPARK-33352 > Project: Spark > Issue Type: Sub-task > Components: Build >Affects Versions: 3.1.0 >Reporter: Yang Jie >Priority: Minor > > Similar to SPARK-29291, just to track Spark 3.1.0. > There are two similar compilation warnings about procedure-like declaration > in Scala 2.13.3: > > {code:java} > [WARNING] [Warn] > /spark/core/src/main/scala/org/apache/spark/HeartbeatReceiver.scala:70: > procedure syntax is deprecated for constructors: add `=`, as in method > definition > [WARNING] [Warn] > /spark/core/src/main/scala/org/apache/spark/storage/BlockManagerDecommissioner.scala:211: > procedure syntax is deprecated: instead, add `: Unit =` to explicitly > declare `run`'s return type > {code} > > For constructors, the definition should be `this(...) = \{ }`, not > `this(...) \{ }`; for methods without a return type, the definition should be > `def methodName(...): Unit = {}`, not `def methodName(...) {}` -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-33314) Avro reader drops rows
[ https://issues.apache.org/jira/browse/SPARK-33314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-33314. -- Fix Version/s: 3.1.0 Resolution: Fixed Issue resolved by pull request 30221 [https://github.com/apache/spark/pull/30221] > Avro reader drops rows > -- > > Key: SPARK-33314 > URL: https://issues.apache.org/jira/browse/SPARK-33314 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.0 >Reporter: Bruce Robbins >Assignee: Bruce Robbins >Priority: Blocker > Labels: correctness > Fix For: 3.1.0 > > > Under certain circumstances, the V1 Avro reader drops rows. For example: > {noformat} > scala> val df = spark.range(0, 25).toDF("index") > df: org.apache.spark.sql.DataFrame = [index: bigint] > scala> df.write.mode("overwrite").format("avro").save("index_avro") > scala> val loaded = spark.read.format("avro").load("index_avro") > loaded: org.apache.spark.sql.DataFrame = [index: bigint] > scala> loaded.collect.size > res1: Int = 25 > scala> loaded.orderBy("index").collect.size > res2: Int = 17 <== expected 25 > scala> > loaded.orderBy("index").write.mode("overwrite").format("parquet").save("index_as_parquet") > scala> spark.read.parquet("index_as_parquet").count > res4: Long = 17 > scala> > {noformat} > SPARK-32346 slightly refactored the AvroFileFormat and > AvroPartitionReaderFactory to use a new iterator-like trait called > AvroUtils#RowReader. RowReader#hasNextRow consumes a raw input record and > stores the deserialized row for the next call to RowReader#nextRow. 
> Unfortunately, sometimes hasNextRow is called twice before nextRow is called, > resulting in a lost row (see > [BypassMergeSortShuffleWriter#write|https://github.com/apache/spark/blob/69c27f49acf2fe6fbc8335bde2aac4afd4188678/core/src/main/java/org/apache/spark/shuffle/sort/BypassMergeSortShuffleWriter.java#L132], > which calls records.hasNext once before calling it again > [here|https://github.com/apache/spark/blob/69c27f49acf2fe6fbc8335bde2aac4afd4188678/core/src/main/java/org/apache/spark/shuffle/sort/BypassMergeSortShuffleWriter.java#L155]). > RowReader consumes the Avro record in hasNextRow, rather than nextRow, > because AvroDeserializer#deserialize potentially filters out the record. > Two possible fixes that I thought of: > 1) keep state in RowReader such that multiple calls to RowReader#hasNextRow > with no intervening call to RowReader#nextRow avoid consuming more than one > Avro record. This requires no changes to any code that extends RowReader, > just RowReader itself. > 2) Move record consumption to RowReader#nextRow (such that RowReader#nextRow > could potentially return None) and wrap any iterator that extends RowReader > with a new iterator created by flatMap. This last iterator will filter out > the Nones and extract rows from the Somes. This requires changes to > AvroFileFormat and AvroPartitionReaderFactory as well as RowReader. > The first one seems simplest and most straightforward, and doesn't require > changes to AvroFileFormat and AvroPartitionReaderFactory, only to > AvroUtils#RowReader. So I propose this. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
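The row-dropping mechanism and proposed fix 1 can be reproduced with a minimal Python model. The names mirror AvroUtils#RowReader, but this is a sketch of the idea, not Spark's Scala code: the broken reader consumes an input record on every hasNextRow call, so two hasNextRow calls with no intervening nextRow silently drop a row, while the fixed reader remembers that a row is already buffered.

```python
# Minimal model of the SPARK-33314 bug and of "fix 1" (stateful hasNextRow).

class BrokenRowReader:
    def __init__(self, records):
        self.it = iter(records)

    def has_next_row(self):
        try:
            self.current = next(self.it)  # consumes unconditionally: the bug
            return True
        except StopIteration:
            return False

    def next_row(self):
        return self.current

class FixedRowReader(BrokenRowReader):
    """Fix 1: remember that a row is already buffered and pending."""
    completed = True  # True when no buffered row is pending

    def has_next_row(self):
        if not self.completed:       # a buffered row is still pending:
            return True              # do NOT consume another record
        ok = super().has_next_row()
        self.completed = not ok
        return ok

    def next_row(self):
        self.completed = True
        return self.current

def drain(reader):
    """Mimics BypassMergeSortShuffleWriter: hasNext is called twice per row."""
    out = []
    while reader.has_next_row():
        reader.has_next_row()        # the second call seen in the writer
        out.append(reader.next_row())
    return out

print(drain(BrokenRowReader(range(4))))  # [1, 3]: every other row dropped
print(drain(FixedRowReader(range(4))))   # [0, 1, 2, 3]: all rows preserved
```

Fix 2 (returning an Option from nextRow and flatMapping) would also work, but as the ticket notes it touches every caller, whereas the state flag is local to RowReader.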
[jira] [Assigned] (SPARK-33314) Avro reader drops rows
[ https://issues.apache.org/jira/browse/SPARK-33314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-33314: Assignee: Bruce Robbins > Avro reader drops rows > -- > > Key: SPARK-33314 > URL: https://issues.apache.org/jira/browse/SPARK-33314 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.0 >Reporter: Bruce Robbins >Assignee: Bruce Robbins >Priority: Blocker > Labels: correctness > > Under certain circumstances, the V1 Avro reader drops rows. For example: > {noformat} > scala> val df = spark.range(0, 25).toDF("index") > df: org.apache.spark.sql.DataFrame = [index: bigint] > scala> df.write.mode("overwrite").format("avro").save("index_avro") > scala> val loaded = spark.read.format("avro").load("index_avro") > loaded: org.apache.spark.sql.DataFrame = [index: bigint] > scala> loaded.collect.size > res1: Int = 25 > scala> loaded.orderBy("index").collect.size > res2: Int = 17 <== expected 25 > scala> > loaded.orderBy("index").write.mode("overwrite").format("parquet").save("index_as_parquet") > scala> spark.read.parquet("index_as_parquet").count > res4: Long = 17 > scala> > {noformat} > SPARK-32346 slightly refactored the AvroFileFormat and > AvroPartitionReaderFactory to use a new iterator-like trait called > AvroUtils#RowReader. RowReader#hasNextRow consumes a raw input record and > stores the deserialized row for the next call to RowReader#nextRow. > Unfortunately, sometimes hasNextRow is called twice before nextRow is called, > resulting in a lost row (see > [BypassMergeSortShuffleWriter#write|https://github.com/apache/spark/blob/69c27f49acf2fe6fbc8335bde2aac4afd4188678/core/src/main/java/org/apache/spark/shuffle/sort/BypassMergeSortShuffleWriter.java#L132], > which calls records.hasNext once before calling it again > [here|https://github.com/apache/spark/blob/69c27f49acf2fe6fbc8335bde2aac4afd4188678/core/src/main/java/org/apache/spark/shuffle/sort/BypassMergeSortShuffleWriter.java#L155]). 
> RowReader consumes the Avro record in hasNextRow, rather than nextRow, > because AvroDeserializer#deserialize potentially filters out the record. > Two possible fixes that I thought of: > 1) keep state in RowReader such that multiple calls to RowReader#hasNextRow > with no intervening call to RowReader#nextRow avoid consuming more than one > Avro record. This requires no changes to any code that extends RowReader, > just RowReader itself. > 2) Move record consumption to RowReader#nextRow (such that RowReader#nextRow > could potentially return None) and wrap any iterator that extends RowReader > with a new iterator created by flatMap. This last iterator will filter out > the Nones and extract rows from the Somes. This requires changes to > AvroFileFormat and AvroPartitionReaderFactory as well as RowReader. > The first one seems simplest and most straightforward, and doesn't require > changes to AvroFileFormat and AvroPartitionReaderFactory, only to > AvroUtils#RowReader. So I propose this. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-33352) Fix procedure-like declaration compilation warning in Scala 2.13
Yang Jie created SPARK-33352: Summary: Fix procedure-like declaration compilation warning in Scala 2.13 Key: SPARK-33352 URL: https://issues.apache.org/jira/browse/SPARK-33352 Project: Spark Issue Type: Sub-task Components: Build Affects Versions: 3.1.0 Reporter: Yang Jie Similar to SPARK-29291, just to track Spark 3.1.0. There are two similar compilation warnings about procedure-like declaration in Scala 2.13.3: {code:java} [WARNING] [Warn] /spark/core/src/main/scala/org/apache/spark/HeartbeatReceiver.scala:70: procedure syntax is deprecated for constructors: add `=`, as in method definition [WARNING] [Warn] /spark/core/src/main/scala/org/apache/spark/storage/BlockManagerDecommissioner.scala:211: procedure syntax is deprecated: instead, add `: Unit =` to explicitly declare `run`'s return type {code} For constructors, the definition should be `this(...) = \{ }`, not `this(...) \{ }`; for methods without a return type, the definition should be `def methodName(...): Unit = {}`, not `def methodName(...) {}` -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-33351) WithColumn should add a column with specific position
[ https://issues.apache.org/jira/browse/SPARK-33351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] karl wang updated SPARK-33351: -- Description: In `Dataset`, `withColumn` usually adds a new column at the end of the DataFrame. But sometimes users want to add a new column at a specific position. > WithColumn should add a column with specific position > - > > Key: SPARK-33351 > URL: https://issues.apache.org/jira/browse/SPARK-33351 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 3.1.0 >Reporter: karl wang >Priority: Major > > In `Dataset`, `withColumn` usually adds a new column at the end of the DataFrame. > But sometimes users want to add a new column at a specific position. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-33351) WithColumn should add a column with specific position
karl wang created SPARK-33351: - Summary: WithColumn should add a column with specific position Key: SPARK-33351 URL: https://issues.apache.org/jira/browse/SPARK-33351 Project: Spark Issue Type: New Feature Components: SQL Affects Versions: 3.1.0 Reporter: karl wang -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-33350) Add support to DiskBlockManager to create merge directory and to get the local shuffle merged data
[ https://issues.apache.org/jira/browse/SPARK-33350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chandni Singh updated SPARK-33350: -- Summary: Add support to DiskBlockManager to create merge directory and to get the local shuffle merged data (was: Add support to DiskBlockManager to create merge directory and the ability to get the shuffle merged data) > Add support to DiskBlockManager to create merge directory and to get the > local shuffle merged data > -- > > Key: SPARK-33350 > URL: https://issues.apache.org/jira/browse/SPARK-33350 > Project: Spark > Issue Type: Sub-task > Components: Shuffle >Affects Versions: 3.1.0 >Reporter: Chandni Singh >Priority: Major > > DiskBlockManager should be able to create the {{merge_manager}} directory, > where the push-based merged shuffle files are written and also create > sub-dirs under it. > It should also be able to serve the local merged shuffle data/index/meta > files. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-33350) Add support to DiskBlockManager to create merge directory and the ability to get the shuffle merged data
Chandni Singh created SPARK-33350: - Summary: Add support to DiskBlockManager to create merge directory and the ability to get the shuffle merged data Key: SPARK-33350 URL: https://issues.apache.org/jira/browse/SPARK-33350 Project: Spark Issue Type: Sub-task Components: Shuffle Affects Versions: 3.1.0 Reporter: Chandni Singh DiskBlockManager should be able to create the {{merge_manager}} directory, where the push-based merged shuffle files are written and also create sub-dirs under it. It should also be able to serve the local merged shuffle data/index/meta files. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
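The directory layout SPARK-33350 describes can be sketched in a few lines of Python. The sub-directory count and the hashing of block ids into sub-dirs are assumptions for illustration (Spark's actual scheme lives in DiskBlockManager); the sketch only shows the idea of a `merge_manager` root with hashed sub-directories that merged shuffle data/index/meta files map into.

```python
# Illustrative sketch of a merge_manager directory layout (naming and hashing
# scheme are assumptions, not Spark's actual implementation).
import os
import tempfile

def create_merge_dirs(local_dir, sub_dirs=64):
    """Create merge_manager/ with numbered sub-directories under local_dir."""
    merge_root = os.path.join(local_dir, "merge_manager")
    for i in range(sub_dirs):
        os.makedirs(os.path.join(merge_root, f"{i:02d}"), exist_ok=True)
    return merge_root

def merged_file_path(merge_root, block_id, sub_dirs=64):
    """Map a merged shuffle block id to a file inside a hashed sub-dir."""
    sub = hash(block_id) % sub_dirs
    return os.path.join(merge_root, f"{sub:02d}", block_id)

root = create_merge_dirs(tempfile.mkdtemp())
path = merged_file_path(root, "shuffleMerged_0_0.data")
print(os.path.isdir(os.path.dirname(path)))  # True: the sub-dir exists
```

Serving the local merged data/index/meta files then reduces to resolving a block id through the same mapping and opening the resulting path.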
[jira] [Resolved] (SPARK-33343) Fix the build with sbt to copy hadoop-client-runtime.jar
[ https://issues.apache.org/jira/browse/SPARK-33343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-33343. --- Fix Version/s: 3.1.0 Resolution: Fixed Issue resolved by pull request 30250 [https://github.com/apache/spark/pull/30250] > Fix the build with sbt to copy hadoop-client-runtime.jar > > > Key: SPARK-33343 > URL: https://issues.apache.org/jira/browse/SPARK-33343 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 3.1.0 >Reporter: Kousuke Saruta >Assignee: Kousuke Saruta >Priority: Major > Fix For: 3.1.0 > > > With the current master, spark-shell doesn't work if it's built with sbt > package. > It's due to hadoop-client-runtime.jar isn't copied to > assembly/target/scala-2.12/jars. > {code} > $ bin/spark-shell > Exception in thread "main" java.lang.NoClassDefFoundError: > org/apache/hadoop/shaded/com/ctc/wstx/io/InputBootstrapper > at > org.apache.spark.deploy.SparkHadoopUtil$.newConfiguration(SparkHadoopUtil.scala:426) > at > org.apache.spark.deploy.SparkSubmit.$anonfun$prepareSubmitEnvironment$2(SparkSubmit.scala:342) > at scala.Option.getOrElse(Option.scala:189) > at > org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:342) > at > org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:877) > at > org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180) > at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203) > at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90) > at > org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1013) > at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1022) > at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) > Caused by: java.lang.ClassNotFoundException: > org.apache.hadoop.shaded.com.ctc.wstx.io.InputBootstrapper > at java.net.URLClassLoader.findClass(URLClassLoader.java:382) > at 
java.lang.ClassLoader.loadClass(ClassLoader.java:418) > at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352) > at java.lang.ClassLoader.loadClass(ClassLoader.java:351) > ... 11 more > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-31711) Register the executor source with the metrics system when running in local mode.
[ https://issues.apache.org/jira/browse/SPARK-31711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves resolved SPARK-31711. --- Fix Version/s: 3.1.0 Resolution: Fixed > Register the executor source with the metrics system when running in local > mode. > > > Key: SPARK-31711 > URL: https://issues.apache.org/jira/browse/SPARK-31711 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.0.0 >Reporter: Luca Canali >Assignee: Luca Canali >Priority: Minor > Fix For: 3.1.0 > > > The Apache Spark metrics system provides many useful insights on the Spark > workload. In particular, the executor source metrics > (https://github.com/apache/spark/blob/master/docs/monitoring.md#component-instance--executor) > provide detailed info, including the number of active tasks, some I/O > metrics, and task metrics details. Executor source metrics, contrary to other > sources (for example ExecutorMetrics source), are not yet available when > running in local mode. > This JIRA proposes to register the executor source with the Spark metrics > system when running in local mode, as this can be very useful when testing > and troubleshooting Spark workloads. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-31711) Register the executor source with the metrics system when running in local mode.
[ https://issues.apache.org/jira/browse/SPARK-31711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves reassigned SPARK-31711: - Assignee: Luca Canali > Register the executor source with the metrics system when running in local > mode. > > > Key: SPARK-31711 > URL: https://issues.apache.org/jira/browse/SPARK-31711 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.0.0 >Reporter: Luca Canali >Assignee: Luca Canali >Priority: Minor > > The Apache Spark metrics system provides many useful insights on the Spark > workload. In particular, the executor source metrics > (https://github.com/apache/spark/blob/master/docs/monitoring.md#component-instance--executor) > provide detailed info, including the number of active tasks, some I/O > metrics, and task metrics details. Executor source metrics, contrary to other > sources (for example ExecutorMetrics source), are not yet available when > running in local mode. > This JIRA proposes to register the executor source with the Spark metrics > system when running in local mode, as this can be very useful when testing > and troubleshooting Spark workloads. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-33282) Replace Probot Autolabeler with Github Action
[ https://issues.apache.org/jira/browse/SPARK-33282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17226414#comment-17226414 ] Apache Spark commented on SPARK-33282: -- User 'kbendick' has created a pull request for this issue: https://github.com/apache/spark/pull/30254 > Replace Probot Autolabeler with Github Action > - > > Key: SPARK-33282 > URL: https://issues.apache.org/jira/browse/SPARK-33282 > Project: Spark > Issue Type: Task > Components: Project Infra >Affects Versions: 3.0.1 >Reporter: Kyle Bendickson >Priority: Major > > The Probot Autolabeler that we were using in both the Iceberg and the Spark > repo is no longer working. I've confirmed that with the developer, github user > [at]mithro, who has indicated that the Probot Autolabeler is end of life and > will not be maintained moving forward. > PRs have not been labeled for a few weeks now. > > As I'm already interfacing with ASF Infra to have the probot permissions > revoked from the Iceberg repo, and I've already submitted a patch to switch > Iceberg to the standard github labeler action, I figured I would go ahead and > volunteer myself to switch the Spark repo as well. > I will have a patch to switch to the new github labeler open within a few > days. > > Also thank you [~blue] (or [~holden]) for shepherding this! I didn't exactly > ask, but it was understood in our group meeting for Iceberg that I'd be > converting our labeler there so I figured I'd tackle the spark issue while > I'm getting my hands into the labeling configs anyway =) -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-33162) Use pre-built image at GitHub Action PySpark jobs
[ https://issues.apache.org/jira/browse/SPARK-33162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17226408#comment-17226408 ] Apache Spark commented on SPARK-33162: -- User 'dongjoon-hyun' has created a pull request for this issue: https://github.com/apache/spark/pull/30253 > Use pre-built image at GitHub Action PySpark jobs > - > > Key: SPARK-33162 > URL: https://issues.apache.org/jira/browse/SPARK-33162 > Project: Spark > Issue Type: Improvement > Components: Project Infra >Affects Versions: 3.1.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Fix For: 3.1.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-33349) ExecutorPodsWatchSnapshotSource: Kubernetes client has been closed
[ https://issues.apache.org/jira/browse/SPARK-33349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17226395#comment-17226395 ] Dongjoon Hyun commented on SPARK-33349: --- I converted this to a subtask of SPARK-33005 to give more visibility. > ExecutorPodsWatchSnapshotSource: Kubernetes client has been closed > -- > > Key: SPARK-33349 > URL: https://issues.apache.org/jira/browse/SPARK-33349 > Project: Spark > Issue Type: Sub-task > Components: Kubernetes >Affects Versions: 3.0.1, 3.0.2 >Reporter: Nicola Bova >Priority: Critical > > I launch my spark application with the > [spark-on-kubernetes-operator|https://github.com/GoogleCloudPlatform/spark-on-k8s-operator] > with the following yaml file: > {code:yaml} > apiVersion: sparkoperator.k8s.io/v1beta2 > kind: SparkApplication > metadata: > name: spark-kafka-streamer-test > namespace: kafka2hdfs > spec: > type: Scala > mode: cluster > image: /spark:3.0.2-SNAPSHOT-2.12-0.1.0 > imagePullPolicy: Always > timeToLiveSeconds: 259200 > mainClass: path.to.my.class.KafkaStreamer > mainApplicationFile: spark-kafka-streamer_2.12-spark300-assembly.jar > sparkVersion: 3.0.1 > restartPolicy: > type: Always > sparkConf: > "spark.kafka.consumer.cache.capacity": "8192" > "spark.kubernetes.memoryOverheadFactor": "0.3" > deps: > jars: > - my > - jar > - list > hadoopConfigMap: hdfs-config > driver: > cores: 4 > memory: 12g > labels: > version: 3.0.1 > serviceAccount: default > javaOptions: > "-Dlog4j.configuration=file:///opt/spark/log4j/log4j.properties" > executor: > instances: 4 > cores: 4 > memory: 16g > labels: > version: 3.0.1 > javaOptions: > "-Dlog4j.configuration=file:///opt/spark/log4j/log4j.properties" > {code} > I have tried with both Spark `3.0.1` and `3.0.2-SNAPSHOT` with the ["Restart > the watcher when we receive a version changed from > k8s"|https://github.com/apache/spark/pull/29533] patch. 
> This is the driver log: > {code} > 20/11/04 12:16:02 WARN NativeCodeLoader: Unable to load native-hadoop library > for your platform... using builtin-java classes where applicable > ... // my app log, it's a structured streaming app reading from kafka and > writing to hdfs > 20/11/04 13:12:12 WARN ExecutorPodsWatchSnapshotSource: Kubernetes client has > been closed (this is expected if the application is shutting down.) > io.fabric8.kubernetes.client.KubernetesClientException: too old resource > version: 1574101276 (1574213896) > at > io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager$1.onMessage(WatchConnectionManager.java:259) > at okhttp3.internal.ws.RealWebSocket.onReadMessage(RealWebSocket.java:323) > at > okhttp3.internal.ws.WebSocketReader.readMessageFrame(WebSocketReader.java:219) > at > okhttp3.internal.ws.WebSocketReader.processNextFrame(WebSocketReader.java:105) > at okhttp3.internal.ws.RealWebSocket.loopReader(RealWebSocket.java:274) > at okhttp3.internal.ws.RealWebSocket$2.onResponse(RealWebSocket.java:214) > at okhttp3.RealCall$AsyncCall.execute(RealCall.java:203) > at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32) > at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown > Source) > at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown > Source) > at java.base/java.lang.Thread.run(Unknown Source) > {code} > The error above appears after roughly 50 minutes. > After the exception above, no more logs are produced and the app hangs. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
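[Editorial sketch] The failure mode in the log above is a watch that dies on a stale resourceVersion and never comes back. Below is a hypothetical Python model of the restart strategy the linked patch aims for; it is NOT the fabric8 client API, and the server is faked: the loop remembers the last resourceVersion it saw and re-establishes the watch when the server rejects the old one, instead of going silent.

```python
# Hypothetical Python model of restarting a Kubernetes watch after a
# "too old resource version" (HTTP 410) error; not the fabric8 API.

class StaleResourceVersion(Exception):
    def __init__(self, latest):
        self.latest = latest

def watch_pods(server, events, start_version, max_restarts=3):
    """server(version) yields (resource_version, event) pairs and may
    raise StaleResourceVersion; restart instead of hanging forever."""
    version = start_version
    for _ in range(max_restarts + 1):
        try:
            for ev_version, ev in server(version):
                version = ev_version   # remember how far we got
                events.append(ev)
            return version             # stream ended normally
        except StaleResourceVersion as e:
            version = e.latest         # restart from the server's latest
    raise RuntimeError("watch kept failing after restarts")

# Fake server: fails once with a stale version, then recovers.
def fake_server_factory():
    calls = {"n": 0}
    def server(version):
        calls["n"] += 1
        if calls["n"] == 1:
            yield version + 1, "ADDED"
            raise StaleResourceVersion(latest=100)
        yield 101, "MODIFIED"
    return server

events = []
final = watch_pods(fake_server_factory(), events, start_version=0)
print(events, final)  # ['ADDED', 'MODIFIED'] 101
```

The design choice mirrored here is that a stale-version error is treated as recoverable progress loss, not a terminal condition.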
[jira] [Updated] (SPARK-33349) ExecutorPodsWatchSnapshotSource: Kubernetes client has been closed
[ https://issues.apache.org/jira/browse/SPARK-33349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-33349: -- Affects Version/s: 3.1.0 > ExecutorPodsWatchSnapshotSource: Kubernetes client has been closed > -- > > Key: SPARK-33349 > URL: https://issues.apache.org/jira/browse/SPARK-33349 > Project: Spark > Issue Type: Sub-task > Components: Kubernetes >Affects Versions: 3.0.1, 3.0.2, 3.1.0 >Reporter: Nicola Bova >Priority: Critical > > I launch my spark application with the > [spark-on-kubernetes-operator|https://github.com/GoogleCloudPlatform/spark-on-k8s-operator] > with the following yaml file: > {code:yaml} > apiVersion: sparkoperator.k8s.io/v1beta2 > kind: SparkApplication > metadata: > name: spark-kafka-streamer-test > namespace: kafka2hdfs > spec: > type: Scala > mode: cluster > image: /spark:3.0.2-SNAPSHOT-2.12-0.1.0 > imagePullPolicy: Always > timeToLiveSeconds: 259200 > mainClass: path.to.my.class.KafkaStreamer > mainApplicationFile: spark-kafka-streamer_2.12-spark300-assembly.jar > sparkVersion: 3.0.1 > restartPolicy: > type: Always > sparkConf: > "spark.kafka.consumer.cache.capacity": "8192" > "spark.kubernetes.memoryOverheadFactor": "0.3" > deps: > jars: > - my > - jar > - list > hadoopConfigMap: hdfs-config > driver: > cores: 4 > memory: 12g > labels: > version: 3.0.1 > serviceAccount: default > javaOptions: > "-Dlog4j.configuration=file:///opt/spark/log4j/log4j.properties" > executor: > instances: 4 > cores: 4 > memory: 16g > labels: > version: 3.0.1 > javaOptions: > "-Dlog4j.configuration=file:///opt/spark/log4j/log4j.properties" > {code} > I have tried with both Spark `3.0.1` and `3.0.2-SNAPSHOT` with the ["Restart > the watcher when we receive a version changed from > k8s"|https://github.com/apache/spark/pull/29533] patch. > This is the driver log: > {code} > 20/11/04 12:16:02 WARN NativeCodeLoader: Unable to load native-hadoop library > for your platform... 
using builtin-java classes where applicable > ... // my app log, it's a structured streaming app reading from kafka and > writing to hdfs > 20/11/04 13:12:12 WARN ExecutorPodsWatchSnapshotSource: Kubernetes client has > been closed (this is expected if the application is shutting down.) > io.fabric8.kubernetes.client.KubernetesClientException: too old resource > version: 1574101276 (1574213896) > at > io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager$1.onMessage(WatchConnectionManager.java:259) > at okhttp3.internal.ws.RealWebSocket.onReadMessage(RealWebSocket.java:323) > at > okhttp3.internal.ws.WebSocketReader.readMessageFrame(WebSocketReader.java:219) > at > okhttp3.internal.ws.WebSocketReader.processNextFrame(WebSocketReader.java:105) > at okhttp3.internal.ws.RealWebSocket.loopReader(RealWebSocket.java:274) > at okhttp3.internal.ws.RealWebSocket$2.onResponse(RealWebSocket.java:214) > at okhttp3.RealCall$AsyncCall.execute(RealCall.java:203) > at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32) > at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown > Source) > at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown > Source) > at java.base/java.lang.Thread.run(Unknown Source) > {code} > The error above appears after roughly 50 minutes. > After the exception above, no more logs are produced and the app hangs. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-33349) ExecutorPodsWatchSnapshotSource: Kubernetes client has been closed
[ https://issues.apache.org/jira/browse/SPARK-33349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-33349: -- Parent: SPARK-33005 Issue Type: Sub-task (was: Bug) > ExecutorPodsWatchSnapshotSource: Kubernetes client has been closed > -- > > Key: SPARK-33349 > URL: https://issues.apache.org/jira/browse/SPARK-33349 > Project: Spark > Issue Type: Sub-task > Components: Kubernetes >Affects Versions: 3.0.1, 3.0.2 >Reporter: Nicola Bova >Priority: Critical > > I launch my spark application with the > [spark-on-kubernetes-operator|https://github.com/GoogleCloudPlatform/spark-on-k8s-operator] > with the following yaml file: > {code:yaml} > apiVersion: sparkoperator.k8s.io/v1beta2 > kind: SparkApplication > metadata: > name: spark-kafka-streamer-test > namespace: kafka2hdfs > spec: > type: Scala > mode: cluster > image: /spark:3.0.2-SNAPSHOT-2.12-0.1.0 > imagePullPolicy: Always > timeToLiveSeconds: 259200 > mainClass: path.to.my.class.KafkaStreamer > mainApplicationFile: spark-kafka-streamer_2.12-spark300-assembly.jar > sparkVersion: 3.0.1 > restartPolicy: > type: Always > sparkConf: > "spark.kafka.consumer.cache.capacity": "8192" > "spark.kubernetes.memoryOverheadFactor": "0.3" > deps: > jars: > - my > - jar > - list > hadoopConfigMap: hdfs-config > driver: > cores: 4 > memory: 12g > labels: > version: 3.0.1 > serviceAccount: default > javaOptions: > "-Dlog4j.configuration=file:///opt/spark/log4j/log4j.properties" > executor: > instances: 4 > cores: 4 > memory: 16g > labels: > version: 3.0.1 > javaOptions: > "-Dlog4j.configuration=file:///opt/spark/log4j/log4j.properties" > {code} > I have tried with both Spark `3.0.1` and `3.0.2-SNAPSHOT` with the ["Restart > the watcher when we receive a version changed from > k8s"|https://github.com/apache/spark/pull/29533] patch. > This is the driver log: > {code} > 20/11/04 12:16:02 WARN NativeCodeLoader: Unable to load native-hadoop library > for your platform... 
using builtin-java classes where applicable > ... // my app log, it's a structured streaming app reading from kafka and > writing to hdfs > 20/11/04 13:12:12 WARN ExecutorPodsWatchSnapshotSource: Kubernetes client has > been closed (this is expected if the application is shutting down.) > io.fabric8.kubernetes.client.KubernetesClientException: too old resource > version: 1574101276 (1574213896) > at > io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager$1.onMessage(WatchConnectionManager.java:259) > at okhttp3.internal.ws.RealWebSocket.onReadMessage(RealWebSocket.java:323) > at > okhttp3.internal.ws.WebSocketReader.readMessageFrame(WebSocketReader.java:219) > at > okhttp3.internal.ws.WebSocketReader.processNextFrame(WebSocketReader.java:105) > at okhttp3.internal.ws.RealWebSocket.loopReader(RealWebSocket.java:274) > at okhttp3.internal.ws.RealWebSocket$2.onResponse(RealWebSocket.java:214) > at okhttp3.RealCall$AsyncCall.execute(RealCall.java:203) > at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32) > at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown > Source) > at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown > Source) > at java.base/java.lang.Thread.run(Unknown Source) > {code} > The error above appears after roughly 50 minutes. > After the exception above, no more logs are produced and the app hangs. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-33349) ExecutorPodsWatchSnapshotSource: Kubernetes client has been closed
[ https://issues.apache.org/jira/browse/SPARK-33349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17226393#comment-17226393 ] Dongjoon Hyun commented on SPARK-33349: --- Thanks, [~jkleckner]. It looks like a breaking change. {code:java} Note Minor breaking changes: - PR #2424 (#2414) slightly changes the API by adding the new WatchAndWaitable "combiner" interface. Most projects shouldn't require any additional changes. {code} > ExecutorPodsWatchSnapshotSource: Kubernetes client has been closed > -- > > Key: SPARK-33349 > URL: https://issues.apache.org/jira/browse/SPARK-33349 > Project: Spark > Issue Type: Bug > Components: Kubernetes >Affects Versions: 3.0.1, 3.0.2 >Reporter: Nicola Bova >Priority: Critical > > I launch my spark application with the > [spark-on-kubernetes-operator|https://github.com/GoogleCloudPlatform/spark-on-k8s-operator] > with the following yaml file: > {code:yaml} > apiVersion: sparkoperator.k8s.io/v1beta2 > kind: SparkApplication > metadata: > name: spark-kafka-streamer-test > namespace: kafka2hdfs > spec: > type: Scala > mode: cluster > image: /spark:3.0.2-SNAPSHOT-2.12-0.1.0 > imagePullPolicy: Always > timeToLiveSeconds: 259200 > mainClass: path.to.my.class.KafkaStreamer > mainApplicationFile: spark-kafka-streamer_2.12-spark300-assembly.jar > sparkVersion: 3.0.1 > restartPolicy: > type: Always > sparkConf: > "spark.kafka.consumer.cache.capacity": "8192" > "spark.kubernetes.memoryOverheadFactor": "0.3" > deps: > jars: > - my > - jar > - list > hadoopConfigMap: hdfs-config > driver: > cores: 4 > memory: 12g > labels: > version: 3.0.1 > serviceAccount: default > javaOptions: > "-Dlog4j.configuration=file:///opt/spark/log4j/log4j.properties" > executor: > instances: 4 > cores: 4 > memory: 16g > labels: > version: 3.0.1 > javaOptions: > "-Dlog4j.configuration=file:///opt/spark/log4j/log4j.properties" > {code} > I have tried with both Spark `3.0.1` and `3.0.2-SNAPSHOT` with the ["Restart > the watcher 
when we receive a version changed from > k8s"|https://github.com/apache/spark/pull/29533] patch. > This is the driver log: > {code} > 20/11/04 12:16:02 WARN NativeCodeLoader: Unable to load native-hadoop library > for your platform... using builtin-java classes where applicable > ... // my app log, it's a structured streaming app reading from kafka and > writing to hdfs > 20/11/04 13:12:12 WARN ExecutorPodsWatchSnapshotSource: Kubernetes client has > been closed (this is expected if the application is shutting down.) > io.fabric8.kubernetes.client.KubernetesClientException: too old resource > version: 1574101276 (1574213896) > at > io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager$1.onMessage(WatchConnectionManager.java:259) > at okhttp3.internal.ws.RealWebSocket.onReadMessage(RealWebSocket.java:323) > at > okhttp3.internal.ws.WebSocketReader.readMessageFrame(WebSocketReader.java:219) > at > okhttp3.internal.ws.WebSocketReader.processNextFrame(WebSocketReader.java:105) > at okhttp3.internal.ws.RealWebSocket.loopReader(RealWebSocket.java:274) > at okhttp3.internal.ws.RealWebSocket$2.onResponse(RealWebSocket.java:214) > at okhttp3.RealCall$AsyncCall.execute(RealCall.java:203) > at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32) > at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown > Source) > at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown > Source) > at java.base/java.lang.Thread.run(Unknown Source) > {code} > The error above appears after roughly 50 minutes. > After the exception above, no more logs are produced and the app hangs. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-33331) Limit the number of pending blocks in memory and store blocks that collide
[ https://issues.apache.org/jira/browse/SPARK-33331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chandni Singh updated SPARK-33331: -- Description: This jira addresses the below two points: 1. In {{RemoteBlockPushResolver}}, bytes that cannot be merged immediately are stored in memory. The stream callback maintains a list of {{deferredBufs}}. When a block cannot be merged it is added to this list. Currently, there isn't a limit on the number of pending blocks. We can limit the number of pending blocks in memory. There has been a discussion around this here: [https://github.com/apache/spark/pull/30062#discussion_r514026014] 2. When a stream doesn't get an opportunity to merge, then {{RemoteBlockPushResolver}} ignores the data from that stream. Another approach is to store the data of the stream in {{AppShufflePartitionInfo}} when it reaches the worst-case scenario. This may increase the memory usage of the shuffle service though. However, given a limit introduced with 1 we can try this out. More information can be found in this discussion: [https://github.com/apache/spark/pull/30062#discussion_r517524546] was: This jira addresses the below two points: 1. In {{RemoteBlockPushResolver}}, bytes that cannot be merged immediately are stored in memory. The stream callback maintains a list of {{deferredBufs}}. When a block cannot be merged it is added to this list. Currently, there isn't a limit on the number of pending blocks. We can limit the number of pending blocks in memory. There has been a discussion around this here: [https://github.com/apache/spark/pull/30062#discussion_r514026014 ] 2. When a stream doesn't get an opportunity to merge, then {{RemoteBlockPushResolver}} ignores the data from that stream. Another approach is to store the data of the stream in {{AppShufflePartitionInfo}} when it reaches the worst-case scenario. This may increase the memory usage of the shuffle service though. However, given a limit introduced with 1 we can try this out. 
More information can be found in this discussion: [https://github.com/apache/spark/pull/30062#discussion_r517524546] > Limit the number of pending blocks in memory and store blocks that collide > -- > > Key: SPARK-33331 > URL: https://issues.apache.org/jira/browse/SPARK-33331 > Project: Spark > Issue Type: Sub-task > Components: Shuffle >Affects Versions: 3.1.0 >Reporter: Chandni Singh >Priority: Major > > This jira addresses the below two points: > 1. In {{RemoteBlockPushResolver}}, bytes that cannot be merged immediately > are stored in memory. The stream callback maintains a list of > {{deferredBufs}}. When a block cannot be merged it is added to this list. > Currently, there isn't a limit on the number of pending blocks. We can limit > the number of pending blocks in memory. There has been a discussion around > this here: > [https://github.com/apache/spark/pull/30062#discussion_r514026014] > 2. When a stream doesn't get an opportunity to merge, then > {{RemoteBlockPushResolver}} ignores the data from that stream. Another > approach is to store the data of the stream in {{AppShufflePartitionInfo}} > when it reaches the worst-case scenario. This may increase the memory usage > of the shuffle service though. However, given a limit introduced with 1 we > can try this out. > More information can be found in this discussion: > [https://github.com/apache/spark/pull/30062#discussion_r517524546] -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
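[Editorial sketch] The two points in the SPARK-33331 description can be modeled in a few lines. This is an illustrative Python sketch under assumed names (`PushStream` and `on_data` are hypothetical, not the `RemoteBlockPushResolver` API): deferred chunks are capped, and a chunk arriving over the cap is refused rather than buffered without bound; a merge opportunity flushes the deferred list first.

```python
# Illustrative sketch under assumed names; PushStream/on_data are
# invented, not the RemoteBlockPushResolver API.

class PushStream:
    def __init__(self, max_deferred):
        self.max_deferred = max_deferred
        self.deferred = []   # point 1: bounded list of pending chunks
        self.merged = []     # chunks appended to the merged shuffle file

    def on_data(self, chunk, can_merge_now):
        if can_merge_now:
            # flush anything deferred first, then append the new chunk
            self.merged.extend(self.deferred)
            self.deferred.clear()
            self.merged.append(chunk)
            return True
        if len(self.deferred) >= self.max_deferred:
            return False     # over the cap: refuse instead of buffering
        self.deferred.append(chunk)
        return True

s = PushStream(max_deferred=2)
s.on_data("a", can_merge_now=False)                  # deferred
s.on_data("b", can_merge_now=False)                  # deferred (at cap)
rejected = not s.on_data("c", can_merge_now=False)   # over the cap
s.on_data("d", can_merge_now=True)                   # flush "a","b", then "d"
print(rejected, s.merged)  # True ['a', 'b', 'd']
```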
[jira] [Updated] (SPARK-33331) Limit the number of pending blocks in memory and store blocks that collide
[ https://issues.apache.org/jira/browse/SPARK-33331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chandni Singh updated SPARK-33331: -- Description: This jira addresses the below two points: 1. In {{RemoteBlockPushResolver}}, bytes that cannot be merged immediately are stored in memory. The stream callback maintains a list of {{deferredBufs}}. When a block cannot be merged it is added to this list. Currently, there isn't a limit on the number of pending blocks. We can limit the number of pending blocks in memory. There has been a discussion around this here: [https://github.com/apache/spark/pull/30062#discussion_r514026014 ] 2. When a stream doesn't get an opportunity to merge, then {{RemoteBlockPushResolver}} ignores the data from that stream. Another approach is to store the data of the stream in {{AppShufflePartitionInfo}} when it reaches the worst-case scenario. This may increase the memory usage of the shuffle service though. However, given a limit introduced with 1 we can try this out. More information can be found in this discussion: [https://github.com/apache/spark/pull/30062#discussion_r517524546] was: 1. In {{RemoteBlockPushResolver}}, bytes that cannot be merged immediately are stored in memory. The stream callback maintains a list of {{deferredBufs}}. When a block cannot be merged it is added to this list. Currently, there isn't a limit on the number of pending blocks. There has been a discussion around this here: https://github.com/apache/spark/pull/30062#discussion_r514026014 2. When a stream doesn't get an opportunity to merge, then {{RemoteBlockPushResolver}} ignores the data from that stream. Another approach is to store the data of the stream in {{AppShufflePartitionInfo}} when it reaches the worst-case scenario. This may increase the memory usage of the shuffle service though. However, given a limit introduced with 1 we can try this out. 
> Limit the number of pending blocks in memory and store blocks that collide > -- > > Key: SPARK-33331 > URL: https://issues.apache.org/jira/browse/SPARK-33331 > Project: Spark > Issue Type: Sub-task > Components: Shuffle >Affects Versions: 3.1.0 >Reporter: Chandni Singh >Priority: Major > > This jira addresses the below two points: > 1. In {{RemoteBlockPushResolver}}, bytes that cannot be merged immediately > are stored in memory. The stream callback maintains a list of > {{deferredBufs}}. When a block cannot be merged it is added to this list. > Currently, there isn't a limit on the number of pending blocks. We can limit > the number of pending blocks in memory. There has been a discussion around > this here: > [https://github.com/apache/spark/pull/30062#discussion_r514026014 > ] > 2. When a stream doesn't get an opportunity to merge, then > {{RemoteBlockPushResolver}} ignores the data from that stream. Another > approach is to store the data of the stream in {{AppShufflePartitionInfo}} > when it reaches the worst-case scenario. This may increase the memory usage > of the shuffle service though. However, given a limit introduced with 1 we > can try this out. > More information can be found in this discussion: > [https://github.com/apache/spark/pull/30062#discussion_r517524546] -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-33331) Limit the number of pending blocks in memory and store blocks that collide
[ https://issues.apache.org/jira/browse/SPARK-33331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chandni Singh updated SPARK-33331: -- Summary: Limit the number of pending blocks in memory and store blocks that collide (was: Limit the number of pending blocks in memory when RemoteBlockPushResolver defers a block) > Limit the number of pending blocks in memory and store blocks that collide > -- > > Key: SPARK-33331 > URL: https://issues.apache.org/jira/browse/SPARK-33331 > Project: Spark > Issue Type: Sub-task > Components: Shuffle >Affects Versions: 3.1.0 >Reporter: Chandni Singh >Priority: Major > > 1. In {{RemoteBlockPushResolver}}, bytes that cannot be merged immediately > are stored in memory. The stream callback maintains a list of > {{deferredBufs}}. When a block cannot be merged it is added to this list. > Currently, there isn't a limit on the number of pending blocks. There has > been a discussion around this here: > https://github.com/apache/spark/pull/30062#discussion_r514026014 > 2. When a stream doesn't get an opportunity to merge, then > {{RemoteBlockPushResolver}} ignores the data from that stream. Another > approach is to store the data of the stream in {{AppShufflePartitionInfo}} > when it reaches the worst-case scenario. This may increase the memory usage > of the shuffle service though. However, given a limit introduced with 1 we > can try this out. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-33331) Limit the number of pending blocks in memory when RemoteBlockPushResolver defers a block
[ https://issues.apache.org/jira/browse/SPARK-33331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chandni Singh updated SPARK-33331: -- Description: 1. In {{RemoteBlockPushResolver}}, bytes that cannot be merged immediately are stored in memory. The stream callback maintains a list of {{deferredBufs}}. When a block cannot be merged it is added to this list. Currently, there isn't a limit on the number of pending blocks. There has been a discussion around this here: https://github.com/apache/spark/pull/30062#discussion_r514026014 2. When a stream doesn't get an opportunity to merge, then {{RemoteBlockPushResolver}} ignores the data from that stream. Another approach is to store the data of the stream in {{AppShufflePartitionInfo}} when it reaches the worst-case scenario. This may increase the memory usage of the shuffle service though. However, given a limit introduced with 1 we can try this out. was: This is to address the comment here: https://github.com/apache/spark/pull/30062#discussion_r514026014 > Limit the number of pending blocks in memory when RemoteBlockPushResolver > defers a block > > > Key: SPARK-33331 > URL: https://issues.apache.org/jira/browse/SPARK-33331 > Project: Spark > Issue Type: Sub-task > Components: Shuffle >Affects Versions: 3.1.0 >Reporter: Chandni Singh >Priority: Major > > 1. In {{RemoteBlockPushResolver}}, bytes that cannot be merged immediately > are stored in memory. The stream callback maintains a list of > {{deferredBufs}}. When a block cannot be merged it is added to this list. > Currently, there isn't a limit on the number of pending blocks. There has > been a discussion around this here: > https://github.com/apache/spark/pull/30062#discussion_r514026014 > 2. When a stream doesn't get an opportunity to merge, then > {{RemoteBlockPushResolver}} ignores the data from that stream. Another > approach is to store the data of the stream in {{AppShufflePartitionInfo}} > when it reaches the worst-case scenario. 
This may increase the memory usage > of the shuffle service though. However, given a limit introduced with 1 we > can try this out. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-33349) ExecutorPodsWatchSnapshotSource: Kubernetes client has been closed
[ https://issues.apache.org/jira/browse/SPARK-33349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17226362#comment-17226362 ] Jim Kleckner commented on SPARK-33349: -- This fabric8 kubernetes-client issue/MR looks relevant: Repeated "too old resource version" exception with BaseOperation.waitUntilCondition(). #2414 * [https://github.com/fabric8io/kubernetes-client/issues/2414|https://github.com/fabric8io/kubernetes-client/issues/2414] * [https://github.com/fabric8io/kubernetes-client/pull/2424|https://github.com/fabric8io/kubernetes-client/pull/2424] This is released in [https://github.com/fabric8io/kubernetes-client/releases/tag/v4.12.0|https://github.com/fabric8io/kubernetes-client/releases/tag/v4.12.0] > ExecutorPodsWatchSnapshotSource: Kubernetes client has been closed > -- > > Key: SPARK-33349 > URL: https://issues.apache.org/jira/browse/SPARK-33349 > Project: Spark > Issue Type: Bug > Components: Kubernetes >Affects Versions: 3.0.1, 3.0.2 >Reporter: Nicola Bova >Priority: Critical > > I launch my spark application with the > [spark-on-kubernetes-operator|https://github.com/GoogleCloudPlatform/spark-on-k8s-operator] > with the following yaml file: > {code:yaml} > apiVersion: sparkoperator.k8s.io/v1beta2 > kind: SparkApplication > metadata: > name: spark-kafka-streamer-test > namespace: kafka2hdfs > spec: > type: Scala > mode: cluster > image: /spark:3.0.2-SNAPSHOT-2.12-0.1.0 > imagePullPolicy: Always > timeToLiveSeconds: 259200 > mainClass: path.to.my.class.KafkaStreamer > mainApplicationFile: spark-kafka-streamer_2.12-spark300-assembly.jar > sparkVersion: 3.0.1 > restartPolicy: > type: Always > sparkConf: > "spark.kafka.consumer.cache.capacity": "8192" > "spark.kubernetes.memoryOverheadFactor": "0.3" > deps: > jars: > - my > - jar > - list > hadoopConfigMap: hdfs-config > driver: > cores: 4 > memory: 12g > labels: > version: 3.0.1 > serviceAccount: default > javaOptions: > 
"-Dlog4j.configuration=file:///opt/spark/log4j/log4j.properties" > executor: > instances: 4 > cores: 4 > memory: 16g > labels: > version: 3.0.1 > javaOptions: > "-Dlog4j.configuration=file:///opt/spark/log4j/log4j.properties" > {code} > I have tried with both Spark `3.0.1` and `3.0.2-SNAPSHOT` with the ["Restart > the watcher when we receive a version changed from > k8s"|https://github.com/apache/spark/pull/29533] patch. > This is the driver log: > {code} > 20/11/04 12:16:02 WARN NativeCodeLoader: Unable to load native-hadoop library > for your platform... using builtin-java classes where applicable > ... // my app log, it's a structured streaming app reading from kafka and > writing to hdfs > 20/11/04 13:12:12 WARN ExecutorPodsWatchSnapshotSource: Kubernetes client has > been closed (this is expected if the application is shutting down.) > io.fabric8.kubernetes.client.KubernetesClientException: too old resource > version: 1574101276 (1574213896) > at > io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager$1.onMessage(WatchConnectionManager.java:259) > at okhttp3.internal.ws.RealWebSocket.onReadMessage(RealWebSocket.java:323) > at > okhttp3.internal.ws.WebSocketReader.readMessageFrame(WebSocketReader.java:219) > at > okhttp3.internal.ws.WebSocketReader.processNextFrame(WebSocketReader.java:105) > at okhttp3.internal.ws.RealWebSocket.loopReader(RealWebSocket.java:274) > at okhttp3.internal.ws.RealWebSocket$2.onResponse(RealWebSocket.java:214) > at okhttp3.RealCall$AsyncCall.execute(RealCall.java:203) > at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32) > at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown > Source) > at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown > Source) > at java.base/java.lang.Thread.run(Unknown Source) > {code} > The error above appears after roughly 50 minutes. > After the exception above, no more logs are produced and the app hangs. 
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
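The failure mode above — the watch stream dies with a "too old resource version" (HTTP 410 Gone) error and is never re-established, so the driver stops receiving executor pod events and hangs — comes down to how the client reacts to a stale `resourceVersion`. The sketch below is a hypothetical, simplified model in Python (not Spark's or fabric8's actual code; `FakeApiServer` and `ResourceVersionTooOld` are illustrative stand-ins): on a 410 the correct behavior is to re-list for a fresh `resourceVersion` and re-open the watch, rather than closing the client.

```python
# Hypothetical sketch of watch-reconnect behavior. On "too old resource
# version" the loop re-lists to get a fresh resourceVersion and retries,
# instead of tearing the client down (the hang reported in this issue).
# All names here are illustrative, not part of any real client library.

class ResourceVersionTooOld(Exception):
    """Server has compacted history past the requested resourceVersion."""

class FakeApiServer:
    """Stands in for the Kubernetes API server."""
    def __init__(self, events):
        self.events = events          # list of (resource_version, payload)
        self.compacted_before = 0     # versions older than this are gone

    def list_pods(self):
        # A fresh LIST always reports the latest resourceVersion.
        return self.events[-1][0] if self.events else 0

    def watch(self, since_version):
        if since_version < self.compacted_before:
            raise ResourceVersionTooOld(since_version)
        for version, payload in self.events:
            if version > since_version:
                yield version, payload

def run_watch_loop(server, handler, start_version=0, max_restarts=10):
    """Keep a watch alive across 'too old resource version' errors."""
    version = start_version
    for _ in range(max_restarts):
        try:
            for version, payload in server.watch(version):
                handler(payload)
            return version            # stream ended normally
        except ResourceVersionTooOld:
            # Stale bookmark: re-list for a fresh resourceVersion and
            # re-open the watch instead of giving up.
            version = server.list_pods()
    return version
```

Under this model, a client that instead treats the 410 as fatal and closes (as the log above shows) silently stops delivering pod snapshots, which matches the reported "no more logs, app hangs" symptom.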
[jira] [Updated] (SPARK-33349) ExecutorPodsWatchSnapshotSource: Kubernetes client has been closed
[ https://issues.apache.org/jira/browse/SPARK-33349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicola Bova updated SPARK-33349: Shepherd: Dongjoon Hyun
[jira] [Updated] (SPARK-33349) ExecutorPodsWatchSnapshotSource: Kubernetes client has been closed
[ https://issues.apache.org/jira/browse/SPARK-33349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicola Bova updated SPARK-33349: Description updated.
[jira] [Updated] (SPARK-33349) ExecutorPodsWatchSnapshotSource: Kubernetes client has been closed
[ https://issues.apache.org/jira/browse/SPARK-33349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicola Bova updated SPARK-33349: Description updated.
[jira] [Updated] (SPARK-33349) ExecutorPodsWatchSnapshotSource: Kubernetes client has been closed
[ https://issues.apache.org/jira/browse/SPARK-33349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicola Bova updated SPARK-33349: Description updated.
[jira] [Updated] (SPARK-33349) ExecutorPodsWatchSnapshotSource: Kubernetes client has been closed
[ https://issues.apache.org/jira/browse/SPARK-33349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicola Bova updated SPARK-33349: Description updated.
[jira] [Updated] (SPARK-33349) ExecutorPodsWatchSnapshotSource: Kubernetes client has been closed
[ https://issues.apache.org/jira/browse/SPARK-33349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicola Bova updated SPARK-33349: Description updated.
[jira] [Updated] (SPARK-33349) ExecutorPodsWatchSnapshotSource: Kubernetes client has been closed
[ https://issues.apache.org/jira/browse/SPARK-33349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicola Bova updated SPARK-33349: Description updated.
[jira] [Created] (SPARK-33349) ExecutorPodsWatchSnapshotSource: Kubernetes client has been closed
Nicola Bova created SPARK-33349:
---
Summary: ExecutorPodsWatchSnapshotSource: Kubernetes client has been closed
Key: SPARK-33349
URL: https://issues.apache.org/jira/browse/SPARK-33349
Project: Spark
Issue Type: Bug
Components: Kubernetes
Affects Versions: 3.0.1, 3.0.2
Reporter: Nicola Bova

I launch my spark application with the [spark-on-k8s-operator](https://github.com/GoogleCloudPlatform/spark-on-k8s-operator) with the following yaml file:
```
apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
  name: spark-kafka-streamer-test
  namespace: kafka2hdfs
spec:
  type: Scala
  mode: cluster
  image: /spark:3.0.2-SNAPSHOT-2.12-0.1.0
  imagePullPolicy: Always
  timeToLiveSeconds: 259200
  mainClass: path.to.my.class.KafkaStreamer
  mainApplicationFile: spark-kafka-streamer_2.12-spark300-assembly.jar
  sparkVersion: 3.0.1
  restartPolicy:
    type: Always
  sparkConf:
    "spark.kafka.consumer.cache.capacity": "8192"
    "spark.kubernetes.memoryOverheadFactor": "0.3"
  deps:
    jars:
      - my
      - jar
      - list
  hadoopConfigMap: hdfs-config
  driver:
    cores: 4
    memory: 12g
    labels:
      version: 3.0.1
    serviceAccount: default
    javaOptions: "-Dlog4j.configuration=file:///opt/spark/log4j/log4j.properties"
  executor:
    instances: 4
    cores: 4
    memory: 16g
    labels:
      version: 3.0.1
    javaOptions: "-Dlog4j.configuration=file:///opt/spark/log4j/log4j.properties"
```
This is the driver log:
```
20/11/04 12:16:02 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
... // my app log, it's a structured streaming app reading from kafka and writing to hdfs
20/11/04 13:12:12 WARN ExecutorPodsWatchSnapshotSource: Kubernetes client has been closed (this is expected if the application is shutting down.)
io.fabric8.kubernetes.client.KubernetesClientException: too old resource version: 1574101276 (1574213896)
 at io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager$1.onMessage(WatchConnectionManager.java:259)
 at okhttp3.internal.ws.RealWebSocket.onReadMessage(RealWebSocket.java:323)
 at okhttp3.internal.ws.WebSocketReader.readMessageFrame(WebSocketReader.java:219)
 at okhttp3.internal.ws.WebSocketReader.processNextFrame(WebSocketReader.java:105)
 at okhttp3.internal.ws.RealWebSocket.loopReader(RealWebSocket.java:274)
 at okhttp3.internal.ws.RealWebSocket$2.onResponse(RealWebSocket.java:214)
 at okhttp3.RealCall$AsyncCall.execute(RealCall.java:203)
 at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
 at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
 at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
 at java.base/java.lang.Thread.run(Unknown Source)
```
The error above appears after roughly 50 minutes. After the exception, no more logs are produced and the app hangs.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-33348) Use scala.jdk.CollectionConverters replace scala.collection.JavaConverters
Yang Jie created SPARK-33348:
Summary: Use scala.jdk.CollectionConverters replace scala.collection.JavaConverters
Key: SPARK-33348
URL: https://issues.apache.org/jira/browse/SPARK-33348
Project: Spark
Issue Type: Sub-task
Components: Build
Affects Versions: 3.1.0
Reporter: Yang Jie

`scala.collection.JavaConverters` is deprecated in Scala 2.13 and produces many compilation warnings; we should replace it with `scala.jdk.CollectionConverters`. Note, however, that `scala.jdk.CollectionConverters` is only available in Scala 2.13.
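For illustration, migrating a call site is usually just an import swap (a sketch, not taken from the Spark patch; the converter extension methods such as `asScala` keep the same names under both imports):

```scala
// Deprecated in Scala 2.13:
// import scala.collection.JavaConverters._

// Scala 2.13 replacement, with the same asScala/asJava extension methods:
import scala.jdk.CollectionConverters._

object ConvertersExample extends App {
  val javaList = java.util.Arrays.asList("a", "b", "c")
  val scalaSeq = javaList.asScala.toSeq // java.util.List -> Scala Seq
  println(scalaSeq.mkString(","))      // prints a,b,c
}
```

Code compiled against `scala.jdk.CollectionConverters` will not build on Scala 2.12, which is why the migration is tied to the 2.13 build work.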
[jira] [Updated] (SPARK-33343) Fix the build with sbt to copy hadoop-client-runtime.jar
[ https://issues.apache.org/jira/browse/SPARK-33343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Kousuke Saruta updated SPARK-33343:
---
Description:
With the current master, spark-shell doesn't work if it's built with sbt package, because hadoop-client-runtime.jar isn't copied to assembly/target/scala-2.12/jars.
{code}
$ bin/spark-shell
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/shaded/com/ctc/wstx/io/InputBootstrapper
 at org.apache.spark.deploy.SparkHadoopUtil$.newConfiguration(SparkHadoopUtil.scala:426)
 at org.apache.spark.deploy.SparkSubmit.$anonfun$prepareSubmitEnvironment$2(SparkSubmit.scala:342)
 at scala.Option.getOrElse(Option.scala:189)
 at org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:342)
 at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:877)
 at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
 at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
 at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
 at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1013)
 at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1022)
 at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.shaded.com.ctc.wstx.io.InputBootstrapper
 at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
 at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
 ... 11 more
{code}

was: With the current master, spark-shell doesn't work if it's built with sbt, because hadoop-client-runtime.jar isn't copied to assembly/target/scala-2.12/jars.
{code} $ bin/spark-shell Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/shaded/com/ctc/wstx/io/InputBootstrapper at org.apache.spark.deploy.SparkHadoopUtil$.newConfiguration(SparkHadoopUtil.scala:426) at org.apache.spark.deploy.SparkSubmit.$anonfun$prepareSubmitEnvironment$2(SparkSubmit.scala:342) at scala.Option.getOrElse(Option.scala:189) at org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:342) at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:877) at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180) at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203) at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90) at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1013) at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1022) at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.shaded.com.ctc.wstx.io.InputBootstrapper at java.net.URLClassLoader.findClass(URLClassLoader.java:382) at java.lang.ClassLoader.loadClass(ClassLoader.java:418) at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352) at java.lang.ClassLoader.loadClass(ClassLoader.java:351) ... 11 more {code} > Fix the build with sbt to copy hadoop-client-runtime.jar > > > Key: SPARK-33343 > URL: https://issues.apache.org/jira/browse/SPARK-33343 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 3.1.0 >Reporter: Kousuke Saruta >Assignee: Kousuke Saruta >Priority: Major > > With the current master, spark-shell doesn't work if it's built with sbt > package. > It's due to hadoop-client-runtime.jar isn't copied to > assembly/target/scala-2.12/jars. 
> {code} > $ bin/spark-shell > Exception in thread "main" java.lang.NoClassDefFoundError: > org/apache/hadoop/shaded/com/ctc/wstx/io/InputBootstrapper > at > org.apache.spark.deploy.SparkHadoopUtil$.newConfiguration(SparkHadoopUtil.scala:426) > at > org.apache.spark.deploy.SparkSubmit.$anonfun$prepareSubmitEnvironment$2(SparkSubmit.scala:342) > at scala.Option.getOrElse(Option.scala:189) > at > org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:342) > at > org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:877) > at > org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180) > at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203) > at org.apache.spark.deploy
[jira] [Updated] (SPARK-33343) Fix the build with sbt to copy hadoop-client-runtime.jar
[ https://issues.apache.org/jira/browse/SPARK-33343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta updated SPARK-33343: --- Priority: Major (was: Critical) > Fix the build with sbt to copy hadoop-client-runtime.jar > > > Key: SPARK-33343 > URL: https://issues.apache.org/jira/browse/SPARK-33343 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 3.1.0 >Reporter: Kousuke Saruta >Assignee: Kousuke Saruta >Priority: Major > > With the current master, spark-shell doesn't work if it's built with sbt. > It's due to hadoop-client-runtime.jar isn't copied to > assembly/target/scala-2.12/jars. > {code} > $ bin/spark-shell > Exception in thread "main" java.lang.NoClassDefFoundError: > org/apache/hadoop/shaded/com/ctc/wstx/io/InputBootstrapper > at > org.apache.spark.deploy.SparkHadoopUtil$.newConfiguration(SparkHadoopUtil.scala:426) > at > org.apache.spark.deploy.SparkSubmit.$anonfun$prepareSubmitEnvironment$2(SparkSubmit.scala:342) > at scala.Option.getOrElse(Option.scala:189) > at > org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:342) > at > org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:877) > at > org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180) > at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203) > at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90) > at > org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1013) > at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1022) > at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) > Caused by: java.lang.ClassNotFoundException: > org.apache.hadoop.shaded.com.ctc.wstx.io.InputBootstrapper > at java.net.URLClassLoader.findClass(URLClassLoader.java:382) > at java.lang.ClassLoader.loadClass(ClassLoader.java:418) > at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352) > at 
java.lang.ClassLoader.loadClass(ClassLoader.java:351)
> ... 11 more
> {code}
[jira] [Resolved] (SPARK-33338) GROUP BY using literal map should not fail
[ https://issues.apache.org/jira/browse/SPARK-33338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dongjoon Hyun resolved SPARK-33338.
---
Fix Version/s: 2.4.8
               3.0.2
               3.1.0
Resolution: Fixed

Issue resolved by pull request 30246
[https://github.com/apache/spark/pull/30246]

> GROUP BY using literal map should not fail
> --
>
> Key: SPARK-33338
> URL: https://issues.apache.org/jira/browse/SPARK-33338
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.0.2, 2.1.3, 2.2.3, 2.3.4, 2.4.7, 3.0.1, 3.1.0
> Reporter: Dongjoon Hyun
> Assignee: Dongjoon Hyun
> Priority: Major
> Fix For: 3.1.0, 3.0.2, 2.4.8
>
> Apache Spark 2.x ~ 3.0.1 raise `RuntimeException` for the following queries.
> *SQL*
> {code}
> CREATE TABLE t USING ORC AS SELECT map('k1', 'v1') m, 'k1' k
> SELECT map('k1', 'v1')[k] FROM t GROUP BY 1
> SELECT map('k1', 'v1')[k] FROM t GROUP BY map('k1', 'v1')[k]
> SELECT map('k1', 'v1')[k] a FROM t GROUP BY a
> {code}
> *ERROR*
> {code}
> Caused by: java.lang.RuntimeException: Couldn't find k#3 in [keys: [k1], values: [v1][k#3]#6]
>  at scala.sys.package$.error(package.scala:27)
>  at org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1$$anonfun$applyOrElse$1.apply(BoundAttribute.scala:85)
>  at org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1$$anonfun$applyOrElse$1.apply(BoundAttribute.scala:79)
>  at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:52)
> {code}
> This is a regression from Apache Spark 1.6.x.
> {code}
> scala> sc.version
> res1: String = 1.6.3
> scala> sqlContext.sql("SELECT map('k1', 'v1')[k] FROM t GROUP BY map('k1', 'v1')[k]").show
> +---+
> |_c0|
> +---+
> | v1|
> +---+
> {code}
[jira] [Assigned] (SPARK-33347) Clean up useless variables in MutableApplicationInfo
[ https://issues.apache.org/jira/browse/SPARK-33347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Apache Spark reassigned SPARK-33347:
Assignee: (was: Apache Spark)

> Clean up useless variables in MutableApplicationInfo
> --
>
> Key: SPARK-33347
> URL: https://issues.apache.org/jira/browse/SPARK-33347
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core
> Affects Versions: 3.1.0
> Reporter: Yang Jie
> Priority: Major
>
> {code:java}
> private class MutableApplicationInfo {
>   var id: String = null
>   var name: String = null
>   var coresGranted: Option[Int] = None
>   var maxCores: Option[Int] = None
>   var coresPerExecutor: Option[Int] = None
>   var memoryPerExecutorMB: Option[Int] = None
>
>   def toView(): ApplicationInfoWrapper = {
>     val apiInfo = ApplicationInfo(id, name, coresGranted, maxCores, coresPerExecutor,
>       memoryPerExecutorMB, Nil)
>     new ApplicationInfoWrapper(apiInfo, List(attempt.toView()))
>   }
> }
> {code}
>
> coresGranted, maxCores, coresPerExecutor and memoryPerExecutorMB are always None and are never reassigned.
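One possible cleanup, sketched below, is hypothetical and not taken from the actual pull request: since the four `Option[Int]` fields are always `None`, they can be dropped and `None` passed directly.

```scala
// Hypothetical cleaned-up version of the class quoted in the issue.
// ApplicationInfo, ApplicationInfoWrapper and `attempt` come from the
// surrounding Spark code and are assumed to be in scope.
private class MutableApplicationInfo {
  var id: String = null
  var name: String = null

  def toView(): ApplicationInfoWrapper = {
    // The four Option[Int] vars were never reassigned from None,
    // so the None values are inlined here instead.
    val apiInfo = ApplicationInfo(id, name, None, None, None, None, Nil)
    new ApplicationInfoWrapper(apiInfo, List(attempt.toView()))
  }
}
```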
[jira] [Commented] (SPARK-33347) Clean up useless variables in MutableApplicationInfo
[ https://issues.apache.org/jira/browse/SPARK-33347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17226265#comment-17226265 ] Apache Spark commented on SPARK-33347: -- User 'LuciferYang' has created a pull request for this issue: https://github.com/apache/spark/pull/30251 > Clean up useless variables in MutableApplicationInfo > > > Key: SPARK-33347 > URL: https://issues.apache.org/jira/browse/SPARK-33347 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.1.0 >Reporter: Yang Jie >Priority: Major > > > {code:java} > private class MutableApplicationInfo { > var id: String = null > var name: String = null > var coresGranted: Option[Int] = None > var maxCores: Option[Int] = None > var coresPerExecutor: Option[Int] = None > var memoryPerExecutorMB: Option[Int] = None > def toView(): ApplicationInfoWrapper = { > val apiInfo = ApplicationInfo(id, name, coresGranted, maxCores, > coresPerExecutor, > memoryPerExecutorMB, Nil) > new ApplicationInfoWrapper(apiInfo, List(attempt.toView())) > } > } > {code} > > coresGranted, maxCores, coresPerExecutor and memoryPerExecutorMB always None > and never reassign -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-33347) Clean up useless variables in MutableApplicationInfo
[ https://issues.apache.org/jira/browse/SPARK-33347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-33347: Assignee: Apache Spark > Clean up useless variables in MutableApplicationInfo > > > Key: SPARK-33347 > URL: https://issues.apache.org/jira/browse/SPARK-33347 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.1.0 >Reporter: Yang Jie >Assignee: Apache Spark >Priority: Major > > > {code:java} > private class MutableApplicationInfo { > var id: String = null > var name: String = null > var coresGranted: Option[Int] = None > var maxCores: Option[Int] = None > var coresPerExecutor: Option[Int] = None > var memoryPerExecutorMB: Option[Int] = None > def toView(): ApplicationInfoWrapper = { > val apiInfo = ApplicationInfo(id, name, coresGranted, maxCores, > coresPerExecutor, > memoryPerExecutorMB, Nil) > new ApplicationInfoWrapper(apiInfo, List(attempt.toView())) > } > } > {code} > > coresGranted, maxCores, coresPerExecutor and memoryPerExecutorMB always None > and never reassign -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-33347) Clean up useless variables in MutableApplicationInfo
[ https://issues.apache.org/jira/browse/SPARK-33347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Jie updated SPARK-33347: - Description: {code:java} private class MutableApplicationInfo { var id: String = null var name: String = null var coresGranted: Option[Int] = None var maxCores: Option[Int] = None var coresPerExecutor: Option[Int] = None var memoryPerExecutorMB: Option[Int] = None def toView(): ApplicationInfoWrapper = { val apiInfo = ApplicationInfo(id, name, coresGranted, maxCores, coresPerExecutor, memoryPerExecutorMB, Nil) new ApplicationInfoWrapper(apiInfo, List(attempt.toView())) } } {code} coresGranted, maxCores, coresPerExecutor and memoryPerExecutorMB always None and never reassign was: {code:java} private class MutableApplicationInfo { var id: String = null var name: String = null var coresGranted: Option[Int] = None var maxCores: Option[Int] = None var coresPerExecutor: Option[Int] = None var memoryPerExecutorMB: Option[Int] = None def toView(): ApplicationInfoWrapper = { val apiInfo = ApplicationInfo(id, name, coresGranted, maxCores, coresPerExecutor, memoryPerExecutorMB, Nil) new ApplicationInfoWrapper(apiInfo, List(attempt.toView())) } } {code} coresGranted, maxCores, coresPerExecutor and memoryPerExecutorMB always None and no place to reassign > Clean up useless variables in MutableApplicationInfo > > > Key: SPARK-33347 > URL: https://issues.apache.org/jira/browse/SPARK-33347 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.1.0 >Reporter: Yang Jie >Priority: Major > > > {code:java} > private class MutableApplicationInfo { > var id: String = null > var name: String = null > var coresGranted: Option[Int] = None > var maxCores: Option[Int] = None > var coresPerExecutor: Option[Int] = None > var memoryPerExecutorMB: Option[Int] = None > def toView(): ApplicationInfoWrapper = { > val apiInfo = ApplicationInfo(id, name, coresGranted, maxCores, > coresPerExecutor, > 
memoryPerExecutorMB, Nil) > new ApplicationInfoWrapper(apiInfo, List(attempt.toView())) > } > } > {code} > > coresGranted, maxCores, coresPerExecutor and memoryPerExecutorMB always None > and never reassign -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-33347) Clean up useless variables in MutableApplicationInfo
[ https://issues.apache.org/jira/browse/SPARK-33347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Jie updated SPARK-33347: - Description: {code:java} private class MutableApplicationInfo { var id: String = null var name: String = null var coresGranted: Option[Int] = None var maxCores: Option[Int] = None var coresPerExecutor: Option[Int] = None var memoryPerExecutorMB: Option[Int] = None def toView(): ApplicationInfoWrapper = { val apiInfo = ApplicationInfo(id, name, coresGranted, maxCores, coresPerExecutor, memoryPerExecutorMB, Nil) new ApplicationInfoWrapper(apiInfo, List(attempt.toView())) } } {code} coresGranted, maxCores, coresPerExecutor and memoryPerExecutorMB always None and no place to reassign was: {code:java} private class MutableApplicationInfo { var id: String = null var name: String = null var coresGranted: Option[Int] = None var maxCores: Option[Int] = None var coresPerExecutor: Option[Int] = None var memoryPerExecutorMB: Option[Int] = None def toView(): ApplicationInfoWrapper = { val apiInfo = ApplicationInfo(id, name, coresGranted, maxCores, coresPerExecutor, memoryPerExecutorMB, Nil) new ApplicationInfoWrapper(apiInfo, List(attempt.toView())) } } {code} coresGranted, maxCores, coresPerExecutor and memoryPerExecutorMB always None no place to reassign > Clean up useless variables in MutableApplicationInfo > > > Key: SPARK-33347 > URL: https://issues.apache.org/jira/browse/SPARK-33347 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.1.0 >Reporter: Yang Jie >Priority: Major > > > {code:java} > private class MutableApplicationInfo { > var id: String = null > var name: String = null > var coresGranted: Option[Int] = None > var maxCores: Option[Int] = None > var coresPerExecutor: Option[Int] = None > var memoryPerExecutorMB: Option[Int] = None > def toView(): ApplicationInfoWrapper = { > val apiInfo = ApplicationInfo(id, name, coresGranted, maxCores, > coresPerExecutor, > 
memoryPerExecutorMB, Nil) > new ApplicationInfoWrapper(apiInfo, List(attempt.toView())) > } > } > {code} > > coresGranted, maxCores, coresPerExecutor and memoryPerExecutorMB always None > and no place to reassign -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-33347) Clean up useless variables in MutableApplicationInfo
[ https://issues.apache.org/jira/browse/SPARK-33347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Jie updated SPARK-33347: - Description: {code:java} private class MutableApplicationInfo { var id: String = null var name: String = null var coresGranted: Option[Int] = None var maxCores: Option[Int] = None var coresPerExecutor: Option[Int] = None var memoryPerExecutorMB: Option[Int] = None def toView(): ApplicationInfoWrapper = { val apiInfo = ApplicationInfo(id, name, coresGranted, maxCores, coresPerExecutor, memoryPerExecutorMB, Nil) new ApplicationInfoWrapper(apiInfo, List(attempt.toView())) } } {code} coresGranted, maxCores, coresPerExecutor and memoryPerExecutorMB always None no place to reassign was: {code:java} private class MutableApplicationInfo { var id: String = null var name: String = null var coresGranted: Option[Int] = None var maxCores: Option[Int] = None var coresPerExecutor: Option[Int] = None var memoryPerExecutorMB: Option[Int] = None def toView(): ApplicationInfoWrapper = { val apiInfo = ApplicationInfo(id, name, coresGranted, maxCores, coresPerExecutor, memoryPerExecutorMB, Nil) new ApplicationInfoWrapper(apiInfo, List(attempt.toView())) } } {code} coresGranted, maxCores, coresPerExecutor and memoryPerExecutorMB alway None no place to reassign > Clean up useless variables in MutableApplicationInfo > > > Key: SPARK-33347 > URL: https://issues.apache.org/jira/browse/SPARK-33347 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.1.0 >Reporter: Yang Jie >Priority: Major > > > {code:java} > private class MutableApplicationInfo { > var id: String = null > var name: String = null > var coresGranted: Option[Int] = None > var maxCores: Option[Int] = None > var coresPerExecutor: Option[Int] = None > var memoryPerExecutorMB: Option[Int] = None > def toView(): ApplicationInfoWrapper = { > val apiInfo = ApplicationInfo(id, name, coresGranted, maxCores, > coresPerExecutor, > memoryPerExecutorMB, 
Nil) > new ApplicationInfoWrapper(apiInfo, List(attempt.toView())) > } > } > {code} > > coresGranted, maxCores, coresPerExecutor and memoryPerExecutorMB always None > no place to reassign -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-33347) Clean up useless variables in MutableApplicationInfo
Yang Jie created SPARK-33347:
Summary: Clean up useless variables in MutableApplicationInfo
Key: SPARK-33347
URL: https://issues.apache.org/jira/browse/SPARK-33347
Project: Spark
Issue Type: Improvement
Components: Spark Core
Affects Versions: 3.1.0
Reporter: Yang Jie

{code:java}
private class MutableApplicationInfo {
  var id: String = null
  var name: String = null
  var coresGranted: Option[Int] = None
  var maxCores: Option[Int] = None
  var coresPerExecutor: Option[Int] = None
  var memoryPerExecutorMB: Option[Int] = None

  def toView(): ApplicationInfoWrapper = {
    val apiInfo = ApplicationInfo(id, name, coresGranted, maxCores, coresPerExecutor,
      memoryPerExecutorMB, Nil)
    new ApplicationInfoWrapper(apiInfo, List(attempt.toView()))
  }
}
{code}

coresGranted, maxCores, coresPerExecutor and memoryPerExecutorMB are always None and are never reassigned.
[jira] [Commented] (SPARK-33346) Change the never changed var to val
[ https://issues.apache.org/jira/browse/SPARK-33346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17226255#comment-17226255 ]
Yang Jie commented on SPARK-33346:
--
The case above still can't be declared as a val; it also throws an illegal access error at runtime now.

> Change the never changed var to val
> ---
>
> Key: SPARK-33346
> URL: https://issues.apache.org/jira/browse/SPARK-33346
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core, SQL
> Affects Versions: 3.1.0
> Reporter: Yang Jie
> Priority: Minor
>
> Some local variables are declared as "var", but they are never reassigned and should be declared as "val".
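As a minimal illustration of the kind of change SPARK-33346 proposes (a generic sketch, not Spark code):

```scala
object VarToValExample extends App {
  val items = Seq("a", "bb", "ccc")

  // Before: declared as a var, but never reassigned afterwards
  // var total = items.map(_.length).sum

  // After: val states the intent and lets the compiler enforce immutability
  val total = items.map(_.length).sum
  println(total) // prints 6
}
```

Any attempt to reassign `total` after the change becomes a compile error rather than a silent mutation.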
[jira] [Commented] (SPARK-33285) Too many "Auto-application to `()` is deprecated." related compilation warnings
[ https://issues.apache.org/jira/browse/SPARK-33285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17226251#comment-17226251 ]
Guillaume Martres commented on SPARK-33285:
---
Note that Scala 2.13 has a configurable warning mechanism, making it possible to hide some warnings: [https://github.com/scala/scala/pull/8373]. This can be combined with {{-Xfatal-warnings}} to enforce a warning-free build without actually fixing all warnings.

> Too many "Auto-application to `()` is deprecated." related compilation warnings
> --
>
> Key: SPARK-33285
> URL: https://issues.apache.org/jira/browse/SPARK-33285
> Project: Spark
> Issue Type: Sub-task
> Components: Build
> Affects Versions: 3.1.0
> Reporter: Yang Jie
> Priority: Minor
>
> There are too many "Auto-application to `()` is deprecated." related compilation warnings when compiling with Scala 2.13, like:
> {code:java}
> [WARNING] [Warn] /spark-src/core/src/test/scala/org/apache/spark/PartitioningSuite.scala:246: Auto-application to `()` is deprecated. Supply the empty argument list `()` explicitly to invoke method stdev,
> or remove the empty argument list from its definition (Java-defined methods are exempt).
> In Scala 3, an unapplied method like this will be eta-expanded into a function.
> {code}
> There are a lot of them, but they are easy to fix.
> If there is a definition as follows:
> {code:java}
> class Foo {
>   def bar(): Unit = {}
> }
> val foo = new Foo
> {code}
> the call should be
> {code:java}
> foo.bar()
> {code}
> not
> {code:java}
> foo.bar
> {code}
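The configurable warning mechanism mentioned above is exposed through the `-Wconf` compiler flag (available since Scala 2.13.2). A hypothetical sbt fragment, not Spark's actual build configuration, might look like:

```scala
// build.sbt (sketch): silence only the auto-application deprecation
// warnings while keeping every remaining warning fatal.
scalacOptions ++= Seq(
  "-Wconf:cat=deprecation&msg=Auto-application:s", // :s = silent for this match
  "-Xfatal-warnings"                               // fail the build on the rest
)
```

The filter syntax combines a category (`cat=deprecation`) with a message regex (`msg=...`) and an action (`s` for silent), so unrelated deprecation warnings still surface.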
[jira] [Commented] (SPARK-33285) Too many "Auto-application to `()` is deprecated." related compilation warnings
[ https://issues.apache.org/jira/browse/SPARK-33285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17226250#comment-17226250 ] Yang Jie commented on SPARK-33285: -- {quote}Replacing {{'foo}} by {{Symbol("foo")}} will get rid of the warning and is compatible with all Scala versions. {quote} [~smarter] , You're right. :) > Too many "Auto-application to `()` is deprecated." related compilation > warnings > > > Key: SPARK-33285 > URL: https://issues.apache.org/jira/browse/SPARK-33285 > Project: Spark > Issue Type: Sub-task > Components: Build >Affects Versions: 3.1.0 >Reporter: Yang Jie >Priority: Minor > > There are too many "Auto-application to `()` is deprecated." related > compilation warnings when compile with Scala 2.13 like > {code:java} > [WARNING] [Warn] > /spark-src/core/src/test/scala/org/apache/spark/PartitioningSuite.scala:246: > Auto-application to `()` is deprecated. Supply the empty argument list `()` > explicitly to invoke method stdev, > or remove the empty argument list from its definition (Java-defined methods > are exempt). > In Scala 3, an unapplied method like this will be eta-expanded into a > function. > {code} > A lot of them, but it's easy to fix. > If there is a definition as follows: > {code:java} > Class Foo { >def bar(): Unit = {} > } > val foo = new Foo{code} > Should be > {code:java} > foo.bar() > {code} > not > {code:java} > foo.bar {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-33285) Too many "Auto-application to `()` is deprecated." related compilation warnings
[ https://issues.apache.org/jira/browse/SPARK-33285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17226248#comment-17226248 ]
Yang Jie commented on SPARK-33285:
--
[~srowen] Yes, the "Auto-application to `()` is deprecated." warnings cover up other warnings because there are so many of them, but some of the others are already known, and I've added other JIRAs for them.

> Too many "Auto-application to `()` is deprecated." related compilation warnings
> --
>
> Key: SPARK-33285
> URL: https://issues.apache.org/jira/browse/SPARK-33285
> Project: Spark
> Issue Type: Sub-task
> Components: Build
> Affects Versions: 3.1.0
> Reporter: Yang Jie
> Priority: Minor
>
> There are too many "Auto-application to `()` is deprecated." related compilation warnings when compiling with Scala 2.13, like:
> {code:java}
> [WARNING] [Warn] /spark-src/core/src/test/scala/org/apache/spark/PartitioningSuite.scala:246: Auto-application to `()` is deprecated. Supply the empty argument list `()` explicitly to invoke method stdev,
> or remove the empty argument list from its definition (Java-defined methods are exempt).
> In Scala 3, an unapplied method like this will be eta-expanded into a function.
> {code}
> There are a lot of them, but they are easy to fix.
> If there is a definition as follows:
> {code:java}
> class Foo {
>   def bar(): Unit = {}
> }
> val foo = new Foo
> {code}
> the call should be
> {code:java}
> foo.bar()
> {code}
> not
> {code:java}
> foo.bar
> {code}
[jira] [Commented] (SPARK-29392) Remove use of deprecated symbol literal " 'name " syntax in favor Symbol("name")
[ https://issues.apache.org/jira/browse/SPARK-29392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17226244#comment-17226244 ] Yang Jie commented on SPARK-29392: -- OK > Remove use of deprecated symbol literal " 'name " syntax in favor > Symbol("name") > > > Key: SPARK-29392 > URL: https://issues.apache.org/jira/browse/SPARK-29392 > Project: Spark > Issue Type: Sub-task > Components: Spark Core, SQL, Tests >Affects Versions: 3.0.0 >Reporter: Sean R. Owen >Assignee: Sean R. Owen >Priority: Minor > Fix For: 3.0.0 > > > Example: > {code} > [WARNING] [Warn] > /Users/seanowen/Documents/spark_2.13/core/src/test/scala/org/apache/spark/memory/UnifiedMemoryManagerSuite.scala:308: > symbol literal is deprecated; use Symbol("assertInvariants") instead > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
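The replacement discussed in SPARK-29392 is mechanical; a minimal sketch:

```scala
object SymbolExample extends App {
  // Deprecated symbol-literal syntax (warns on Scala 2.13):
  // val col = 'name

  // Portable across Scala versions:
  val col = Symbol("name")
  println(col.name) // prints name
}
```

`Symbol("name")` is interned the same way as the literal form, so equality checks against existing symbols are unaffected.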
[jira] [Resolved] (SPARK-33341) Remove unnecessary semicolons
[ https://issues.apache.org/jira/browse/SPARK-33341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-33341. -- Resolution: Won't Fix > Remove unnecessary semicolons > - > > Key: SPARK-33341 > URL: https://issues.apache.org/jira/browse/SPARK-33341 > Project: Spark > Issue Type: Improvement > Components: Spark Core, SQL >Affects Versions: 3.1.0 >Reporter: Yang Jie >Priority: Minor > > There are some unnecessary semicolons in Spark code because Scala doesn't > really require them; to unify the style, we should remove them -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-29392) Remove use of deprecated symbol literal " 'name " syntax in favor Symbol("name")
[ https://issues.apache.org/jira/browse/SPARK-29392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17226230#comment-17226230 ] Sean R. Owen commented on SPARK-29392: -- For this one, we already started I suppose, so OK to finish. > Remove use of deprecated symbol literal " 'name " syntax in favor > Symbol("name") > > > Key: SPARK-29392 > URL: https://issues.apache.org/jira/browse/SPARK-29392 > Project: Spark > Issue Type: Sub-task > Components: Spark Core, SQL, Tests >Affects Versions: 3.0.0 >Reporter: Sean R. Owen >Assignee: Sean R. Owen >Priority: Minor > Fix For: 3.0.0 > > > Example: > {code} > [WARNING] [Warn] > /Users/seanowen/Documents/spark_2.13/core/src/test/scala/org/apache/spark/memory/UnifiedMemoryManagerSuite.scala:308: > symbol literal is deprecated; use Symbol("assertInvariants") instead > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-33346) Change the never changed var to val
[ https://issues.apache.org/jira/browse/SPARK-33346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17226228#comment-17226228 ] Yang Jie edited comment on SPARK-33346 at 11/4/20, 3:25 PM: There is some code with comments as follows: {code:java} // They also should have been val's. We use var's because there is a Scala compiler bug that // would throw illegal access error at runtime if they are declared as val's. protected var grow = (newCapacity: Int) => { _oldValues = _values _values = new Array[V](newCapacity) } protected var move = (oldPos: Int, newPos: Int) => { _values(newPos) = _oldValues(oldPos) } {code} We need to test whether the current versions of Scala (2.12 & 2.13) still have this problem; no additional information was found in the original PR was (Author: luciferyang): There are some code and comments as follow: {code:java} // They also should have been val's. We use var's because there is a Scala compiler bug that // would throw illegal access error at runtime if they are declared as val's. protected var grow = (newCapacity: Int) => { _oldValues = _values _values = new Array[V](newCapacity) } protected var move = (oldPos: Int, newPos: Int) => { _values(newPos) = _oldValues(oldPos) } {code} Need to test whether the current version of Scala(2.12 & 2.13) still has this problem > Change the never changed var to val > --- > > Key: SPARK-33346 > URL: https://issues.apache.org/jira/browse/SPARK-33346 > Project: Spark > Issue Type: Improvement > Components: Spark Core, SQL >Affects Versions: 3.1.0 >Reporter: Yang Jie >Priority: Minor > > Some local variables are declared as "var", but they are never reassigned and > should be declared as "val". -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-33346) Change the never changed var to val
[ https://issues.apache.org/jira/browse/SPARK-33346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17226228#comment-17226228 ] Yang Jie commented on SPARK-33346: -- There is some code with comments as follows: {code:java} // They also should have been val's. We use var's because there is a Scala compiler bug that // would throw illegal access error at runtime if they are declared as val's. protected var grow = (newCapacity: Int) => { _oldValues = _values _values = new Array[V](newCapacity) } protected var move = (oldPos: Int, newPos: Int) => { _values(newPos) = _oldValues(oldPos) } {code} We need to test whether the current versions of Scala (2.12 & 2.13) still have this problem > Change the never changed var to val > --- > > Key: SPARK-33346 > URL: https://issues.apache.org/jira/browse/SPARK-33346 > Project: Spark > Issue Type: Improvement > Components: Spark Core, SQL >Affects Versions: 3.1.0 >Reporter: Yang Jie >Priority: Minor > > Some local variables are declared as "var", but they are never reassigned and > should be declared as "val". -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
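The quoted snippet can be exercised standalone roughly as below. The class and member names here are illustrative, not Spark's actual code, and whether the old illegal-access compiler bug still appears for the `val` form under Scala 2.12/2.13 is exactly what the comment proposes to verify:

```scala
import scala.reflect.ClassTag

// Illustrative container mirroring the quoted pattern, with the function
// members declared as `val` (they are never reassigned).
class GrowableStore[V: ClassTag](initialCapacity: Int) {
  private var _values = new Array[V](initialCapacity)
  private var _oldValues: Array[V] = _

  protected val grow = (newCapacity: Int) => {
    _oldValues = _values
    _values = new Array[V](newCapacity)
  }
  protected val move = (oldPos: Int, newPos: Int) => {
    _values(newPos) = _oldValues(oldPos)
  }

  def capacity: Int = _values.length

  // Double the capacity, copying existing slots through `grow` and `move`.
  def resize(): Unit = {
    val old = _values.length
    grow(old * 2)
    var i = 0
    while (i < old) { move(i, i); i += 1 }
  }
}
```

If the `val` form runs without an `IllegalAccessError` on the target Scala versions, the workaround comment can be dropped along with the `var`s.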
[jira] [Commented] (SPARK-33285) Too many "Auto-application to `()` is deprecated." related compilation warnings
[ https://issues.apache.org/jira/browse/SPARK-33285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17226189#comment-17226189 ] Sean R. Owen commented on SPARK-33285: -- I think it's OK to fix the Symbol issue (we already started that; it's separate). For this, it's such a big change right now that I'm neutral. If it's making it hard to detect real other warnings to fix, maybe. > Too many "Auto-application to `()` is deprecated." related compilation > warnings > > > Key: SPARK-33285 > URL: https://issues.apache.org/jira/browse/SPARK-33285 > Project: Spark > Issue Type: Sub-task > Components: Build >Affects Versions: 3.1.0 >Reporter: Yang Jie >Priority: Minor > > There are too many "Auto-application to `()` is deprecated." related > compilation warnings when compile with Scala 2.13 like > {code:java} > [WARNING] [Warn] > /spark-src/core/src/test/scala/org/apache/spark/PartitioningSuite.scala:246: > Auto-application to `()` is deprecated. Supply the empty argument list `()` > explicitly to invoke method stdev, > or remove the empty argument list from its definition (Java-defined methods > are exempt). > In Scala 3, an unapplied method like this will be eta-expanded into a > function. > {code} > A lot of them, but it's easy to fix. > If there is a definition as follows: > {code:java} > Class Foo { >def bar(): Unit = {} > } > val foo = new Foo{code} > Should be > {code:java} > foo.bar() > {code} > not > {code:java} > foo.bar {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-33346) Change the never changed var to val
Yang Jie created SPARK-33346: Summary: Change the never changed var to val Key: SPARK-33346 URL: https://issues.apache.org/jira/browse/SPARK-33346 Project: Spark Issue Type: Improvement Components: Spark Core, SQL Affects Versions: 3.1.0 Reporter: Yang Jie Some local variables are declared as "var", but they are never reassigned and should be declared as "val". -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-33345) Batch fix compilation warnings about "Widening conversion from XXX to XXX is deprecated"
Yang Jie created SPARK-33345: Summary: Batch fix compilation warnings about "Widening conversion from XXX to XXX is deprecated" Key: SPARK-33345 URL: https://issues.apache.org/jira/browse/SPARK-33345 Project: Spark Issue Type: Sub-task Components: Build Affects Versions: 3.1.0 Reporter: Yang Jie There is a batch of compilation warnings in Scala 2.13 as follows: {code:java} [WARNING] [Warn] /spark/core/src/main/scala/org/apache/spark/input/FixedLengthBinaryInputFormat.scala:77: Widening conversion from Long to Double is deprecated because it loses precision. Write `.toDouble` instead. {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
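A hedged sketch of the fix the warning message itself suggests: make the lossy conversion explicit with `.toDouble`. The value below is hypothetical, not the code at FixedLengthBinaryInputFormat.scala:77:

```scala
object WideningDemo {
  val byteCount: Long = 1L << 53
  // Deprecated under Scala 2.13 (implicit Long -> Double widening can lose
  // precision for large values): val asDouble: Double = byteCount
  // Warning-free: write the conversion explicitly.
  val asDouble: Double = byteCount.toDouble
}
```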
[jira] [Created] (SPARK-33344) Fix Compilation warnings of "multiarg infix syntax looks like a tuple and will be deprecated" in Scala 2.13
Yang Jie created SPARK-33344: Summary: Fix Compilation warnings of "multiarg infix syntax looks like a tuple and will be deprecated" in Scala 2.13 Key: SPARK-33344 URL: https://issues.apache.org/jira/browse/SPARK-33344 Project: Spark Issue Type: Sub-task Components: Build Affects Versions: 3.1.0 Reporter: Yang Jie There is a batch of compilation warnings in Scala 2.13 as follows: {code:java} [WARNING] [Warn] /spark/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala:656: multiarg infix syntax looks like a tuple and will be deprecated {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
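The deprecated pattern and a warning-free rewrite can be sketched as below; the mutable-map example is illustrative, not the actual code at SparkSubmit.scala:656:

```scala
import scala.collection.mutable

object InfixDemo {
  val m = mutable.Map.empty[String, Int]
  // Deprecated under Scala 2.13 (the argument list reads like a tuple):
  //   m += ("a" -> 1, "b" -> 2)
  // Warning-free alternatives:
  m += ("a" -> 1)       // add one element at a time
  m ++= Seq("b" -> 2)   // or bulk-add from an explicit collection
}
```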
[jira] [Commented] (SPARK-33343) Fix the build with sbt to copy hadoop-client-runtime.jar
[ https://issues.apache.org/jira/browse/SPARK-33343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17226080#comment-17226080 ] Apache Spark commented on SPARK-33343: -- User 'sarutak' has created a pull request for this issue: https://github.com/apache/spark/pull/30250 > Fix the build with sbt to copy hadoop-client-runtime.jar > > > Key: SPARK-33343 > URL: https://issues.apache.org/jira/browse/SPARK-33343 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 3.1.0 >Reporter: Kousuke Saruta >Assignee: Kousuke Saruta >Priority: Critical > > With the current master, spark-shell doesn't work if it's built with sbt. > It's due to hadoop-client-runtime.jar isn't copied to > assembly/target/scala-2.12/jars. > {code} > $ bin/spark-shell > Exception in thread "main" java.lang.NoClassDefFoundError: > org/apache/hadoop/shaded/com/ctc/wstx/io/InputBootstrapper > at > org.apache.spark.deploy.SparkHadoopUtil$.newConfiguration(SparkHadoopUtil.scala:426) > at > org.apache.spark.deploy.SparkSubmit.$anonfun$prepareSubmitEnvironment$2(SparkSubmit.scala:342) > at scala.Option.getOrElse(Option.scala:189) > at > org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:342) > at > org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:877) > at > org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180) > at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203) > at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90) > at > org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1013) > at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1022) > at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) > Caused by: java.lang.ClassNotFoundException: > org.apache.hadoop.shaded.com.ctc.wstx.io.InputBootstrapper > at java.net.URLClassLoader.findClass(URLClassLoader.java:382) > at 
java.lang.ClassLoader.loadClass(ClassLoader.java:418) > at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352) > at java.lang.ClassLoader.loadClass(ClassLoader.java:351) > ... 11 more > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-33343) Fix the build with sbt to copy hadoop-client-runtime.jar
[ https://issues.apache.org/jira/browse/SPARK-33343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-33343: Assignee: Apache Spark (was: Kousuke Saruta) > Fix the build with sbt to copy hadoop-client-runtime.jar > > > Key: SPARK-33343 > URL: https://issues.apache.org/jira/browse/SPARK-33343 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 3.1.0 >Reporter: Kousuke Saruta >Assignee: Apache Spark >Priority: Critical > > With the current master, spark-shell doesn't work if it's built with sbt. > It's due to hadoop-client-runtime.jar isn't copied to > assembly/target/scala-2.12/jars. > {code} > $ bin/spark-shell > Exception in thread "main" java.lang.NoClassDefFoundError: > org/apache/hadoop/shaded/com/ctc/wstx/io/InputBootstrapper > at > org.apache.spark.deploy.SparkHadoopUtil$.newConfiguration(SparkHadoopUtil.scala:426) > at > org.apache.spark.deploy.SparkSubmit.$anonfun$prepareSubmitEnvironment$2(SparkSubmit.scala:342) > at scala.Option.getOrElse(Option.scala:189) > at > org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:342) > at > org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:877) > at > org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180) > at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203) > at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90) > at > org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1013) > at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1022) > at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) > Caused by: java.lang.ClassNotFoundException: > org.apache.hadoop.shaded.com.ctc.wstx.io.InputBootstrapper > at java.net.URLClassLoader.findClass(URLClassLoader.java:382) > at java.lang.ClassLoader.loadClass(ClassLoader.java:418) > at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352) > at 
java.lang.ClassLoader.loadClass(ClassLoader.java:351) > ... 11 more > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-33343) Fix the build with sbt to copy hadoop-client-runtime.jar
[ https://issues.apache.org/jira/browse/SPARK-33343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-33343: Assignee: Kousuke Saruta (was: Apache Spark) > Fix the build with sbt to copy hadoop-client-runtime.jar > > > Key: SPARK-33343 > URL: https://issues.apache.org/jira/browse/SPARK-33343 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 3.1.0 >Reporter: Kousuke Saruta >Assignee: Kousuke Saruta >Priority: Critical > > With the current master, spark-shell doesn't work if it's built with sbt. > It's due to hadoop-client-runtime.jar isn't copied to > assembly/target/scala-2.12/jars. > {code} > $ bin/spark-shell > Exception in thread "main" java.lang.NoClassDefFoundError: > org/apache/hadoop/shaded/com/ctc/wstx/io/InputBootstrapper > at > org.apache.spark.deploy.SparkHadoopUtil$.newConfiguration(SparkHadoopUtil.scala:426) > at > org.apache.spark.deploy.SparkSubmit.$anonfun$prepareSubmitEnvironment$2(SparkSubmit.scala:342) > at scala.Option.getOrElse(Option.scala:189) > at > org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:342) > at > org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:877) > at > org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180) > at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203) > at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90) > at > org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1013) > at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1022) > at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) > Caused by: java.lang.ClassNotFoundException: > org.apache.hadoop.shaded.com.ctc.wstx.io.InputBootstrapper > at java.net.URLClassLoader.findClass(URLClassLoader.java:382) > at java.lang.ClassLoader.loadClass(ClassLoader.java:418) > at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352) > at 
java.lang.ClassLoader.loadClass(ClassLoader.java:351) > ... 11 more > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-33343) Fix the build with sbt to copy hadoop-client-runtime.jar
[ https://issues.apache.org/jira/browse/SPARK-33343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17226079#comment-17226079 ] Apache Spark commented on SPARK-33343: -- User 'sarutak' has created a pull request for this issue: https://github.com/apache/spark/pull/30250 > Fix the build with sbt to copy hadoop-client-runtime.jar > > > Key: SPARK-33343 > URL: https://issues.apache.org/jira/browse/SPARK-33343 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 3.1.0 >Reporter: Kousuke Saruta >Assignee: Kousuke Saruta >Priority: Critical > > With the current master, spark-shell doesn't work if it's built with sbt. > It's due to hadoop-client-runtime.jar isn't copied to > assembly/target/scala-2.12/jars. > {code} > $ bin/spark-shell > Exception in thread "main" java.lang.NoClassDefFoundError: > org/apache/hadoop/shaded/com/ctc/wstx/io/InputBootstrapper > at > org.apache.spark.deploy.SparkHadoopUtil$.newConfiguration(SparkHadoopUtil.scala:426) > at > org.apache.spark.deploy.SparkSubmit.$anonfun$prepareSubmitEnvironment$2(SparkSubmit.scala:342) > at scala.Option.getOrElse(Option.scala:189) > at > org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:342) > at > org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:877) > at > org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180) > at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203) > at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90) > at > org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1013) > at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1022) > at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) > Caused by: java.lang.ClassNotFoundException: > org.apache.hadoop.shaded.com.ctc.wstx.io.InputBootstrapper > at java.net.URLClassLoader.findClass(URLClassLoader.java:382) > at 
java.lang.ClassLoader.loadClass(ClassLoader.java:418) > at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352) > at java.lang.ClassLoader.loadClass(ClassLoader.java:351) > ... 11 more > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-33343) Fix the build with sbt to copy hadoop-client-runtime.jar
Kousuke Saruta created SPARK-33343: -- Summary: Fix the build with sbt to copy hadoop-client-runtime.jar Key: SPARK-33343 URL: https://issues.apache.org/jira/browse/SPARK-33343 Project: Spark Issue Type: Bug Components: Build Affects Versions: 3.1.0 Reporter: Kousuke Saruta Assignee: Kousuke Saruta With the current master, spark-shell doesn't work if it's built with sbt. It's due to hadoop-client-runtime.jar isn't copied to assembly/target/scala-2.12/jars. {code} $ bin/spark-shell Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/shaded/com/ctc/wstx/io/InputBootstrapper at org.apache.spark.deploy.SparkHadoopUtil$.newConfiguration(SparkHadoopUtil.scala:426) at org.apache.spark.deploy.SparkSubmit.$anonfun$prepareSubmitEnvironment$2(SparkSubmit.scala:342) at scala.Option.getOrElse(Option.scala:189) at org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:342) at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:877) at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180) at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203) at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90) at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1013) at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1022) at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.shaded.com.ctc.wstx.io.InputBootstrapper at java.net.URLClassLoader.findClass(URLClassLoader.java:382) at java.lang.ClassLoader.loadClass(ClassLoader.java:418) at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352) at java.lang.ClassLoader.loadClass(ClassLoader.java:351) ... 11 more {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-23086) Spark SQL cannot support high concurrency for lock in HiveMetastoreCatalog
[ https://issues.apache.org/jira/browse/SPARK-23086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17226069#comment-17226069 ] gaofeng commented on SPARK-23086: - I also encountered this problem in the production environment. It's an emergency. Please help me. Thank you very much! :) > Spark SQL cannot support high concurrency for lock in HiveMetastoreCatalog > -- > > Key: SPARK-23086 > URL: https://issues.apache.org/jira/browse/SPARK-23086 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.1.1 > Environment: * Spark 2.2.1 >Reporter: pin_zhang >Priority: Major > Labels: bulk-closed > > * Hive metastore is mysql > * Set hive.server2.thrift.max.worker.threads=500 > create table test (id string ) partitioned by (index int) stored as > parquet; > insert into test partition (index=1) values('id1'); > * 100 Clients run SQL“select * from table” on table > * Many clients (97%) blocked at HiveExternalCatalog.withClient > * Is synchronized expected when only run query against tables? 
> "pool-21-thread-65" #1178 prio=5 os_prio=0 tid=0x2aaac8e06800 nid=0x1e70 > waiting for monitor entry [0x4e19a000] >java.lang.Thread.State: BLOCKED (on object monitor) > at > org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:97) > - waiting to lock <0xc06a3ba8> (a > org.apache.spark.sql.hive.HiveExternalCatalog) > at > org.apache.spark.sql.hive.HiveExternalCatalog.getTable(HiveExternalCatalog.scala:674) > at > org.apache.spark.sql.catalyst.catalog.SessionCatalog.lookupRelation(SessionCatalog.scala:667) > - locked <0xc41ab748> (a > org.apache.spark.sql.hive.HiveSessionCatalog) > at > org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.org$apache$spark$sql$catalyst$analysis$Analyzer$ResolveRelations$$lookupTableFromCatalog(Analyzer.scala:646) > at > org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.resolveRelation(Analyzer.scala:601) > at > org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$$anonfun$apply$8.applyOrElse(Analyzer.scala:631) > at > org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$$anonfun$apply$8.applyOrElse(Analyzer.scala:624) > at > org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$resolveOperators$1.apply(LogicalPlan.scala:62) > at > org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$resolveOperators$1.apply(LogicalPlan.scala:62) > at > org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70) > at > org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperators(LogicalPlan.scala:61) > at > org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$1.apply(LogicalPlan.scala:59) > at > org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$1.apply(LogicalPlan.scala:59) > at > org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187) > at > 
org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304) > at > org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperators(LogicalPlan.scala:59) > at > org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.apply(Analyzer.scala:624) > at > org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.apply(Analyzer.scala:570) > at > org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$1.apply(RuleExecutor.scala:85) > at > org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$1.apply(RuleExecutor.scala:82) > at > scala.collection.LinearSeqOptimized$class.foldLeft(LinearSeqOptimized.scala:124) > at scala.collection.immutable.List.foldLeft(List.scala:84) > at > org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1.apply(RuleExecutor.scala:82) > at > org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1.apply(RuleExecutor.scala:74) > at scala.collection.immutable.List.foreach(List.scala:381) > at > org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:74) > at > org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:69) > - locked <0xff491c48> (a > org.apache.spark.sql.execution.QueryExecution) > at > org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:67) > at > org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryE
[jira] [Commented] (SPARK-29392) Remove use of deprecated symbol literal " 'name " syntax in favor Symbol("name")
[ https://issues.apache.org/jira/browse/SPARK-29392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17226070#comment-17226070 ] Yang Jie commented on SPARK-29392: -- I can take the time to fix them by module, but is this the right time, or do I have to wait until Scala 2.13 becomes the default option? > Remove use of deprecated symbol literal " 'name " syntax in favor > Symbol("name") > > > Key: SPARK-29392 > URL: https://issues.apache.org/jira/browse/SPARK-29392 > Project: Spark > Issue Type: Sub-task > Components: Spark Core, SQL, Tests >Affects Versions: 3.0.0 >Reporter: Sean R. Owen >Assignee: Sean R. Owen >Priority: Minor > Fix For: 3.0.0 > > > Example: > {code} > [WARNING] [Warn] > /Users/seanowen/Documents/spark_2.13/core/src/test/scala/org/apache/spark/memory/UnifiedMemoryManagerSuite.scala:308: > symbol literal is deprecated; use Symbol("assertInvariants") instead > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-23086) Spark SQL cannot support high concurrency for lock in HiveMetastoreCatalog
[ https://issues.apache.org/jira/browse/SPARK-23086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17226067#comment-17226067 ] gaofeng commented on SPARK-23086: - Hi, how can this problem be resolved? > Spark SQL cannot support high concurrency for lock in HiveMetastoreCatalog > -- > > Key: SPARK-23086 > URL: https://issues.apache.org/jira/browse/SPARK-23086 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.1.1 > Environment: * Spark 2.2.1 >Reporter: pin_zhang >Priority: Major > Labels: bulk-closed > > * Hive metastore is mysql > * Set hive.server2.thrift.max.worker.threads=500 > create table test (id string ) partitioned by (index int) stored as > parquet; > insert into test partition (index=1) values('id1'); > * 100 Clients run SQL“select * from table” on table > * Many clients (97%) blocked at HiveExternalCatalog.withClient > * Is synchronized expected when only run query against tables? > "pool-21-thread-65" #1178 prio=5 os_prio=0 tid=0x2aaac8e06800 nid=0x1e70 > waiting for monitor entry [0x4e19a000] >java.lang.Thread.State: BLOCKED (on object monitor) > at > org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:97) > - waiting to lock <0xc06a3ba8> (a > org.apache.spark.sql.hive.HiveExternalCatalog) > at > org.apache.spark.sql.hive.HiveExternalCatalog.getTable(HiveExternalCatalog.scala:674) > at > org.apache.spark.sql.catalyst.catalog.SessionCatalog.lookupRelation(SessionCatalog.scala:667) > - locked <0xc41ab748> (a > org.apache.spark.sql.hive.HiveSessionCatalog) > at > org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.org$apache$spark$sql$catalyst$analysis$Analyzer$ResolveRelations$$lookupTableFromCatalog(Analyzer.scala:646) > at > org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.resolveRelation(Analyzer.scala:601) > at > org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$$anonfun$apply$8.applyOrElse(Analyzer.scala:631) > 
at > org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$$anonfun$apply$8.applyOrElse(Analyzer.scala:624) > at > org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$resolveOperators$1.apply(LogicalPlan.scala:62) > at > org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$resolveOperators$1.apply(LogicalPlan.scala:62) > at > org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70) > at > org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperators(LogicalPlan.scala:61) > at > org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$1.apply(LogicalPlan.scala:59) > at > org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$1.apply(LogicalPlan.scala:59) > at > org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304) > at > org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperators(LogicalPlan.scala:59) > at > org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.apply(Analyzer.scala:624) > at > org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.apply(Analyzer.scala:570) > at > org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$1.apply(RuleExecutor.scala:85) > at > org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$1.apply(RuleExecutor.scala:82) > at > scala.collection.LinearSeqOptimized$class.foldLeft(LinearSeqOptimized.scala:124) > at scala.collection.immutable.List.foldLeft(List.scala:84) > at > org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1.apply(RuleExecutor.scala:82) > at > org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1.apply(RuleExecutor.scala:74) > at scala.collection.immutable.List.foreach(List.scala:381) > at > 
org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:74) > at > org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:69) > - locked <0xff491c48> (a > org.apache.spark.sql.execution.QueryExecution) > at > org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:67) > at > org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:50) > at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:67) >
[jira] [Updated] (SPARK-33256) Update contribution guide about NumPy documentation style
[ https://issues.apache.org/jira/browse/SPARK-33256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-33256: - Description: We should document that PySpark uses NumPy documentation style. See also https://github.com/apache/spark/pull/30181#discussion_r517314341 was:We should document that PySpark uses NumPy documentation style. > Update contribution guide about NumPy documentation style > - > > Key: SPARK-33256 > URL: https://issues.apache.org/jira/browse/SPARK-33256 > Project: Spark > Issue Type: Sub-task > Components: Documentation, PySpark >Affects Versions: 3.1.0 >Reporter: Hyukjin Kwon >Priority: Major > > We should document that PySpark uses NumPy documentation style. > See also https://github.com/apache/spark/pull/30181#discussion_r517314341 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-33285) Too many "Auto-application to `()` is deprecated." related compilation warnings
[ https://issues.apache.org/jira/browse/SPARK-33285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17226040#comment-17226040 ] Guillaume Martres commented on SPARK-33285: --- {quote} Similarly, there are many "symbol literal is degraded" warnings too, but this can only be fixed after Scala 2.12 is no longer supported {quote} Replacing {{'foo}} with {{Symbol("foo")}} will get rid of the warning and is compatible with all Scala versions. > Too many "Auto-application to `()` is deprecated." related compilation > warnings > > > Key: SPARK-33285 > URL: https://issues.apache.org/jira/browse/SPARK-33285 > Project: Spark > Issue Type: Sub-task > Components: Build >Affects Versions: 3.1.0 >Reporter: Yang Jie >Priority: Minor > > There are too many "Auto-application to `()` is deprecated." related > compilation warnings when compiling with Scala 2.13, e.g. > {code:java} > [WARNING] [Warn] > /spark-src/core/src/test/scala/org/apache/spark/PartitioningSuite.scala:246: > Auto-application to `()` is deprecated. Supply the empty argument list `()` > explicitly to invoke method stdev, > or remove the empty argument list from its definition (Java-defined methods > are exempt). > In Scala 3, an unapplied method like this will be eta-expanded into a > function. > {code} > There are a lot of them, but they are easy to fix. > If there is a definition as follows: > {code:java} > class Foo { >def bar(): Unit = {} > } > val foo = new Foo{code} > The call should be > {code:java} > foo.bar() > {code} > not > {code:java} > foo.bar {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-32894) Timestamp cast in external orc table
[ https://issues.apache.org/jira/browse/SPARK-32894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17226016#comment-17226016 ] Yang Jie commented on SPARK-32894: -- What is the data generation process? It seems that the data type is inconsistent with the schema of the table. > Timestamp cast in external orc table > --- > > Key: SPARK-32894 > URL: https://issues.apache.org/jira/browse/SPARK-32894 > Project: Spark > Issue Type: Bug > Components: Java API >Affects Versions: 3.0.0 > Environment: Spark 3.0.0 > Java 1.8 > Hadoop 3.3.0 > Hive 3.1.2 > Python 3.7 (from pyspark) >Reporter: Grigory Skvortsov >Priority: Major > > I have an external Hive table stored as ORC. I want to work with the timestamp > column in my table using pyspark. > For example, I try this: > spark.sql('select id, time_ from mydb.table1').show() > > Py4JJavaError: An error occurred while calling o2877.showString. > : org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 > in stage 4.0 failed 4 times, most recent failure: Lost task 0.3 in stage 4.0 > (TID 19, 172.29.14.241, executor 1): java.lang.ClassCastException: > org.apache.spark.unsafe.types.UTF8String cannot be cast to java.lang.Long > at scala.runtime.BoxesRunTime.unboxToLong(BoxesRunTime.java:107) > at > org.apache.spark.sql.catalyst.expressions.MutableLong.update(SpecificInternalRow.scala:148) > at > org.apache.spark.sql.catalyst.expressions.SpecificInternalRow.update(SpecificInternalRow.scala:228) > at > org.apache.spark.sql.hive.HiveInspectors.$anonfun$unwrapperFor$53(HiveInspectors.scala:730) > at > org.apache.spark.sql.hive.HiveInspectors.$anonfun$unwrapperFor$53$adapted(HiveInspectors.scala:730) > at > org.apache.spark.sql.hive.orc.OrcFileFormat$.$anonfun$unwrapOrcStructs$4(OrcFileFormat.scala:351) > at scala.collection.Iterator$$anon$10.next(Iterator.scala:459) > at scala.collection.Iterator$$anon$10.next(Iterator.scala:459) > at > 
org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.next(FileScanRDD.scala:96) > at scala.collection.Iterator$$anon$10.next(Iterator.scala:459) > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown > Source) > at > org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43) > at > org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:729) > at > org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:340) > at > org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:872) > at > org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:872) > at > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:313) > at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90) > at org.apache.spark.scheduler.Task.run(Task.scala:127) > at > org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:444) > at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1377) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:447) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > Driver stacktrace: > at > org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:2023) > at > org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:1972) > at > org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:1971) > at > scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62) > at > 
scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55) > at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49) > at > org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1971) > at > org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1(DAGScheduler.scala:950) > at > org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1$adapted(DAGScheduler.scala:950) > at scala.Option.foreach(Option.scala:407) > at > org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:950) > at > org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2203) > at > org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.o
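The {{ClassCastException}} above ({{UTF8String cannot be cast to java.lang.Long}}) is consistent with the comment's question: the ORC files appear to contain string values in a column the table schema declares as a long/timestamp. A minimal, hypothetical sketch of that kind of schema/data mismatch in plain Python (not PySpark; the schema and rows here are invented for illustration):

```python
# Hypothetical miniature of the mismatch: the declared schema says time_
# holds a long (epoch timestamp), but one row actually carries a string,
# which is roughly what Spark trips over when unwrapping the ORC struct.
declared_schema = {"id": int, "time_": int}

rows = [
    {"id": 1, "time_": 1604388891},          # value type matches the schema
    {"id": 2, "time_": "2020-11-03 07:35"},  # string where a long is expected
]

def check_row(row, schema):
    """Return the names of columns whose value type violates the schema."""
    return [col for col, typ in schema.items() if not isinstance(row[col], typ)]

mismatches = [check_row(r, declared_schema) for r in rows]
# mismatches[0] is empty; mismatches[1] names the offending column.
```

In the real case, comparing the table DDL against the actual ORC file schema (e.g. with the `orc-tools` CLI) would answer the comment's question directly.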
[jira] [Commented] (SPARK-33325) Spark executor pods are not shutting down when losing driver connection
[ https://issues.apache.org/jira/browse/SPARK-33325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17226009#comment-17226009 ] Hadrien Kohl commented on SPARK-33325: -- It looks like one thread gets stuck here on the awaitTermination [https://github.com/apache/spark/blob/2b147c4cd50da32fe2b4167f97c8142102a0510d/core/src/main/scala/org/apache/spark/rpc/netty/MessageLoop.scala#L52-L61] {code:java} def stop(): Unit = { synchronized { if (!stopped) { setActive(MessageLoop.PoisonPill) threadpool.shutdown() stopped = true } } threadpool.awaitTermination(Long.MaxValue, TimeUnit.MILLISECONDS) } {code} > Spark executor pods are not shutting down when losing driver connection > --- > > Key: SPARK-33325 > URL: https://issues.apache.org/jira/browse/SPARK-33325 > Project: Spark > Issue Type: Bug > Components: Kubernetes >Affects Versions: 3.0.1 >Reporter: Hadrien Kohl >Priority: Major > > In situations where the executors lose contact with the driver, the java > process does not die. I am looking at what on the kubernetes cluster could > prevent proper clean-up. > The spark driver is started in its own pod in client mode (pyspark shell > started by jupyter). It works fine most of the time, but if the driver process > crashes (OOM or kill signal, for instance) the executor complains about the > connection reset by peer and then hangs. 
> Here's the log from an executor pod that hangs: > {code:java} > 20/11/03 07:35:30 WARN TransportChannelHandler: Exception in connection from > /10.17.0.152:37161 > java.io.IOException: Connection reset by peer > at java.base/sun.nio.ch.FileDispatcherImpl.read0(Native Method) > at java.base/sun.nio.ch.SocketDispatcher.read(Unknown Source) > at java.base/sun.nio.ch.IOUtil.readIntoNativeBuffer(Unknown Source) > at java.base/sun.nio.ch.IOUtil.read(Unknown Source) > at java.base/sun.nio.ch.IOUtil.read(Unknown Source) > at java.base/sun.nio.ch.SocketChannelImpl.read(Unknown Source) > at io.netty.buffer.PooledByteBuf.setBytes(PooledByteBuf.java:253) > at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:1133) > at > io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:350) > at > io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:148) > at > io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:714) > at > io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:650) > at > io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:576) > at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:493) > at > io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989) > at > io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) > at > io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) > at java.base/java.lang.Thread.run(Unknown Source) > 20/11/03 07:35:30 ERROR CoarseGrainedExecutorBackend: Executor self-exiting > due to : Driver 10.17.0.152:37161 disassociated! Shutting down. 
> 20/11/03 07:35:31 INFO MemoryStore: MemoryStore cleared > 20/11/03 07:35:31 INFO BlockManager: BlockManager stopped > {code} > When I start a shell in the pod, I can see the processes are still running: > {code:java} > UID PIDPPID CSZ RSS PSR STIME TTY TIME CMD > 185 125 0 0 5045 3968 2 10:07 pts/000:00:00 /bin/bash > 185 166 125 0 9019 3364 1 10:39 pts/000:00:00 \_ ps > -AF --forest > 1851 0 0 1130 768 0 07:34 ?00:00:00 > /usr/bin/tini -s -- /opt/java/openjdk/ > 185 14 1 0 1935527 493976 3 07:34 ? 00:00:21 > /opt/java/openjdk/bin/java -Dspark.dri > {code} > Here's the full command used to start the executor: > {code:java} > /opt/java/openjdk/ > bin/java -Dspark.driver.port=37161 -Xms4g -Xmx4g -cp :/opt/spark/jars/*: > org.apache.spark.executor.CoarseG > rainedExecutorBackend --driver-url > spark://CoarseGrainedScheduler@10.17.0.152:37161 --executor-id 1 --core > s 1 --app-id spark-application-1604388891044 --hostname 10.17.2.151 > {code} > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
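The {{stop()}} method quoted in the comment above waits with {{awaitTermination(Long.MaxValue, ...)}}, i.e. effectively forever: if a message-loop thread never exits (as with an executor that lost its driver), the caller blocks indefinitely. A hypothetical sketch of the difference between a bounded and an unbounded wait, in plain Python with a thread standing in for Spark's MessageLoop (not Spark code):

```python
import threading

stop_event = threading.Event()

def message_loop():
    # Stands in for a message loop that only exits on its "poison pill".
    stop_event.wait()

t = threading.Thread(target=message_loop, daemon=True)
t.start()

# Unbounded wait (the pattern in the quoted stop()): a plain t.join()
# here would block forever, because the event is never set.
# Bounded wait: give the thread a grace period, then move on.
t.join(timeout=0.2)
still_running = t.is_alive()  # the loop never got its poison pill

stop_event.set()              # deliver the "poison pill"
t.join(timeout=2.0)
stopped = not t.is_alive()    # the loop exits once signalled
```

A bounded grace period followed by a forced exit is one common way to keep shutdown paths from hanging like the executor pods described in the issue.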