[jira] [Updated] (SPARK-48396) Support configuring limit control for SQL to use maximum cores

2024-05-22 Thread Mars (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mars updated SPARK-48396:
-
Description: 
In a long-running shared Spark SQL cluster, a single large SQL query can occupy 
all the cores of the cluster, delaying the execution of other queries. It would 
therefore be useful to have a configuration that limits the maximum number of 
cores a single query may use.
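The requested behavior can be sketched outside Spark: a per-query core cap is essentially a counting semaphore over task slots. A minimal Python illustration of the idea; the `max_cores_per_query` knob and `CoreLimiter` name are hypothetical, not an existing Spark configuration or class:

```python
import threading

class CoreLimiter:
    """Cap the number of cores a single query's tasks may hold at once.

    Illustration of the requested behavior only; not Spark internals.
    """

    def __init__(self, max_cores_per_query: int):
        self._sem = threading.Semaphore(max_cores_per_query)

    def try_acquire(self) -> bool:
        # A scheduler would call this before granting a task a core;
        # returns False when the query is already at its core cap.
        return self._sem.acquire(blocking=False)

    def release(self) -> None:
        # Called when one of the query's tasks finishes.
        self._sem.release()

limiter = CoreLimiter(max_cores_per_query=2)
granted = [limiter.try_acquire() for _ in range(3)]
print(granted)  # the third request is refused: [True, True, False]
```

Other queries' tasks are unaffected; only requests above the per-query cap wait (or are deferred by the scheduler) until a slot is released.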
 

> Support configuring limit control for SQL to use maximum cores
> --
>
> Key: SPARK-48396
> URL: https://issues.apache.org/jira/browse/SPARK-48396
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.5.1
>Reporter: Mars
>Priority: Major
>
> In a long-running shared Spark SQL cluster, a single large SQL query can 
> occupy all the cores of the cluster, delaying the execution of other queries. 
> It would therefore be useful to have a configuration that limits the maximum 
> number of cores a single query may use.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-48396) Support configuring limit control for SQL to use maximum cores

2024-05-22 Thread Mars (Jira)
Mars created SPARK-48396:


 Summary: Support configuring limit control for SQL to use maximum 
cores
 Key: SPARK-48396
 URL: https://issues.apache.org/jira/browse/SPARK-48396
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.5.1
Reporter: Mars









[jira] [Updated] (SPARK-46710) Clean up the broadcast data generated when sql execution ends

2024-01-13 Thread Mars (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mars updated SPARK-46710:
-
Description: 
Broadcast data is currently cleaned up only when a GC is triggered, which can 
waste a lot of memory and can also cause query instability if a single GC pause 
takes too long.
Instead, the broadcast data generated during a SQL query's execution can be 
cleaned up as soon as the execution ends, reducing memory usage on the driver 
and executors.
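The proposed eager cleanup can be sketched as a small registry: broadcasts are tracked per SQL execution and released the moment that execution ends, instead of waiting for a GC-driven cleanup pass. A minimal Python illustration with made-up names (not Spark's actual internals):

```python
class BroadcastManager:
    """Track broadcast pieces per SQL execution and free them eagerly."""

    def __init__(self):
        self._by_execution = {}   # execution id -> list of broadcast ids
        self.live = set()         # broadcast ids currently holding memory

    def register(self, execution_id, broadcast_id):
        # Called when a broadcast is created while this execution runs.
        self._by_execution.setdefault(execution_id, []).append(broadcast_id)
        self.live.add(broadcast_id)

    def on_execution_end(self, execution_id):
        # Eagerly release everything the finished execution produced,
        # instead of waiting for a GC-triggered cleaner pass.
        for broadcast_id in self._by_execution.pop(execution_id, []):
            self.live.discard(broadcast_id)

mgr = BroadcastManager()
mgr.register("exec-1", "bc-0")
mgr.register("exec-1", "bc-1")
mgr.register("exec-2", "bc-2")
mgr.on_execution_end("exec-1")
print(sorted(mgr.live))  # only the still-running execution's data remains: ['bc-2']
```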

  was:
Broadcast data cleaning can only rely on cleaning when GC is triggered, which 
may lead to a lot of waste of memory usage , and may also cause query 
instability if a single GC takes too long.
After the execution of sql is completed, the broadcast data generated during 
the execution of the sql can be cleaned to reduce memory on driver or executor.


> Clean up the broadcast data generated when sql execution ends
> -
>
> Key: SPARK-46710
> URL: https://issues.apache.org/jira/browse/SPARK-46710
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Mars
>Priority: Major
>  Labels: pull-request-available
>
> Broadcast data is currently cleaned up only when a GC is triggered, which can 
> waste a lot of memory and can also cause query instability if a single GC 
> pause takes too long.
> Instead, the broadcast data generated during a SQL query's execution can be 
> cleaned up as soon as the execution ends, reducing memory usage on the driver 
> and executors.






[jira] [Updated] (SPARK-46710) Clean up the broadcast data generated when sql execution ends

2024-01-13 Thread Mars (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mars updated SPARK-46710:
-
Description: 
Broadcast data is currently cleaned up only when a GC is triggered, which can 
waste a lot of memory and can also cause query instability if a single GC pause 
takes too long.
After a SQL query's execution completes, the broadcast data generated during 
that execution can be cleaned up to reduce memory usage on the driver and 
executors.

  was:Faster cleaning of broadcast data generated by sql is beneficial to 
saving driver/executor memory and avoiding long-term GC. This can make a long 
running spark service more stable


> Clean up the broadcast data generated when sql execution ends
> -
>
> Key: SPARK-46710
> URL: https://issues.apache.org/jira/browse/SPARK-46710
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Mars
>Priority: Major
>  Labels: pull-request-available
>
> Broadcast data is currently cleaned up only when a GC is triggered, which can 
> waste a lot of memory and can also cause query instability if a single GC 
> pause takes too long.
> After a SQL query's execution completes, the broadcast data generated during 
> that execution can be cleaned up to reduce memory usage on the driver and 
> executors.






[jira] [Updated] (SPARK-46710) Clean up the broadcast data generated when sql execution ends

2024-01-13 Thread Mars (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mars updated SPARK-46710:
-
Summary: Clean up the broadcast data generated when sql execution ends  
(was: Clean up the broadcast data generated by SQL faster)

> Clean up the broadcast data generated when sql execution ends
> -
>
> Key: SPARK-46710
> URL: https://issues.apache.org/jira/browse/SPARK-46710
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Mars
>Priority: Major
>
> Cleaning up the broadcast data generated by SQL sooner saves driver/executor 
> memory and avoids long GC pauses, which makes a long-running Spark service 
> more stable.






[jira] [Updated] (SPARK-42388) Avoid unnecessary parquet footer reads when no filters in vectorized reader

2023-02-09 Thread Mars (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mars updated SPARK-42388:
-
Description: 
The Parquet footer is currently read twice even when there are no filters to 
push down in the vectorized Parquet reader.
When the NameNode is under heavy load, the extra read costs noticeable time. 
We can avoid this unnecessary footer read by reusing the footer metadata in 
{{VectorizedParquetRecordReader}}.
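The idea of the fix can be sketched as simple memoization: the footer read during planning is cached and handed to the record reader, so storage is hit only once per file. A minimal Python illustration (class and field names are hypothetical, not Spark's API; `read_footer` stands in for the real file-system call and counts invocations):

```python
class FooterCachingReader:
    """Read a Parquet footer once per file and reuse it thereafter."""

    def __init__(self):
        self.reads = 0    # number of actual file-system/NameNode hits
        self._cache = {}  # path -> footer metadata

    def read_footer(self, path):
        if path not in self._cache:
            self.reads += 1  # only a cache miss touches storage
            # Stand-in for parsed footer metadata (schema, row groups, ...).
            self._cache[path] = {"path": path, "row_groups": 4}
        return self._cache[path]

reader = FooterCachingReader()
planning_footer = reader.read_footer("part-0.parquet")  # planning phase
record_footer = reader.read_footer("part-0.parquet")    # record-reader phase
print(reader.reads)  # footer was read from storage only once: 1
```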

> Avoid unnecessary parquet footer reads when no filters in vectorized reader
> ---
>
> Key: SPARK-42388
> URL: https://issues.apache.org/jira/browse/SPARK-42388
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Mars
>Priority: Major
>
> The Parquet footer is currently read twice even when there are no filters to 
> push down in the vectorized Parquet reader.
> When the NameNode is under heavy load, the extra read costs noticeable time. 
> We can avoid this unnecessary footer read by reusing the footer metadata in 
> {{VectorizedParquetRecordReader}}.






[jira] [Updated] (SPARK-42388) Avoid unnecessary parquet footer reads when no filters in vectorized reader

2023-02-09 Thread Mars (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mars updated SPARK-42388:
-
Summary: Avoid unnecessary parquet footer reads when no filters in 
vectorized reader  (was: Avoid unnecessary parquet footer reads when no filters 
in vectorized parquet reader)

> Avoid unnecessary parquet footer reads when no filters in vectorized reader
> ---
>
> Key: SPARK-42388
> URL: https://issues.apache.org/jira/browse/SPARK-42388
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Mars
>Priority: Major
>







[jira] [Updated] (SPARK-42388) Avoid unnecessary parquet footer reads when no filters in vectorized parquet reader

2023-02-08 Thread Mars (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mars updated SPARK-42388:
-
Summary: Avoid unnecessary parquet footer reads when no filters in 
vectorized parquet reader  (was: Avoid unnecessary parquet footer reads when no 
filters)

> Avoid unnecessary parquet footer reads when no filters in vectorized parquet 
> reader
> ---
>
> Key: SPARK-42388
> URL: https://issues.apache.org/jira/browse/SPARK-42388
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Mars
>Priority: Major
>







[jira] [Created] (SPARK-42388) Avoid unnecessary parquet footer reads when no filters

2023-02-08 Thread Mars (Jira)
Mars created SPARK-42388:


 Summary: Avoid unnecessary parquet footer reads when no filters
 Key: SPARK-42388
 URL: https://issues.apache.org/jira/browse/SPARK-42388
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.4.0
Reporter: Mars









[jira] [Created] (SPARK-42387) Avoid unnecessary parquet footer reads when no filters

2023-02-08 Thread Mars (Jira)
Mars created SPARK-42387:


 Summary: Avoid unnecessary parquet footer reads when no filters
 Key: SPARK-42387
 URL: https://issues.apache.org/jira/browse/SPARK-42387
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.4.0
Reporter: Mars









[jira] [Updated] (SPARK-38005) Support cleaning up merged shuffle files and state from external shuffle service

2023-02-08 Thread Mars (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mars updated SPARK-38005:
-
Fix Version/s: 3.4.0

> Support cleaning up merged shuffle files and state from external shuffle 
> service
> 
>
> Key: SPARK-38005
> URL: https://issues.apache.org/jira/browse/SPARK-38005
> Project: Spark
>  Issue Type: Sub-task
>  Components: Shuffle, Spark Core
>Affects Versions: 3.2.0
>Reporter: Chandni Singh
>Priority: Major
> Fix For: 3.4.0
>
>
> Currently merged shuffle files and state are not cleaned up until an 
> application ends. SPARK-37618 handles the cleanup of regular shuffle files. 
> This jira will address cleaning up of merged shuffle files/state.






[jira] [Resolved] (SPARK-38005) Support cleaning up merged shuffle files and state from external shuffle service

2023-02-08 Thread Mars (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mars resolved SPARK-38005.
--
Resolution: Fixed

> Support cleaning up merged shuffle files and state from external shuffle 
> service
> 
>
> Key: SPARK-38005
> URL: https://issues.apache.org/jira/browse/SPARK-38005
> Project: Spark
>  Issue Type: Sub-task
>  Components: Shuffle, Spark Core
>Affects Versions: 3.2.0
>Reporter: Chandni Singh
>Priority: Major
>
> Currently merged shuffle files and state are not cleaned up until an 
> application ends. SPARK-37618 handles the cleanup of regular shuffle files. 
> This jira will address cleaning up of merged shuffle files/state.






[jira] (SPARK-41470) SPJ: Spark shouldn't assume InternalRow implements equals and hashCode

2023-01-21 Thread Mars (Jira)


[ https://issues.apache.org/jira/browse/SPARK-41470 ]


Mars deleted comment on SPARK-41470:
--

was (Author: JIRAUSER290821):
[~csun] I want to fix it ~

> SPJ: Spark shouldn't assume InternalRow implements equals and hashCode
> --
>
> Key: SPARK-41470
> URL: https://issues.apache.org/jira/browse/SPARK-41470
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.1
>Reporter: Chao Sun
>Priority: Major
>
> Currently SPJ (Storage-Partitioned Join) actually assumes the {{InternalRow}} 
> returned by {{HasPartitionKey}} implements {{equals}} and {{{}hashCode{}}}. 
> We should remove this restriction.
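The restriction can be reproduced in miniature: grouping partitions by a key row only works when the key type defines value-based equality and hashing. A rough Python analogue of the Scala/Java situation (illustration only; Spark's {{InternalRow}} is JVM code):

```python
class RowIdentity:
    """A partition-key row with default identity-based equality."""
    def __init__(self, key):
        self.key = key

class RowByValue(RowIdentity):
    """The same row with equals/hashCode-style value semantics."""
    def __eq__(self, other):
        return isinstance(other, RowByValue) and self.key == other.key
    def __hash__(self):
        return hash(self.key)

def count_groups(rows):
    # Group partitions by their key row, as the join grouping effectively does.
    groups = {}
    for row in rows:
        groups.setdefault(row, []).append(row)
    return len(groups)

# Two rows with the same key: identity semantics split them into two groups,
# value semantics correctly merge them into one.
identity_groups = count_groups([RowIdentity(1), RowIdentity(1)])
value_groups = count_groups([RowByValue(1), RowByValue(1)])
print(identity_groups, value_groups)  # 2 1
```

Removing the restriction means comparing rows field-by-field instead of relying on the row object's own equality.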






[jira] [Comment Edited] (SPARK-41470) SPJ: Spark shouldn't assume InternalRow implements equals and hashCode

2023-01-20 Thread Mars (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17679059#comment-17679059
 ] 

Mars edited comment on SPARK-41470 at 1/20/23 8:43 AM:
---

[~csun] I want to fix it ~


was (Author: JIRAUSER290821):
[~csun] I want to take it ~

> SPJ: Spark shouldn't assume InternalRow implements equals and hashCode
> --
>
> Key: SPARK-41470
> URL: https://issues.apache.org/jira/browse/SPARK-41470
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.1
>Reporter: Chao Sun
>Priority: Major
>
> Currently SPJ (Storage-Partitioned Join) actually assumes the {{InternalRow}} 
> returned by {{HasPartitionKey}} implements {{equals}} and {{{}hashCode{}}}. 
> We should remove this restriction.






[jira] [Commented] (SPARK-41470) SPJ: Spark shouldn't assume InternalRow implements equals and hashCode

2023-01-20 Thread Mars (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17679059#comment-17679059
 ] 

Mars commented on SPARK-41470:
--

[~csun] I want to take it ~

> SPJ: Spark shouldn't assume InternalRow implements equals and hashCode
> --
>
> Key: SPARK-41470
> URL: https://issues.apache.org/jira/browse/SPARK-41470
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.1
>Reporter: Chao Sun
>Priority: Major
>
> Currently SPJ (Storage-Partitioned Join) actually assumes the {{InternalRow}} 
> returned by {{HasPartitionKey}} implements {{equals}} and {{{}hashCode{}}}. 
> We should remove this restriction.






[jira] (SPARK-41471) SPJ: Reduce Spark shuffle when only one side of a join is KeyGroupedPartitioning

2023-01-16 Thread Mars (Jira)


[ https://issues.apache.org/jira/browse/SPARK-41471 ]


Mars deleted comment on SPARK-41471:
--

was (Author: JIRAUSER290821):
[~csun] Hi, I want to take it :)

> SPJ: Reduce Spark shuffle when only one side of a join is 
> KeyGroupedPartitioning
> 
>
> Key: SPARK-41471
> URL: https://issues.apache.org/jira/browse/SPARK-41471
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.1
>Reporter: Chao Sun
>Priority: Major
>
> When only one side of a SPJ (Storage-Partitioned Join) is 
> {{{}KeyGroupedPartitioning{}}}, Spark currently needs to shuffle both sides 
> using {{{}HashPartitioning{}}}. However, we may just need to shuffle the 
> other side according to the partition transforms defined in 
> {{{}KeyGroupedPartitioning{}}}. This is especially useful when the other side 
> is relatively small.






[jira] [Commented] (SPARK-41471) SPJ: Reduce Spark shuffle when only one side of a join is KeyGroupedPartitioning

2022-12-28 Thread Mars (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17652484#comment-17652484
 ] 

Mars commented on SPARK-41471:
--

[~csun] Hi, I want to take it :)

> SPJ: Reduce Spark shuffle when only one side of a join is 
> KeyGroupedPartitioning
> 
>
> Key: SPARK-41471
> URL: https://issues.apache.org/jira/browse/SPARK-41471
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.1
>Reporter: Chao Sun
>Priority: Major
>
> When only one side of a SPJ (Storage-Partitioned Join) is 
> {{{}KeyGroupedPartitioning{}}}, Spark currently needs to shuffle both sides 
> using {{{}HashPartitioning{}}}. However, we may just need to shuffle the 
> other side according to the partition transforms defined in 
> {{{}KeyGroupedPartitioning{}}}. This is especially useful when the other side 
> is relatively small.






[jira] [Updated] (SPARK-41365) Stages UI page fails to load for proxy in some yarn versions

2022-12-02 Thread Mars (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mars updated SPARK-41365:
-
Description: 
My environment is CDH 5.8; I click through to the Spark UI from the YARN 
interface. Visiting the stage page then fails to load. The URI is:
{code:java}
http://:8088/proxy/application_1669877165233_0021/stages/stage/?id=0&attempt=0
 {code}
!image-2022-12-02-17-53-03-003.png|width=430,height=697!

Server error stack trace:
{code:java}
Caused by: java.lang.NullPointerException
at 
org.apache.spark.status.api.v1.StagesResource.$anonfun$doPagination$1(StagesResource.scala:207)
at 
org.apache.spark.status.api.v1.BaseAppResource.$anonfun$withUI$1(ApiRootResource.scala:142)
at org.apache.spark.ui.SparkUI.withSparkUI(SparkUI.scala:147)
at 
org.apache.spark.status.api.v1.BaseAppResource.withUI(ApiRootResource.scala:137)
at 
org.apache.spark.status.api.v1.BaseAppResource.withUI$(ApiRootResource.scala:135)
at 
org.apache.spark.status.api.v1.StagesResource.withUI(StagesResource.scala:31)
at 
org.apache.spark.status.api.v1.StagesResource.doPagination(StagesResource.scala:206)
at 
org.apache.spark.status.api.v1.StagesResource.$anonfun$taskTable$1(StagesResource.scala:161)
at 
org.apache.spark.status.api.v1.BaseAppResource.$anonfun$withUI$1(ApiRootResource.scala:142)
at org.apache.spark.ui.SparkUI.withSparkUI(SparkUI.scala:147)
at 
org.apache.spark.status.api.v1.BaseAppResource.withUI(ApiRootResource.scala:137)
at 
org.apache.spark.status.api.v1.BaseAppResource.withUI$(ApiRootResource.scala:135)
at 
org.apache.spark.status.api.v1.StagesResource.withUI(StagesResource.scala:31)
at 
org.apache.spark.status.api.v1.StagesResource.taskTable(StagesResource.scala:145)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62){code}
 

This issue is similar to the two below; the final symptom is the same, because 
a parameter is URL-encoded twice:
https://issues.apache.org/jira/browse/SPARK-32467
https://issues.apache.org/jira/browse/SPARK-33611

Those two issues fixed two scenarios that caused double encoding:
1. an HTTPS redirect proxy
2. reverse proxy enabled (spark.ui.reverseProxy) behind Nginx

But if a parameter is encoded twice for any other reason, such as the YARN 
proxy in this issue, the page still fails to load.
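The double-encoding failure mode is easy to reproduce with the standard library: after a second round of encoding, one decode on the server no longer recovers the original query string, so the `id`/`attempt` parameters are never parsed and the stage lookup returns nothing (surfacing as the NullPointerException in the trace above). A minimal Python sketch:

```python
from urllib.parse import quote, unquote

# The stage-page query string as the browser originally sends it.
original = "?id=0&attempt=0"

encoded_once = quote(original, safe="")
# A misbehaving proxy re-encodes the already-encoded redirect target.
encoded_twice = quote(encoded_once, safe="")

# The server decodes exactly once. After double encoding, '?', '=' and '&'
# are still percent-escaped, so no query parameters can be parsed.
print(unquote(encoded_twice))  # %3Fid%3D0%26attempt%3D0
```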

  was:
My environment CDH 5.8 , click to enter the spark UI from the yarn interface
when visit the stage URI, it fails to load,  URI is
{code:java}
http://:8088/proxy/application_1669877165233_0021/stages/stage/?id=0&attempt=0
 {code}
!image-2022-12-02-17-53-03-003.png|width=430,height=697!

Server error stack trace:
{code}
Caused by: java.lang.NullPointerException
at 
org.apache.spark.status.api.v1.StagesResource.$anonfun$doPagination$1(StagesResource.scala:207)
at 
org.apache.spark.status.api.v1.BaseAppResource.$anonfun$withUI$1(ApiRootResource.scala:142)
at org.apache.spark.ui.SparkUI.withSparkUI(SparkUI.scala:147)
at 
org.apache.spark.status.api.v1.BaseAppResource.withUI(ApiRootResource.scala:137)
at 
org.apache.spark.status.api.v1.BaseAppResource.withUI$(ApiRootResource.scala:135)
at 
org.apache.spark.status.api.v1.StagesResource.withUI(StagesResource.scala:31)
at 
org.apache.spark.status.api.v1.StagesResource.doPagination(StagesResource.scala:206)
at 
org.apache.spark.status.api.v1.StagesResource.$anonfun$taskTable$1(StagesResource.scala:161)
at 
org.apache.spark.status.api.v1.BaseAppResource.$anonfun$withUI$1(ApiRootResource.scala:142)
at org.apache.spark.ui.SparkUI.withSparkUI(SparkUI.scala:147)
at 
org.apache.spark.status.api.v1.BaseAppResource.withUI(ApiRootResource.scala:137)
at 
org.apache.spark.status.api.v1.BaseAppResource.withUI$(ApiRootResource.scala:135)
at 
org.apache.spark.status.api.v1.StagesResource.withUI(StagesResource.scala:31)
at 
org.apache.spark.status.api.v1.StagesResource.taskTable(StagesResource.scala:145)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62){code}


> Stages UI page fails to load for proxy in some yarn versions 
> -
>
> Key: SPARK-41365
> URL: https://issues.apache.org/jira/browse/SPARK-41365
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 3.3.1
> Environment: as above
>Reporter: Mars
>Priority: Major
> Attachments: image-2022-12-02-17-53-03-003.png
>
>
> My environment is CDH 5.8; I click through to the Spark UI from the YARN 
> interface. Visiting the stage page then fails to load. The URI is:
> {code:java}
> http://:808

[jira] [Updated] (SPARK-41365) Stages UI page fails to load for proxy in some yarn versions

2022-12-02 Thread Mars (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mars updated SPARK-41365:
-
Description: 
My environment is CDH 5.8; I click through to the Spark UI from the YARN 
interface. Visiting the stage page then fails to load. The URI is:
{code:java}
http://:8088/proxy/application_1669877165233_0021/stages/stage/?id=0&attempt=0
 {code}
!image-2022-12-02-17-53-03-003.png|width=430,height=697!

Server error stack trace:
{code}
Caused by: java.lang.NullPointerException
at 
org.apache.spark.status.api.v1.StagesResource.$anonfun$doPagination$1(StagesResource.scala:207)
at 
org.apache.spark.status.api.v1.BaseAppResource.$anonfun$withUI$1(ApiRootResource.scala:142)
at org.apache.spark.ui.SparkUI.withSparkUI(SparkUI.scala:147)
at 
org.apache.spark.status.api.v1.BaseAppResource.withUI(ApiRootResource.scala:137)
at 
org.apache.spark.status.api.v1.BaseAppResource.withUI$(ApiRootResource.scala:135)
at 
org.apache.spark.status.api.v1.StagesResource.withUI(StagesResource.scala:31)
at 
org.apache.spark.status.api.v1.StagesResource.doPagination(StagesResource.scala:206)
at 
org.apache.spark.status.api.v1.StagesResource.$anonfun$taskTable$1(StagesResource.scala:161)
at 
org.apache.spark.status.api.v1.BaseAppResource.$anonfun$withUI$1(ApiRootResource.scala:142)
at org.apache.spark.ui.SparkUI.withSparkUI(SparkUI.scala:147)
at 
org.apache.spark.status.api.v1.BaseAppResource.withUI(ApiRootResource.scala:137)
at 
org.apache.spark.status.api.v1.BaseAppResource.withUI$(ApiRootResource.scala:135)
at 
org.apache.spark.status.api.v1.StagesResource.withUI(StagesResource.scala:31)
at 
org.apache.spark.status.api.v1.StagesResource.taskTable(StagesResource.scala:145)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62){code}

  was:as above


> Stages UI page fails to load for proxy in some yarn versions 
> -
>
> Key: SPARK-41365
> URL: https://issues.apache.org/jira/browse/SPARK-41365
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 3.3.1
> Environment: as above
>Reporter: Mars
>Priority: Major
> Attachments: image-2022-12-02-17-53-03-003.png
>
>
> My environment is CDH 5.8; I click through to the Spark UI from the YARN 
> interface. Visiting the stage page then fails to load. The URI is:
> {code:java}
> http://:8088/proxy/application_1669877165233_0021/stages/stage/?id=0&attempt=0
>  {code}
> !image-2022-12-02-17-53-03-003.png|width=430,height=697!
> Server error stack trace:
> {code}
> Caused by: java.lang.NullPointerException
>   at 
> org.apache.spark.status.api.v1.StagesResource.$anonfun$doPagination$1(StagesResource.scala:207)
>   at 
> org.apache.spark.status.api.v1.BaseAppResource.$anonfun$withUI$1(ApiRootResource.scala:142)
>   at org.apache.spark.ui.SparkUI.withSparkUI(SparkUI.scala:147)
>   at 
> org.apache.spark.status.api.v1.BaseAppResource.withUI(ApiRootResource.scala:137)
>   at 
> org.apache.spark.status.api.v1.BaseAppResource.withUI$(ApiRootResource.scala:135)
>   at 
> org.apache.spark.status.api.v1.StagesResource.withUI(StagesResource.scala:31)
>   at 
> org.apache.spark.status.api.v1.StagesResource.doPagination(StagesResource.scala:206)
>   at 
> org.apache.spark.status.api.v1.StagesResource.$anonfun$taskTable$1(StagesResource.scala:161)
>   at 
> org.apache.spark.status.api.v1.BaseAppResource.$anonfun$withUI$1(ApiRootResource.scala:142)
>   at org.apache.spark.ui.SparkUI.withSparkUI(SparkUI.scala:147)
>   at 
> org.apache.spark.status.api.v1.BaseAppResource.withUI(ApiRootResource.scala:137)
>   at 
> org.apache.spark.status.api.v1.BaseAppResource.withUI$(ApiRootResource.scala:135)
>   at 
> org.apache.spark.status.api.v1.StagesResource.withUI(StagesResource.scala:31)
>   at 
> org.apache.spark.status.api.v1.StagesResource.taskTable(StagesResource.scala:145)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62){code}






[jira] [Updated] (SPARK-41365) Stages UI page fails to load for proxy in some yarn versions

2022-12-02 Thread Mars (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mars updated SPARK-41365:
-
Attachment: image-2022-12-02-17-53-03-003.png

> Stages UI page fails to load for proxy in some yarn versions 
> -
>
> Key: SPARK-41365
> URL: https://issues.apache.org/jira/browse/SPARK-41365
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 3.3.1
> Environment: as above
>Reporter: Mars
>Priority: Major
> Attachments: image-2022-12-02-17-53-03-003.png
>
>
> as above






[jira] [Created] (SPARK-41365) Stages UI page fails to load for proxy in some yarn versions

2022-12-02 Thread Mars (Jira)
Mars created SPARK-41365:


 Summary: Stages UI page fails to load for proxy in some yarn 
versions 
 Key: SPARK-41365
 URL: https://issues.apache.org/jira/browse/SPARK-41365
 Project: Spark
  Issue Type: Bug
  Components: Web UI
Affects Versions: 3.3.1
 Environment: as above
Reporter: Mars


as above






[jira] (SPARK-37313) Child stage using merged output or not should be based on the availability of merged output from parent stage

2022-11-22 Thread Mars (Jira)


[ https://issues.apache.org/jira/browse/SPARK-37313 ]


Mars deleted comment on SPARK-37313:
--

was (Author: JIRAUSER290821):
as comment said 
[https://github.com/apache/spark/pull/34461#issuecomment-964557253]
I'm working on this Issue and trying to implement this functionality [~minyang] 
[~mridul] 

> Child stage using merged output or not should be based on the availability of 
> merged output from parent stage
> -
>
> Key: SPARK-37313
> URL: https://issues.apache.org/jira/browse/SPARK-37313
> Project: Spark
>  Issue Type: Sub-task
>  Components: Shuffle, Spark Core
>Affects Versions: 3.2.1
>Reporter: Minchu Yang
>Priority: Minor
>
> As discussed in the 
> [thread|https://github.com/apache/spark/pull/34461#pullrequestreview-799701494]
>  in SPARK-37023, during a stage retry, if parent stage has already generated 
> merged output in the previous attempt, with current behavior, the child stage 
> would not be able to fetch the merged output, as this is controlled by 
> dependency.shuffleMergeEnabled (see current implementation 
> [here|https://github.com/apache/spark/blob/31b6f614d3173c8a5852243bf7d0b6200788432d/core/src/main/scala/org/apache/spark/shuffle/sort/SortShuffleManager.scala#L134-L136])
>  during the stage retry.
> Instead of using a single variable to control behavior at both mapper side 
> (push side) and reducer side (using merged output), whether child stage uses 
> merged output or not must only be based on whether merged output is available 
> for it to use(as discussed 
> [here|https://github.com/apache/spark/pull/34461#issuecomment-964557253]).






[jira] [Commented] (SPARK-38093) Set shuffleMergeAllowed to false for a determinate stage after the stage is finalized

2022-11-18 Thread Mars (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17635786#comment-17635786
 ] 

Mars commented on SPARK-38093:
--

See comment: https://github.com/apache/spark/pull/34122#discussion_r796929787

> Set shuffleMergeAllowed to false for a determinate stage after the stage is 
> finalized
> -
>
> Key: SPARK-38093
> URL: https://issues.apache.org/jira/browse/SPARK-38093
> Project: Spark
>  Issue Type: Sub-task
>  Components: Shuffle
>Affects Versions: 3.2.1
>Reporter: Venkata krishnan Sowrirajan
>Priority: Major
>
> Currently we are setting shuffleMergeAllowed to false before 
> prepareShuffleServicesForShuffleMapStage if the shuffle dependency is already 
> finalized. Ideally, this should be done right after shuffle dependency 
> finalization for a determinate stage. cc [~mridulm80]
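The ordering change described above can be sketched with a toy model (the names `ShuffleDep` and `finalizeMerge` are illustrative stand-ins I chose, not Spark's actual classes): for a determinate stage, `shuffleMergeAllowed` is flipped to false as part of finalization itself, rather than later when the next stage's shuffle services are prepared.

```scala
// Toy model of the proposed ordering; ShuffleDep/finalizeMerge are
// illustrative names, not Spark's real internals.
final class ShuffleDep(val isDeterminate: Boolean) {
  var shuffleMergeAllowed: Boolean = true
  var mergeFinalized: Boolean = false

  def finalizeMerge(): Unit = {
    mergeFinalized = true
    // Proposed: for a determinate stage, disallow further pushes right after
    // finalization, instead of waiting for the next stage preparation to check.
    if (isDeterminate) shuffleMergeAllowed = false
  }
}

val dep = new ShuffleDep(isDeterminate = true)
dep.finalizeMerge()
```

With this shape, no window remains in which a finalized determinate stage still advertises `shuffleMergeAllowed = true`.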






[jira] [Commented] (SPARK-37313) Child stage using merged output or not should be based on the availability of merged output from parent stage

2022-11-17 Thread Mars (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17635314#comment-17635314
 ] 

Mars commented on SPARK-37313:
--

As discussed in this comment: 
[https://github.com/apache/spark/pull/34461#issuecomment-964557253]
I'm working on this issue and trying to implement this functionality. 
[~minyang] [~mridul] 

> Child stage using merged output or not should be based on the availability of 
> merged output from parent stage
> -
>
> Key: SPARK-37313
> URL: https://issues.apache.org/jira/browse/SPARK-37313
> Project: Spark
>  Issue Type: Sub-task
>  Components: Shuffle, Spark Core
>Affects Versions: 3.2.1
>Reporter: Minchu Yang
>Priority: Minor
>
> As discussed in the 
> [thread|https://github.com/apache/spark/pull/34461#pullrequestreview-799701494]
>  in SPARK-37023, during a stage retry, if the parent stage has already 
> generated merged output in a previous attempt, then with the current 
> behavior the child stage is not able to fetch that merged output, because 
> this is controlled by dependency.shuffleMergeEnabled (see the current 
> implementation 
> [here|https://github.com/apache/spark/blob/31b6f614d3173c8a5852243bf7d0b6200788432d/core/src/main/scala/org/apache/spark/shuffle/sort/SortShuffleManager.scala#L134-L136])
>  during the stage retry.
> Instead of using a single variable to control behavior on both the mapper 
> side (push side) and the reducer side (using merged output), whether a child 
> stage uses merged output should be based only on whether merged output is 
> available for it to use (as discussed 
> [here|https://github.com/apache/spark/pull/34461#issuecomment-964557253]).
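The proposed split can be illustrated with a minimal sketch (the field names `shuffleMergeEnabled` and `mergedOutputAvailable` are stand-ins I chose for the real dependency state, not Spark's API): the mapper-side push flag and the reducer-side read decision are separated, so a stage retry that disables pushing no longer hides merged output produced by a previous attempt.

```scala
// Illustrative only: one flag gating both sides vs. an availability-based check.
final case class ShuffleDepState(
    shuffleMergeEnabled: Boolean,   // mapper side: whether map tasks push blocks
    mergedOutputAvailable: Boolean  // reducer side: finalized merged output exists
)

// Current behavior (simplified): a single flag gates both sides.
def useMergedOutputCurrent(dep: ShuffleDepState): Boolean =
  dep.shuffleMergeEnabled

// Proposed: the child stage consults only availability.
def useMergedOutputProposed(dep: ShuffleDepState): Boolean =
  dep.mergedOutputAvailable

// Stage retry scenario: pushing is disabled on the retry, but the previous
// attempt already finalized merged output.
val retried = ShuffleDepState(shuffleMergeEnabled = false, mergedOutputAvailable = true)
```

In the retry scenario the current check discards usable merged output, while the availability-based check keeps it.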






[jira] [Commented] (SPARK-38005) Support cleaning up merged shuffle files and state from external shuffle service

2022-10-27 Thread Mars (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17625006#comment-17625006
 ] 

Mars commented on SPARK-38005:
--

[~mridulm80] [~csingh] Hi, I'd like to take this issue. I've gone through the 
background and plan to start working on it now.

> Support cleaning up merged shuffle files and state from external shuffle 
> service
> 
>
> Key: SPARK-38005
> URL: https://issues.apache.org/jira/browse/SPARK-38005
> Project: Spark
>  Issue Type: Sub-task
>  Components: Shuffle, Spark Core
>Affects Versions: 3.2.0
>Reporter: Chandni Singh
>Priority: Major
>
> Currently merged shuffle files and state is not cleaned up until an 
> application ends. SPARK-37618 handles the cleanup of regular shuffle files. 
> This jira will address cleaning up of merged shuffle files/state.






[jira] [Comment Edited] (SPARK-40320) When the Executor plugin fails to initialize, the Executor shows active but does not accept tasks forever, just like being hung

2022-09-07 Thread Mars (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17601543#comment-17601543
 ] 

Mars edited comment on SPARK-40320 at 9/7/22 10:50 PM:
---

[~Ngone51] 
Shouldn't it bring up a new `receiveLoop()` to serve RPC messages?

Yes, my previous thinking was wrong. I remote-debugged the Executor and found 
that it did catch the fatal error in 
[https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/rpc/netty/MessageLoop.scala#L82-L89]
 .
It resubmits receiveLoop, and the second loop then blocks at 
[https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/rpc/netty/MessageLoop.scala#L69]
Because this Executor did not initialize successfully the first time, it never 
sent LaunchedExecutor to the Driver (see 
[https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/executor/CoarseGrainedExecutorBackend.scala#L172]
 )

So the Executor can't launch tasks; related PR: 
[https://github.com/apache/spark/pull/25964] .

As for why SparkUncaughtExceptionHandler doesn't catch the fatal error: see 
[https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/executor/Executor.scala#L284]
 . `plugins` is a private field, so the fatal error was thrown while the 
Executor itself was still being initialized.


was (Author: JIRAUSER290821):
[~Ngone51] 
Shouldn't it bring up a new `receiveLoop()` to serve RPC messages?

Yes, my previous thinking was wrong. I remote debug on Executor and I found 
that it did catch the fatal error in 
[https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/rpc/netty/MessageLoop.scala#L82-L89]
 .
It will resubmit receiveLoop and in the second time it will block by 
[https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/rpc/netty/MessageLoop.scala#L69]
This Executor did not initialize successfully in the first time and didn't send 
LaunchedExecutor to Driver (you can see 
[https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/executor/CoarseGrainedExecutorBackend.scala#L172]
 )

So the Executor can't launch task, related PR 
[https://github.com/apache/spark/pull/25964] .

Why SparkUncaughtExceptionHandler doesn't catch the fatal error?
See 
[https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/executor/Executor.scala#L284]
 plugins is private variable, so it was broken when initialize Executor at the 
beginning.

> When the Executor plugin fails to initialize, the Executor shows active but 
> does not accept tasks forever, just like being hung
> ---
>
> Key: SPARK-40320
> URL: https://issues.apache.org/jira/browse/SPARK-40320
> Project: Spark
>  Issue Type: Bug
>  Components: Scheduler
>Affects Versions: 3.0.0
>Reporter: Mars
>Priority: Major
>
> *Reproduce step:*
> Set `spark.plugins=ErrorSparkPlugin`.
> The `ErrorSparkPlugin` and `ErrorExecutorPlugin` classes are as below (the 
> code is abbreviated for clarity):
> {code:java}
> class ErrorSparkPlugin extends SparkPlugin {
>   /**
>*/
>   override def driverPlugin(): DriverPlugin =  new ErrorDriverPlugin()
>   /**
>*/
>   override def executorPlugin(): ExecutorPlugin = new ErrorExecutorPlugin()
> }{code}
> {code:java}
> class ErrorExecutorPlugin extends ExecutorPlugin {
>   private val checkingInterval: Long = 1
>   override def init(_ctx: PluginContext, extraConf: util.Map[String, 
> String]): Unit = {
> if (checkingInterval == 1) {
>   throw new UnsatisfiedLinkError("My Exception error")
> }
>   }
> } {code}
> The Executor shows as active in the Spark UI; however, it is broken and 
> doesn't receive any tasks.
> *Root Cause:*
> Checking the code, I found that `org.apache.spark.rpc.netty.Inbox#safelyCall` 
> throws the fatal error (`UnsatisfiedLinkError` is a fatal error) in the 
> method `dealWithFatalError`. The `CoarseGrainedExecutorBackend` JVM process 
> is still alive, but the communication thread is no longer working (see 
> `MessageLoop#receiveLoopRunnable`: `receiveLoop()` was broken, so the 
> executor doesn't receive any messages).
> Some ideas:
> It is very hard to know what happened here unless we read the code. The 
> Executor is active but can't do anything, leaving us to wonder whether the 
> driver or the Executor is at fault. At the very least, the Executor status 
> shouldn't be shown as active here, or the Executor could call exitExecutor 
> (kill itself).
>  




[jira] [Commented] (SPARK-40320) When the Executor plugin fails to initialize, the Executor shows active but does not accept tasks forever, just like being hung

2022-09-07 Thread Mars (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17601543#comment-17601543
 ] 

Mars commented on SPARK-40320:
--

[~Ngone51] 
Shouldn't it bring up a new `receiveLoop()` to serve RPC messages?
Yes, my previous thinking was wrong. I remote-debugged the Executor and found 
that it did catch the fatal error in 
[https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/rpc/netty/MessageLoop.scala#L82-L89]
 .
It resubmits receiveLoop, and the second loop then blocks at 
[https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/rpc/netty/MessageLoop.scala#L69]
But this Executor did not initialize successfully, so it never sent 
LaunchedExecutor to the Driver (see 
[https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/executor/CoarseGrainedExecutorBackend.scala#L172]
 )

So the Executor can't launch tasks; related PR: 
[https://github.com/apache/spark/pull/25964] .

As for why SparkUncaughtExceptionHandler doesn't catch the fatal error: see 
[https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/executor/Executor.scala#L284]
 . `plugins` is a private field, so the fatal error was thrown while the 
Executor itself was still being initialized.
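The control flow described above can be modeled with a small sketch (a simplification I wrote, not Spark's actual `MessageLoop` code): the loop catches the fatal error and resubmits itself, but the fresh loop just waits on an inbox that never receives another message, because the half-initialized executor never registered with the driver.

```scala
import scala.collection.mutable

// Simplified stand-in for MessageLoop#receiveLoopRunnable: process queued
// messages in order; on any throwable (UnsatisfiedLinkError is an Error,
// i.e. fatal), resubmit a fresh receive loop, as the linked catch block does.
def runLoop(inbox: mutable.Queue[() => Unit]): String =
  try {
    while (inbox.nonEmpty) inbox.dequeue()() // handle messages in order
    "blocked-waiting"                        // queue drained: the real loop would block on take()
  } catch {
    case _: Throwable => runLoop(inbox)      // resubmit the receive loop
  }

// The first (and only) message is the plugin initialization that throws.
val inbox = mutable.Queue[() => Unit](
  () => throw new UnsatisfiedLinkError("My Exception error")
)
val outcome = runLoop(inbox)
```

After the resubmission the loop is alive but idle forever, which matches the "active but hung" symptom in the UI.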

> When the Executor plugin fails to initialize, the Executor shows active but 
> does not accept tasks forever, just like being hung
> ---
>
> Key: SPARK-40320
> URL: https://issues.apache.org/jira/browse/SPARK-40320
> Project: Spark
>  Issue Type: Bug
>  Components: Scheduler
>Affects Versions: 3.0.0
>Reporter: Mars
>Priority: Major
>
> *Reproduce step:*
> Set `spark.plugins=ErrorSparkPlugin`.
> The `ErrorSparkPlugin` and `ErrorExecutorPlugin` classes are as below (the 
> code is abbreviated for clarity):
> {code:java}
> class ErrorSparkPlugin extends SparkPlugin {
>   /**
>*/
>   override def driverPlugin(): DriverPlugin =  new ErrorDriverPlugin()
>   /**
>*/
>   override def executorPlugin(): ExecutorPlugin = new ErrorExecutorPlugin()
> }{code}
> {code:java}
> class ErrorExecutorPlugin extends ExecutorPlugin {
>   private val checkingInterval: Long = 1
>   override def init(_ctx: PluginContext, extraConf: util.Map[String, 
> String]): Unit = {
> if (checkingInterval == 1) {
>   throw new UnsatisfiedLinkError("My Exception error")
> }
>   }
> } {code}
> The Executor shows as active in the Spark UI; however, it is broken and 
> doesn't receive any tasks.
> *Root Cause:*
> Checking the code, I found that `org.apache.spark.rpc.netty.Inbox#safelyCall` 
> throws the fatal error (`UnsatisfiedLinkError` is a fatal error) in the 
> method `dealWithFatalError`. The `CoarseGrainedExecutorBackend` JVM process 
> is still alive, but the communication thread is no longer working (see 
> `MessageLoop#receiveLoopRunnable`: `receiveLoop()` was broken, so the 
> executor doesn't receive any messages).
> Some ideas:
> It is very hard to know what happened here unless we read the code. The 
> Executor is active but can't do anything, leaving us to wonder whether the 
> driver or the Executor is at fault. At the very least, the Executor status 
> shouldn't be shown as active here, or the Executor could call exitExecutor 
> (kill itself).
>  






[jira] [Updated] (SPARK-40320) When the Executor plugin fails to initialize, the Executor shows active but does not accept tasks forever, just like being hung

2022-09-04 Thread Mars (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mars updated SPARK-40320:
-
Description: 
*Reproduce step:*
Set `spark.plugins=ErrorSparkPlugin`.
The `ErrorSparkPlugin` and `ErrorExecutorPlugin` classes are as below (the 
code is abbreviated for clarity):
{code:java}
import org.apache.spark.api.plugin.{DriverPlugin, ExecutorPlugin, SparkPlugin}

class ErrorSparkPlugin extends SparkPlugin {
  override def driverPlugin(): DriverPlugin = new ErrorDriverPlugin()

  override def executorPlugin(): ExecutorPlugin = new ErrorExecutorPlugin()
}{code}
{code:java}
import java.util

import org.apache.spark.api.plugin.{ExecutorPlugin, PluginContext}

class ErrorExecutorPlugin extends ExecutorPlugin {
  private val checkingInterval: Long = 1

  override def init(_ctx: PluginContext, extraConf: util.Map[String, String]): Unit = {
    if (checkingInterval == 1) {
      throw new UnsatisfiedLinkError("My Exception error")
    }
  }
}{code}
The Executor shows as active in the Spark UI; however, it is broken and 
doesn't receive any tasks.

*Root Cause:*

Checking the code, I found that `org.apache.spark.rpc.netty.Inbox#safelyCall` 
throws the fatal error (`UnsatisfiedLinkError` is a fatal error) in the method 
`dealWithFatalError`. The `CoarseGrainedExecutorBackend` JVM process is still 
alive, but the communication thread is no longer working (see 
`MessageLoop#receiveLoopRunnable`: `receiveLoop()` was broken, so the executor 
doesn't receive any messages).

Some ideas:
It is very hard to know what happened here unless we read the code. The 
Executor is active but can't do anything, leaving us to wonder whether the 
driver or the Executor is at fault. At the very least, the Executor status 
shouldn't be shown as active here, or the Executor could call exitExecutor 
(kill itself).
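One possible shape of the "kill itself" option, as a hedged sketch (the `initPlugins` helper and its returned status are mine, not Spark's API): treat any throwable during plugin initialization as a reason to exit the executor, so the driver sees a dead executor instead of a hung one.

```scala
// Illustrative sketch: fail fast when plugin initialization throws, rather
// than leaving a live JVM whose message loop is dead.
def initPlugins(inits: Seq[() => Unit]): Either[String, Unit] =
  try {
    inits.foreach(init => init())
    Right(())
  } catch {
    case e: Throwable =>
      // Real code would call exitExecutor / System.exit here; we just report.
      Left(s"exitExecutor: plugin init failed: ${e.getMessage}")
  }

val failing = Seq[() => Unit](
  () => throw new UnsatisfiedLinkError("My Exception error")
)
val result = initPlugins(failing)
```

Either exit path would surface the failure immediately, instead of the current state where only reading the source code explains the hang.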

 

  was:
*Reproduce step:*
set `spark.plugins=ErrorSparkPlugin`
`ErrorSparkPlugin` && `ErrorExecutorPlugin` class as below (I abbreviate the 
code to make it clearer):
{code:java}
class ErrorSparkPlugin extends SparkPlugin {
  /**
   */
  override def driverPlugin(): DriverPlugin =  new ErrorDriverPlugin()

  /**
   */
  override def executorPlugin(): ExecutorPlugin = new ErrorExecutorPlugin()
}{code}
{code:java}
class ErrorExecutorPlugin extends ExecutorPlugin {
  private val checkingInterval: Long = 1

  override def init(_ctx: PluginContext, extraConf: util.Map[String, String]): 
Unit = {
if (checkingInterval == 1) {
  throw new UnsatisfiedLinkError("My Exception error")
}
  }
} {code}
The Executor is active when we check in spark-ui, however it was broken and 
doesn't receive any task.

*Root Cause:*

I check the code and I find in `org.apache.spark.rpc.netty.Inbox#safelyCall` it 
will throw fatal error (`UnsatisfiedLinkError` is fatal error here ) in method 
`dealWithFatalError` . Actually the  `CoarseGrainedExecutorBackend` JVM process 
 is active but the  communication thread is no longer working ( please see  
`MessageLoop#receiveLoopRunnable` , `receiveLoop()` was broken, so executor 
doesn't receive any message)

Some ideas:
I think it is very hard to know what happened here unless we check in the code. 
The Executor is active but it can't do anything. We will wonder if the driver 
is broken or the Executor problem.  I think at least the Executor status 
shouldn't be active here or the Executor can exitExecutor (kill itself)

 


> When the Executor plugin fails to initialize, the Executor shows active but 
> does not accept tasks forever, just like being hung
> ---
>
> Key: SPARK-40320
> URL: https://issues.apache.org/jira/browse/SPARK-40320
> Project: Spark
>  Issue Type: Bug
>  Components: Scheduler
>Affects Versions: 3.0.0
>Reporter: Mars
>Priority: Major
>
> *Reproduce step:*
> set `spark.plugins=ErrorSparkPlugin`
> `ErrorSparkPlugin` && `ErrorExecutorPlugin` class as below (I abbreviate the 
> code to make it clearer):
> {code:java}
> class ErrorSparkPlugin extends SparkPlugin {
>   /**
>*/
>   override def driverPlugin(): DriverPlugin =  new ErrorDriverPlugin()
>   /**
>*/
>   override def executorPlugin(): ExecutorPlugin = new ErrorExecutorPlugin()
> }{code}
> {code:java}
> class ErrorExecutorPlugin extends ExecutorPlugin {
>   private val checkingInterval: Long = 1
>   override def init(_ctx: PluginContext, extraConf: util.Map[String, 
> String]): Unit = {
> if (checkingInterval == 1) {
>   throw new UnsatisfiedLinkError("My Exception error")
> }
>   }
> } {code}
> The Executor is active when we check in spark-ui, however it was broken and 
> doesn't receive any task.
> *Root Cause:*
> I check the code and I find in `org.apache.spark.rpc.netty.Inbox#safelyCall` 
> it will throw fatal error (`UnsatisfiedLinkError` is fatal erro ) in method 
> `dealWithFatalError` . Actually the  `CoarseGrainedExecutorBackend` JVM 
> process

[jira] [Updated] (SPARK-40320) When the Executor plugin fails to initialize, the Executor shows active but does not accept tasks forever, just like being hung

2022-09-04 Thread Mars (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mars updated SPARK-40320:
-
Description: 
*Reproduce step:*
set `spark.plugins=ErrorSparkPlugin`
`ErrorSparkPlugin` && `ErrorExecutorPlugin` class as below (I abbreviate the 
code to make it clearer):
{code:java}
class ErrorSparkPlugin extends SparkPlugin {
  /**
   */
  override def driverPlugin(): DriverPlugin =  new ErrorDriverPlugin()

  /**
   */
  override def executorPlugin(): ExecutorPlugin = new ErrorExecutorPlugin()
}{code}
{code:java}
class ErrorExecutorPlugin extends ExecutorPlugin {
  private val checkingInterval: Long = 1

  override def init(_ctx: PluginContext, extraConf: util.Map[String, String]): 
Unit = {
if (checkingInterval == 1) {
  throw new UnsatisfiedLinkError("My Exception error")
}
  }
} {code}
The Executor is active when we check in spark-ui, however it was broken and 
doesn't receive any task.

*Root Cause:*

I check the code and I find in `org.apache.spark.rpc.netty.Inbox#safelyCall` it 
will throw fatal error (`UnsatisfiedLinkError` is fatal error here ) in method 
`dealWithFatalError` . Actually the  `CoarseGrainedExecutorBackend` JVM process 
 is active but the  communication thread is no longer working ( please see  
`MessageLoop#receiveLoopRunnable` , `receiveLoop()` was broken, so executor 
doesn't receive any message)

Some ideas:
I think it is very hard to know what happened here unless we check in the code. 
The Executor is active but it can't do anything. We will wonder if the driver 
is broken or the Executor problem.  I think at least the Executor status 
shouldn't be active here or the Executor can exitExecutor (kill itself)

 

  was:
*Reproduce step:*
set `spark.plugins=ErrorSparkPlugin`
`ErrorSparkPlugin` && `ErrorExecutorPlugin` class (I abbreviate the code to 
make it clearer):
{code:java}
class ErrorSparkPlugin extends SparkPlugin {
  /**
   */
  override def driverPlugin(): DriverPlugin =  new ErrorDriverPlugin()

  /**
   */
  override def executorPlugin(): ExecutorPlugin = new ErrorExecutorPlugin()
}{code}
{code:java}
class ErrorExecutorPlugin extends ExecutorPlugin with Logging {
  private val checkingInterval: Long = 1

  override def init(_ctx: PluginContext, extraConf: util.Map[String, String]): 
Unit = {
if (checkingInterval == 1) {
  throw new UnsatisfiedLinkError("LCL my Exception error2")
}
  }
} {code}
The Executor is active when we check in spark-ui, however it was broken and 
doesn't receive any task.

*Root Cause:*

I check the code and I find in `org.apache.spark.rpc.netty.Inbox#safelyCall` it 
will throw fatal error (`UnsatisfiedLinkError` is fatal error here ) in method 
`dealWithFatalError` . Actually the  `CoarseGrainedExecutorBackend` JVM process 
 is active but the  communication thread is no longer working ( please see  
`MessageLoop#receiveLoopRunnable` , `receiveLoop()` while was broken here, so 
executor doesn't receive any message)

Some ideas:
I think it is very hard to know what happened here unless we check in the code. 
The Executor is active but it can't do anything. We will wonder if the driver 
is broken or the Executor problem.  I think at least the Executor status 
shouldn't be active here or the Executor can exitExecutor (kill itself)

 


> When the Executor plugin fails to initialize, the Executor shows active but 
> does not accept tasks forever, just like being hung
> ---
>
> Key: SPARK-40320
> URL: https://issues.apache.org/jira/browse/SPARK-40320
> Project: Spark
>  Issue Type: Bug
>  Components: Scheduler
>Affects Versions: 3.0.0
>Reporter: Mars
>Priority: Major
>
> *Reproduce step:*
> set `spark.plugins=ErrorSparkPlugin`
> `ErrorSparkPlugin` && `ErrorExecutorPlugin` class as below (I abbreviate the 
> code to make it clearer):
> {code:java}
> class ErrorSparkPlugin extends SparkPlugin {
>   /**
>*/
>   override def driverPlugin(): DriverPlugin =  new ErrorDriverPlugin()
>   /**
>*/
>   override def executorPlugin(): ExecutorPlugin = new ErrorExecutorPlugin()
> }{code}
> {code:java}
> class ErrorExecutorPlugin extends ExecutorPlugin {
>   private val checkingInterval: Long = 1
>   override def init(_ctx: PluginContext, extraConf: util.Map[String, 
> String]): Unit = {
> if (checkingInterval == 1) {
>   throw new UnsatisfiedLinkError("My Exception error")
> }
>   }
> } {code}
> The Executor is active when we check in spark-ui, however it was broken and 
> doesn't receive any task.
> *Root Cause:*
> I check the code and I find in `org.apache.spark.rpc.netty.Inbox#safelyCall` 
> it will throw fatal error (`UnsatisfiedLinkError` is fatal error here ) in 
> method `dealWithFatalError` . Actually the  `CoarseGraine

[jira] [Updated] (SPARK-40320) When the Executor plugin fails to initialize, the Executor shows active but does not accept tasks forever, just like being hung

2022-09-02 Thread Mars (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mars updated SPARK-40320:
-
Description: 
Reproduce step:
set `spark.plugins=ErrorSparkPlugin`
`ErrorSparkPlugin` && `ErrorExecutorPlugin` class (I abbreviate the code to 
make it clearer):
{code:java}
class ErrorSparkPlugin extends SparkPlugin {
  /**
   */
  override def driverPlugin(): DriverPlugin =  new ErrorDriverPlugin()

  /**
   */
  override def executorPlugin(): ExecutorPlugin = new ErrorExecutorPlugin()
}{code}
{code:java}
class ErrorExecutorPlugin extends ExecutorPlugin with Logging {
  private val checkingInterval: Long = 1

  override def init(_ctx: PluginContext, extraConf: util.Map[String, String]): 
Unit = {
if (checkingInterval == 1) {
  throw new UnsatisfiedLinkError("LCL my Exception error2")
}
  }
} {code}
The Executor is active when we check in spark-ui, however it was broken and 
doesn't receive any task.

Root Cause:

I check the code and I find in `org.apache.spark.rpc.netty.Inbox#safelyCall` it 
will throw fatal error (`UnsatisfiedLinkError` is fatal error here ) in method 
`dealWithFatalError` . Actually the  `CoarseGrainedExecutorBackend` JVM process 
 is active but the  communication thread is no longer working ( please see  
`MessageLoop#receiveLoopRunnable` , `receiveLoop()` while was broken here, so 
executor doesn't receive any message)

Solution:
I think it is very hard to know what happened here unless we check in the code. 
The Executor is active but it can't do anything. We will wonder if the driver 
is broken or the Executor problem.  I think at least the Executor status 
shouldn't be active here or the Executor can exitExecutor (kill itself)

 

  was:
Reproduce step:
set `spark.plugins=ErrorSparkPlugin`
`ErrorSparkPlugin` && `ErrorExecutorPlugin` class (I abbreviate the code to 
make it clearer):
{code:java}
class ErrorSparkPlugin extends SparkPlugin {
  /**
   */
  override def driverPlugin(): DriverPlugin =  new ErrorDriverPlugin()

  /**
   */
  override def executorPlugin(): ExecutorPlugin = new ErrorExecutorPlugin()
}{code}
{code:java}
class ErrorExecutorPlugin extends ExecutorPlugin with Logging {
  private val checkingInterval: Long = 1

  override def init(_ctx: PluginContext, extraConf: util.Map[String, String]): 
Unit = {
if (checkingInterval == 1) {
  throw new UnsatisfiedLinkError("LCL my Exception error2")
}
  }
} {code}
The Executor is active when we check in spark-ui, however it was broken and 
doesn't receive any task.

Root Cause:

I check the code and I find in `org.apache.spark.rpc.netty.Inbox#safelyCall` it 
will throw fatal error (`UnsatisfiedLinkError` is fatal error here ) in method 
`dealWithFatalError` . Actually the  `CoarseGrainedExecutorBackend` JVM process 
 is active but the  communication thread is no longer working

 


> When the Executor plugin fails to initialize, the Executor shows active but 
> does not accept tasks forever, just like being hung
> ---
>
> Key: SPARK-40320
> URL: https://issues.apache.org/jira/browse/SPARK-40320
> Project: Spark
>  Issue Type: Bug
>  Components: Scheduler
>Affects Versions: 3.0.0
>Reporter: Mars
>Priority: Major
>
> Reproduce step:
> set `spark.plugins=ErrorSparkPlugin`
> `ErrorSparkPlugin` && `ErrorExecutorPlugin` class (I abbreviate the code to 
> make it clearer):
> {code:java}
> class ErrorSparkPlugin extends SparkPlugin {
>   /**
>*/
>   override def driverPlugin(): DriverPlugin =  new ErrorDriverPlugin()
>   /**
>*/
>   override def executorPlugin(): ExecutorPlugin = new ErrorExecutorPlugin()
> }{code}
> {code:java}
> class ErrorExecutorPlugin extends ExecutorPlugin with Logging {
>   private val checkingInterval: Long = 1
>   override def init(_ctx: PluginContext, extraConf: util.Map[String, 
> String]): Unit = {
> if (checkingInterval == 1) {
>   throw new UnsatisfiedLinkError("LCL my Exception error2")
> }
>   }
> } {code}
> The Executor is active when we check in spark-ui, however it was broken and 
> doesn't receive any task.
> Root Cause:
> I check the code and I find in `org.apache.spark.rpc.netty.Inbox#safelyCall` 
> it will throw fatal error (`UnsatisfiedLinkError` is fatal error here ) in 
> method `dealWithFatalError` . Actually the  `CoarseGrainedExecutorBackend` 
> JVM process  is active but the  communication thread is no longer working ( 
> please see  `MessageLoop#receiveLoopRunnable` , `receiveLoop()` while was 
> broken here, so executor doesn't receive any message)
> Solution:
> I think it is very hard to know what happened here unless we check in the 
> code. The Executor is active but it can't do anything. We will wonder if the 
> driver is broken or the Executor p

[jira] [Updated] (SPARK-40320) When the Executor plugin fails to initialize, the Executor shows active but does not accept tasks forever, just like being hung

2022-09-02 Thread Mars (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mars updated SPARK-40320:
-
Description: 
*Reproduce step:*
set `spark.plugins=ErrorSparkPlugin`
`ErrorSparkPlugin` && `ErrorExecutorPlugin` class (I abbreviate the code to 
make it clearer):
{code:java}
class ErrorSparkPlugin extends SparkPlugin {
  /**
   */
  override def driverPlugin(): DriverPlugin =  new ErrorDriverPlugin()

  /**
   */
  override def executorPlugin(): ExecutorPlugin = new ErrorExecutorPlugin()
}{code}
{code:java}
class ErrorExecutorPlugin extends ExecutorPlugin with Logging {
  private val checkingInterval: Long = 1

  override def init(_ctx: PluginContext, extraConf: util.Map[String, String]): 
Unit = {
if (checkingInterval == 1) {
  throw new UnsatisfiedLinkError("LCL my Exception error2")
}
  }
} {code}
The Executor is active when we check in spark-ui, however it was broken and 
doesn't receive any task.

*Root Cause:*

I check the code and I find in `org.apache.spark.rpc.netty.Inbox#safelyCall` it 
will throw fatal error (`UnsatisfiedLinkError` is fatal error here ) in method 
`dealWithFatalError` . Actually the  `CoarseGrainedExecutorBackend` JVM process 
 is active but the  communication thread is no longer working ( please see  
`MessageLoop#receiveLoopRunnable` , `receiveLoop()` while was broken here, so 
executor doesn't receive any message)

Some ideas:
I think it is very hard to know what happened here unless we check in the code. 
The Executor is active but it can't do anything. We will wonder if the driver 
is broken or the Executor problem.  I think at least the Executor status 
shouldn't be active here or the Executor can exitExecutor (kill itself)

 

  was:
*Reproduce step:*
set `spark.plugins=ErrorSparkPlugin`
`ErrorSparkPlugin` && `ErrorExecutorPlugin` class (I abbreviate the code to 
make it clearer):
{code:java}
class ErrorSparkPlugin extends SparkPlugin {
  /**
   */
  override def driverPlugin(): DriverPlugin =  new ErrorDriverPlugin()

  /**
   */
  override def executorPlugin(): ExecutorPlugin = new ErrorExecutorPlugin()
}{code}
{code:java}
class ErrorExecutorPlugin extends ExecutorPlugin with Logging {
  private val checkingInterval: Long = 1

  override def init(_ctx: PluginContext, extraConf: util.Map[String, String]): 
Unit = {
if (checkingInterval == 1) {
  throw new UnsatisfiedLinkError("LCL my Exception error2")
}
  }
} {code}
The Executor is active when we check in spark-ui, however it was broken and 
doesn't receive any task.

*Root Cause:*

I check the code and I find in `org.apache.spark.rpc.netty.Inbox#safelyCall` it 
will throw fatal error (`UnsatisfiedLinkError` is fatal error here ) in method 
`dealWithFatalError` . Actually the  `CoarseGrainedExecutorBackend` JVM process 
 is active but the  communication thread is no longer working ( please see  
`MessageLoop#receiveLoopRunnable` , `receiveLoop()` while was broken here, so 
executor doesn't receive any message)

Some idea:
I think it is very hard to know what happened here unless we check in the code. 
The Executor is active but it can't do anything. We will wonder if the driver 
is broken or the Executor problem.  I think at least the Executor status 
shouldn't be active here or the Executor can exitExecutor (kill itself)

 


> When the Executor plugin fails to initialize, the Executor shows active but 
> does not accept tasks forever, just like being hung
> ---
>
> Key: SPARK-40320
> URL: https://issues.apache.org/jira/browse/SPARK-40320
> Project: Spark
>  Issue Type: Bug
>  Components: Scheduler
>Affects Versions: 3.0.0
>Reporter: Mars
>Priority: Major
>
> *Reproduce step:*
> set `spark.plugins=ErrorSparkPlugin`
> `ErrorSparkPlugin` && `ErrorExecutorPlugin` class (I abbreviate the code to 
> make it clearer):
> {code:java}
> class ErrorSparkPlugin extends SparkPlugin {
>   /**
>*/
>   override def driverPlugin(): DriverPlugin =  new ErrorDriverPlugin()
>   /**
>*/
>   override def executorPlugin(): ExecutorPlugin = new ErrorExecutorPlugin()
> }{code}
> {code:java}
> class ErrorExecutorPlugin extends ExecutorPlugin with Logging {
>   private val checkingInterval: Long = 1
>   override def init(_ctx: PluginContext, extraConf: util.Map[String, 
> String]): Unit = {
> if (checkingInterval == 1) {
>   throw new UnsatisfiedLinkError("LCL my Exception error2")
> }
>   }
> } {code}
> The Executor is active when we check in spark-ui, however it was broken and 
> doesn't receive any task.
> *Root Cause:*
> I check the code and I find in `org.apache.spark.rpc.netty.Inbox#safelyCall` 
> it will throw fatal error (`UnsatisfiedLinkError` is fatal error here ) in 
> method `dealWithFatalError` .

[jira] [Updated] (SPARK-40320) When the Executor plugin fails to initialize, the Executor shows active but does not accept tasks forever, just like being hung

2022-09-02 Thread Mars (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mars updated SPARK-40320:
-
Description: 
*Reproduce steps:*
Set `spark.plugins=ErrorSparkPlugin`.
The `ErrorSparkPlugin` and `ErrorExecutorPlugin` classes (abbreviated for
clarity):
{code:java}
class ErrorSparkPlugin extends SparkPlugin {
  /**
   */
  override def driverPlugin(): DriverPlugin =  new ErrorDriverPlugin()

  /**
   */
  override def executorPlugin(): ExecutorPlugin = new ErrorExecutorPlugin()
}{code}
{code:java}
class ErrorExecutorPlugin extends ExecutorPlugin with Logging {
  private val checkingInterval: Long = 1

  override def init(_ctx: PluginContext, extraConf: util.Map[String, String]): 
Unit = {
if (checkingInterval == 1) {
  throw new UnsatisfiedLinkError("LCL my Exception error2")
}
  }
} {code}
The Executor shows as active when we check the Spark UI; however, it is
actually broken and never receives any task.

*Root Cause:*

Checking the code, I found that `org.apache.spark.rpc.netty.Inbox#safelyCall`
rethrows fatal errors (`UnsatisfiedLinkError` counts as a fatal error here)
from the method `dealWithFatalError`. As a result, the
`CoarseGrainedExecutorBackend` JVM process stays alive, but the communication
thread stops working (see `MessageLoop#receiveLoopRunnable`: the
`receiveLoop()` while loop breaks here, so the executor no longer receives any
messages).

Some ideas:
It is very hard to know what happened here without reading the code. The
Executor shows as active but cannot do anything, and we are left wondering
whether the driver is broken or whether the Executor has a problem. At the
very least, the Executor status should not show as active here, or the
Executor should call exitExecutor (kill itself).
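The failure mode can be illustrated with a minimal sketch (plain Java, not Spark's actual code; all names here are made up): a fatal error that escapes the message-handling loop kills the loop thread while the JVM stays alive, so no further messages are ever processed, matching the "active but hung" symptom.

```java
import java.util.concurrent.LinkedBlockingQueue;

// Simplified, hypothetical sketch of the failure mode: the loop thread dies
// when a fatal Error escapes, but the process stays up and looks "active".
public class ReceiveLoopSketch {
    final LinkedBlockingQueue<Runnable> inbox = new LinkedBlockingQueue<>();
    volatile boolean loopAlive = true;

    void receiveLoop() {
        try {
            while (true) {
                // Take the next message and handle it; a fatal Error
                // (e.g. UnsatisfiedLinkError from plugin init) escapes here.
                inbox.take().run();
            }
        } catch (Throwable t) {
            loopAlive = false; // loop never resumes, yet the JVM keeps running
        }
    }

    public static void main(String[] args) throws Exception {
        ReceiveLoopSketch s = new ReceiveLoopSketch();
        Thread loop = new Thread(s::receiveLoop);
        loop.start();
        s.inbox.put(() -> { throw new UnsatisfiedLinkError("plugin init failed"); });
        loop.join();
        System.out.println("loop alive: " + s.loopAlive); // prints "loop alive: false"
    }
}
```

Nothing restarts the loop after the catch, which is why the process looks healthy from the outside while doing no work.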

 

  was:
Reproduce steps:
Set `spark.plugins=ErrorSparkPlugin`.
The `ErrorSparkPlugin` and `ErrorExecutorPlugin` classes (abbreviated for
clarity):
{code:java}
class ErrorSparkPlugin extends SparkPlugin {
  /**
   */
  override def driverPlugin(): DriverPlugin =  new ErrorDriverPlugin()

  /**
   */
  override def executorPlugin(): ExecutorPlugin = new ErrorExecutorPlugin()
}{code}
{code:java}
class ErrorExecutorPlugin extends ExecutorPlugin with Logging {
  private val checkingInterval: Long = 1

  override def init(_ctx: PluginContext, extraConf: util.Map[String, String]): 
Unit = {
if (checkingInterval == 1) {
  throw new UnsatisfiedLinkError("LCL my Exception error2")
}
  }
} {code}
The Executor shows as active when we check the Spark UI; however, it is
actually broken and never receives any task.

Root Cause:

Checking the code, I found that `org.apache.spark.rpc.netty.Inbox#safelyCall`
rethrows fatal errors (`UnsatisfiedLinkError` counts as a fatal error here)
from the method `dealWithFatalError`. As a result, the
`CoarseGrainedExecutorBackend` JVM process stays alive, but the communication
thread stops working (see `MessageLoop#receiveLoopRunnable`: the
`receiveLoop()` while loop breaks here, so the executor no longer receives any
messages).

Solution:
It is very hard to know what happened here without reading the code. The
Executor shows as active but cannot do anything, and we are left wondering
whether the driver is broken or whether the Executor has a problem. At the
very least, the Executor status should not show as active here, or the
Executor should call exitExecutor (kill itself).

 


> When the Executor plugin fails to initialize, the Executor shows active but 
> does not accept tasks forever, just like being hung
> ---
>
> Key: SPARK-40320
> URL: https://issues.apache.org/jira/browse/SPARK-40320
> Project: Spark
>  Issue Type: Bug
>  Components: Scheduler
>Affects Versions: 3.0.0
>Reporter: Mars
>Priority: Major
>
> *Reproduce steps:*
> Set `spark.plugins=ErrorSparkPlugin`.
> The `ErrorSparkPlugin` and `ErrorExecutorPlugin` classes (abbreviated for
> clarity):
> {code:java}
> class ErrorSparkPlugin extends SparkPlugin {
>   /**
>*/
>   override def driverPlugin(): DriverPlugin =  new ErrorDriverPlugin()
>   /**
>*/
>   override def executorPlugin(): ExecutorPlugin = new ErrorExecutorPlugin()
> }{code}
> {code:java}
> class ErrorExecutorPlugin extends ExecutorPlugin with Logging {
>   private val checkingInterval: Long = 1
>   override def init(_ctx: PluginContext, extraConf: util.Map[String, 
> String]): Unit = {
> if (checkingInterval == 1) {
>   throw new UnsatisfiedLinkError("LCL my Exception error2")
> }
>   }
> } {code}
> The Executor shows as active when we check the Spark UI; however, it is
> actually broken and never receives any task.
> *Root Cause:*
> Checking the code, I found that `org.apache.spark.rpc.netty.Inbox#safelyCall`
> rethrows fatal errors (`UnsatisfiedLinkError` counts as a fatal error here)
> from the method `dealWithFatalError`. Actually the
> `CoarseGrainedExecutorBackend` JVM process stays alive, but the communication
> thread stops working.

[jira] [Updated] (SPARK-40320) When the Executor plugin fails to initialize, the Executor shows active but does not accept tasks forever, just like being hung

2022-09-02 Thread Mars (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mars updated SPARK-40320:
-
Description: 
Reproduce steps:
Set `spark.plugins=ErrorSparkPlugin`.
The `ErrorSparkPlugin` and `ErrorExecutorPlugin` classes (abbreviated for
clarity):
{code:java}
class ErrorSparkPlugin extends SparkPlugin {
  /**
   */
  override def driverPlugin(): DriverPlugin =  new ErrorDriverPlugin()

  /**
   */
  override def executorPlugin(): ExecutorPlugin = new ErrorExecutorPlugin()
}{code}
{code:java}
class ErrorExecutorPlugin extends ExecutorPlugin with Logging {
  private val checkingInterval: Long = 1

  override def init(_ctx: PluginContext, extraConf: util.Map[String, String]): 
Unit = {
if (checkingInterval == 1) {
  throw new UnsatisfiedLinkError("LCL my Exception error2")
}
  }
} {code}
The Executor shows as active when we check the Spark UI; however, it is
actually broken and never receives any task.

Root Cause:

Checking the code, I found that `org.apache.spark.rpc.netty.Inbox#safelyCall`
rethrows fatal errors (`UnsatisfiedLinkError` counts as a fatal error here)
from the method `dealWithFatalError`. As a result, the
`CoarseGrainedExecutorBackend` JVM process stays alive, but the communication
thread is no longer working.

 

  was:
Reproduce steps:
Set `spark.plugins=ErrorSparkPlugin`.
The `ErrorSparkPlugin` and `ErrorExecutorPlugin` classes (abbreviated for
clarity):
{code:java}
class ErrorSparkPlugin extends SparkPlugin {
  /**
   */
  override def driverPlugin(): DriverPlugin =  new ErrorDriverPlugin()

  /**
   */
  override def executorPlugin(): ExecutorPlugin = new ErrorExecutorPlugin()
}{code}
{code:java}
class ErrorExecutorPlugin extends ExecutorPlugin with Logging {
  private val checkingInterval: Long = 1

  override def init(_ctx: PluginContext, extraConf: util.Map[String, String]): 
Unit = {
if (checkingInterval == 1) {
  throw new UnsatisfiedLinkError("LCL my Exception error2")
}
  }
} {code}
The Executor shows as active when we check the Spark UI; however, it is
actually broken and never receives any task.

Root Cause:

Checking the code, I found that `org.apache.spark.rpc.netty.Inbox#safelyCall`
rethrows fatal errors (`UnsatisfiedLinkError` counts as a fatal error here)
from the method `dealWithFatalError`. Actually the executor 

 


> When the Executor plugin fails to initialize, the Executor shows active but 
> does not accept tasks forever, just like being hung
> ---
>
> Key: SPARK-40320
> URL: https://issues.apache.org/jira/browse/SPARK-40320
> Project: Spark
>  Issue Type: Bug
>  Components: Scheduler
>Affects Versions: 3.0.0
>Reporter: Mars
>Priority: Major
>
> Reproduce steps:
> Set `spark.plugins=ErrorSparkPlugin`.
> The `ErrorSparkPlugin` and `ErrorExecutorPlugin` classes (abbreviated for
> clarity):
> {code:java}
> class ErrorSparkPlugin extends SparkPlugin {
>   /**
>*/
>   override def driverPlugin(): DriverPlugin =  new ErrorDriverPlugin()
>   /**
>*/
>   override def executorPlugin(): ExecutorPlugin = new ErrorExecutorPlugin()
> }{code}
> {code:java}
> class ErrorExecutorPlugin extends ExecutorPlugin with Logging {
>   private val checkingInterval: Long = 1
>   override def init(_ctx: PluginContext, extraConf: util.Map[String, 
> String]): Unit = {
> if (checkingInterval == 1) {
>   throw new UnsatisfiedLinkError("LCL my Exception error2")
> }
>   }
> } {code}
> The Executor shows as active when we check the Spark UI; however, it is
> actually broken and never receives any task.
> Root Cause:
> Checking the code, I found that `org.apache.spark.rpc.netty.Inbox#safelyCall`
> rethrows fatal errors (`UnsatisfiedLinkError` counts as a fatal error here)
> from the method `dealWithFatalError`. Actually the
> `CoarseGrainedExecutorBackend` JVM process stays alive, but the communication
> thread is no longer working.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-40320) When the Executor plugin fails to initialize, the Executor shows active but does not accept tasks forever, just like being hung

2022-09-02 Thread Mars (Jira)
Mars created SPARK-40320:


 Summary: When the Executor plugin fails to initialize, the 
Executor shows active but does not accept tasks forever, just like being hung
 Key: SPARK-40320
 URL: https://issues.apache.org/jira/browse/SPARK-40320
 Project: Spark
  Issue Type: Bug
  Components: Scheduler
Affects Versions: 3.0.0
Reporter: Mars









[jira] [Updated] (SPARK-40113) Reactor ParquetScanBuilder DataSourceV2 interface implementation

2022-08-16 Thread Mars (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mars updated SPARK-40113:
-
Summary: Reactor ParquetScanBuilder DataSourceV2 interface implementation  
(was: Unify ParquetScanBuilder DataSourceV2 interface implementation)

> Reactor ParquetScanBuilder DataSourceV2 interface implementation
> 
>
> Key: SPARK-40113
> URL: https://issues.apache.org/jira/browse/SPARK-40113
> Project: Spark
>  Issue Type: Improvement
>  Components: Optimizer
>Affects Versions: 3.3.0
>Reporter: Mars
>Priority: Minor
>
> Currently, the `FileScanBuilder` interface is not fully implemented in
> `ParquetScanBuilder` the way it is in `OrcScanBuilder`, `AvroScanBuilder`,
> and `CSVScanBuilder`.
> To unify the code logic and make it clearer, this change unifies that part of
> the implementation.






[jira] [Updated] (SPARK-40113) Unify ParquetScanBuilder DataSourceV2 interface implementation

2022-08-16 Thread Mars (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mars updated SPARK-40113:
-
Priority: Minor  (was: Major)

> Unify ParquetScanBuilder DataSourceV2 interface implementation
> --
>
> Key: SPARK-40113
> URL: https://issues.apache.org/jira/browse/SPARK-40113
> Project: Spark
>  Issue Type: Improvement
>  Components: Optimizer
>Affects Versions: 3.3.0
>Reporter: Mars
>Priority: Minor
>
> Currently, the `FileScanBuilder` interface is not fully implemented in
> `ParquetScanBuilder` the way it is in `OrcScanBuilder`, `AvroScanBuilder`,
> and `CSVScanBuilder`.
> To unify the code logic and make it clearer, this change unifies that part of
> the implementation.






[jira] [Updated] (SPARK-40113) Unify ParquetScanBuilder DataSourceV2 interface implementation

2022-08-16 Thread Mars (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mars updated SPARK-40113:
-
Summary: Unify ParquetScanBuilder DataSourceV2 interface implementation  
(was: Unified ParquetScanBuilder DataSourceV2 interface implementation)

> Unify ParquetScanBuilder DataSourceV2 interface implementation
> --
>
> Key: SPARK-40113
> URL: https://issues.apache.org/jira/browse/SPARK-40113
> Project: Spark
>  Issue Type: Improvement
>  Components: Optimizer
>Affects Versions: 3.3.0
>Reporter: Mars
>Priority: Major
>
> Currently, the `FileScanBuilder` interface is not fully implemented in
> `ParquetScanBuilder` the way it is in `OrcScanBuilder`, `AvroScanBuilder`,
> and `CSVScanBuilder`.
> To unify the code logic and make it clearer, this change unifies that part of
> the implementation.






[jira] [Created] (SPARK-40113) Unified ParquetScanBuilder DataSourceV2 interface implementation

2022-08-16 Thread Mars (Jira)
Mars created SPARK-40113:


 Summary: Unified ParquetScanBuilder DataSourceV2 interface 
implementation
 Key: SPARK-40113
 URL: https://issues.apache.org/jira/browse/SPARK-40113
 Project: Spark
  Issue Type: Improvement
  Components: Optimizer
Affects Versions: 3.3.0
Reporter: Mars


Currently, the `FileScanBuilder` interface is not fully implemented in
`ParquetScanBuilder` the way it is in `OrcScanBuilder`, `AvroScanBuilder`, and
`CSVScanBuilder`.
To unify the code logic and make it clearer, this change unifies that part of
the implementation.
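The kind of unification described can be sketched in the abstract (plain Java with made-up names; not Spark's actual class hierarchy): a shared base class owns the common plumbing once, and each format-specific builder overrides only its own hook instead of re-implementing the interface its own way.

```java
import java.util.Arrays;

// Hypothetical sketch: a shared base class implements the common pushdown
// plumbing, and each format-specific builder overrides only one hook.
abstract class FileScanBuilderSketch {
    protected String[] pushedFilters = new String[0];

    // Shared entry point: every format takes the same pushdown path.
    final void pushFilters(String[] filters) {
        this.pushedFilters = filters;
    }

    abstract String buildDescription();
}

class ParquetScanBuilderSketch extends FileScanBuilderSketch {
    @Override
    String buildDescription() {
        return "ParquetScan, pushed=" + Arrays.toString(pushedFilters);
    }
}

public class UnifySketch {
    public static void main(String[] args) {
        FileScanBuilderSketch builder = new ParquetScanBuilderSketch();
        builder.pushFilters(new String[] {"id > 1"});
        System.out.println(builder.buildDescription()); // prints "ParquetScan, pushed=[id > 1]"
    }
}
```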






[jira] [Commented] (SPARK-39909) Organize the check of push down information for JDBCV2Suite

2022-07-29 Thread Mars (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-39909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17573197#comment-17573197
 ] 

Mars commented on SPARK-39909:
--

[~huaxingao] This is me :D [~miracle], my Jira id is miracle.

> Organize the check of push down information for JDBCV2Suite
> ---
>
> Key: SPARK-39909
> URL: https://issues.apache.org/jira/browse/SPARK-39909
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: jiaan.geng
>Priority: Major
>
> Currently, JDBCV2Suite has many test cases whose checks of the push-down
> information do not look clean.
> For example,
> {code:java}
> checkPushedInfo(df,
>   "PushedFilters: [DEPT IS NOT NULL, DEPT > 1], PushedLimit: LIMIT 1, ")
> {code}
> Changing it to the following looks better:
> {code:java}
> checkPushedInfo(df,
>   "PushedFilters: [DEPT IS NOT NULL, DEPT > 1]",
>   "PushedLimit: LIMIT 1")
> {code}
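The proposed change can be sketched with a hypothetical varargs helper (plain Java; not the actual JDBCV2Suite helper): each expected fragment is asserted independently, so a failing test names the exact missing fragment instead of diffing one long concatenated string.

```java
// Hypothetical sketch of a varargs-style check: each pushed-down fragment is
// verified independently against the plan text.
public class PushedInfoCheck {
    static void checkPushedInfo(String explain, String... expectedFragments) {
        for (String fragment : expectedFragments) {
            if (!explain.contains(fragment)) {
                throw new AssertionError("missing pushed info: " + fragment);
            }
        }
    }

    public static void main(String[] args) {
        // 'explain' stands in for the physical-plan text a query would produce.
        String explain =
            "PushedFilters: [DEPT IS NOT NULL, DEPT > 1], PushedLimit: LIMIT 1,";
        checkPushedInfo(explain,
            "PushedFilters: [DEPT IS NOT NULL, DEPT > 1]",
            "PushedLimit: LIMIT 1");
        System.out.println("all fragments found"); // prints "all fragments found"
    }
}
```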






[jira] [Commented] (SPARK-39909) Organize the check of push down information for JDBCV2Suite

2022-07-28 Thread Mars (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-39909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17572362#comment-17572362
 ] 

Mars commented on SPARK-39909:
--


assign [~miracle]

> Organize the check of push down information for JDBCV2Suite
> ---
>
> Key: SPARK-39909
> URL: https://issues.apache.org/jira/browse/SPARK-39909
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: jiaan.geng
>Priority: Major
>
> Currently, JDBCV2Suite has many test cases whose checks of the push-down
> information do not look clean.
> For example,
> {code:java}
> checkPushedInfo(df,
>   "PushedFilters: [DEPT IS NOT NULL, DEPT > 1], PushedLimit: LIMIT 1, ")
> {code}
> Changing it to the following looks better:
> {code:java}
> checkPushedInfo(df,
>   "PushedFilters: [DEPT IS NOT NULL, DEPT > 1]",
>   "PushedLimit: LIMIT 1")
> {code}


