[jira] [Updated] (SPARK-48396) Support configuring limit control for SQL to use maximum cores
[ https://issues.apache.org/jira/browse/SPARK-48396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mars updated SPARK-48396: - Description: In a long-running shared Spark SQL cluster, a single large SQL query can occupy all the cores of the cluster and affect the execution of other queries. It would therefore be useful to have a configuration that limits the maximum number of cores a single query may use. > Support configuring limit control for SQL to use maximum cores > -- > > Key: SPARK-48396 > URL: https://issues.apache.org/jira/browse/SPARK-48396 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.5.1 >Reporter: Mars >Priority: Major > > In a long-running shared Spark SQL cluster, a single large SQL query can > occupy all the cores of the cluster and affect the execution of other > queries. It would therefore be useful to have a configuration that limits > the maximum number of cores a single query may use. > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
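The behavior this ticket asks for — capping the cores a single query can hold concurrently — can be sketched with a semaphore-style limiter. This is a conceptual illustration only, not an existing Spark API or config; the `CoreLimiter` class and the cap value are hypothetical.

```python
import threading

class CoreLimiter:
    """Hypothetical per-query cap on concurrently used cores (not a Spark API)."""
    def __init__(self, max_cores):
        self._sem = threading.BoundedSemaphore(max_cores)  # at most max_cores slots
        self._lock = threading.Lock()
        self.active = 0
        self.peak = 0

    def run_task(self, task):
        with self._sem:                 # blocks once max_cores tasks are running
            with self._lock:
                self.active += 1
                self.peak = max(self.peak, self.active)
            try:
                return task()
            finally:
                with self._lock:
                    self.active -= 1

# 16 tasks from one "query" compete, but at most 4 ever run at once.
limiter = CoreLimiter(max_cores=4)
threads = [threading.Thread(target=limiter.run_task, args=(lambda: None,))
           for _ in range(16)]
for t in threads:
    t.start()
for t in threads:
    t.join()
assert 1 <= limiter.peak <= 4
```

In a real scheduler the cap would be enforced when offering task slots to a job, but the invariant is the same: the query's peak concurrency never exceeds the configured limit.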
[jira] [Created] (SPARK-48396) Support configuring limit control for SQL to use maximum cores
Mars created SPARK-48396: Summary: Support configuring limit control for SQL to use maximum cores Key: SPARK-48396 URL: https://issues.apache.org/jira/browse/SPARK-48396 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.5.1 Reporter: Mars
[jira] [Updated] (SPARK-46710) Clean up the broadcast data generated when sql execution ends
[ https://issues.apache.org/jira/browse/SPARK-46710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mars updated SPARK-46710: - Description: Broadcast data is currently cleaned up only when GC is triggered, which can waste a large amount of memory and can also cause query instability if a single GC pause takes too long. Instead, we can clean up the broadcast data generated during a SQL execution as soon as the execution ends, reducing memory usage on the driver and executors. was: Broadcast data cleaning can only rely on cleaning when GC is triggered, which may lead to a lot of waste of memory usage , and may also cause query instability if a single GC takes too long. After the execution of sql is completed, the broadcast data generated during the execution of the sql can be cleaned to reduce memory on driver or executor. > Clean up the broadcast data generated when sql execution ends > - > > Key: SPARK-46710 > URL: https://issues.apache.org/jira/browse/SPARK-46710 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Mars >Priority: Major > Labels: pull-request-available > > Broadcast data is currently cleaned up only when GC is triggered, which can > waste a large amount of memory and can also cause query instability if a > single GC pause takes too long. > Instead, we can clean up the broadcast data generated during a SQL execution > as soon as the execution ends, reducing memory usage on the driver and > executors.
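The idea in this ticket — releasing broadcast data eagerly at end of execution instead of waiting for a GC-driven cleanup pass — can be sketched with a small scope object. The `Broadcast` and `ExecutionScope` classes here are simplified stand-ins, not Spark's actual `Broadcast`/`ContextCleaner` classes.

```python
class Broadcast:
    """Stand-in for a broadcast variable holding memory on driver/executors."""
    def __init__(self, data):
        self.data = data
        self.destroyed = False

    def destroy(self):
        self.data = None
        self.destroyed = True

class ExecutionScope:
    """Tracks broadcasts created during one SQL execution and frees them eagerly."""
    def __init__(self):
        self.broadcasts = []

    def broadcast(self, data):
        b = Broadcast(data)
        self.broadcasts.append(b)
        return b

    def __enter__(self):
        return self

    def __exit__(self, *exc):
        # Eager cleanup when the execution ends — no GC cycle required.
        for b in self.broadcasts:
            b.destroy()
        return False

with ExecutionScope() as scope:
    small_table = scope.broadcast({"k": 1})
    assert not small_table.destroyed   # usable while the query runs
assert small_table.destroyed           # released as soon as the query ends
```

The benefit is deterministic memory release: the broadcast's lifetime is bounded by the execution, not by when the JVM happens to run a GC that collects the weak reference.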
[jira] [Updated] (SPARK-46710) Clean up the broadcast data generated when sql execution ends
[ https://issues.apache.org/jira/browse/SPARK-46710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mars updated SPARK-46710: - Description: Broadcast data is currently cleaned up only when GC is triggered, which can waste a large amount of memory and can also cause query instability if a single GC pause takes too long. After a SQL execution completes, the broadcast data generated during that execution can be cleaned up to reduce memory usage on the driver and executors. was:Faster cleaning of broadcast data generated by sql is beneficial to saving driver/executor memory and avoiding long-term GC. This can make a long running spark service more stable > Clean up the broadcast data generated when sql execution ends > - > > Key: SPARK-46710 > URL: https://issues.apache.org/jira/browse/SPARK-46710 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Mars >Priority: Major > Labels: pull-request-available > > Broadcast data is currently cleaned up only when GC is triggered, which can > waste a large amount of memory and can also cause query instability if a > single GC pause takes too long. > After a SQL execution completes, the broadcast data generated during that > execution can be cleaned up to reduce memory usage on the driver and > executors.
[jira] [Updated] (SPARK-46710) Clean up the broadcast data generated when sql execution ends
[ https://issues.apache.org/jira/browse/SPARK-46710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mars updated SPARK-46710: - Summary: Clean up the broadcast data generated when sql execution ends (was: Clean up the broadcast data generated by SQL faster) > Clean up the broadcast data generated when sql execution ends > - > > Key: SPARK-46710 > URL: https://issues.apache.org/jira/browse/SPARK-46710 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Mars >Priority: Major > > Cleaning up the broadcast data generated by SQL sooner helps save > driver/executor memory and avoid long GC pauses. This makes a long-running > Spark service more stable.
[jira] [Updated] (SPARK-42388) Avoid unnecessary parquet footer reads when no filters in vectorized reader
[ https://issues.apache.org/jira/browse/SPARK-42388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mars updated SPARK-42388: - Description: The Parquet footer is currently read twice in the vectorized Parquet reader even when there are no filters requiring pushdown. When the NameNode is under heavy load, the extra read costs time. We can avoid this unnecessary second footer read by reusing the footer metadata in {{VectorizedParquetRecordReader}}. > Avoid unnecessary parquet footer reads when no filters in vectorized reader > --- > > Key: SPARK-42388 > URL: https://issues.apache.org/jira/browse/SPARK-42388 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.0 >Reporter: Mars >Priority: Major > > The Parquet footer is currently read twice in the vectorized Parquet reader > even when there are no filters requiring pushdown. > When the NameNode is under heavy load, the extra read costs time. > We can avoid this unnecessary second footer read by reusing the footer > metadata in {{VectorizedParquetRecordReader}}.
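The optimization above can be sketched as "read the footer once at planning time and pass the parsed metadata down to the record reader". This is a simplified, hypothetical model — `read_footer` and `VectorizedReader` below are stand-ins, not the real Spark/Parquet classes — but it shows why passing the footer through eliminates one NameNode round trip per file.

```python
# Counter standing in for NameNode/file-system load.
footer_reads = {"count": 0}

def read_footer(path):
    """Stand-in for parsing a Parquet file's footer (one remote read)."""
    footer_reads["count"] += 1
    return {"path": path, "row_groups": 3}   # fake metadata

class VectorizedReader:
    """Stand-in reader: reuses a caller-supplied footer, re-reads only as fallback."""
    def __init__(self, path, footer=None):
        self.footer = footer if footer is not None else read_footer(path)

# Before the change: planning reads the footer, then the reader reads it again.
# After the change: the planner's footer is handed down, so the file is touched once.
footer = read_footer("part-0.parquet")                 # planning-time read
reader = VectorizedReader("part-0.parquet", footer=footer)
assert footer_reads["count"] == 1                      # no second footer read
```

When filters do require pushdown the footer is needed at planning time anyway, so the saving applies exactly to the no-filter case this ticket describes.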
[jira] [Updated] (SPARK-42388) Avoid unnecessary parquet footer reads when no filters in vectorized reader
[ https://issues.apache.org/jira/browse/SPARK-42388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mars updated SPARK-42388: - Summary: Avoid unnecessary parquet footer reads when no filters in vectorized reader (was: Avoid unnecessary parquet footer reads when no filters in vectorized parquet reader) > Avoid unnecessary parquet footer reads when no filters in vectorized reader > --- > > Key: SPARK-42388 > URL: https://issues.apache.org/jira/browse/SPARK-42388 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.0 >Reporter: Mars >Priority: Major >
[jira] [Updated] (SPARK-42388) Avoid unnecessary parquet footer reads when no filters in vectorized parquet reader
[ https://issues.apache.org/jira/browse/SPARK-42388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mars updated SPARK-42388: - Summary: Avoid unnecessary parquet footer reads when no filters in vectorized parquet reader (was: Avoid unnecessary parquet footer reads when no filters) > Avoid unnecessary parquet footer reads when no filters in vectorized parquet > reader > --- > > Key: SPARK-42388 > URL: https://issues.apache.org/jira/browse/SPARK-42388 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.0 >Reporter: Mars >Priority: Major >
[jira] [Created] (SPARK-42388) Avoid unnecessary parquet footer reads when no filters
Mars created SPARK-42388: Summary: Avoid unnecessary parquet footer reads when no filters Key: SPARK-42388 URL: https://issues.apache.org/jira/browse/SPARK-42388 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.4.0 Reporter: Mars
[jira] [Created] (SPARK-42387) Avoid unnecessary parquet footer reads when no filters
Mars created SPARK-42387: Summary: Avoid unnecessary parquet footer reads when no filters Key: SPARK-42387 URL: https://issues.apache.org/jira/browse/SPARK-42387 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.4.0 Reporter: Mars
[jira] [Updated] (SPARK-38005) Support cleaning up merged shuffle files and state from external shuffle service
[ https://issues.apache.org/jira/browse/SPARK-38005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mars updated SPARK-38005: - Fix Version/s: 3.4.0 > Support cleaning up merged shuffle files and state from external shuffle > service > > > Key: SPARK-38005 > URL: https://issues.apache.org/jira/browse/SPARK-38005 > Project: Spark > Issue Type: Sub-task > Components: Shuffle, Spark Core >Affects Versions: 3.2.0 >Reporter: Chandni Singh >Priority: Major > Fix For: 3.4.0 > > > Currently merged shuffle files and state are not cleaned up until an > application ends. SPARK-37618 handles the cleanup of regular shuffle files. > This jira will address cleaning up of merged shuffle files/state.
[jira] [Resolved] (SPARK-38005) Support cleaning up merged shuffle files and state from external shuffle service
[ https://issues.apache.org/jira/browse/SPARK-38005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mars resolved SPARK-38005. -- Resolution: Fixed > Support cleaning up merged shuffle files and state from external shuffle > service > > > Key: SPARK-38005 > URL: https://issues.apache.org/jira/browse/SPARK-38005 > Project: Spark > Issue Type: Sub-task > Components: Shuffle, Spark Core >Affects Versions: 3.2.0 >Reporter: Chandni Singh >Priority: Major > > Currently merged shuffle files and state are not cleaned up until an > application ends. SPARK-37618 handles the cleanup of regular shuffle files. > This jira will address cleaning up of merged shuffle files/state.
[jira] (SPARK-41470) SPJ: Spark shouldn't assume InternalRow implements equals and hashCode
[ https://issues.apache.org/jira/browse/SPARK-41470 ] Mars deleted comment on SPARK-41470: -- was (Author: JIRAUSER290821): [~csun] I want to fix it ~ > SPJ: Spark shouldn't assume InternalRow implements equals and hashCode > -- > > Key: SPARK-41470 > URL: https://issues.apache.org/jira/browse/SPARK-41470 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.1 >Reporter: Chao Sun >Priority: Major > > Currently SPJ (Storage-Partitioned Join) actually assumes the {{InternalRow}} > returned by {{HasPartitionKey}} implements {{equals}} and {{{}hashCode{}}}. > We should remove this restriction.
[jira] [Comment Edited] (SPARK-41470) SPJ: Spark shouldn't assume InternalRow implements equals and hashCode
[ https://issues.apache.org/jira/browse/SPARK-41470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17679059#comment-17679059 ] Mars edited comment on SPARK-41470 at 1/20/23 8:43 AM: --- [~csun] I want to fix it ~ was (Author: JIRAUSER290821): [~csun] I want to take it ~ > SPJ: Spark shouldn't assume InternalRow implements equals and hashCode > -- > > Key: SPARK-41470 > URL: https://issues.apache.org/jira/browse/SPARK-41470 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.1 >Reporter: Chao Sun >Priority: Major > > Currently SPJ (Storage-Partitioned Join) actually assumes the {{InternalRow}} > returned by {{HasPartitionKey}} implements {{equals}} and {{{}hashCode{}}}. > We should remove this restriction.
[jira] [Commented] (SPARK-41470) SPJ: Spark shouldn't assume InternalRow implements equals and hashCode
[ https://issues.apache.org/jira/browse/SPARK-41470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17679059#comment-17679059 ] Mars commented on SPARK-41470: -- [~csun] I want to take it ~ > SPJ: Spark shouldn't assume InternalRow implements equals and hashCode > -- > > Key: SPARK-41470 > URL: https://issues.apache.org/jira/browse/SPARK-41470 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.1 >Reporter: Chao Sun >Priority: Major > > Currently SPJ (Storage-Partitioned Join) actually assumes the {{InternalRow}} > returned by {{HasPartitionKey}} implements {{equals}} and {{{}hashCode{}}}. > We should remove this restriction.
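The restriction described in SPARK-41470 can be illustrated in miniature: grouping rows by an object whose equality is identity-based silently fails, while deriving an explicit value-based key works regardless of how the row class is implemented. The `Row` class and `partition_key` helper below are hypothetical stand-ins for `InternalRow`/`HasPartitionKey`, not Spark code.

```python
class Row:
    """Stand-in for InternalRow; deliberately does NOT define value-based
    __eq__/__hash__, mirroring the assumption the ticket wants to drop."""
    def __init__(self, *values):
        self.values = values

def partition_key(row):
    # Explicit value-based key: equality/hashing no longer depend on the row class.
    return tuple(row.values)

a, b = Row(1, "x"), Row(1, "x")
assert a != b                               # identity equality: unusable for grouping
assert partition_key(a) == partition_key(b) # value equality via the derived key

groups = {}
for r in (a, b):
    groups.setdefault(partition_key(r), []).append(r)
assert len(groups) == 1                     # both rows land in the same group
```

The fix direction suggested by the ticket is analogous: compare partition values via an explicit ordering/projection rather than trusting the row object's own `equals`/`hashCode`.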
[jira] (SPARK-41471) SPJ: Reduce Spark shuffle when only one side of a join is KeyGroupedPartitioning
[ https://issues.apache.org/jira/browse/SPARK-41471 ] Mars deleted comment on SPARK-41471: -- was (Author: JIRAUSER290821): [~csun] Hi, I want to take it :) > SPJ: Reduce Spark shuffle when only one side of a join is > KeyGroupedPartitioning > > > Key: SPARK-41471 > URL: https://issues.apache.org/jira/browse/SPARK-41471 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.1 >Reporter: Chao Sun >Priority: Major > > When only one side of a SPJ (Storage-Partitioned Join) is > {{{}KeyGroupedPartitioning{}}}, Spark currently needs to shuffle both sides > using {{{}HashPartitioning{}}}. However, we may just need to shuffle the > other side according to the partition transforms defined in > {{{}KeyGroupedPartitioning{}}}. This is especially useful when the other side > is relatively small.
[jira] [Commented] (SPARK-41471) SPJ: Reduce Spark shuffle when only one side of a join is KeyGroupedPartitioning
[ https://issues.apache.org/jira/browse/SPARK-41471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17652484#comment-17652484 ] Mars commented on SPARK-41471: -- [~csun] Hi, I want to take it :) > SPJ: Reduce Spark shuffle when only one side of a join is > KeyGroupedPartitioning > > > Key: SPARK-41471 > URL: https://issues.apache.org/jira/browse/SPARK-41471 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.1 >Reporter: Chao Sun >Priority: Major > > When only one side of a SPJ (Storage-Partitioned Join) is > {{{}KeyGroupedPartitioning{}}}, Spark currently needs to shuffle both sides > using {{{}HashPartitioning{}}}. However, we may just need to shuffle the > other side according to the partition transforms defined in > {{{}KeyGroupedPartitioning{}}}. This is especially useful when the other side > is relatively small.
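The shuffle-reduction idea in SPARK-41471 can be sketched outside Spark: when one side of a join is already grouped by key (e.g. by storage layout), only the other side needs to be redistributed, routed to the partition that already holds the matching keys. The data layout below is a toy stand-in, not `KeyGroupedPartitioning` itself.

```python
# Left side: already grouped by key, as a storage-partitioned table would be.
left_partitions = [
    [("a", 1), ("a", 2)],   # partition 0 holds key "a"
    [("b", 3)],             # partition 1 holds key "b"
]

# Derive the routing table from the left side's existing layout,
# instead of hash-partitioning both sides from scratch.
key_to_part = {row[0]: i
               for i, part in enumerate(left_partitions)
               for row in part}

# Right side: the only data that gets shuffled.
right_rows = [("b", "y"), ("a", "x")]
right_partitions = [[] for _ in left_partitions]
for row in right_rows:
    right_partitions[key_to_part[row[0]]].append(row)

# Each right partition now co-locates with the matching left partition,
# so the join can proceed partition-by-partition with no left-side shuffle.
assert right_partitions == [[("a", "x")], [("b", "y")]]
```

This is why the optimization pays off most when the non-grouped side is small: the shuffled data volume is only that side's rows.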
[jira] [Updated] (SPARK-41365) Stages UI page fails to load for proxy in some yarn versions
[ https://issues.apache.org/jira/browse/SPARK-41365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mars updated SPARK-41365: - Description: In my environment (CDH 5.8), clicking through to the Spark UI from the YARN interface and visiting the stage URI fails to load. The URI is {code:java} http://:8088/proxy/application_1669877165233_0021/stages/stage/?id=0&attempt=0 {code} !image-2022-12-02-17-53-03-003.png|width=430,height=697! Server error stack trace: {code:java} Caused by: java.lang.NullPointerException at org.apache.spark.status.api.v1.StagesResource.$anonfun$doPagination$1(StagesResource.scala:207) at org.apache.spark.status.api.v1.BaseAppResource.$anonfun$withUI$1(ApiRootResource.scala:142) at org.apache.spark.ui.SparkUI.withSparkUI(SparkUI.scala:147) at org.apache.spark.status.api.v1.BaseAppResource.withUI(ApiRootResource.scala:137) at org.apache.spark.status.api.v1.BaseAppResource.withUI$(ApiRootResource.scala:135) at org.apache.spark.status.api.v1.StagesResource.withUI(StagesResource.scala:31) at org.apache.spark.status.api.v1.StagesResource.doPagination(StagesResource.scala:206) at org.apache.spark.status.api.v1.StagesResource.$anonfun$taskTable$1(StagesResource.scala:161) at org.apache.spark.status.api.v1.BaseAppResource.$anonfun$withUI$1(ApiRootResource.scala:142) at org.apache.spark.ui.SparkUI.withSparkUI(SparkUI.scala:147) at org.apache.spark.status.api.v1.BaseAppResource.withUI(ApiRootResource.scala:137) at org.apache.spark.status.api.v1.BaseAppResource.withUI$(ApiRootResource.scala:135) at org.apache.spark.status.api.v1.StagesResource.withUI(StagesResource.scala:31) at org.apache.spark.status.api.v1.StagesResource.taskTable(StagesResource.scala:145) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62){code} This issue is similar to the two issues below; the final symptom is the same, because the parameter is encoded twice:
https://issues.apache.org/jira/browse/SPARK-32467 https://issues.apache.org/jira/browse/SPARK-33611 These two issues cover two scenarios that avoid double encoding: 1. an HTTPS redirect proxy 2. reverse proxy enabled (spark.ui.reverseProxy) in Nginx. However, if the parameter is encoded twice for another reason, such as in this issue (the YARN proxy), the page also fails to load. was: My environment CDH 5.8 , click to enter the spark UI from the yarn interface when visit the stage URI, it fails to load, URI is {code:java} http://:8088/proxy/application_1669877165233_0021/stages/stage/?id=0&attempt=0 {code} !image-2022-12-02-17-53-03-003.png|width=430,height=697! Server error stack trace: {code} Caused by: java.lang.NullPointerException at org.apache.spark.status.api.v1.StagesResource.$anonfun$doPagination$1(StagesResource.scala:207) at org.apache.spark.status.api.v1.BaseAppResource.$anonfun$withUI$1(ApiRootResource.scala:142) at org.apache.spark.ui.SparkUI.withSparkUI(SparkUI.scala:147) at org.apache.spark.status.api.v1.BaseAppResource.withUI(ApiRootResource.scala:137) at org.apache.spark.status.api.v1.BaseAppResource.withUI$(ApiRootResource.scala:135) at org.apache.spark.status.api.v1.StagesResource.withUI(StagesResource.scala:31) at org.apache.spark.status.api.v1.StagesResource.doPagination(StagesResource.scala:206) at org.apache.spark.status.api.v1.StagesResource.$anonfun$taskTable$1(StagesResource.scala:161) at org.apache.spark.status.api.v1.BaseAppResource.$anonfun$withUI$1(ApiRootResource.scala:142) at org.apache.spark.ui.SparkUI.withSparkUI(SparkUI.scala:147) at org.apache.spark.status.api.v1.BaseAppResource.withUI(ApiRootResource.scala:137) at org.apache.spark.status.api.v1.BaseAppResource.withUI$(ApiRootResource.scala:135) at org.apache.spark.status.api.v1.StagesResource.withUI(StagesResource.scala:31) at org.apache.spark.status.api.v1.StagesResource.taskTable(StagesResource.scala:145) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62){code} > Stages UI page fails to load for proxy in some yarn versions > - > > Key: SPARK-41365 > URL: https://issues.apache.org/jira/browse/SPARK-41365 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 3.3.1 > Environment: as above >Reporter: Mars >Priority: Major > Attachments: image-2022-12-02-17-53-03-003.png > > > In my environment (CDH 5.8), clicking through to the Spark UI from the YARN > interface and visiting the stage URI fails to load. The URI is > {code:java} > http://:808
[jira] [Updated] (SPARK-41365) Stages UI page fails to load for proxy in some yarn versions
[ https://issues.apache.org/jira/browse/SPARK-41365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mars updated SPARK-41365: - Description: In my environment (CDH 5.8), clicking through to the Spark UI from the YARN interface and visiting the stage URI fails to load. The URI is {code:java} http://:8088/proxy/application_1669877165233_0021/stages/stage/?id=0&attempt=0 {code} !image-2022-12-02-17-53-03-003.png|width=430,height=697! Server error stack trace: {code} Caused by: java.lang.NullPointerException at org.apache.spark.status.api.v1.StagesResource.$anonfun$doPagination$1(StagesResource.scala:207) at org.apache.spark.status.api.v1.BaseAppResource.$anonfun$withUI$1(ApiRootResource.scala:142) at org.apache.spark.ui.SparkUI.withSparkUI(SparkUI.scala:147) at org.apache.spark.status.api.v1.BaseAppResource.withUI(ApiRootResource.scala:137) at org.apache.spark.status.api.v1.BaseAppResource.withUI$(ApiRootResource.scala:135) at org.apache.spark.status.api.v1.StagesResource.withUI(StagesResource.scala:31) at org.apache.spark.status.api.v1.StagesResource.doPagination(StagesResource.scala:206) at org.apache.spark.status.api.v1.StagesResource.$anonfun$taskTable$1(StagesResource.scala:161) at org.apache.spark.status.api.v1.BaseAppResource.$anonfun$withUI$1(ApiRootResource.scala:142) at org.apache.spark.ui.SparkUI.withSparkUI(SparkUI.scala:147) at org.apache.spark.status.api.v1.BaseAppResource.withUI(ApiRootResource.scala:137) at org.apache.spark.status.api.v1.BaseAppResource.withUI$(ApiRootResource.scala:135) at org.apache.spark.status.api.v1.StagesResource.withUI(StagesResource.scala:31) at org.apache.spark.status.api.v1.StagesResource.taskTable(StagesResource.scala:145) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62){code} was:as above > Stages UI page fails to load for proxy in some yarn versions > - > > Key: SPARK-41365 > URL: 
https://issues.apache.org/jira/browse/SPARK-41365 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 3.3.1 > Environment: as above >Reporter: Mars >Priority: Major > Attachments: image-2022-12-02-17-53-03-003.png > > > In my environment (CDH 5.8), clicking through to the Spark UI from the YARN > interface and visiting the stage URI fails to load. The URI is > {code:java} > http://:8088/proxy/application_1669877165233_0021/stages/stage/?id=0&attempt=0 > {code} > !image-2022-12-02-17-53-03-003.png|width=430,height=697! > Server error stack trace: > {code} > Caused by: java.lang.NullPointerException > at > org.apache.spark.status.api.v1.StagesResource.$anonfun$doPagination$1(StagesResource.scala:207) > at > org.apache.spark.status.api.v1.BaseAppResource.$anonfun$withUI$1(ApiRootResource.scala:142) > at org.apache.spark.ui.SparkUI.withSparkUI(SparkUI.scala:147) > at > org.apache.spark.status.api.v1.BaseAppResource.withUI(ApiRootResource.scala:137) > at > org.apache.spark.status.api.v1.BaseAppResource.withUI$(ApiRootResource.scala:135) > at > org.apache.spark.status.api.v1.StagesResource.withUI(StagesResource.scala:31) > at > org.apache.spark.status.api.v1.StagesResource.doPagination(StagesResource.scala:206) > at > org.apache.spark.status.api.v1.StagesResource.$anonfun$taskTable$1(StagesResource.scala:161) > at > org.apache.spark.status.api.v1.BaseAppResource.$anonfun$withUI$1(ApiRootResource.scala:142) > at org.apache.spark.ui.SparkUI.withSparkUI(SparkUI.scala:147) > at > org.apache.spark.status.api.v1.BaseAppResource.withUI(ApiRootResource.scala:137) > at > org.apache.spark.status.api.v1.BaseAppResource.withUI$(ApiRootResource.scala:135) > at > org.apache.spark.status.api.v1.StagesResource.withUI(StagesResource.scala:31) > at > org.apache.spark.status.api.v1.StagesResource.taskTable(StagesResource.scala:145) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62){code}
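The double-encoding failure mode described in SPARK-41365 (and in SPARK-32467 / SPARK-33611) can be reproduced directly with the standard URL-encoding functions: if a proxy percent-encodes a query string that the browser already encoded, one decode on the server side still yields an encoded string, so the parameter lookup finds nothing and the handler hits the NullPointerException above.

```python
from urllib.parse import quote, unquote

original = "?id=0&attempt=0"        # the stage-page query string
once = quote(original, safe="")     # what the browser sends, percent-encoded
twice = quote(once, safe="")        # what a misbehaving proxy forwards

# A single server-side decode does not recover the original parameters:
assert unquote(twice) == once
assert unquote(twice) != original

# Only decoding twice recovers them, which is what the linked fixes account for:
assert unquote(unquote(twice)) == original
```

This is why the earlier fixes were scenario-specific: each one compensates for a particular component that adds the extra encoding pass, and a new proxy (here, the YARN proxy) reintroduces the problem.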
[jira] [Updated] (SPARK-41365) Stages UI page fails to load for proxy in some yarn versions
[ https://issues.apache.org/jira/browse/SPARK-41365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mars updated SPARK-41365: - Attachment: image-2022-12-02-17-53-03-003.png > Stages UI page fails to load for proxy in some yarn versions > - > > Key: SPARK-41365 > URL: https://issues.apache.org/jira/browse/SPARK-41365 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 3.3.1 > Environment: as above >Reporter: Mars >Priority: Major > Attachments: image-2022-12-02-17-53-03-003.png > > > as above
[jira] [Created] (SPARK-41365) Stages UI page fails to load for proxy in some yarn versions
Mars created SPARK-41365: Summary: Stages UI page fails to load for proxy in some yarn versions Key: SPARK-41365 URL: https://issues.apache.org/jira/browse/SPARK-41365 Project: Spark Issue Type: Bug Components: Web UI Affects Versions: 3.3.1 Environment: as above Reporter: Mars as above
[jira] (SPARK-37313) Child stage using merged output or not should be based on the availability of merged output from parent stage
[ https://issues.apache.org/jira/browse/SPARK-37313 ] Mars deleted comment on SPARK-37313: -- was (Author: JIRAUSER290821): as comment said [https://github.com/apache/spark/pull/34461#issuecomment-964557253] I'm working on this Issue and trying to implement this functionality [~minyang] [~mridul] > Child stage using merged output or not should be based on the availability of > merged output from parent stage > - > > Key: SPARK-37313 > URL: https://issues.apache.org/jira/browse/SPARK-37313 > Project: Spark > Issue Type: Sub-task > Components: Shuffle, Spark Core >Affects Versions: 3.2.1 >Reporter: Minchu Yang >Priority: Minor > > As discussed in the > [thread|https://github.com/apache/spark/pull/34461#pullrequestreview-799701494] > in SPARK-37023, during a stage retry, if the parent stage has already > generated merged output in the previous attempt, with the current behavior, > the child stage would not be able to fetch the merged output, as this is > controlled by dependency.shuffleMergeEnabled (see current implementation > [here|https://github.com/apache/spark/blob/31b6f614d3173c8a5852243bf7d0b6200788432d/core/src/main/scala/org/apache/spark/shuffle/sort/SortShuffleManager.scala#L134-L136]) > during the stage retry. > Instead of using a single variable to control behavior at both the mapper > side (push side) and the reducer side (using merged output), whether a child > stage uses merged output or not must be based only on whether merged output > is available for it to use (as discussed > [here|https://github.com/apache/spark/pull/34461#issuecomment-964557253]).
[jira] [Commented] (SPARK-38093) Set shuffleMergeAllowed to false for a determinate stage after the stage is finalized
[ https://issues.apache.org/jira/browse/SPARK-38093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17635786#comment-17635786 ] Mars commented on SPARK-38093: -- comment https://github.com/apache/spark/pull/34122#discussion_r796929787 > Set shuffleMergeAllowed to false for a determinate stage after the stage is > finalized > - > > Key: SPARK-38093 > URL: https://issues.apache.org/jira/browse/SPARK-38093 > Project: Spark > Issue Type: Sub-task > Components: Shuffle >Affects Versions: 3.2.1 >Reporter: Venkata krishnan Sowrirajan >Priority: Major > > Currently we are setting shuffleMergeAllowed to false before > prepareShuffleServicesForShuffleMapStage if the shuffle dependency is already > finalized. Ideally it is better to do it right after shuffle dependency > finalization for a determinate stage. cc [~mridulm80]
[jira] [Commented] (SPARK-37313) Child stage using merged output or not should be based on the availability of merged output from parent stage
[ https://issues.apache.org/jira/browse/SPARK-37313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17635314#comment-17635314 ] Mars commented on SPARK-37313: -- as comment said [https://github.com/apache/spark/pull/34461#issuecomment-964557253] I'm working on this Issue and trying to implement this functionality [~minyang] [~mridul] > Child stage using merged output or not should be based on the availability of > merged output from parent stage > - > > Key: SPARK-37313 > URL: https://issues.apache.org/jira/browse/SPARK-37313 > Project: Spark > Issue Type: Sub-task > Components: Shuffle, Spark Core >Affects Versions: 3.2.1 >Reporter: Minchu Yang >Priority: Minor > > As discussed in the > [thread|https://github.com/apache/spark/pull/34461#pullrequestreview-799701494] > in SPARK-37023, during a stage retry, if the parent stage has already > generated merged output in the previous attempt, with the current behavior, > the child stage would not be able to fetch the merged output, as this is > controlled by dependency.shuffleMergeEnabled (see current implementation > [here|https://github.com/apache/spark/blob/31b6f614d3173c8a5852243bf7d0b6200788432d/core/src/main/scala/org/apache/spark/shuffle/sort/SortShuffleManager.scala#L134-L136]) > during the stage retry. > Instead of using a single variable to control behavior at both the mapper > side (push side) and the reducer side (using merged output), whether a child > stage uses merged output or not must be based only on whether merged output > is available for it to use (as discussed > [here|https://github.com/apache/spark/pull/34461#issuecomment-964557253]).
[jira] [Commented] (SPARK-38005) Support cleaning up merged shuffle files and state from external shuffle service
[ https://issues.apache.org/jira/browse/SPARK-38005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17625006#comment-17625006 ] Mars commented on SPARK-38005: -- [~mridulm80] [~csingh] Hi~ I'd like to take this issue. I have read up on the background and plan to start working on it now.

> Support cleaning up merged shuffle files and state from external shuffle
> service
>
> Key: SPARK-38005
> URL: https://issues.apache.org/jira/browse/SPARK-38005
> Project: Spark
> Issue Type: Sub-task
> Components: Shuffle, Spark Core
> Affects Versions: 3.2.0
> Reporter: Chandni Singh
> Priority: Major
>
> Currently, merged shuffle files and state are not cleaned up until an
> application ends. SPARK-37618 handles the cleanup of regular shuffle files.
> This jira will address cleaning up of merged shuffle files/state.
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-40320) When the Executor plugin fails to initialize, the Executor shows active but does not accept tasks forever, just like being hung
[ https://issues.apache.org/jira/browse/SPARK-40320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17601543#comment-17601543 ] Mars edited comment on SPARK-40320 at 9/7/22 10:50 PM: --- [~Ngone51] Shouldn't it bring up a new `receiveLoop()` to serve RPC messages? Yes, my previous thinking was wrong. I remote-debugged the Executor and found that it did catch the fatal error in [https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/rpc/netty/MessageLoop.scala#L82-L89]. It resubmits receiveLoop, and the second time it blocks at [https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/rpc/netty/MessageLoop.scala#L69]. This Executor did not initialize successfully the first time, so it didn't send LaunchedExecutor to the Driver (see [https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/executor/CoarseGrainedExecutorBackend.scala#L172]), so the Executor can't launch tasks; related PR: [https://github.com/apache/spark/pull/25964]. Why doesn't SparkUncaughtExceptionHandler catch the fatal error? See [https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/executor/Executor.scala#L284]: `plugins` is a private variable, so it was broken while the Executor was being initialized at the very beginning.

was (Author: JIRAUSER290821): [~Ngone51] Shouldn't it bring up a new `receiveLoop()` to serve RPC messages? Yes, my previous thinking was wrong. I remote debug on Executor and I found that it did catch the fatal error in [https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/rpc/netty/MessageLoop.scala#L82-L89]. It will resubmit receiveLoop and in the second time it will block by [https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/rpc/netty/MessageLoop.scala#L69] This Executor did not initialize successfully in the first time and didn't send LaunchedExecutor to Driver (you can see [https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/executor/CoarseGrainedExecutorBackend.scala#L172]) So the Executor can't launch task, related PR [https://github.com/apache/spark/pull/25964]. Why SparkUncaughtExceptionHandler doesn't catch the fatal error? See [https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/executor/Executor.scala#L284] plugins is private variable, so it was broken when initialize Executor at the beginning.

> When the Executor plugin fails to initialize, the Executor shows active but
> does not accept tasks forever, just like being hung
> ---
>
> Key: SPARK-40320
> URL: https://issues.apache.org/jira/browse/SPARK-40320
> Project: Spark
> Issue Type: Bug
> Components: Scheduler
> Affects Versions: 3.0.0
> Reporter: Mars
> Priority: Major
>
> *Reproduce steps:*
> Set `spark.plugins=ErrorSparkPlugin`. The `ErrorSparkPlugin` and `ErrorExecutorPlugin` classes are as below (abbreviated for clarity):
> {code:scala}
> class ErrorSparkPlugin extends SparkPlugin {
>   override def driverPlugin(): DriverPlugin = new ErrorDriverPlugin()
>   override def executorPlugin(): ExecutorPlugin = new ErrorExecutorPlugin()
> }{code}
> {code:scala}
> class ErrorExecutorPlugin extends ExecutorPlugin {
>   private val checkingInterval: Long = 1
>   override def init(_ctx: PluginContext, extraConf: util.Map[String, String]): Unit = {
>     if (checkingInterval == 1) {
>       throw new UnsatisfiedLinkError("My Exception error")
>     }
>   }
> } {code}
> The Executor shows active when we check in the Spark UI; however, it is broken and doesn't receive any tasks.
> *Root Cause:*
> I checked the code and found that `org.apache.spark.rpc.netty.Inbox#safelyCall` rethrows fatal errors (`UnsatisfiedLinkError` is a fatal error) in the method `dealWithFatalError`. The `CoarseGrainedExecutorBackend` JVM process is still alive, but the communication thread is no longer working (see `MessageLoop#receiveLoopRunnable`: `receiveLoop()` was broken, so the executor doesn't receive any messages).
> Some ideas: it is very hard to know what happened here unless we check the code. The Executor is active but can't do anything, and we will wonder whether the driver is broken or the Executor has a problem.
> I think at least the Executor status shouldn't be active here, or the Executor should exitExecutor (kill itself).
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h
[jira] [Commented] (SPARK-40320) When the Executor plugin fails to initialize, the Executor shows active but does not accept tasks forever, just like being hung
[ https://issues.apache.org/jira/browse/SPARK-40320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17601543#comment-17601543 ] Mars commented on SPARK-40320: -- [~Ngone51] Shouldn't it bring up a new `receiveLoop()` to serve RPC messages? Yes, my previous thinking was wrong. I remote-debugged the Executor and found that it did catch the fatal error in [https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/rpc/netty/MessageLoop.scala#L82-L89]. It resubmits receiveLoop and blocks at [https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/rpc/netty/MessageLoop.scala#L69]. But this Executor did not initialize successfully, so it didn't send LaunchedExecutor to the Driver (see [https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/executor/CoarseGrainedExecutorBackend.scala#L172]), so the Executor can't launch tasks; related PR: [https://github.com/apache/spark/pull/25964]. Why doesn't SparkUncaughtExceptionHandler catch the fatal error? See [https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/executor/Executor.scala#L284]: `plugins` is a private variable, so it was broken while the Executor was being initialized at the very beginning.

> When the Executor plugin fails to initialize, the Executor shows active but
> does not accept tasks forever, just like being hung
> ---
>
> Key: SPARK-40320
> URL: https://issues.apache.org/jira/browse/SPARK-40320
> Project: Spark
> Issue Type: Bug
> Components: Scheduler
> Affects Versions: 3.0.0
> Reporter: Mars
> Priority: Major
>
> *Reproduce steps:*
> Set `spark.plugins=ErrorSparkPlugin`. The `ErrorSparkPlugin` and `ErrorExecutorPlugin` classes are as below (abbreviated for clarity):
> {code:scala}
> class ErrorSparkPlugin extends SparkPlugin {
>   override def driverPlugin(): DriverPlugin = new ErrorDriverPlugin()
>   override def executorPlugin(): ExecutorPlugin = new ErrorExecutorPlugin()
> }{code}
> {code:scala}
> class ErrorExecutorPlugin extends ExecutorPlugin {
>   private val checkingInterval: Long = 1
>   override def init(_ctx: PluginContext, extraConf: util.Map[String, String]): Unit = {
>     if (checkingInterval == 1) {
>       throw new UnsatisfiedLinkError("My Exception error")
>     }
>   }
> } {code}
> The Executor shows active when we check in the Spark UI; however, it is broken and doesn't receive any tasks.
> *Root Cause:*
> I checked the code and found that `org.apache.spark.rpc.netty.Inbox#safelyCall` rethrows fatal errors (`UnsatisfiedLinkError` is a fatal error) in the method `dealWithFatalError`. The `CoarseGrainedExecutorBackend` JVM process is still alive, but the communication thread is no longer working (see `MessageLoop#receiveLoopRunnable`: `receiveLoop()` was broken, so the executor doesn't receive any messages).
> Some ideas: it is very hard to know what happened here unless we check the code. The Executor is active but can't do anything, and we will wonder whether the driver is broken or the Executor has a problem.
> I think at least the Executor status shouldn't be active here, or the Executor should exitExecutor (kill itself).
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
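One way to realize the "kill itself" idea from the comment above is to classify JVM-fatal errors during plugin initialization and route them to an explicit exit path, instead of letting them escape into the RPC message loop where they leave a live but unresponsive JVM. The sketch below is a hedged illustration, not Spark code: `PluginInitGuard`, `initAll`, and `onFatal` are invented names (in a real executor, `onFatal` might call something like exitExecutor).

```scala
// Hypothetical sketch: if plugin initialization throws a JVM-fatal error,
// invoke an explicit handler instead of leaving a live JVM whose message
// loop can no longer serve RPCs.
object PluginInitGuard {
  // Errors that should never be swallowed by a message loop.
  // Note: UnsatisfiedLinkError (from the reproduction above) is a LinkageError.
  def isFatal(t: Throwable): Boolean = t match {
    case _: VirtualMachineError | _: LinkageError => true
    case _ => false
  }

  def initPlugins(initAll: () => Unit, onFatal: Throwable => Unit): Unit =
    try initAll()
    catch { case t: Throwable if isFatal(t) => onFatal(t) }
}
```

With this shape, a plugin that throws `UnsatisfiedLinkError` in `init` reaches `onFatal` deterministically, rather than killing the message loop thread while the executor still reports itself as active.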
[jira] [Updated] (SPARK-40320) When the Executor plugin fails to initialize, the Executor shows active but does not accept tasks forever, just like being hung
[ https://issues.apache.org/jira/browse/SPARK-40320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mars updated SPARK-40320: - Description:
*Reproduce steps:*
Set `spark.plugins=ErrorSparkPlugin`. The `ErrorSparkPlugin` and `ErrorExecutorPlugin` classes are as below (abbreviated for clarity):
{code:scala}
class ErrorSparkPlugin extends SparkPlugin {
  override def driverPlugin(): DriverPlugin = new ErrorDriverPlugin()
  override def executorPlugin(): ExecutorPlugin = new ErrorExecutorPlugin()
}{code}
{code:scala}
class ErrorExecutorPlugin extends ExecutorPlugin {
  private val checkingInterval: Long = 1
  override def init(_ctx: PluginContext, extraConf: util.Map[String, String]): Unit = {
    if (checkingInterval == 1) {
      throw new UnsatisfiedLinkError("My Exception error")
    }
  }
} {code}
The Executor shows active when we check in the Spark UI; however, it is broken and doesn't receive any tasks.
*Root Cause:*
I checked the code and found that `org.apache.spark.rpc.netty.Inbox#safelyCall` rethrows fatal errors (`UnsatisfiedLinkError` is a fatal error) in the method `dealWithFatalError`. The `CoarseGrainedExecutorBackend` JVM process is still alive, but the communication thread is no longer working (see `MessageLoop#receiveLoopRunnable`: `receiveLoop()` was broken, so the executor doesn't receive any messages).
Some ideas: it is very hard to know what happened here unless we check the code. The Executor is active but can't do anything, and we will wonder whether the driver is broken or the Executor has a problem.
I think at least the Executor status shouldn't be active here, or the Executor should exitExecutor (kill itself).

was:
*Reproduce step:* set `spark.plugins=ErrorSparkPlugin` `ErrorSparkPlugin` && `ErrorExecutorPlugin` class as below (I abbreviate the code to make it clearer):
{code:java}
class ErrorSparkPlugin extends SparkPlugin {
  override def driverPlugin(): DriverPlugin = new ErrorDriverPlugin()
  override def executorPlugin(): ExecutorPlugin = new ErrorExecutorPlugin()
}{code}
{code:java}
class ErrorExecutorPlugin extends ExecutorPlugin {
  private val checkingInterval: Long = 1
  override def init(_ctx: PluginContext, extraConf: util.Map[String, String]): Unit = {
    if (checkingInterval == 1) {
      throw new UnsatisfiedLinkError("My Exception error")
    }
  }
} {code}
The Executor is active when we check in spark-ui, however it was broken and doesn't receive any task. *Root Cause:* I check the code and I find in `org.apache.spark.rpc.netty.Inbox#safelyCall` it will throw fatal error (`UnsatisfiedLinkError` is fatal error here ) in method `dealWithFatalError` . Actually the `CoarseGrainedExecutorBackend` JVM process is active but the communication thread is no longer working ( please see `MessageLoop#receiveLoopRunnable` , `receiveLoop()` was broken, so executor doesn't receive any message) Some ideas: I think it is very hard to know what happened here unless we check in the code. The Executor is active but it can't do anything. We will wonder if the driver is broken or the Executor problem.
I think at least the Executor status shouldn't be active here or the Executor can exitExecutor (kill itself)

> When the Executor plugin fails to initialize, the Executor shows active but
> does not accept tasks forever, just like being hung
> ---
>
> Key: SPARK-40320
> URL: https://issues.apache.org/jira/browse/SPARK-40320
> Project: Spark
> Issue Type: Bug
> Components: Scheduler
> Affects Versions: 3.0.0
> Reporter: Mars
> Priority: Major
>
> *Reproduce steps:*
> Set `spark.plugins=ErrorSparkPlugin`. The `ErrorSparkPlugin` and `ErrorExecutorPlugin` classes are as below (abbreviated for clarity):
> {code:scala}
> class ErrorSparkPlugin extends SparkPlugin {
>   override def driverPlugin(): DriverPlugin = new ErrorDriverPlugin()
>   override def executorPlugin(): ExecutorPlugin = new ErrorExecutorPlugin()
> }{code}
> {code:scala}
> class ErrorExecutorPlugin extends ExecutorPlugin {
>   private val checkingInterval: Long = 1
>   override def init(_ctx: PluginContext, extraConf: util.Map[String, String]): Unit = {
>     if (checkingInterval == 1) {
>       throw new UnsatisfiedLinkError("My Exception error")
>     }
>   }
> } {code}
> The Executor shows active when we check in the Spark UI; however, it is broken and doesn't receive any tasks.
> *Root Cause:*
> I checked the code and found that `org.apache.spark.rpc.netty.Inbox#safelyCall` rethrows fatal errors (`UnsatisfiedLinkError` is a fatal error) in the method `dealWithFatalError`. Actually the `CoarseGrainedExecutorBackend` JVM process
[jira] [Updated] (SPARK-40320) When the Executor plugin fails to initialize, the Executor shows active but does not accept tasks forever, just like being hung
[ https://issues.apache.org/jira/browse/SPARK-40320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mars updated SPARK-40320: - Description: *Reproduce step:* set `spark.plugins=ErrorSparkPlugin` `ErrorSparkPlugin` && `ErrorExecutorPlugin` class as below (I abbreviate the code to make it clearer): {code:java} class ErrorSparkPlugin extends SparkPlugin { /** */ override def driverPlugin(): DriverPlugin = new ErrorDriverPlugin() /** */ override def executorPlugin(): ExecutorPlugin = new ErrorExecutorPlugin() }{code} {code:java} class ErrorExecutorPlugin extends ExecutorPlugin { private val checkingInterval: Long = 1 override def init(_ctx: PluginContext, extraConf: util.Map[String, String]): Unit = { if (checkingInterval == 1) { throw new UnsatisfiedLinkError("My Exception error") } } } {code} The Executor is active when we check in spark-ui, however it was broken and doesn't receive any task. *Root Cause:* I check the code and I find in `org.apache.spark.rpc.netty.Inbox#safelyCall` it will throw fatal error (`UnsatisfiedLinkError` is fatal error here ) in method `dealWithFatalError` . Actually the `CoarseGrainedExecutorBackend` JVM process is active but the communication thread is no longer working ( please see `MessageLoop#receiveLoopRunnable` , `receiveLoop()` was broken, so executor doesn't receive any message) Some ideas: I think it is very hard to know what happened here unless we check in the code. The Executor is active but it can't do anything. We will wonder if the driver is broken or the Executor problem. 
I think at least the Executor status shouldn't be active here or the Executor can exitExecutor (kill itself) was: *Reproduce step:* set `spark.plugins=ErrorSparkPlugin` `ErrorSparkPlugin` && `ErrorExecutorPlugin` class (I abbreviate the code to make it clearer): {code:java} class ErrorSparkPlugin extends SparkPlugin { /** */ override def driverPlugin(): DriverPlugin = new ErrorDriverPlugin() /** */ override def executorPlugin(): ExecutorPlugin = new ErrorExecutorPlugin() }{code} {code:java} class ErrorExecutorPlugin extends ExecutorPlugin with Logging { private val checkingInterval: Long = 1 override def init(_ctx: PluginContext, extraConf: util.Map[String, String]): Unit = { if (checkingInterval == 1) { throw new UnsatisfiedLinkError("LCL my Exception error2") } } } {code} The Executor is active when we check in spark-ui, however it was broken and doesn't receive any task. *Root Cause:* I check the code and I find in `org.apache.spark.rpc.netty.Inbox#safelyCall` it will throw fatal error (`UnsatisfiedLinkError` is fatal error here ) in method `dealWithFatalError` . Actually the `CoarseGrainedExecutorBackend` JVM process is active but the communication thread is no longer working ( please see `MessageLoop#receiveLoopRunnable` , `receiveLoop()` while was broken here, so executor doesn't receive any message) Some ideas: I think it is very hard to know what happened here unless we check in the code. The Executor is active but it can't do anything. We will wonder if the driver is broken or the Executor problem. 
I think at least the Executor status shouldn't be active here or the Executor can exitExecutor (kill itself) > When the Executor plugin fails to initialize, the Executor shows active but > does not accept tasks forever, just like being hung > --- > > Key: SPARK-40320 > URL: https://issues.apache.org/jira/browse/SPARK-40320 > Project: Spark > Issue Type: Bug > Components: Scheduler >Affects Versions: 3.0.0 >Reporter: Mars >Priority: Major > > *Reproduce step:* > set `spark.plugins=ErrorSparkPlugin` > `ErrorSparkPlugin` && `ErrorExecutorPlugin` class as below (I abbreviate the > code to make it clearer): > {code:java} > class ErrorSparkPlugin extends SparkPlugin { > /** >*/ > override def driverPlugin(): DriverPlugin = new ErrorDriverPlugin() > /** >*/ > override def executorPlugin(): ExecutorPlugin = new ErrorExecutorPlugin() > }{code} > {code:java} > class ErrorExecutorPlugin extends ExecutorPlugin { > private val checkingInterval: Long = 1 > override def init(_ctx: PluginContext, extraConf: util.Map[String, > String]): Unit = { > if (checkingInterval == 1) { > throw new UnsatisfiedLinkError("My Exception error") > } > } > } {code} > The Executor is active when we check in spark-ui, however it was broken and > doesn't receive any task. > *Root Cause:* > I check the code and I find in `org.apache.spark.rpc.netty.Inbox#safelyCall` > it will throw fatal error (`UnsatisfiedLinkError` is fatal error here ) in > method `dealWithFatalError` . Actually the `CoarseGraine
[jira] [Updated] (SPARK-40320) When the Executor plugin fails to initialize, the Executor shows active but does not accept tasks forever, just like being hung
[ https://issues.apache.org/jira/browse/SPARK-40320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mars updated SPARK-40320: - Description: Reproduce step: set `spark.plugins=ErrorSparkPlugin` `ErrorSparkPlugin` && `ErrorExecutorPlugin` class (I abbreviate the code to make it clearer): {code:java} class ErrorSparkPlugin extends SparkPlugin { /** */ override def driverPlugin(): DriverPlugin = new ErrorDriverPlugin() /** */ override def executorPlugin(): ExecutorPlugin = new ErrorExecutorPlugin() }{code} {code:java} class ErrorExecutorPlugin extends ExecutorPlugin with Logging { private val checkingInterval: Long = 1 override def init(_ctx: PluginContext, extraConf: util.Map[String, String]): Unit = { if (checkingInterval == 1) { throw new UnsatisfiedLinkError("LCL my Exception error2") } } } {code} The Executor is active when we check in spark-ui, however it was broken and doesn't receive any task. Root Cause: I check the code and I find in `org.apache.spark.rpc.netty.Inbox#safelyCall` it will throw fatal error (`UnsatisfiedLinkError` is fatal error here ) in method `dealWithFatalError` . Actually the `CoarseGrainedExecutorBackend` JVM process is active but the communication thread is no longer working ( please see `MessageLoop#receiveLoopRunnable` , `receiveLoop()` while was broken here, so executor doesn't receive any message) Solution: I think it is very hard to know what happened here unless we check in the code. The Executor is active but it can't do anything. We will wonder if the driver is broken or the Executor problem. 
I think at least the Executor status shouldn't be active here or the Executor can exitExecutor (kill itself) was: Reproduce step: set `spark.plugins=ErrorSparkPlugin` `ErrorSparkPlugin` && `ErrorExecutorPlugin` class (I abbreviate the code to make it clearer): {code:java} class ErrorSparkPlugin extends SparkPlugin { /** */ override def driverPlugin(): DriverPlugin = new ErrorDriverPlugin() /** */ override def executorPlugin(): ExecutorPlugin = new ErrorExecutorPlugin() }{code} {code:java} class ErrorExecutorPlugin extends ExecutorPlugin with Logging { private val checkingInterval: Long = 1 override def init(_ctx: PluginContext, extraConf: util.Map[String, String]): Unit = { if (checkingInterval == 1) { throw new UnsatisfiedLinkError("LCL my Exception error2") } } } {code} The Executor is active when we check in spark-ui, however it was broken and doesn't receive any task. Root Cause: I check the code and I find in `org.apache.spark.rpc.netty.Inbox#safelyCall` it will throw fatal error (`UnsatisfiedLinkError` is fatal error here ) in method `dealWithFatalError` . 
Actually the `CoarseGrainedExecutorBackend` JVM process is active but the communication thread is no longer working > When the Executor plugin fails to initialize, the Executor shows active but > does not accept tasks forever, just like being hung > --- > > Key: SPARK-40320 > URL: https://issues.apache.org/jira/browse/SPARK-40320 > Project: Spark > Issue Type: Bug > Components: Scheduler >Affects Versions: 3.0.0 >Reporter: Mars >Priority: Major > > Reproduce step: > set `spark.plugins=ErrorSparkPlugin` > `ErrorSparkPlugin` && `ErrorExecutorPlugin` class (I abbreviate the code to > make it clearer): > {code:java} > class ErrorSparkPlugin extends SparkPlugin { > /** >*/ > override def driverPlugin(): DriverPlugin = new ErrorDriverPlugin() > /** >*/ > override def executorPlugin(): ExecutorPlugin = new ErrorExecutorPlugin() > }{code} > {code:java} > class ErrorExecutorPlugin extends ExecutorPlugin with Logging { > private val checkingInterval: Long = 1 > override def init(_ctx: PluginContext, extraConf: util.Map[String, > String]): Unit = { > if (checkingInterval == 1) { > throw new UnsatisfiedLinkError("LCL my Exception error2") > } > } > } {code} > The Executor is active when we check in spark-ui, however it was broken and > doesn't receive any task. > Root Cause: > I check the code and I find in `org.apache.spark.rpc.netty.Inbox#safelyCall` > it will throw fatal error (`UnsatisfiedLinkError` is fatal error here ) in > method `dealWithFatalError` . Actually the `CoarseGrainedExecutorBackend` > JVM process is active but the communication thread is no longer working ( > please see `MessageLoop#receiveLoopRunnable` , `receiveLoop()` while was > broken here, so executor doesn't receive any message) > Solution: > I think it is very hard to know what happened here unless we check in the > code. The Executor is active but it can't do anything. We will wonder if the > driver is broken or the Executor p
[jira] [Updated] (SPARK-40320) When the Executor plugin fails to initialize, the Executor shows active but does not accept tasks forever, just like being hung
[ https://issues.apache.org/jira/browse/SPARK-40320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mars updated SPARK-40320: - Description: *Reproduce step:* set `spark.plugins=ErrorSparkPlugin` `ErrorSparkPlugin` && `ErrorExecutorPlugin` class (I abbreviate the code to make it clearer): {code:java} class ErrorSparkPlugin extends SparkPlugin { /** */ override def driverPlugin(): DriverPlugin = new ErrorDriverPlugin() /** */ override def executorPlugin(): ExecutorPlugin = new ErrorExecutorPlugin() }{code} {code:java} class ErrorExecutorPlugin extends ExecutorPlugin with Logging { private val checkingInterval: Long = 1 override def init(_ctx: PluginContext, extraConf: util.Map[String, String]): Unit = { if (checkingInterval == 1) { throw new UnsatisfiedLinkError("LCL my Exception error2") } } } {code} The Executor is active when we check in spark-ui, however it was broken and doesn't receive any task. *Root Cause:* I check the code and I find in `org.apache.spark.rpc.netty.Inbox#safelyCall` it will throw fatal error (`UnsatisfiedLinkError` is fatal error here ) in method `dealWithFatalError` . Actually the `CoarseGrainedExecutorBackend` JVM process is active but the communication thread is no longer working ( please see `MessageLoop#receiveLoopRunnable` , `receiveLoop()` while was broken here, so executor doesn't receive any message) Some ideas: I think it is very hard to know what happened here unless we check in the code. The Executor is active but it can't do anything. We will wonder if the driver is broken or the Executor problem. 
I think at least the Executor status shouldn't be active here or the Executor can exitExecutor (kill itself) was: *Reproduce step:* set `spark.plugins=ErrorSparkPlugin` `ErrorSparkPlugin` && `ErrorExecutorPlugin` class (I abbreviate the code to make it clearer): {code:java} class ErrorSparkPlugin extends SparkPlugin { /** */ override def driverPlugin(): DriverPlugin = new ErrorDriverPlugin() /** */ override def executorPlugin(): ExecutorPlugin = new ErrorExecutorPlugin() }{code} {code:java} class ErrorExecutorPlugin extends ExecutorPlugin with Logging { private val checkingInterval: Long = 1 override def init(_ctx: PluginContext, extraConf: util.Map[String, String]): Unit = { if (checkingInterval == 1) { throw new UnsatisfiedLinkError("LCL my Exception error2") } } } {code} The Executor is active when we check in spark-ui, however it was broken and doesn't receive any task. *Root Cause:* I check the code and I find in `org.apache.spark.rpc.netty.Inbox#safelyCall` it will throw fatal error (`UnsatisfiedLinkError` is fatal error here ) in method `dealWithFatalError` . Actually the `CoarseGrainedExecutorBackend` JVM process is active but the communication thread is no longer working ( please see `MessageLoop#receiveLoopRunnable` , `receiveLoop()` while was broken here, so executor doesn't receive any message) Some idea: I think it is very hard to know what happened here unless we check in the code. The Executor is active but it can't do anything. We will wonder if the driver is broken or the Executor problem. 
I think at least the Executor status shouldn't be active here or the Executor can exitExecutor (kill itself) > When the Executor plugin fails to initialize, the Executor shows active but > does not accept tasks forever, just like being hung > --- > > Key: SPARK-40320 > URL: https://issues.apache.org/jira/browse/SPARK-40320 > Project: Spark > Issue Type: Bug > Components: Scheduler >Affects Versions: 3.0.0 >Reporter: Mars >Priority: Major > > *Reproduce step:* > set `spark.plugins=ErrorSparkPlugin` > `ErrorSparkPlugin` && `ErrorExecutorPlugin` class (I abbreviate the code to > make it clearer): > {code:java} > class ErrorSparkPlugin extends SparkPlugin { > /** >*/ > override def driverPlugin(): DriverPlugin = new ErrorDriverPlugin() > /** >*/ > override def executorPlugin(): ExecutorPlugin = new ErrorExecutorPlugin() > }{code} > {code:java} > class ErrorExecutorPlugin extends ExecutorPlugin with Logging { > private val checkingInterval: Long = 1 > override def init(_ctx: PluginContext, extraConf: util.Map[String, > String]): Unit = { > if (checkingInterval == 1) { > throw new UnsatisfiedLinkError("LCL my Exception error2") > } > } > } {code} > The Executor is active when we check in spark-ui, however it was broken and > doesn't receive any task. > *Root Cause:* > I check the code and I find in `org.apache.spark.rpc.netty.Inbox#safelyCall` > it will throw fatal error (`UnsatisfiedLinkError` is fatal error here ) in > method `dealWithFatalError` .
[jira] [Updated] (SPARK-40320) When the Executor plugin fails to initialize, the Executor shows active but does not accept tasks forever, just like being hung
[ https://issues.apache.org/jira/browse/SPARK-40320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mars updated SPARK-40320: - Description: *Reproduce step:* set `spark.plugins=ErrorSparkPlugin` `ErrorSparkPlugin` && `ErrorExecutorPlugin` class (I abbreviate the code to make it clearer): {code:java} class ErrorSparkPlugin extends SparkPlugin { /** */ override def driverPlugin(): DriverPlugin = new ErrorDriverPlugin() /** */ override def executorPlugin(): ExecutorPlugin = new ErrorExecutorPlugin() }{code} {code:java} class ErrorExecutorPlugin extends ExecutorPlugin with Logging { private val checkingInterval: Long = 1 override def init(_ctx: PluginContext, extraConf: util.Map[String, String]): Unit = { if (checkingInterval == 1) { throw new UnsatisfiedLinkError("LCL my Exception error2") } } } {code} The Executor is active when we check in spark-ui, however it was broken and doesn't receive any task. *Root Cause:* I check the code and I find in `org.apache.spark.rpc.netty.Inbox#safelyCall` it will throw fatal error (`UnsatisfiedLinkError` is fatal error here ) in method `dealWithFatalError` . Actually the `CoarseGrainedExecutorBackend` JVM process is active but the communication thread is no longer working ( please see `MessageLoop#receiveLoopRunnable` , `receiveLoop()` while was broken here, so executor doesn't receive any message) Some idea: I think it is very hard to know what happened here unless we check in the code. The Executor is active but it can't do anything. We will wonder if the driver is broken or the Executor problem. 
I think at least the Executor status shouldn't be active here or the Executor can exitExecutor (kill itself) was: Reproduce step: set `spark.plugins=ErrorSparkPlugin` `ErrorSparkPlugin` && `ErrorExecutorPlugin` class (I abbreviate the code to make it clearer): {code:java} class ErrorSparkPlugin extends SparkPlugin { /** */ override def driverPlugin(): DriverPlugin = new ErrorDriverPlugin() /** */ override def executorPlugin(): ExecutorPlugin = new ErrorExecutorPlugin() }{code} {code:java} class ErrorExecutorPlugin extends ExecutorPlugin with Logging { private val checkingInterval: Long = 1 override def init(_ctx: PluginContext, extraConf: util.Map[String, String]): Unit = { if (checkingInterval == 1) { throw new UnsatisfiedLinkError("LCL my Exception error2") } } } {code} The Executor is active when we check in spark-ui, however it was broken and doesn't receive any task. Root Cause: I check the code and I find in `org.apache.spark.rpc.netty.Inbox#safelyCall` it will throw fatal error (`UnsatisfiedLinkError` is fatal error here ) in method `dealWithFatalError` . Actually the `CoarseGrainedExecutorBackend` JVM process is active but the communication thread is no longer working ( please see `MessageLoop#receiveLoopRunnable` , `receiveLoop()` while was broken here, so executor doesn't receive any message) Solution: I think it is very hard to know what happened here unless we check in the code. The Executor is active but it can't do anything. We will wonder if the driver is broken or the Executor problem. 
I think at least the Executor status shouldn't be active here or the Executor can exitExecutor (kill itself) > When the Executor plugin fails to initialize, the Executor shows active but > does not accept tasks forever, just like being hung > --- > > Key: SPARK-40320 > URL: https://issues.apache.org/jira/browse/SPARK-40320 > Project: Spark > Issue Type: Bug > Components: Scheduler >Affects Versions: 3.0.0 >Reporter: Mars >Priority: Major > > *Reproduce step:* > set `spark.plugins=ErrorSparkPlugin` > `ErrorSparkPlugin` && `ErrorExecutorPlugin` class (I abbreviate the code to > make it clearer): > {code:java} > class ErrorSparkPlugin extends SparkPlugin { > /** >*/ > override def driverPlugin(): DriverPlugin = new ErrorDriverPlugin() > /** >*/ > override def executorPlugin(): ExecutorPlugin = new ErrorExecutorPlugin() > }{code} > {code:java} > class ErrorExecutorPlugin extends ExecutorPlugin with Logging { > private val checkingInterval: Long = 1 > override def init(_ctx: PluginContext, extraConf: util.Map[String, > String]): Unit = { > if (checkingInterval == 1) { > throw new UnsatisfiedLinkError("LCL my Exception error2") > } > } > } {code} > The Executor is active when we check in spark-ui, however it was broken and > doesn't receive any task. > *Root Cause:* > I check the code and I find in `org.apache.spark.rpc.netty.Inbox#safelyCall` > it will throw fatal error (`UnsatisfiedLinkError` is fatal error here ) in > method `dealWithFatalError` . Actua
[jira] [Updated] (SPARK-40320) When the Executor plugin fails to initialize, the Executor shows active but does not accept tasks forever, just like being hung
[ https://issues.apache.org/jira/browse/SPARK-40320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mars updated SPARK-40320:
-
Description:
Reproduce steps: set `spark.plugins=ErrorSparkPlugin`.

The `ErrorSparkPlugin` and `ErrorExecutorPlugin` classes (abbreviated to make them clearer):
{code:java}
class ErrorSparkPlugin extends SparkPlugin {
  override def driverPlugin(): DriverPlugin = new ErrorDriverPlugin()

  override def executorPlugin(): ExecutorPlugin = new ErrorExecutorPlugin()
}
{code}
{code:java}
class ErrorExecutorPlugin extends ExecutorPlugin with Logging {
  private val checkingInterval: Long = 1

  override def init(_ctx: PluginContext, extraConf: util.Map[String, String]): Unit = {
    if (checkingInterval == 1) {
      throw new UnsatisfiedLinkError("LCL my Exception error2")
    }
  }
}
{code}
The Executor shows as active when we check in the Spark UI; however, it is broken and doesn't receive any tasks.

Root cause: checking the code, `org.apache.spark.rpc.netty.Inbox#safelyCall` rethrows fatal errors (`UnsatisfiedLinkError` is a fatal error here) in the method `dealWithFatalError`. Actually the `CoarseGrainedExecutorBackend` JVM process is active but the communication thread is no longer working.

was:
Reproduce steps: set `spark.plugins=ErrorSparkPlugin`.

The `ErrorSparkPlugin` and `ErrorExecutorPlugin` classes (abbreviated to make them clearer):
{code:java}
class ErrorSparkPlugin extends SparkPlugin {
  override def driverPlugin(): DriverPlugin = new ErrorDriverPlugin()

  override def executorPlugin(): ExecutorPlugin = new ErrorExecutorPlugin()
}
{code}
{code:java}
class ErrorExecutorPlugin extends ExecutorPlugin with Logging {
  private val checkingInterval: Long = 1

  override def init(_ctx: PluginContext, extraConf: util.Map[String, String]): Unit = {
    if (checkingInterval == 1) {
      throw new UnsatisfiedLinkError("LCL my Exception error2")
    }
  }
}
{code}
The Executor shows as active when we check in the Spark UI; however, it is broken and doesn't receive any tasks.

Root cause: checking the code, `org.apache.spark.rpc.netty.Inbox#safelyCall` rethrows fatal errors (`UnsatisfiedLinkError` is a fatal error here) in the method `dealWithFatalError`. Actually the executor

> When the Executor plugin fails to initialize, the Executor shows active but
> does not accept tasks forever, just like being hung
> ---
>
> Key: SPARK-40320
> URL: https://issues.apache.org/jira/browse/SPARK-40320
> Project: Spark
> Issue Type: Bug
> Components: Scheduler
> Affects Versions: 3.0.0
> Reporter: Mars
> Priority: Major
>
> Reproduce steps: set `spark.plugins=ErrorSparkPlugin`.
> The `ErrorSparkPlugin` and `ErrorExecutorPlugin` classes (abbreviated to make them clearer):
> {code:java}
> class ErrorSparkPlugin extends SparkPlugin {
>   override def driverPlugin(): DriverPlugin = new ErrorDriverPlugin()
>
>   override def executorPlugin(): ExecutorPlugin = new ErrorExecutorPlugin()
> }
> {code}
> {code:java}
> class ErrorExecutorPlugin extends ExecutorPlugin with Logging {
>   private val checkingInterval: Long = 1
>
>   override def init(_ctx: PluginContext, extraConf: util.Map[String, String]): Unit = {
>     if (checkingInterval == 1) {
>       throw new UnsatisfiedLinkError("LCL my Exception error2")
>     }
>   }
> }
> {code}
> The Executor shows as active when we check in the Spark UI; however, it is
> broken and doesn't receive any tasks.
> Root cause: checking the code, `org.apache.spark.rpc.netty.Inbox#safelyCall`
> rethrows fatal errors (`UnsatisfiedLinkError` is a fatal error here) in the
> method `dealWithFatalError`. Actually the `CoarseGrainedExecutorBackend`
> JVM process is active but the communication thread is no longer working.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
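The failure mode described in the root cause above can be illustrated with a minimal, self-contained sketch (plain Java, not Spark's actual `Inbox`/`MessageLoop` code; every class and method name below is hypothetical): a receive loop that recovers from ordinary exceptions but not from fatal `Error`s dies silently, while the JVM hosting it stays up and appears "active".

```java
import java.util.concurrent.LinkedBlockingQueue;

public class MessageLoopDemo {
    final LinkedBlockingQueue<Runnable> inbox = new LinkedBlockingQueue<>();

    // Start a receive loop that, like the pattern described in the ticket,
    // recovers from non-fatal exceptions but lets Errors escape.
    Thread startLoop() {
        Thread t = new Thread(() -> {
            while (true) {
                try {
                    Runnable msg;
                    try {
                        msg = inbox.take();
                    } catch (InterruptedException ie) {
                        return;                       // normal shutdown path
                    }
                    msg.run();                        // a message handler may throw
                } catch (Exception e) {
                    // non-fatal: recover and keep serving messages
                    System.out.println("recovered from: " + e.getMessage());
                }
                // a fatal Error (e.g. UnsatisfiedLinkError) is NOT caught,
                // so it kills this thread while the JVM keeps running
            }
        });
        t.setUncaughtExceptionHandler((th, e) ->
            System.out.println("receive loop died: " + e));
        t.setDaemon(true);
        t.start();
        return t;
    }

    // Returns true when the loop thread died even though the JVM stayed alive.
    static boolean fatalErrorKillsLoop() {
        MessageLoopDemo demo = new MessageLoopDemo();
        Thread loop = demo.startLoop();
        demo.inbox.offer(() -> { throw new RuntimeException("benign"); });
        demo.inbox.offer(() -> { throw new UnsatisfiedLinkError("plugin init failed"); });
        try {
            loop.join(5000);                          // loop dies on the Error
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();
        }
        return !loop.isAlive();                       // process alive, loop dead
    }

    public static void main(String[] args) {
        System.out.println("loop dead while JVM alive: " + fatalErrorKillsLoop());
    }
}
```

After the `UnsatisfiedLinkError`, nothing drains the queue any more, which mirrors an executor that still registers as active but never processes another task.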
[jira] [Created] (SPARK-40320) When the Executor plugin fails to initialize, the Executor shows active but does not accept tasks forever, just like being hung
Mars created SPARK-40320:

Summary: When the Executor plugin fails to initialize, the Executor shows active but does not accept tasks forever, just like being hung
Key: SPARK-40320
URL: https://issues.apache.org/jira/browse/SPARK-40320
Project: Spark
Issue Type: Bug
Components: Scheduler
Affects Versions: 3.0.0
Reporter: Mars
[jira] [Updated] (SPARK-40113) Refactor ParquetScanBuilder DataSourceV2 interface implementation
[ https://issues.apache.org/jira/browse/SPARK-40113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mars updated SPARK-40113:
-
Summary: Refactor ParquetScanBuilder DataSourceV2 interface implementation (was: Unify ParquetScanBuilder DataSourceV2 interface implementation)

> Refactor ParquetScanBuilder DataSourceV2 interface implementation
> ---
>
> Key: SPARK-40113
> URL: https://issues.apache.org/jira/browse/SPARK-40113
> Project: Spark
> Issue Type: Improvement
> Components: Optimizer
> Affects Versions: 3.3.0
> Reporter: Mars
> Priority: Minor
>
> Currently the `FileScanBuilder` interface is not fully implemented in
> `ParquetScanBuilder`, unlike `OrcScanBuilder`, `AvroScanBuilder`, and `CSVScanBuilder`.
> To unify the logic of the code and make it clearer, this part of the
> implementation is unified.
[jira] [Updated] (SPARK-40113) Unify ParquetScanBuilder DataSourceV2 interface implementation
[ https://issues.apache.org/jira/browse/SPARK-40113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mars updated SPARK-40113:
-
Priority: Minor (was: Major)

> Unify ParquetScanBuilder DataSourceV2 interface implementation
> ---
>
> Key: SPARK-40113
> URL: https://issues.apache.org/jira/browse/SPARK-40113
> Project: Spark
> Issue Type: Improvement
> Components: Optimizer
> Affects Versions: 3.3.0
> Reporter: Mars
> Priority: Minor
>
> Currently the `FileScanBuilder` interface is not fully implemented in
> `ParquetScanBuilder`, unlike `OrcScanBuilder`, `AvroScanBuilder`, and `CSVScanBuilder`.
> To unify the logic of the code and make it clearer, this part of the
> implementation is unified.
[jira] [Updated] (SPARK-40113) Unify ParquetScanBuilder DataSourceV2 interface implementation
[ https://issues.apache.org/jira/browse/SPARK-40113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mars updated SPARK-40113:
-
Summary: Unify ParquetScanBuilder DataSourceV2 interface implementation (was: Unified ParquetScanBuilder DataSourceV2 interface implementation)

> Unify ParquetScanBuilder DataSourceV2 interface implementation
> ---
>
> Key: SPARK-40113
> URL: https://issues.apache.org/jira/browse/SPARK-40113
> Project: Spark
> Issue Type: Improvement
> Components: Optimizer
> Affects Versions: 3.3.0
> Reporter: Mars
> Priority: Major
>
> Currently the `FileScanBuilder` interface is not fully implemented in
> `ParquetScanBuilder`, unlike `OrcScanBuilder`, `AvroScanBuilder`, and `CSVScanBuilder`.
> To unify the logic of the code and make it clearer, this part of the
> implementation is unified.
[jira] [Created] (SPARK-40113) Unified ParquetScanBuilder DataSourceV2 interface implementation
Mars created SPARK-40113:

Summary: Unified ParquetScanBuilder DataSourceV2 interface implementation
Key: SPARK-40113
URL: https://issues.apache.org/jira/browse/SPARK-40113
Project: Spark
Issue Type: Improvement
Components: Optimizer
Affects Versions: 3.3.0
Reporter: Mars

Currently the `FileScanBuilder` interface is not fully implemented in `ParquetScanBuilder`, unlike `OrcScanBuilder`, `AvroScanBuilder`, and `CSVScanBuilder`.
To unify the logic of the code and make it clearer, this part of the implementation is unified.
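The unification idea behind this ticket can be sketched roughly as follows. This is a hypothetical, simplified model in plain Java, not Spark's real `FileScanBuilder`/`ParquetScanBuilder` API; the `supportsFilter` hook and the string-based filters are illustrative assumptions. The point is the shape: shared pushdown bookkeeping lives once in a common base, and each format-specific builder supplies only its own hook instead of re-implementing the interface ad hoc.

```java
import java.util.ArrayList;
import java.util.List;

// Common base: one shared implementation of the pushdown bookkeeping,
// playing the role FileScanBuilder plays for the ORC/Avro/CSV builders.
abstract class FileScanBuilderSketch {
    protected final List<String> pushed = new ArrayList<>();

    // Shared logic: record filters the format can evaluate during the scan,
    // return the rest to be applied after the scan.
    public final List<String> pushFilters(List<String> filters) {
        List<String> postScan = new ArrayList<>();
        for (String f : filters) {
            if (supportsFilter(f)) {
                pushed.add(f);
            } else {
                postScan.add(f);
            }
        }
        return postScan;
    }

    public final List<String> pushedFilters() {
        return pushed;
    }

    // The only hook each format-specific builder overrides.
    protected abstract boolean supportsFilter(String filter);
}

// With the shared base, the Parquet builder has the same shape as the others.
public class ParquetScanBuilderSketch extends FileScanBuilderSketch {
    @Override
    protected boolean supportsFilter(String filter) {
        // illustrative rule only: anything except a UDF predicate is pushable
        return !filter.contains("udf(");
    }

    public static void main(String[] args) {
        ParquetScanBuilderSketch b = new ParquetScanBuilderSketch();
        List<String> rest = b.pushFilters(List.of("DEPT IS NOT NULL", "udf(NAME) = 1"));
        System.out.println("pushed: " + b.pushedFilters() + ", post-scan: " + rest);
    }
}
```

Because `pushFilters` lives in one place, a fix or behavior change applies uniformly to every format, which is the clarity benefit the ticket describes.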
[jira] [Commented] (SPARK-39909) Organize the check of push down information for JDBCV2Suite
[ https://issues.apache.org/jira/browse/SPARK-39909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17573197#comment-17573197 ]

Mars commented on SPARK-39909:
--
[~huaxingao] This is me :D [~miracle], my Jira id: miracle

> Organize the check of push down information for JDBCV2Suite
> ---
>
> Key: SPARK-39909
> URL: https://issues.apache.org/jira/browse/SPARK-39909
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 3.4.0
> Reporter: jiaan.geng
> Priority: Major
>
> Currently, JDBCV2Suite has many test cases that check the push-down
> information in a way that does not look clean.
> For example:
> {code:java}
> checkPushedInfo(df,
>   "PushedFilters: [DEPT IS NOT NULL, DEPT > 1], PushedLimit: LIMIT 1, ")
> {code}
> Changing it to the following looks better:
> {code:java}
> checkPushedInfo(df,
>   "PushedFilters: [DEPT IS NOT NULL, DEPT > 1]",
>   "PushedLimit: LIMIT 1")
> {code}
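The check style proposed in the quoted ticket could be implemented along these lines. This is a hedged, stand-alone sketch: `checkPushedInfo` here takes a plain explain string rather than a DataFrame, and it is a stand-in for JDBCV2Suite's actual helper, not its real code. The idea is simply that each expected fragment is asserted independently, so trailing separators no longer have to be baked into one concatenated expectation.

```java
// Sketch of a varargs-style checkPushedInfo: every expected fragment must
// appear somewhere in the plan's explain text, each checked on its own.
public class PushedInfoCheck {
    static void checkPushedInfo(String explainString, String... expectedFragments) {
        for (String fragment : expectedFragments) {
            if (!explainString.contains(fragment)) {
                throw new AssertionError("missing in plan: " + fragment);
            }
        }
    }

    public static void main(String[] args) {
        // explain text mimicking the pushed-down info from the ticket's example
        String explain =
            "PushedFilters: [DEPT IS NOT NULL, DEPT > 1], PushedLimit: LIMIT 1,";

        // fragments are independent, so no trailing ", " needs to be encoded
        checkPushedInfo(explain,
            "PushedFilters: [DEPT IS NOT NULL, DEPT > 1]",
            "PushedLimit: LIMIT 1");
        System.out.println("all pushed-down info present");
    }
}
```

A failure also reports exactly which fragment is missing, instead of one long string mismatch.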
[jira] [Commented] (SPARK-39909) Organize the check of push down information for JDBCV2Suite
[ https://issues.apache.org/jira/browse/SPARK-39909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17572362#comment-17572362 ]

Mars commented on SPARK-39909:
--
assign [~miracle]

> Organize the check of push down information for JDBCV2Suite
> ---
>
> Key: SPARK-39909
> URL: https://issues.apache.org/jira/browse/SPARK-39909
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 3.4.0
> Reporter: jiaan.geng
> Priority: Major
>
> Currently, JDBCV2Suite has many test cases that check the push-down
> information in a way that does not look clean.
> For example:
> {code:java}
> checkPushedInfo(df,
>   "PushedFilters: [DEPT IS NOT NULL, DEPT > 1], PushedLimit: LIMIT 1, ")
> {code}
> Changing it to the following looks better:
> {code:java}
> checkPushedInfo(df,
>   "PushedFilters: [DEPT IS NOT NULL, DEPT > 1]",
>   "PushedLimit: LIMIT 1")
> {code}