[jira] [Updated] (SPARK-47612) Improve picking the side of partially clustered distribution according to partition size

2024-03-26 Thread Qi Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qi Zhu updated SPARK-47612:
---
Description: 
Currently we pick the side of partially clustered distribution as follows:

SPJ currently relies on a simple heuristic and always picks the side with the 
smaller data size (based on table statistics) as the fully clustered side, even 
though that side could also contain skewed partitions. 


We can potentially do a fine-grained comparison based on partition values, since 
we now have that information.
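A minimal sketch of the proposed fine-grained comparison (hypothetical types and helper, not Spark's actual SPJ code; it assumes per-partition-value size statistics are available for both sides):

{code:scala}
// Hypothetical per-partition-value statistic for one side of the join.
case class PartitionStat(partitionValue: String, sizeInBytes: Long)

/** Pick the side to fully cluster. Instead of comparing whole-table sizes,
 *  compare the largest partition first (to penalize skew) and fall back to
 *  the total size. Assumes both sides have at least one partition. */
def pickFullyClusteredSide(
    left: Seq[PartitionStat],
    right: Seq[PartitionStat]): String = {
  val (lMax, lSum) = (left.map(_.sizeInBytes).max, left.map(_.sizeInBytes).sum)
  val (rMax, rSum) = (right.map(_.sizeInBytes).max, right.map(_.sizeInBytes).sum)
  if (lMax != rMax) {
    if (lMax < rMax) "left" else "right"   // smaller worst-case partition wins
  } else {
    if (lSum <= rSum) "left" else "right"  // tie-break on total size
  }
}
{code}

Under a scheme like this, a side with the smaller total size but one huge skewed partition would no longer automatically be chosen as the fully clustered side.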

  was:
Currently we pick the side of partially clustered distribution:


Using plan statistics to determine which side of the join to fully
cluster partition values.

We can optimize this to use partition size, since we now have that information.


> Improve picking the side of partially clustered distribution according to 
> partition size
> 
>
> Key: SPARK-47612
> URL: https://issues.apache.org/jira/browse/SPARK-47612
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Qi Zhu
>Priority: Major
>
> Currently we pick the side of partially clustered distribution as follows:
> SPJ currently relies on a simple heuristic and always picks the side with the 
> smaller data size (based on table statistics) as the fully clustered side, 
> even though that side could also contain skewed partitions. 
> We can potentially do a fine-grained comparison based on partition values, 
> since we now have that information.






[jira] [Created] (SPARK-47612) Improve picking the side of partially clustered distribution according to partition size

2024-03-26 Thread Qi Zhu (Jira)
Qi Zhu created SPARK-47612:
--

 Summary: Improve picking the side of partially clustered 
distribution according to partition size
 Key: SPARK-47612
 URL: https://issues.apache.org/jira/browse/SPARK-47612
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 4.0.0
Reporter: Qi Zhu


Currently we pick the side of partially clustered distribution:


Using plan statistics to determine which side of the join to fully
cluster partition values.

We can optimize this to use partition size, since we now have that information.






[jira] [Updated] (SPARK-47284) We should ensure enough parallelism when ShuffleExchangeLike joins with specs without shuffle

2024-03-05 Thread Qi Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qi Zhu updated SPARK-47284:
---
Description: 
The following case was introduced by 
https://issues.apache.org/jira/browse/SPARK-35703

// When choosing specs, we should consider those children with no 
// `ShuffleExchangeLike` node first. For instance, if we have:
// A: (No_Exchange, 100) <---> B: (Exchange, 120)
// it's better to pick A and change B to (Exchange, 100) instead of picking B 
// and insert a new shuffle for A.


*But we'd better improve this in some cases, for example:*
A: (No_Exchange, 2) <---> B: (Exchange, 100)

The current logic will change this to:
A: (No_Exchange, 2) <---> B: (Exchange, 2)

This does not actually ensure enough parallelism, and I think it will hurt 
performance.
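A minimal sketch of the kind of guard this issue suggests (hypothetical names and threshold, not the actual EnsureRequirements logic): reuse the shuffle-free child's partitioning only when it still provides reasonable parallelism, and otherwise fall back to the default shuffle partition number.

{code:scala}
// Hypothetical summary of one join child: whether it already has a
// ShuffleExchangeLike node, and how many partitions its spec produces.
case class ChildSpec(hasShuffle: Boolean, numPartitions: Int)

/** Decide the target partition number for both sides of the join. */
def chooseNumPartitions(
    a: ChildSpec,
    b: ChildSpec,
    defaultParallelism: Int,
    minRatio: Double = 0.5): Int = {
  val preferred = if (!a.hasShuffle) a else b  // prefer the shuffle-free child
  if (preferred.numPartitions >= (defaultParallelism * minRatio).toInt) {
    preferred.numPartitions  // enough parallelism: keep the shuffle-free side
  } else {
    // A: (No_Exchange, 2) <---> B: (Exchange, 100) would otherwise collapse
    // both sides to 2 partitions; re-shuffle to the default instead.
    defaultParallelism
  }
}
{code}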

  was:
The following case was introduced by 
https://issues.apache.org/jira/browse/SPARK-35703


// When choosing specs, we should consider those children with no 
// `ShuffleExchangeLike` node first. For instance, if we have:
// A: (No_Exchange, 100) <---> B: (Exchange, 120)
// it's better to pick A and change B to (Exchange, 100) instead of picking B 
// and insert a new shuffle for A.



But we'd better improve this in some cases, for example:
A: (No_Exchange, 2) <---> B: (Exchange, 100)


The current logic will change this to:
A: (No_Exchange, 2) <---> B: (Exchange, 2)

This does not actually ensure enough parallelism, and I think it will hurt 
performance.


> We should ensure enough parallelism when ShuffleExchangeLike joins with specs 
> without shuffle
> 
>
> Key: SPARK-47284
> URL: https://issues.apache.org/jira/browse/SPARK-47284
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Qi Zhu
>Priority: Major
>
> The following case was introduced by 
> https://issues.apache.org/jira/browse/SPARK-35703
> // When choosing specs, we should consider those children with no 
> // `ShuffleExchangeLike` node first. For instance, if we have:
> // A: (No_Exchange, 100) <---> B: (Exchange, 120)
> // it's better to pick A and change B to (Exchange, 100) instead of picking B 
> // and insert a new shuffle for A.
> *But we'd better improve this in some cases, for example:*
> A: (No_Exchange, 2) <---> B: (Exchange, 100)
> The current logic will change this to:
> A: (No_Exchange, 2) <---> B: (Exchange, 2)
> This does not actually ensure enough parallelism, and I think it will hurt 
> performance.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-47284) We should ensure enough parallelism when ShuffleExchangeLike joins with specs without shuffle

2024-03-05 Thread Qi Zhu (Jira)
Qi Zhu created SPARK-47284:
--

 Summary: We should ensure enough parallelism when 
ShuffleExchangeLike joins with specs without shuffle
 Key: SPARK-47284
 URL: https://issues.apache.org/jira/browse/SPARK-47284
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 4.0.0
Reporter: Qi Zhu


The following case was introduced by 
https://issues.apache.org/jira/browse/SPARK-35703


// When choosing specs, we should consider those children with no 
// `ShuffleExchangeLike` node first. For instance, if we have:
// A: (No_Exchange, 100) <---> B: (Exchange, 120)
// it's better to pick A and change B to (Exchange, 100) instead of picking B 
// and insert a new shuffle for A.



But we'd better improve this in some cases, for example:
A: (No_Exchange, 2) <---> B: (Exchange, 100)


The current logic will change this to:
A: (No_Exchange, 2) <---> B: (Exchange, 2)

This does not actually ensure enough parallelism, and I think it will hurt 
performance.






[jira] [Resolved] (SPARK-44698) Create table like another table should also copy table stats.

2023-08-07 Thread Qi Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qi Zhu resolved SPARK-44698.

Resolution: Not A Problem

Sorry, I misunderstood; CREATE TABLE LIKE doesn't actually need to copy the data!

> Create table like another table should also copy table stats.
> ---
>
> Key: SPARK-44698
> URL: https://issues.apache.org/jira/browse/SPARK-44698
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.4.1, 4.0.0
>Reporter: Qi Zhu
>Priority: Major
>
> For example:
> describe table extended tbl;
> col0                    int
> col1                    int
> col2                    int
> col3                    int
> Detailed Table Information
> Catalog                 spark_catalog
> Database                default
> Table                   tbl
> Owner                   zhuqi
> Created Time            Mon Aug 07 14:02:30 CST 2023
> Last Access             UNKNOWN
> Created By              Spark 4.0.0-SNAPSHOT
> Type                    MANAGED
> Provider                hive
> Table Properties        [transient_lastDdlTime=1691388473]
> Statistics              30 bytes
> Location                file:/Users/zhuqi/spark/spark/spark-warehouse/tbl
> Serde Library           org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
> InputFormat             org.apache.hadoop.mapred.TextInputFormat
> OutputFormat            
> org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
> Storage Properties      [serialization.format=1]
> Partition Provider      Catalog
> Time taken: 0.032 seconds, Fetched 23 row(s)
> create table tbl2 like tbl;
> 23/08/07 14:14:07 WARN HiveMetaStore: Location: 
> file:/Users/zhuqi/spark/spark/spark-warehouse/tbl2 specified for non-external 
> table:tbl2
> Time taken: 0.098 seconds
> spark-sql (default)> describe table extended tbl2;
> col0                    int
> col1                    int
> col2                    int
> col3                    int
> Detailed Table Information
> Catalog                 spark_catalog
> Database                default
> Table                   tbl2
> Owner                   zhuqi
> Created Time            Mon Aug 07 14:14:07 CST 2023
> Last Access             UNKNOWN
> Created By              Spark 4.0.0-SNAPSHOT
> Type                    MANAGED
> Provider                hive
> Table Properties        [transient_lastDdlTime=1691388847]
> Location                file:/Users/zhuqi/spark/spark/spark-warehouse/tbl2
> Serde Library           org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
> InputFormat             org.apache.hadoop.mapred.TextInputFormat
> OutputFormat            
> org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
> Storage Properties      [serialization.format=1]
> Partition Provider      Catalog
> Time taken: 0.03 seconds, Fetched 22 row(s)
> The table stats are missing.






[jira] [Updated] (SPARK-44698) Create table like another table should also copy table stats.

2023-08-06 Thread Qi Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qi Zhu updated SPARK-44698:
---
Description: 
For example:
describe table extended tbl;

col0                    int
col1                    int
col2                    int
col3                    int

Detailed Table Information
Catalog                 spark_catalog
Database                default
Table                   tbl
Owner                   zhuqi
Created Time            Mon Aug 07 14:02:30 CST 2023
Last Access             UNKNOWN
Created By              Spark 4.0.0-SNAPSHOT
Type                    MANAGED
Provider                hive
Table Properties        [transient_lastDdlTime=1691388473]
Statistics              30 bytes
Location                file:/Users/zhuqi/spark/spark/spark-warehouse/tbl
Serde Library           org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
InputFormat             org.apache.hadoop.mapred.TextInputFormat
OutputFormat            
org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
Storage Properties      [serialization.format=1]
Partition Provider      Catalog
Time taken: 0.032 seconds, Fetched 23 row(s)

create table tbl2 like tbl;
23/08/07 14:14:07 WARN HiveMetaStore: Location: 
file:/Users/zhuqi/spark/spark/spark-warehouse/tbl2 specified for non-external 
table:tbl2
Time taken: 0.098 seconds
spark-sql (default)> describe table extended tbl2;
col0                    int
col1                    int
col2                    int
col3                    int

Detailed Table Information
Catalog                 spark_catalog
Database                default
Table                   tbl2
Owner                   zhuqi
Created Time            Mon Aug 07 14:14:07 CST 2023
Last Access             UNKNOWN
Created By              Spark 4.0.0-SNAPSHOT
Type                    MANAGED
Provider                hive
Table Properties        [transient_lastDdlTime=1691388847]
Location                file:/Users/zhuqi/spark/spark/spark-warehouse/tbl2
Serde Library           org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
InputFormat             org.apache.hadoop.mapred.TextInputFormat
OutputFormat            
org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
Storage Properties      [serialization.format=1]
Partition Provider      Catalog
Time taken: 0.03 seconds, Fetched 22 row(s)

The table stats are missing.
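For reference, a hedged workaround sketch (plain Spark SQL issued from Scala; `spark` is assumed to be the usual SparkSession): since CREATE TABLE ... LIKE creates an empty table, statistics can simply be recomputed on the new table once it holds data.

{code:scala}
// Recompute statistics on the new table rather than copying them from the
// source table; the copy (tbl2) starts empty, so copied stats would be stale.
spark.sql("CREATE TABLE tbl2 LIKE tbl")
// ... load data into tbl2 ...
spark.sql("ANALYZE TABLE tbl2 COMPUTE STATISTICS")
// DESCRIBE TABLE EXTENDED should now show a Statistics row for tbl2.
spark.sql("DESCRIBE TABLE EXTENDED tbl2").show(truncate = false)
{code}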

  was:
For example:
describe table extended tbl;

col0                    int
col1                    int
col2                    int
col3                    int

# Detailed Table Information
Catalog                 spark_catalog
Database                default
Table                   tbl
Owner                   zhuqi
Created Time            Mon Aug 07 14:02:30 CST 2023
Last Access             UNKNOWN
Created By              Spark 4.0.0-SNAPSHOT
Type                    MANAGED
Provider                hive
Table Properties        [transient_lastDdlTime=1691388473]
Statistics              30 bytes
Location                file:/Users/zhuqi/spark/spark/spark-warehouse/tbl
Serde Library           org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
InputFormat             org.apache.hadoop.mapred.TextInputFormat
OutputFormat            
org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
Storage Properties      [serialization.format=1]
Partition Provider      Catalog
Time taken: 0.032 seconds, Fetched 23 row(s)



create table tbl2 like tbl;
23/08/07 14:14:07 WARN HiveMetaStore: Location: 
file:/Users/zhuqi/spark/spark/spark-warehouse/tbl2 specified for non-external 
table:tbl2
Time taken: 0.098 seconds
spark-sql (default)> describe table extended tbl2;
col0                    int
col1                    int
col2                    int
col3                    int

# Detailed Table Information
Catalog                 spark_catalog
Database                default
Table                   tbl2
Owner                   zhuqi
Created Time            Mon Aug 07 14:14:07 CST 2023
Last Access             UNKNOWN
Created By              Spark 4.0.0-SNAPSHOT
Type                    MANAGED
Provider                hive
Table Properties        [transient_lastDdlTime=1691388847]
Location                file:/Users/zhuqi/spark/spark/spark-warehouse/tbl2
Serde Library           org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
InputFormat             org.apache.hadoop.mapred.TextInputFormat
OutputFormat            
org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
Storage Properties      [serialization.format=1]
Partition Provider      Catalog
Time taken: 0.03 seconds, Fetched 22 row(s)

The table stats are missing.


> Create table like another table should also copy table stats.
> ---
>
>  

[jira] [Updated] (SPARK-44698) Create table like another table should also copy table stats.

2023-08-06 Thread Qi Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qi Zhu updated SPARK-44698:
---
Description: 
For example:
describe table extended tbl;

col0                    int
col1                    int
col2                    int
col3                    int

# Detailed Table Information
Catalog                 spark_catalog
Database                default
Table                   tbl
Owner                   zhuqi
Created Time            Mon Aug 07 14:02:30 CST 2023
Last Access             UNKNOWN
Created By              Spark 4.0.0-SNAPSHOT
Type                    MANAGED
Provider                hive
Table Properties        [transient_lastDdlTime=1691388473]
Statistics              30 bytes
Location                file:/Users/zhuqi/spark/spark/spark-warehouse/tbl
Serde Library           org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
InputFormat             org.apache.hadoop.mapred.TextInputFormat
OutputFormat            
org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
Storage Properties      [serialization.format=1]
Partition Provider      Catalog
Time taken: 0.032 seconds, Fetched 23 row(s)



create table tbl2 like tbl;
23/08/07 14:14:07 WARN HiveMetaStore: Location: 
file:/Users/zhuqi/spark/spark/spark-warehouse/tbl2 specified for non-external 
table:tbl2
Time taken: 0.098 seconds
spark-sql (default)> describe table extended tbl2;
col0                    int
col1                    int
col2                    int
col3                    int

# Detailed Table Information
Catalog                 spark_catalog
Database                default
Table                   tbl2
Owner                   zhuqi
Created Time            Mon Aug 07 14:14:07 CST 2023
Last Access             UNKNOWN
Created By              Spark 4.0.0-SNAPSHOT
Type                    MANAGED
Provider                hive
Table Properties        [transient_lastDdlTime=1691388847]
Location                file:/Users/zhuqi/spark/spark/spark-warehouse/tbl2
Serde Library           org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
InputFormat             org.apache.hadoop.mapred.TextInputFormat
OutputFormat            
org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
Storage Properties      [serialization.format=1]
Partition Provider      Catalog
Time taken: 0.03 seconds, Fetched 22 row(s)

The table stats are missing.

> Create table like another table should also copy table stats.
> ---
>
> Key: SPARK-44698
> URL: https://issues.apache.org/jira/browse/SPARK-44698
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.4.1, 4.0.0
>Reporter: Qi Zhu
>Priority: Major
>
> For example:
> describe table extended tbl;
> col0                    int
> col1                    int
> col2                    int
> col3                    int
> # Detailed Table Information
> Catalog                 spark_catalog
> Database                default
> Table                   tbl
> Owner                   zhuqi
> Created Time            Mon Aug 07 14:02:30 CST 2023
> Last Access             UNKNOWN
> Created By              Spark 4.0.0-SNAPSHOT
> Type                    MANAGED
> Provider                hive
> Table Properties        [transient_lastDdlTime=1691388473]
> Statistics              30 bytes
> Location                file:/Users/zhuqi/spark/spark/spark-warehouse/tbl
> Serde Library           org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
> InputFormat             org.apache.hadoop.mapred.TextInputFormat
> OutputFormat            
> org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
> Storage Properties      [serialization.format=1]
> Partition Provider      Catalog
> Time taken: 0.032 seconds, Fetched 23 row(s)
> create table tbl2 like tbl;
> 23/08/07 14:14:07 WARN HiveMetaStore: Location: 
> file:/Users/zhuqi/spark/spark/spark-warehouse/tbl2 specified for non-external 
> table:tbl2
> Time taken: 0.098 seconds
> spark-sql (default)> describe table extended tbl2;
> col0                    int
> col1                    int
> col2                    int
> col3                    int
> # Detailed Table Information
> Catalog                 spark_catalog
> Database                default
> Table                   tbl2
> Owner                   zhuqi
> Created Time            Mon Aug 07 14:14:07 CST 2023
> Last Access             UNKNOWN
> Created By              Spark 4.0.0-SNAPSHOT
> Type                    MANAGED
> Provider                hive
> Table Properties        [transient_lastDdlTime=1691388847]
> Location                file:/Users/zhuqi/spark/spark/spark-warehouse/tbl2
> Serde Library           org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
> InputFormat             org.apache.hadoop.mapred.TextInputFormat
> OutputFormat            
> org.apache.h

[jira] [Created] (SPARK-44698) Create table like another table should also copy table stats.

2023-08-06 Thread Qi Zhu (Jira)
Qi Zhu created SPARK-44698:
--

 Summary: Create table like another table should also copy table 
stats.
 Key: SPARK-44698
 URL: https://issues.apache.org/jira/browse/SPARK-44698
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.4.1, 4.0.0
Reporter: Qi Zhu









[jira] [Updated] (SPARK-35426) When addMergerLocation exceeds maxRetainedMergerLocations, we should remove the merger based on merged shuffle data size.

2021-08-17 Thread Qi Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qi Zhu updated SPARK-35426:
---
Description: 
Currently, when addMergerLocation exceeds maxRetainedMergerLocations, we just 
remove the oldest merger; we'd better remove mergers based on merged shuffle 
data size. 

We should remove the mergers with the largest amount of merged shuffle data, so 
that the remaining mergers have potentially more disk space to store new merged 
shuffle data.
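A minimal sketch of size-based eviction (hypothetical types, not the actual merger-location tracking code): when the retained list is full, evict the merger holding the most merged shuffle data instead of the oldest entry.

{code:scala}
import scala.collection.mutable

// Hypothetical view of a merger location and how much merged data it holds.
case class Merger(host: String, mergedBytes: Long)

class MergerLocations(maxRetained: Int) {
  private val mergers = mutable.Buffer.empty[Merger]

  def add(m: Merger): Unit = {
    if (mergers.size >= maxRetained) {
      // Size-based eviction rather than FIFO: drop the fullest merger so the
      // remaining ones keep more free disk for new merged shuffle data.
      mergers -= mergers.maxBy(_.mergedBytes)
    }
    mergers += m
  }

  def all: Seq[Merger] = mergers.toSeq
}
{code}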

  was:
Currently, when addMergerLocation exceeds maxRetainedMergerLocations, we just 
remove the oldest merger; we'd better remove mergers based on merged shuffle 
data size. 

 


> When addMergerLocation exceeds maxRetainedMergerLocations, we should 
> remove the merger based on merged shuffle data size.
> -
>
> Key: SPARK-35426
> URL: https://issues.apache.org/jira/browse/SPARK-35426
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 3.2.0
>Reporter: Qi Zhu
>Priority: Major
>
> Currently, when addMergerLocation exceeds maxRetainedMergerLocations, we just 
> remove the oldest merger; we'd better remove mergers based on merged shuffle 
> data size. 
> We should remove the mergers with the largest amount of merged shuffle data, 
> so that the remaining mergers have potentially more disk space to store new 
> merged shuffle data.






[jira] [Updated] (SPARK-35426) When addMergerLocation exceeds maxRetainedMergerLocations, we should remove the merger based on merged shuffle data size.

2021-08-17 Thread Qi Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qi Zhu updated SPARK-35426:
---
Description: 
Currently, when addMergerLocation exceeds maxRetainedMergerLocations, we just 
remove the oldest merger; we'd better remove mergers based on merged shuffle 
data size. 

 

  was:
Currently, when addMergerLocation exceeds maxRetainedMergerLocations, we just 
remove the oldest merger; we'd better remove mergers based on merged shuffle 
data size. 

The oldest merger may hold a large amount of merged shuffle data, so removing 
it would not be a good choice.


> When addMergerLocation exceeds maxRetainedMergerLocations, we should 
> remove the merger based on merged shuffle data size.
> -
>
> Key: SPARK-35426
> URL: https://issues.apache.org/jira/browse/SPARK-35426
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 3.2.0
>Reporter: Qi Zhu
>Priority: Major
>
> Currently, when addMergerLocation exceeds maxRetainedMergerLocations, we just 
> remove the oldest merger; we'd better remove mergers based on merged shuffle 
> data size. 
>  






[jira] [Commented] (SPARK-35548) Handling the "new attempt has started" error message in BlockPushErrorHandler in the client

2021-08-16 Thread Qi Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17400091#comment-17400091
 ] 

Qi Zhu commented on SPARK-35548:


Hi [~mridulm80],

[~Qi Zhu] is not me, just someone with the same name; [~zhuqi] is me. 
SPARK-36344 also has the wrong assignee; could you please change it to the 
right one?

Thanks

> Handling the "new attempt has started" error message in BlockPushErrorHandler 
> in the client
> -
>
> Key: SPARK-35548
> URL: https://issues.apache.org/jira/browse/SPARK-35548
> Project: Spark
>  Issue Type: Sub-task
>  Components: Shuffle, Spark Core
>Affects Versions: 3.1.0
>Reporter: Ye Zhou
>Assignee: zhuqi
>Priority: Major
> Fix For: 3.2.0
>
>
> In SPARK-33350, a new type of error message is introduced in 
> BlockPushErrorHandler which indicates the PushBlockStream message was received 
> after a new application attempt has started. This error message should be 
> correctly handled in the client without retrying the block push.






[jira] [Commented] (SPARK-35548) Handling the "new attempt has started" error message in BlockPushErrorHandler in the client

2021-08-16 Thread Qi Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17400029#comment-17400029
 ] 

Qi Zhu commented on SPARK-35548:


cc [~mridulm80]

My Apache id is:
zhuqi

full name:
Qi Zhu

Thanks.

> Handling the "new attempt has started" error message in BlockPushErrorHandler 
> in the client
> -
>
> Key: SPARK-35548
> URL: https://issues.apache.org/jira/browse/SPARK-35548
> Project: Spark
>  Issue Type: Sub-task
>  Components: Shuffle, Spark Core
>Affects Versions: 3.1.0
>Reporter: Ye Zhou
>Priority: Major
> Fix For: 3.2.0
>
>
> In SPARK-33350, a new type of error message is introduced in 
> BlockPushErrorHandler which indicates the PushBlockStream message was received 
> after a new application attempt has started. This error message should be 
> correctly handled in the client without retrying the block push.






[jira] [Commented] (SPARK-36344) Fix some typos in ShuffleBlockPusher class.

2021-07-29 Thread Qi Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17390233#comment-17390233
 ] 

Qi Zhu commented on SPARK-36344:


Hi [~hyukjin.kwon], the assignee zhuqi is not me, just a user with the same name. :P

> Fix some typos in ShuffleBlockPusher class.
> ---
>
> Key: SPARK-36344
> URL: https://issues.apache.org/jira/browse/SPARK-36344
> Project: Spark
>  Issue Type: Sub-task
>  Components: Shuffle, Spark Core
>Affects Versions: 3.2.0
>Reporter: Qi Zhu
>Assignee: zhuqi
>Priority: Trivial
> Fix For: 3.2.0
>
>
> I found some typos in the ShuffleBlockPusher class while studying the 
> push-based shuffle code, so I just want to help fix these minor typos.






[jira] [Updated] (SPARK-36344) Fix some typos in ShuffleBlockPusher class.

2021-07-29 Thread Qi Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qi Zhu updated SPARK-36344:
---
Parent: SPARK-33235
Issue Type: Sub-task  (was: Task)

> Fix some typos in ShuffleBlockPusher class.
> ---
>
> Key: SPARK-36344
> URL: https://issues.apache.org/jira/browse/SPARK-36344
> Project: Spark
>  Issue Type: Sub-task
>  Components: Shuffle, Spark Core
>Affects Versions: 3.2.0
>Reporter: Qi Zhu
>Priority: Trivial
>
> I found some typos in the ShuffleBlockPusher class while studying the 
> push-based shuffle code, so I just want to help fix these minor typos.






[jira] [Created] (SPARK-36344) Fix some typos in ShuffleBlockPusher class.

2021-07-29 Thread Qi Zhu (Jira)
Qi Zhu created SPARK-36344:
--

 Summary: Fix some typos in ShuffleBlockPusher class.
 Key: SPARK-36344
 URL: https://issues.apache.org/jira/browse/SPARK-36344
 Project: Spark
  Issue Type: Task
  Components: Shuffle, Spark Core
Affects Versions: 3.2.0
Reporter: Qi Zhu


I found some typos in the ShuffleBlockPusher class while studying the 
push-based shuffle code, so I just want to help fix these minor typos.






[jira] [Comment Edited] (SPARK-31314) Revert SPARK-29285 to fix shuffle regression caused by creating temporary file eagerly

2021-07-28 Thread Qi Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17388519#comment-17388519
 ] 

Qi Zhu edited comment on SPARK-31314 at 7/28/21, 7:29 AM:
--

cc [~XuanYuan] [~cloud_fan] [~Ngone51] 

Since this has been reverted, I am hitting disk failures in our production 
clusters; how can we handle the failed-disk problem without this change?

There are many disks in YARN clusters, but when one disk fails, we just retry 
the task. Can we avoid retrying on the same failed disk within a node? Or does 
Spark have some disk blacklist solution now?

And the reverted solution caused overhead for applications where many tasks 
don't actually create shuffle files; if we can find a workaround that avoids 
creating temp shuffle files when tasks don't need them, I still think we 
should handle this.

The logs are: 
{code:java}
DAGScheduler: ShuffleMapStage 521 (insertInto at Tools.scala:147) failed in 
4.995 s due to Job aborted due to stage failure: Task 30 in stage 521.0 failed 
4 times, most recent failure: Lost task 30.3 in stage 521.0 (TID 127941, 
** 91): java.io.FileNotFoundException: 
/data2/yarn/local/usercache/aa/appcache/*/blockmgr-eb5ca215-a7af-41be-87ee-89fd7e3b1de5/0e/temp_shuffle_45279ef1-5143-4632-9df0-d7ee1f50c026
 (Input/output error)
 at java.io.FileOutputStream.open0(Native Method)
 at java.io.FileOutputStream.open(FileOutputStream.java:270)
 at java.io.FileOutputStream.<init>(FileOutputStream.java:213)
 at 
org.apache.spark.storage.DiskBlockObjectWriter.initialize(DiskBlockObjectWriter.scala:103)
 at 
org.apache.spark.storage.DiskBlockObjectWriter.open(DiskBlockObjectWriter.scala:116)
 at 
org.apache.spark.storage.DiskBlockObjectWriter.write(DiskBlockObjectWriter.scala:237)
 at 
org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:151)
 at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
 at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55)
 at org.apache.spark.scheduler.Task.run(Task.scala:121)
 at 
org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
 at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
 at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
 at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
 at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
 at java.lang.Thread.run(Thread.java:745)
{code}
Thanks.
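A minimal sketch of the workaround the comment asks for (a hypothetical writer, not Spark's actual DiskBlockObjectWriter): defer creating the temp shuffle file until the first byte is written, so tasks that never produce shuffle data pay no file-creation cost and never touch a failed disk.

{code:scala}
import java.io.{File, FileOutputStream, OutputStream}

// Hypothetical lazy writer: the temp shuffle file is only created on the
// first actual write, so shuffle-less tasks never open a file on disk.
class LazyTempShuffleWriter(file: File) {
  private var out: OutputStream = _

  private def ensureOpen(): OutputStream = {
    if (out == null) {
      out = new FileOutputStream(file)  // disk failure surfaces here, lazily
    }
    out
  }

  def write(bytes: Array[Byte]): Unit = ensureOpen().write(bytes)

  def close(): Unit = if (out != null) out.close()
}
{code}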

 

 


was (Author: zhuqi):
cc [~XuanYuan] [~cloud_fan] [~wuyi]

Since this has been reverted, I am hitting disk failures in our production 
clusters; how can we handle the failed-disk problem without this change?

There are many disks in YARN clusters, but when one disk fails, we just retry 
the task. Can we avoid retrying on the same failed disk within a node? Or does 
Spark have some disk blacklist solution now?

And the reverted solution caused overhead for applications where many tasks 
don't actually create shuffle files; if we can find a workaround that avoids 
creating temp shuffle files when tasks don't need them, I still think we 
should handle this.

The logs are: 
{code:java}
DAGScheduler: ShuffleMapStage 521 (insertInto at Tools.scala:147) failed in 
4.995 s due to Job aborted due to stage failure: Task 30 in stage 521.0 failed 
4 times, most recent failure: Lost task 30.3 in stage 521.0 (TID 127941, 
** 91): java.io.FileNotFoundException: 
/data2/yarn/local/usercache/aa/appcache/*/blockmgr-eb5ca215-a7af-41be-87ee-89fd7e3b1de5/0e/temp_shuffle_45279ef1-5143-4632-9df0-d7ee1f50c026
 (Input/output error)
 at java.io.FileOutputStream.open0(Native Method)
 at java.io.FileOutputStream.open(FileOutputStream.java:270)
 at java.io.FileOutputStream.<init>(FileOutputStream.java:213)
 at 
org.apache.spark.storage.DiskBlockObjectWriter.initialize(DiskBlockObjectWriter.scala:103)
 at 
org.apache.spark.storage.DiskBlockObjectWriter.open(DiskBlockObjectWriter.scala:116)
 at 
org.apache.spark.storage.DiskBlockObjectWriter.write(DiskBlockObjectWriter.scala:237)
 at 
org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:151)
 at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
 at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55)
 at org.apache.spark.scheduler.Task.run(Task.scala:121)
 at 
org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
 at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
 at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
 at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
 at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
 at java.lang.Thread.run(Thread.java:745)
{code}

[jira] [Comment Edited] (SPARK-31314) Revert SPARK-29285 to fix shuffle regression caused by creating temporary file eagerly

2021-07-28 Thread Qi Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17388519#comment-17388519
 ] 

Qi Zhu edited comment on SPARK-31314 at 7/28/21, 7:28 AM:
--

cc [~XuanYuan] [~cloud_fan] [~wuyi]

Since this has been reverted, I am hitting disk failures in our production 
clusters; how can we handle the failed-disk problem without this change?

There are many disks in YARN clusters, but when one disk fails, we just retry 
the task. Can we avoid retrying on the same failed disk within a node? Or does 
Spark have some disk blacklist solution now?

And the reverted solution caused overhead for applications where many tasks 
don't actually create shuffle files; if we can find a workaround that avoids 
creating temp shuffle files when tasks don't need them, I still think we 
should handle this.

The logs are: 
{code:java}
DAGScheduler: ShuffleMapStage 521 (insertInto at Tools.scala:147) failed in 
4.995 s due to Job aborted due to stage failure: Task 30 in stage 521.0 failed 
4 times, most recent failure: Lost task 30.3 in stage 521.0 (TID 127941, 
** 91): java.io.FileNotFoundException: 
/data2/yarn/local/usercache/aa/appcache/*/blockmgr-eb5ca215-a7af-41be-87ee-89fd7e3b1de5/0e/temp_shuffle_45279ef1-5143-4632-9df0-d7ee1f50c026
 (Input/output error)
 at java.io.FileOutputStream.open0(Native Method)
 at java.io.FileOutputStream.open(FileOutputStream.java:270)
 at java.io.FileOutputStream.<init>(FileOutputStream.java:213)
 at 
org.apache.spark.storage.DiskBlockObjectWriter.initialize(DiskBlockObjectWriter.scala:103)
 at 
org.apache.spark.storage.DiskBlockObjectWriter.open(DiskBlockObjectWriter.scala:116)
 at 
org.apache.spark.storage.DiskBlockObjectWriter.write(DiskBlockObjectWriter.scala:237)
 at 
org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:151)
 at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
 at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55)
 at org.apache.spark.scheduler.Task.run(Task.scala:121)
 at 
org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
 at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
 at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
 at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
 at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
 at java.lang.Thread.run(Thread.java:745)
{code}
Thanks.

 

 


was (Author: zhuqi):
cc [~XuanYuan] [~cloud_fan]

Since this has been reverted, I am hitting disk failures in our production 
clusters; how can we handle the failed-disk problem without this change?

There are many disks in YARN clusters, but when one disk fails, we just retry 
the task. Can we avoid retrying on the same failed disk within a node? Or does 
Spark have some disk blacklist solution now?

And the reverted solution caused overhead for applications where many tasks 
don't actually create shuffle files; if we can find a workaround that avoids 
creating temp shuffle files when tasks don't need them, I still think we 
should handle this.

The logs are: 
{code:java}
DAGScheduler: ShuffleMapStage 521 (insertInto at Tools.scala:147) failed in 
4.995 s due to Job aborted due to stage failure: Task 30 in stage 521.0 failed 
4 times, most recent failure: Lost task 30.3 in stage 521.0 (TID 127941, 
** 91): java.io.FileNotFoundException: 
/data2/yarn/local/usercache/aa/appcache/*/blockmgr-eb5ca215-a7af-41be-87ee-89fd7e3b1de5/0e/temp_shuffle_45279ef1-5143-4632-9df0-d7ee1f50c026
 (Input/output error)
 at java.io.FileOutputStream.open0(Native Method)
 at java.io.FileOutputStream.open(FileOutputStream.java:270)
 at java.io.FileOutputStream.<init>(FileOutputStream.java:213)
 at 
org.apache.spark.storage.DiskBlockObjectWriter.initialize(DiskBlockObjectWriter.scala:103)
 at 
org.apache.spark.storage.DiskBlockObjectWriter.open(DiskBlockObjectWriter.scala:116)
 at 
org.apache.spark.storage.DiskBlockObjectWriter.write(DiskBlockObjectWriter.scala:237)
 at 
org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:151)
 at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
 at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55)
 at org.apache.spark.scheduler.Task.run(Task.scala:121)
 at 
org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
 at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
 at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
 at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
 at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
 at java.lang.Thread.run(Thread.java:745)
{code}
Thanks.


[jira] [Commented] (SPARK-31314) Revert SPARK-29285 to fix shuffle regression caused by creating temporary file eagerly

2021-07-28 Thread Qi Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17388519#comment-17388519
 ] 

Qi Zhu commented on SPARK-31314:


cc [~XuanYuan] [~cloud_fan]

Since this has been reverted, I am hitting disk failures in our production 
clusters; how can we handle the failed-disk problem without this change?

There are many disks in YARN clusters, but when one disk fails, we just retry 
the task. Can we avoid retrying on the same failed disk within a node? Or does 
Spark have some disk blacklist solution now?

And the reverted solution caused overhead for applications where many tasks 
don't actually create shuffle files; if we can find a workaround that avoids 
creating temp shuffle files when tasks don't need them, I still think we 
should handle this.

The logs are: 
{code:java}
DAGScheduler: ShuffleMapStage 521 (insertInto at Tools.scala:147) failed in 
4.995 s due to Job aborted due to stage failure: Task 30 in stage 521.0 failed 
4 times, most recent failure: Lost task 30.3 in stage 521.0 (TID 127941, 
** 91): java.io.FileNotFoundException: 
/data2/yarn/local/usercache/aa/appcache/*/blockmgr-eb5ca215-a7af-41be-87ee-89fd7e3b1de5/0e/temp_shuffle_45279ef1-5143-4632-9df0-d7ee1f50c026
 (Input/output error)
 at java.io.FileOutputStream.open0(Native Method)
 at java.io.FileOutputStream.open(FileOutputStream.java:270)
 at java.io.FileOutputStream.<init>(FileOutputStream.java:213)
 at 
org.apache.spark.storage.DiskBlockObjectWriter.initialize(DiskBlockObjectWriter.scala:103)
 at 
org.apache.spark.storage.DiskBlockObjectWriter.open(DiskBlockObjectWriter.scala:116)
 at 
org.apache.spark.storage.DiskBlockObjectWriter.write(DiskBlockObjectWriter.scala:237)
 at 
org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:151)
 at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
 at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55)
 at org.apache.spark.scheduler.Task.run(Task.scala:121)
 at 
org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
 at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
 at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
 at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
 at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
 at java.lang.Thread.run(Thread.java:745)
{code}
Thanks.

 

 

> Revert SPARK-29285 to fix shuffle regression caused by creating temporary 
> file eagerly
> --
>
> Key: SPARK-31314
> URL: https://issues.apache.org/jira/browse/SPARK-31314
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.0.0
>Reporter: Yuanjian Li
>Assignee: Yuanjian Li
>Priority: Major
> Fix For: 3.0.0
>
>
> In SPARK-29285, we changed to create shuffle temporary files eagerly. This 
> helps avoid failing the entire task in the scenario of occasional disk 
> failure.
> But for applications where many tasks don't actually create shuffle files, 
> it caused overhead. See the benchmark below:
> Env: Spark local-cluster[2, 4, 19968], each queries run 5 round, each round 5 
> times.
> Data: TPC-DS scale=99 generate by spark-tpcds-datagen
> Results:
> || ||Base||Revert||
> |Q20|Vector(4.096865667, 2.76231748, 2.722007606, 2.514433591, 2.400373579) 
> Median 2.722007606|Vector(3.763185446, 2.586498463, 2.593472842, 2.320522846, 
> 2.224627274) Median 2.586498463|
> |Q33|Vector(5.872176321, 4.854397586, 4.568787136, 4.393378146, 4.423996818) 
> Median 4.568787136|Vector(5.38746785, 4.361236877, 4.082311276, 3.867206824, 
> 3.783188024) Median 4.082311276|
> |Q52|Vector(3.978870321, 3.225437871, 3.282411608, 2.869674887, 2.644490664) 
> Median 3.225437871|Vector(4.000381522, 3.196025108, 3.248787619, 2.767444508, 
> 2.606163423) Median 3.196025108|
> |Q56|Vector(6.238045133, 4.820535173, 4.609965579, 4.313509894, 4.221256227) 
> Median 4.609965579|Vector(6.241611339, 4.225592467, 4.195202502, 3.757085755, 
> 3.657525982) Median 4.195202502|






[jira] [Commented] (SPARK-35426) When addMergerLocation exceeds maxRetainedMergerLocations, we should remove the merger based on merged shuffle data size.

2021-06-14 Thread Qi Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17363318#comment-17363318
 ] 

Qi Zhu commented on SPARK-35426:


Thanks [~mshen] for clarifying; I will check the corresponding code. 

> When addMergerLocation exceeds maxRetainedMergerLocations, we should 
> remove the merger based on merged shuffle data size.
> -
>
> Key: SPARK-35426
> URL: https://issues.apache.org/jira/browse/SPARK-35426
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 3.2.0
>Reporter: Qi Zhu
>Priority: Major
>
> Currently, when addMergerLocation exceeds maxRetainedMergerLocations, we just 
> remove the oldest merger; we'd better remove mergers based on merged shuffle 
> data size. 
> The oldest merger may hold a large amount of merged shuffle data, so removing 
> it would not be a good choice.






[jira] [Comment Edited] (SPARK-35426) When addMergerLocation exceeds maxRetainedMergerLocations, we should remove the merger based on merged shuffle data size.

2021-06-08 Thread Qi Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17359805#comment-17359805
 ] 

Qi Zhu edited comment on SPARK-35426 at 6/9/21, 6:39 AM:
-

Thanks [~mshen] for the reply.

I mean we can remove the mergers that hold less shuffle data; when scheduling 
reduce tasks to fetch shuffle data, we can make better use of local data when 
mergers hold more shuffle data.

 

 


was (Author: zhuqi):
Thanks [~mshen] for the reply.

I mean we can remove the mergers that hold less shuffle data; when scheduling 
reduce tasks to fetch shuffle data, we can make better use of local data.

 

 

> When addMergerLocation exceeds maxRetainedMergerLocations, we should 
> remove the merger based on merged shuffle data size.
> -
>
> Key: SPARK-35426
> URL: https://issues.apache.org/jira/browse/SPARK-35426
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 3.2.0
>Reporter: Qi Zhu
>Priority: Major
>
> Currently, when addMergerLocation exceeds maxRetainedMergerLocations, we just 
> remove the oldest merger; we'd better remove mergers based on merged shuffle 
> data size. 
> The oldest merger may hold a large amount of merged shuffle data, so removing 
> it would not be a good choice.






[jira] [Commented] (SPARK-35426) When addMergerLocation exceeds maxRetainedMergerLocations, we should remove the merger based on merged shuffle data size.

2021-06-08 Thread Qi Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17359805#comment-17359805
 ] 

Qi Zhu commented on SPARK-35426:


Thanks [~mshen] for the reply.

I mean we can remove the mergers that hold less shuffle data; when scheduling 
reduce tasks to fetch shuffle data, we can make better use of local data.

 

 

> When addMergerLocation exceeds maxRetainedMergerLocations, we should 
> remove the merger based on merged shuffle data size.
> -
>
> Key: SPARK-35426
> URL: https://issues.apache.org/jira/browse/SPARK-35426
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 3.2.0
>Reporter: Qi Zhu
>Priority: Major
>
> Currently, when addMergerLocation exceeds maxRetainedMergerLocations, we just 
> remove the oldest merger; we'd better remove mergers based on merged shuffle 
> data size. 
> The oldest merger may hold a large amount of merged shuffle data, so removing 
> it would not be a good choice.






[jira] [Updated] (SPARK-35426) When addMergerLocation exceeds maxRetainedMergerLocations, we should remove the merger based on merged shuffle data size.

2021-05-17 Thread Qi Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qi Zhu updated SPARK-35426:
---
Description: 
Currently, when addMergerLocation exceeds maxRetainedMergerLocations, we just 
remove the oldest merger; we'd better remove mergers based on merged shuffle 
data size. 

The oldest merger may hold a large amount of merged shuffle data, so removing 
it would not be a good choice.

  was:Now 


> When addMergerLocation exceeds maxRetainedMergerLocations, we should 
> remove the merger based on merged shuffle data size.
> -
>
> Key: SPARK-35426
> URL: https://issues.apache.org/jira/browse/SPARK-35426
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 3.2.0
>Reporter: Qi Zhu
>Priority: Major
>
> Currently, when addMergerLocation exceeds maxRetainedMergerLocations, we just 
> remove the oldest merger; we'd better remove mergers based on merged shuffle 
> data size. 
> The oldest merger may hold a large amount of merged shuffle data, so removing 
> it would not be a good choice.






[jira] [Updated] (SPARK-35426) When addMergerLocation exceeds maxRetainedMergerLocations, we should remove the merger based on merged shuffle data size.

2021-05-17 Thread Qi Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qi Zhu updated SPARK-35426:
---
Description: Now 

> When addMergerLocation exceeds maxRetainedMergerLocations, we should 
> remove the merger based on merged shuffle data size.
> -
>
> Key: SPARK-35426
> URL: https://issues.apache.org/jira/browse/SPARK-35426
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 3.2.0
>Reporter: Qi Zhu
>Priority: Major
>
> Now 






[jira] [Created] (SPARK-35426) When addMergerLocation exceeds maxRetainedMergerLocations, we should remove the merger based on merged shuffle data size.

2021-05-17 Thread Qi Zhu (Jira)
Qi Zhu created SPARK-35426:
--

 Summary: When addMergerLocation exceeds 
maxRetainedMergerLocations, we should remove the merger based on merged 
shuffle data size.
 Key: SPARK-35426
 URL: https://issues.apache.org/jira/browse/SPARK-35426
 Project: Spark
  Issue Type: Sub-task
  Components: Spark Core
Affects Versions: 3.2.0
Reporter: Qi Zhu





