[jira] [Updated] (SPARK-26513) Trigger GC on executor node idle

2019-01-02 Thread Saisai Shao (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao updated SPARK-26513:

Fix Version/s: (was: 3.0.0)

> Trigger GC on executor node idle
> 
>
> Key: SPARK-26513
> URL: https://issues.apache.org/jira/browse/SPARK-26513
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core
> Affects Versions: 2.3.0
> Reporter: Sandish Kumar HN
> Priority: Major
>
>  
> Correct me if I'm wrong.
>  *Stage:*
>       On a large cluster, the tasks of each stage are spread across several 
> executors. A few executors finish their tasks early and then wait for the 
> remaining tasks of the stage, which are running on other executor nodes in 
> the cluster. A stage is complete only when all of its tasks have finished, 
> and the next stage cannot start until then.
>  
> Why don't we trigger GC while an executor is idle, waiting for the remaining 
> tasks to finish? The executor has to wait anyway, which can take at least a 
> couple of seconds, whereas a GC should take less than 300 ms.
>  
> I have proposed a small code snippet that triggers GC when the executor has 
> no running tasks and its heap usage is above a given threshold.
> This could improve performance for long-running Spark jobs.
> We referred to this paper 
> [https://www.computer.org/csdl/proceedings/hipc/2016/5411/00/07839705.pdf] 
> and found performance improvements in our long-running Spark batch jobs.
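
The condition described in the issue (no running tasks, heap usage above a threshold) could be sketched in Java roughly as follows. This is not the snippet attached to the issue: `IdleGcTrigger`, `taskStarted`, `taskFinished`, and the threshold parameter are hypothetical names chosen for illustration, and a real change would have to hook into Spark's own executor task bookkeeping. Note that `System.gc()` is only a request, and the JVM ignores it when started with `-XX:+DisableExplicitGC`.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper for illustration only; not part of Spark.
final class IdleGcTrigger {
    private final AtomicInteger runningTasks = new AtomicInteger();
    private final MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
    // Assumed configuration value: fraction of the maximum heap, 0.0 to 1.0.
    private final double heapUsageThreshold;

    IdleGcTrigger(double heapUsageThreshold) {
        if (heapUsageThreshold < 0.0 || heapUsageThreshold > 1.0) {
            throw new IllegalArgumentException("threshold must be in [0, 1]");
        }
        this.heapUsageThreshold = heapUsageThreshold;
    }

    // Called when a task starts running on this executor.
    void taskStarted() {
        runningTasks.incrementAndGet();
    }

    // Called when a task ends; returns true if a GC was requested.
    boolean taskFinished() {
        int remaining = runningTasks.decrementAndGet();
        return remaining == 0 && maybeTriggerGc();
    }

    // Pure decision logic: idle executor and heap usage above the threshold.
    static boolean shouldTriggerGc(int running, long usedBytes, long maxBytes,
                                   double threshold) {
        if (running > 0 || maxBytes <= 0) {
            // Busy executor, or the JVM reports no defined maximum heap size.
            return false;
        }
        return (double) usedBytes / maxBytes > threshold;
    }

    private boolean maybeTriggerGc() {
        MemoryUsage heap = memory.getHeapMemoryUsage();
        if (shouldTriggerGc(runningTasks.get(), heap.getUsed(), heap.getMax(),
                            heapUsageThreshold)) {
            System.gc(); // a hint to the JVM, not a guaranteed collection
            return true;
        }
        return false;
    }
}
```

Keeping the decision in a static method separates the policy (idle plus threshold) from the JVM calls, so the policy can be checked without forcing a collection. Whether the resulting pause really stays under 300 ms depends on heap size and collector, which this sketch does not measure.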



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-26513) Trigger GC on executor node idle

2018-12-31 Thread Sandish Kumar HN (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandish Kumar HN updated SPARK-26513:
-
Description: 
After going through this paper 
[https://www.computer.org/csdl/proceedings/hipc/2016/5411/00/07839705.pdf] we 
found performance improvements spark speedup with 
Correct me if I'm wrong.
 *Stage:*
      On a large cluster, the tasks of each stage are spread across several 
executors. A few executors finish their tasks early and then wait for the 
remaining tasks of the stage, which are running on other executor nodes in the 
cluster. A stage is complete only when all of its tasks have finished, and the 
next stage cannot start until then.
 
Why don't we trigger GC while an executor is idle, waiting for the remaining 
tasks to finish? The executor has to wait anyway, which can take at least a 
couple of seconds, whereas a GC should take less than 300 ms.
 
I have proposed a small code snippet that triggers GC when the executor has no 
running tasks and its heap usage is above a given threshold.
This could improve performance for long-running Spark jobs.

  was:
Correct me if I'm wrong.
*Stage:*
      On a large cluster, each stage would have some executors. were a few 
executors would finish a couple of tasks first and wait for whole stage or 
remaining tasks to finish which are executed by different executors nodes in a 
cluster. a stage will only be completed when all tasks in a current stage 
finish its execution. and the next stage execution has to wait till all tasks 
of the current stage are completed. 
 
why don't we trigger GC, when the executor node is waiting for remaining tasks 
to finish, or executor Idle? anyways executor has to wait for the remaining 
tasks to finish which can at least take a couple of seconds. why don't we 
trigger GC? which will max take <300ms
 
I have proposed a small code snippet which triggers GC when running tasks are 
empty and heap usage in current executor node is more than the given threshold.
This could improve performance for long-running spark job's. 


> Trigger GC on executor node idle
> 
>
> Key: SPARK-26513
> URL: https://issues.apache.org/jira/browse/SPARK-26513
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core
> Affects Versions: 2.3.0
> Reporter: Sandish Kumar HN
> Priority: Major
> Fix For: 3.0.0
>
>
> After going through this paper 
> [https://www.computer.org/csdl/proceedings/hipc/2016/5411/00/07839705.pdf] we 
> found performance improvements spark speedup with 
> Correct me if I'm wrong.
>  *Stage:*
>       On a large cluster, the tasks of each stage are spread across several 
> executors. A few executors finish their tasks early and then wait for the 
> remaining tasks of the stage, which are running on other executor nodes in 
> the cluster. A stage is complete only when all of its tasks have finished, 
> and the next stage cannot start until then.
>  
> Why don't we trigger GC while an executor is idle, waiting for the remaining 
> tasks to finish? The executor has to wait anyway, which can take at least a 
> couple of seconds, whereas a GC should take less than 300 ms.
>  
> I have proposed a small code snippet that triggers GC when the executor has 
> no running tasks and its heap usage is above a given threshold.
> This could improve performance for long-running Spark jobs.






[jira] [Updated] (SPARK-26513) Trigger GC on executor node idle

2018-12-31 Thread Sandish Kumar HN (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandish Kumar HN updated SPARK-26513:
-
Description: 
 

Correct me if I'm wrong.
 *Stage:*
      On a large cluster, the tasks of each stage are spread across several 
executors. A few executors finish their tasks early and then wait for the 
remaining tasks of the stage, which are running on other executor nodes in the 
cluster. A stage is complete only when all of its tasks have finished, and the 
next stage cannot start until then.
 
Why don't we trigger GC while an executor is idle, waiting for the remaining 
tasks to finish? The executor has to wait anyway, which can take at least a 
couple of seconds, whereas a GC should take less than 300 ms.
 
I have proposed a small code snippet that triggers GC when the executor has no 
running tasks and its heap usage is above a given threshold.
This could improve performance for long-running Spark jobs.

We referred to this paper 
[https://www.computer.org/csdl/proceedings/hipc/2016/5411/00/07839705.pdf] and 
found performance improvements in our long-running Spark batch jobs.

  was:
After going through this paper 
[https://www.computer.org/csdl/proceedings/hipc/2016/5411/00/07839705.pdf] we 
found performance improvements in our long running spark batch job's 

Correct me if I'm wrong.
 *Stage:*
      On a large cluster, each stage would have some executors. were a few 
executors would finish a couple of tasks first and wait for whole stage or 
remaining tasks to finish which are executed by different executors nodes in a 
cluster. a stage will only be completed when all tasks in a current stage 
finish its execution. and the next stage execution has to wait till all tasks 
of the current stage are completed. 
 
why don't we trigger GC, when the executor node is waiting for remaining tasks 
to finish, or executor Idle? anyways executor has to wait for the remaining 
tasks to finish which can at least take a couple of seconds. why don't we 
trigger GC? which will max take <300ms
 
I have proposed a small code snippet which triggers GC when running tasks are 
empty and heap usage in current executor node is more than the given threshold.
This could improve performance for long-running spark job's. 


> Trigger GC on executor node idle
> 
>
> Key: SPARK-26513
> URL: https://issues.apache.org/jira/browse/SPARK-26513
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core
> Affects Versions: 2.3.0
> Reporter: Sandish Kumar HN
> Priority: Major
> Fix For: 3.0.0
>
>
>  
> Correct me if I'm wrong.
>  *Stage:*
>       On a large cluster, the tasks of each stage are spread across several 
> executors. A few executors finish their tasks early and then wait for the 
> remaining tasks of the stage, which are running on other executor nodes in 
> the cluster. A stage is complete only when all of its tasks have finished, 
> and the next stage cannot start until then.
>  
> Why don't we trigger GC while an executor is idle, waiting for the remaining 
> tasks to finish? The executor has to wait anyway, which can take at least a 
> couple of seconds, whereas a GC should take less than 300 ms.
>  
> I have proposed a small code snippet that triggers GC when the executor has 
> no running tasks and its heap usage is above a given threshold.
> This could improve performance for long-running Spark jobs.
> We referred to this paper 
> [https://www.computer.org/csdl/proceedings/hipc/2016/5411/00/07839705.pdf] 
> and found performance improvements in our long-running Spark batch jobs.






[jira] [Updated] (SPARK-26513) Trigger GC on executor node idle

2018-12-31 Thread Sandish Kumar HN (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandish Kumar HN updated SPARK-26513:
-
Description: 
 

Correct me if I'm wrong.
 *Stage:*
      On a large cluster, the tasks of each stage are spread across several 
executors. A few executors finish their tasks early and then wait for the 
remaining tasks of the stage, which are running on other executor nodes in the 
cluster. A stage is complete only when all of its tasks have finished, and the 
next stage cannot start until then.
 
Why don't we trigger GC while an executor is idle, waiting for the remaining 
tasks to finish? The executor has to wait anyway, which can take at least a 
couple of seconds, whereas a GC should take less than 300 ms.
 
I have proposed a small code snippet that triggers GC when the executor has no 
running tasks and its heap usage is above a given threshold.
This could improve performance for long-running Spark jobs.

We referred to this paper 
[https://www.computer.org/csdl/proceedings/hipc/2016/5411/00/07839705.pdf] and 
found performance improvements in our long-running Spark batch jobs.

  was:
 

Correct me if I'm wrong.
 *Stage:*
      On a large cluster, each stage would have some executors. were a few 
executors would finish a couple of tasks first and wait for whole stage or 
remaining tasks to finish which are executed by different executors nodes in a 
cluster. a stage will only be completed when all tasks in a current stage 
finish its execution. and the next stage execution has to wait till all tasks 
of the current stage are completed. 
 
why don't we trigger GC, when the executor node is waiting for remaining tasks 
to finish, or executor Idle? anyways executor has to wait for the remaining 
tasks to finish which can at least take a couple of seconds. why don't we 
trigger GC? which will max take <300ms
 
I have proposed a small code snippet which triggers GC when running tasks are 
empty and heap usage in current executor node is more than the given threshold.
This could improve performance for long-running spark job's. 

we refered this paper 
[https://www.computer.org/csdl/proceedings/hipc/2016/5411/00/07839705.pdf] we 
found performance improvements in our long running spark batch job's .


> Trigger GC on executor node idle
> 
>
> Key: SPARK-26513
> URL: https://issues.apache.org/jira/browse/SPARK-26513
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core
> Affects Versions: 2.3.0
> Reporter: Sandish Kumar HN
> Priority: Major
> Fix For: 3.0.0
>
>
>  
> Correct me if I'm wrong.
>  *Stage:*
>       On a large cluster, the tasks of each stage are spread across several 
> executors. A few executors finish their tasks early and then wait for the 
> remaining tasks of the stage, which are running on other executor nodes in 
> the cluster. A stage is complete only when all of its tasks have finished, 
> and the next stage cannot start until then.
>  
> Why don't we trigger GC while an executor is idle, waiting for the remaining 
> tasks to finish? The executor has to wait anyway, which can take at least a 
> couple of seconds, whereas a GC should take less than 300 ms.
>  
> I have proposed a small code snippet that triggers GC when the executor has 
> no running tasks and its heap usage is above a given threshold.
> This could improve performance for long-running Spark jobs.
> We referred to this paper 
> [https://www.computer.org/csdl/proceedings/hipc/2016/5411/00/07839705.pdf] 
> and found performance improvements in our long-running Spark batch jobs.






[jira] [Updated] (SPARK-26513) Trigger GC on executor node idle

2018-12-31 Thread Sandish Kumar HN (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandish Kumar HN updated SPARK-26513:
-
Description: 
After going through this paper 
[https://www.computer.org/csdl/proceedings/hipc/2016/5411/00/07839705.pdf], we 
found performance improvements in our long-running Spark batch jobs.

Correct me if I'm wrong.
 *Stage:*
      On a large cluster, the tasks of each stage are spread across several 
executors. A few executors finish their tasks early and then wait for the 
remaining tasks of the stage, which are running on other executor nodes in the 
cluster. A stage is complete only when all of its tasks have finished, and the 
next stage cannot start until then.
 
Why don't we trigger GC while an executor is idle, waiting for the remaining 
tasks to finish? The executor has to wait anyway, which can take at least a 
couple of seconds, whereas a GC should take less than 300 ms.
 
I have proposed a small code snippet that triggers GC when the executor has no 
running tasks and its heap usage is above a given threshold.
This could improve performance for long-running Spark jobs.

  was:
After going through this paper 
[https://www.computer.org/csdl/proceedings/hipc/2016/5411/00/07839705.pdf] we 
found performance improvments spark speedup with 
Correct me if I'm wrong.
 *Stage:*
      On a large cluster, each stage would have some executors. were a few 
executors would finish a couple of tasks first and wait for whole stage or 
remaining tasks to finish which are executed by different executors nodes in a 
cluster. a stage will only be completed when all tasks in a current stage 
finish its execution. and the next stage execution has to wait till all tasks 
of the current stage are completed. 
 
why don't we trigger GC, when the executor node is waiting for remaining tasks 
to finish, or executor Idle? anyways executor has to wait for the remaining 
tasks to finish which can at least take a couple of seconds. why don't we 
trigger GC? which will max take <300ms
 
I have proposed a small code snippet which triggers GC when running tasks are 
empty and heap usage in current executor node is more than the given threshold.
This could improve performance for long-running spark job's. 


> Trigger GC on executor node idle
> 
>
> Key: SPARK-26513
> URL: https://issues.apache.org/jira/browse/SPARK-26513
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core
> Affects Versions: 2.3.0
> Reporter: Sandish Kumar HN
> Priority: Major
> Fix For: 3.0.0
>
>
> After going through this paper 
> [https://www.computer.org/csdl/proceedings/hipc/2016/5411/00/07839705.pdf], we 
> found performance improvements in our long-running Spark batch jobs.
> Correct me if I'm wrong.
>  *Stage:*
>       On a large cluster, the tasks of each stage are spread across several 
> executors. A few executors finish their tasks early and then wait for the 
> remaining tasks of the stage, which are running on other executor nodes in 
> the cluster. A stage is complete only when all of its tasks have finished, 
> and the next stage cannot start until then.
>  
> Why don't we trigger GC while an executor is idle, waiting for the remaining 
> tasks to finish? The executor has to wait anyway, which can take at least a 
> couple of seconds, whereas a GC should take less than 300 ms.
>  
> I have proposed a small code snippet that triggers GC when the executor has 
> no running tasks and its heap usage is above a given threshold.
> This could improve performance for long-running Spark jobs.


