[GitHub] spark pull request: Fixed the number of worker thread

2014-07-29 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/1485#issuecomment-50545882
  
Hey there - as Aaron said, the executors should never have more than N 
tasks active if there are N cores. I think there might be a bug causing this. 
So I'd recommend we close this issue and open a JIRA to figure out what is 
going on.




[GitHub] spark pull request: Fixed the number of worker thread

2014-07-29 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/1485




[GitHub] spark pull request: Fixed the number of worker thread

2014-07-19 Thread fireflyc
Github user fireflyc commented on the pull request:

https://github.com/apache/spark/pull/1485#issuecomment-49501533
  
My program is a Spark Streaming application running on Hadoop YARN; it 
processes a user click stream.
From reading the code, it looks like the number of worker threads grows with 
the number of input blocks?




[GitHub] spark pull request: Fixed the number of worker thread

2014-07-19 Thread aarondav
Github user aarondav commented on the pull request:

https://github.com/apache/spark/pull/1485#issuecomment-49526386
  
@fireflyc Spark should not be scheduling more than N concurrent tasks on an 
Executor. It appears that the tasks may be reporting success but then not 
actually returning their threads to the thread pool.

This is itself a bug -- could you run jstack on your Executor process to 
see where the threads are stuck?

Perhaps new tasks are just starting before the old threads finish cleaning 
up, and thus this solution is the right one, but I'd like to find out exactly 
why.
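
If it helps, here is a hedged sketch of capturing that dump, assuming a 
YARN deployment where the executor JVM's main class is 
CoarseGrainedExecutorBackend (the output file name is arbitrary):

$ jps | grep CoarseGrainedExecutorBackend    # find the executor's pid
$ jstack <pid> > executor-threads.txt        # dump all thread stacks

The dump lists each thread's state and stack trace, which should show where 
the "Executor task launch worker" threads are parked.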




[GitHub] spark pull request: Fixed the number of worker thread

2014-07-18 Thread fireflyc
GitHub user fireflyc opened a pull request:

https://github.com/apache/spark/pull/1485

Fixed the number of worker thread

A large number of input blocks causes too many worker threads to be
created, which will try to load all the data at once. The number of worker
threads should therefore be bounded.
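
A minimal sketch of the kind of change intended here, assuming the executor 
currently uses an unbounded cached pool (the names below are illustrative, 
not the actual patch):

    import java.util.concurrent.{ExecutorService, Executors}

    // Before (unbounded): a cached pool spawns a new thread whenever none
    // is idle, so many simultaneous input blocks can mean 1000+ threads.
    val unbounded: ExecutorService = Executors.newCachedThreadPool()

    // After (bounded): cap the pool at the number of cores the executor
    // advertises, so at most numCores tasks run concurrently.
    val numCores = Runtime.getRuntime.availableProcessors() // assumption
    val bounded: ExecutorService = Executors.newFixedThreadPool(numCores)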

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/fireflyc/spark fixed-executor-thread

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/1485.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1485


commit 1facd581b3e1e37cc896a7db8d3bb8e9ab088686
Author: fireflyc firef...@126.com
Date:   2014-07-18T15:19:46Z

Fixed the number of worker thread

A large number of input blocks causes too many worker threads to be
created, which will try to load all the data at once. The number of worker
threads should therefore be bounded.






[GitHub] spark pull request: Fixed the number of worker thread

2014-07-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/1485#issuecomment-49443851
  
Can one of the admins verify this patch?




[GitHub] spark pull request: Fixed the number of worker thread

2014-07-18 Thread srowen
Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/1485#issuecomment-49444796
  
Slightly bigger point: both the 'fixed' and 'cached' executors from 
`Executors` have some drawbacks:

- 'fixed' always keeps the given number of threads active even if they're 
not doing anything
- 'cached' may create an unlimited number of threads

It's perfectly possible to create a `ThreadPoolExecutor` with core size 0 
and a fixed maximum size. I wonder whether that isn't the better choice here, 
and in fact in other usages throughout Spark, since a similar issue comes up 
in about 10 places.
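
A minimal sketch of that construction, with assumed sizing. One caveat: with 
a core size of 0 and an unbounded work queue, the pool would never grow past 
a single thread, so a SynchronousQueue plus a saturation policy is used here:

    import java.util.concurrent.{SynchronousQueue, ThreadPoolExecutor, TimeUnit}

    val maxThreads = 8  // assumption: e.g. the executor's advertised cores
    val pool = new ThreadPoolExecutor(
      0,                      // corePoolSize: keep no threads when idle
      maxThreads,             // maximumPoolSize: hard upper bound
      60L, TimeUnit.SECONDS,  // reclaim idle threads after 60 seconds
      new SynchronousQueue[Runnable](),          // hand tasks straight to a thread
      new ThreadPoolExecutor.CallerRunsPolicy()) // when saturated, run in the caller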




[GitHub] spark pull request: Fixed the number of worker thread

2014-07-18 Thread aarondav
Github user aarondav commented on the pull request:

https://github.com/apache/spark/pull/1485#issuecomment-49494194
  
The tasks launched on an Executor are controlled by the DAGScheduler, and 
should not exceed the number of cores that executor is advertising. In what 
situation have you seen this happening?
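
For reference, the advertised core count is set at submission time; a hedged 
example of a YARN submission (flags as of Spark 1.0; the application class 
and jar are hypothetical):

$ spark-submit --master yarn-cluster --executor-cores 4 \
    --num-executors 10 --class com.example.ClickStream app.jar

With 4 cores advertised, the scheduler should launch at most 4 concurrent 
tasks per executor.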




[GitHub] spark pull request: Fixed the number of worker thread

2014-07-18 Thread fireflyc
Github user fireflyc commented on the pull request:

https://github.com/apache/spark/pull/1485#issuecomment-49495043
  
My application has 1000+ worker threads.

![0e75b115d7a1b2dba97284cf6443b6f0](https://cloud.githubusercontent.com/assets/183107/3633383/d939413c-0edf-11e4-91d0-5ab99df71b59.jpeg)


