[GitHub] spark pull request: Fixed the number of worker thread
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/1485#issuecomment-50545882 Hey there - as Aaron said, the executors should never have more than N tasks active if there are N cores. I think there might be a bug causing this. So I'd recommend we close this issue and open a JIRA to figure out what is going on.
[GitHub] spark pull request: Fixed the number of worker thread
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/1485
[GitHub] spark pull request: Fixed the number of worker thread
Github user fireflyc commented on the pull request: https://github.com/apache/spark/pull/1485#issuecomment-49501533 My program is a Spark Streaming application running on Hadoop YARN; it processes user click streams. From reading the code, is the number of worker threads tied to the number of input blocks?
[GitHub] spark pull request: Fixed the number of worker thread
Github user aarondav commented on the pull request: https://github.com/apache/spark/pull/1485#issuecomment-49526386 @fireflyc Spark should not be scheduling more than N concurrent tasks on an Executor. It appears that the tasks may be returning success but then don't actually return the thread to the thread pool. This is itself a bug -- could you run jstack on your Executor process to see where the threads are stuck? Perhaps new tasks are just starting before the old threads finish cleaning up, and thus this solution is the right one, but I'd like to find out exactly why.
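For context, a minimal sketch of gathering the same information a jstack thread dump would show, but from inside the executor JVM via the standard ThreadMXBean API; the object name and output format here are illustrative only and are not part of the PR or the discussion:

```scala
import java.lang.management.ManagementFactory

// Illustrative only: print every live thread's name, state, and top stack frame,
// which is the information a jstack dump would reveal about stuck worker threads.
object ThreadDumpSketch {
  def main(args: Array[String]): Unit = {
    val threads = ManagementFactory.getThreadMXBean.dumpAllThreads(true, true)
    threads.foreach { info =>
      val top = info.getStackTrace.headOption.map(_.toString).getOrElse("<no frame>")
      println(s"${info.getThreadName} [${info.getThreadState}] at $top")
    }
  }
}
```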
[GitHub] spark pull request: Fixed the number of worker thread
GitHub user fireflyc opened a pull request: https://github.com/apache/spark/pull/1485 Fixed the number of worker thread
A large number of input blocks causes too many worker threads to be created, which ends up loading all the data at once, so the number of worker threads should be bounded.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/fireflyc/spark fixed-executor-thread
Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/1485.patch
To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1485
commit 1facd581b3e1e37cc896a7db8d3bb8e9ab088686
Author: fireflyc firef...@126.com
Date: 2014-07-18T15:19:46Z
Fixed the number of worker thread
A large number of input blocks causes too many worker threads to be created, which ends up loading all the data at once, so the number of worker threads should be bounded.
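The patch itself is not reproduced in this thread, so the following is only a hedged sketch of the kind of change the description suggests: replacing an unbounded cached thread pool with one capped at the executor's advertised core count. The object and method names are assumptions made for illustration, not code from the PR:

```scala
import java.util.concurrent.{ExecutorService, Executors}

// Sketch only; the actual diff is not shown in this thread.
object ExecutorPoolSketch {
  // Before: a cached pool grows without bound, one thread per concurrently launched task.
  def unboundedPool(): ExecutorService = Executors.newCachedThreadPool()

  // After: cap the pool at the executor's core count so at most numCores tasks run at once.
  def boundedPool(numCores: Int): ExecutorService = Executors.newFixedThreadPool(numCores)
}
```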
[GitHub] spark pull request: Fixed the number of worker thread
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1485#issuecomment-49443851 Can one of the admins verify this patch?
[GitHub] spark pull request: Fixed the number of worker thread
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/1485#issuecomment-49444796 Slightly bigger point: both the 'fixed' and 'cached' executors from `Executors` have some drawbacks:
- 'fixed' always keeps the given number of threads active even if they're not doing anything
- 'cached' may create an unlimited number of threads

It's perfectly possible to create a `ThreadPoolExecutor` with core size 0 and a fixed maximum size. I wonder if that isn't the best choice here, and in other usages throughout Spark, because a similar issue comes up in about 10 places.
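A hedged sketch of the kind of pool described above (not code from this PR). One JDK subtlety worth noting: with a core size of 0 and an unbounded work queue, `ThreadPoolExecutor` never grows beyond a single thread, because extra threads are only created when the queue is full, so a common way to get "at most N threads, shrinking to zero when idle" is to set core equal to max and let core threads time out:

```scala
import java.util.concurrent.{LinkedBlockingQueue, ThreadPoolExecutor, TimeUnit}

// Bounded at maxThreads, but idle threads are reclaimed, so the pool neither pins
// maxThreads threads permanently ('fixed') nor grows without limit ('cached').
def boundedElasticPool(maxThreads: Int): ThreadPoolExecutor = {
  val pool = new ThreadPoolExecutor(
    maxThreads,                        // corePoolSize
    maxThreads,                        // maximumPoolSize
    60L, TimeUnit.SECONDS,             // idle threads die after 60s
    new LinkedBlockingQueue[Runnable]())
  pool.allowCoreThreadTimeOut(true)    // let the pool shrink to zero when idle
  pool
}
```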
[GitHub] spark pull request: Fixed the number of worker thread
Github user aarondav commented on the pull request: https://github.com/apache/spark/pull/1485#issuecomment-49494194 The tasks launched on an Executor are controlled by the DAGScheduler, and should not exceed the number of cores that executor is advertising. In what situation have you seen this happening?
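As an illustration of the bound described above, the per-executor concurrency comes from the core count each executor advertises, which the application configures; the values and app name below are examples only, not taken from this thread:

```scala
import org.apache.spark.SparkConf

object ExecutorCoresExample {
  // Example values only: with 4 cores advertised per executor, the scheduler
  // should launch at most 4 concurrent tasks on that executor.
  val conf: SparkConf = new SparkConf()
    .setAppName("streaming-click-stream")  // hypothetical app name
    .set("spark.executor.cores", "4")
}
```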
[GitHub] spark pull request: Fixed the number of worker thread
Github user fireflyc commented on the pull request: https://github.com/apache/spark/pull/1485#issuecomment-49495043 My application has 1000+ worker threads. ![0e75b115d7a1b2dba97284cf6443b6f0](https://cloud.githubusercontent.com/assets/183107/3633383/d939413c-0edf-11e4-91d0-5ab99df71b59.jpeg)