[GitHub] spark pull request: [SPARK-2325] Utils.getLocalDir had better chec...
GitHub user YanTangZhai opened a pull request:

    https://github.com/apache/spark/pull/1281

    [SPARK-2325] Utils.getLocalDir had better check the directory and choose a good one instead of choosing the first one directly

    If the first directory of spark.local.dir is bad, the application exits with the exception:

    Exception in thread "main" java.io.IOException: Failed to create a temp directory (under /data1/sparkenv/local) after 10 attempts!
        at org.apache.spark.util.Utils$.createTempDir(Utils.scala:258)
        at org.apache.spark.broadcast.HttpBroadcast$.createServer(HttpBroadcast.scala:154)
        at org.apache.spark.broadcast.HttpBroadcast$.initialize(HttpBroadcast.scala:127)
        at org.apache.spark.broadcast.HttpBroadcastFactory.initialize(HttpBroadcastFactory.scala:31)
        at org.apache.spark.broadcast.BroadcastManager.initialize(BroadcastManager.scala:48)
        at org.apache.spark.broadcast.BroadcastManager.<init>(BroadcastManager.scala:35)
        at org.apache.spark.SparkEnv$.create(SparkEnv.scala:218)
        at org.apache.spark.SparkContext.<init>(SparkContext.scala:202)
        at JobTaskJoin$.main(JobTaskJoin.scala:9)
        at JobTaskJoin.main(JobTaskJoin.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:601)
        at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:292)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:55)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

    Utils.getLocalDir should check the configured directories and choose a good one instead of taking the first one directly. For example, suppose spark.local.dir is /data1/sparkenv/local,/data2/sparkenv/local. If the disk data1 is bad while the disk data2 is good, we could choose data2 rather than data1.

You can merge this pull request into a Git repository by running: $ git pull https://github.com/YanTangZhai/spark SPARK-2325 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/1281.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1281 commit 08424ce408b5e1ee679d15e46ea5b08979511fae Author: yantangzhai Date: 2014-07-02T06:55:39Z [SPARK-2325] Utils.getLocalDir had better check the directory and choose a good one instead of choosing the first one directly --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
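The selection behaviour proposed in this pull request can be sketched roughly as follows. This is a minimal illustration, not the code in the patch and not Spark's `Utils.getLocalDir`: the object `LocalDirSketch` and the helpers `firstUsableDir` and `isUsable` are hypothetical names, and "usable" is assumed here to mean "exists or can be created, and is writable".

```scala
import java.io.File

object LocalDirSketch {
  // Hypothetical helper (not Spark's API): walk the comma-separated list
  // in order and return the first directory that passes the usability check.
  def firstUsableDir(localDirs: String): Option[File] =
    localDirs.split(",").iterator
      .map(_.trim)
      .filter(_.nonEmpty)
      .map(new File(_))
      .find(isUsable)

  // Assumption: a directory is usable if it already exists (or can be
  // created) and the JVM reports it as writable.
  def isUsable(dir: File): Boolean =
    try {
      (dir.isDirectory || dir.mkdirs()) && dir.canWrite
    } catch {
      case _: SecurityException => false
    }
}
```

With a value such as `/data1/sparkenv/local,/data2/sparkenv/local`, this sketch would return the second entry whenever the first cannot be created or written to, and `None` when no entry is usable.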
[GitHub] spark pull request: [SPARK-2325] Utils.getLocalDir had better chec...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1281#issuecomment-47743236 Can one of the admins verify this patch?
[GitHub] spark pull request: [SPARK-2325] Utils.getLocalDir had better chec...
Github user ash211 commented on the pull request: https://github.com/apache/spark/pull/1281#issuecomment-48102838 Hi @YanTangZhai, with the merge of https://github.com/apache/spark/pull/1274 is this change still needed?
[GitHub] spark pull request: [SPARK-2325] Utils.getLocalDir had better chec...
Github user YanTangZhai commented on the pull request: https://github.com/apache/spark/pull/1281#issuecomment-48840373 Hi @ash211, I think this change is still needed. Utils.getLocalDir is used by components such as HttpBroadcast, which is separate from DiskBlockManager, so the two problems are different. Even though #1274 has been merged, the problem still exists. Please review again. Thanks.
[GitHub] spark pull request: [SPARK-2325] Utils.getLocalDir had better chec...
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/1281#issuecomment-50250447 When did this come up? I'm actually not sure this is a good behavior, because doing this means that a user might completely miss a misconfigured directory. With the current behavior, you immediately get an error and can fix your configuration. I was wondering if you had a scenario where it was just too difficult to configure this correctly on each machine.
[GitHub] spark pull request: [SPARK-2325] Utils.getLocalDir had better chec...
Github user advancedxy commented on the pull request: https://github.com/apache/spark/pull/1281#issuecomment-50255139

Hi @mateiz, I think ignoring bad dirs is needed in a production cluster. In production, there is a good chance of disk failures. I always love the idea that we could replace bad disks without service downtime, and I hope this can be implemented in a Spark cluster. Replacing disks without service downtime requires:

1. The service tolerates bad dirs, which this PR does.
2. Make sure the dir is read-only, or remove all permissions anybody has on it (`chmod 000 /dir`, assuming a Unix-like OS), so the service doesn't pick the wrong dir.
3. Replace the bad disk (modern machines support hot plugging), mount it, and bring the permissions back.
4. The service auto-detects the new good dir (disk), or provides a reload API so that we can notify it.

I haven't dug into the code, so I don't know where `spark.local.dir` is used. But if it's for storage, it's better to choose different dirs (disks) to spread the disk IO.

OK, back to this behavior. @mateiz, when one of the configured dirs (disks) fails while a Spark service is running, I simply prefer ignoring the bad dir rather than bringing down the entire service. Hadoop's DataNode and TaskTracker simply ignore bad dirs, up to a maximum number. What about a misconfiguration? If a misconfigured directory is usable, we cannot do anything; it's the user's mistake. If the directory is bad, ignoring it isn't that bad.

@YanTangZhai, I believe we should log the bad dir so the user knows about it. And what do you think of the idea of replacing bad disks?
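One way to detect a directory that has been made read-only or stripped of permissions, and to report it rather than fail, is a small write probe. The sketch below is only an illustration under assumptions: `DirProbeSketch`, `probe`, and `partition` are hypothetical names, nothing here is taken from Spark or Hadoop, and the warning strings stand in for whatever logging the service actually uses.

```scala
import java.io.{File, IOException}
import java.nio.file.Files

object DirProbeSketch {
  // Hypothetical probe: try to create and delete a small marker file.
  // A missing, read-only, or permission-less directory yields a warning
  // message (Left); a healthy one is returned unchanged (Right).
  def probe(dir: File): Either[String, File] =
    try {
      val marker = Files.createTempFile(dir.toPath, "probe-", ".tmp")
      Files.delete(marker)
      Right(dir)
    } catch {
      case e @ (_: IOException | _: SecurityException) =>
        Left(s"Skipping bad local dir ${dir.getPath}: $e")
    }

  // Split the configured directories into warnings to log and dirs to use.
  def partition(dirs: Seq[File]): (Seq[String], Seq[File]) = {
    val results = dirs.map(probe)
    val warnings = results.collect { case Left(message) => message }
    val usable = results.collect { case Right(dir) => dir }
    (warnings, usable)
  }
}
```

Re-running such a probe periodically would be one possible basis for the auto-detection in step 4, although that part is not sketched here.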
[GitHub] spark pull request: [SPARK-2325] Utils.getLocalDir had better chec...
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/1281#issuecomment-50289215 I see, that makes sense, but in that case we need to do a couple more things to make this complete: 1) We should have a max limit of broken dirs we tolerate, after which we'd throw an error. 2) spark.local.dir is used to specify a list of directories that the DiskStore puts data in. You need to modify the DiskStore to allow skipping some of them, or else there will still be problems. If you don't have time to look through the rest of the code to do this, then please just add your discussion above to the JIRA and other people will get to it later.
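The first point, a cap on how many broken directories are tolerated, might look something like the sketch below. It is an assumption-laden illustration only: `TolerantDirsSketch`, `usableDirs`, and the `maxFailed` parameter are hypothetical and do not correspond to any existing Spark setting, and it does not address the DiskStore changes in the second point.

```scala
import java.io.{File, IOException}

object TolerantDirsSketch {
  // Hypothetical helper: keep the writable directories, but throw once
  // more than `maxFailed` configured directories are unusable, or when
  // no usable directory remains at all.
  def usableDirs(configured: Seq[File], maxFailed: Int): Seq[File] = {
    val (good, bad) = configured.partition(d => d.isDirectory && d.canWrite)
    if (bad.size > maxFailed) {
      throw new IOException(
        s"${bad.size} bad local dirs exceed the limit of $maxFailed: " +
          bad.map(_.getPath).mkString(", "))
    }
    if (good.isEmpty) {
      throw new IOException("No usable local dirs are configured")
    }
    good
  }
}
```

A limit of zero reproduces the fail-fast behaviour described earlier in the thread, so a misconfigured directory would still surface immediately unless the operator opts in to tolerance.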
[GitHub] spark pull request: [SPARK-2325] Utils.getLocalDir had better chec...
Github user advancedxy commented on the pull request: https://github.com/apache/spark/pull/1281#issuecomment-50295974 Hi @mateiz, I'd love to contribute to Spark. However, I believe this is more than one PR's worth of work, and there must be a lot of details to consider. I will make time and try to implement it. Anyway, I will file a JIRA first.
[GitHub] spark pull request: [SPARK-2325] Utils.getLocalDir had better chec...
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/1281#issuecomment-50539649 Sure, please start by adding a JIRA with a proposed design for this. Then people will be able to comment on that before you even have to start implementing stuff.
[GitHub] spark pull request: [SPARK-2325] Utils.getLocalDir had better chec...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/1281#issuecomment-52409726 I'd like to revisit this in light of [SPARK-2974](https://issues.apache.org/jira/browse/SPARK-2974); now that #1274 has been merged, the directory returned from `Utils.getLocalDir()` might not exist, leading to confusing errors when workers fetch files. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-2325] Utils.getLocalDir had better chec...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1281#issuecomment-61584693 [Test build #22852 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22852/consoleFull) for PR 1281 at commit [`08424ce`](https://github.com/apache/spark/commit/08424ce408b5e1ee679d15e46ea5b08979511fae). * This patch **does not merge cleanly**.
[GitHub] spark pull request: [SPARK-2325] Utils.getLocalDir had better chec...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1281#issuecomment-61584969 [Test build #22852 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22852/consoleFull) for PR 1281 at commit [`08424ce`](https://github.com/apache/spark/commit/08424ce408b5e1ee679d15e46ea5b08979511fae). * This patch **fails some tests**. * This patch **does not merge cleanly**. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-2325] Utils.getLocalDir had better chec...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1281#issuecomment-61584971 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22852/ Test FAILed.
[GitHub] spark pull request: [SPARK-2325] Utils.getLocalDir had better chec...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/1281#issuecomment-67103247 It looks like the [JIRA referenced from this PR](https://issues.apache.org/jira/browse/SPARK-2325) was resolved as a duplicate of an issue which was fixed in #2002. Therefore, do you mind closing this PR? Thanks!
[GitHub] spark pull request: [SPARK-2325] Utils.getLocalDir had better chec...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/1281#issuecomment-67103624 (I think 'close this issue' is the magic that the script needs)
[GitHub] spark pull request: [SPARK-2325] Utils.getLocalDir had better chec...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/1281