[GitHub] spark pull request: [SPARK-2325] Utils.getLocalDir had better chec...

2014-07-01 Thread YanTangZhai
GitHub user YanTangZhai opened a pull request:

https://github.com/apache/spark/pull/1281

[SPARK-2325] Utils.getLocalDir had better check the directory and choose a 
good one instead of choosing the first one directly

If the first directory in spark.local.dir is bad, the application exits 
with this exception:
Exception in thread "main" java.io.IOException: Failed to create a temp directory (under /data1/sparkenv/local) after 10 attempts!
    at org.apache.spark.util.Utils$.createTempDir(Utils.scala:258)
    at org.apache.spark.broadcast.HttpBroadcast$.createServer(HttpBroadcast.scala:154)
    at org.apache.spark.broadcast.HttpBroadcast$.initialize(HttpBroadcast.scala:127)
    at org.apache.spark.broadcast.HttpBroadcastFactory.initialize(HttpBroadcastFactory.scala:31)
    at org.apache.spark.broadcast.BroadcastManager.initialize(BroadcastManager.scala:48)
    at org.apache.spark.broadcast.BroadcastManager.<init>(BroadcastManager.scala:35)
    at org.apache.spark.SparkEnv$.create(SparkEnv.scala:218)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:202)
    at JobTaskJoin$.main(JobTaskJoin.scala:9)
    at JobTaskJoin.main(JobTaskJoin.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:601)
    at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:292)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:55)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Utils.getLocalDir should check each directory and choose a usable one 
instead of always taking the first. For example, if spark.local.dir is 
/data1/sparkenv/local,/data2/sparkenv/local and disk data1 is bad while 
disk data2 is good, it should choose data2 rather than data1.
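
A minimal sketch of the selection described above, written in Python for brevity (Spark's own code is Scala). The helper name `first_usable_dir` and the probe-file check are assumptions made for illustration; this is not the code in this pull request.

```python
import os
import tempfile


def first_usable_dir(local_dirs):
    """Return the first entry of a comma-separated list that can hold a temp file.

    `local_dirs` mimics the comma-separated value of spark.local.dir.
    The helper name and the probing strategy are hypothetical.
    """
    for candidate in (d.strip() for d in local_dirs.split(",")):
        if not candidate:
            continue
        try:
            os.makedirs(candidate, exist_ok=True)
            # Creating (and auto-deleting) a probe file checks writability.
            with tempfile.NamedTemporaryFile(dir=candidate):
                pass
            return candidate
        except OSError:
            # Bad disk, missing mount, or no permission: try the next entry.
            continue
    raise OSError("No usable directory in: " + local_dirs)
```

With the example value above, such a helper would skip /data1/sparkenv/local if it cannot be created or written to, and return /data2/sparkenv/local.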

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/YanTangZhai/spark SPARK-2325

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/1281.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1281


commit 08424ce408b5e1ee679d15e46ea5b08979511fae
Author: yantangzhai 
Date:   2014-07-02T06:55:39Z

[SPARK-2325] Utils.getLocalDir had better check the directory and choose a 
good one instead of choosing the first one directly




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: [SPARK-2325] Utils.getLocalDir had better chec...

2014-07-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/1281#issuecomment-47743236
  
Can one of the admins verify this patch?


[GitHub] spark pull request: [SPARK-2325] Utils.getLocalDir had better chec...

2014-07-05 Thread ash211
Github user ash211 commented on the pull request:

https://github.com/apache/spark/pull/1281#issuecomment-48102838
  
Hi @YanTangZhai, with the merge of 
https://github.com/apache/spark/pull/1274 is this change still needed?


[GitHub] spark pull request: [SPARK-2325] Utils.getLocalDir had better chec...

2014-07-13 Thread YanTangZhai
Github user YanTangZhai commented on the pull request:

https://github.com/apache/spark/pull/1281#issuecomment-48840373
  
Hi @ash211, I think this change is still needed. Utils.getLocalDir is used 
by components such as HttpBroadcast, which is a different code path from 
DiskBlockManager, so the two problems are different. Even though #1274 has 
been merged, this problem still exists. Please review again. Thanks.


[GitHub] spark pull request: [SPARK-2325] Utils.getLocalDir had better chec...

2014-07-26 Thread mateiz
Github user mateiz commented on the pull request:

https://github.com/apache/spark/pull/1281#issuecomment-50250447
  
When did this come up? I'm actually not sure this is a good behavior, 
because doing this means that a user might completely miss a misconfigured 
directory. With the current behavior, you immediately get an error and can fix 
your configuration. I was wondering if you had a scenario where it was just too 
difficult to configure this correctly on each machine.


[GitHub] spark pull request: [SPARK-2325] Utils.getLocalDir had better chec...

2014-07-26 Thread advancedxy
Github user advancedxy commented on the pull request:

https://github.com/apache/spark/pull/1281#issuecomment-50255139
  
Hi @mateiz, I think ignoring bad dirs is needed in a production cluster.
In production, disk failures are likely. I like the idea of replacing bad 
disks without service downtime, and I hope this can be implemented for 
Spark clusters.
Replacing disks without service downtime requires the following:
1. The service tolerates bad dirs, which this PR does.
2. The bad dir is made read-only, or all permissions on it are removed 
(chmod 000 /dir, assuming a Unix-like OS), so the service doesn't pick the 
wrong dir.
3. The bad disk is replaced (modern machines support hot plugging) and 
mounted, and the permissions are restored.
4. The service auto-detects the new good dir (disk), or provides a reload 
API so that we can notify it.

I haven't dug into the code, so I don't know where `spark.local.dir` is 
used. But if it's for storage, it's better to choose different dirs (disks) 
to spread the disk I/O.

OK, back to this behavior. @mateiz, when one of the configured dirs (disks) 
fails while a Spark service is running, I simply prefer ignoring the bad dir 
to bringing down the entire service.
Hadoop's DataNode and TaskTracker simply ignore bad dirs, up to a maximum 
number.

What about a misconfiguration? If a misconfigured directory is usable, we 
cannot do anything about it; it's the user's mistake. If the directory is 
bad, ignoring it isn't that bad.

@YanTangZhai, I believe we should log the bad dir so that the user knows 
about it. And what do you think of the idea of replacing bad disks?
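
Two of the suggestions in this comment, logging the bad dirs and spreading files across the remaining ones, could be sketched roughly as follows. This is an illustrative Python sketch, not Spark code: the function names, the logger name, and the use of a CRC32 hash to pick a directory are all assumptions.

```python
import logging
import os
import zlib

log = logging.getLogger("local_dirs")  # hypothetical logger name


def usable_dirs(candidates):
    """Keep directories that exist and are writable; log the ones that are skipped."""
    good = []
    for d in candidates:
        if os.path.isdir(d) and os.access(d, os.W_OK | os.X_OK):
            good.append(d)
        else:
            log.warning("Ignoring bad local dir: %s", d)
    return good


def dir_for_file(filename, dirs):
    """Map a file name to one of `dirs` using a hash that is stable across runs."""
    if not dirs:
        raise ValueError("no usable local dirs")
    # zlib.crc32 is deterministic, unlike Python's built-in hash() for str.
    return dirs[zlib.crc32(filename.encode("utf-8")) % len(dirs)]
```

Hashing the file name keeps the mapping stable, so a later lookup of the same name lands in the same directory as long as the set of usable dirs has not changed.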


[GitHub] spark pull request: [SPARK-2325] Utils.getLocalDir had better chec...

2014-07-27 Thread mateiz
Github user mateiz commented on the pull request:

https://github.com/apache/spark/pull/1281#issuecomment-50289215
  
I see, that makes sense, but in that case we need to do a couple more 
things to make this complete:
1) We should have a max limit of broken dirs we tolerate, after which we'd 
throw an error.
2) spark.local.dir is used to specify a list of directories that the 
DiskStore puts data in. You need to modify the DiskStore to allow skipping some 
of them, or else there will still be problems.

If you don't have time to look through the rest of the code to do this, 
then please just add your discussion above to the JIRA and other people will 
get to it later.
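
The first point, tolerating broken dirs only up to a limit, might look something like the sketch below. It is written in Python for brevity and is not Spark's implementation; `check_local_dirs` and its `max_failed` parameter are hypothetical names, and no such Spark setting is implied.

```python
import os


def check_local_dirs(candidates, max_failed):
    """Split `candidates` into usable and failed dirs, enforcing a failure limit.

    Raises OSError if more than `max_failed` dirs are unusable, or if none is usable.
    """
    good, failed = [], []
    for d in candidates:
        if os.path.isdir(d) and os.access(d, os.W_OK | os.X_OK):
            good.append(d)
        else:
            failed.append(d)
    if len(failed) > max_failed or not good:
        raise OSError(
            "%d of %d local dirs are unusable (limit %d): %s"
            % (len(failed), len(candidates), max_failed, ", ".join(failed))
        )
    return good, failed
```

Setting `max_failed` to 0 would reproduce the fail-fast behavior, where any bad directory is reported immediately as an error.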


[GitHub] spark pull request: [SPARK-2325] Utils.getLocalDir had better chec...

2014-07-27 Thread advancedxy
Github user advancedxy commented on the pull request:

https://github.com/apache/spark/pull/1281#issuecomment-50295974
  
Hi @mateiz, I'd love to contribute to Spark. However, I believe this is 
more than one PR's worth of work, and there are a lot of details to 
consider. I will make time and try to implement it. In any case, I will 
file a JIRA first.


[GitHub] spark pull request: [SPARK-2325] Utils.getLocalDir had better chec...

2014-07-29 Thread mateiz
Github user mateiz commented on the pull request:

https://github.com/apache/spark/pull/1281#issuecomment-50539649
  
Sure, please start by adding a JIRA with a proposed design for this. Then 
people will be able to comment on that before you even have to start 
implementing stuff.


[GitHub] spark pull request: [SPARK-2325] Utils.getLocalDir had better chec...

2014-08-16 Thread JoshRosen
Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/1281#issuecomment-52409726
  
I'd like to revisit this in light of 
[SPARK-2974](https://issues.apache.org/jira/browse/SPARK-2974); now that #1274 
has been merged, the directory returned from `Utils.getLocalDir()` might not 
exist, leading to confusing errors when workers fetch files.



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-2325] Utils.getLocalDir had better chec...

2014-11-03 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1281#issuecomment-61584693
  
  [Test build #22852 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22852/consoleFull) for PR 1281 at commit [`08424ce`](https://github.com/apache/spark/commit/08424ce408b5e1ee679d15e46ea5b08979511fae).
 * This patch **does not merge cleanly**.


[GitHub] spark pull request: [SPARK-2325] Utils.getLocalDir had better chec...

2014-11-03 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1281#issuecomment-61584969
  
  [Test build #22852 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22852/consoleFull) for PR 1281 at commit [`08424ce`](https://github.com/apache/spark/commit/08424ce408b5e1ee679d15e46ea5b08979511fae).
 * This patch **fails some tests**.
 * This patch **does not merge cleanly**.
 * This patch adds no public classes.


[GitHub] spark pull request: [SPARK-2325] Utils.getLocalDir had better chec...

2014-11-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/1281#issuecomment-61584971
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22852/


[GitHub] spark pull request: [SPARK-2325] Utils.getLocalDir had better chec...

2014-12-15 Thread JoshRosen
Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/1281#issuecomment-67103247
  
It looks like the [JIRA referenced from this 
PR](https://issues.apache.org/jira/browse/SPARK-2325) was resolved as a 
duplicate of an issue which was fixed in #2002.  Therefore, do you mind closing 
this PR?  Thanks!


[GitHub] spark pull request: [SPARK-2325] Utils.getLocalDir had better chec...

2014-12-15 Thread JoshRosen
Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/1281#issuecomment-67103624
  
(I think 'close this issue' is the magic that the script needs)


[GitHub] spark pull request: [SPARK-2325] Utils.getLocalDir had better chec...

2014-12-17 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/1281

