[GitHub] spark issue #17238: getRackForHost returns None if host is unknown by driver

2017-03-16 Thread tgravescs
Github user tgravescs commented on the issue:

https://github.com/apache/spark/pull/17238
  
> Actually, to play devil's advocate, the problem @morenn520 is describing 
is a little more involved. You have a driver running, which has its own view of 
what the cluster topology is, and then the cluster topology changes underneath 
it. Deploying a new configuration on the new nodes being added does not fix the 
driver, unless your "topology discovery script" is fully dynamic and always 
goes to a central location to figure out what's the current topology.

Maybe I'm misunderstanding what you are saying, but the only way the AM 
gets a bad topology is if its wrong in the first place.   Or are you just 
saying app starts and host is in one rack, host gets moved to another rack and 
brought back up? I guess that is possible, but  I'm not really sure that  
applies to this case here anyway with default_rack. Any existing executors 
would have gone away on it when the host was moved so yarn should re-resolve 
when it gets a new container anyway.  If your script isn't dynamic to handle 
that then its also a configuration issue to update all the other hosts and you 
should do that before bringing the host up.  Again unless you aren't using HDFS 
the rack resolve affects more then just spark on yarn here.  Its going to 
affect HDFS block placements and other types of apps.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17238: getRackForHost returns None if host is unknown by driver

2017-03-16 Thread morenn520
Github user morenn520 commented on the issue:

https://github.com/apache/spark/pull/17238
  
Thanks @tgravescs , we has used a dynamic topology discovery script to 
avoid this problem. Since tez and mr don‘t do this, maybe spark could not fix 
this problem.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17238: getRackForHost returns None if host is unknown by driver

2017-03-15 Thread vanzin
Github user vanzin commented on the issue:

https://github.com/apache/spark/pull/17238
  
> If you aren't adding in machines to rack and configuring yarn properly 
before adding it to your cluster that is a process issue you should fix on your 
end

Actually, to play devil's advocate, the problem @morenn520 is describing is 
a little more involved. You have a driver running, which has its own view of 
what the cluster topology is, and then the cluster topology changes underneath 
it. Deploying a new configuration on the new nodes being added does not fix the 
driver, unless your "topology discovery script" is fully dynamic and always 
goes to a central location to figure out what's the current topology.

So, basically, even if the new nodes know about the updated topology, the 
existing driver instance might not, and there's no easy way to fix that I can 
see.

(That being said, I haven't looked at the changes here.)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17238: getRackForHost returns None if host is unknown by driver

2017-03-15 Thread tgravescs
Github user tgravescs commented on the issue:

https://github.com/apache/spark/pull/17238
  
Ok checked tez and mr and they don't do this. 
Actually in a couple of the input formats it actually adds DEFAULT_RACK if 
there wasn't any topology information so you would end up assuming everything 
is rack local in those cases. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17238: getRackForHost returns None if host is unknown by driver

2017-03-15 Thread tgravescs
Github user tgravescs commented on the issue:

https://github.com/apache/spark/pull/17238
  
If you aren't adding in machines to rack and configuring yarn properly 
before adding it to your cluster that is a process issue you should fix on your 
end.I would assume a unracking/racking a node means putting in a new node?  
If that is the case you have to install hadoop and hadoop configuration on that 
node.  I would expect you to fix the configuration or have a generic rack aware 
script/java class that would be able to just figure it out, but that is going 
to be pretty configuration specific.   I would also assume if you have that 
configuration wrong then your HDFS is also not being optimal as it could get 
the replication wrong.

You can specify your own class/script to do the rack resolution so you 
could change that to handle this case:  see 
https://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-common/RackAwareness.html

I'll try to check on tez/mr today and get back to you (was to busy 
yesterday). I know tez didn't have any explicit references to DEFAULT_RACK in 
the code but want to look a bit more.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17238: getRackForHost returns None if host is unknown by driver

2017-03-15 Thread morenn520
Github user morenn520 commented on the issue:

https://github.com/apache/spark/pull/17238
  
@squito Actually it's a misconfiguration in our yarn. There are thousands 
of nodes in our production yarn. It will be quite frequent to rack/unrack 
nodes, so as to make something missing easily. It would be better if spark does 
something special to avoid our misconfiguration.
In addition, I think unknown host should be recognized as different racks. 
But if MR/tez may do nothing special to this problem, it is reasonable for this 
problem not to be fix.
I hope @tgravescs would double check whether tez/MR do anything special to 
DEFAULT_RACK.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17238: getRackForHost returns None if host is unknown by driver

2017-03-14 Thread squito
Github user squito commented on the issue:

https://github.com/apache/spark/pull/17238
  
I don't think you're missing anything @tgravescs , it sounds like this is 
just a misconfiguration and we shouldn't be doing anything special for it 
(since it could hurt correct configurations).  I wanted to see if this was a 
particularly common / easy misconfiguration in yarn, but you both have 
convinced me its not.

my point about SPARK-18886 was to think through how bad the effect from the 
misconfiguration is.  In the originally described scenario, if a few nodes are 
accidentally labelled as belonging to their own rack, then sometimes task will 
get preferentially assigned to this false-rack at first.  If tasks are long, 
its not that big a deal -- after a short delay, you'll then assign the tasks to 
the rest of the cluster.  But if tasks are short, b/c of SPARK-18886, you may 
just keep assigning to the nodes in your false-rack, and leave the rest of the 
cluster idle.

Nonetheless, I think its still just a misconfiguration.  sounds like this 
issue is a "won't fix", unless @morenn520 makes the case otherwise.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17238: getRackForHost returns None if host is unknown by driver

2017-03-14 Thread tgravescs
Github user tgravescs commented on the issue:

https://github.com/apache/spark/pull/17238
  
Sorry if I'm missing something here but I don't see why this is a problem?  
 If you have YARN misconfigured or not configured everything is going to 
default to DEFAULT_RACK.  If you want them to be on different racks then fix 
the yarn configuration.

The topology mapping in hadoop is a plugin so someone could write a plugin 
that just returned DEFAULT_RACK for anything, if you don't want that behavior 
you can write a plugin that does something different.

@morenn520 I'm assuming in this case the tasks had some host locality level 
before looking at the rack level?

I don't believe tez/MR do anything special with respect to the DEFAULT_RACK 
but I'll double check


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17238: getRackForHost returns None if host is unknown by driver

2017-03-13 Thread mridulm
Github user mridulm commented on the issue:

https://github.com/apache/spark/pull/17238
  
@squito I looked at yarn bits about 3-4 years back, so I am sure you and 
@tgravescs know better :-)
But yes, you did summarize what I was attempting to get to.

I am not sure what the impact of SPARK-18886 here might be ...
Another way to look at this change is, what does MR/Tez do in this scenario 
w.r.t locality computation ? My understanding is that they do not fudge the 
value - but if I am wrong, we could do the same.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17238: getRackForHost returns None if host is unknown by driver

2017-03-13 Thread squito
Github user squito commented on the issue:

https://github.com/apache/spark/pull/17238
  
@mridulm you know the yarn bits involved better than I do.  It sounds like 
you are saying this is the wrong change, and instead its just a 
misconfiguration in yarn -- I'll buy that argument.  Essentially you are saying 
that "DEFAULT_RACK" may be used legitimately for one of the racks.

Even with this miconfiguration, the incorrect locality assignments don't 
seem horrible, except that for very short tasks its exacerbated by 
https://issues.apache.org/jira/browse/SPARK-18886


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17238: getRackForHost returns None if host is unknown by driver

2017-03-10 Thread mridulm
Github user mridulm commented on the issue:

https://github.com/apache/spark/pull/17238
  
This is something to be configured appropriately at the lower level (namely 
topology config), and should be handled there Special casing this in spark is 
not right.
Two scenarios : 
- Default rack for cpu nodes, and a special rack for gpu nodes 
(specifically for rdma comm) - will cause rack locality to be disabled for the 
cpu nodes (since they are on the default rack).
- Specializations of NetworkTopology need not honor DEFAULT_RACK - 
resulting in limited value in those scenarios (IIRC the api does not require 
unknown rack to be specified as DEFAULT_RACK) - is that incorrect assessment 
@squito ?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17238: getRackForHost returns None if host is unknown by driver

2017-03-10 Thread squito
Github user squito commented on the issue:

https://github.com/apache/spark/pull/17238
  
The description of the problem makes sense for how it would effect task 
locality.  I will need to look more closely at some yarn bits to make sure its 
the right change, but looks reasonable.  


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17238: getRackForHost returns None if host is unknown by driver

2017-03-10 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/17238
  
Please change the title per http://spark.apache.org/contributing.html
@squito or @vanzin do you know enough to evaluate this?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17238: getRackForHost returns None if host is unknown by driver

2017-03-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17238
  
Can one of the admins verify this patch?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org