Hey Yan,
-- I do not quite understand this part. AM essentially is running in a
container as well. And the http server is brought up in the same container
Sorry, the term "container" is overloaded. In this context by container I
meant the SamzaContainer. What we are seeing is that jobs only start when YARN happens
to place the AM and SamzaContainer(s) on the same node, which becomes increasingly unlikely
as you increase the container count for your job and/or expand your YARN grid.
-Tommy
On 07/30/2015 10:08 PM, Yan Fang wrote:
Hi Tommy,
{quote}
Because I don't see how this is ever going to work in scenarios where the
AM is on a different node than the containers.
{quote}
-- I do not quite understand this part. AM essentially is running in a
container as well. And the http server is brought up in the same container.
{quote}
even if we can't get a better address for the AM from YARN, we could at
least filter the addresses we get back from the JVM to exclude loopbacks.
{quote}
-- You are right. InetAddress.getLocalHost() sometimes gives back a loopback
address. We should filter this out. A quick search turns up one possible solution:
<http://www.coderanch.com/t/491883/java/java/IP>
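To make the filtering idea concrete, here is a minimal sketch of what it could look like. It is not Samza code: the class name RoutableAddress and the methods isUsable and pick are hypothetical, and preferring IPv4 over IPv6 is an assumption of this sketch, not something decided in this thread. It tries InetAddress.getLocalHost() first and only walks the network interfaces when that result is a loopback, link-local, or wildcard address.

```java
import java.net.Inet4Address;
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.net.SocketException;
import java.net.UnknownHostException;
import java.util.Collections;
import java.util.Enumeration;

// Hypothetical helper for illustration only; not part of Samza.
class RoutableAddress {

    // True if other machines could plausibly connect to this address.
    static boolean isUsable(InetAddress addr) {
        return !addr.isLoopbackAddress()
            && !addr.isLinkLocalAddress()
            && !addr.isAnyLocalAddress();
    }

    // Returns a non-loopback address if one can be found, else the loopback address.
    static InetAddress pick() throws SocketException {
        try {
            InetAddress local = InetAddress.getLocalHost();
            if (isUsable(local)) {
                return local;
            }
        } catch (UnknownHostException e) {
            // Host name does not resolve; fall through to interface enumeration.
        }
        InetAddress ipv6Fallback = null;
        Enumeration<NetworkInterface> nics = NetworkInterface.getNetworkInterfaces();
        if (nics != null) {
            for (NetworkInterface nic : Collections.list(nics)) {
                if (!nic.isUp() || nic.isLoopback()) {
                    continue;
                }
                for (InetAddress addr : Collections.list(nic.getInetAddresses())) {
                    if (!isUsable(addr)) {
                        continue;
                    }
                    if (addr instanceof Inet4Address) {
                        return addr; // assumption: prefer IPv4 when available
                    }
                    if (ipv6Fallback == null) {
                        ipv6Fallback = addr;
                    }
                }
            }
        }
        return ipv6Fallback != null ? ipv6Fallback : InetAddress.getLoopbackAddress();
    }

    public static void main(String[] args) throws SocketException {
        System.out.println(pick().getHostAddress());
    }
}
```

On a host with several usable interfaces this simply returns the first match, so it is a best-effort guess rather than a guarantee that the containers can reach that address.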
+ @Yi, @Navina,
Also, I think this fix should go to the 0.10.0 release.
What do you guys think?
Thanks,
Fang, Yan
[email protected]<mailto:[email protected]>
On Thu, Jul 30, 2015 at 6:39 PM, Yan Fang
<[email protected]><mailto:[email protected]> wrote:
Just one point to add:
{quote}
AM gets notified of container status from the RM.
{quote}
I think this is not 100% correct. The AM can also communicate with the NM through
NMClientAsync
<https://hadoop.apache.org/docs/r2.7.1/api/org/apache/hadoop/yarn/client/api/async/NMClientAsync.html>
to get container status, though Samza does not implement the CallbackHandler.
Thanks,
Fang, Yan
[email protected]<mailto:[email protected]>
On Thu, Jul 30, 2015 at 6:06 PM, Navina Ramesh <
[email protected]<mailto:[email protected]>> wrote:
The NM (and hence, by extension, the container) heartbeats to the RM, not
the AM. The AM gets notified of container status by the RM.
The AM starts, stops, and releases a container process by communicating with
the NM.
Navina
On Thu, Jul 30, 2015 at 5:55 PM, Thomas Becker
<[email protected]><mailto:[email protected]> wrote:
Ok, I thought there was some communication from the container to the AM,
but it sounds like you're saying it's in the other direction only? Don't
containers heartbeat to the AM? Regardless, even if we can't get a better
address for the AM from YARN, we could at least filter the addresses we get
back from the JVM to exclude loopbacks.
-Tommy
________________________________________
From: Navina Ramesh
[[email protected]<mailto:[email protected]>]
Sent: Thursday, July 30, 2015 8:40 PM
To: [email protected]<mailto:[email protected]>
Subject: Re: Coordinator URL always 127.0.0.1
Hi Tommy,
Yi is right. Container start is coordinated by the AppMaster using an
NMClient. The container host name and port are provided by the RM during
allocation.
In YARN (at least, afaik), when a node joins a cluster, the NM registers
itself with the RM. So, the NM might still be using
InetAddress.getLocalHost().getAddress().
I don't know of any other way to programmatically fetch the machine's
hostname (apart from some hacky shell commands).
Cheers,
Navina
On Thu, Jul 30, 2015 at 5:23 PM, Yi Pan
<[email protected]><mailto:[email protected]> wrote:
Hi, Tommy,
Yeah, I agree that the current implementation is not bullet-proof against
different networking configurations on the host. As for the AM <-> container
communication, if I am not mistaken, it goes through the NMClient, and the
node HTTP address is wrapped within the Container object returned from the RM.
I am not very familiar with that part of the source code. Navina may be able
to help more here.
-Yi
On Thu, Jul 30, 2015 at 4:27 PM, Thomas Becker
<[email protected]><mailto:[email protected]>
wrote:
Hi Yi,
Thanks a lot for your reply. I don't doubt we can get it to work by
mucking with the networking configuration, but to me this feels like a
workaround, not a solution. InetAddress.getLocalHost().getHostAddress() is
not a reliable way of obtaining an IP that other machines can connect to.
Just today I tested on several Linux distros and it did not work on any of
them. Can we do something more robust here? How does the container
communicate status to the AM?
-Tommy
________________________________________
From: Yi Pan [[email protected]<mailto:[email protected]>]
Sent: Thursday, July 30, 2015 6:48 PM
To: [email protected]<mailto:[email protected]>
Subject: Re: Coordinator URL always 127.0.0.1
Hi, Tommy,
I think this is a commonly asked question about multiple IPs on a single
host. A common trick that requires no code change (copied from SO:
http://stackoverflow.com/questions/2381316/java-inetaddress-getlocalhost-returns-127-0-0-1-how-to-get-real-ip
):
{code}
1. Find your host name. Type: hostname. For example, you find your hostname
   is mycomputer.xzy.com
2. Put your host name in your hosts file, /etc/hosts, such as:
   10.50.16.136 mycomputer.xzy.com
{code}
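One way to check whether the hosts-file change took effect is to ask the JVM what it now resolves the local host to. The snippet below is only a diagnostic sketch; the class name LocalHostCheck and the describe method are made up for this example. It uses nothing beyond java.net.InetAddress, the same call the coordinator URL is built from.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical diagnostic for illustration only; not part of Samza.
class LocalHostCheck {

    // Formats "<host name> -> <IP> [loopback|non-loopback]" for the given address.
    static String describe(InetAddress addr) {
        String kind = addr.isLoopbackAddress() ? "loopback" : "non-loopback";
        return addr.getHostName() + " -> " + addr.getHostAddress() + " [" + kind + "]";
    }

    public static void main(String[] args) {
        try {
            // This is the lookup that /etc/hosts influences.
            System.out.println(describe(InetAddress.getLocalHost()));
        } catch (UnknownHostException e) {
            System.out.println("local host name does not resolve: " + e.getMessage());
        }
    }
}
```

Run it on the node that will host the AM. If it still reports loopback, the containers on other nodes would be handed an address they cannot reach.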
-Yi
On Thu, Jul 30, 2015 at 11:35 AM, Tommy Becker
<[email protected]><mailto:[email protected]>
wrote:
We are testing some jobs on a YARN grid and noticed they are often not
starting up properly due to being unable to connect to the job coordinator.
After some investigation it seems as if the jobs are always getting a
coordinator URL of http://127.0.0.1:<port>. But my understanding is that
the coordinator runs only in the AM, so I'd expect these URLs to more often
than not point to some other machine. Looking at the code, however, I'm not
sure how that would ever happen, since the URL for the coordinator always
comes from InetAddress.getLocalHost().getHostAddress() in
org.apache.samza.coordinator.server.HttpServer#getUrl
Am I off base here? Because I don't see how this is ever going to work in
scenarios where the AM is on a different node than the containers.
--
Tommy Becker
Senior Software Engineer
Digitalsmiths
A TiVo Company
www.digitalsmiths.com<http://www.digitalsmiths.com>
[email protected]<mailto:[email protected]>
________________________________
This email and any attachments may contain confidential and
privileged
material for the sole use of the intended recipient. Any review,
copying,
or distribution of this email (or any attachments) by others is
prohibited.
If you are not the intended recipient, please contact the sender
immediately and permanently delete this email and any
attachments. No
employee or agent of TiVo Inc. is authorized to conclude any
binding
agreement on behalf of TiVo Inc. by email. Binding agreements with
TiVo
Inc. may only be made by a signed written agreement.
--
Navina R.