Created https://issues.apache.org/jira/browse/SAMZA-748
Fang, Yan
yanfang...@gmail.com

On Thu, Jul 30, 2015 at 7:17 PM, Yi Pan <nickpa...@gmail.com> wrote:

> +1 on the fix in 0.10.0. It should be an easy one.
>
> On Thu, Jul 30, 2015 at 7:08 PM, Yan Fang <yanfang...@gmail.com> wrote:
>
> > Hi Tommy,
> >
> > {quote}
> > Because I don't see how this is ever going to work in scenarios where the
> > AM is on a different node than the containers.
> > {quote}
> >
> > -- I do not quite understand this part. AM essentially is running in a
> > container as well. And the http server is brought up in the same container.
> >
> > {quote}
> > even if we can't get a better address for the AM from YARN, we could at
> > least filter the addresses we get back from the JVM to exclude loopbacks.
> > {quote}
> >
> > -- You are right. InetAddress.getLocalHost() gives back a loopback address
> > sometimes. We should filter this out. Just googling one possible solution
> > <http://www.coderanch.com/t/491883/java/java/IP> .
> >
> > + @Yi, @Navina,
> >
> > Also, I think this fix should go to the 0.10.0 release.
> >
> > What do you guys think?
> >
> > Thanks,
> >
> > Fang, Yan
> > yanfang...@gmail.com
> >
> > On Thu, Jul 30, 2015 at 6:39 PM, Yan Fang <yanfang...@gmail.com> wrote:
> >
> > > Just one point to add:
> > >
> > > {quote}
> > > AM gets notified of container status from the RM.
> > > {quote}
> > >
> > > I think this is not 100% correct. AM can communicate with NM through
> > > NMClientAsync
> > > <https://hadoop.apache.org/docs/r2.7.1/api/org/apache/hadoop/yarn/client/api/async/NMClientAsync.html>
> > > to get container status, though Samza does not implement the
> > > CallbackHandler.
> > >
> > > Thanks,
> > >
> > > Fang, Yan
> > > yanfang...@gmail.com
> > >
> > > On Thu, Jul 30, 2015 at 6:06 PM, Navina Ramesh <nram...@linkedin.com.invalid> wrote:
> > >
> > >> The NM (and hence, by extension the container) heartbeats to the RM, not
> > >> the AM. AM gets notified of container status from the RM.
> > >> The AM starts / stops / releases a container process by communicating to
> > >> the NM.
> > >>
> > >> Navina
> > >>
> > >> On Thu, Jul 30, 2015 at 5:55 PM, Thomas Becker <tobec...@tivo.com> wrote:
> > >>
> > >> > Ok, I thought there was some communication from the container to the AM;
> > >> > it sounds like you're saying it's in the other direction only? Don't
> > >> > containers heartbeat to the AM? Regardless, even if we can't get a better
> > >> > address for the AM from YARN, we could at least filter the addresses we get
> > >> > back from the JVM to exclude loopbacks.
> > >> >
> > >> > -Tommy
> > >> > ________________________________________
> > >> > From: Navina Ramesh [nram...@linkedin.com.INVALID]
> > >> > Sent: Thursday, July 30, 2015 8:40 PM
> > >> > To: dev@samza.apache.org
> > >> > Subject: Re: Coordinator URL always 127.0.0.1
> > >> >
> > >> > Hi Tommy,
> > >> > Yi is right. Container start is coordinated by the AppMaster using an
> > >> > NMClient. Container host name and port are provided by the RM during
> > >> > allocation.
> > >> > In Yarn (at least, afaik), when the node joins a cluster, the NM registers
> > >> > itself with the RM. So, the NM might still be using
> > >> > getLocalhost.getAddress().
> > >> >
> > >> > I don't know of any other way to programmatically fetch the machine's
> > >> > hostname (apart from some hacky shell commands).
> > >> >
> > >> > Cheers,
> > >> > Navina
> > >> >
> > >> > On Thu, Jul 30, 2015 at 5:23 PM, Yi Pan <nickpa...@gmail.com> wrote:
> > >> >
> > >> > > Hi, Tommy,
> > >> > >
> > >> > > Yeah, I agree that the current implementation is not bullet-proof to any
> > >> > > different networking configuration on the host. As for the AM <->
> > >> > > container communication, if I am not mistaken, it is through the NMClient
> > >> > > and the node HTTP address is wrapped within the Container object returned
> > >> > > from RM.
> > >> > > I am not very familiar with that part of the source code. Navina may be
> > >> > > able to help more here.
> > >> > >
> > >> > > -Yi
> > >> > >
> > >> > > On Thu, Jul 30, 2015 at 4:27 PM, Thomas Becker <tobec...@tivo.com> wrote:
> > >> > >
> > >> > > > Hi Yi,
> > >> > > > Thanks a lot for your reply. I don't doubt we can get it to work by
> > >> > > > mucking with the networking configuration, but to me this feels like a
> > >> > > > workaround, not a solution. InetAddress.getLocalHost().getHostAddress()
> > >> > > > is not a reliable way of obtaining an IP that other machines can connect
> > >> > > > to. Just today I tested on several Linux distros and it did not work on
> > >> > > > any of them. Can we do something more robust here? How does the
> > >> > > > container communicate status to the AM?
> > >> > > >
> > >> > > > -Tommy
> > >> > > >
> > >> > > > ________________________________________
> > >> > > > From: Yi Pan [nickpa...@gmail.com]
> > >> > > > Sent: Thursday, July 30, 2015 6:48 PM
> > >> > > > To: dev@samza.apache.org
> > >> > > > Subject: Re: Coordinator URL always 127.0.0.1
> > >> > > >
> > >> > > > Hi, Tommy,
> > >> > > >
> > >> > > > I think that it might be a commonly asked question regarding multiple
> > >> > > > IPs on a single host. A common trick w/o changing code is (copied from
> > >> > > > SO:
> > >> > > > http://stackoverflow.com/questions/2381316/java-inetaddress-getlocalhost-returns-127-0-0-1-how-to-get-real-ip
> > >> > > > )
> > >> > > >
> > >> > > > {code}
> > >> > > >
> > >> > > > 1. Find your host name. Type: hostname. For example, you find your
> > >> > > >    hostname is mycomputer.xzy.com
> > >> > > > 2. Put your host name in your hosts file, /etc/hosts.
> > >> > > >    Such as
> > >> > > >
> > >> > > >    10.50.16.136 mycomputer.xzy.com
> > >> > > >
> > >> > > > {code}
> > >> > > >
> > >> > > > -Yi
> > >> > > >
> > >> > > > On Thu, Jul 30, 2015 at 11:35 AM, Tommy Becker <tobec...@tivo.com> wrote:
> > >> > > >
> > >> > > > > We are testing some jobs on a YARN grid and noticed they are often not
> > >> > > > > starting up properly due to being unable to connect to the job
> > >> > > > > coordinator. After some investigation it seems as if the jobs are
> > >> > > > > always getting a coordinator URL of http://127.0.0.1:<port>. But my
> > >> > > > > understanding is that the coordinator runs only in the AM, so I'd
> > >> > > > > expect these URLs to more often than not be to some other machine.
> > >> > > > > Looking at the code, however, I'm not sure how that would ever happen,
> > >> > > > > since the URL for the coordinator always comes from
> > >> > > > > InetAddress.getLocalHost().getHostAddress() in
> > >> > > > > org.apache.samza.coordinator.server.HttpServer#getUrl
> > >> > > > >
> > >> > > > > Am I off base here? Because I don't see how this is ever going to work
> > >> > > > > in scenarios where the AM is on a different node than the containers.
> > >> > > > >
> > >> > > > > --
> > >> > > > > Tommy Becker
> > >> > > > > Senior Software Engineer
> > >> > > > >
> > >> > > > > Digitalsmiths
> > >> > > > > A TiVo Company
> > >> > > > >
> > >> > > > > www.digitalsmiths.com
> > >> > > > > tobec...@tivo.com
> > >> > > > >
> > >> > > > > ________________________________
> > >> > > > >
> > >> > > > > This email and any attachments may contain confidential and privileged
> > >> > > > > material for the sole use of the intended recipient.
> > >> > > > > Any review, copying, or distribution of this email (or any attachments)
> > >> > > > > by others is prohibited. If you are not the intended recipient, please
> > >> > > > > contact the sender immediately and permanently delete this email and
> > >> > > > > any attachments. No employee or agent of TiVo Inc. is authorized to
> > >> > > > > conclude any binding agreement on behalf of TiVo Inc. by email.
> > >> > > > > Binding agreements with TiVo Inc. may only be made by a signed written
> > >> > > > > agreement.
> > >> >
> > >> > --
> > >> > Navina R.
> > >>
> > >> --
> > >> Navina R.
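The loopback filtering proposed in this thread (skip loopback addresses instead of trusting `InetAddress.getLocalHost()` alone) can be sketched in Java as below. This is a hypothetical illustration, not the code in `org.apache.samza.coordinator.server.HttpServer#getUrl` or the eventual SAMZA-748 fix. The class `CoordinatorAddress` and its methods are invented names, and preferring IPv4 and skipping link-local addresses are assumptions made to keep the URL format simple.

```java
import java.net.Inet4Address;
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.net.SocketException;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Enumeration;
import java.util.List;

// Hypothetical helper, not Samza's code: choose a coordinator address that other
// hosts can plausibly reach, instead of trusting InetAddress.getLocalHost() blindly.
final class CoordinatorAddress {

    private CoordinatorAddress() {}

    // Returns the first IPv4 address that is neither loopback (127.0.0.0/8) nor
    // link-local (169.254.0.0/16), or null if there is none. IPv4-only is an
    // assumption of this sketch (IPv6 literals would need [brackets] in a URL).
    static InetAddress firstNonLoopbackIPv4(List<InetAddress> candidates) {
        for (InetAddress addr : candidates) {
            if (addr instanceof Inet4Address
                    && !addr.isLoopbackAddress()
                    && !addr.isLinkLocalAddress()) {
                return addr;
            }
        }
        return null;
    }

    // Best-effort scan of every interface that is up and not a loopback interface.
    // A SocketException ends the scan with whatever was collected so far.
    static List<InetAddress> interfaceAddresses() {
        List<InetAddress> result = new ArrayList<>();
        try {
            Enumeration<NetworkInterface> nics = NetworkInterface.getNetworkInterfaces();
            if (nics == null) {
                return result;
            }
            for (NetworkInterface nic : Collections.list(nics)) {
                if (nic.isUp() && !nic.isLoopback()) {
                    result.addAll(Collections.list(nic.getInetAddresses()));
                }
            }
        } catch (SocketException e) {
            // Ignore: fall back to the addresses gathered before the failure.
        }
        return result;
    }

    // Hypothetical getUrl-style method: try getLocalHost() first, then the
    // interface scan, and only fall back to loopback if nothing else is usable.
    static String coordinatorUrl(int port) {
        List<InetAddress> candidates = new ArrayList<>();
        try {
            candidates.add(InetAddress.getLocalHost());
        } catch (UnknownHostException e) {
            // The local hostname does not resolve; rely on the interface scan.
        }
        candidates.addAll(interfaceAddresses());
        InetAddress chosen = firstNonLoopbackIPv4(candidates);
        // Last resort: the loopback address reported in this thread.
        String host = (chosen != null) ? chosen.getHostAddress() : "127.0.0.1";
        return "http://" + host + ":" + port;
    }
}
```

On a host whose name resolves to 127.0.0.1, `coordinatorUrl(port)` would fall through to the interface scan instead of returning a loopback URL, without requiring the `/etc/hosts` change. On a multi-homed host the scan simply takes the first usable address, so it does not settle which of several IPs other nodes can actually reach, which is the multiple-IPs concern Yi raised.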