Created https://issues.apache.org/jira/browse/SAMZA-748
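
For anyone following along, here is a minimal sketch of the loopback
filtering discussed below. It is not the actual SAMZA-748 patch: the class
name LoopbackFilter and both method names are made up for illustration, and
preferring IPv4 is only an assumption to keep the example short. It uses
nothing beyond java.net calls from the JDK.

```java
import java.net.Inet4Address;
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.net.SocketException;
import java.util.Collections;
import java.util.Enumeration;
import java.util.Optional;

// Hypothetical helper, not part of Samza.
class LoopbackFilter {

    // True if the address is one that another machine could plausibly reach:
    // not loopback (127.0.0.0/8, ::1), not link-local, not the wildcard address.
    static boolean isUsable(InetAddress addr) {
        return !addr.isLoopbackAddress()
                && !addr.isLinkLocalAddress()
                && !addr.isAnyLocalAddress();
    }

    // Walks every interface that is up and returns the first usable IPv4
    // address, or empty if the host has none.
    static Optional<InetAddress> firstUsableAddress() throws SocketException {
        Enumeration<NetworkInterface> ifaces = NetworkInterface.getNetworkInterfaces();
        if (ifaces == null) {
            return Optional.empty();
        }
        for (NetworkInterface nic : Collections.list(ifaces)) {
            if (!nic.isUp() || nic.isLoopback()) {
                continue;
            }
            for (InetAddress addr : Collections.list(nic.getInetAddresses())) {
                if (addr instanceof Inet4Address && isUsable(addr)) {
                    return Optional.of(addr);
                }
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) throws Exception {
        Optional<InetAddress> found = firstUsableAddress();
        // Fall back to the current behaviour only when nothing better exists.
        InetAddress chosen = found.isPresent() ? found.get() : InetAddress.getLocalHost();
        System.out.println(chosen.getHostAddress());
    }
}
```

On a host with several usable interfaces this simply takes the first one the
JVM reports, so a real fix may want a smarter choice or a config override.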

Fang, Yan
yanfang...@gmail.com

On Thu, Jul 30, 2015 at 7:17 PM, Yi Pan <nickpa...@gmail.com> wrote:

> +1 on the fix in 0.10.0. It should be an easy one.
>
> On Thu, Jul 30, 2015 at 7:08 PM, Yan Fang <yanfang...@gmail.com> wrote:
>
> > Hi Thommy,
> >
> > {quote}
> > Because I don't see how this is ever going to work in scenarios where the
> > AM is on a different node than the containers.
> > {quote}
> >
> > -- I do not quite understand this part. The AM essentially runs in a
> > container as well, and the HTTP server is brought up in that same
> > container.
> >
> > {quote}
> > even if we can't get a better address for the AM from YARN, we could at
> > least filter the addresses we get back from the JVM to exclude loopbacks.
> > {quote}
> >
> > -- You are right. InetAddress.getLocalHost() sometimes returns a loopback
> > address. We should filter it out. A quick search turned up one possible
> > solution <http://www.coderanch.com/t/491883/java/java/IP> .
> >
> > + @Yi, @Navina,
> >
> > Also, I think this fix should go to the 0.10.0 release.
> >
> > What do you guys think?
> >
> > Thanks,
> >
> > Fang, Yan
> > yanfang...@gmail.com
> >
> > On Thu, Jul 30, 2015 at 6:39 PM, Yan Fang <yanfang...@gmail.com> wrote:
> >
> > > Just one point to add:
> > >
> > > {quote}
> > > AM gets notified of container status from the RM.
> > > {quote}
> > >
> > > I think this is not 100% correct. AM can communicate with NM through
> > > NMClientAsync
> > > <https://hadoop.apache.org/docs/r2.7.1/api/org/apache/hadoop/yarn/client/api/async/NMClientAsync.html>
> > > to get container status, though Samza does not implement the
> > > CallbackHandler.
> > >
> > > Thanks,
> > >
> > > Fang, Yan
> > > yanfang...@gmail.com
> > >
> > > On Thu, Jul 30, 2015 at 6:06 PM, Navina Ramesh <nram...@linkedin.com.invalid> wrote:
> > >
> > >> The NM (and hence, by extension, the container) heartbeats to the RM, not
> > >> the AM. AM gets notified of container status from the RM.
> > >> The AM starts / stops / releases a container process by communicating with
> > >> the NM.
> > >>
> > >> Navina
> > >>
> > >>
> > >> On Thu, Jul 30, 2015 at 5:55 PM, Thomas Becker <tobec...@tivo.com> wrote:
> > >>
> > >> > Ok, I thought there was some communication from the container to the AM,
> > >> > it sounds like you're saying it's in the other direction only?  Don't
> > >> > containers heartbeat to the AM?  Regardless, even if we can't get a better
> > >> > address for the AM from YARN, we could at least filter the addresses we get
> > >> > back from the JVM to exclude loopbacks.
> > >> >
> > >> > -Tommy
> > >> > ________________________________________
> > >> > From: Navina Ramesh [nram...@linkedin.com.INVALID]
> > >> > Sent: Thursday, July 30, 2015 8:40 PM
> > >> > To: dev@samza.apache.org
> > >> > Subject: Re: Coordinator URL always 127.0.0.1
> > >> >
> > >> > Hi Tommy,
> > >> > Yi is right. Container start is coordinated by the AppMaster using an
> > >> > NMClient. Container host name and port are provided by the RM during
> > >> > allocation.
> > >> > In Yarn (at least, afaik), when the node joins a cluster, the NM registers
> > >> > itself with the RM. So, the NM might still be using
> > >> > InetAddress.getLocalHost().getAddress().
> > >> >
> > >> > I don't know of any other way to programmatically fetch the machine's
> > >> > hostname (apart from some hacky shell commands).
> > >> >
> > >> > Cheers,
> > >> > Navina
> > >> >
> > >> > On Thu, Jul 30, 2015 at 5:23 PM, Yi Pan <nickpa...@gmail.com> wrote:
> > >> >
> > >> > > Hi, Tommy,
> > >> > >
> > >> > > Yeah, I agree that the current implementation is not bullet-proof against
> > >> > > different networking configurations on the host. As for the AM <-> container
> > >> > > communication, if I am not mistaken, it is through the NMClient, and the
> > >> > > node HTTP address is wrapped within the Container object returned from the RM.
> > >> > > I am not very familiar with that part of the source code. Navina may be able to
> > >> > > help more here.
> > >> > >
> > >> > > -Yi
> > >> > >
> > >> > > On Thu, Jul 30, 2015 at 4:27 PM, Thomas Becker <tobec...@tivo.com> wrote:
> > >> > >
> > >> > > > Hi Yi,
> > >> > > > Thanks a lot for your reply.  I don't doubt we can get it to work by
> > >> > > > mucking with the networking configuration, but to me this feels like a
> > >> > > > workaround, not a solution.  InetAddress.getLocalHost().getHostAddress() is
> > >> > > > not a reliable way of obtaining an IP that other machines can connect to.
> > >> > > > Just today I tested on several Linux distros and it did not work on any of
> > >> > > > them.  Can we do something more robust here?  How does the container
> > >> > > > communicate status to the AM?
> > >> > > >
> > >> > > > -Tommy
> > >> > > >
> > >> > > > ________________________________________
> > >> > > > From: Yi Pan [nickpa...@gmail.com]
> > >> > > > Sent: Thursday, July 30, 2015 6:48 PM
> > >> > > > To: dev@samza.apache.org
> > >> > > > Subject: Re: Coordinator URL always 127.0.0.1
> > >> > > >
> > >> > > > Hi, Tommy,
> > >> > > >
> > >> > > > I think this is a commonly asked question about multiple
> > >> > > > IPs on a single host. A common trick w/o changing code is (copied from SO:
> > >> > > > http://stackoverflow.com/questions/2381316/java-inetaddress-getlocalhost-returns-127-0-0-1-how-to-get-real-ip
> > >> > > > )
> > >> > > >
> > >> > > > {code}
> > >> > > >
> > >> > > >    1. Find your host name. Type: hostname. For example, you find your
> > >> > > >       hostname is mycomputer.xzy.com
> > >> > > >
> > >> > > >    2. Put your host name in your hosts file, /etc/hosts . Such as
> > >> > > >
> > >> > > >       10.50.16.136 mycomputer.xzy.com
> > >> > > >
> > >> > > > {code}
> > >> > > >
> > >> > > > -Yi
> > >> > > >
> > >> > > > On Thu, Jul 30, 2015 at 11:35 AM, Tommy Becker <tobec...@tivo.com> wrote:
> > >> > > >
> > >> > > > > We are testing some jobs on a YARN grid and noticed they are often not
> > >> > > > > starting up properly due to being unable to connect to the job coordinator.
> > >> > > > > After some investigation it seems as if the jobs are always getting a
> > >> > > > > coordinator URL of http://127.0.0.1:<port>  But my understanding is that
> > >> > > > > the coordinator runs only in the AM, so I'd expect these URLs to more often
> > >> > > > > than not be to some other machine.  Looking at the code however, I'm not
> > >> > > > > sure how that would ever happen since the URL for the coordinator always
> > >> > > > > comes from InetAddress.getLocalHost().getHostAddress() in
> > >> > > > > org.apache.samza.coordinator.server.HttpServer#getUrl
> > >> > > > >
> > >> > > > > Am I off base here?  Because I don't see how this is ever going to work in
> > >> > > > > scenarios where the AM is on a different node than the containers.
> > >> > > > >
> > >> > > > > --
> > >> > > > > Tommy Becker
> > >> > > > > Senior Software Engineer
> > >> > > > >
> > >> > > > > Digitalsmiths
> > >> > > > > A TiVo Company
> > >> > > > >
> > >> > > > > www.digitalsmiths.com<http://www.digitalsmiths.com>
> > >> > > > > tobec...@tivo.com<mailto:tobec...@tivo.com>
> > >> > > > >
> > >> > > > > ________________________________
> > >> > > > >
> > >> > > > > This email and any attachments may contain confidential and privileged
> > >> > > > > material for the sole use of the intended recipient. Any review, copying,
> > >> > > > > or distribution of this email (or any attachments) by others is prohibited.
> > >> > > > > If you are not the intended recipient, please contact the sender
> > >> > > > > immediately and permanently delete this email and any attachments. No
> > >> > > > > employee or agent of TiVo Inc. is authorized to conclude any binding
> > >> > > > > agreement on behalf of TiVo Inc. by email. Binding agreements with TiVo
> > >> > > > > Inc. may only be made by a signed written agreement.
> > >> > > > >
> > >> > > >
> > >> > > >
> > >> > >
> > >> >
> > >> >
> > >> >
> > >> > --
> > >> > Navina R.
> > >> >
> > >> >
> > >>
> > >>
> > >>
> > >> --
> > >> Navina R.
> > >>
> > >
> > >
> >
>
