Re: Issues with YARN in hello-samza

Martin Kleppmann Fri, 21 Feb 2014 07:12:30 -0800

Did a bit more experimentation: whether it works or not seems to vary depending 
on the network my laptop is connected to. It works at my home, but it doesn't 
work at my girlfriend's apartment! Also whether or not I'm connected to the 
company's VPN seems to make a difference.


It might be due to DNS: looks like YARN does some lookups to determine the 
current machine's FQDN. That's probably very useful in a datacenter, but the 
results are somewhat undefined when using a laptop on a wifi connection of 
dubious quality.
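
To see what a given network makes of the machine's own name, something like the following may help. It is only a rough Python sketch: YARN does its lookups in Java, so the results are not guaranteed to match, and the function name `describe_local_name` is made up for illustration.

```python
import socket


def describe_local_name():
    """Collect what the OS resolver reports for this machine's own name."""
    host = socket.gethostname()   # short name as configured locally
    fqdn = socket.getfqdn()       # may fall back to the short name if DNS is unhelpful
    try:
        address = socket.gethostbyname(host)
    except OSError:               # socket.gaierror is a subclass of OSError
        address = None            # the hostname does not resolve at all
    return {"hostname": host, "fqdn": fqdn, "address": address}


if __name__ == "__main__":
    for key, value in describe_local_name().items():
        print(f"{key}: {value}")
```

Running it on each network (home, elsewhere, VPN on and off) and comparing the output should show whether the FQDN or the resolved address changes between them.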

So far I'm having success with the following config:

1. A change to yarn-site.xml, telling it to always look for the RM on 
localhost: https://github.com/linkedin/hello-samza/pull/20

2. As root: echo "127.0.0.1 `hostname`" >> /etc/hosts  (without this entry, the 
RM refuses to start up if it can't reach a DNS server to resolve the hostname)
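
Since the echo in step 2 appends a new line every time it is run, here is a minimal Python sketch of the same step that only adds the entry when it is missing. The helper name `ensure_hosts_entry` and its parameters are hypothetical, and the file path is passed in so it can be tried on a copy before touching the real /etc/hosts.

```python
def ensure_hosts_entry(hosts_path, name, address="127.0.0.1"):
    """Append 'address name' to hosts_path unless name is already mapped.

    Returns True if a line was appended, False if an entry already existed.
    """
    with open(hosts_path, "r", encoding="utf-8") as f:
        text = f.read()

    for line in text.splitlines():
        # Ignore comments; the first field is the address, the rest are names.
        fields = line.split("#", 1)[0].split()
        if name in fields[1:]:
            return False

    with open(hosts_path, "a", encoding="utf-8") as f:
        if text and not text.endswith("\n"):
            f.write("\n")  # keep the new entry on its own line
        f.write(f"{address} {name}\n")
    return True


# Example (needs root to write the real file):
#   import socket
#   ensure_hosts_entry("/etc/hosts", socket.gethostname())
```

The check only looks for an existing mapping of the name, whatever its address, so an entry pointing somewhere other than 127.0.0.1 is left alone.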

Cheers,
Martin

On 20 Feb 2014, at 00:34, Martin Kleppmann <[email protected]> wrote:
> Yeah, I checked -- no old YARN processes running. ZK and Kafka are the only 
> other two Java processes running on my machine.
> 
> Martin
> 
> On 20 Feb 2014, at 00:20, Chris Riccomini <[email protected]> wrote:
>> Hey Martin,
>> 
>> Have you checked if you've leaked a NM process?
>> 
>> I've seen cases in the past where an NM wasn't properly shut down, and the
>> pid was overwritten. Could be that.
>> 
>> Cheers,
>> Chris
>> 
>> On 2/19/14 4:18 PM, "Martin Kleppmann" <[email protected]> wrote:
>> 
>>> Hi,
>>> 
>>> I'm suddenly having problems with YARN as set up by hello-samza. It was
>>> working fine earlier today and I don't recall changing anything in my
>>> setup -- so I just wanted to check if anyone has seen this before.
>>> 
>>> The YARN resourcemanager seems to start up fine (at least the web UI
>>> works, and nothing strange-looking in the log). But when the nodemanager
>>> starts, I see a lot of this in its logs:
>>> 
>>> 14/02/20 00:00:04 INFO ipc.Client: Retrying connect to server:
>>> 0.0.0.0/0.0.0.0:8031. Already tried 0 time(s); maxRetries=45
>>> 14/02/20 00:00:08 INFO ipc.Client: Retrying connect to server:
>>> 0.0.0.0/0.0.0.0:8031. Already tried 0 time(s); retry policy is
>>> RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
>>> 14/02/20 00:00:09 INFO ipc.Client: Retrying connect to server:
>>> 0.0.0.0/0.0.0.0:8031. Already tried 1 time(s); retry policy is
>>> RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
>>> 14/02/20 00:00:11 INFO ipc.Client: Retrying connect to server:
>>> 0.0.0.0/0.0.0.0:8031. Already tried 2 time(s); retry policy is
>>> RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
>>> 14/02/20 00:00:12 INFO ipc.Client: Retrying connect to server:
>>> 0.0.0.0/0.0.0.0:8031. Already tried 3 time(s); retry policy is
>>> RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
>>> 
>>> ...etc repeating every few seconds, and never connecting. But the RM is
>>> listening on localhost:8031 (verified with netcat).
>>> 
>>> run-job.sh similarly sits there, writing a similar message to
>>> hello-samza/deploy/samza/undefined-samza-container-name.log every few
>>> seconds (but with port 8032 instead of 8031).
>>> 
>>> Any ideas?
>>> 
>>> Thanks,
>>> Martin
>>> 
>> 
>
