Hey Martin,

Mac OS and Linux. I only tested the changes on my Mac box, which is the one that didn't work with the yarn-site.xml change. I didn't test on the Linux (RHEL) box.

Cheers,
Chris

On 2/21/14 2:02 PM, "Martin Kleppmann" <[email protected]> wrote:

>Thanks Chris. I'll keep digging into it. It's a really strange issue...
>on my machine, run-job.sh still works fine (with the 127.0.0.1 setting
>and without the YARN_HOME env var). I'll check out the YARN source and
>see if that gives any clues.
>
>I'm running on Mac OS -- you too?
>
>Martin
>
>On 21 Feb 2014, at 19:26, Chris Riccomini <[email protected]> wrote:
>> Hey Martin,
>>
>> Yea, that's somewhat alarming. I merged your commit, but after that
>> commit, now I'm unable to get run-job.sh working. I opened a JIRA for
>> this:
>>
>> https://issues.apache.org/jira/browse/SAMZA-154
>>
>> I'm going to revert the commit for now, until we understand this better.
>>
>> Cheers,
>> Chris
>>
>> On 2/21/14 7:11 AM, "Martin Kleppmann" <[email protected]> wrote:
>>
>>> Did a bit more experimentation: whether it works or not seems to vary
>>> depending on the network my laptop is connected to. It works at my
>>> home, but it doesn't work at my girlfriend's apartment! Also whether
>>> or not I'm connected to the company's VPN seems to make a difference.
>>>
>>> It might be due to DNS: looks like YARN does some lookups to determine
>>> the current machine's FQDN. That's probably very useful in a
>>> datacenter, but the results are somewhat undefined when using a laptop
>>> on a wifi connection of dubious quality.
>>>
>>> So far I'm having success with the following config:
>>>
>>> 1. A change to yarn-site.xml, telling it to always look for the RM on
>>> localhost: https://github.com/linkedin/hello-samza/pull/20
>>>
>>> 2. echo "127.0.0.1 `hostname`" >> /etc/hosts (otherwise the RM refuses
>>> to start up if it can't reach a DNS server to resolve the hostname)
>>>
>>> Cheers,
>>> Martin
>>>
>>> On 20 Feb 2014, at 00:34, Martin Kleppmann <[email protected]> wrote:
>>>> Yeah, I checked -- no old YARN processes running. ZK and Kafka are the
>>>> only other two Java processes running on my machine.
>>>>
>>>> Martin
>>>>
>>>> On 20 Feb 2014, at 00:20, Chris Riccomini <[email protected]> wrote:
>>>>> Hey Martin,
>>>>>
>>>>> Have you checked if you've leaked a NM process?
>>>>>
>>>>> I've seen cases in the past where an NM wasn't properly shutdown, and
>>>>> the pid was over-written. Could be that.
>>>>>
>>>>> Cheers,
>>>>> Chris
>>>>>
>>>>> On 2/19/14 4:18 PM, "Martin Kleppmann" <[email protected]> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I'm suddenly having problems with YARN as set up by hello-samza. It
>>>>>> was working fine earlier today and I don't recall changing anything
>>>>>> in my setup -- so I just wanted to check if anyone has seen this
>>>>>> before.
>>>>>>
>>>>>> The YARN resourcemanager seems to start up fine (at least the web UI
>>>>>> works, and nothing strange-looking in the log). But when the
>>>>>> nodemanager starts, I see a lot of this in its logs:
>>>>>>
>>>>>> 14/02/20 00:00:04 INFO ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8031. Already tried 0 time(s); maxRetries=45
>>>>>> 14/02/20 00:00:08 INFO ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8031. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
>>>>>> 14/02/20 00:00:09 INFO ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8031. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
>>>>>> 14/02/20 00:00:11 INFO ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8031. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
>>>>>> 14/02/20 00:00:12 INFO ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8031. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
>>>>>>
>>>>>> ...etc repeating every few seconds, and never connecting. But the RM
>>>>>> is listening on localhost:8031 (verified with netcat).
>>>>>>
>>>>>> run-job.sh similarly sits there, writing a similar message to
>>>>>> hello-samza/deploy/samza/undefined-samza-container-name.log every
>>>>>> few seconds (but with port 8032 instead of 8031).
>>>>>>
>>>>>> Any ideas?
>>>>>>
>>>>>> Thanks,
>>>>>> Martin
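
The two checks discussed in this thread (what the local hostname resolves to, and whether the RM ports accept connections) can be sketched in a few lines of Python. This is only a rough diagnostic under assumptions, not something that ships with hello-samza or YARN: the function names `resolve_local_names` and `can_connect` are made up for illustration, and ports 8031 and 8032 are simply the ones that show up in the log messages quoted above.

```python
import socket


def resolve_local_names():
    """Return (hostname, fqdn, address) as the local resolver sees them.

    address is None when the hostname cannot be resolved, which is the
    situation the /etc/hosts entry in the thread is meant to work around.
    """
    hostname = socket.gethostname()
    fqdn = socket.getfqdn()
    try:
        address = socket.gethostbyname(hostname)
    except OSError:
        address = None
    return hostname, fqdn, address


def can_connect(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds (like netcat -z)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    hostname, fqdn, address = resolve_local_names()
    print("hostname:", hostname)
    print("fqdn:", fqdn)
    print("resolves to:", address)
    # Ports taken from the log messages in the thread (assumed RM ports).
    for port in (8031, 8032):
        print("127.0.0.1:%d reachable: %s" % (port, can_connect("127.0.0.1", port)))
```

If the hostname resolves to nothing, or to something different depending on which network or VPN the laptop is on, that would at least be consistent with the DNS theory above, though it would not prove it.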
