Hey Ethan,

Are you explicitly running "export HADOOP_YARN_HOME=…" before invoking
your ./gradlew command? If you don't export it, the variable stays local
to your shell and isn't inherited by subprocesses, so neither Gradle's
driver script nor the JVM that actually runs runJob will see it.
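
For example, in a plain shell (the path is just illustrative):

$ HADOOP_YARN_HOME=/usr/hadoop          # shell-local; subprocesses won't see it
$ export HADOOP_YARN_HOME=/usr/hadoop   # exported; inherited by every subprocess

Note that the one-shot prefix form, "HADOOP_YARN_HOME=/usr/hadoop
./gradlew ...", also exports the variable, but only into that single
command's environment.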

Can you confirm that /usr/hadoop/conf/yarn-site.xml exists?

This might not work with Gradle, though. I've never tried running runJob
from within Gradle; I always call run-job.sh directly.
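
For reference, a direct invocation looks roughly like this (the bin path
follows the hello-samza layout, so adjust it to your install):

$ export HADOOP_YARN_HOME=/usr/hadoop
$ deploy/samza/bin/run-job.sh \
    --config-factory=org.apache.samza.config.factories.PropertiesConfigFactory \
    --config-path=file:///usr/samza/config/wikipedia-feed.properties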

Cheers,
Chris

On 9/12/14 11:01 AM, "Ethan Setnik" <[email protected]> wrote:

>Hi Chris,

>That doesn’t seem to be working. I also noticed that YARN_HOME seems to
>have been replaced by HADOOP_YARN_HOME, so to be safe I set both.

>$ ls -la /usr/hadoop/conf
>total 12
>drwxr-xr-x 2 root  root  4096 Sep 12 17:17 .
>drwxr-xr-x 5 samza samza 4096 Sep 12 17:17 ..
>-rw-r--r-- 1 samza samza 1978 Sep 12 17:17 yarn-site.xml

>$ YARN_HOME=/usr/hadoop HADOOP_YARN_HOME=/usr/hadoop ./gradlew \
>    samza-shell:runJob -PconfigPath=file:///usr/samza/config/wikipedia-feed.properties

>This does not load the yarn-site.xml found in /usr/hadoop/conf/yarn-site.xml.

>The documentation on Gradle properties and system properties (section 14.2 of
>http://www.gradle.org/docs/current/userguide/tutorial_this_and_that.html)
>is horrible.

>Is this the correct way to set the environment when using Gradle tasks?
>It does not work.
>
>On Fri, Sep 12, 2014 at 12:41 PM, Chris Riccomini
><[email protected]> wrote:
>
>> Hey Ethan,
>> I believe that you need to export YARN_HOME=<path to yarn home dir>
>> The YARN_HOME dir must have a conf/yarn-site.xml in it.
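>> For example (path is illustrative):
>> export YARN_HOME=/usr/hadoop   # JobRunner then looks for /usr/hadoop/conf/yarn-site.xml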
>> Cheers,
>> Chris
>> On 9/12/14 9:38 AM, "Ethan Setnik" <[email protected]> wrote:
>>>My initial tests suggest that HA failover is handled gracefully in
>>>YARN 2.4.0 without any changes to Samza, thanks to the AMRMClient
>>>abstraction Zhijie mentioned. I’ve also confirmed that upgrading to
>>>YARN 2.5.0 works as well, FWIW.

>>>The one thing that’s not working completely is
>>>org.apache.samza.job.JobRunner:

>>>$ ./gradlew samza-shell:runJob \
>>>    -PconfigPath=file:///usr/samza/config/wikipedia-feed.properties
>>>...
>>>2014-09-12 16:13:08 JobRunner [INFO] job factory: org.apache.samza.job.yarn.YarnJobFactory
>>>2014-09-12 16:13:08 ClientHelper [INFO] trying to connect to RM 0.0.0.0:8032
>>>2014-09-12 16:13:08 RMProxy [INFO] Connecting to ResourceManager at /0.0.0.0:8032
>>>…

>>>The JobRunner creates a new instance of YarnJobFactory, which initializes
>>>a default instance of YarnConfiguration instead of reading an existing
>>>configuration from the file system:

>>>class YarnJobFactory extends StreamJobFactory {
>>>  def getJob(config: Config) = {
>>>    // TODO fix this. needed to support http package locations.
>>>    val hConfig = new YarnConfiguration
>>>    hConfig.set("fs.http.impl", classOf[HttpFileSystem].getName)
>>>
>>>    new YarnJob(config, hConfig)
>>>  }
>>>}

>>>This default config is then used to create and submit the job:

>>>class YarnJob(config: Config, hadoopConfig: Configuration) extends StreamJob {
>>>  import YarnJob._
>>>
>>>  val client = new ClientHelper(hadoopConfig)
>>>  ...

>>>It looks as if it’s always using the default configuration. How do I
>>>set up the environment so that Hadoop loads the yarn-site.xml I have
>>>placed in /usr/hadoop/etc/hadoop instead of using the defaults?
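>>>
>>>(A quick way to check which yarn-site.xml, if any, a default
>>>YarnConfiguration picks up would be something like the sketch below.
>>>Configuration resolves the file from the classpath, so getResource
>>>returns null unless a directory containing yarn-site.xml is on the
>>>classpath of the JVM running JobRunner.)
>>>
>>>import org.apache.hadoop.yarn.conf.YarnConfiguration
>>>
>>>object WhichYarnSite extends App {
>>>  val conf = new YarnConfiguration
>>>  // URL of the yarn-site.xml actually found on the classpath, or null
>>>  println(conf.getResource("yarn-site.xml"))
>>>  // effective RM address; yarn-default.xml falls back to 0.0.0.0:8032
>>>  println(conf.get(YarnConfiguration.RM_ADDRESS))
>>>}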
>>>
>>>On Thu, Sep 11, 2014 at 6:51 PM, Zhijie Shen <[email protected]>
>>>wrote:
>>>
>>>> W.r.t. HA RM, the AM doesn't need to make any changes if it is using
>>>> AMRMClient(Async), which automatically takes care of the work required
>>>> when an RM failover happens.
>>>> One thing that may interest Samza is work-preserving AM restart. If the
>>>> Samza AM crashes for some reason, it can try to get its old containers
>>>> back when it restarts and continue the work. To achieve this, Samza
>>>> needs some additional logic on the AM side.
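>>>> For example, the relevant hook in the 2.4 API is the registration
>>>> response (a sketch; the amClient/host/port/trackingUrl names are
>>>> assumed):
>>>>
>>>> val response = amClient.registerApplicationMaster(host, port, trackingUrl)
>>>> // containers still running from the previous attempt, if any
>>>> val previous = response.getContainersFromPreviousAttempts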
>>>> On Thu, Sep 11, 2014 at 2:45 PM, Chris Riccomini <
>>>> [email protected]> wrote:
>>>>> Hey Ethan,
>>>>>
>>>>> Yes, we plan to support this, but haven't done any testing with it yet.
>>>>>
>>>>> The main question I have is whether supporting HA RM requires Samza's
>>>>> AM to have code changes, or whether the YARN AM client will
>>>>> transparently handle RM failover. Until we run some tests (or are told
>>>>> otherwise), I just don't know (and haven't had time to investigate).
>>>>> Does anyone know the answer to this question?
>>>>>
>>>>> Cheers,
>>>>> Chris
>>>>>
>>>>> On 9/11/14 1:55 PM, "Ethan Setnik" <[email protected]> wrote:
>>>>>
>>>>> >YARN 2.4.0 brings support for high-availability configurations by
>>>>> >specifying a cluster of ResourceManagers and a state store via
>>>>> >ZooKeeper.
>>>>> >
>>>>> >"When there are multiple RMs, the configuration (yarn-site.xml) used
>>>>> >by clients and nodes is expected to list all the RMs. Clients,
>>>>> >ApplicationMasters (AMs) and NodeManagers (NMs) try connecting to the
>>>>> >RMs in a round-robin fashion until they hit the Active RM. If the
>>>>> >Active goes down, they resume the round-robin polling until they hit
>>>>> >the "new" Active."
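>>>>> >
>>>>> >For reference, the HA-related part of a yarn-site.xml looks roughly
>>>>> >like this (host names and the cluster id are placeholders):
>>>>> >
>>>>> ><property>
>>>>> >  <name>yarn.resourcemanager.ha.enabled</name>
>>>>> >  <value>true</value>
>>>>> ></property>
>>>>> ><property>
>>>>> >  <name>yarn.resourcemanager.cluster-id</name>
>>>>> >  <value>samza-yarn-cluster</value>
>>>>> ></property>
>>>>> ><property>
>>>>> >  <name>yarn.resourcemanager.ha.rm-ids</name>
>>>>> >  <value>rm1,rm2</value>
>>>>> ></property>
>>>>> ><property>
>>>>> >  <name>yarn.resourcemanager.hostname.rm1</name>
>>>>> >  <value>rm1.example.com</value>
>>>>> ></property>
>>>>> ><property>
>>>>> >  <name>yarn.resourcemanager.hostname.rm2</name>
>>>>> >  <value>rm2.example.com</value>
>>>>> ></property>
>>>>> ><property>
>>>>> >  <name>yarn.resourcemanager.zk-address</name>
>>>>> >  <value>zk1.example.com:2181,zk2.example.com:2181</value>
>>>>> ></property>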
>>>>> >
>>>>> >
>>>>> >Does Samza have any plans to support YARN HA configurations?
>>>>>
>>>>>
>>>> -- 
>>>> Zhijie Shen
>>>> Hortonworks Inc.
>>>> http://hortonworks.com/
