Hey Sonali,

As a follow on, here is EXACTLY what I did:

# start by downloading YARN and setting it up
1. Download YARN 2.3 from http://mirror.symnds.com/software/Apache/hadoop/common/hadoop-2.3.0/hadoop-2.3.0.tar.gz to /tmp
2. cd /tmp
3. tar -xvf hadoop-2.3.0.tar.gz
4. cd hadoop-2.3.0
5. export HADOOP_YARN_HOME=$(pwd)
6. mkdir conf
7. export HADOOP_CONF_DIR=$HADOOP_YARN_HOME/conf
8. cp ./etc/hadoop/yarn-site.xml conf
9. vi conf/yarn-site.xml
10. Add this property to yarn-site.xml:

  <property>
    <name>yarn.resourcemanager.hostname</name>
    <!-- hostname that is accessible from all NMs -->
    <value>criccomi-mn</value>
  </property>

11. curl http://svn.apache.org/viewvc/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/resources/capacity-scheduler.xml?view=co > conf/capacity-scheduler.xml

# setup http filesystem for YARN (you can skip this and follow SAMZA-181 instead, if you are using HDFS)
12. cd /tmp
13. curl http://www.scala-lang.org/files/archive/scala-2.10.3.tgz > scala-2.10.3.tgz
14. tar -xvf scala-2.10.3.tgz
15. cp /tmp/scala-2.10.3/lib/scala-compiler.jar $HADOOP_YARN_HOME/share/hadoop/hdfs/lib
16. cp /tmp/scala-2.10.3/lib/scala-library.jar $HADOOP_YARN_HOME/share/hadoop/hdfs/lib
17. curl http://search.maven.org/remotecontent?filepath=org/clapper/grizzled-slf4j_2.10/1.0.1/grizzled-slf4j_2.10-1.0.1.jar > $HADOOP_YARN_HOME/share/hadoop/hdfs/lib/grizzled-slf4j_2.10-1.0.1.jar
18. vi $HADOOP_YARN_HOME/conf/core-site.xml

  <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
  <configuration>
    <property>
      <name>fs.http.impl</name>
      <value>org.apache.samza.util.hadoop.HttpFileSystem</value>
    </property>
  </configuration>

19. Copy the Hadoop directory to all slave nodes (172.21.100.35, in my case):

  scp -r . 172.21.100.35:/tmp/hadoop-2.3.0

20. echo 172.21.100.35 > conf/slaves
21. sbin/start-yarn.sh
22. If you get "172.21.100.35: Error: JAVA_HOME is not set and could not be found.", you'll need to add a conf/hadoop-env.sh file to the machine with the failure (172.21.100.35, in this case), which has "export JAVA_HOME=/export/apps/jdk/JDK-1_6_0_27" (or wherever your JAVA_HOME actually is).
23. Validate that your nodes are up by visiting http://criccomi-mn:8088/cluster/nodes

# now we more or less follow the hello-samza steps.
24. cd /tmp
25. git clone http://git-wip-us.apache.org/repos/asf/incubator-samza.git
26. cd incubator-samza
27. curl https://issues.apache.org/jira/secure/attachment/12634493/SAMZA-182.1.patch > SAMZA-182.1.patch
28. git apply SAMZA-182.1.patch
29. ./gradlew clean publishToMavenLocal
30. cd ..
31. git clone git://github.com/linkedin/hello-samza.git
32. cd hello-samza
33. vi samza-job-package/src/main/config/wikipedia-feed.properties
34. Change the yarn.package.path property to be:

  yarn.package.path=http://criccomi-mn:8000/samza-job-package/target/samza-job-package-0.7.0-dist.tar.gz

35. mvn clean package
36. mkdir -p deploy/samza
37. tar -xvf ./samza-job-package/target/samza-job-package-0.7.0-dist.tar.gz -C deploy/samza
38. Open a new terminal, and cd /tmp/hello-samza && python -m SimpleHTTPServer
39. Go back to the original terminal (not the one running the HTTP server)
40. deploy/samza/bin/run-job.sh --config-factory=org.apache.samza.config.factories.PropertiesConfigFactory --config-path=file://$PWD/deploy/samza/config/wikipedia-feed.properties
41. Go to http://criccomi-mn:8088 and find the wikipedia-feed job. Click on the ApplicationMaster link to see that it's running.

I plan to write a tutorial that formalizes this.

Cheers,
Chris

On 3/13/14 11:03 AM, "Chris Riccomini" <[email protected]> wrote:

>Hey Sonali,
>
>Please have a look at:
>
>  https://issues.apache.org/jira/browse/SAMZA-182
>
>I have posted a fix there. I have successfully downloaded YARN, setup a
>two node grid from scratch, and run hello-samza on it with the patch on
>SAMZA-182.
>Can you give that a shot for me?
>
>Thanks for your patience!
>
>Cheers,
>Chris
>
>On 3/13/14 10:58 AM, "[email protected]" <[email protected]> wrote:
>
>>Hi Chris,
>>
>>I checked my .bashrc. The variable was set on one of the NMs and not on
>>the other. I made the change and restarted the scripts. I still get the
>>same error.
>>
>>Also in my stderr I get:
>>Null identity service, trying login service: null
>>Finding identity service: null
>>Null identity service, trying login service: null
>>Finding identity service: null
>>
>>-----Original Message-----
>>From: Chris Riccomini [mailto:[email protected]]
>>Sent: Wednesday, March 12, 2014 7:59 PM
>>To: [email protected]
>>Subject: Re: Failed to package using mvn
>>
>>Hey Guys,
>>
>>I was able to reproduce this problem.
>>
>>I was also able to fix it (without the patch in SAMZA-182). All I needed
>>to do was update ~/.bashrc on my NM's box to have:
>>
>>  export YARN_HOME=/tmp/hadoop-2.3.0
>>
>>It appears that the YARN environment variables are somehow getting lost
>>or not forwarded from the NM to the AM. Adding this bashrc setting makes
>>sure that the NM gets them.
>>
>>I have a feeling upgrading Samza to YARN 2.3.0 will fix this, but I
>>haven't validated yet. I will continue to investigate tomorrow.
>>
>>Cheers,
>>Chris
>>
>>On 3/12/14 6:43 PM, "Yan Fang" <[email protected]> wrote:
>>
>>>I guess Sonali has the problem because his NMs do not read the
>>>YARN_HOME variable. That may be because the NM machine does not have
>>>YARN_HOME set when the NM starts.
>>>
>>>Check this https://issues.apache.org/jira/browse/SAMZA-182
>>>
>>>Thanks,
>>>
>>>Yan Fang
>>>
>>>> On Mar 12, 2014, at 6:14 PM, Chris Riccomini <[email protected]> wrote:
>>>>
>>>> Hey Sonali,
>>>>
>>>> I am unfamiliar with the start-yarn.sh.
>>>> Looking at:
>>>>
>>>> https://svn.apache.org/viewvc/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/bin/stop-yarn.sh?revision=1370666&view=markup
>>>>
>>>> What version of YARN are you using?
>>>>
>>>> Cheers,
>>>> Chris
>>>>
>>>> On 3/12/14 5:56 PM, "[email protected]" <[email protected]> wrote:
>>>>
>>>>> Hey Chris,
>>>>>
>>>>> Yes, I have YARN_HOME set in all the NMs pointing to the right
>>>>> directories. I also made sure the yarn-site.xml file has the hostname
>>>>> set.
>>>>>
>>>>> I start yarn using start-yarn.sh in the RM and that automatically
>>>>> starts the NMs on the slave nodes. Is that the right way to do it?
>>>>>
>>>>> -----Original Message-----
>>>>> From: Chris Riccomini [mailto:[email protected]]
>>>>> Sent: Wednesday, March 12, 2014 5:52 PM
>>>>> To: [email protected]
>>>>> Subject: Re: Failed to package using mvn
>>>>>
>>>>> Hey Sonali,
>>>>>
>>>>> OK, so we've validated that the NMs are able to connect, which means
>>>>> they can see the yarn-site.xml.
>>>>>
>>>>> How are you starting your NMs? Are you running:
>>>>>
>>>>>   export YARN_HOME=/path/to/yarn/home
>>>>>
>>>>> In the CLI before starting the NM?
>>>>>
>>>>> For reference, we run:
>>>>>
>>>>>   export YARN_HOME=/path/to/our/yarn-home
>>>>>   export YARN_CONF_DIR=$YARN_HOME/conf
>>>>>
>>>>>   bin/yarn nodemanager
>>>>>
>>>>> With YARN_HOME pointing to a directory that has a subdirectory
>>>>> called "conf" in it, which has a yarn-site.xml in it:
>>>>>
>>>>>   /path/to/our/yarn-home/conf/yarn-site.xml
>>>>>
>>>>> This yarn-site.xml has yarn.resourcemanager.hostname set to the IP
>>>>> (or hostname) of the resource manager:
>>>>>
>>>>>   <property>
>>>>>     <name>yarn.resourcemanager.hostname</name>
>>>>>     <value>123.456.789.123</value>
>>>>>   </property>
>>>>>
>>>>> Cheers,
>>>>> Chris
>>>>>
>>>>> On 3/12/14 5:33 PM, "[email protected]" <[email protected]> wrote:
>>>>>
>>>>>> I see two active nodes (I have 2 NMs running)
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Chris Riccomini [mailto:[email protected]]
>>>>>> Sent: Wednesday, March 12, 2014 5:24 PM
>>>>>> To: [email protected]
>>>>>> Subject: Re: Failed to package using mvn
>>>>>>
>>>>>> Hey Sonali,
>>>>>>
>>>>>> Can you go to your ResourceManager's UI, and tell me how many
>>>>>> active nodes you see? This should be under the "active nodes"
>>>>>> heading.
>>>>>>
>>>>>> It sounds like the SamzaAppMaster is not getting the resource
>>>>>> manager host/port from the yarn-site.xml. Usually this is due to
>>>>>> not exporting YARN_HOME on the NodeManager before starting it.
>>>>>>
>>>>>> Cheers,
>>>>>> Chris
>>>>>>
>>>>>> On 3/12/14 5:21 PM, "[email protected]" <[email protected]> wrote:
>>>>>>
>>>>>>> Okay so I was able to submit the job:
>>>>>>>
>>>>>>> In the nodemanager I get this error: Specifically it's trying to
>>>>>>> connect to 0.0.0.0:8032 instead of the IP I have specified in the
>>>>>>> yarn-site.xml file
>>>>>>>
>>>>>>> 2014-03-12 17:04:47 SamzaAppMaster$ [INFO] got container id: container_1391637982288_0033_01_000001
>>>>>>> 2014-03-12 17:04:47 SamzaAppMaster$ [INFO] got app attempt id: appattempt_1391637982288_0033_000001
>>>>>>> 2014-03-12 17:04:47 SamzaAppMaster$ [INFO] got node manager host: svdpdac001.techlabs.accenture.com
>>>>>>> 2014-03-12 17:04:47 SamzaAppMaster$ [INFO] got node manager port: 38218
>>>>>>> 2014-03-12 17:04:47 SamzaAppMaster$ [INFO] got node manager http port: 8042
>>>>>>> 2014-03-12 17:04:47 SamzaAppMaster$ [INFO] got config:
>>>>>>> {task.inputs=wikipedia.#en.wikipedia,wikipedia.#en.wiktionary,wikipedia.#en.wikinews,
>>>>>>> systems.wikipedia.host=irc.wikimedia.org,
>>>>>>> systems.kafka.producer.batch.num.messages=1,
>>>>>>> job.factory.class=org.apache.samza.job.yarn.YarnJobFactory,
>>>>>>> systems.wikipedia.port=6667,
>>>>>>> systems.kafka.producer.producer.type=sync,
>>>>>>> job.name=wikipedia-feed,
>>>>>>> systems.kafka.consumer.zookeeper.connect=svdpdac013.techlabs.accenture.com:2181/,
>>>>>>> systems.kafka.samza.msg.serde=json,
>>>>>>> serializers.registry.json.class=org.apache.samza.serializers.JsonSerdeFactory,
>>>>>>> task.class=samza.examples.wikipedia.task.WikipediaFeedStreamTask,
>>>>>>> yarn.package.path=hdfs://10.1.174.85:9000/samza-job-package-0.7.0-dist.tar.gz,
>>>>>>> systems.wikipedia.samza.factory=samza.examples.wikipedia.system.WikipediaSystemFactory,
>>>>>>> systems.kafka.samza.factory=org.apache.samza.system.kafka.KafkaSystemFactory,
>>>>>>> systems.kafka.producer.metadata.broker.list=svdpdac001.techlabs.accenture.com:6667,svdpdac015.techlabs.accenture.com:6667}
>>>>>>> 2014-03-12 17:04:48 ClientHelper [INFO] trying to connect to RM 0.0.0.0:8032
>>>>>>> 2014-03-12 17:04:48 NativeCodeLoader [WARN] Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
>>>>>>> 2014-03-12 17:04:48 RMProxy [INFO] Connecting to ResourceManager at /0.0.0.0:8032
>>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: Chris Riccomini [mailto:[email protected]]
>>>>>>> Sent: Wednesday, March 12, 2014 4:48 PM
>>>>>>> To: [email protected]
>>>>>>> Subject: Re: Failed to package using mvn
>>>>>>>
>>>>>>> Hey Sonali,
>>>>>>>
>>>>>>> You need to specify a valid HDFS uri. Usually something like:
>>>>>>>
>>>>>>>   hdfs://<hdfs name node ip>:<hdfs name node port>/path/to/tgz
>>>>>>>
>>>>>>> Right now, Hadoop is trying to use the package name as the HDFS
>>>>>>> host.
>>>>>>>
>>>>>>> Cheers,
>>>>>>> Chris
>>>>>>>
>>>>>>> On 3/12/14 4:45 PM, "[email protected]" <[email protected]> wrote:
>>>>>>>
>>>>>>>> I did and I can now see the hadoop-hdfs jar in /deploy/samza/lib
>>>>>>>> folder.
>>>>>>>>
>>>>>>>> I do get a different error now.
>>>>>>>>
>>>>>>>> I uploaded the samza-job to hdfs and it resides on
>>>>>>>> hdfs://samza-job-package-0.7.0-dist.tar.gz
>>>>>>>>
>>>>>>>> But when I run the job I get this exception:
>>>>>>>>
>>>>>>>> Exception in thread "main" java.lang.IllegalArgumentException: java.net.UnknownHostException: samza-job-package-0.7.0-dist.tar.gz
>>>>>>>>   at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:418)
>>>>>>>>   at org.apache.hadoop.hdfs.NameNodeProxies.createNonHAProxy(NameNodeProxies.java:231)
>>>>>>>>   at org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:139)
>>>>>>>>   at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:510)
>>>>>>>>   at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:453)
>>>>>>>>   at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:136)
>>>>>>>>   at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2433)
>>>>>>>>   at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:88)
>>>>>>>>   at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2467)
>>>>>>>>   at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2449)
>>>>>>>>   at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:367)
>>>>>>>>   at org.apache.hadoop.fs.Path.getFileSystem(Path.java:287)
>>>>>>>>   at org.apache.samza.job.yarn.ClientHelper.submitApplication(ClientHelper.scala:111)
>>>>>>>>   at org.apache.samza.job.yarn.YarnJob.submit(YarnJob.scala:55)
>>>>>>>>   at org.apache.samza.job.yarn.YarnJob.submit(YarnJob.scala:48)
>>>>>>>>   at org.apache.samza.job.JobRunner.run(JobRunner.scala:100)
>>>>>>>>   at org.apache.samza.job.JobRunner$.main(JobRunner.scala:75)
>>>>>>>>   at org.apache.samza.job.JobRunner.main(JobRunner.scala)
>>>>>>>> Caused by: java.net.UnknownHostException: samza-job-package-0.7.0-dist.tar.gz
>>>>>>>>   ... 18 more
>>>>>>>>
>>>>>>>> -----Original Message-----
>>>>>>>> From: Yan Fang [mailto:[email protected]]
>>>>>>>> Sent: Wednesday, March 12, 2014 4:20 PM
>>>>>>>> To: [email protected]
>>>>>>>> Subject: Re: Failed to package using mvn
>>>>>>>>
>>>>>>>> Hi Sonali,
>>>>>>>>
>>>>>>>> One tip you may miss:
>>>>>>>>
>>>>>>>> If you had already run
>>>>>>>>
>>>>>>>>   tar -xvf ./samza-job-package/target/samza-job-package-0.7.0-dist.tar.gz -C deploy/samza
>>>>>>>>
>>>>>>>> before you bundled the jar file into the tar.gz, please also
>>>>>>>> remember to put the hdfs jar file into deploy/samza/lib.
>>>>>>>>
>>>>>>>> Let me know if you miss this step.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>>
>>>>>>>> Fang, Yan
>>>>>>>> [email protected]
>>>>>>>> +1 (206) 849-4108
>>>>>>>>
>>>>>>>> On Wed, Mar 12, 2014 at 4:10 PM, Chris Riccomini <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> Hey Sonali,
>>>>>>>>>
>>>>>>>>> Yan has made a step-by-step tutorial for this. Could you confirm
>>>>>>>>> that you've followed the instructions, and it's still not
>>>>>>>>> working?
>>>>>>>>>
>>>>>>>>> https://issues.apache.org/jira/browse/SAMZA-181
>>>>>>>>>
>>>>>>>>> Cheers,
>>>>>>>>> Chris
>>>>>>>>>
>>>>>>>>> On 3/12/14 3:12 PM, "[email protected]" <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> So sigh! I had some Kafka issues in-between. That's fixed now.
>>>>>>>>>>
>>>>>>>>>> As suggested,
>>>>>>>>>>
>>>>>>>>>> 1. I made sure the hadoop-hdfs-2.2.0.jar is bundled with the
>>>>>>>>>> samza job tar.gz.
>>>>>>>>>> 2. I added the configuration to implement hdfs in the
>>>>>>>>>> hdfs-site.xml files both on the NMs and in the /conf directory
>>>>>>>>>> for samza
>>>>>>>>>>
>>>>>>>>>> I still get the "No FileSystem for scheme: hdfs" error.
>>>>>>>>>>
>>>>>>>>>> Is there anything else I'm missing?
>>>>>>>>>> Thanks,
>>>>>>>>>> Sonali
>>>>>>>>>>
>>>>>>>>>> -----Original Message-----
>>>>>>>>>> From: Chris Riccomini [mailto:[email protected]]
>>>>>>>>>> Sent: Tuesday, March 11, 2014 8:27 PM
>>>>>>>>>> To: [email protected]
>>>>>>>>>> Subject: Re: Failed to package using mvn
>>>>>>>>>>
>>>>>>>>>> Hey Yan,
>>>>>>>>>>
>>>>>>>>>> This looks great! I added a few requests to the JIRA, if you
>>>>>>>>>> have time.
>>>>>>>>>>
>>>>>>>>>> Cheers,
>>>>>>>>>> Chris
>>>>>>>>>>
>>>>>>>>>>> On 3/11/14 7:20 PM, "Yan Fang" <[email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>> Hi Chris,
>>>>>>>>>>>
>>>>>>>>>>> I have opened an issue, SAMZA-181
>>>>>>>>>>> <https://issues.apache.org/jira/browse/SAMZA-181>, and also
>>>>>>>>>>> uploaded the patch. Let me know if there is something wrong in
>>>>>>>>>>> my tutorial. Thank you!
>>>>>>>>>>>
>>>>>>>>>>> Cheers,
>>>>>>>>>>>
>>>>>>>>>>> Fang, Yan
>>>>>>>>>>> [email protected]
>>>>>>>>>>> +1 (206) 849-4108
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Mar 11, 2014 at 10:40 AM, <[email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Thanks Chris, Yan,
>>>>>>>>>>>>
>>>>>>>>>>>> Let me try that.
>>>>>>>>>>>>
>>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>>> From: Chris Riccomini [mailto:[email protected]]
>>>>>>>>>>>> Sent: Tuesday, March 11, 2014 10:22 AM
>>>>>>>>>>>> To: [email protected]
>>>>>>>>>>>> Subject: Re: Failed to package using mvn
>>>>>>>>>>>>
>>>>>>>>>>>> Hey Yan,
>>>>>>>>>>>>
>>>>>>>>>>>> Awesome! The location where you can add your .md is here:
>>>>>>>>>>>>
>>>>>>>>>>>>   docs/learn/tutorials/0.7.0/
>>>>>>>>>>>>
>>>>>>>>>>>> Here's a link to the code tree:
>>>>>>>>>>>>
>>>>>>>>>>>> https://git-wip-us.apache.org/repos/asf?p=incubator-samza.git;a=tree;f=docs/learn/tutorials/0.7.0;h=ef117f4066f14a00f50f0f6fca17903130448312;hb=HEAD
>>>>>>>>>>>>
>>>>>>>>>>>> You can get the code here:
>>>>>>>>>>>>
>>>>>>>>>>>>   git clone http://git-wip-us.apache.org/repos/asf/incubator-samza.git
>>>>>>>>>>>>
>>>>>>>>>>>> Once you write the .md, just throw it up on a JIRA, and one
>>>>>>>>>>>> of us can merge it in.
>>>>>>>>>>>>
>>>>>>>>>>>> Re: hdfs-site.xml, ah ha, that's what I figured. This is good
>>>>>>>>>>>> to know. So you just copy your hdfs-site.xml from your
>>>>>>>>>>>> NodeManager's conf directory into your local hdfs-site.xml.
>>>>>>>>>>>>
>>>>>>>>>>>> Cheers,
>>>>>>>>>>>> Chris
>>>>>>>>>>>>
>>>>>>>>>>>>> On 3/11/14 10:16 AM, "Yan Fang" <[email protected]> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Hi Chris,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Sure. I just do not know how/where to contribute this
>>>>>>>>>>>>> page...*_*
>>>>>>>>>>>>>
>>>>>>>>>>>>> Oh, I mean the same thing as you mentioned in the *Cluster
>>>>>>>>>>>>> Installation* thread:
>>>>>>>>>>>>>
>>>>>>>>>>>>> *"2. Get a copy of one of your NM's yarn-site.xml and put it
>>>>>>>>>>>>> somewhere on your desktop (I usually use
>>>>>>>>>>>>> ~/.yarn/conf/yarn-site.xml). Note that there's a "conf"
>>>>>>>>>>>>> directory there. This is mandatory."*
>>>>>>>>>>>>>
>>>>>>>>>>>>> So I just copy the hdfs-site.xml to
>>>>>>>>>>>>> ~/.yarn/conf/hdfs-site.xml. Thank you.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Fang, Yan
>>>>>>>>>>>>> [email protected]
>>>>>>>>>>>>> +1 (206) 849-4108
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Tue, Mar 11, 2014 at 10:10 AM, Chris Riccomini <[email protected]> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hey Yan,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Would you be up for contributing a tutorial page that
>>>>>>>>>>>>>> describes this? This is really useful information. Our docs
>>>>>>>>>>>>>> are just simple .md files in the main code base.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Regarding step (3), is the hdfs-site.xml put into the conf
>>>>>>>>>>>>>> folder for the NM boxes, or on the client side (where
>>>>>>>>>>>>>> run-job.sh is run)?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>> Chris
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On 3/11/14 10:07 AM, "Yan Fang" <[email protected]> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi Sonali,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The way I make Samza run with HDFS is the following:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 1. include hdfs jar in Samza jar tar.gz.
>>>>>>>>>>>>>>> 2. you may also want to make sure the hadoop-common.jar
>>>>>>>>>>>>>>> has the same version as your hdfs jar. Otherwise, you may
>>>>>>>>>>>>>>> have configuration error popping out.
>>>>>>>>>>>>>>> 3. then put hdfs-site.xml to conf folder, the same folder
>>>>>>>>>>>>>>> as the yarn-site.xml
>>>>>>>>>>>>>>> 4. all other steps are not changed.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hope this will help. Thank you.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Fang, Yan
>>>>>>>>>>>>>>> [email protected]
>>>>>>>>>>>>>>> +1 (206) 849-4108
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Tue, Mar 11, 2014 at 9:25 AM, Chris Riccomini <[email protected]> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hey Sonali,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I believe that you need to make sure that the HDFS jar is
>>>>>>>>>>>>>>>> in your .tar.gz file, as you've said.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> If that doesn't work, you might need to define this
>>>>>>>>>>>>>>>> setting in core-site.xml on the machine you're running
>>>>>>>>>>>>>>>> run-job.sh on:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>   <property>
>>>>>>>>>>>>>>>>     <name>fs.hdfs.impl</name>
>>>>>>>>>>>>>>>>     <value>org.apache.hadoop.hdfs.DistributedFileSystem</value>
>>>>>>>>>>>>>>>>     <description>The FileSystem for hdfs: uris.</description>
>>>>>>>>>>>>>>>>   </property>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> You might also need to configure your NodeManagers to
>>>>>>>>>>>>>>>> have the HDFS file system impl as well.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I've never run Samza with HDFS, so I'm guessing here.
>>>>>>>>>>>>>>>> Perhaps someone else on the list has been successful
>>>>>>>>>>>>>>>> with this?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>>> Chris
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On 3/10/14 3:59 PM, "[email protected]" <[email protected]> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Hello,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I fixed this by starting from scratch with gradlew.
>>>>>>>>>>>>>>>>> But now when I run my job it throws this error:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Exception in thread "main" java.io.IOException: No FileSystem for scheme: hdfs
>>>>>>>>>>>>>>>>>   at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2421)
>>>>>>>>>>>>>>>>>   at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2428)
>>>>>>>>>>>>>>>>>   at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:88)
>>>>>>>>>>>>>>>>>   at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2467)
>>>>>>>>>>>>>>>>>   at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2449)
>>>>>>>>>>>>>>>>>   at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:367)
>>>>>>>>>>>>>>>>>   at org.apache.hadoop.fs.Path.getFileSystem(Path.java:287)
>>>>>>>>>>>>>>>>>   at org.apache.samza.job.yarn.ClientHelper.submitApplication(ClientHelper.scala:111)
>>>>>>>>>>>>>>>>>   at org.apache.samza.job.yarn.YarnJob.submit(YarnJob.scala:55)
>>>>>>>>>>>>>>>>>   at org.apache.samza.job.yarn.YarnJob.submit(YarnJob.scala:48)
>>>>>>>>>>>>>>>>>   at org.apache.samza.job.JobRunner.run(JobRunner.scala:100)
>>>>>>>>>>>>>>>>>   at org.apache.samza.job.JobRunner$.main(JobRunner.scala:75)
>>>>>>>>>>>>>>>>>   at org.apache.samza.job.JobRunner.main(JobRunner.scala)
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I looked at the samza job tar.gz and it doesn't have a
>>>>>>>>>>>>>>>>> Hadoop-hdfs jar. Is that why I get this error?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>> Sonali
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> From: Parthasarathy, Sonali
>>>>>>>>>>>>>>>>> Sent: Monday, March 10, 2014 11:25 AM
>>>>>>>>>>>>>>>>> To: [email protected]
>>>>>>>>>>>>>>>>> Subject: Failed to package using mvn
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> When I tried to do a mvn clean package of my hello-samza
>>>>>>>>>>>>>>>>> project, I get the following error. Has anyone seen this
>>>>>>>>>>>>>>>>> before?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> [ERROR] Failed to execute goal on project samza-wikipedia: Could not resolve dependencies for project samza:samza-wikipedia:jar:0.7.0: Could not find artifact org.apache.samza:samza-kv_2.10:jar:0.7.0 in apache-releases (https://repository.apache.org/content/groups/public) -> [Help 1]
>>>>>>>>>>>>>>>>> [ERROR]
>>>>>>>>>>>>>>>>> [ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
>>>>>>>>>>>>>>>>> [ERROR] Re-run Maven using the -X switch to enable full debug logging.
>>>>>>>>>>>>>>>>> [ERROR]
>>>>>>>>>>>>>>>>> [ERROR] For more information about the errors and possible solutions, please read the following articles:
>>>>>>>>>>>>>>>>> [ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/DependencyResolutionException
>>>>>>>>>>>>>>>>> [ERROR]
>>>>>>>>>>>>>>>>> [ERROR] After correcting the problems, you can resume the build with the command
>>>>>>>>>>>>>>>>> [ERROR] mvn <goals> -rf :samza-wikipedia
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>> Sonali
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Sonali Parthasarathy
>>>>>>>>>>>>>>>>> R&D Developer, Data Insights
>>>>>>>>>>>>>>>>> Accenture Technology Labs
>>>>>>>>>>>>>>>>> 703-341-7432
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> ________________________________
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> This message is for the designated recipient only and
>>>>>>>>>>>>>>>>> may contain privileged, proprietary, or otherwise
>>>>>>>>>>>>>>>>> confidential information. If you have received it in
>>>>>>>>>>>>>>>>> error, please notify the sender immediately and delete
>>>>>>>>>>>>>>>>> the original. Any other use of the e-mail by you is
>>>>>>>>>>>>>>>>> prohibited.
>>>>>>>>>>>>>>>>> Where allowed by local law, electronic communications
>>>>>>>>>>>>>>>>> with Accenture and its affiliates, including e-mail and
>>>>>>>>>>>>>>>>> instant messaging (including content), may be scanned by
>>>>>>>>>>>>>>>>> our systems for the purposes of information security and
>>>>>>>>>>>>>>>>> assessment of internal compliance with Accenture policy.
>>>>>>>>>>>>>>>>> ________________________________________________________
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> www.accenture.com
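
Side note on the UnknownHostException exchange quoted above: a minimal sketch of a pre-flight check for the yarn.package.path value, so that a URI such as hdfs://samza-job-package-0.7.0-dist.tar.gz (no host before the package name) is caught before run-job.sh is called. The function name check_package_path is hypothetical and is not part of Samza or Hadoop. The sketch assumes only hdfs:// and http:// paths are of interest, since those are the two forms used in this thread, and it only looks at the shape of the string; it does not contact the name node or the HTTP server.

```shell
#!/bin/sh
# Hypothetical helper, not part of Samza or Hadoop: checks that a
# yarn.package.path value has the shape scheme://host[:port]/path.
check_package_path() {
    uri=$1
    case "$uri" in
        hdfs:///*|http:///*)
            # Scheme is present but the host part is empty.
            echo "no host in: $uri"
            return 1
            ;;
        hdfs://*/*|http://*/*)
            # scheme://host[:port]/path, the form suggested in the thread.
            echo "ok: $uri"
            return 0
            ;;
        hdfs://*|http://*)
            # Nothing follows the first component, so the package name
            # itself would be read as the host.
            echo "no path after host in: $uri"
            return 1
            ;;
        *)
            echo "missing or unsupported scheme in: $uri"
            return 1
            ;;
    esac
}
```

With this sketch, check_package_path "hdfs://samza-job-package-0.7.0-dist.tar.gz" returns a non-zero status, while the hdfs://<name node ip>:<name node port>/path/to/tgz form and the http://criccomi-mn:8000/... form from step 34 both pass.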
