Build hadoop from src and running it?
Hi, I checked out the Hadoop Common project 2.2.0 from SVN and built it using Maven. Now I can't find any conf folder or any configuration files to edit in order to start running Hadoop and use its HDFS. How can I do so? -- Best Regards, Karim Ahmed Awara -- -- This message and its contents, including attachments, are intended solely for the original recipient. If you are not the intended recipient or have received this message in error, please notify me immediately and delete this message from your computer system. Any unauthorized use or distribution is prohibited. Please consider the environment before printing this email.
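For readers with the same question: a plain Maven build of hadoop-common alone does not produce a runnable installation; you need to build the full distribution tarball. A sketch of the usual steps for 2.2.0, assuming the default source-tree layout (paths may differ on your machine):

```shell
# From the top of the Hadoop source tree (branch for 2.2.0):
mvn clean package -Pdist -DskipTests -Dtar

# The runnable distribution, including the etc/hadoop config directory
# (the 2.x replacement for the old conf/ folder), lands under hadoop-dist/target:
tar -xzf hadoop-dist/target/hadoop-2.2.0.tar.gz -C /opt
ls /opt/hadoop-2.2.0/etc/hadoop   # core-site.xml, hdfs-site.xml, ...
```

In other words, there is no conf/ directory in the built modules themselves; the configuration files live in etc/hadoop inside the assembled distribution.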
which is better : from reducer or Driver
I am emitting an 'A' value and a 'B' value from the reducer, and I need to do further calculations as well. Which is the better way?
1. Do all remaining computations within the reducer, after emitting, or
2. Do the remaining computation in the driver: read the A and B values from the part file and compute further there.
Please suggest the better way. -- Thanks & Regards, Unmesha Sreeveni U.B, Junior Developer, Amrita Center For Cyber Security, Amritapuri. www.amrita.edu/cyber/
Re: C++ example for hadoop-2.2.0
Hi, I have solved the problem! I downloaded the source, compiled it on the machine, and now I can successfully link to the library. Thank you everyone for your quick responses. Regards.. Salman. Salman Toor, PhD salman.t...@it.uu.se

On Nov 4, 2013, at 3:21 PM, Andre Kelpe wrote: No, because I was trying to set up a cluster automatically with the tarballs from apache.org. - André

On Mon, Nov 4, 2013 at 3:05 PM, Salman Toor salman.t...@it.uu.se wrote: Hi, Did you try to compile from source? /Salman. Salman Toor, PhD salman.t...@it.uu.se

On Nov 4, 2013, at 2:55 PM, Andre Kelpe wrote: I reported the 32-bit/64-bit problem a few weeks ago. There hasn't been much activity around it though: https://issues.apache.org/jira/browse/HADOOP-9911 - André

On Mon, Nov 4, 2013 at 2:20 PM, Salman Toor salman.t...@it.uu.se wrote: Hi, OK, so 2.x is not a new version, it's another branch. Good to know! Actually 32-bit will be difficult, as the code I have already has some dependencies on 64-bit; otherwise I would continue with the 1.x version. Can you suggest a version in the 1.x series that is stable and works in a cluster environment, especially with C++? Regards.. Salman. Salman Toor, PhD salman.t...@it.uu.se

On Nov 4, 2013, at 1:54 PM, Amr Shahin wrote: Well, the 2 series isn't exactly the next version; it's a continuation of branch .2. Also, the error message from gcc indicates that the library you're trying to link to isn't compatible, which made me suspicious. Check the documentation to see if Hadoop has 64-bit libraries, or otherwise compile against the 32-bit ones.

On Mon, Nov 4, 2013 at 4:51 PM, Salman Toor salman.t...@it.uu.se wrote: Hi, Thanks for your answer! But are you sure about it? Actually Hadoop version 1.2 has both 32-bit and 64-bit libraries, so I believe the next version should have both... But I am not sure, just a random guess :-( Regards.. Salman.
Salman Toor, PhD salman.t...@it.uu.se

On Nov 4, 2013, at 1:38 PM, Amr Shahin wrote: I believe Hadoop isn't compatible with the 64-bit architecture. Try installing the 32-bit libraries and compiling against them. This error (skipping incompatible /home/sztoor/hadoop-2.2.0/lib/native/libhadooppipes.a when searching -lhadooppipes) indicates so.

On Mon, Nov 4, 2013 at 2:44 PM, Salman Toor salman.t...@it.uu.se wrote: Hi, Can someone give a pointer? Thanks in advance. Regards.. Salman. Salman Toor, PhD salman.t...@it.uu.se

On Nov 3, 2013, at 11:31 PM, Salman Toor wrote: Hi, I am quite new to the Hadoop world. Previously I was running the hadoop-1.2.0 stable version on my small cluster and encountered some strange problems, e.g. the local path to the mapper file didn't get copied to HDFS. It works fine in a single-node setup, but on multiple nodes the simple word-count Python example didn't work... I read on a blog that it might be a problem with the version I was using, so I decided to change versions and downloaded Hadoop 2.2.0. This version has YARN together with many new features that I hope to learn about in the future. Now the simple wordcount example works without any problem in the multi-node setup (I am using the simple Python example). Now I would like to compile my C++ code. Since the directory structure, among other things, has changed, I have started to get the following error:

/usr/bin/ld: skipping incompatible /home/sztoor/hadoop-2.2.0/lib/native/libhadooputils.a when searching -lhadooputils
cannot find -lhadooputils
/usr/bin/ld: skipping incompatible /home/sztoor/hadoop-2.2.0/lib/native/libhadooppipes.a when searching -lhadooppipes
cannot find -lhadooppipes

I have managed to run the C++ example successfully with the 1.2.0 version in a single-node setup. I have a 64-bit Ubuntu machine; previously I was using Linux-amd64-64. Now, in the new version, the lib and include directories are directly in the hadoop-2.2.0 directory, and no build.xml is available...
Can someone please give me an example of a makefile based on version 2.2.0? Or suggest which version I should go for? Or are there prerequisites I should take care of before compiling my code? Thanks in advance. Regards.. Salman. Salman Toor, PhD salman.t...@it.uu.se -- André Kelpe an...@concurrentinc.com http://concurrentinc.com
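For readers hitting the same linker error: below is a minimal sketch of a compile/link line for a Pipes program against the 2.2.0 layout. The source file name and install path are assumptions, and the bundled libhadooppipes.a/libhadooputils.a must match your architecture; if they don't, rebuild the native libraries from source (as the poster eventually did).

```shell
# Assumed install location; adjust to your setup.
HADOOP_HOME=/home/sztoor/hadoop-2.2.0

# In 2.x the headers moved to include/ and the static libraries to
# lib/native/ at the top of the distribution, replacing the old
# c++/Linux-amd64-64 layout from 1.x.
g++ -o wordcount wordcount.cpp \
    -I"$HADOOP_HOME/include" \
    -L"$HADOOP_HOME/lib/native" \
    -lhadooppipes -lhadooputils -lpthread -lcrypto -lssl
```

Makefiles written for 1.x therefore need their -I and -L paths updated; the "skipping incompatible" message means the linker found the library but its architecture (32- vs 64-bit) does not match your build.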
Re: which is better : from reducer or Driver
If you have multiple reducers you are doing it in parallel, while in the driver it is surely single-threaded, so my bet would be on the reducers. Chris

On 11/5/2013 6:15 AM, unmesha sreeveni wrote: I am emitting an 'A' value and a 'B' value from the reducer, and I need to do further calculations as well. Which is the better way? 1. Do all remaining computations within the reducer, after emitting, or 2. Do the remaining computation in the driver: read the A and B values from the part file and compute further there. Please suggest the better way. -- Thanks & Regards, Unmesha Sreeveni U.B, Junior Developer, Amrita Center For Cyber Security, Amritapuri. www.amrita.edu/cyber/
Re: Error while running Hadoop Source Code
Hi, Can anyone kindly assist with this? Regards, Indrashish

On Mon, 04 Nov 2013 10:23:23 -0500, Basu,Indrashish wrote: Hi All, Any update on the post below? I came across an old post regarding the same issue. It explains the solution as: "The nopipe example needs more documentation. It assumes that it is run with the InputFormat from src/test/org/apache/hadoop/mapred/pipes/WordCountInputFormat.java, which has a very specific input split format. By running with a TextInputFormat, it will send binary bytes as the input split and won't work right. The nopipe example should probably be recoded to use libhdfs too, but that is more complicated to get running as a unit test. Also note that since the C++ example is using local file reads, it will only work on a cluster if you have NFS or something similar working across the cluster." I would need some more light on the above explanation, so it would help if anyone could elaborate on what exactly needs to be done. To mention, I am trying to run a sample KMeans algorithm on a GPU using Hadoop. Thanks in advance. Regards, Indrashish.

On Thu, 31 Oct 2013 20:00:10 -0400, Basu,Indrashish wrote: Hi, I am trying to run a sample Hadoop GPU source code (kmeans algorithm) on an ARM processor and getting the error below. Can anyone please throw some light on this?

rmr: cannot remove output: No such file or directory.
13/10/31 13:43:12 WARN mapred.JobClient: No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
13/10/31 13:43:12 INFO mapred.FileInputFormat: Total input paths to process : 1
13/10/31 13:43:13 INFO mapred.JobClient: Running job: job_201310311320_0001
13/10/31 13:43:14 INFO mapred.JobClient: map 0% reduce 0%
13/10/31 13:43:39 INFO mapred.JobClient: Task Id : attempt_201310311320_0001_m_00_0, Status : FAILED
java.io.IOException: pipe child exception
    at org.apache.hadoop.mapred.pipes.Application.abort(Application.java:191)
    at org.apache.hadoop.mapred.pipes.PipesMapRunner.run(PipesMapRunner.java:103)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:363)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)
Caused by: java.net.SocketException: Broken pipe
    at java.net.SocketOutputStream.socketWrite0(Native Method)
    at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:113)
    at java.net.SocketOutputStream.write(SocketOutputStream.java:159)
    at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
    at java.io.BufferedOutputStream.write(BufferedOutputStream.java:126)
    at java.io.DataOutputStream.write(DataOutputStream.java:107)
    at org.apache.hadoop.mapred.pipes.BinaryProtocol.writeObject(BinaryProtocol.java:333)
    at org.apache.hadoop.mapred.pipes.BinaryProtocol.mapItem(BinaryProtocol.java:286)
    at org.apache.hadoop.mapred.pipes.PipesMapRunner.run(PipesMapRunner.java:92)
    ... 3 more
attempt_201310311320_0001_m_00_0: cmd: [bash, -c, exec '/app/hadoop/tmp/mapred/local/taskTracker/archive/10.227.56.195bin/cpu-kmeans2D/cpu-kmeans2D' '0' /dev/null 1 /usr/local/hadoop/hadoop-gpu-0.20.1/bin/../logs/userlogs/attempt_201310311320_0001_m_00_0/stdout 2 /usr/local/hadoop/hadoop-gpu-0.20.1/bin/../logs/userlogs/

Regards, -- Indrashish Basu, Graduate Student, Department of Electrical and Computer Engineering, University of Florida
only one map or reduce job per time on one node
hi, I have a cluster of 7 nodes. Every node has 2 map slots and 1 reduce slot. Is it possible to force the JobTracker to execute only 2 map tasks or 1 reduce task at a time? I have found this configuration option: mapred.reduce.slowstart.completed.maps. I think this will do exactly what I want if I set it to 1.0, as long as there is only one MapReduce job at a time. But when there are 2 jobs, I think it doesn't work, because once the second job has finished its map part, it may execute a reduce task on a node where the first job is still running. Or am I wrong? Is there a way to allow executing reduce tasks only if no map task is running on that node? kind regards
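For context, the per-node slot counts and the slowstart option mentioned above are all set in mapred-site.xml on an MRv1 (JobTracker/TaskTracker) cluster like the one described. A sketch of the relevant fragment; the values shown are the poster's current setup, not a recommendation:

```xml
<!-- mapred-site.xml: per-TaskTracker slot limits (MRv1) -->
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>2</value>
</property>
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>1</value>
</property>
<property>
  <!-- fraction of a job's maps that must finish before its reducers start -->
  <name>mapred.reduce.slowstart.completed.maps</name>
  <value>1.0</value>
</property>
```

Note that slowstart is a per-job scheduling knob: it delays when a job's own reducers start, but as the poster suspects, it does not prevent one job's reducer from landing on a node where another job's maps are still running.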
Re: only one map or reduce job per time on one node
Why do you want to do this? +Vinod On Nov 5, 2013, at 9:17 AM, John wrote: Is it possible to force the jobtracker executing only 2 map jobs or 1 reduce job per time? -- CONFIDENTIALITY NOTICE NOTICE: This message is intended for the use of the individual or entity to which it is addressed and may contain information that is confidential, privileged and exempt from disclosure under applicable law. If the reader of this message is not the intended recipient, you are hereby notified that any printing, copying, dissemination, distribution, disclosure or forwarding of this communication is strictly prohibited. If you have received this communication in error, please contact the sender immediately and delete it from your system. Thank You.
Re: Error while running Hadoop Source Code
It seems like your pipes mapper is exiting before consuming all the input. Did you check the task logs on the web UI? Thanks, +Vinod

On Nov 5, 2013, at 7:25 AM, Basu,Indrashish wrote: Hi, Can anyone kindly assist with this? Regards, Indrashish
HDP 2.0 GA?
I noticed that HDP 2.0 is available for download here: http://hortonworks.com/products/hdp-2/?b=1#install Is this the final GA version that tracks Apache Hadoop 2.2? Sorry I am just a little confused by the different numbering schemes. Thanks John
RE: HDP 2.0 GA?
HDP 2.0.6 is the GA version that matches Apache Hadoop 2.2. From: John Lilley john.lil...@redpoint.net Sent: Tuesday, November 05, 2013 12:34 PM To: user@hadoop.apache.org Subject: HDP 2.0 GA? I noticed that HDP 2.0 is available for download here: http://hortonworks.com/products/hdp-2/?b=1#install Is this the final GA version that tracks Apache Hadoop 2.2? Sorry I am just a little confused by the different numbering schemes. Thanks John
Re: only one map or reduce job per time on one node
Because my node swaps memory if the 2 map slots + 1 reduce slot are all occupied by my job. Sure, I can lower the maximum memory for the map/reduce processes; I tried that already, but I got an out-of-memory exception when I set the max heap size too low for my MR job. kind regards

2013/11/5 Vinod Kumar Vavilapalli vino...@hortonworks.com: Why do you want to do this? +Vinod

On Nov 5, 2013, at 9:17 AM, John wrote: Is it possible to force the jobtracker executing only 2 map jobs or 1 reduce job per time?
Re: HDP 2.0 GA?
Please send questions related to a vendor-specific distro to the vendor's mailing list, in this case http://hortonworks.com/community/forums/.

On Tue, Nov 5, 2013 at 10:49 AM, Jim Falgout jim.falg...@actian.com wrote: HDP 2.0.6 is the GA version that matches Apache Hadoop 2.2.

From: John Lilley john.lil...@redpoint.net
Sent: Tuesday, November 05, 2013 12:34 PM
To: user@hadoop.apache.org
Subject: HDP 2.0 GA?

I noticed that HDP 2.0 is available for download here: http://hortonworks.com/products/hdp-2/?b=1#install Is this the final "GA" version that tracks Apache Hadoop 2.2? Sorry, I am just a little confused by the different numbering schemes. Thanks, John

-- http://hortonworks.com/download/
what different between infoserver and streaming server in datanode?
hi, all: I am reading the source code for DataNode startup. When the DataNode starts, it starts a streaming server and an info server, and I do not know the difference between the two servers.
Re: which is better : from reducer or Driver
I am dealing with multiple mappers and 1 reducer, so which is best?

On Tue, Nov 5, 2013 at 6:28 PM, Chris Mawata chris.maw...@gmail.com wrote: If you have multiple reducers you are doing it in parallel, while in the driver it is surely single-threaded, so my bet would be on the reducers. Chris

On 11/5/2013 6:15 AM, unmesha sreeveni wrote: I am emitting an 'A' value and a 'B' value from the reducer, and I need to do further calculations as well. Which is the better way? 1. Do all remaining computations within the reducer, after emitting, or 2. Do the remaining computation in the driver: read the A and B values from the part file and compute further there. Please suggest the better way.

-- Thanks & Regards, Unmesha Sreeveni U.B, Junior Developer, Amrita Center For Cyber Security, Amritapuri. www.amrita.edu/cyber/
how to read the xml tags data
hi all, how do I write MapReduce code to read the tag info from XML files? thanks, mallik.
Re: how to read the xml tags data
Hi Mallik Arjun, you can use the XmlInputFormat class, which is provided by Apache Mahout, not by Hadoop itself. Here is the link to the code: https://github.com/apache/mahout/blob/ad84344e4055b1e6adff5779339a33fa29e1265d/examples/src/main/java/org/apache/mahout/classifier/bayes/XmlInputFormat.java And here is another link that explains using an XML input file: http://xmlandhadoop.blogspot.com/ Thanks, Murali AdiReddy

On Wed, Nov 6, 2013 at 12:38 PM, mallik arjun mallik.cl...@gmail.com wrote: hi all, how do I write MapReduce code to read the tag info from XML files? thanks, mallik.
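As an aside, if the job is written with Hadoop Streaming rather than as a Java job, Streaming ships a StreamXmlRecordReader that splits input on begin/end tag strings. A sketch of the invocation; the tag name, input/output paths, and jar location are assumptions to adapt to your cluster:

```shell
# Hypothetical paths and tag names; adjust to your installation.
# Each <record>...</record> block becomes one map input record.
hadoop jar "$HADOOP_HOME"/contrib/streaming/hadoop-streaming-*.jar \
    -inputreader "StreamXmlRecord,begin=<record>,end=</record>" \
    -input  /data/books.xml \
    -output /data/out \
    -mapper  /bin/cat \
    -reducer /usr/bin/wc
```

This avoids writing a custom InputFormat when the XML records are delimited by a fixed pair of tags, at the cost of not being a real XML parse (nested or attribute-bearing variants of the tag can confuse it).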