MR output to a file instead of directory?
Hi all,

The FileOutputFormat/FileOutputCommitter always treats an output path as a directory and writes files under it, even if there is only one reducer. Is there any way to configure an OutputFormat to write all data into a single file?

Thanks,
James
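[Editor's note: not from the thread.] There is no built-in switch for this; a common workaround is to run the job with a single reducer and rename the lone part file afterwards. The sketch below applies that pattern to a local directory with `java.nio.file` so it is self-contained; on a real cluster you would do the equivalent with Hadoop's `FileSystem.rename()` against the job output path. All names here are illustrative.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Sketch: promote the single reducer's part file to a plain file path,
// then remove the now-empty output directory.
public class SinglePartRename {
    public static Path promotePartFile(Path outDir, Path target) throws IOException {
        try (DirectoryStream<Path> parts = Files.newDirectoryStream(outDir, "part-*")) {
            for (Path part : parts) {
                Files.move(part, target, StandardCopyOption.REPLACE_EXISTING);
                break; // single reducer => single part file
            }
        }
        // Fails if other files (e.g. a _SUCCESS marker) remain; delete those first.
        Files.delete(outDir);
        return target;
    }
}
```

On HDFS the same two steps would be `fs.rename(new Path(outDir, "part-r-00000"), target)` followed by `fs.delete(outDir, true)`.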
Re: no log function for map/red in a cluster setup
Thank you very much, but that does not help. I did try to symlink one into my working directory with "-files conf/log4j.properties#mylog4j.properties", and then put the corresponding setting in the JVM options: mapred.child.java.opts = -Dlog4j.configuration=mylog4j.properties. This disturbed the task log output to some degree (the "syslog" section is completely gone), but nevertheless it does not work. It seems the Hadoop engine does some special log4j setup of its own. What should I do? Thanks.

Zhu, Guojun
Modeling Sr Graduate
571-3824370
guojun_...@freddiemac.com
Financial Engineering, Freddie Mac

Joey Echeverria wrote on 02/29/2012 10:45 PM (To: "mapreduce-user@hadoop.apache.org", Subject: Re: no log function for map/red in a cluster setup):

Try adding the log4j.properties file to the distributed cache, e.g.:

hadoop jar job.jar -config conf -files conf/log4j.properties my.package.Class arg1

-Joey

On Feb 29, 2012, at 16:15, GUOJUN Zhu wrote:

What I found out is that the default conf/log4j.properties sets the root logger to INFO, and indeed everything at INFO and above (Hadoop's or my own code's) shows up. However, I tried to put a new log4j.properties with a lower threshold in the new conf directory and specify it with the "--config" option, and it did not work (it did pick up other things, such as mapred-site.xml). Unfortunately, I am not the administrator and do not have the privilege to modify the default log4j.properties. Do I have to ask the administrator to do it for me? Thanks.

GUOJUN Zhu wrote on 02/27/2012 11:34 AM (Subject: no log function for map/red in a cluster setup):

Hi. We are testing Hadoop (0.20.2-cdh3u3). I am using a customized conf directory with "--config mypath". I modified the log4j.properties file in this path, adding "log4j.logger.com.mycompany=DEBUG". It works fine with our pseudo one-node cluster setup (1.00). But in the new cluster (32 data nodes, plus name node, secondary namenode, jobtracker, and backup jobtracker), I can only see the logs from Hadoop itself (in the web interface, navigating all the way into the task node log); no logs from my mapper/reducer (com.mycompany.***) show up. I can do System.out.println or System.err.println and see them in the same log file, but no logs from log4j show up. Is there any other configuration I missed? Thanks.
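[Editor's note.] For reference, a minimal sketch of what such an override file could look like; the file name, appender, and pattern here are illustrative, not from the thread. In 0.20-era clusters the task JVM typically picks up the node's own conf/log4j.properties (with its task-log appender) from the task classpath rather than the job client's "--config" directory, which would explain both why the client-side override is ignored and why the "syslog" section goes empty once a replacement configuration does take effect.

```properties
# mylog4j.properties -- illustrative override, shipped with
#   -files conf/log4j.properties#mylog4j.properties
# and selected in mapred.child.java.opts with
#   -Dlog4j.configuration=mylog4j.properties
log4j.rootLogger=INFO, stdout
log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=%d{ISO8601} %-5p %c{2}: %m%n
# Lower the threshold for your own code only:
log4j.logger.com.mycompany=DEBUG
```

With a console appender, the task's messages would land in the stdout section of the task logs rather than in syslog.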
Re: yarn NoClassDefFoundError from LinuxContainerExecutor
It relates to: https://issues.apache.org/jira/browse/MAPREDUCE-3505
Thanks.

On 03/01/2012 07:09 AM, Ioan Eugen Stan wrote:

Hi Mingjie,

I don't know about YARN, but NoClassDefFoundError appears when a class that was present at compile time is no longer available at runtime. See a detailed explanation here [1]. Check that the classpath built for the container/node contains the classes from that error. Also check that you don't get another version on the classpath, one without the specified class.

Hope this helps,

[1] http://javarevisited.blogspot.com/2011/06/noclassdeffounderror-exception-in.html

On 29.02.2012 23:07, Mingjie Lai wrote:

Hi. I'm trying YARN + security but still cannot get a MapReduce example running. Can anyone help me take a look?

My env:
- 3-slave cluster on EC2, CentOS 5.5
- nn, dn, rm, nm all started, with security enabled
- I saw java.lang.NoClassDefFoundError in the LinuxContainerExecutor error log: ./application_1330545370212_0004/container_1330545370212_0004_01_01/stderr
- If I disable security, I still see this issue. Any hint?

I followed the instructions from http://hadoop.apache.org/common/docs/r0.23.0/hadoop-yarn/hadoop-yarn-site/ClusterSetup.html

Steps: I started a MapReduce sample from nn/rm:

$ /usr/lib/hadoop/bin/yarn --config ./conf jar share/hadoop/mapreduce/hadoop-mapreduce-examples-0.24.0-SNAPSHOT.jar randomwriter 10 10

Logs from nn, nm:

[yarn@ip-10-176-231-35 hadoop]$ /usr/lib/hadoop/bin/yarn --config ./conf jar share/hadoop/mapreduce/hadoop-mapreduce-examples-0.24.0-SNAPSHOT.jar randomwriter 10 10
Running 30 maps.
Job started: Wed Feb 29 20:33:48 UTC 2012
12/02/29 20:33:48 WARN conf.Configuration: mapred.used.genericoptionsparser is deprecated. Instead, use mapreduce.client.genericoptionsparser.used
12/02/29 20:33:49 INFO mapreduce.JobSubmitter: number of splits:30
12/02/29 20:33:49 INFO mapred.ResourceMgrDelegate: Submitted application application_1330545370212_0005 to ResourceManager at ip-10-176-231-35.us-west-1.compute.internal/10.176.231.35:7090
12/02/29 20:33:49 INFO mapreduce.Job: The url to track the job: http://ip-10-176-231-35.us-west-1.compute.internal:7050/proxy/application_1330545370212_0005/
12/02/29 20:33:49 INFO mapreduce.Job: Running job: job_1330545370212_0005
12/02/29 20:33:53 INFO mapreduce.Job: Job job_1330545370212_0005 running in uber mode : false
12/02/29 20:33:53 INFO mapreduce.Job: map 0% reduce 0%
12/02/29 20:33:53 INFO mapreduce.Job: Job job_1330545370212_0005 failed with state FAILED due to: Application application_1330545370212_0005 failed 1 times due to AM Container for appattempt_1330545370212_0005_01 exited with exitCode: 1 due to: Exception from container-launch:
org.apache.hadoop.util.Shell$ExitCodeException:
        at org.apache.hadoop.util.Shell.runCommand(Shell.java:261)
        at org.apache.hadoop.util.Shell.run(Shell.java:188)
        at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:381)
        at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:207)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:241)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:68)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)
main : command provided 1
main : user is yarn
.Failing this attempt.. Failing the application.
12/02/29 20:33:53 INFO mapreduce.Job: Counters: 0
Job ended: Wed Feb 29 20:33:53 UTC 2012
The job took 5 seconds.

LinuxContainer error:

[root@ip-10-176-203-45 yarn]# more ./application_1330545370212_0004/container_1330545370212_0004_01_01/stderr
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/yarn/service/CompositeService
        at java.lang.ClassLoader.defineClass1(Native Method)
        at java.lang.ClassLoader.defineClassCond(ClassLoader.java:631)
        at java.lang.ClassLoader.defineClass(ClassLoader.java:615)
        at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:141)
        at java.net.URLClassLoader.defineClass(URLClassLoader.java:283)
        at java.net.URLClassLoader.access$000(URLClassLoader.java:58)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:197)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
Caused by: java.lang.ClassNotFoundException: org
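[Editor's note.] The distinction Ioan describes can be shown with a small self-contained demo (the class names below are made up for illustration): ClassNotFoundException means the name was never found on the classpath at all, while NoClassDefFoundError means the class was referenced at compile time but cannot be loaded or initialized at runtime, such as when its jar is missing from the container's classpath or a previous initialization attempt failed.

```java
// Demonstrates ClassNotFoundException vs. NoClassDefFoundError using only
// the JDK. "Broken" stands in for a class that is present but unloadable.
public class ClassLoadingDemo {
    // A class whose static initializer always fails.
    static class Broken {
        static final int VALUE = init();
        static int init() { throw new RuntimeException("static init failed"); }
    }

    // Loading by name a class that is simply not on the classpath.
    public static String tryLoadByName(String name) {
        try {
            Class.forName(name);
            return "loaded";
        } catch (ClassNotFoundException e) {
            return "ClassNotFoundException";
        }
    }

    // Referencing a class that exists but whose initialization fails.
    public static String tryUseBroken() {
        try {
            return String.valueOf(Broken.VALUE);
        } catch (ExceptionInInitializerError e) {
            return "ExceptionInInitializerError";
        } catch (NoClassDefFoundError e) {
            return "NoClassDefFoundError";
        }
    }

    public static void main(String[] args) {
        // A name missing from the classpath entirely:
        System.out.println(tryLoadByName("org.apache.hadoop.yarn.service.CompositeService"));
        // First use fails inside the static initializer...
        System.out.println(tryUseBroken());
        // ...and every later use of the now-erroneous class reports NoClassDefFoundError.
        System.out.println(tryUseBroken());
    }
}
```

So a NoClassDefFoundError in the container stderr, as above, points at the classpath the NodeManager builds for the launched container rather than at your job jar's compile step.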
Re: basic doubt on number of reduce tasks
Vamshi,

If you have set the number of reduce slots per node to 5 and you have 4 nodes, then your cluster can run at most 5 * 4 = 20 reduce tasks at a time. If more reduce tasks are present, they have to wait until reduce slots become available. For reducers, data locality is not considered; reduce tasks are launched on whichever nodes have free slots. There is no guarantee that all nodes will have the same number of reducers running at a time. Mappers do consider data locality, but that is hard to determine for a reducer, since a reducer's input is the output of multiple mappers from across the cluster.

Regards
Bejoy.KS

On Fri, Mar 2, 2012 at 3:39 PM, Vamshi Krishna wrote:
> Hi all,
> Consider a Hadoop cluster with 4 nodes, where every node has the maximum
> no. of reduce slots fixed at 5. When the MapReduce daemons are started:
>
> 1) Is there any restriction on the no. of simultaneously running reduce
> tasks across nodes, such as that it should be the same on all nodes? OR
>
> 2) Is it like this: on a node where there is a lot of data to be processed,
> a higher number of reduce tasks will run than on a node with less data?
> That is, according to the size of the data to be processed on a particular
> node, a proportionate number of reduce tasks will run on each node.
>
> Could somebody please clarify this basic doubt: which is correct? If
> neither, what is the actual process that takes place?
>
> --
> *Regards*
> *Vamshi Krishna*
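[Editor's note.] The slot arithmetic above can be sketched as a couple of one-liners, assuming classic MRv1 fixed-slot scheduling (the method names are made up for illustration):

```java
// Slot arithmetic for classic MapReduce (MRv1) fixed reduce slots.
public class ReduceSlots {
    // Total reduce tasks the cluster can run concurrently.
    public static int concurrentSlots(int nodes, int slotsPerNode) {
        return nodes * slotsPerNode;
    }

    // Number of "waves" needed to run all reduce tasks, rounding up.
    public static int waves(int reduceTasks, int nodes, int slotsPerNode) {
        int slots = concurrentSlots(nodes, slotsPerNode);
        return (reduceTasks + slots - 1) / slots;
    }

    public static void main(String[] args) {
        System.out.println(concurrentSlots(4, 5)); // 20 concurrent reducers
        System.out.println(waves(50, 4, 5));       // 50 reducers run in 3 waves
    }
}
```

Which node a given wave's reducers land on depends only on where slots happen to be free, matching Bejoy's point that reducers have no locality preference.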