MR output to a file instead of directory?

2012-03-02 Thread Jianhui Zhang
Hi all,

The FileOutputFormat/FileOutputCommitter always treats an output path
as a directory and writes files under it, even if there is only one
reducer. Is there any way to configure an OutputFormat to write all
the data into a single file?

Thanks,
James
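The layout described above can be illustrated locally; below is a sketch of what the committer typically leaves behind and how the part files can be merged afterwards (the directory name and data are made up for illustration, not from the thread):

```shell
# sketch: a mock of what FileOutputCommitter leaves in the output
# directory, and a local merge (names and contents are made up)
mkdir -p out
printf 'a\t1\n' > out/part-r-00000
printf 'b\t2\n' > out/part-r-00001
touch out/_SUCCESS   # job-completion marker written alongside the part files
# merge every part file into a single file, ignoring the _SUCCESS marker
cat out/part-r-* > merged.txt
wc -l < merged.txt   # 2
```

On HDFS, `hadoop fs -getmerge <outdir> <localfile>` performs the same merge in one step, so a single output file can be produced after the job finishes rather than by changing the OutputFormat itself.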


Re: no log function for map/red in a cluster setup

2012-03-02 Thread GUOJUN Zhu
Thank you very much, but that does not help. I did try symlinking
one into my working directory with "-files
conf/log4j.properties#mylog4j.properties", and then put the specified
configuration in the JVM options:

 
<property>
  <name>mapred.child.java.opts</name>
  <value>-Dlog4j.configuration=mylog4j.properties</value>
</property>
This messed up the task log output to some degree; the "syslog" section
has completely gone. Nevertheless, it still does not work. It
seems the Hadoop engine does some special log setup. What should I do?

Thanks. 

Zhu, Guojun
Modeling Sr Graduate
571-3824370
guojun_...@freddiemac.com
Financial Engineering
Freddie Mac



Joey Echeverria, 02/29/2012 10:45 PM
To: mapreduce-user@hadoop.apache.org
Subject: Re: no log function for map/red in a cluster setup

Try adding the log4j.properties file to the distributed cache, e.g.:

hadoop jar job.jar -config conf -files conf/log4j.properties 
my.package.Class arg1

-Joey



On Feb 29, 2012, at 16:15, GUOJUN Zhu  wrote:


What I found out is that the default conf/log4j.properties sets root to
INFO, and indeed everything at INFO or above (Hadoop's or my own code's)
shows up. However, I tried to put a new log4j.properties with a lower
threshold in the new conf directory and specify it with the "--config"
option, and it did not work (it did pick up other things such as
mapred-site.xml). Unfortunately, I am not the administrator and do not
have the privilege to modify the default log4j.properties. Do I have to
ask the administrator to do it for me? Thanks.

Zhu, Guojun
Modeling Sr Graduate
571-3824370
guojun_...@freddiemac.com
Financial Engineering
Freddie Mac 


GUOJUN Zhu, 02/27/2012 11:34 AM
To: mapreduce-user@hadoop.apache.org
Subject: no log function for map/red in a cluster setup

Hi.

We are testing Hadoop (0.20.2-cdh3u3). I am using a customized conf
directory with "--config mypath". I modified the log4j.properties file
in this path, adding "log4j.logger.com.mycompany=DEBUG". It works fine
with our pseudo-one-node-cluster setup (1.00). But in the new cluster
(with 32 data nodes/name node/secondary namenode/jobtracker/backup
jobtracker), I can only see the logs from Hadoop (in the web interface,
when I navigate all the way into the task node log); no logs from my
mapper/reducer (com.mycompany.***) show up. I can do System.out.println
or System.err.println and see them in the same log file, but no logs
from log4j show up. Is there any other configuration I missed? Thanks.
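For reference, the relevant lines of such a customized conf/log4j.properties might look like the fragment below (the appender name TLA is an assumption based on common Hadoop task-log defaults; match it to the cluster's stock file):

```properties
# root stays at INFO so Hadoop's own logging is unchanged
# (TLA is assumed to be the task-log appender defined elsewhere in the file)
log4j.rootLogger=INFO,TLA
# raise verbosity only for our own packages
log4j.logger.com.mycompany=DEBUG
```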

Zhu, Guojun
Modeling Sr Graduate
571-3824370
guojun_...@freddiemac.com
Financial Engineering
Freddie Mac 


Re: yarn NoClassDefFoundError from LinuxContainerExecutor

2012-03-02 Thread Mingjie Lai


This relates to: https://issues.apache.org/jira/browse/MAPREDUCE-3505

Thanks.

On 03/01/2012 07:09 AM, Ioan Eugen Stan wrote:

Hi Mingjie,


I don't know about Yarn, but NoClassDefFoundError appears when you have a
class that was present at compile time but is no longer available
at runtime. See a detailed explanation here [1].

Check that the classpath built in the container/node contains the
classes from that error. Also check that you don't get another version
on the classpath, one without the specified class.

Hope this helps,


[1]
http://javarevisited.blogspot.com/2011/06/noclassdeffounderror-exception-in.html
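The classpath check suggested above can be scripted on a node; a sketch, assuming a CDH-style install prefix of /usr/lib/hadoop (an assumption; adjust the globs to your layout) and an `unzip` binary on the node:

```shell
# sketch: scan a node's yarn jars for the class named in the error
# (the install prefix below is an assumption; adjust to your layout)
CLASS='org/apache/hadoop/yarn/service/CompositeService.class'
hits=0
for j in /usr/lib/hadoop/share/hadoop/yarn/*.jar \
         /usr/lib/hadoop/share/hadoop/yarn/lib/*.jar; do
  [ -f "$j" ] || continue            # skip unmatched globs
  if unzip -l "$j" 2>/dev/null | grep -q "$CLASS"; then
    echo "provides class: $j"
    hits=$((hits+1))
  fi
done
echo "jars providing the class: $hits"
```

Zero hits means the class genuinely is not on the node (the direct cause of a NoClassDefFoundError); two or more hits from different jar versions is the shadowing case described above.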



On 29.02.2012 23:07, Mingjie Lai wrote:


Hi.

I'm trying yarn + security but still cannot get a mapred example
running. Can anyone help me take a look?

My env:
- 3-slave cluster on ec2, CentOS 5.5
- nn, dn, rm, nm all started, with security enabled.
- I saw java.lang.NoClassDefFoundError in the LinuxContainerExecutor error
log:
./application_1330545370212_0004/container_1330545370212_0004_01_01/stderr


- If I disable security, I still see this issue.

Any hint?

I followed the instructions from
http://hadoop.apache.org/common/docs/r0.23.0/hadoop-yarn/hadoop-yarn-site/ClusterSetup.html



Steps:
I started a mapred sample from nn/rm:

$ /usr/lib/hadoop/bin/yarn --config ./conf jar
share/hadoop/mapreduce/hadoop-mapreduce-examples-0.24.0-SNAPSHOT.jar
randomwriter 10 10

Logs are from nn, nm,
--
[yarn@ip-10-176-231-35 hadoop]$ /usr/lib/hadoop/bin/yarn --config ./conf
jar share/hadoop/mapreduce/hadoop-mapreduce-examples-0.24.0-SNAPSHOT.jar
randomwriter 10 10
Running 30 maps.
Job started: Wed Feb 29 20:33:48 UTC 2012
12/02/29 20:33:48 WARN conf.Configuration:
mapred.used.genericoptionsparser is deprecated. Instead, use
mapreduce.client.genericoptionsparser.used
12/02/29 20:33:49 INFO mapreduce.JobSubmitter: number of splits:30
12/02/29 20:33:49 INFO mapred.ResourceMgrDelegate: Submitted application
application_1330545370212_0005 to ResourceManager at
ip-10-176-231-35.us-west-1.compute.internal/10.176.231.35:7090
12/02/29 20:33:49 INFO mapreduce.Job: The url to track the job:
http://ip-10-176-231-35.us-west-1.compute.internal:7050/proxy/application_1330545370212_0005/


12/02/29 20:33:49 INFO mapreduce.Job: Running job: job_1330545370212_0005
12/02/29 20:33:53 INFO mapreduce.Job: Job job_1330545370212_0005 running
in uber mode : false
12/02/29 20:33:53 INFO mapreduce.Job: map 0% reduce 0%
12/02/29 20:33:53 INFO mapreduce.Job: Job job_1330545370212_0005 failed
with state FAILED due to: Application application_1330545370212_0005
failed 1 times due to AM Container for
appattempt_1330545370212_0005_01 exited with exitCode: 1 due to:
Exception from container-launch:
org.apache.hadoop.util.Shell$ExitCodeException:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:261)
at org.apache.hadoop.util.Shell.run(Shell.java:188)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:381)
at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:207)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:241)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:68)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)

main : command provided 1
main : user is yarn

.Failing this attempt.. Failing the application.
12/02/29 20:33:53 INFO mapreduce.Job: Counters: 0
Job ended: Wed Feb 29 20:33:53 UTC 2012
The job took 5 seconds.
--

LinuxContainer error:

[root@ip-10-176-203-45 yarn]# more
./application_1330545370212_0004/container_1330545370212_0004_01_01/stderr


Exception in thread "main" java.lang.NoClassDefFoundError:
org/apache/hadoop/yarn/service/CompositeService
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClassCond(ClassLoader.java:631)
at java.lang.ClassLoader.defineClass(ClassLoader.java:615)
at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:141)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:283)
at java.net.URLClassLoader.access$000(URLClassLoader.java:58)
at java.net.URLClassLoader$1.run(URLClassLoader.java:197)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
Caused by: java.lang.ClassNotFoundException:
org

Re: basic doubt on number of reduce tasks

2012-03-02 Thread Bejoy Ks
Vamshi
If you have set the number of reduce slots in a node to 5 and you
have 4 nodes, then your cluster can run a max of 5*4 = 20 reduce tasks
at a time. If more reduce tasks are present, they have to wait until
reduce slots become available.
For reducers, data locality is not considered; reduce tasks are
scheduled on nodes at random, wherever free slots are available.
There is no guarantee that all nodes will have the same number of
reducers running at a time. Mappers consider data locality, but it is
hard to do that for a reducer, as a reducer's input is the output
of multiple mappers from across the cluster.
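The slot arithmetic above can be sketched directly (nodes and slots are the numbers from the question; the 35-reduce job is a made-up example):

```shell
# capacity = nodes x reduce slots per node (values from the question)
nodes=4
slots_per_node=5
capacity=$((nodes * slots_per_node))
echo "max concurrent reduce tasks: $capacity"   # prints 20
# a hypothetical job with 35 reduce tasks needs ceil(35/20) scheduling waves
total_reduces=35
waves=$(( (total_reduces + capacity - 1) / capacity ))
echo "waves: $waves"   # prints 2
```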

Regards
Bejoy.KS

On Fri, Mar 2, 2012 at 3:39 PM, Vamshi Krishna  wrote:

> Hi all,
> Consider a hadoop cluster with 4 nodes, where on every node the maximum
> no. of reduce slots is fixed at 5. When the mapreduce daemons start,
>
> 1) Is there any restriction on the no. of simultaneously running reduce
> tasks on all nodes, such that it must be the same on all nodes? OR
>
> 2) Is it like this: on a node where there is a lot of data to be
> processed, a higher number of reduce tasks will run than on a node where
> less data is present. That is, according to the size of the data to be
> processed on a particular node, a proportionate number of reduce tasks
> will be run on the different nodes.
>
> Could somebody clarify this basic doubt: which is correct? If neither,
> what is the actual process that takes place?
>
> --
> *Regards*
> *
> Vamshi Krishna
> *
>
>