Re: [EC2] Could not reserve enough space for object heap

2011-12-12 Thread Marco Didonna
Hi, thanks for your answer.
I am indeed using a 64bit jvm on all nodes. In fact I get the
following answer when executing java -version

java version "1.6.0_26"
Java(TM) SE Runtime Environment (build 1.6.0_26-b03)
Java HotSpot(TM) 64-Bit Server VM (build 20.1-b02, mixed mode)

And I do have a lot of free RAM:

             total       used       free     shared    buffers     cached
Mem:          7700        930       6769          0         14        462
-/+ buffers/cache:         453       7246
Swap:            0          0          0


And still I cannot make use of that RAM. Again, I tried setting
mapred.child.java.opts to -Xmx1g, but the map tasks got only the
default amount of heap space (I checked this with the ps command).
Only if I use mapred.{map|reduce}.child.java.opts=-Xmx1g does the setting
affect the job execution, and then I end up with the error we're trying to
debug.
The other non-default settings can be seen in the attached whirr
configuration file (previous message).
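
For reference, the overrides I'm passing through whirr look roughly like this
(a sketch with example values only; the hadoop-mapreduce. prefix is the same
pass-through prefix as above, and the two *.tasks.maximum keys are the standard
per-TaskTracker slot limits -- the value 2 is just an assumption here):

  hadoop-mapreduce.mapred.map.child.java.opts=-Xmx1g
  hadoop-mapreduce.mapred.reduce.child.java.opts=-Xmx1g
  hadoop-mapreduce.mapred.tasktracker.map.tasks.maximum=2
  hadoop-mapreduce.mapred.tasktracker.reduce.tasks.maximum=2

With two map and two reduce slots at 1 GB each, plus roughly 1 GB apiece for
the DataNode and TaskTracker daemons, that is about 6 GB of committed heap
against 7.5 GB of RAM, so on paper it should fit.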

Thanks again,
Marco

On 11 December 2011 21:37, Harsh J  wrote:
> Marco,
>
> Are you using a 64-bit JVM on your nodes or a 32-bit one?
>
> Sun JRE should say something like (for hadoop -version):
> "Java HotSpot(TM) 64-Bit Server VM (build 20.4-b02-402, mixed mode)"
>
> If you are, could you post what 'free' says on your slave nodes?
>
> On Sun, Dec 11, 2011 at 11:29 PM, Marco Didonna  wrote:
>> Hello everyone,
>> I'm running a small toy cluster (3 nodes) on EC2 configured as follows:
>>
>> * one node as JT+NN
>> * two nodes as DN+TT
>>
>> I use whirr to build such cluster on demand (config file here
>> http://pastebin.com/JXHYvMNb). Since my jobs are memory intensive I'd
>> like to exploit the 8GB of ram the m1.large instance offers. Thus I
>> added mapred.map.child.java.opts=-Xmx1g and
>> mapred.reduce.child.java.opts=-Xmx1g (adding just
>> hadoop-mapreduce.mapred.child.java.opts=-Xmx1g produced no effect
>> since the map tasks were allocated the default 200MB). The problem is
>> that with these settings I cannot have any job running because I
>> always get
>>
>> 11/12/11 18:14:24 INFO mapred.JobClient: Running job: job_201112111644_0002
>> 11/12/11 18:14:25 INFO mapred.JobClient:  map 0% reduce 0%
>> 11/12/11 18:14:28 INFO mapred.JobClient: Task Id :
>> attempt_201112111644_0002_m_04_0, Status : FAILED
>> java.lang.Throwable: Child Error
>>        at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:242)
>> Caused by: java.io.IOException: Task process exit with nonzero status of 1.
>>        at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:229)
>>
>> 11/12/11 18:14:28 WARN mapred.JobClient: Error reading task
>> outputip-10-87-1-170.ec2.internal
>> 11/12/11 18:14:28 WARN mapred.JobClient: Error reading task
>> outputip-10-87-1-170.ec2.internal
>> 11/12/11 18:14:30 INFO mapred.JobClient: Task Id :
>> attempt_201112111644_0002_r_01_0, Status : FAILED
>> java.lang.Throwable: Child Error
>>        at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:242)
>> Caused by: java.io.IOException: Task process exit with nonzero status of 1.
>>        at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:229)
>>
>> And the only log I can get from the job_201112111644_0002 is the
>> stdout and the stderr whose combined output is
>>
>> Could not create the Java virtual machine.
>> Error occurred during initialization of VM
>> Could not reserve enough space for object heap
>>
>> I really cannot understand why the jvm cannot allocate enough space:
>> there's plenty of ram. I also tried to reduce the number of map slots
>> to two: nothing changed. I'm out of ideas. I hope you can shed some
>> light :)
>>
>> FYI I use cloudera distribution for hadoop, latest stable release available.
>>
>> Thanks for your attention.
>>
>> MD
>
>
>
> --
> Harsh J


Cannot start secure cluster without privileged resources.

2011-12-12 Thread sri ram
Hi,
   I tried installing Hadoop 0.23 in secure mode and I am stuck with the
following error:

java.lang.RuntimeException: Cannot start secure cluster without privileged resources.
        at org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:1487)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:457)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:2263)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:2196)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:2219)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:2367)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:2384)
2011-12-12 13:24:04,419 INFO  datanode.DataNode (StringUtils.java:run(605)) - SHUTDOWN_MSG:

I tried this post
http://www.mail-archive.com/common-user@hadoop.apache.org/msg13679.html
but failed to configure it.


hadoop 0.23 secure mode error

2011-12-12 Thread sri ram
Hi,
  I am trying to set up a Hadoop 0.23 cluster in secure mode.
   While starting the nodemanager I get the following error:
2011-12-12 15:37:26,874 INFO  ipc.HadoopYarnRPC (HadoopYarnProtoRPC.java:getProxy(48)) - Creating a HadoopYarnProtoRpc proxy for protocol interface org.apac$
2011-12-12 15:37:26,953 INFO  nodemanager.NodeStatusUpdaterImpl (NodeStatusUpdaterImpl.java:registerWithRM(155)) - Connected to ResourceManager at master:80$
2011-12-12 15:37:38,784 WARN  ipc.Client (Client.java:run(526)) - Couldn't setup connection for nm/ad...@master.example.com to rm/ad...@master.example.com
2011-12-12 15:37:38,787 ERROR service.CompositeService (CompositeService.java:start(72)) - Error starting services org.apache.hadoop.yarn.server.nodemanager$
org.apache.avro.AvroRuntimeException: java.lang.reflect.UndeclaredThrowableException
        at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.start(NodeStatusUpdaterImpl.java:132)
        at org.apache.hadoop.yarn.service.CompositeService.start(CompositeService.java:68)
        at org.apache.hadoop.yarn.server.nodemanager.NodeManager.start(NodeManager.java:163)
        at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:231)
Caused by: java.lang.reflect.UndeclaredThrowableException
        at org.apache.hadoop.yarn.server.api.impl.pb.client.ResourceTrackerPBClientImpl.registerNodeManager(ResourceTrackerPBClientImpl.java:66)
        at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.registerWithRM(NodeStatusUpdaterImpl.java:161)
        at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.start(NodeStatusUpdaterImpl.java:128)
        ... 3 more
Caused by: com.google.protobuf.ServiceException: java.io.IOException: Failed on local exception: java.io.IOException: Couldn't setup connection for nm/admin$
        at org.apache.hadoop.yarn.ipc.ProtoOverHadoopRpcEngine$Invoker.invoke(ProtoOverHadoopRpcEngine.java:139)
        at $Proxy14.registerNodeManager(Unknown Source)
        at org.apache.hadoop.yarn.server.api.impl.pb.client.ResourceTrackerPBClientImpl.registerNodeManager(ResourceTrackerPBClientImpl.java:59)
        ... 5 more
Caused by: java.io.IOException: Failed on local exception: java.io.IOException: Couldn't setup connection for nm/ad...@master.example.com to rm/admin@MASTER$
        at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:655)
        at org.apache.hadoop.ipc.Client.call(Client.java:1089)


Any help is appreciated


Re: Re: About slots of tasktracker and number of map taskers

2011-12-12 Thread Bejoy Ks
Hi Tan
Adding on to Harsh's response.

*Map Reduce Slots*
   This is the maximum number of map and reduce tasks that can run
concurrently on your cluster/nodes. Say you have a 10 node cluster (10
data nodes); each node would be assigned a specific number of map and
reduce tasks it can handle concurrently. It needn't be the same for all
nodes; it can vary with each node's hardware capacity. Considering the
hardware (cpu, memory, ...) of each node, the admin assigns these values
so that the box can handle the resource requirements gracefully. If you
overload these values (assigning more slots), you are asking the box to
run more simultaneous tasks than it can handle; this leads to memory swap,
OOM, CPU cycle unavailability etc, and in turn you end up with an
inefficient cluster encountering a large number of task failures. Here,
assuming all machines are of the same capacity, if one machine has 8 map
and 2 reduce slots then the total map task capacity of your cluster is
8*10=80 maps and the reduce capacity is 2*10=20 reducers, which means that
at a time your cluster can run only 80 map tasks and 20 reduce tasks. So
the total number of map slots is 80 and reduce slots is 20 for your
cluster.
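
For example, on a single TaskTracker such a per-node limit would be set in
its mapred-site.xml along these lines (just a sketch; 8 and 2 are the example
values used above, not recommendations):

  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>8</value>
  </property>
  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>2</value>
  </property>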

*Map Reduce Tasks*
 These are the actual tasks spawned by your map reduce jobs. Say in the
cluster above I'm firing two jobs, one after the other. The first job
spawns 60 mappers and the second one spawns 40 mappers. As soon as the
first job is launched, 60 of the 80 slots are occupied and only 20 slots
are left in the cluster. When I trigger my second job it has 40 map tasks
but only 20 slots are available, so 20 map tasks would be spawned and the
remaining 20 have to wait in the queue; once slots get free these tasks
can execute.

  In short, the map reduce slots are set by the admin based on hardware,
on a per node basis. They are not set at the individual task level, and
the developer does not have to worry about these parameters at the job
level. The map reduce developer develops his application; based on the
input splits and Input Formats it fires map and reduce tasks. The number
of tasks varies with your inputs and jobs. Based on the availability of
slots in the cluster (assigned by the admin) and factors like data/rack
locality, these tasks are executed on the cluster.

Coming to your question:
"As an administrator, I can set the max number of maps/reduces run on a
datanode, then what do I set the number of slots for?"

The max number of maps/reduces that can run on a datanode at the same time is
exactly what is meant by the map reduce slots specified for that data node.


Hope this clarifies things.

Regards
Bejoy.K.S


2011/12/12 Tan Jun 

> Harsh,
> Sorry for my poor English.
> There is one more question.
> As an administrator, I can set the max number of maps/reduces run on a
> datanode, then what do I set the number of slots for?
> What's the difference between these attributes?
> In my opinion, the number of slots depends on hardware while maps/reduces
> depend on software.
> Assume that only one job is running, especially for a benchmarking case
> like PI computing.
> Thanks!
>
> --
> Tan Jun
>
>  *From:* Harsh J 
> *Date:* 2011-12-12 13:33
> *To:* mapreduce-user ; 
> tanjun_2525
> *Subject:* Re: Re: About slots of tasktracker and number of map taskers
>  Tan,
>
> As an admin, I can even choose to configure 100 slots
> on a 4-core node, if I feel like burning the box. There is no hardware
> auto-detection, and the slot limit is entirely controlled by the
> mapred-site.xml for that TaskTracker.
>
> The book merely tries to say that you need to set these maximum slot
> settings based on your hardware knowledge of each node -- TaskTrackers
> do nothing of that sort on their own.
>
> There are some CPU/Memory considerations taken into account by a
> variety of non-default Schedulers in the JobTracker, but your slot limits
> per tasktracker are entirely controlled by configuration.
>
> 2011/12/12 Tan Jun :
> > Hi Harsh,
>
> > Now I know the number of maps and reduces run simultaneously is set by the
> > administrator in mapred-site.xml with default value 2.
> > But I can't get the point about the number of slots.
> > My understanding by now is that
> > the number of slots is decided by hardware and the administrator cannot
> > change it.
> > Is that right?
> >
> > 
> > Tan Jun
> >
> > From: Harsh J
> > Date: 2011-12-12 12:22
> > To: mapreduce-user
> > Subject: Re: About slots of tasktracker and number of map taskers
> > Hi Tan,
> >
> > On 12-Dec-2011, at 8:48 AM, Tan Jun wrote:
> >
> > Hi,
> > I don't really understand the meaning of the sentences in "The Definitive
> > Guide" (page 155):
> >
> > Tasktrackers have a fixed number of slots for map tasks and for reduce
> > tasks: for example, a tasktracker may be able to run two map tasks and
> > two reduce tasks simultaneously. (The precise number depends on the
> > number of cores and the amount of memory

Re: hadoop 0.23 secure mode error

2011-12-12 Thread Robert Evans
It looks like you do not have nm/ad...@master.example.com configured in your 
Kerberos setup.  I wonder how much traffic example.com gets on a daily basis.

--Bobby Evans

On 12/12/11 4:15 AM, "sri ram"  wrote:

Hi,
  I am trying to set up a Hadoop 0.23 cluster in secure mode.
   While starting the nodemanager I get the following error:
2011-12-12 15:37:26,874 INFO  ipc.HadoopYarnRPC (HadoopYarnProtoRPC.java:getProxy(48)) - Creating a HadoopYarnProtoRpc proxy for protocol interface org.apac$
2011-12-12 15:37:26,953 INFO  nodemanager.NodeStatusUpdaterImpl (NodeStatusUpdaterImpl.java:registerWithRM(155)) - Connected to ResourceManager at master:80$
2011-12-12 15:37:38,784 WARN  ipc.Client (Client.java:run(526)) - Couldn't setup connection for nm/ad...@master.example.com to rm/ad...@master.example.com
2011-12-12 15:37:38,787 ERROR service.CompositeService (CompositeService.java:start(72)) - Error starting services org.apache.hadoop.yarn.server.nodemanager$
org.apache.avro.AvroRuntimeException: java.lang.reflect.UndeclaredThrowableException
        at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.start(NodeStatusUpdaterImpl.java:132)
        at org.apache.hadoop.yarn.service.CompositeService.start(CompositeService.java:68)
        at org.apache.hadoop.yarn.server.nodemanager.NodeManager.start(NodeManager.java:163)
        at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:231)
Caused by: java.lang.reflect.UndeclaredThrowableException
        at org.apache.hadoop.yarn.server.api.impl.pb.client.ResourceTrackerPBClientImpl.registerNodeManager(ResourceTrackerPBClientImpl.java:66)
        at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.registerWithRM(NodeStatusUpdaterImpl.java:161)
        at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.start(NodeStatusUpdaterImpl.java:128)
        ... 3 more
Caused by: com.google.protobuf.ServiceException: java.io.IOException: Failed on local exception: java.io.IOException: Couldn't setup connection for nm/admin$
        at org.apache.hadoop.yarn.ipc.ProtoOverHadoopRpcEngine$Invoker.invoke(ProtoOverHadoopRpcEngine.java:139)
        at $Proxy14.registerNodeManager(Unknown Source)
        at org.apache.hadoop.yarn.server.api.impl.pb.client.ResourceTrackerPBClientImpl.registerNodeManager(ResourceTrackerPBClientImpl.java:59)
        ... 5 more
Caused by: java.io.IOException: Failed on local exception: java.io.IOException: Couldn't setup connection for nm/ad...@master.example.com to rm/admin@MASTER$
        at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:655)
        at org.apache.hadoop.ipc.Client.call(Client.java:1089)


Any help is appreciated



Re: how to access a mapper counter in reducer

2011-12-12 Thread Robert Evans
Praveen,

http://wiki.apache.org/hadoop/HowToContribute

is a good place to help you get started with creating the patch.  Once you have 
code written I am happy to help review it.

--Bobby Evans

On 12/9/11 9:00 PM, "Praveen Sripati"  wrote:

Robert,

I will take a shot at it. I think it would be about writing a custom comparator 
and a partitioner, reading some config parameters and sending the counters as 
key/value pairs to the reducers. It shouldn't be that difficult.
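
The mapper side I'm picturing is roughly this (only a sketch; the class name,
the "!SUMMARY!" key convention and the record counter are placeholders I made
up for illustration):

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class CountingMapper extends Mapper<LongWritable, Text, Text, Text> {

  private long recordCount = 0;

  @Override
  protected void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    recordCount++;
    // Normal output, shown only to keep the example complete.
    context.write(value, new Text("1"));
  }

  @Override
  protected void cleanup(Context context)
      throws IOException, InterruptedException {
    // When the mapper finishes, emit one summary record addressed to each
    // reducer so every reducer can see this mapper's count.
    int numReducers = context.getNumReduceTasks();
    for (int r = 0; r < numReducers; r++) {
      context.write(new Text("!SUMMARY!" + r),
          new Text(Long.toString(recordCount)));
    }
  }
}

The partitioner and comparator would then only have to recognize that prefix.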

If I am stuck, I will post in the forum. I will also know how to create a patch.

Regards,
Praveen

On Thu, Dec 8, 2011 at 9:45 PM, Robert Evans  wrote:
Sorry I have not responded sooner I have had a number of fires at work to put 
out, and I haven't been keeping up with the user mailing lists.  The code I did 
before was very specific to the task I was working on, and it was an ugly hack 
because I did not bother with the comparator, I already knew there was only a 
small predefined set of keys so I just output one set of metadata data for each 
key.

I would be happy to put something like this into the map/reduce framework.  I 
have filed https://issues.apache.org/jira/browse/MAPREDUCE-3520 for this. I 
just don't know when I will have the time to do that, especially with my work 
on the 0.23 release.  I'll also talk to my management to see if they want to 
allow me to work on this during work, or if it will have to be in my spare 
time.  Please feel free to comment on the JIRA or vote for it if you feel that 
it is something that you want done.  Or if you feel comfortable helping out 
perhaps you could take a first crack at it.

Thanks,

Bobby Evans


On 12/6/11 9:14 AM, "Mapred Learn" <mapred.le...@gmail.com> wrote:

Hi Praveen,
Could you share here so that we can use ?

Thanks,

Sent from my iPhone

On Dec 6, 2011, at 6:29 AM, Praveen Sripati <praveensrip...@gmail.com> wrote:
Robert,

> I have made the above thing work.

Any plans to make it into the Hadoop framework. There had been similar queries 
about it in other forums also. Need any help testing/documenting or anything, 
please let me know.

Regards,
Praveen

On Sat, Dec 3, 2011 at 2:34 AM, Robert Evans <ev...@yahoo-inc.com> wrote:
Anurag,

The current set of counter APIs available from within a Map or Reduce process is 
write only.  They are not intended to be used for reading data from other tasks.  
They are there for collecting statistics about the job as a whole.  If you use 
too many of them the performance of the system as a whole can get very bad, 
because they are stored on the JobTracker in memory.  Also there is the 
potential that a map task that has finished "successfully" can later fail if 
the node it is running on dies before all of the map output can be fetched by 
all of the reducers.  This could result in a reducer reading in counter data 
that is only partial or out of date.  You may be able to access it through the 
job API, but I would not recommend it; I also think there may be some issues 
if you have security enabled, but I don't know for sure.

If you have an optimization that really needs summary data from each mapper in 
all reducers then you should do it the map/reduce way.  When a mapper finishes, 
output one special key/value pair for each reducer with the statistics in it.  
You can know how many reducers there are because that is set in the 
configuration.  You then need a special partitioner to recognize those summary 
key/value pairs and make sure that they each go to the proper reducer.  You 
also need a special comparator to make sure that these special keys are the 
very first ones read by the reducer, so it has the data before processing 
anything else.
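
A bare-bones sketch of the partitioner half of that idea (the class name and
the "!SUMMARY!<targetReducer>" key convention are purely illustrative, not
code that exists anywhere):

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

public class SummaryAwarePartitioner extends Partitioner<Text, Text> {

  // Summary keys are assumed to look like "!SUMMARY!<targetReducerNumber>".
  private static final String SUMMARY_PREFIX = "!SUMMARY!";

  @Override
  public int getPartition(Text key, Text value, int numPartitions) {
    String k = key.toString();
    if (k.startsWith(SUMMARY_PREFIX)) {
      // Route each per-mapper summary record to the reducer named in the key.
      int target = Integer.parseInt(k.substring(SUMMARY_PREFIX.length()));
      return target % numPartitions;
    }
    // Everything else falls back to plain hash partitioning.
    return (k.hashCode() & Integer.MAX_VALUE) % numPartitions;
  }
}

The comparator does the matching job on the reduce side: it only has to make
keys with that prefix sort ahead of everything else so the summaries are read
first.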

I would also recommend that you don't try to store this data in HDFS.  You can 
very easily do a DDOS on the namenode on a large cluster, and then your ops 
will yell at you as they did with me before I stopped doing it.  I have made 
the above thing work.  It is just a lot of work to do it right.

--Bobby Evans



On 12/1/11 1:18 PM, "Markus Jelsma" <markus.jel...@openindex.io> wrote:

Can access it via the Job API?

http://hadoop.apache.org/mapreduce/docs/current/api/org/apache/hadoop/mapreduce/Job.html#getCounters%28%29
 


> Hi,
> I have a similar query.
>
> In fact, I sent it yesterday and am waiting for a response from anybody who
> might have done it.
>
> Thanks,
> Anurag Tangri
>
> 2011/11/30 rabbit_cheng <rabbit_ch...@126.com>
>
> >  I have created a counter in mapper to count something, I wanna get the
> > counter's value in reducer phase, the code segment is as follows:
> >
> > public class MM extends Mapper {
> >
> > static enum TEST{ pt }
> > @Override
> > public void map(LongWritable ke

Pause and Resume Hadoop map reduce job

2011-12-12 Thread Dino Kečo
Hi Hadoop users,

In my company we have been using Hadoop for 2 years and we have a need to
pause and resume map reduce jobs. I searched Hadoop JIRA and there are a
couple of tickets which are not resolved, so we have implemented our own
solution. I would like to share this approach with you and to hear your
opinion about it.

We have created one special pool in the fair scheduler called PAUSE
(maxMapTasks = 0, maxReduceTasks = 0). Our logic for pausing a job is to move
it into this pool and kill all of its running tasks. When we want to resume a
job we move it into some other pool. Currently we can do maintenance of the
cloud, except the Job Tracker, while jobs are paused. We also have some
external services which we use, and we do their maintenance while jobs are
paused.

We know that records which are being processed by running tasks will be
reprocessed. In some cases we use the same HBase table as input and output,
and we save the job id on each record. When a record is re-processed we check
this job id and skip the record if it was already processed by the same job.
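
The check itself is a one-liner; something along these lines (a sketch only --
the helper class name is made up, and mapred.job.id is the task-side job id
property we read, assuming Hadoop 1.x style configuration):

import org.apache.hadoop.conf.Configuration;

public final class PauseResumeSkip {

  // Returns true when the record already carries the id of the currently
  // running job, i.e. it was processed before the job was paused.
  public static boolean alreadyProcessed(Configuration conf, String recordJobId) {
    String currentJobId = conf.get("mapred.job.id");
    return currentJobId != null && currentJobId.equals(recordJobId);
  }
}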

Our custom implementation of the fair scheduler has this logic implemented and
is deployed to our cluster.

Please share your comments and concerns about this approach.

Regards,
dino


Re: Pause and Resume Hadoop map reduce job

2011-12-12 Thread Arun C Murthy
The CapacityScheduler (hadoop-0.20.203 onwards) allows you to stop a queue and 
start it again.

That will give you the behavior you described.

Arun

On Dec 12, 2011, at 5:50 AM, Dino Kečo wrote:

> Hi Hadoop users,
> 
> In my company we have been using Hadoop for 2 years and we have a need to pause 
> and resume map reduce jobs. I searched Hadoop JIRA and there are a 
> couple of tickets which are not resolved, so we have implemented our own 
> solution. I would like to share this approach with you and to hear your 
> opinion about it.
> 
> We have created one special pool in the fair scheduler called PAUSE (maxMapTasks 
> = 0, maxReduceTasks = 0). Our logic for pausing a job is to move it into this 
> pool and kill all of its running tasks. When we want to resume a job we move it 
> into some other pool. Currently we can do maintenance of the cloud, except the Job 
> Tracker, while jobs are paused. We also have some external services which we 
> use, and we do their maintenance while jobs are paused. 
> 
> We know that records which are being processed by running tasks will be 
> reprocessed. In some cases we use the same HBase table as input and output, and we 
> save the job id on each record. When a record is re-processed we check this job id 
> and skip the record if it was already processed by the same job. 
> 
> Our custom implementation of the fair scheduler has this logic implemented and 
> is deployed to our cluster. 
> 
> Please share your comments and concerns about this approach. 
> 
> Regards,
> dino