Fwd: Unable to submit jobs to a Hadoop cluster after a while
Re-sending the post. Any help is highly appreciated.

-- Forwarded message --
From: Ashwanth Kumar
Date: Sun, Nov 15, 2015 at 9:24 AM
Subject: Unable to submit jobs to a Hadoop cluster after a while
To: user@hadoop.apache.org

We're running Hadoop 2.6.0 via CDH 5.4.4 and we get the following error while submitting a new job:

15/10/08 00:33:31 WARN security.UserGroupInformation: PriviledgedActionException as:hadoop (auth:SIMPLE) cause:org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /data/hadoopfs/mapred/staging/hadoop/.staging/job_201510050004_0388/job.jar could only be replicated to 0 nodes instead of minReplication (=1). There are 161 datanode(s) running and no node(s) are excluded in this operation.

At that time we had 161 DNs running in the cluster. From the NN logs I see:

2015-10-08 01:00:26,889 DEBUG org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to choose remote rack (location = ~/default-rack), fallback to local rack
org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy$NotEnoughReplicasException:
    at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseRandom(BlockPlacementPolicyDefault.java:691)
    at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseRemoteRack(BlockPlacementPolicyDefault.java:580)
    at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseTarget(BlockPlacementPolicyDefault.java:357)
    at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseTarget(BlockPlacementPolicyDefault.java:419)
    at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseTarget(BlockPlacementPolicyDefault.java:214)
    at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseTarget(BlockPlacementPolicyDefault.java:111)
    at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationWork.chooseTargets(BlockManager.java:3746)
    at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationWork.access$200(BlockManager.java:3711)
    at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeReplicationWorkForBlocks(BlockManager.java:1400)
    at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeReplicationWork(BlockManager.java:1306)
    at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeDatanodeWork(BlockManager.java:3682)
    at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationMonitor.run(BlockManager.java:3634)
    at java.lang.Thread.run(Thread.java:722)
2015-10-08 01:00:26,890 WARN org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to place enough replicas, still in need of 1 to reach 3 (unavailableStorages=[DISK], storagePolicy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, newBlock=false)

From one of the live 160+ DN logs, we saw:

Node /default-rack/10.181.8.222:50010 [ Storage [DISK]DS-2d39f3c3-2e67-48ad-871b-632f66b277d7:NORMAL: 10.181.8.222:50010 is not chosen since the node is too busy (load: 2 > 1.8370786516853932). ]
Node /default-rack/10.181.25.147:50010 [ Storage [DISK]DS-60b511b0-62aa-4c0f-92d9-6d90ff32ee49:NORMAL: 10.181.25.147:50010 is not chosen since the node is too busy (load: 2 > 1.8370786516853932). ]
Node /default-rack/10.181.8.152:50010 [ Storage [DISK]DS-7e0bf761-86f2-4748-9eda-fbfd9c69e127:NORMAL: 10.181.8.152:50010 is not chosen since the node is too busy (load: 2 > 1.8370786516853932). ]
Node /default-rack/10.181.25.67:50010 [ Storage [DISK]DS-5849e4d8-4ab6-4392-aee2-7a354c82c19d:NORMAL: 10.181.25.67:50010 is not chosen since the node is too busy (load: 2 > 1.8370786516853932). ]

A few things we observed from our end:
- If we restart the NN, we're able to submit jobs without any issues.
- We run this Hadoop cluster on AWS.
- The DN and TT processes run on a single EC2 machine which is backed by an Auto Scaling Group.
- We have another cluster which doesn't autoscale and doesn't exhibit this behaviour.

Any pointers or ideas on how to solve this for good would be really appreciated.

-- Ashwanth Kumar / ashwanthkumar.in
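The "node is too busy (load: 2 > 1.837...)" rejections in the DN log match the NameNode's load-based placement check, which skips DataNodes whose xceiver load exceeds roughly twice the cluster average. On an autoscaling cluster the average can drop low enough that nearly every node looks "too busy". A hedged sketch of turning that check off (property name as in Hadoop 2.6; verify against your CDH 5.4 docs, and note it trades away load balancing):

```xml
<!-- hdfs-site.xml on the NameNode (illustrative; restart the NN to apply) -->
<property>
  <name>dfs.namenode.replication.considerLoad</name>
  <!-- true (the default) makes the NN reject DataNodes whose load exceeds
       ~2x the cluster average, producing the "too busy" messages above -->
  <value>false</value>
</property>
```

That the problem clears on a NameNode restart is consistent with this: the restart resets the NN's view of per-node load until the averages drift again.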
map task frozen from master(s) perspective, but no process is there, and task log reports completion
Hi,

I have a map task "slot" occupied by a task that has not made progress for hours, and in fact is seen by YARN as NEW and STARTING. (Since we use YARN / Hadoop 2, it is not a slot per se, but the resource mechanism works out to dynamically computed slots - for instance I have a total of 5 map+reduce tasks running in the current config. I cannot change this while the job is still running, right?)

I have found a log of the task showing completion:

2015-11-19 04:01:14,719 INFO [main] org.apache.hadoop.mapred.MapTask: Starting flush of map output
2015-11-19 04:01:14,719 INFO [main] org.apache.hadoop.mapred.MapTask: Spilling map output
2015-11-19 04:01:14,719 INFO [main] org.apache.hadoop.mapred.MapTask: bufstart = 0; bufend = 63496; bufvoid = 104857600
2015-11-19 04:01:14,719 INFO [main] org.apache.hadoop.mapred.MapTask: kvstart = 26214396(104857584); kvend = 26201248(104804992); length = 13149/6553600
2015-11-19 04:01:14,851 INFO [main] org.apache.hadoop.mapred.MapTask: Finished spill 0
2015-11-19 04:01:14,858 INFO [main] org.apache.hadoop.mapred.Task: Task:attempt_1447872797537_0001_m_002241_0 is done. And is in the process of committing
2015-11-19 04:01:14,889 INFO [main] org.apache.hadoop.mapred.Task: Task 'attempt_1447872797537_0001_m_002241_0' done.
2015-11-19 04:01:14,889 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Stopping MapTask metrics system...
2015-11-19 04:01:14,890 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: MapTask metrics system stopped.
2015-11-19 04:01:14,890 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: MapTask metrics system shutdown complete.

My hypothesis is that the task could not report its progress or completion to the application master, but in that case the master should have timed it out, I believe? Can I kill the task attempt in any way to allow it to restart?

Please advise,
Nicu
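On the kill question: the MR2 CLI can kill or fail an individual task attempt, after which the application master schedules a fresh attempt. A sketch using the attempt ID from the log above (check `mapred job -help` on your version for the exact options):

```shell
# Kill the stuck attempt; the AM reschedules it, and the kill does NOT
# count against the job's allowed-failures limit.
mapred job -kill-task attempt_1447872797537_0001_m_002241_0

# Alternatively, fail it explicitly; this DOES count as a failed attempt.
mapred job -fail-task attempt_1447872797537_0001_m_002241_0
```

And yes, per-job resource settings such as mapper/reducer container sizes are fixed at submission time; you cannot change them for a job that is already running.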
Re: failed to start namenode
It's surprising that no logs are created. How are you trying to start the NameNode? If you are starting it using Cloudera Manager, then the logs can be seen on screen as well.

On Thu, Nov 19, 2015 at 11:36 AM, siva kumar wrote:
> Hi Sandeep,
> The log is not getting generated for the name node.
>
> On Wed, Nov 18, 2015 at 5:53 PM, sandeep das wrote:
>
>> At least share some excerpts from the name node log file.
>>
>> On Wed, Nov 18, 2015 at 5:46 PM, siva kumar wrote:
>>
>>> Hi Folks,
>>> I'm trying to install a fresh hadoop cluster. But then, the namenode is not starting up, because of which the hdfs service is not started during my first run. Can anyone help me out?
>>> I'm trying this using parcels (CDH-5).
>>>
>>> Any help?
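When Cloudera Manager shows nothing useful, it can help to look at the role's log directories directly, or to start the NameNode in the foreground so the failure prints to the terminal. A hedged sketch (paths are typical CDH parcel defaults and may differ on your install):

```shell
# Typical CDH log location for HDFS roles (assumption: default layout)
ls -lt /var/log/hadoop-hdfs/

# stdout/stderr of the last attempted start, captured by the CM agent
# (assumption: default agent process directory)
ls -lt /var/run/cloudera-scm-agent/process/*NAMENODE*/logs/

# Or run the NameNode in the foreground as the hdfs user; a bad or
# unformatted name directory will be reported immediately on stderr.
sudo -u hdfs hdfs namenode
```

On a fresh install, a NameNode that dies with no log is often one that never got past formatting; on a CDH first run that step is `sudo -u hdfs hdfs namenode -format` (destructive, so only on a truly empty cluster).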
Re: failed to start namenode
Hi Sandeep,
The log is not getting generated for the name node.

On Wed, Nov 18, 2015 at 5:53 PM, sandeep das wrote:
> At least share some excerpts from the name node log file.
>
> On Wed, Nov 18, 2015 at 5:46 PM, siva kumar wrote:
>
>> Hi Folks,
>> I'm trying to install a fresh hadoop cluster. But then, the namenode is not starting up, because of which the hdfs service is not started during my first run. Can anyone help me out?
>> I'm trying this using parcels (CDH-5).
>>
>> Any help?
Yarn application reading from Data node using short-circuit.
Hi,

I was going through some benchmarking and realized that lots of TCP connections are initiated while running my Pig jobs over YARN (MR2). These TCP connections go to the data nodes. Although short-circuit reads are enabled on my data nodes, a lot of TCP connections are still being created. I wanted to check how we can enable the YARN application's tasks to read data from the data node using short-circuit reads, i.e. Unix domain sockets. I believe that will improve the performance of our jobs. Can someone please help me understand how I can make sure that the MR2 jobs created by my Pig scripts read data from the data node via short-circuit reads instead of TCP connections?

Regards,
Sandeep
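Worth noting: short-circuit reads are negotiated per HDFS *client*, so the settings must be visible to the MR task JVMs (i.e. in the client-side hdfs-site.xml the job picks up), not only in the DataNode configuration. A minimal sketch of the usual pair of properties (the socket path below is illustrative; it must match what the DataNodes use):

```xml
<!-- hdfs-site.xml, deployed to DataNodes AND client/gateway hosts -->
<property>
  <name>dfs.client.read.shortcircuit</name>
  <value>true</value>
</property>
<property>
  <!-- Unix domain socket shared between DN and local clients;
       the directory must exist and be owned appropriately -->
  <name>dfs.domain.socket.path</name>
  <value>/var/run/hdfs-sockets/dn</value>
</property>
```

Also remember short-circuit only applies when the task runs on the same host as the block replica; non-local tasks will always read over TCP, so some TCP traffic is expected even with short-circuit working.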
Re: yarn uses nodes non-symmetrically
Nicolae,

It depends on how big your AM container is compared to the task containers. By default, the AM container size is 1.5GB and the map/reduce containers are 1GB. You can adjust these by setting yarn.app.mapreduce.am.resource.mb, mapreduce.map.memory.mb, and mapreduce.reduce.memory.mb. If you make them smaller, make sure you also adjust the -Xmx values for the mapreduce.*.java.opts properties as well.

Thanks,
-Eric

From: Nicolae Marasoiu
To: "user@hadoop.apache.org"
Sent: Tuesday, November 17, 2015 8:01 AM
Subject: yarn uses nodes non-symmetrically

Hi,

My nodes are identical, and the yarn-site.xml files are identical too. However, between slaves, one is used to the full but the other only around half, meaning: one gets 4 containers, the other gets 3 (and one of them is the app master, which is quite idle), and I don't know why.

Thanks,
Nicu
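The knobs mentioned above can be sketched in mapred-site.xml; the values below are illustrative, and the -Xmx settings are kept below the container sizes (roughly 80%) so the JVM leaves headroom for non-heap memory:

```xml
<!-- mapred-site.xml (illustrative values, not recommendations) -->
<property>
  <name>yarn.app.mapreduce.am.resource.mb</name>
  <value>1024</value>
</property>
<property>
  <name>mapreduce.map.memory.mb</name>
  <value>1024</value>
</property>
<property>
  <name>mapreduce.reduce.memory.mb</name>
  <value>1024</value>
</property>
<property>
  <!-- Heap below container size to leave room for JVM overhead -->
  <name>mapreduce.map.java.opts</name>
  <value>-Xmx820m</value>
</property>
<property>
  <name>mapreduce.reduce.java.opts</name>
  <value>-Xmx820m</value>
</property>
```

With identical nodes, a 1.5GB AM plus 1GB tasks explains the 4-vs-3 split: the node hosting the AM has less memory left for task containers.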
RE: Does MapReduceApplicationMaster prevent the data node from spawning YarnChild?
Thank you very much. If that is the case, then the ApplicationMaster alone takes almost everything my datanode can offer - that would explain why the task itself can't start. Thank you very much again.

On 17 November 2015 at 20:40, Bikas Saha wrote:
You can check for "yarn.app.mapreduce.am.resource.mb" in your configs. Its default is 1.5GB and the other MR task defaults are 1GB.

From: darekg11 [mailto:darek...@tlen.pl]
Sent: Tuesday, November 17, 2015 11:34 AM
To: user@hadoop.apache.org
Subject: RE: Does MapReduceApplicationMaster prevent the data node from spawning YarnChild?

Thank you very much, I will check, but the problem is that the machines aren't really powerful - only 2GB of RAM and 2 cores each at 3.3GHz. Do you maybe know the approximate resources required for the AppMaster? And can we check the current resource consumption of the AppMaster?

On 17 November 2015 at 18:40, Bikas Saha wrote:
No. In general, App Masters and their containers can be launched on any machine, and both can be launched on the same machine. If your case happens repeatedly, then you could check the RM UI, while the job is running, to see the maximum resource on a node manager and the resource currently assigned. Perhaps your node managers don't have enough resources to run multiple containers?

From: darekg11 [mailto:darek...@tlen.pl]
Sent: Tuesday, November 17, 2015 7:55 AM
To: user@hadoop.apache.org
Subject: Does MapReduceApplicationMaster prevent the data node from spawning YarnChild?

Hello again, dear users. Today I ran into the following problem: my mini cluster consists of 1 NameNode and 2 slave nodes. I ran my MapReduce program, written in Java, with the number of reducers set to two. As a result, the first slave node got the MRAppMaster task and only the second slave launched a YarnChild, which actually produced the output results. I understand that MRAppMaster is the essential process responsible for managing the life of a given job. Is that why a single slave node can't launch MRAppMaster and YarnChild at the same time, or am I misunderstanding something?
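The arithmetic behind this thread: on a 2GB machine, once the node manager's advertised capacity is sized to fit the host, a 1.5GB default AM container can fill the node by itself, leaving no room for a task container. A hedged sketch of making both fit (values are illustrative, not tuned recommendations):

```xml
<!-- yarn-site.xml: advertise what a 2GB node can realistically offer -->
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>1536</value>
</property>

<!-- mapred-site.xml: shrink the AM so a task container fits beside it -->
<property>
  <name>yarn.app.mapreduce.am.resource.mb</name>
  <value>512</value>
</property>
<property>
  <!-- AM heap kept below its container size -->
  <name>yarn.app.mapreduce.am.command-opts</name>
  <value>-Xmx400m</value>
</property>
```

The AM's actual resource usage is visible in the RM web UI under the running application's attempt page, alongside each node manager's used/total capacity.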
Data spilling on disk from MR jobs
Hi,

I'm running my Pig script over YARN (MR2). I was going through some tuning parameters and found that the value of the parameter "mapreduce.task.io.sort.mb" should be tuned properly. By default it is configured to 256 MB in my Cloudera setup. I would like to know how I can find out whether my MR jobs are spilling data to disk or not. Are there any logs which can help me find how much data was spilled to disk? Is there any parameter which can be configured to enable such logging?

CDH: CDH-5.4.4-1.cdh5.4.4.p0.4
Hadoop: 2.6.0-cdh5.4.4

Let me know in case more information is required.

Regards,
Sandeep
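Spilling shows up in the standard per-job counters rather than a dedicated log: compare "Spilled Records" against "Map output records" - a ratio above 1 means records were written to local disk more than once and a larger sort buffer may help. A sketch of pulling this from the CLI (the job ID is a placeholder; the same counters appear in the job history web UI):

```shell
# Full status and counters for a job (job ID is a placeholder)
mapred job -status job_1447872797537_0001

# Or query the spill counter directly from the TaskCounter group
mapred job -counter job_1447872797537_0001 \
    org.apache.hadoop.mapreduce.TaskCounter SPILLED_RECORDS
```

The bytes actually written to local disk during spills are reflected in the FILE_BYTES_WRITTEN counter of the FileSystemCounter group.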
Re: failed to start namenode
At least share some excerpts from the name node log file.

On Wed, Nov 18, 2015 at 5:46 PM, siva kumar wrote:
> Hi Folks,
> I'm trying to install a fresh hadoop cluster. But then, the namenode is not starting up, because of which the hdfs service is not started during my first run. Can anyone help me out?
> I'm trying this using parcels (CDH-5).
>
> Any help?
failed to start namenode
Hi Folks,

I'm trying to install a fresh hadoop cluster. But then, the namenode is not starting up, because of which the hdfs service is not started during my first run. Can anyone help me out? I'm trying this using parcels (CDH-5).

Any help?