RE: How to set Rack Id of DataNodes?

2013-04-15 Thread Vijay Thakorlal
Hi Mohammad,

 

Yes, that's correct: your rack awareness script takes the IP address of a
node and returns the rack name/ID.

You then just have to ensure the script is executable and referenced (using
an absolute path) in the topology.script.file.name parameter in
core-site.xml.
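
For example, a minimal topology script might look like the sketch below. The
subnet-to-rack mappings are purely illustrative (your subnets and rack names
will differ), and Hadoop may pass several addresses in one call, so the script
loops over all of its arguments and prints one rack path per argument:

#!/bin/bash
# Topology script sketch: print one rack path per argument (IP or hostname).
# The subnet-to-rack mappings below are examples only - adjust to your network.
for node in "$@"; do
  case "$node" in
    10.1.1.*) echo -n "/rack1 " ;;
    10.1.2.*) echo -n "/rack2 " ;;
    *)        echo -n "/default-rack " ;;
  esac
done
echo

Remember to chmod +x the script before pointing topology.script.file.name at it.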

 

Regards,

Vijay

 

From: Mohammad Mustaqeem [mailto:3m.mustaq...@gmail.com] 
Sent: 15 April 2013 13:05
To: user@hadoop.apache.org
Subject: How to set Rack Id of DataNodes?

 

Hello everyone,

I want to set the rack ID of each DataNode.

I have read somewhere that we have to write a script that returns the rack ID
of nodes.

I want to confirm that the input of that script will be the IP address of a
DataNode and the output will be the rack ID.

Is that right?


 

-- 

With regards ---

Mohammad Mustaqeem,

M.Tech (CSE)

MNNIT Allahabad

9026604270

 



RE: NameNode failure and recovery!

2013-04-03 Thread Vijay Thakorlal
Hi Rahul,

 

The SNN does not act as a backup / standby NameNode in the event of failure. 

 

The sole purpose of the Secondary NameNode (or, as it’s more correctly known, 
the Checkpoint Node) is to checkpoint the current state of HDFS:

 

1. The SNN asks the NN to roll its edits file
2. The SNN retrieves the fsimage and edits files from the NN
3. The SNN loads the fsimage into memory
4. The SNN replays the edits log to merge the two
5. The SNN transfers the merged checkpoint back to the NN
6. The NN uses the checkpoint as the new fsimage file

 

It’s true that technically you could use the fsimage from the SNN if you 
completely lost the NN – and yes, as you said, you would “lose” any changes to 
HDFS that occurred between the last checkpoint and the NN dying. But as 
mentioned, the SNN is not a backup for the NN.
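
(As an aside, if you ever want to check that checkpointing is working, you can 
trigger one by hand from the SNN host. This is from memory for Hadoop 1.x, so 
treat it as a sketch and double-check the syntax on your version:

# run on the SNN host; forces an immediate checkpoint cycle
hadoop secondarynamenode -checkpoint force

Afterwards the fsimage under fs.checkpoint.dir on the SNN should have a fresh 
timestamp.)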

 

Regards,

Vijay

 

From: Rahul Bhattacharjee [mailto:rahul.rec@gmail.com] 
Sent: 03 April 2013 15:40
To: user@hadoop.apache.org
Subject: NameNode failure and recovery!

 

Hi all,

I was reading about Hadoop and got to know that there are two ways to protect 
against NameNode failures.

1) Write to an NFS mount along with the usual local disk (a config sketch of 
this option follows my questions below).

 -or-

2) Use the secondary name node. In case of failure of the NN, the SNN can take 
charge.

My questions :-

1) The SNN is always lagging, so when the SNN becomes primary in the event of an 
NN failure, the edits which have not been merged into the image file would be 
lost, so the state after failover would not be consistent with the NN before its 
failure.

2) Also, I have read that the other purpose of the SNN is to periodically merge 
the edit logs with the image file. If a setup goes with option #1 (writing to 
NFS, no SNN), then who does this merging?
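
For reference, option #1 is usually just a matter of listing the NFS directory 
alongside the local one in dfs.name.dir (Hadoop 1.x property name; the paths 
below are made up):

<property>
  <name>dfs.name.dir</name>
  <!-- local disk plus an NFS mount; the NameNode writes its metadata to both -->
  <value>/data/1/dfs/nn,/mnt/nfs/dfs/nn</value>
</property>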

 

Thanks,
Rahul

 



RE: In Compatible clusterIDs

2013-02-20 Thread Vijay Thakorlal
Hi Nagarjuna,

 

What's in your /etc/hosts file? I think the line in the logs where it says
DatanodeRegistration(0.0.0.0, [..]) should show the hostname or IP of the
datanode (124.123.215.187, since you said it's a pseudo-distributed setup)
and not 0.0.0.0.
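
For instance, one fairly common cause of the 0.0.0.0 registration is the
machine's hostname resolving to the loopback address. Assuming the hostname is
nagarjuna (taken from your logs), /etc/hosts would normally need something like:

127.0.0.1        localhost
124.123.215.187  nagarjuna

rather than having nagarjuna listed on the 127.0.0.1 line.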

 

By the way, are you using the dfs.hosts parameter to specify which datanodes
can connect to the namenode?

 

Vijay

 

From: nagarjuna kanamarlapudi [mailto:nagarjuna.kanamarlap...@gmail.com] 
Sent: 20 February 2013 15:52
To: user@hadoop.apache.org
Subject: Re: In Compatible clusterIDs

 

 

Hi Jean Marc,

 

Yes, this is the cluster I am trying to create and will then scale up.

 

As per your suggestion I deleted the folder
/Users/nagarjunak/Documents/hadoop-install/hadoop-2.0.3-alpha/tmp_20 and
formatted the cluster.

 

 

Now I get the following error.

 

 

2013-02-20 21:17:25,668 FATAL org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization failed for block pool Block pool BP-811644675-124.123.215.187-1361375214801 (storage id DS-1515823288-124.123.215.187-50010-1361375245435) service to nagarjuna/124.123.215.187:9000
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.protocol.DisallowedDatanodeException): Datanode denied communication with namenode: DatanodeRegistration(0.0.0.0, storageID=DS-1515823288-124.123.215.187-50010-1361375245435, infoPort=50075, ipcPort=50020, storageInfo=lv=-40;cid=CID-723b02a7-3441-41b5-8045-2a45a9cf96b0;nsid=1805451571;c=0)
        at org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager.registerDatanode(DatanodeManager.java:629)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.registerDatanode(FSNamesystem.java:3459)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.registerDatanode(NameNodeRpcServer.java:881)
        at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.registerDatanode(DatanodeProtocolServerSideTranslatorPB.java:90)
        at org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$DatanodeProtocolService$2.callBlockingMethod(DatanodeProtocolProtos.java:18295)
        at org.apache.hadoop.ipc.Protob

 

 

On Wed, Feb 20, 2013 at 9:10 PM, Jean-Marc Spaggiari
jean-m...@spaggiari.org wrote:

Hi Nagarjuna,

Is it a test cluster? Do you have another cluster running close-by?
Also, is it your first try?

It seems there is some previous data in the dfs directory which is not
in sync with the last installation.

Maybe you can remove the content of
/Users/nagarjunak/Documents/hadoop-install/hadoop-2.0.3-alpha/tmp_20
if it's not useful for you, then reformat your node and restart it?
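
Roughly along these lines (destructive, so only do it if the data really isn't
needed; the daemon script paths assume a default 2.0.x tarball layout):

# WARNING: wipes everything HDFS has stored under this directory
rm -rf /Users/nagarjunak/Documents/hadoop-install/hadoop-2.0.3-alpha/tmp_20/*
hdfs namenode -format
# then restart the daemons
sbin/hadoop-daemon.sh start namenode
sbin/hadoop-daemon.sh start datanode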

JM

2013/2/20, nagarjuna kanamarlapudi nagarjuna.kanamarlap...@gmail.com:

 Hi,

 I am trying to set up a single-node cluster of Hadoop 2.0.*.

 When trying to start the datanode I got the following error. Could anyone help
 me out?

 Block pool BP-1894309265-124.123.215.187-1361374377471 (storage id DS-1175433225-124.123.215.187-50010-1361374235895) service to nagarjuna/124.123.215.187:9000
 java.io.IOException: Incompatible clusterIDs in /Users/nagarjunak/Documents/hadoop-install/hadoop-2.0.3-alpha/tmp_20/dfs/data: namenode clusterID = CID-800b7eb1-7a83-4649-86b7-617913e82ad8; datanode clusterID = CID-1740b490-8413-451c-926f-2f0676b217ec
         at org.apache.hadoop.hdfs.server.datanode.DataStorage.doTransition(DataStorage.java:391)
         at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:191)
         at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:219)
         at org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:850)
         at org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:821)
         at org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:280)
         at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:222)
         at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:664)
         at java.lang.Thread.run(Thread.java:680)
 2013-02-20 21:03:39,856 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Ending block pool service for: Block pool BP-1894309265-124.123.215.187-1361374377471 (storage id DS-1175433225-124.123.215.187-50010-1361374235895) service to nagarjuna/124.123.215.187:9000
 2013-02-20 21:03:39,958 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Removed Block pool BP-1894309265-124.123.215.187-1361374377471 (storage id DS-1175433225-124.123.215.187-50010-1361374235895)
 2013-02-20 21:03:41,959 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Exiting Datanode
 2013-02-20 21:03:41,961 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 0
 2013-02-20 21:03:41,963 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: SHUTDOWN_MSG:
 /
 

RE: getimage failed in Name Node Log

2013-02-15 Thread Vijay Thakorlal
Hi Janesh,

 

I think your SNN may be starting up with the wrong IP; surely the machine
parameter should say 192.168.0.101 rather than 0.0.0.0?

 

http://namenode:50070/getimage?putimage=1&port=50090&machine=0.0.0.0&token=-32:1989419481:0:136084943:1360849122845

 

Are you able to retrieve the fsimage from the NN, running the command on the
SNN host, using curl or wget?

 

wget  'http://192.168.0.105:50070/getimage?getimage=1' -O fsimage.dmp

 

Even if this only returns an error page, it at least shows that the NN is
reachable from the SNN and that the port is open. Otherwise, double-check that
this is not due to the OS firewall blocking the connection (assuming it is on).
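
For example, a couple of quick checks run from the SNN host (192.168.0.101):

# can we reach the NN image/web port at all?
telnet 192.168.0.105 50070
# or, if telnet isn't installed:
curl -I http://192.168.0.105:50070/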

 

That said, the PriviledgedActionException in the logs is just a wrapper around 
the underlying java.net.ConnectException (connection refused), so the root cause 
is most likely connectivity rather than permissions.

 

Vijay

 

From: janesh mishra [mailto:janeshmis...@gmail.com] 
Sent: 15 February 2013 12:27
To: user@hadoop.apache.org
Subject: getimage failed in Name Node Log

 

Hi, 

I am new to Hadoop and I set up the Hadoop cluster with the help of Michael
Noll's multi-node setup guide
(http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-multi-node-cluster/).
When I set up a single-node Hadoop cluster, everything worked fine. 

But in the multi-node setup I found that my fsimage and edit log files are not
updated on the SNN; the roll of the edits file is done and I have edits.new on
the NN. 

Logs from NN: 

2013-02-14 19:13:52,468 ERROR
org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException
as:hduser cause:java.net.ConnectException: Connection refused 

2013-02-14 19:13:52,468 ERROR
org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException
as:hduser cause:java.net.ConnectException: Connection refused 

2013-02-14 19:13:52,477 WARN org.mortbay.log: /getimage:
java.io.IOException: GetImage failed. java.net.ConnectException: Connection
refused 

Logs From SNN: 

-- 

2013-02-14 19:13:52,350 INFO org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: Posted URL namenode:50070putimage=1&port=50090&machine=0.0.0.0&token=-32:1989419481:0:136084943:1360849122845 

2013-02-14 19:13:52,374 ERROR
org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: Exception in
doCheckpoint:

2013-02-14 19:13:52,375 ERROR org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: java.io.FileNotFoundException: http://namenode:50070/getimage?putimage=1&port=50090&machine=0.0.0.0&token=-32:1989419481:0:136084943:1360849122845 

        at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1613) 
        at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.getFileClient(TransferFsImage.java:160) 
        at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.putFSImage(SecondaryNameNode.java:377) 
        at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doCheckpoint(SecondaryNameNode.java:418) 
        at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doWork(SecondaryNameNode.java:312) 
        at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.run(SecondaryNameNode.java:275) 
        at java.lang.Thread.run(Thread.java:722) 

My setup includes 

Version : hadoop-1.0.4 

   1. Name Node (192.168.0.105) 

   2. Secondary Name Node (192.168.0.101) 

   3. Data Node (192.168.0.100) 

Name Node also works as Data Node. 

Conf File For Name Node: 

core-hdfs.xml: 

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>

  <property>
    <name>hadoop.tmp.dir</name>
    <value>/app/hadoop/tmp</value>
    <description>A base for other temporary directories.</description>
  </property>

  <property>
    <name>fs.default.name</name>
    <value>hdfs://namenode:54310</value>
    <description>The name of the default file system. A URI whose
    scheme and authority determine the FileSystem implementation. The
    uri's scheme determines the config property (fs.SCHEME.impl) naming
    the FileSystem implementation class. The uri's authority is used to
    determine the host, port, etc. for a filesystem.</description>
  </property>

  <property>
    <name>fs.checkpoint.period</name>
    <value>300</value>
    <description>The number of seconds between two periodic checkpoints.
    </description>
  </property>

</configuration>

hdfs-site.xml: 

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>

  <property>
    <name>dfs.replication</name>
    <value>2</value>
    <description>Default block replication.
    The actual number of replications can be specified when the file is created.
    The default is used if replication is not specified in create time.
    </description>
  </property>

  <property>
    <name>dfs.hosts</name>
    <value>/usr/local/hadoop/includehosts</value>
    <description>ips that works as datanode</description>
  </property>

  <property>


RE: Error for Pseudo-distributed Mode

2013-02-12 Thread Vijay Thakorlal
Hi,

 

Could you first try running the example:

$ /usr/bin/hadoop jar /usr/lib/hadoop-0.20-mapreduce/hadoop-examples.jar
grep input output 'dfs[a-z.]+'

 

Do you receive the same error?

 

Not sure if it's related to a lack of RAM, but the stack trace shows a network
timeout (I realise that you're running in pseudo-distributed mode):

 

Caused by: com.google.protobuf.ServiceException:
java.net.SocketTimeoutException: Call From localhost.localdomain/127.0.0.1
to localhost.localdomain:54113 failed on socket timeout exception:
java.net.SocketTimeoutException: 6 millis timeout while waiting for
channel to be ready for read. ch : java.nio.channels.SocketChannel[connected
local=/127.0.0.1:60976 remote=localhost.localdomain/127.0.0.1:54113]; For
more details see:  http://wiki.apache.org/hadoop/SocketTimeout


 

Your best bet is probably to start by checking the items mentioned in the wiki
page linked above. While the default firewall rules (on CentOS) usually allow
pretty much all traffic on the lo interface, it might be worth temporarily
turning off iptables (assuming it is on).
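
For example (CentOS/RHEL-style service commands; remember to re-enable the
firewall afterwards):

# is iptables running, and what are the current rules?
sudo service iptables status
# temporarily stop it while re-running the job
sudo service iptables stop
# ...then restore it once you have finished testing
sudo service iptables start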

 

Vijay

 

 

 

From: yeyu1899 [mailto:yeyu1...@163.com] 
Sent: 12 February 2013 12:58
To: user@hadoop.apache.org
Subject: Error for Pseudo-distributed Mode

 

Hi all,

I installed redhat_enterprise-linux-x86 in VMware Workstation and gave the
virtual machine 1 GB of memory. 

 

Then I followed the steps in Installing CDH4 on a Single Linux Node in
Pseudo-distributed Mode:
https://ccp.cloudera.com/display/CDH4DOC/Installing+CDH4+on+a+Single+Linux+Node+in+Pseudo-distributed+Mode

 

At the last step, I ran an example Hadoop job with the command $ hadoop jar
/usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar grep input output23
'dfs[a-z.]+'

 

Then the screen showed the output below, including
AttemptID:attempt_1360528029309_0001_r_00_0 Timed out after 600 secs, and I
wonder whether that is because my virtual machine has too little memory?

 

[hadoop@localhost hadoop-mapreduce]$ hadoop jar
/usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar grep input output23
'dfs[a-z]+'


13/02/11 04:30:44 WARN mapreduce.JobSubmitter: No job jar file set.  User
classes may not be found. See Job or Job#setJar(String).


13/02/11 04:30:44 INFO input.FileInputFormat: Total input paths to process :
4   

13/02/11 04:30:45 INFO mapreduce.JobSubmitter: number of splits:4


13/02/11 04:30:45 WARN conf.Configuration: mapred.output.value.class is
deprecated. Instead, use mapreduce.job.output.value.class


13/02/11 04:30:45 WARN conf.Configuration: mapreduce.combine.class is
deprecated. Instead, use mapreduce.job.combine.class


13/02/11 04:30:45 WARN conf.Configuration: mapreduce.map.class is
deprecated. Instead, use mapreduce.job.map.class


13/02/11 04:30:45 WARN conf.Configuration: mapred.job.name is deprecated.
Instead, use mapreduce.job.name

13/02/11 04:30:45 WARN conf.Configuration: mapreduce.reduce.class is
deprecated. Instead, use mapreduce.job.reduce.class


13/02/11 04:30:45 WARN conf.Configuration: mapred.input.dir is deprecated.
Instead, use mapreduce.input.fileinputformat.inputdir


13/02/11 04:30:45 WARN conf.Configuration: mapred.output.dir is deprecated.
Instead, use mapreduce.output.fileoutputformat.outputdir


13/02/11 04:30:45 WARN conf.Configuration: mapreduce.outputformat.class is
deprecated. Instead, use mapreduce.job.outputformat.class


13/02/11 04:30:45 WARN conf.Configuration: mapred.map.tasks is deprecated.
Instead, use mapreduce.job.maps   

13/02/11 04:30:45 WARN conf.Configuration: mapred.output.key.class is
deprecated. Instead, use mapreduce.job.output.key.class


13/02/11 04:30:45 WARN conf.Configuration: mapred.working.dir is deprecated.
Instead, use mapreduce.job.working.dir


13/02/11 04:30:46 INFO mapred.YARNRunner: Job jar is not present. Not adding
any jar to the list of resources.   

13/02/11 04:30:46 INFO mapred.ResourceMgrDelegate: Submitted application
application_1360528029309_0001 to ResourceManager at /0.0.0.0:8032


13/02/11 04:30:46 INFO mapreduce.Job: The url to track the job:
http://localhost.localdomain:8088/proxy/application_1360528029309_0001/


13/02/11 04:30:46 INFO mapreduce.Job: Running job: job_1360528029309_0001


13/02/11 04:31:01 INFO mapreduce.Job: Job job_1360528029309_0001 running in
uber mode : false

13/02/11 04:31:01 INFO mapreduce.Job:  map 0% reduce 0%


13/02/11 04:47:22 INFO mapreduce.Job: Task Id :
attempt_1360528029309_0001_r_00_0, Status : FAILED   

AttemptID:attempt_1360528029309_0001_r_00_0 Timed out after 600 secs


cleanup failed for container container_1360528029309_0001_01_06 : java.lang.reflect.UndeclaredThrowableException
        at org.apache.hadoop.yarn.exceptions.impl.pb.YarnRemoteExceptionPBImpl.unwrapAndThrowException(YarnRemoteExceptionPBImpl.java:135)
        at org.apache.hadoop.yarn.api.impl.pb.client.ContainerManagerPBClientImpl.stopC

RE: Reg Too many fetch-failures Error

2013-02-01 Thread Vijay Thakorlal
Hi Manoj,

 

As you may be aware, this means the reducers are unable to fetch intermediate
data from the TaskTrackers that ran the map tasks. You can try the following (a
config sketch follows the list):

* increasing tasktracker.http.threads, so there are more threads to handle
fetch requests from reducers

* decreasing mapred.reduce.parallel.copies, so fewer copies / fetches are
performed in parallel
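
A rough mapred-site.xml sketch (MR1 property names; the values are only
starting points to experiment with, not recommendations):

<property>
  <name>tasktracker.http.threads</name>
  <!-- more threads serving map output on each TaskTracker; the default is 40 -->
  <value>80</value>
</property>

<property>
  <name>mapred.reduce.parallel.copies</name>
  <!-- fewer parallel fetches per reducer; the default is 5 -->
  <value>3</value>
</property>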

 

It could also be due to a temporary DNS issue.

 

See slide 26 of this presentation for potential causes for this message:
http://www.slideshare.net/cloudera/hadoop-troubleshooting-101-kate-ting-cloudera

 

Not sure why you did not hit the problem before, but was it the same data or
different data? Did you have other jobs running on your cluster at the time?

 

Hope that helps

 

Regards

Vijay

 

From: Manoj Babu [mailto:manoj...@gmail.com] 
Sent: 01 February 2013 15:09
To: user@hadoop.apache.org
Subject: Reg Too many fetch-failures Error

 

Hi All,

 

I am getting a Too many fetch-failures exception.

What might be the reason for this exception? For the same size of data I did
not face this error earlier, though there is a change in the code.

How can I avoid this?

 

Thanks in advance.

 

Cheers!

Manoj.



RE: Maximum Storage size in a Single datanode

2013-01-30 Thread Vijay Thakorlal
Hi Jeba,

 

There are other considerations too: for example, if a single node held 1 PB
of data and it were to die, this would cause a significant amount of
re-replication traffic as the NameNode arranges for new replicas to be created.

 

Vijay

 

From: Bertrand Dechoux [mailto:decho...@gmail.com] 
Sent: 30 January 2013 09:14
To: user@hadoop.apache.org; jeba earnest
Subject: Re: Maximum Storage size in a Single datanode

 

I would say the hard limit is set by the OS local file system (and your
budget).

So the short answer for ext3: it doesn't seem so.
http://en.wikipedia.org/wiki/Ext3

And I am not sure that is the most interesting question. Even if you could put
1 petabyte on one node, what is usually interesting is the ratio of storage to
compute.

Bertrand

On Wed, Jan 30, 2013 at 9:08 AM, jeba earnest jebaearn...@yahoo.com wrote:

 

Hi,



Is it possible to keep 1 petabyte on a single data node?

If not, what is the maximum storage for a particular data node?

 

Regards,
M. Jeba