RE: How to set Rack Id of DataNodes?

2013-04-15 Thread Vijay Thakorlal
Hi Mohammad,

 

Yes, that's correct: your "rack awareness" script takes the IP address of a
node and returns the rack name/ID.

You then just have to ensure the script is executable and referenced (using
an absolute path) via the topology.script.file.name parameter in
core-site.xml.
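
For example, a minimal topology script might look something like the sketch
below (the subnets and rack names are just placeholders; adjust them for your
own network). Hadoop invokes the script with one or more IP addresses /
hostnames as arguments and expects one rack id back per argument.

#!/bin/bash
# topology.sh - rough example only; subnet-to-rack mappings are made up.
# Make it executable (chmod +x) and point topology.script.file.name in
# core-site.xml at its absolute path.
while [ $# -gt 0 ]; do
  node="$1"
  shift
  case "$node" in
    192.168.1.*) echo -n "/rack1 " ;;          # nodes in subnet 192.168.1.x
    192.168.2.*) echo -n "/rack2 " ;;          # nodes in subnet 192.168.2.x
    *)           echo -n "/default-rack " ;;   # anything we don't recognise
  esac
done
echo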

 

Regards,

Vijay

 

From: Mohammad Mustaqeem [mailto:3m.mustaq...@gmail.com] 
Sent: 15 April 2013 13:05
To: user@hadoop.apache.org
Subject: How to set Rack Id of DataNodes?

 

Hello everyone,

I want to set the rack ID of each DataNode.

I have read somewhere that we have to write a script that gives the rack ID of
nodes.

I want to confirm that the input of that script will be the IP address of a
DataNode and the output will be the rack ID.

Is that correct?


 

-- 

With regards ---

Mohammad Mustaqeem,

M.Tech (CSE)

MNNIT Allahabad

9026604270

 



RE: NameNode failure and recovery!

2013-04-03 Thread Vijay Thakorlal
Hi Rahul,

 

The SNN does not act as a backup / standby NameNode in the event of failure. 

 

The sole purpose of the Secondary NameNode (or, as it is more correctly known, 
the Checkpoint Node) is to checkpoint the current state of HDFS:

 

1. The NN rolls the edits file (new edits go to a fresh file)

2. The SNN retrieves the fsimage and edits files from the NN

3. The SNN loads the fsimage into memory

4. The SNN replays the edits file to merge the two

5. The SNN transfers the merged checkpoint back to the NN

6. The NN uses the checkpoint as the new fsimage file

 

It’s true that technically you could use the fsimage from the SNN if completely 
lost the NN – and yes as you said you would “lose” any changes to HDFS that 
occurred between the NN dieing and the last time the checkpoint occurred. But 
as mentioned the SNN is not a backup for the NN.
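
If you ever did have to fall back on the SNN's copy, a rough sketch of the 
recovery (assuming Hadoop 1.x, a fresh NameNode host with an empty dfs.name.dir, 
and fs.checkpoint.dir pointing at a copy of the SNN's checkpoint directory) 
would be:

# copy the checkpoint directory from the SNN host onto the new NN host first,
# into the directory configured as fs.checkpoint.dir, then import it:
hadoop namenode -importCheckpoint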

 

Regards,

Vijay

 

From: Rahul Bhattacharjee [mailto:rahul.rec@gmail.com] 
Sent: 03 April 2013 15:40
To: user@hadoop.apache.org
Subject: NameNode failure and recovery!

 

Hi all,

I was reading about Hadoop and learned that there are two ways to protect 
against NameNode failures.

1) Write the metadata to an NFS mount in addition to the usual local disk.

 -or-

2) Use the Secondary NameNode. In case of NN failure, the SNN can take 
charge. 

My questions :-

1) The SNN always lags behind, so if the SNN became primary after an NN failure, 
the edits that have not yet been merged into the image file would be lost, and 
the SNN would not be consistent with the NN as it was before the failure.

2) I have also read that the other purpose of the SNN is to periodically merge 
the edit logs with the image file. If a setup goes with option #1 (writing to 
NFS, no SNN), then who does this merging?

 

Thanks,
Rahul

 



RE: In Compatible clusterIDs

2013-02-20 Thread Vijay Thakorlal
Hi Nagarjuna,

 

What's in your /etc/hosts file? I think the line in the logs that says
"DatanodeRegistration(0.0.0.0 [..])" should show the hostname or IP of the
datanode (124.123.215.187, since you said it's a pseudo-distributed setup)
and not 0.0.0.0.
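
As a rough illustration (the hostname and IP below are taken from your log;
adjust to your actual setup), I'd expect /etc/hosts to map the machine's own
hostname to its real IP rather than to 127.0.0.1, something like:

127.0.0.1         localhost localhost.localdomain
124.123.215.187   nagarjuna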

 

By the way are you using the dfs.hosts parameter for specifying the
datanodes that can connect to the namenode?

 

Vijay

 

From: nagarjuna kanamarlapudi [mailto:nagarjuna.kanamarlap...@gmail.com] 
Sent: 20 February 2013 15:52
To: user@hadoop.apache.org
Subject: Re: In Compatible clusterIDs

 

 

Hi Jean Marc,

 

Yes, this is the cluster I am trying  to create and then will scale up.

 

As per your suggestion I deleted the folder
/Users/nagarjunak/Documents/hadoop-install/hadoop-2.0.3-alpha/tmp_20 and
formatted the cluster.

 

 

Now I get the following error.

 

 

2013-02-20 21:17:25,668 FATAL org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization failed for block pool Block pool BP-811644675-124.123.215.187-1361375214801 (storage id DS-1515823288-124.123.215.187-50010-1361375245435) service to nagarjuna/124.123.215.187:9000
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.protocol.DisallowedDatanodeException): Datanode denied communication with namenode: DatanodeRegistration(0.0.0.0, storageID=DS-1515823288-124.123.215.187-50010-1361375245435, infoPort=50075, ipcPort=50020, storageInfo=lv=-40;cid=CID-723b02a7-3441-41b5-8045-2a45a9cf96b0;nsid=1805451571;c=0)
at org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager.registerDatanode(DatanodeManager.java:629)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.registerDatanode(FSNamesystem.java:3459)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.registerDatanode(NameNodeRpcServer.java:881)
at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.registerDatanode(DatanodeProtocolServerSideTranslatorPB.java:90)
at org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$DatanodeProtocolService$2.callBlockingMethod(DatanodeProtocolProtos.java:18295)
at org.apache.hadoop.ipc.Protob

 

 

On Wed, Feb 20, 2013 at 9:10 PM, Jean-Marc Spaggiari wrote:

Hi Nagarjuna,

Is it a test cluster? Do you have another cluster running close-by?
Also, is it your first try?

It seems there is some previous data in the dfs directory which is not
in sync with the last installation.

Maybe you can remove the content of
/Users/nagarjunak/Documents/hadoop-install/hadoop-2.0.3-alpha/tmp_20
if it's not useful for you, reformat your node and restart it?
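
A rough sketch of those steps (assuming the hadoop-2.0.3-alpha layout, run from
the install directory; the path below is the one from your log):

# stop the datanode if it is still running, then clear the old storage
rm -rf /Users/nagarjunak/Documents/hadoop-install/hadoop-2.0.3-alpha/tmp_20/*
# reformat the namenode and bring the daemons back up
bin/hdfs namenode -format
sbin/hadoop-daemon.sh start namenode
sbin/hadoop-daemon.sh start datanode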

JM

2013/2/20, nagarjuna kanamarlapudi :

> Hi,
>
> I am trying to set up a single-node cluster of hadoop 2.0.*
>
> When trying to start the datanode I got the following error. Could anyone help
> me out?
>
> Block pool BP-1894309265-124.123.215.187-1361374377471 (storage id
> DS-1175433225-124.123.215.187-50010-1361374235895) service to nagarjuna/
> 124.123.215.187:9000
> java.io.IOException: Incompatible clusterIDs in
> /Users/nagarjunak/Documents/hadoop-install/hadoop-2.0.3-alpha/tmp_20/dfs/data:
> namenode clusterID = CID-800b7eb1-7a83-4649-86b7-617913e82ad8; datanode
> clusterID = CID-1740b490-8413-451c-926f-2f0676b217ec
> at org.apache.hadoop.hdfs.server.datanode.DataStorage.doTransition(DataStorage.java:391)
> at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:191)
> at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:219)
> at org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:850)
> at org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:821)
> at org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:280)
> at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:222)
> at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:664)
> at java.lang.Thread.run(Thread.java:680)
> 2013-02-20 21:03:39,856 WARN
> org.apache.hadoop.hdfs.server.datanode.DataNode: Ending block pool service
> for: Block pool BP-1894309265-124.123.215.187-1361374377471 (storage id
> DS-1175433225-124.123.215.187-50010-1361374235895) service to nagarjuna/
> 124.123.215.187:9000
> 2013-02-20 21:03:39,958 INFO
> org.apache.hadoop.hdfs.server.datanode.DataNode: Removed Block pool
> BP-1894309265-124.123.215.187-1361374377471 (storage id
> DS-1175433225-124.123.215.187-50010-1361374235895)
> 2013-02-20 21:03:41,959 WARN
> org.apache.hadoop.hdfs.server.datanode.DataNode: Exiting Datanode
> 2013-02-20 21:03:41,961 INFO org.apache.hadoop.util.ExitUtil: Exiting with
> status 0
> 2013-02-20 21:03:41,963 INFO
> org.apache.hadoop.hdfs.server.datanode.DataNode: SHUTDOWN_MSG:
> /
> SHUTDO

RE: Namenode formatting problem

2013-02-19 Thread Vijay Thakorlal
Hi Keith,

When you run the format command on the namenode machine it actually starts
the namenode, formats it, then shuts it down (see:
http://hadoop.apache.org/docs/stable/commands_manual.html). Before you run
the format command, do you see any processes already listening on port 9212
via netstat -anlp | grep 9212 on the namenode? 

As per the recommendations on the link in the error message
(http://wiki.apache.org/hadoop/BindException), you could try changing the
port used by the namenode.

I'm not familiar with deploying Hadoop on EC2 so I'm not sure if this is
different for EC2 deployments; however, the namenode usually listens on port
8020 for file system metadata operations, so I guess you specified a
different port in the fs.default.name parameter (in core-site.xml)?
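
For reference, a minimal sketch of that setting in core-site.xml (the host is
just the MASTER_HOST placeholder from your /etc/hosts example, and on Hadoop 2
the preferred property name is fs.defaultFS):

<property>
  <name>fs.default.name</name>
  <value>hdfs://MASTER_HOST:9000</value>
</property>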

Vijay

-Original Message-
From: Keith Wiley [mailto:kwi...@keithwiley.com] 
Sent: 19 February 2013 15:10
To: user@hadoop.apache.org
Subject: Re: Namenode formatting problem

Hmmm, okay.  Thanks.  Umm, is this a Yarn thing because I also tried it with
Hadoop 2.0 MR1 (which I think should behave almost exactly like older
versions of Hadoop) and it had the exact same problem.  Does H2.0 MR1 use
journal nodes?  I'll try to read up more on this later today.  Thanks for
the tip.

On Feb 18, 2013, at 16:32 , Azuryy Yu wrote:

> Because journal nodes are also formatted during NN format, you need
to start all JN daemons first.
> 
> On Feb 19, 2013 7:01 AM, "Keith Wiley"  wrote:
> This is Hadoop 2.0.  Formatting the namenode produces no errors in the
shell, but the log shows this:
> 
> 2013-02-18 22:19:46,961 FATAL 
> org.apache.hadoop.hdfs.server.namenode.NameNode: Exception in namenode 
> join
> java.net.BindException: Problem binding to [ip-13-0-177-110:9212] java.net.BindException: Cannot assign requested address; For more details see: http://wiki.apache.org/hadoop/BindException
> at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:710)
> at org.apache.hadoop.ipc.Server.bind(Server.java:356)
> at org.apache.hadoop.ipc.Server$Listener.(Server.java:454)
> at org.apache.hadoop.ipc.Server.(Server.java:1833)
> at org.apache.hadoop.ipc.RPC$Server.(RPC.java:866)
> at org.apache.hadoop.ipc.ProtobufRpcEngine$Server.(ProtobufRpcEngine.java:375)
> at org.apache.hadoop.ipc.ProtobufRpcEngine.getServer(ProtobufRpcEngine.java:350)
> at org.apache.hadoop.ipc.RPC.getServer(RPC.java:695)
> at org.apache.hadoop.ipc.RPC.getServer(RPC.java:684)
> at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.(NameNodeRpcServer.java:238)
> at org.apache.hadoop.hdfs.server.namenode.NameNode.createRpcServer(NameNode.java:452)
> at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:434)
> at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:608)
> at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:589)
> at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1140)
> at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1204)
> 2013-02-18 22:19:46,988 INFO org.apache.hadoop.util.ExitUtil: Exiting 
> with status 1
> 2013-02-18 22:19:46,990 INFO
org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG:
> /
> SHUTDOWN_MSG: Shutting down NameNode at ip-13-0-177-11/127.0.0.1 
> /
> 
> No java processes begin (although I wouldn't expect formatting the
namenode to start any processes, only starting the namenode or datanode
should do that), and "hadoop fs -ls /" gives me this:
> 
> ls: Call From [CLIENT_HOST]/127.0.0.1 to [MASTER_HOST]:9000 failed on 
> connection exception: java.net.ConnectException: Connection refused; 
> For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused
> 
> My /etc/hosts looks like this:
> 127.0.0.1   localhost localhost.localdomain CLIENT_HOST
> MASTER_IP MASTER_HOST master
> SLAVE_IP SLAVE_HOST slave01
> 
> This is on EC2.  All of the nodes are in the same security group and the
security group has full inbound access.  I can ssh between all three
machines (client/master/slave) without a password ala authorized_keys.  I
can ping the master node from the client machine (although I don't know how
to ping a specific port, such as the hdfs port (9000)).  Telnet doesn't
behave on EC2 which makes port testing a little difficult.
> 
> Any ideas?



Keith Wiley kwi...@keithwiley.com keithwiley.com
music.keithwiley.com

"What I primarily learned in grad school is how much I *don't* know.
Consequently, I left grad school with a higher ignorance to knowledge ratio
than when I entered."
   --  Keith Wiley
__

RE: getimage failed in Name Node Log

2013-02-15 Thread Vijay Thakorlal
Hi Janesh,

 

I think your SNN may be starting up with the wrong IP; surely the machine
parameter should say 192.168.0.101 rather than 0.0.0.0?

 

http://namenode:50070/getimage?putimage=1&port=50090&machine=0.0.0.0&token=-32:1989419481:0:136084943:1360849122845

 

Are you able to retrieve the fsimage from the NN from the command line on the
SNN, using curl or wget?

 

wget  'http://192.168.0.105:50070/getimage?getimage=1' -O fsimage.dmp

 

If this retrieves anything (even an error page) then the NN is reachable from the
SNN and the port is definitely open. Otherwise, double check that this is not
due to the OS firewall blocking the connection (assuming it is turned on).
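
For example, on the NN you could check with something like this (a rough
sketch; the commands assume an iptables-based firewall, so adjust for your
distribution):

# list the current rules and look for anything affecting port 50070
sudo iptables -L -n
# on Ubuntu with ufw, the equivalent check is: sudo ufw status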

 

That said the PrivilegedActionException in the error may actually mean it's 

 

Vijay

 

From: janesh mishra [mailto:janeshmis...@gmail.com] 
Sent: 15 February 2013 12:27
To: user@hadoop.apache.org
Subject: getimage failed in Name Node Log

 

Hi, 

I am new to Hadoop and I set up the Hadoop cluster with the help of the Michael
Noll multi-node setup tutorial
(http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-multi-node-cluster/).
When I set up single-node Hadoop, everything works fine. 

But in the multi-node setup I found that my fsimage and edits files are not
updated on the SNN; the edits roll has been done and I have edit.new on the NN. 

Logs Form NN: 

2013-02-14 19:13:52,468 ERROR
org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException
as:hduser cause:java.net.ConnectException: Connection refused 

2013-02-14 19:13:52,468 ERROR
org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException
as:hduser cause:java.net.ConnectException: Connection refused 

2013-02-14 19:13:52,477 WARN org.mortbay.log: /getimage:
java.io.IOException: GetImage failed. java.net.ConnectException: Connection
refused 

Logs From SNN: 

-- 

2013-02-14 19:13:52,350 INFO org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: Posted URL namenode:50070putimage=1&port=50090&machine=0.0.0.0&token=32:1989419481:0:136084943:1360849122845 

2013-02-14 19:13:52,374 ERROR
org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: Exception in
doCheckpoint:

2013-02-14 19:13:52,375 ERROR org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: java.io.FileNotFoundException: 
http://namenode:50070/getimage?putimage=1&port=50090&machine=0.0.0.0&token=-32:1989419481:0:136084943:1360849122845 

at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1613) 
at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.getFileClient(TransferFsImage.java:160) 
at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.putFSImage(SecondaryNameNode.java:377) 
at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doCheckpoint(SecondaryNameNode.java:418) 
at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doWork(SecondaryNameNode.java:312) 
at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.run(SecondaryNameNode.java:275) 
at java.lang.Thread.run(Thread.java:722) 

My setup includes 

Version : hadoop-1.0.4 

   1. Name Node (192.168.0.105) 

   2. Secondary Name Node (192.168.0.101) 

   3. Data Node (192.168.0.100) 

Name Node also works as Data Node. 

Conf files for the Name Node: 

core-site.xml 
-------------

<configuration>

  <property>
    <name>hadoop.tmp.dir</name>
    <value>/app/hadoop/tmp</value>
    <description>A base for other temporary directories.</description>
  </property>

  <property>
    <name>fs.default.name</name>
    <value>hdfs://namenode:54310</value>
    <description>The name of the default file system. A URI whose
    scheme and authority determine the FileSystem implementation. The
    uri's scheme determines the config property (fs.SCHEME.impl) naming
    the FileSystem implementation class. The uri's authority is used to
    determine the host, port, etc. for a filesystem.</description>
  </property>

  <property>
    <name>fs.checkpoint.period</name>
    <value>300</value>
    <description>The number of seconds between two periodic checkpoints.</description>
  </property>

</configuration>

hdfs-site.xml 
-------------

<configuration>

  <property>
    <name>dfs.replication</name>
    <value>2</value>
    <description>Default block replication.
    The actual number of replications can be specified when the file is created.
    The default is used if replication is not specified in create time.</description>
  </property>

  <property>
    <name>dfs.hosts</name>
    <value>/usr/local/hadoop/includehosts</value>
    <description>ips that work as datanodes</description>
  </property>

  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>secondarynamenode:50090</value>
    <description>
    The address and the base port on which the dfs NameNode Web UI will listen.
    If the port is 0, the server will start on a free port.
    </description>
  </property>

  <property>
    <name>dfs.http.address</name>
    <value>namenode:50070</value>
    <description>
    The address and the base port on which the dfs NameNode Web UI will listen.
    If the port is 0, the server will start on a free port.
    </description>
  </property>

</configuration>

 

 

 

I sync these files to all my nodes. (I read somewhere in the Cloudera docs that
all nodes should have the same conf files.) 

Please help me out. 

Thanks 

Janesh 



RE: Error for Pseudo-distributed Mode

2013-02-12 Thread Vijay Thakorlal
Hi,

 

Could you first try running the example:

$ /usr/bin/hadoop jar /usr/lib/hadoop-0.20-mapreduce/hadoop-examples.jar
grep input output 'dfs[a-z.]+'

 

Do you receive the same error?

 

Not sure if it's related to a lack of RAM, but the stack trace shows a
"network" timeout error (I realise that you're running in
pseudo-distributed mode):

 

Caused by: com.google.protobuf.ServiceException:
java.net.SocketTimeoutException: Call From localhost.localdomain/127.0.0.1
to localhost.localdomain:54113 failed on socket timeout exception:
java.net.SocketTimeoutException: 6 millis timeout while waiting for
channel to be ready for read. ch : java.nio.channels.SocketChannel[connected
local=/127.0.0.1:60976 remote=localhost.localdomain/127.0.0.1:54113]; For
more details see:  http://wiki.apache.org/hadoop/SocketTimeout


 

Your best bet is probably to start with checking the items mentioned in the
wiki page linked to above. While the default firewall rules (on CentOS)
usually allow pretty much all traffic on the lo interface, it might be worth
temporarily turning off iptables (assuming it is on).
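
For example (a rough sketch for a CentOS/RHEL-style system; remember to start
the firewall again once you've finished testing):

# check whether iptables is currently running
sudo service iptables status
# temporarily stop it while you re-run the job
sudo service iptables stop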

 

Vijay

 

 

 

From: yeyu1899 [mailto:yeyu1...@163.com] 
Sent: 12 February 2013 12:58
To: user@hadoop.apache.org
Subject: Error for Pseudo-distributed Mode

 

Hi all,

I installed redhat_enterprise-linux-x86 in VMware Workstation and gave the
virtual machine 1 GB of memory. 

 

Then I followed steps guided by "Installing CDH4 on a Single Linux Node in
Pseudo-distributed Mode" --
https://ccp.cloudera.com/display/CDH4DOC/Installing+CDH4+on+a+Single+Linux+N
ode+in+Pseudo-distributed+Mode.

 

When at last, I ran an example Hadoop job with the command "$ hadoop jar
/usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar grep input output23
'dfs[a-z.]+'"

 

then the screen showed as follows, 

depending "AttemptID:attempt_1360528029309_0001_r_00_0 Timed out after
600 secs" and I wonder is that because my virtual machine's memory too
little~~??

 

[hadoop@localhost hadoop-mapreduce]$ hadoop jar
/usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar grep input output23
'dfs[a-z]+'


13/02/11 04:30:44 WARN mapreduce.JobSubmitter: No job jar file set.  User
classes may not be found. See Job or Job#setJar(String).


13/02/11 04:30:44 INFO input.FileInputFormat: Total input paths to process :
4   

13/02/11 04:30:45 INFO mapreduce.JobSubmitter: number of splits:4


13/02/11 04:30:45 WARN conf.Configuration: mapred.output.value.class is
deprecated. Instead, use mapreduce.job.output.value.class


13/02/11 04:30:45 WARN conf.Configuration: mapreduce.combine.class is
deprecated. Instead, use mapreduce.job.combine.class


13/02/11 04:30:45 WARN conf.Configuration: mapreduce.map.class is
deprecated. Instead, use mapreduce.job.map.class


13/02/11 04:30:45 WARN conf.Configuration: mapred.job.name is deprecated.
Instead, use mapreduce.job.name

13/02/11 04:30:45 WARN conf.Configuration: mapreduce.reduce.class is
deprecated. Instead, use mapreduce.job.reduce.class


13/02/11 04:30:45 WARN conf.Configuration: mapred.input.dir is deprecated.
Instead, use mapreduce.input.fileinputformat.inputdir


13/02/11 04:30:45 WARN conf.Configuration: mapred.output.dir is deprecated.
Instead, use mapreduce.output.fileoutputformat.outputdir


13/02/11 04:30:45 WARN conf.Configuration: mapreduce.outputformat.class is
deprecated. Instead, use mapreduce.job.outputformat.class


13/02/11 04:30:45 WARN conf.Configuration: mapred.map.tasks is deprecated.
Instead, use mapreduce.job.maps   

13/02/11 04:30:45 WARN conf.Configuration: mapred.output.key.class is
deprecated. Instead, use mapreduce.job.output.key.class


13/02/11 04:30:45 WARN conf.Configuration: mapred.working.dir is deprecated.
Instead, use mapreduce.job.working.dir


13/02/11 04:30:46 INFO mapred.YARNRunner: Job jar is not present. Not adding
any jar to the list of resources.   

13/02/11 04:30:46 INFO mapred.ResourceMgrDelegate: Submitted application
application_1360528029309_0001 to ResourceManager at /0.0.0.0:8032


13/02/11 04:30:46 INFO mapreduce.Job: The url to track the job:
http://localhost.localdomain:8088/proxy/application_1360528029309_0001/


13/02/11 04:30:46 INFO mapreduce.Job: Running job: job_1360528029309_0001


13/02/11 04:31:01 INFO mapreduce.Job: Job job_1360528029309_0001 running in
uber mode : false

13/02/11 04:31:01 INFO mapreduce.Job:  map 0% reduce 0%


13/02/11 04:47:22 INFO mapreduce.Job: Task Id :
attempt_1360528029309_0001_r_00_0, Status : FAILED   

AttemptID:attempt_1360528029309_0001_r_00_0 Timed out after 600 secs


cleanup failed for container container_1360528029309_0001_01_06 :
java.lang.reflect.UndeclaredThrowableException


at org.apache.hadoop.yarn.exceptions.impl.pb.YarnRemoteExceptionPBImpl.unwrapAndThrowException(YarnRemoteExceptionPBImpl.java:135)
at org.apache.hadoop.yarn.api.impl.pb.client.ContainerManagerPBClientImpl.stopConta

RE: Reg Too many fetch-failures Error

2013-02-01 Thread Vijay Thakorlal
Hi Manoj,

 

As you may be aware, this means the reducers are unable to fetch intermediate
data from the TaskTrackers that ran the map tasks. You can try (example values
are sketched below):

* increasing tasktracker.http.threads, so there are more threads to handle
fetch requests from reducers; 

* decreasing mapreduce.reduce.parallel.copies, so fewer copies / fetches are
performed in parallel.
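
For instance, a rough sketch of what that might look like in mapred-site.xml
(the values are only illustrative starting points, and the exact property
names vary slightly between MapReduce versions):

<property>
  <name>tasktracker.http.threads</name>
  <!-- more threads on each TaskTracker to serve map output to reducers -->
  <value>80</value>
</property>

<property>
  <name>mapreduce.reduce.parallel.copies</name>
  <!-- fewer parallel fetches per reducer -->
  <value>5</value>
</property>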

 

It could also be due to a temporary DNS issue.

 

See slide 26 of this presentation for potential causes for this message:
http://www.slideshare.net/cloudera/hadoop-troubleshooting-101-kate-ting-clou
dera

 

Not sure why you did not hit the problem before, but was it the same data
or different data? Did you have other jobs running on your cluster?

 

Hope that helps

 

Regards

Vijay

 

From: Manoj Babu [mailto:manoj...@gmail.com] 
Sent: 01 February 2013 15:09
To: user@hadoop.apache.org
Subject: Reg Too many fetch-failures Error

 

Hi All,

 

I am getting a "Too many fetch-failures" exception.

What might be the reason for this exception? For the same size of data I didn't
face this error earlier, and there is a change in the code.

How can I avoid this?

 

Thanks in advance.

 

Cheers!

Manoj.



RE: Maximum Storage size in a Single datanode

2013-01-30 Thread Vijay Thakorlal
Jeba,

 

I'm not aware of any Hadoop limitations in this respect (others may be able
to comment on this); since blocks are just files on the OS, the datanode
will create subdirectories to store blocks and so avoid problems with large
numbers of files in a single directory. So I would think the limitations are
primarily around the type of file system you select: ext3 theoretically
supports up to 16TB (http://en.wikipedia.org/wiki/Ext3) and ext4 up to 1EB
(http://en.wikipedia.org/wiki/Ext4). Although you're probably already
planning on deploying 64-bit servers, I believe for a large FS on ext4 you'd
be better off with a 64-bit server.

 

As far as the OS is concerned, anecdotally (based on blogs, Hadoop mailing lists,
etc.) I believe there are more production deployments using RHEL and/or
CentOS than Ubuntu. 

 

It's probably not practical to have nodes with 1PB of data for the reasons
that others have mentioned and due to the replication traffic that will be
generated if the node dies. Not to mention fsck times with large file
systems.

 

Vijay

 

 

 

From: jeba earnest [mailto:jebaearn...@yahoo.com] 
Sent: 30 January 2013 10:40
To: user@hadoop.apache.org
Subject: Re: Maximum Storage size in a Single datanode

 

 

I want to use either UBUNTU or REDHAT .

I just want to know how much storage space we can allocate in a single data
node.

 

Are there any limitations in Hadoop for storage on a single node?

 

 

 

Regards,

Jeba

  _  

From: "Pamecha, Abhishek" 
To: "user@hadoop.apache.org" ; jeba earnest
 
Sent: Wednesday, 30 January 2013 2:45 PM
Subject: Re: Maximum Storage size in a Single datanode

 

What would be the reason you would do that? 

 

You would want to leverage a distributed dataset for higher availability and
better response times.

 

The maximum storage depends completely on the disk capacity of your nodes
and what your OS supports. Typically I have heard of about 1-2 TB/node to
start with, but I may be wrong.

-abhishek

 

 

From: jeba earnest 
Reply-To: "user@hadoop.apache.org" , jeba earnest

Date: Wednesday, January 30, 2013 1:38 PM
To: "user@hadoop.apache.org" 
Subject: Maximum Storage size in a Single datanode

 

 

Hi,



Is it possible to keep 1 Petabyte in a single data node?

If not, How much is the maximum storage for a particular data node? 

 

Regards,
M. Jeba

 



RE: Maximum Storage size in a Single datanode

2013-01-30 Thread Vijay Thakorlal
Hi Jeba,

 

There are other considerations too. For example, if a single node holds 1 PB
of data and it were to die, this would cause a significant amount of
traffic as the NameNode arranges for new replicas to be created.

 

Vijay

 

From: Bertrand Dechoux [mailto:decho...@gmail.com] 
Sent: 30 January 2013 09:14
To: user@hadoop.apache.org; jeba earnest
Subject: Re: Maximum Storage size in a Single datanode

 

I would say the hard limit is due to the OS local file system (and your
budget).

So the short answer for ext3: it doesn't seem so.
http://en.wikipedia.org/wiki/Ext3

And I am not sure the answer is the most interesting one. Even if you could put
1 petabyte on one node, what is usually interesting is the storage/compute
ratio.

Bertrand

On Wed, Jan 30, 2013 at 9:08 AM, jeba earnest  wrote:

 

Hi,



Is it possible to keep 1 Petabyte in a single data node?

If not, How much is the maximum storage for a particular data node?

 

Regards,
M. Jeba