HDFS: Forcing Master to Leave Safemode

2008-05-14 Thread Cagdas Gerede
What are the implications of manually forcing the namenode to leave safemode?

What properties does HDFS lose by doing that?

One gain is that the file system becomes available for writes
immediately.
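
For reference, the manual exit referred to here is done with the dfsadmin
tool; a sketch of the command (as in Hadoop releases of this era):

bin/hadoop dfsadmin -safemode leave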


Cagdas

-- 

Best Regards, Cagdas Evren Gerede
Home Page: http://cagdasgerede.info


Master Heap Size and Master Startup Time vs. Number of Blocks

2008-05-02 Thread Cagdas Gerede
In the system I am working on, we have 6 million blocks in total, the namenode
heap size is about 600 MB, and it takes about 5 minutes for the namenode to
leave safemode.

I am trying to estimate what the heap size would be if we had 100-150 million
blocks, and how long it would take the namenode to leave safemode.
Extrapolating from the numbers I have, I am calculating very scary numbers for
both (terabytes for the heap size, and half an hour or so of startup time).
I am hoping that my extrapolation is not accurate.

From your clusters, could you provide some numbers for the number of files and
blocks in the system vs. the master heap size and master startup time?

I really appreciate your help.

Thanks.
Cagdas


-- 

Best Regards, Cagdas Evren Gerede
Home Page: http://cagdasgerede.info


Re: Master Heap Size and Master Startup Time vs. Number of Blocks

2008-05-02 Thread Cagdas Gerede
Thanks, Doug, for your answers. Our interest is more in the distributed file
system part than in MapReduce.
I must confess that our block size is not as large as most people configure.
I would appreciate your and others' input.

Do you think these numbers are suitable?

We will have 5 million files, each having 20 blocks of 2 MB. With a minimum
replication of 3, we would have 300 million block replicas.
300 million block replicas would store 600 TB. At ~10 TB/node, this means a
60-node system.
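
Spelled out, the arithmetic behind these figures is:

5,000,000 files x 20 blocks        = 100,000,000 distinct blocks
100,000,000 blocks x 3 replicas    = 300,000,000 block replicas
300,000,000 replicas x 2 MB        = 600 TB of raw storage
600 TB / ~10 TB per node           = ~60 nodes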

Do you think these numbers are suitable for Hadoop DFS?

Cagdas



At ~100M per block, 100M blocks would store 10PB.  At ~1TB/node, this means
 a ~10,000 node system, larger than Hadoop currently supports well (for this
 and other reasons).

 Doug




-- 

Best Regards, Cagdas Evren Gerede
Home Page: http://cagdasgerede.info


Re: Master Heap Size and Master Startup Time vs. Number of Blocks

2008-05-02 Thread Cagdas Gerede
Fault tolerance. As files are uploaded to our server, we can continuously
write the data in small chunks, and if our server fails, we can tolerate the
failure by switching the user to another server so that the user can continue
to write. Otherwise we have to wait on the server until we have the whole file
before writing it to Hadoop (if the server fails, we lose all the data), or we
need the user to cache all the data he is generating, which is not feasible
for our requirements.

I appreciate your comment on this.

Cagdas

On Fri, May 2, 2008 at 1:09 PM, Ted Dunning [EMAIL PROTECTED] wrote:


 Why did you pick such a small block size?

 Why not go with the default of 64MB?

 That would give you only 10 million blocks for your 600TB.

 I don't see any advantage to the tiny block size.

 On 5/2/08 1:06 PM, Cagdas Gerede [EMAIL PROTECTED] wrote:

  Thanks Doug for your answers. Our interest is on more distributed file
  system part rather than map reduce.
  I must confess that our block size is not as large as how a lot of
 people
  configure. I appreciate if I can get your and others' input.
 
  Do you think these numbers are suitable?
 
  We will have 5 million files each having 20 blocks of 2MB. With the
 minimum
  replication of 3, we would have 300 million blocks.
  300 million blocks would store 600TB. At ~10TB/node, this means a 60
 node
  system.
 
  Do you think these numbers are suitable for Hadoop DFS.
 
  Cagdas
 
 
 
  At ~100M per block, 100M blocks would store 10PB.  At ~1TB/node, this
 means
  a ~10,000 node system, larger than Hadoop currently supports well (for
 this
  and other reasons).
 
  Doug
 
 
 




-- 

Best Regards, Cagdas Evren Gerede
Home Page: http://cagdasgerede.info


Re: Block reports: memory vs. file system, and Dividing offerService into 2 threads

2008-05-01 Thread Cagdas Gerede
As far as I understand, the current focus is on how to reduce the namenode's
CPU time to process block reports from a lot of datanodes.

Aren't we missing another issue? Doesn't the way a block report is computed
delay the master startup time? I have to make sure the master is up as quickly
as possible for maximum availability. The bottleneck seems to be the scanning
of the local disk. I wrote a simple Java program that only scanned the
datanode directories the way the Hadoop code does, and the time the Java
program took was equivalent to 90% of the time taken for block report
generation and sending. Scanning seems very costly: it takes about 2-4
minutes.

To address the problem, can we have *two types of block reports*: one
generated from memory and the other from the local file system? For master
startup, we can trigger the block report generated from memory, and for the
periodic ones we can trigger the block report computed from the local file
system.



Another issue I have is that even if we do block reports only every 10 days,
once one happens it almost freezes the datanode functions. More specifically,
the datanode won't be able to report new blocks to the namenode until this
report is computed. This takes at least a couple of minutes in my system for
each datanode. As a result, the master thinks a block is not yet replicated
enough and rejects the addition of a new block to the file. Then, since the
client does not wait long enough, this eventually causes the write of the file
to fail. To address this, can we separate the scanning of the underlying disk
into a thread of its own, distinct from the reporting of newly received
blocks, as sketched below?
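
As an illustration of this proposal (not actual Hadoop code; the class and
method names below are hypothetical), the split could look roughly like this
in Java:

// Illustrative sketch only: a background thread refreshes the slow disk-scan
// block report while the main loop keeps sending "received block"
// notifications without ever blocking on a scan.
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicReference;

public class SplitOfferServiceSketch {
    // Newly received block IDs, queued by the (hypothetical) write path.
    private final BlockingQueue<Long> receivedBlocks = new LinkedBlockingQueue<Long>();
    // Last full block report produced by the scanner thread.
    private final AtomicReference<List<Long>> fullReport =
            new AtomicReference<List<Long>>(new ArrayList<Long>());

    // Thread 1: slow periodic disk scan, independent of block notifications.
    void startScanner(final long scanIntervalMs) {
        Thread scanner = new Thread(new Runnable() {
            public void run() {
                while (!Thread.currentThread().isInterrupted()) {
                    fullReport.set(scanDataDirectories());  // minutes on a large disk
                    sendFullBlockReport(fullReport.get());  // periodic report to the namenode
                    try { Thread.sleep(scanIntervalMs); }
                    catch (InterruptedException e) { return; }
                }
            }
        });
        scanner.setDaemon(true);
        scanner.start();
    }

    // Thread 2 (main loop): report new blocks promptly, never waiting on a scan.
    void serviceLoop() throws InterruptedException {
        while (true) {
            Long blockId = receivedBlocks.take();
            notifyNamenodeOfReceivedBlock(blockId);
        }
    }

    // Placeholders for the real datanode/namenode RPC calls.
    List<Long> scanDataDirectories() { return new ArrayList<Long>(); }
    void sendFullBlockReport(List<Long> report) { }
    void notifyNamenodeOfReceivedBlock(long blockId) { }
}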

Dhruba points out:
"This sequential nature is critical in ensuring that there is no erroneous
race condition in the Namenode."

I do not have any insight into this.


Cagdas

-- 

Best Regards, Cagdas Evren Gerede
Home Page: http://cagdasgerede.info


Block reports: memory vs. file system, and Dividing offerService into 2 threads

2008-04-30 Thread Cagdas Gerede
Currently,

Block reports are computed by scanning all the files and folders on the
local disk.
This happens not only at startup, but also for the periodic block reports.

What is the intention behind doing it this way instead of creating the block
report from the data structures already in memory?
One issue with the current approach is that it takes quite some time. For
instance, the system I am working with has 1 million blocks in each datanode,
and every time a block report is computed it takes about 4 minutes. You might
expect this to have no effect, but it does. The problem is that the
computation and sending of block reports and the sending of the list of newly
received blocks are part of the same thread. As a result, when the block
report computation takes a long time, it delays the reporting of newly
received blocks.
As you know, when you are writing a multi-block file and you request a new
block from the namenode, the namenode checks whether the previous block is
replicated a sufficient number of times. If at that point the datanode is
computing its block report, it is very likely that it has not yet informed the
namenode about that previous block. As a result, the namenode rejects the
request. The client then retries 5 more times, doubling the wait time in
between, for a total of about 6 seconds (it starts with 200 ms and doubles it
5 times). Then the client raises an exception to the application. Given that
the block report computation takes minutes, this situation is very likely to
occur.
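
To make the numbers concrete, here is a small stand-alone sketch of that retry
schedule (this is not the actual DFSClient code, just the arithmetic it
implies):

public class AddBlockRetrySketch {
    public static void main(String[] args) {
        long waitMs = 200;   // initial wait described above
        long totalMs = 0;
        for (int attempt = 1; attempt <= 5; attempt++) {
            totalMs += waitMs;
            System.out.println("retry " + attempt + ": wait " + waitMs + " ms");
            waitMs *= 2;     // doubled on every retry
        }
        System.out.println("total wait before failing: ~" + totalMs + " ms"); // ~6200 ms
    }
}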

Do you have any suggestions on how to handle this situation?
Are there any plans to move the block-reporting code and the reporting of
newly received blocks into separate threads (ref: the offerService method in
DataNode.java)?

Thanks for your response,


-- 

Best Regards, Cagdas Evren Gerede
Home Page: http://cagdasgerede.info


Re: Help: When is it safe to discard a block in the application layer

2008-04-24 Thread Cagdas Gerede
In the DataStreamer class (in DFSClient.java), there is a line in the run()
method like this:

    if (progress != null) { progress.progress(); }

I think progress() is called only once the block is replicated at least the
minimum number of times.

I can pass in my own progress object and wait on it until this method is
invoked before deleting my application cache.
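
A minimal sketch of that idea, assuming the standard
org.apache.hadoop.util.Progressable interface and a simple latch (whether
progress() really implies minimum replication is exactly the open question
here):

import java.util.concurrent.CountDownLatch;
import org.apache.hadoop.util.Progressable;

public class WaitOnProgress {
    // Latch released the first time the DFS client reports progress.
    static final CountDownLatch firstProgress = new CountDownLatch(1);

    static final Progressable progress = new Progressable() {
        public void progress() {
            firstProgress.countDown();   // signal the application thread
        }
    };

    // Pass 'progress' to fs.create(...) as in the snippet quoted below, write
    // the block, then wait before discarding the application-side copy:
    static void waitBeforeDiscard() throws InterruptedException {
        firstProgress.await();
    }
}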


Does this seem right? Am I missing something?

Thanks for your answer.

-- 

Best Regards, Cagdas Evren Gerede
Home Page: http://cagdasgerede.info


On Thu, Apr 17, 2008 at 10:21 AM, dhruba Borthakur [EMAIL PROTECTED]
wrote:

  The DFSClient caches small packets (e.g. 64K write buffers) and they are
 lazily flushed to the datanodes in the pipeline. So, when an application
 completes an out.write() call, it is definitely not guaranteed that the data
 has been sent to even one datanode.



 One option would be to retrieve cache hints from the Namenode and determine
 if the block has three locations.
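
 A minimal sketch of this suggestion, polling the namenode for block locations
 until every block of the file shows the desired number of replicas (the
 thread calls these "cache hints"; the sketch uses the later
 getFileBlockLocations equivalent, and the one-second polling interval is an
 assumption):

 import org.apache.hadoop.fs.BlockLocation;
 import org.apache.hadoop.fs.FileStatus;
 import org.apache.hadoop.fs.FileSystem;
 import org.apache.hadoop.fs.Path;

 public class WaitForReplicas {
     // Block until the namenode reports at least 'wanted' locations per block.
     static void waitForReplicas(FileSystem fs, Path file, int wanted)
             throws Exception {
         while (true) {
             FileStatus status = fs.getFileStatus(file);
             BlockLocation[] blocks =
                     fs.getFileBlockLocations(status, 0, status.getLen());
             boolean enough = blocks.length > 0;
             for (BlockLocation block : blocks) {
                 if (block.getHosts().length < wanted) {  // replicas known so far
                     enough = false;
                     break;
                 }
             }
             if (enough) {
                 return;           // safe (by this criterion) to discard the local copy
             }
             Thread.sleep(1000);   // namenode learns locations via block reports
         }
     }
 }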



 Thanks,

 dhruba


  --

 *From:* Cagdas Gerede [mailto:[EMAIL PROTECTED]
 *Sent:* Wednesday, April 16, 2008 7:40 PM
 *To:* core-user@hadoop.apache.org
 *Subject:* Help: When is it safe to discard a block in the application
 layer



 I am working on an application on top of Hadoop Distributed File System.

 The high-level flow goes like this: a user data block arrives at the
 application server. The application server uses the DistributedFileSystem API
 of Hadoop and writes the data block to the file system. Once the block is
 replicated three times, the application server notifies the user so that the
 user can get rid of the data, since it is now in persistent, fault-tolerant
 storage.

 I couldn't figure out the following. Let's say, this is my sample program
 to write a block.

 byte data[] = new byte[blockSize];
 out.write(data, 0, data.length);
 ...

 where out is
 out = fs.create(outFile, FsPermission.getDefault(), true, 4096,
 (short)replicationCount, blockSize, progress);


 My application writes the data to the stream and then goes on to the next
 line. At this point, I believe I cannot be sure that the block has been
 replicated at least, say, 3 times. Possibly, under the hood, the DFSClient is
 still trying to push this data to other datanodes.

 Given this, how is my application going to know that the data block is
 replicated 3 times and it is safe to discard this data?

 There are a couple of things you might suggest:
 1) Set the minimum replica property to 3: even if you do this, the
 application still goes on to the next line before the data is actually
 replicated 3 times.
 2) Right after you write, continuously get cache hints from the master and
 check whether the master is aware of 3 replicas of this block: my problem
 with this approach is that the application will wait for a while for every
 block it needs to store, since it takes some time for the datanodes to report
 and for the master to process the block reports. What is worse, if some
 datanode in the pipeline fails, we have no way of knowing about the error.

 To sum up, I am not sure when the right time is to discard a block of data
 with a guarantee that it has been replicated a certain number of times.

 Please help,

 Thanks,
 Cagdas



Benchmarking and Statistics for Hadoop Distributed File System

2008-04-23 Thread Cagdas Gerede
Is anybody aware of any benchmarking of the Hadoop Distributed File System?

Some numbers I am interested in are:
- How long does it take for the master to recover if there are, say, 1 million
blocks in the system?
- How does the recovery time change as the number of blocks in the system
changes? Is it linear? Is it exponential?
- What is the file read/write throughput of the Hadoop File System under
different configurations and loads?



-- 

Best Regards, Cagdas Evren Gerede
Home Page: http://cagdasgerede.info


Please Help: Namenode Safemode

2008-04-23 Thread Cagdas Gerede
I have a Hadoop distributed file system with 3 datanodes. I only have 150
blocks in each datanode. It takes a little more than a minute for the namenode
to start and pass the safemode phase.

The steps for namenode startup, as far as I understand, are:
1) The datanode sends a heartbeat to the namenode. The namenode tells the
datanode to send a block report as a piggyback on the heartbeat.
2) The datanode computes the block report.
3) The datanode sends it to the namenode.
4) The namenode processes the block report.
5) The namenode's safemode monitor thread checks whether to exit, and the
namenode exits safemode if the threshold is reached and the extension time has
passed.

Here are my numbers:
Step 1) Datanodes send heartbeats every 3 seconds.
Step 2) The datanode computes the block report (this takes about 20
milliseconds, as shown in the datanodes' logs).
Step 3) No idea (depends on the size of the block report; I suspect this
should not be more than a couple of seconds).
Step 4) No idea; it shouldn't be more than a couple of seconds.
Step 5) The thread checks every second. The extension value in my
configuration is 0, so there is no wait once the threshold is reached.

Adding these estimates up gives roughly 3 s + 0.02 s + a few seconds + a few
seconds + 1 s, i.e., on the order of 10 seconds. Can anybody explain where the
one minute comes from? Shouldn't these steps take 10-20 seconds?
Please help. I am very confused.



-- 

Best Regards, Cagdas Evren Gerede
Home Page: http://cagdasgerede.info


Re: multiple datanodes in the same machine

2008-04-15 Thread cagdas . gerede
Testing when I do not have 10 machines.


On 4/15/08, Ted Dunning [EMAIL PROTECTED] wrote:

 Why do you want to do this perverse thing?

 How does it help to have more than one datanode per machine?  And what in
 the world is better when you have 10?


 On 4/15/08 12:53 PM, Cagdas Gerede [EMAIL PROTECTED] wrote:

  I have a follow-up question,
  Is there a way to programatically configure datanode parameters and start
  the datanode process?
  If I want to create 10 datanodes on the same host, do I have to create 10
  config files?
 
 
  On Tue, Apr 15, 2008 at 12:29 PM, dhruba Borthakur [EMAIL PROTECTED]
  wrote:
 
  Yes, just point the Datanodes to different config files, different sets
  of ports, different data directories. Etc.etc.
 
  Thanks,
  dhruba
 
  -Original Message-
  From: Cagdas Gerede [mailto:[EMAIL PROTECTED]
  Sent: Tuesday, April 15, 2008 11:21 AM
  To: core-user@hadoop.apache.org
  Subject: multiple datanodes in the same machine
 
  Is there a way to run multiple datanodes in the same machine?
 
 
  --
  
  Best Regards, Cagdas Evren Gerede
  Home Page: http://cagdasgerede.info
 
 
 




-- 

Best Regards, Cagdas Evren Gerede
Home Page: http://cagdasgerede.info


Re: multiple datanodes in the same machine

2008-04-15 Thread Cagdas Gerede
I am working on the distributed file system part. I do not use the MR part,
and I need to run multiple processes to test some scenarios on the file
system.
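
For reference, the per-datanode configuration suggested in the quoted exchange
below (different config files, ports, and data directories) might look roughly
like this; the property names follow later Hadoop releases and the ports and
directory are placeholders, so treat this only as a sketch:

<!-- hadoop-site.xml overrides for a second datanode instance on the same host -->
<configuration>
  <property>
    <name>dfs.data.dir</name>
    <value>/tmp/dfs/data2</value>        <!-- its own storage directory -->
  </property>
  <property>
    <name>dfs.datanode.address</name>
    <value>0.0.0.0:50011</value>         <!-- non-default data transfer port -->
  </property>
  <property>
    <name>dfs.datanode.http.address</name>
    <value>0.0.0.0:50076</value>         <!-- non-default HTTP port -->
  </property>
  <property>
    <name>dfs.datanode.ipc.address</name>
    <value>0.0.0.0:50021</value>         <!-- non-default IPC port -->
  </property>
</configuration>

Each instance then gets its own copy of such a file and is started with its
own configuration directory (e.g. by pointing HADOOP_CONF_DIR at it).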

On Tue, Apr 15, 2008 at 1:37 PM, Ted Dunning [EMAIL PROTECTED] wrote:


 I have had no issues in scaling the number of datanodes.  The location of
 the data is almost invisible to MR programs.

 I have had issues in going from local to distributed mode, but that has
 entirely been due to class path like issues.  Since MR naturally restricts
 your focus, it is pretty much the rule that programs scale without much
 thought.

 If you test with two tasktrackers and one data node, you should have a
 pretty solid test environment.


 On 4/15/08 1:12 PM, [EMAIL PROTECTED] [EMAIL PROTECTED]
  wrote:

  Testing when I do not have 10 machines.
 
 
  On 4/15/08, Ted Dunning [EMAIL PROTECTED] wrote:
 
  Why do you want to do this perverse thing?
 
  How does it help to have more than one datanode per machine?  And what
 in
  the world is better when you have 10?
 
 
  On 4/15/08 12:53 PM, Cagdas Gerede [EMAIL PROTECTED] wrote:
 
  I have a follow-up question,
  Is there a way to programatically configure datanode parameters and
 start
  the datanode process?
  If I want to create 10 datanodes on the same host, do I have to create
 10
  config files?
 
 
  On Tue, Apr 15, 2008 at 12:29 PM, dhruba Borthakur 
 [EMAIL PROTECTED]
  wrote:
 
  Yes, just point the Datanodes to different config files, different
 sets
  of ports, different data directories. Etc.etc.
 
  Thanks,
  dhruba
 
  -Original Message-
  From: Cagdas Gerede [mailto:[EMAIL PROTECTED]
  Sent: Tuesday, April 15, 2008 11:21 AM
  To: core-user@hadoop.apache.org
  Subject: multiple datanodes in the same machine
 
  Is there a way to run multiple datanodes in the same machine?
 
 
  --
  
  Best Regards, Cagdas Evren Gerede
  Home Page: http://cagdasgerede.info
 
 
 
 
 
 




-- 

Best Regards, Cagdas Evren Gerede
Home Page: http://cagdasgerede.info


HDFS: What happens when a harddrive fails

2008-03-26 Thread Cagdas Gerede
I was wondering:
1) What happens if a datanode is alive but its hard drive fails? Does it throw
an exception and die?
2) If it continues to run and continues to do block reporting, is there a
console showing datanodes with healthy hard drives and unhealthy hard drives?
I know the namenode's web server shows running and not-running datanodes, but
I am not sure if it differentiates between datanodes with healthy and
unhealthy hard drives.

Thanks for your help
Cagdas


HDFS: how to append

2008-03-18 Thread Cagdas Gerede
The HDFS documentation says it is possible to append to an HDFS file.

In the org.apache.hadoop.dfs.DistributedFileSystem class,
there is no method to open an existing file for writing (there are
methods for reading).
The only similar methods are the create methods, which return an
FSDataOutputStream.
When I look at the FSDataOutputStream class, there seems to be no append
method, and all the write methods overwrite an existing file or return an
error if such a file already exists.

Does anybody know how to append to a file in HDFS?
I appreciate your help.
Thanks,

Cagdas


HDFS: Flash Application and Available APIs

2008-03-18 Thread Cagdas Gerede
I have two questions:

- I was wondering if an HDFS client can be invoked from a Flash application.
- What are the available APIs for HDFS? (I read that there is a C/C++
API for Hadoop Map/Reduce, but is there a C/C++ API for HDFS, or can it
only be invoked from a Java application?)


Thanks for your help,
Cagdas


HadoopDfsReadWriteExample

2008-03-13 Thread Cagdas Gerede
I tried the HadoopDfsReadWriteExample and am getting the following error. I
would appreciate any help. I provide more info at the end.


Error while copying file
Exception in thread "main" java.io.IOException: Cannot run program
"df": CreateProcess error=2, The system cannot find the file specified
at java.lang.ProcessBuilder.start(ProcessBuilder.java:459)
at java.lang.Runtime.exec(Runtime.java:593)
at java.lang.Runtime.exec(Runtime.java:466)
at org.apache.hadoop.fs.ShellCommand.runCommand(ShellCommand.java:48)
at org.apache.hadoop.fs.ShellCommand.run(ShellCommand.java:42)
at org.apache.hadoop.fs.DF.getAvailable(DF.java:72)
at 
org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:296)
at 
org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.createTmpFileForWrite(LocalDirAllocator.java:326)
at 
org.apache.hadoop.fs.LocalDirAllocator.createTmpFileForWrite(LocalDirAllocator.java:155)
at 
org.apache.hadoop.dfs.DFSClient$DFSOutputStream.newBackupFile(DFSClient.java:1483)
at 
org.apache.hadoop.dfs.DFSClient$DFSOutputStream.openBackupStream(DFSClient.java:1450)
at 
org.apache.hadoop.dfs.DFSClient$DFSOutputStream.writeChunk(DFSClient.java:1592)
at 
org.apache.hadoop.fs.FSOutputSummer.writeChecksumChunk(FSOutputSummer.java:140)
at 
org.apache.hadoop.fs.FSOutputSummer.flushBuffer(FSOutputSummer.java:122)
at 
org.apache.hadoop.dfs.DFSClient$DFSOutputStream.close(DFSClient.java:1728)
at 
org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:49)
at 
org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:64)
at HadoopDFSFileReadWrite.main(HadoopDFSFileReadWrite.java:106)
Caused by: java.io.IOException: CreateProcess error=2, The system
cannot find the file specified
at java.lang.ProcessImpl.create(Native Method)
at java.lang.ProcessImpl.&lt;init&gt;(ProcessImpl.java:81)
at java.lang.ProcessImpl.start(ProcessImpl.java:30)
at java.lang.ProcessBuilder.start(ProcessBuilder.java:452)
... 17 more



Note: I am on a Windows machine. The namenode is running on the same
Windows machine. The way I initialized the configuration is:

Configuration conf = new Configuration();
conf.addResource(new
Path("C:\\cygwin\\hadoop-management\\hadoop-conf\\hadoop-site.xml"));
FileSystem fs = FileSystem.get(conf);


Any suggestions?

Cagdas


Fault Tolerance: Inquiry for approaches to solve single point of failure when name node fails

2008-03-13 Thread Cagdas Gerede
I have a question. As we know, the namenode forms a single point of failure.
In a production environment, I imagine a namenode would run in a data center.
If that data center fails, how would you put a new namenode in place in
another data center so that it can take over with minimum interruption?

I was wondering if anyone has any experience/ideas/comments on this.

Thanks

-Cagdas


Re: Fault Tolerance: Inquiry for approaches to solve single point of failure when name node fails

2008-03-13 Thread Cagdas Gerede
 If your data center fails, then you probably have to worry more about how to
 get your data.

I assume having multiple data centers. I know that, thanks to HDFS
replication, the data in the other data center will be enough.
However, as far as I can see, HDFS has no support for replicating the
namenode.
Is this true?
If there is no automated support, and I need to do this replication
with some custom code or manual intervention,
what are the steps to do it?

Any help is appreciated.

Cagdas


Re: HDFS interface

2008-03-12 Thread Cagdas Gerede
I would like to use the HDFS component of Hadoop but am not interested in
MapReduce.
All the Hadoop examples I have seen so far use MapReduce classes, and in these
examples there is no reference to HDFS classes, including Hadoop's FileSystem
API
(http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/fs/FileSystem.html).
Everything seems to happen under the hood.

I was wondering if there is any example source code that uses HDFS
directly.
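
For what it's worth, a minimal sketch of using the FileSystem API directly
(no MapReduce); the paths and the configuration-file location are
placeholders:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsDirectExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Point at the cluster's config if not running inside the Hadoop runtime.
        conf.addResource(new Path("/path/to/hadoop-site.xml"));
        FileSystem fs = FileSystem.get(conf);

        Path file = new Path("/tmp/hello.txt");
        FSDataOutputStream out = fs.create(file);   // overwrites if it exists
        out.writeUTF("hello hdfs");
        out.close();

        FSDataInputStream in = fs.open(file);
        System.out.println(in.readUTF());
        in.close();
        fs.close();
    }
}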


Thanks,

- CEG


Searching email list

2008-03-12 Thread Cagdas Gerede
Is there an easy way to search this email list?
I couldn't find any web interface.

Please help.


CEG


Re: HDFS interface

2008-03-12 Thread Cagdas Gerede
I see the following paragraphs in the wiki
(http://wiki.apache.org/hadoop/HadoopDfsReadWriteExample):

Create a FileSystem
(http://hadoop.apache.org/core/api/org/apache/hadoop/fs/FileSystem.html)
instance by passing a new Configuration object. Please note that the
following example code assumes that the Configuration object will
automatically load the hadoop-default.xml and hadoop-site.xml configuration
files. You may need to explicitly add these resource paths if you are not
running inside of the Hadoop runtime environment.

and

 Configuration conf = new Configuration();
FileSystem fs = FileSystem.get(conf);

When I do

Path[] apples = fs.globPaths(new Path("*"));
for (Path apple : apples) {
    System.out.println(apple);
}


It prints out all the local file names.

How do I point my application to the running HDFS instance?
What does "explicitly add these resource paths if you are not running inside
of the Hadoop runtime environment" mean?

Thanks,

- CEG


Re: HDFS interface

2008-03-12 Thread Cagdas Gerede
I found the solution.  Please let me know if you have a better idea.

I added the following addResource lines.

Configuration conf = new Configuration();

conf.addResource(new Path("location_of_hadoop-default.xml"));
conf.addResource(new Path("location_of_hadoop-site.xml"));

FileSystem fs = FileSystem.get(conf);

(Would be good to update the wiki page).
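
An alternative sketch (property name and URI form as in later releases, so
treat it as an assumption) is to point the client at the namenode directly
instead of loading the site file:

Configuration conf = new Configuration();
conf.set("fs.default.name", "hdfs://namenodehost:9000");  // namenode host:port (placeholder)
FileSystem fs = FileSystem.get(conf);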

- CEG


On Wed, Mar 12, 2008 at 5:04 PM, Cagdas Gerede [EMAIL PROTECTED]
wrote:

 I see the following paragraphs in the wiki
 (http://wiki.apache.org/hadoop/HadoopDfsReadWriteExample):

 Create a FileSystem
 (http://hadoop.apache.org/core/api/org/apache/hadoop/fs/FileSystem.html)
 instance by passing a new Configuration object. Please note that the
 following example code assumes that the Configuration object will
 automatically load the hadoop-default.xml and hadoop-site.xml configuration
 files. You may need to explicitly add these resource paths if you are not
 running inside of the Hadoop runtime environment.

 and

  Configuration conf = new Configuration();
 FileSystem fs = FileSystem.get(conf);

 When I do

 Path[] apples = fs.globPaths(new Path("*"));
 for (Path apple : apples) {
     System.out.println(apple);
 }


 It prints out all the local file names.

 How do I point my application to the running HDFS instance?
 What does "explicitly add these resource paths if you are not running
 inside of the Hadoop runtime environment" mean?

 Thanks,

 - CEG






-- 

Best Regards, Cagdas Evren Gerede
Home Page: http://www.cs.ucsb.edu/~gerede
Pronunciation: http://www.cs.ucsb.edu/~gerede/cagdas.html