Hi Brian,

Interesting observations.
This is probably in line with the "client side mount table" approach, [soon to
be] proposed in HDFS-1053.
Another way to provide a personalized view of the file system would be to use
symbolic links, which are available now in 0.21/0.22.
For 4K files I would probably use Hadoop archives (HAR), especially if the data,
as you describe it, is not changing.
But high-energy physicists should consider using HBase over HDFS.
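As a rough illustration only (the paths and class name below are made up),
creating such a symlink through the FileContext API that ships with 0.21 would
look something like this:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileContext;
import org.apache.hadoop.fs.Path;

public class SymlinkSketch {
  public static void main(String[] args) throws Exception {
    // FileContext is the client API that exposes symlinks in 0.21+.
    FileContext fc = FileContext.getFileContext(new Configuration());

    // Hypothetical paths: give a user a "personal view" that points at a
    // shared dataset somewhere else in the namespace.
    Path target = new Path("/shared/datasets/ligands");
    Path link   = new Path("/user/brian/ligands");

    // createParent=true creates any missing parent directories of the link.
    fc.createSymlink(target, link, true);
  }
}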

--konst


On 5/18/2010 11:57 AM, Brian Bockelman wrote:

Hey Konstantin,

Interesting paper :)

One thing which I've been kicking around lately is "at what scale does the 
file/directory paradigm break down?"

At some point, I think the human mind can no longer comprehend so many files (certainly,
I can barely organize the few thousand files on my laptop).  Thus, file growth comes from
(a) having lots of humans use a single file system or (b) automated programs generating the
files.  For (a), you don't need a central global namespace; you just need the ability to
have a "local" namespace per person that can be shared among friends.  For (b),
a program isn't going to be upset if you replace a file system with a database / dataset
object / bucket.

Two examples:
- structural biology: I've seen a lot of different analysis workflows (such as AutoDock)
that compare a protein against a "database" of ligands, where the database is
80,000 O(4KB) files.  Each file represents a known ligand that the biologist might come
back and examine if it is relevant to the study of their protein.
- high-energy physics: Each detector can produce millions of events a night,
and experiments will produce many billions of events.  These are saved into 
files (each file containing hundreds or thousands of events); these files are 
kept in collections (called datasets, data blocks, lumi sections, or runs, 
depending on what you're doing).  Tasks are run against datasets; they will 
output smaller datasets which the physicists will iterate upon until they get 
some dataset which fits onto their laptop.
Here's my Claim: The biologists have a small enough number of objects to manage 
each one as a separate file; they do this because it's easier for humans 
navigating around in a terminal.  The physicists have such a huge number of 
objects that there's no way to manage them using one file per object, so they 
utilize files only as a mechanism to serialize bytes of data and have 
higher-order data structures for management.
Here's my Question: at what point do you move from the biologist's model (named 
objects, managed independently, single files) to the physicist's model 
(anonymous objects, managed in large groups, files are only used because we 
save data on file systems)?

Another way to look at this is to consider DNS.  DNS maintains the namespace of 
the globe, but appears to do this just fine without a single central catalog.  
If you start with a POSIX filesystem namespace (and the guarantees it implies), 
what rules must you relax in order to arrive at DNS?  On the scale of managing a
million (a billion? ten billion? a trillion?) files, are any of those assumptions still
relevant?

I don't know the answers to these questions, but I suspect they become 
important over the next 10 years.

Brian

PS - I started thinking along these lines during MSST when the LLNL guy was speculating
about what it meant to "fsck" a file system with 1 trillion files.

On May 18, 2010, at 12:56 PM, Konstantin Shvachko wrote:

You can also get some performance numbers and answers to the block size dilemma
here:

http://developer.yahoo.net/blogs/hadoop/2010/05/scalability_of_the_hadoop_dist.html

I remember some people were using Hadoop for storing or streaming videos.
Don't know how well that worked.
It would be interesting to learn about your experience.

Thanks,
--Konstantin


On 5/18/2010 8:41 AM, Brian Bockelman wrote:
Hey Hassan,

1) The overhead is pretty small, measured in a small number of milliseconds on
average.
2) HDFS is not designed for "online latency".  Even though the average is small, if 
something "bad happens", your clients might experience a lot of delays while going 
through the retry stack.  The initial design was for batch processing, and latency-sensitive 
applications came later.
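If you want a rough feel for that per-operation overhead on your own cluster, a
quick sketch (not a real benchmark; the class name is mine and the path comes
from the command line) is to time a loop of pure-metadata calls against the
NameNode:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class NameNodeLatencyProbe {
  public static void main(String[] args) throws Exception {
    // Assumes your core-site.xml points the default FileSystem at the cluster.
    FileSystem fs = FileSystem.get(new Configuration());
    Path p = new Path(args[0]);  // any path that already exists on HDFS

    int n = 1000;
    long start = System.nanoTime();
    for (int i = 0; i < n; i++) {
      fs.getFileStatus(p);       // metadata-only round trip to the NameNode
    }
    double avgMs = (System.nanoTime() - start) / 1.0e6 / n;
    System.out.println("average getFileStatus latency: " + avgMs + " ms");
  }
}

An average like this hides the tail, though, and the tail is exactly the
"bad happens" case above.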

Additionally, since the NameNode (NN) is a single point of failure (SPOF), you might want to consider your uptime
requirements.  Each organization will have to balance these risks with the 
advantages (such as much cheaper hardware).

There's a nice interview with the GFS authors here where they touch upon the 
latency issues:

http://queue.acm.org/detail.cfm?id=1594206

As GFS and HDFS share many design features, the theoretical parts of their 
discussion might be useful for you.

As far as overall throughput of the system goes, it depends heavily upon your 
implementation and hardware.  Our HDFS routinely serves 5-10 Gbps.

Brian

On May 18, 2010, at 10:29 AM, Nyamul Hassan wrote:

This is a very interesting thread to us, as we are thinking about deploying
HDFS as massive online storage for an online university, and then serving the
video files to students who want to view them.

We cannot control the size of the videos (and some class work files), as
they will mostly be uploaded by the teachers providing the classes.

How would the overall throughput of HDFS be affected in such a solution?
Would HDFS be feasible at all for such a setup?

Regards
HASSAN



On Tue, May 18, 2010 at 21:11, He Chen <airb...@gmail.com> wrote:

If you know how to use AspectJ to do aspect-oriented programming, you can
write an aspect class and let it monitor the whole MapReduce process.
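For example, an annotation-style aspect along these lines; note that the
pointcut is only a guess at a useful join point, so treat it as a sketch to
adapt:

import org.aspectj.lang.ProceedingJoinPoint;
import org.aspectj.lang.annotation.Around;
import org.aspectj.lang.annotation.Aspect;

@Aspect
public class TaskTimingAspect {
  // The package/class in this pointcut is an assumption; point it at
  // whatever phase of the job you actually want to watch.
  @Around("execution(* org.apache.hadoop.mapred.Task+.run(..))")
  public Object timeTask(ProceedingJoinPoint pjp) throws Throwable {
    long start = System.currentTimeMillis();
    try {
      return pjp.proceed();  // run the original task code unchanged
    } finally {
      System.err.println(pjp.getSignature() + " took "
          + (System.currentTimeMillis() - start) + " ms");
    }
  }
}

You would then weave this into the Hadoop jars with the AspectJ compiler or
load-time weaving.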

On Tue, May 18, 2010 at 10:00 AM, Patrick Angeles <patr...@cloudera.com> wrote:

Should be evident in the total job running time... that's the only metric
that really matters :)

On Tue, May 18, 2010 at 10:39 AM, Pierre ANCELOT <pierre...@gmail.com> wrote:

Thank you,
Any way I can measure the startup overhead in terms of time?


On Tue, May 18, 2010 at 4:27 PM, Patrick Angeles <patr...@cloudera.com> wrote:

Pierre,

Adding to what Brian has said (some things are not explicitly mentioned in the
HDFS design doc)...

- If you have small files that take up < 64MB you do not actually use the
entire 64MB block on disk.
- You *do* use up RAM on the NameNode, as each block represents meta-data that
needs to be maintained in-memory in the NameNode.
- Hadoop won't perform optimally with very small block sizes. Hadoop I/O is
optimized for high sustained throughput per single file/block. There is a
penalty for doing too many seeks to get to the beginning of each block.
Additionally, you will have a MapReduce task per small file. Each MapReduce
task has a non-trivial startup overhead.
- The recommendation is to consolidate your small files into large files. One
way to do this is via SequenceFiles... put the filename in the SequenceFile
key field, and the file's bytes in the SequenceFile value field (a rough
sketch follows below).
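Here is a rough sketch of that packing step (the output path and class name
are made up, and error handling is omitted):

import java.io.DataInputStream;
import java.io.File;
import java.io.FileInputStream;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class PackSmallFiles {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);

    // Pack every file in a local directory (args[0]) into one SequenceFile.
    SequenceFile.Writer writer = SequenceFile.createWriter(
        fs, conf, new Path("/user/pierre/small-files.seq"),
        Text.class, BytesWritable.class);
    try {
      for (File f : new File(args[0]).listFiles()) {
        byte[] bytes = new byte[(int) f.length()];
        DataInputStream in = new DataInputStream(new FileInputStream(f));
        try {
          in.readFully(bytes);
        } finally {
          in.close();
        }
        // key = original file name, value = the raw bytes of the file
        writer.append(new Text(f.getName()), new BytesWritable(bytes));
      }
    } finally {
      writer.close();
    }
  }
}

A MapReduce job can then stream through the resulting SequenceFile with a
handful of input splits instead of one task per tiny file.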

In addition to the HDFS design docs, I recommend reading this blog
post:
http://www.cloudera.com/blog/2009/02/the-small-files-problem/

Happy Hadooping,

- Patrick

On Tue, May 18, 2010 at 9:11 AM, Pierre ANCELOT <pierre...@gmail.com> wrote:

Okay, thank you :)


On Tue, May 18, 2010 at 2:48 PM, Brian Bockelman <bbock...@cse.unl.edu> wrote:


On May 18, 2010, at 7:38 AM, Pierre ANCELOT wrote:

Hi, thanks for this fast answer :)
If so, what do you mean by blocks? If a file has to be split, will it be
split when it is larger than 64MB?


For every 64MB of the file, Hadoop will create a separate block.  So, if you
have a 32KB file, there will be one block of 32KB.  If the file is 65MB, then
it will have one block of 64MB and another block of 1MB.

Splitting files is very useful for load-balancing and distributing I/O across
multiple nodes.  At 32KB / file, you don't really need to split the files at
all.
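You can see the block layout of a file from the client API, roughly like this
(the class name is mine; the path comes from the command line):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ShowBlocks {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    FileStatus stat = fs.getFileStatus(new Path(args[0]));

    // One BlockLocation per block: a 65MB file with a 64MB block size shows
    // two entries, while a 32KB file shows just one.
    BlockLocation[] blocks = fs.getFileBlockLocations(stat, 0, stat.getLen());
    for (BlockLocation b : blocks) {
      System.out.println("offset=" + b.getOffset() + " length=" + b.getLength());
    }
  }
}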

I recommend reading the HDFS design document for background on issues like
this:

http://hadoop.apache.org/common/docs/r0.20.0/hdfs_design.html

Brian




On Tue, May 18, 2010 at 2:34 PM, Brian Bockelman <bbock...@cse.unl.edu> wrote:

Hey Pierre,

These are not traditional filesystem blocks - if you save a file smaller than
64MB, you don't lose 64MB of file space.

Hadoop will use 32KB to store a 32KB file (ok, plus a KB of metadata or so),
not 64MB.

Brian

On May 18, 2010, at 7:06 AM, Pierre ANCELOT wrote:

Hi,
I'm porting a legacy application to Hadoop and it uses a bunch of small files.
I'm aware that having such small files isn't a good idea, but I'm not the one
making the technical decisions and the port has to be done for yesterday...
Of course such small files are a problem; loading 64MB blocks for a few lines
of text is an obvious waste.
What will happen if I set a smaller, or even way smaller (32kB), block size?

Thank you.

Pierre ANCELOT.




--
http://www.neko-consulting.com
Ego sum quis ego servo
"Je suis ce que je protège"
"I am what I protect"














--
Best Wishes!
顺送商祺!

--
Chen He
(402)613-9298
PhD. student of CSE Dept.
Holland Computing Center
University of Nebraska-Lincoln
Lincoln NE 68588



