I see your point.

I have one more question. Suppose you are writing 10 blocks for a file and
the namenode fails while you are writing the 10th block. All of the previous
9 blocks are lost, because you were not able to close the file and therefore
the namenode never persisted the information about those 9 blocks to the
fsimage file.

How would you solve this problem? Why does the namenode not persist every
block as it is written, instead of waiting until the file is closed? And if
it waits, don't you need to keep a copy of all the data you are writing to a
Hadoop file until you have closed the file successfully? Do these semantics
make sense under the assumption of very large files?
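
To make the question concrete, this is roughly the pattern I have in mind
today: write each chunk as its own small file and only acknowledge it after
close() returns. This is an untested sketch against the org.apache.hadoop.fs
API, and the paths and sizes are made up for illustration; with a single
large file I would instead have to keep every earlier chunk around until the
final close.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ChunkWriter {

    // Write one chunk as its own file. The sender is only told to drop its
    // local copy after close() has returned, because the namenode does not
    // persist the file's block list until the file is closed.
    static void writeChunk(FileSystem fs, Path chunkPath, byte[] chunk)
            throws Exception {
        FSDataOutputStream out = fs.create(chunkPath);
        try {
            out.write(chunk);
        } finally {
            out.close();  // the chunk is only durable once this succeeds
        }
    }

    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        byte[] chunk = new byte[10 * 1024 * 1024];          // ~10MB client cache
        Path p = new Path("/incoming/user42/chunk-0009");   // illustrative path
        writeChunk(fs, p, chunk);
        // Only now would the server ack, letting the user free the cached chunk.
    }
}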

Thanks for your response,

Cagdas

On Fri, May 2, 2008 at 2:29 PM, Ted Dunning <[EMAIL PROTECTED]> wrote:

>
>
> First of all, your 10MB files will only be 10MB long and will take one
> block rather than 5.  This is a win already.
>
> Secondly, you can have a consolidation process that merges your small
> files every hour or day into large files.  By building the consolidated
> file in a side directory that is moved into the right place when it is
> complete, you can have a very short window when things don't look quite
> right.  If you use an external transactional store such as ZooKeeper to
> keep the state straight, you can guarantee data integrity.
>
> When the archive file capability is available, this will provide you with
> a very nice way to do this.
>
> This does require a bit of machinery, but it provides an order of magnitude
> or better improvement in Hadoop performance.  These two suggestions together
> can give you a >30x boost.
>
>
> On 5/2/08 2:21 PM, "Cagdas Gerede" <[EMAIL PROTECTED]> wrote:
>
> >> But you could do all this with larger blocks as well.  Having a large
> >> block size only says that a block CAN be that long, not that it MUST be
> >> that long.
> >
> > No, you cannot.
> >
> > Imagine a streaming server where users send real-time generated data to
> > your server and each file is no more than 100MB.  Let's assume a user
> > does not have more than 10MB of local cache space, so the user cannot
> > keep more than 10MB of data while generating it.  The user caches the
> > data and streams it to your server.  As each chunk of data accumulates,
> > your server writes that chunk to Hadoop, gets confirmation from Hadoop,
> > and sends an ack to the user so that the user can delete the data from
> > his cache (because the data is persisted).  This way you make the system
> > tolerant to the failure of your servers.
> >
> > How would you do the same thing with a block size of 100MB?
> > What am I missing?
> >
> > Cagdas
> >
> > On Fri, May 2, 2008 at 1:20 PM, Ted Dunning <[EMAIL PROTECTED]> wrote:
> >
> >>
> >> Also, you said that the average size was ~40MB (20 x 2MB blocks).  If
> >> that is so, then you should be able to radically decrease the number of
> >> blocks with a larger block size.
> >>
>
>
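
If I understand the consolidation suggestion above correctly, a periodic job
would copy the small chunk files into one large file in a side directory and
then rename it into place. Here is an untested sketch of that step (the
directory names are made up, and the ZooKeeper bookkeeping Ted mentions is
left out):

import java.io.InputStream;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class Consolidator {

    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());

        Path smallDir  = new Path("/incoming/2008-05-02");        // many small files
        Path sideFile  = new Path("/staging/2008-05-02.merging"); // built off to the side
        Path finalFile = new Path("/merged/2008-05-02");          // where readers look

        // Concatenate every small file into one large file in the side directory.
        FSDataOutputStream out = fs.create(sideFile);
        try {
            for (FileStatus stat : fs.listStatus(smallDir)) {
                InputStream in = fs.open(stat.getPath());
                try {
                    IOUtils.copyBytes(in, out, 65536, false);  // false: keep 'out' open
                } finally {
                    in.close();
                }
            }
        } finally {
            out.close();  // the merged file is only durable after this succeeds
        }

        // Move the finished file into its final place; the window where things
        // "don't look quite right" is limited to this single rename.
        fs.rename(sideFile, finalFile);
    }
}

The small files would only be deleted after the rename succeeds, and that is
presumably where an external store like ZooKeeper would record which inputs
have already been merged.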


-- 
------------
Best Regards, Cagdas Evren Gerede
Home Page: http://cagdasgerede.info
