Hi,

HDFS stores the data exactly as you write it. It will not reorganize the data.
HDFS does have flexibility in terms of block placement.
If you want to write in this fashion, a group of blocks would have to be
allocated at once, and the client would write to all of them according to your
partitioning. That sounds similar to a striping approach, which is not
supported in HDFS right now. It is being developed in the erasure coding
branch, HDFS-7285. Correct me if I misunderstood your needs here.
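
For illustration only, the striping idea can be modelled in a few lines of
Python. This is a toy model with lists standing in for blocks. It is not the
HDFS-7285 implementation and it calls no HDFS API; `stripe_write`,
`num_blocks` and `cell_size` are hypothetical names.

```python
def stripe_write(stream, num_blocks, cell_size=1):
    """Write `stream` round-robin across a group of blocks allocated up front.

    Each consecutive run of `cell_size` items (a "cell") goes to the next
    block in the group, wrapping around after the last block.
    """
    blocks = [[] for _ in range(num_blocks)]
    for start in range(0, len(stream), cell_size):
        cell_index = start // cell_size
        blocks[cell_index % num_blocks].extend(stream[start:start + cell_size])
    return blocks


# The 12 integers from the quoted example, striped over a group of 4 blocks.
striped = stripe_write(list(range(1, 13)), num_blocks=4)
```

With the 12 integers from the quoted example and a group of 4 blocks, this
yields (1,5,9), (2,6,10), (3,7,11), (4,8,12).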

Regards,
Uma

-----Original Message-----
From: Abhishek Das [mailto:abhishek.b...@gmail.com] 
Sent: Tuesday, February 17, 2015 11:14 PM
To: hdfs-dev
Subject: Re: Block creation in HDFS

Hi,

Thanks Vinay for your response. I don't need blocks of variable size, but
setting only the block size probably won't help in my case. Let me give an
example to explain what I am trying to do.

Let's say the main file has 12 integers, 1 to 12. The block size is such that
each block holds 3 integers. If I ask HDFS to create the blocks, it would
create 4 blocks: the first would have 1-3, the second 4-6, and so on.
According to my requirement, the data in the main file is partitioned into 3
clusters: (1,2,3,4), (5,6,7,8) and (9,10,11,12). When the blocks are
created, I need data from every partition to be represented in each block. So
in this case, the first block would have (1,5,9), the second (2,6,10),
etc. So I want to change how the data is allocated to each of the blocks.
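
To make the two layouts concrete, here is a small pure-Python sketch of the
example above. It models blocks as plain lists and does not touch any HDFS
API; the function names `sequential_blocks` and `interleaved_blocks` are made
up for illustration, and equal-sized partitions are assumed.

```python
def sequential_blocks(data, block_size):
    """Split data into consecutive blocks, as a plain sequential write fills them."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def interleaved_blocks(partitions):
    """Build blocks so that block k holds the k-th element of every partition.

    Assumes all partitions have the same length.
    """
    return [list(column) for column in zip(*partitions)]


data = list(range(1, 13))                  # the 12 integers 1..12
partitions = sequential_blocks(data, 4)    # three partitions of four integers

default_layout = sequential_blocks(data, 3)      # what sequential splitting gives
desired_layout = interleaved_blocks(partitions)  # what the requirement asks for
```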

Is it feasible to change the default block creation policy in the current
implementation?

Regards,
Abhishek Das

On Tue, Feb 17, 2015 at 2:25 AM, Vinayakumar B <vinayakum...@apache.org>
wrote:

> Hi Abhishek,
> Are your partitions all the same size? If yes, then you can set that size
> as the block size.
>
> If not, you can use the latest feature, variable block size, to verify
> your use case.
> You can close the current block after each partition's data is written
> and append to a new block for the next partition's data.
> This feature is not yet available in any release. We hope to see it
> in the upcoming 2.7 release. For now you can verify it in any of the
> trunk/branch-2 builds.
>
> Hope this helps.
>
> -Vinay
> On Feb 17, 2015 8:30 AM, "Abhishek Das" <abhishek.b...@gmail.com> wrote:
>
> > Hi,
> >
> > I am new to this group. I have a question regarding block creation in
> > HDFS. By default, a file is split into multiple blocks of size equal to
> > the block size. I need to introduce a new block creation policy into the
> > system. In my case the main file is divided into multiple partitions.
> > My goal is to create blocks in which data from each partition of the
> > file is represented. Is it possible to introduce such a policy? If yes,
> > what would be the starting point in the code that I should look at?
> >
> > Regards,
> > Abhishek Das
> >
>
