Jeff - 
I've paraphrased Tejas's response here - 
1. There is no way to know how big a file will be until the fclose() is 
received. 
2. What would we do about files whose size later crosses the cutoff line? 
3. We could perhaps add a size parameter to our existing rebalance/defrag 
scripts. 

Would a process that redistributes files on some sort of schedule work? 

Craig 



-- 
Craig Carl 



Gluster, Inc. 
Cell - (408) 829-9953 (California, USA) 
Gtalk - [email protected] 

----- Original Message ----- 
From: "Jeff Anderson-Lee" <[email protected]> 
To: "Craig Carl" <[email protected]> 
Cc: [email protected] 
Sent: Thursday, May 13, 2010 6:39:31 PM GMT -08:00 US/Canada Pacific 
Subject: Re: [Gluster-users] small files and cluster/stripe 

On 5/13/2010 6:24 PM, Craig Carl wrote: 


Jeff - 
Thanks for your email. I think I've got a grasp of your environment now and I 
understand the problem. If we create a "/gluster/small_files" and a 
"/gluster/large_files", your users are unlikely to respect the distinction, plus 
it is a management nightmare, right? 
If you have time I'd like your help writing a feature request that would 
implement what you need. Something like - 

Gluster should provide the option of distributing files based on size to 
different volumes. 
This distribution should be transparent to users. 
This distribution only needs to happen the first time a file is written. 
The Gluster administrator should have the ability to provide a file size range 
for each volume. 
The different volumes could be different types; mirror, stripe, mirror & 
distribute, etc. 
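
As a rough illustration of the size-range idea above, here is a minimal Python sketch, not Gluster code. The volume names reuse the "small_files"/"large_files" example from earlier in this message; the 128 KiB cutoff, the SIZE_RANGES table, and the volume_for_size() helper are all hypothetical and only show how an administrator-supplied range table might map a file size to a volume.

```python
from typing import List, Optional, Tuple

# Hypothetical administrator-supplied table:
# (volume name, minimum size inclusive, maximum size exclusive or None for no limit).
# The 128 KiB cutoff is an arbitrary value chosen for illustration.
SIZE_RANGES: List[Tuple[str, int, Optional[int]]] = [
    ("small_files", 0, 128 * 1024),
    ("large_files", 128 * 1024, None),
]


def volume_for_size(size: int,
                    ranges: List[Tuple[str, int, Optional[int]]] = SIZE_RANGES) -> str:
    """Return the name of the first volume whose size range contains `size`."""
    for name, low, high in ranges:
        if size >= low and (high is None or size < high):
            return name
    raise ValueError(f"no volume configured for files of {size} bytes")
```

For example, volume_for_size(4096) picks "small_files" and volume_for_size(10 * 1024 * 1024) picks "large_files". The sketch assumes the final size is already known, which is the difficult part in practice.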

What have I missed? 

Craig 

That would be one solution. I would target another that I suspect is probably 
simpler: 

Gluster should provide the option of pseudo-randomizing the distribution of 
file stripes across volumes, so that all small files do not end up on the same 
subvolume of a cluster/stripe. 
This distribution should be transparent to users. 
This distribution only needs to happen the first time a file is written and may 
be based on the file name hash (a la cluster/distribute). 

The net behavior could be such that small files (less than the stripe 
block-size) would have the same data distribution pattern as they would have 
with cluster/distribute, while larger files (greater than the stripe 
block-size) would have their upper blocks distributed round-robin from that 
starting place. 

Given that the code already exists for distributing files based on name hash in 
cluster/distribute, I think this could be an easier feature to add. 
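
The placement rule described above could be sketched roughly as follows. This is a minimal Python illustration, not Gluster code: zlib.crc32 is only a stand-in for whatever hash cluster/distribute actually uses, and the function names are hypothetical.

```python
import zlib


def start_subvolume(filename: str, n_subvols: int) -> int:
    """Pick the subvolume holding block 0, based on a hash of the file name."""
    # crc32 is a stand-in hash chosen for illustration only.
    return zlib.crc32(filename.encode("utf-8")) % n_subvols


def subvolume_for_block(filename: str, block_index: int, n_subvols: int) -> int:
    """Place later stripe blocks round-robin, starting from the hashed subvolume."""
    return (start_subvolume(filename, n_subvols) + block_index) % n_subvols


def subvolume_for_offset(filename: str, offset: int,
                         block_size: int, n_subvols: int) -> int:
    """Map a byte offset within the file to the subvolume holding that block."""
    return subvolume_for_block(filename, offset // block_size, n_subvols)
```

With this rule a file smaller than the stripe block-size lives entirely on start_subvolume(), which varies with the file name, while a larger file wraps around all subvolumes beginning from that same point.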

Jeff 

_______________________________________________
Gluster-users mailing list
[email protected]
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
