Re: [Gluster-users] small files and cluster/stripe

Jeff Anderson-Lee Fri, 14 May 2010 16:37:46 -0700

On 5/14/2010 4:20 PM, Craig Carl wrote:

Jeff -
   I've paraphrased Tejas's response here -
1. There is no way to know how big a file will be until the fclose() is received.
2. What would we do about files that change sizes across the cutoff line?
3. We could perhaps add a size parameter to the rebalance/defrag scripts we have.
Would a process that redistributed the file on some sort of a schedule work?

All these reasons are ones that would lead me *not* to try a big-file/small-file distribution scheme. Combining a distributed (hash-based) offset with file striping makes much more sense to me. It doesn't work well for hard links or simple rename, but it makes the rest simpler.


Jeff

Craig

--
Craig Carl
Gluster, Inc.
Cell - (408) 829-9953 (California, USA)
Gtalk - [email protected]


----- Original Message -----
From: "Jeff Anderson-Lee" <[email protected]>
To: "Craig Carl" <[email protected]>
Cc: [email protected]
Sent: Thursday, May 13, 2010 6:39:31 PM GMT -08:00 US/Canada Pacific
Subject: Re: [Gluster-users] small files and cluster/stripe

On 5/13/2010 6:24 PM, Craig Carl wrote:

    Jeff -
        Thanks for your email, I think I've got a grasp of your
    environment now and I understand the problem. If we create a
    "/gluster/small_files" and a "/gluster/large_files" your users are
    unlikely to respect the distinction, plus it is a management
    nightmare, right?
    If you have time I'd like your help writing a feature request that
    would implement what you need.  Something like -

    Gluster should provide the option of distributing files based on
    size to different volumes.
    This distribution should be transparent to users.
    This distribution only needs to happen the first time a file is
    written.
    The Gluster administrator should have the ability to provide a
    file size range for each volume.
    The different volumes could be different types; mirror, stripe,
    mirror & distribute, etc.

    What have I missed?

    Craig
That would be one solution. I would target another that I suspect is probably simpler:
Gluster should provide the option of pseudo-randomizing the distribution of file stripes across volumes, so that all small files do not end up on the same subvolume of a cluster/stripe.
This distribution should be transparent to users.
This distribution only needs to happen the first time a file is written and may be based on the file name hash (a la cluster/distribute).
The net behavior could be such that small files (less than the block-size) would have the same data distribution pattern as they would have with cluster/distribute, while larger files (greater than the stripe block-size) would have their upper blocks distributed in a round-robin fashion from that starting place.
Given that the code already exists for distributing files based on name hash in cluster/distribute, I think this could be an easier feature to add.
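A rough sketch of the placement rule described above, in Python rather than translator code. This is only an illustration under stated assumptions: the function names are hypothetical, and zlib.crc32 is a stand-in for whatever name hash cluster/distribute actually uses.

```python
import zlib


def start_subvolume(name: str, n_subvols: int) -> int:
    """Pick the subvolume that holds block 0 of a file.

    Assumption: crc32 of the file name stands in for the real
    cluster/distribute name hash, which is not reproduced here.
    """
    return zlib.crc32(name.encode("utf-8")) % n_subvols


def subvolume_for_block(name: str, block_index: int, n_subvols: int) -> int:
    """Round-robin from the hashed starting subvolume."""
    return (start_subvolume(name, n_subvols) + block_index) % n_subvols


def layout(name: str, size: int, block_size: int, n_subvols: int) -> list:
    """Return the subvolume index for every stripe block of a file."""
    # Ceiling division; an empty file still occupies one (empty) block.
    n_blocks = max(1, -(-size // block_size))
    return [subvolume_for_block(name, i, n_subvols) for i in range(n_blocks)]


if __name__ == "__main__":
    # A file smaller than the block size lands on a single, hash-chosen
    # subvolume, as it would with a distribute-style placement.
    print(layout("notes.txt", 4096, 131072, 4))
    # A larger file starts at its hashed subvolume and wraps around.
    print(layout("video.bin", 5 * 131072, 131072, 4))
```

With this rule, files smaller than one block spread across subvolumes according to their name hash instead of all landing on the first subvolume, while larger files are still striped across every subvolume. Because the start depends on the name, a rename or hard link would point at a different starting subvolume, which is the weakness noted for this approach.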
Jeff

_______________________________________________
Gluster-users mailing list
[email protected]
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
