> I'm not currently a Gluster user but I'm hoping it's the answer to a
> problem I'm working on.
>
> I manage a private web site that is basically a reporting tool for
> equipment located at several hundred sites. Each site regularly uploads
> zipped XML files to a cloud based server and this also provides a web
> interface to the data using apache/PHP. The problem I need to solve is
> that with a single server disk I/O has become a bottleneck.
>
> The plan is to use a load balancer and multiple web servers with a
> 4-node Gluster volume behind to store the data. Data would be replicated
> over 2 nodes.
>
> The uploaded files are stored and then unzipped ready for reading by the
> web interface code. Each file is unzipped into a temporary file and then
> renamed, e.g.
>
> file1.xml.zip --unzip--> uniquename.tmp --rename--> file1.xml
>
> Use of the rename function makes these updates atomic.
>
> How can I achieve atomic updates in this way using a Gluster volume? My
> understanding is that renaming a file on a Gluster volume causes a link
> file to be created and that clearly wouldn't be appropriate where there
> are frequent updates.

Creating a file with one name and then renaming it to another *might*
cause creation of linkfiles, but I think concerns about linkfiles are
often overblown. The one extra call to create a linkfile isn't much
compared to the calls for creating the file, writing into it, and then
renaming it, even if the rename is local to one brick. What really
matters is the performance of the entire sequence, with or without the
linkfile.

That said, there's also a trick you can use to avoid creation of a
linkfile. Other tools, such as rsync and our own object interface, use
the same write-then-rename idiom. To serve them, there's an option
called extra-hash-regex that can be used to place files on the "right"
brick according to their final name even though they're created with
another. Unfortunately, specifying that option via the command line
doesn't seem to work (it creates a malformed volfile) so you have to
mount a bit differently. For example:

    glusterfs --volfile-server=a_server --volfile-id=a_volume \
        --xlator-option a_volume-dht.extra_hash_regex='(.*+)tmp' \
        /a/mountpoint

The important part is that second line. It causes any file with a "tmp"
suffix to be hashed and placed as though only the part matched by the
first parenthesized group of the regex (i.e. the name without the "tmp")
was there. Therefore, creating "xxxtmp" and then renaming it to "xxx" is
the same as just creating "xxx" in the first place as far as linkfiles
etc. are concerned. Note that the excluded part can be anything that a
regex can match, including a unique random number. If I recall, rsync
uses temp files something like this:

    fubar = .fubar.NNNNNN (where NNNNNN is a random number)

I know this probably seems a little voodoo-ish, but with a little bit of
experimentation to find the right regex you should be able to avoid
those dreaded linkfiles altogether.
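
For the experimentation part, here is a rough sketch of how you might
try out a candidate regex before mounting with it. It only models the
idea described above (hash the first parenthesized group when the regex
matches, otherwise the whole name). The two patterns and the
hashed_name() helper are made up for illustration and are not taken from
Gluster, and Python's "re" syntax is not guaranteed to behave the same
as the regex matcher Gluster uses, so treat the result as a sanity check
only.

```python
import re

# Hypothetical patterns for illustration; neither is taken from Gluster.
SUFFIX_TMP = re.compile(r"^(.*)tmp$")          # "xxxtmp" -> "xxx"
RSYNC_STYLE = re.compile(r"^\.(.+)\.[^.]+$")   # ".fubar.123456" -> "fubar"


def hashed_name(pattern, name):
    """Return the name that placement would be based on.

    Assumption: when the pattern matches, only the first parenthesized
    group is hashed; when it does not match, the full name is hashed.
    """
    m = pattern.match(name)
    return m.group(1) if m else name


if __name__ == "__main__":
    cases = [
        (SUFFIX_TMP, "file1.xmltmp", "file1.xml"),
        (RSYNC_STYLE, ".file1.xml.482913", "file1.xml"),
    ]
    for pattern, temp, final in cases:
        # The temp name and the final name should reduce to the same string.
        same = hashed_name(pattern, temp) == hashed_name(pattern, final)
        print(temp, "->", hashed_name(pattern, temp), "same placement:", same)
```

One thing this makes visible: the temporary name has to contain the
final name for the regex to recover it. A temp file called
"uniquename.tmp" gives the regex nothing to work with, so the unzip step
would need to write to something like "file1.xmltmp" or
".file1.xml.NNNNNN" instead.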