On Fri, Nov 09, 2007 at 04:27:49AM -0500, Daniel Ouellet wrote:
> >>Any clue as to how to tackle this problem, or any trick around it?
> >
> >I really do not understand the problem here. But you might be able to
> >detect sparse files by comparing the size vs the number of blocks it
> >uses.
> 
> Without making a big write-up out of it: let's say that the problem
> is, for now, a storage capacity problem on the destination servers, a
> timing problem in the extended transfer process, and the additional
> bandwidth required at some of the destination points, given the
> volume of files. Let's just say that if it were syncing 100K files it
> would be a piece of cake, but it's much bigger than that.
> 
> Just as an example, a badly sparse source file doesn't really have
> its disk blocks allocated yet, but when copied over via scp or rsync
> it will actually use that space on the destination servers. All the
> servers are identical (or supposed to be, anyway), but what is
> happening is that the copies run out of space at times during the
> copy process. While copying them it may easily use twice the amount
> of space, sadly filling up the destinations; then the sync process
> stops, making the distribution of the load unusable. I need to
> increase the capacity, yes, except that it will take me time to do
> so.
> 
> Sparse files are a very good thing for databases, for example, but
> not for everything.
> 
> The problem is not the sparse file at the source. It can certainly
> stay as is; it's just offset pointers anyway.
> 
> The problem is in the sync process between multiple servers over the
> Internet: the bandwidth wasted, as well as the lack of space
> available at the destination. Plus, because the copy is different in
> size, the sync process sees them as different files and as such will
> copy them again.

The size will not be different, just the disk space used.
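(As a quick illustration of that difference, the apparent size and the
space actually allocated can be compared per file along these lines;
the path below is only an example.)

	# apparent size vs. bytes actually allocated on disk
	# (st_blocks is counted in 512-byte units)
	$ eval "$(stat -s /var/www/data/example.db)"
	$ echo "size: ${st_size} bytes, allocated: $((st_blocks * 512)) bytes"

A sparse file reports a large size but far fewer allocated bytes; a
fully written-out copy reports roughly equal numbers, which is why the
copies eat so much more disk at the destination.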
> Or it can be copied using -S with rsync; however, this process will
> inflate the file at the destination and run out of space during the
> process, only making them smaller at the end. Plus this obviously
> takes a lot more time, and as such the timely sync process that was
> good for a long time now is, well... let's say, not reliable. Let's
> say a sync without concern for sparse files is done in just a few
> minutes, but then uses lots more space on the destination. Doing it
> with -S to address the capacity issue fixes that, but then it takes a
> HUGE amount more time, and sadly there is useless transfer of null
> data caused by the empty space in the sparse source.

So your problem seems to be that rsync -S is inefficient to the point
where it is not usable. I do not use rsync a lot, so I do not know if
there's a solution to that problem. It does seem strange that a
feature meant to solve a problem actually makes the problem worse.

> I can manage: I find ways to use ls -laR or du -k, do diffs between
> them, find the files that are getting out of whack, replace them and
> then continue, but this really is painful.

stat -s gives the raw info in one go. Some shell script hacking should
make it easy to detect sparse files.

	-Otto

> Obviously when the capacity is there it will be a non-issue, however
> I am sadly not at that point yet and it will take me some time.
> 
> Not sure if that explains it any better; I hope so.
> 
> But I was looking to see if it was possible to identify these files
> in a more efficient way.
> 
> If not, I will just deal with it.
> 
> It's just going to be painful for some time, that's all.
> 
> The issue is really in the transfer process and at the final
> destination, not at the source.
> 
> I hope it makes more sense explaining it this way; if not, I
> apologize for the lack of better thinking at the moment in explaining
> it.
> 
> Best,
> 
> Daniel
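(Following up on the stat -s suggestion above, here is a minimal
sketch of that kind of shell script hacking, assuming stat(1) with -s
output as on OpenBSD; the starting directory is only an example.)

	#!/bin/sh
	# Walk a tree and report files whose allocated blocks cover less
	# than their apparent size, i.e. likely sparse files.
	dir=${1:-/var/www}
	find "$dir" -type f | while read -r f; do
		# stat -s emits shell assignments: st_size=..., st_blocks=...
		eval "$(stat -s "$f")"
		alloc=$((st_blocks * 512))	# st_blocks is in 512-byte units
		if [ "$alloc" -lt "$st_size" ]; then
			echo "$f: size=$st_size allocated=$alloc"
		fi
	done

Something like that should at least identify the candidates in one
pass, without diffing full ls -laR or du -k listings between servers.
(Filenames containing newlines would confuse the read loop.)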