Re: Massive loss of disk space

Goffredo Baroncelli Thu, 03 Aug 2017 09:38:12 -0700

On 2017-08-03 13:39, Austin S. Hemmelgarn wrote:
> On 2017-08-02 17:05, Goffredo Baroncelli wrote:
>> On 2017-08-02 21:10, Austin S. Hemmelgarn wrote:
>>> On 2017-08-02 13:52, Goffredo Baroncelli wrote:
>>>> Hi,
>>>>
>> [...]
>>
>>>> consider the following scenario:
>>>>
>>>> a) create a 2GB file
>>>> b) fallocate -o 1GB -l 2GB
>>>> c) write from 1GB to 3GB
>>>>
>>>> after b), the expectation is that c) always succeeds [1]: i.e. there is 
>>>> enough space on the filesystem. Due to the COW nature of BTRFS, you cannot 
>>>> rely on the already allocated space because there could be a small time 
>>>> window where both the old and the new data exist on the disk.
>>
>>> There is also an expectation based on pretty much every other FS in 
>>> existence that calling fallocate() on a range that is already in use is a 
>>> (possibly expensive) no-op, and by extension using fallocate() with an 
>>> offset of 0 like a ftruncate() call will succeed as long as the new size 
>>> will fit.
>>
>> The man page of fallocate doesn't guarantee that.
>>
>> Unfortunately, in a COW filesystem the assumption that an allocated area may 
>> simply be overwritten is not true.
>>
>> Let me say it in other words: as a general rule, if you want to _write_ 
>> something in a COW filesystem, you need space. It doesn't matter whether you 
>> are *over-writing* existing data or *appending* to a file.
> Yes, you need space, but you don't need _all_ the space.  For a file that 
> already has data in it, you only _need_ as much space as the largest chunk of 
> data that can be written at once at a low level, because the moment that 
> first write finishes, the space that was used in the file for that region is 
> freed, and the next write can go there.  Put a bit differently, you only need 
> to allocate what isn't allocated in the region, and then a bit more to handle 
> the initial write to the file.
> 
> Also, as I said below, _THIS WORKS ON ZFS_.  That immediately means that a 
> CoW filesystem _does not_ need to behave like BTRFS does.


It seems that ZFS on Linux doesn't support fallocate; see

https://github.com/zfsonlinux/zfs/issues/326

So I think you are referring to posix_fallocate() on ZFS on Solaris, which 
I can't test, so I can't comment.
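
For anyone who wants to try the a)-c) scenario quoted above on their own
filesystem, here is a minimal sketch, not a definitive test. It assumes Python 3
on a Unix system; the function name run_scenario and the `unit` parameter
(standing in for 1GB, so the sketch stays small) are made up for illustration.
It uses posix_fallocate() rather than the Linux-specific fallocate(2), and
whether that call really reserves space depends on the filesystem and the libc.

```python
import os
import tempfile

MIB = 1024 * 1024


def run_scenario(directory=None, unit=MIB):
    """Replay steps a) to c), with `unit` standing in for 1GB.

    Returns (preallocated, final_size).
    """
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        # a) create a file of 2 units
        os.write(fd, b"a" * (2 * unit))
        os.fsync(fd)

        # b) ask for 2 units starting at offset 1 unit
        try:
            os.posix_fallocate(fd, unit, 2 * unit)
            preallocated = True
        except OSError:
            # The call failed on this filesystem (unsupported or no space).
            preallocated = False

        # c) write from 1 unit to 3 units: half overwrite, half append
        os.pwrite(fd, b"c" * (2 * unit), unit)
        os.fsync(fd)

        return preallocated, os.fstat(fd).st_size
    finally:
        os.close(fd)
        os.unlink(path)
```

Calling run_scenario("/mnt/test") on a nearly full filesystem (the path is only
an example) shows whether step b) is accepted and whether step c) then fails
with ENOSPC.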

[...]
>> In terms of a COW filesystem, you need the space of a) + the space of b)
> No, that is only required if the entire file needs to be written atomically.  
> There is some maximal size atomic write that BTRFS can perform as a single 
> operation at a low level (I'm not sure if this is equal to the block size, or 
> larger, but it doesn't matter much; either way, I'm talking about the largest 
> chunk of data it will write to a disk in a single operation before updating 
> metadata to point to that new data). 

To the best of my knowledge there is only a time limit: IIRC a transaction is 
closed every 30 seconds. If you are able to fill the filesystem within this 
time window, you are in trouble.
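
As a back-of-the-envelope illustration of that time window, here is a rough
sketch. The function name and the numbers are hypothetical, and it assumes a
constant write rate and that no space from overwritten data is released before
the transaction closes, which is a simplification.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2


def fills_within_commit(free_bytes, write_rate, interval=30):
    """True if writing at `write_rate` bytes/s for `interval` seconds
    would consume at least `free_bytes` of free space."""
    return write_rate * interval >= free_bytes


# Hypothetical example: 500 MiB/s sustained writes, 30 s window.
tight = fills_within_commit(10 * GIB, 500 * MIB)   # 10 GiB free
roomy = fills_within_commit(100 * GIB, 500 * MIB)  # 100 GiB free
```

With these made-up figures about 14.6 GiB can be written in one window, so the
10 GiB case is at risk while the 100 GiB case is not.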

[...]

-- 
gpg @keyserver.linux.it: Goffredo Baroncelli <kreijackATinwind.it>
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5