Dominic Giampaolo <mailto:dgiampa...@apple.com>
December 14, 2017 at 9:17 AM

Which brings me to my first (and biggest) question: F_PEOFPOSMODE allocates from the 
"physical" end of file. What is the physical end of file?

It is the current size of the file as reported by stat() in the st_size field.
This is different from the amount of data allocated to the file, which is
returned in st_blocks.  The difference between st_blocks * fs-block-size and
st_size is how much the file has pre-allocated (well, you should round up
st_size to a multiple of the fs block size before doing the subtraction).

Thanks a million. That piece of information has helped me immensely. I have now (a) figured out what's going wrong with F_PREALLOCATE, (b) discovered that APFS and HFS+ treat F_PREALLOCATE differently, and (c) believe I've figured out a workaround.
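The arithmetic Dominic describes can be sketched in C as follows. This is a hypothetical helper, not code from either message. It assumes three things:

- The inputs come from fstat().
- The caller supplies the file system block size, for example from fstatfs() (not shown).
- st_blocks is counted in 512-byte units. That is how macOS reports the field, and it is why the logs in this message multiply it by 512.

```c
#include <assert.h>
#include <stdint.h>

/* st_blocks is reported in 512-byte units on macOS (and Linux). */
#define STAT_BLOCK_UNIT 512

/* Hypothetical helper: bytes allocated to a file beyond its logical size,
 * i.e. the space still pre-allocated past the "physical" end of file.
 * st_size is first rounded up to a multiple of the file system block size. */
int64_t prealloc_slack(int64_t st_size, int64_t st_blocks, int64_t fs_block_size)
{
    assert(fs_block_size > 0);
    int64_t allocated = st_blocks * STAT_BLOCK_UNIT;
    int64_t rounded_size =
        ((st_size + fs_block_size - 1) / fs_block_size) * fs_block_size;
    int64_t slack = allocated - rounded_size;
    /* A sparse file can have less allocated than its size; report 0 then. */
    return slack > 0 ? slack : 0;
}
```

Assume 4 KB blocks (an assumption here). After request #4,847 in the log below, st_size is 99,246,080 and st_blocks is 228,616. That leaves 17,805,312 bytes still pre-allocated.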

Let's say I have a 1MB file and request a 2MB preallocation. Afterwards, is the 
"physical" eof 1MB or 3MB? If I perform another 2MB preallocation will the preallocated 
space remain at 2MB or will it grow to 4MB? If the latter, how does one determine the 
"physical" end of file?

As noted above, the physical end of file is simply the size reported for the 
file.

But this is the crux of your issue: you're asking to grow the file, but the amount
you want to grow it by doesn't exceed the amount already pre-allocated for
the file, so APFS returns EINVAL (error 22).  Put another way, if (st_size +
amount_you_asked_to_allocate) < (st_blocks * fs_block_size), then APFS returns
EINVAL.
Agreed, and I think this is the root of my problem.
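Dominic's rule can be expressed as a small predicate. This sketch encodes only the condition stated above. It takes st_blocks in 512-byte units, as in the logs in this message. The function name is hypothetical, and APFS may well apply other checks that are not modeled here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical predicate for the APFS rule described above: a request is
 * rejected with EINVAL when st_size + length falls short of the space
 * already allocated to the file (st_blocks counts 512-byte units). */
bool apfs_would_reject(int64_t st_size, int64_t length, int64_t st_blocks)
{
    assert(st_size >= 0 && length >= 0 && st_blocks >= 0);
    int64_t allocated = st_blocks * 512;
    return st_size + length < allocated;
}
```

Checked against the logs, it gives these results:

- Request #2 (st_size 20,480, st_blocks 2,048) passes.
- A 1MB request after #4,847 is rejected. This assumes the usual 20K write in between, giving st_size 99,266,560 and st_blocks 228,616.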

#1 The APFS F_PREALLOCATE bug

Back to my bug report. I can now reproduce the bug in several different ways, and I also believe I understand (at least indirectly) why it's failing.

I modified my test code to perform an fstat() on the file before, and again after, each preallocation request. This let me compare the preallocation request (fst_length -> fst_bytesalloc) against the actual change in the file's allocated size (st_blocks).

My first successful failure came from requesting a 1MB preallocation on an empty file, writing a small (20K) block of data, then requesting a 1MB preallocation again, and repeating until it failed. Here's what happened:

           <------ before F_PREALLOCATE request ------>  <- request/result -->  <--------------- after request -------------->
        #        leof    st_size st_blocks       (*512)  fst_length      alloc     st_size st_blocks       (*512) blocks-delta
        1           0          0         0 (         0)     1048576    1048576           0      2048 (   1048576)      1048576
        2       20480      20480      2048 (   1048576)     1048576      20480       20480      2088 (   1069056)        20480
        3       40960      40960      2088 (   1069056)     1048576      20480       40960      2128 (   1089536)        20480
        4       61440      61440      2128 (   1089536)     1048576      20480       61440      2168 (   1110016)        20480

Before the first preallocation, the file is empty and st_blocks is 0. After a 1MB preallocation is requested, it changes to 2048 (1MB). This is as expected, and it agrees with the amount returned in fst_bytesalloc (the alloc column).

After writing 20K, another 1MB preallocation is requested. This time there is 1MB-20K of preallocated space still left, so F_PREALLOCATE allocates only the difference between what's already preallocated and the request, which is exactly 20K. The request reports that 20K was preallocated (fst_bytesalloc) which agrees with the change in st_blocks (from 2048 to 2088, or 20K).
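The "allocate only the difference" behavior can be written as a model. The model is inferred from the logged numbers, not from any documented APFS semantics, and the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Model inferred from the logs (not documented behavior): APFS extends the
 * allocation only up to st_size + length, so the amount it reports is the
 * shortfall between that target and what is already allocated. */
int64_t apfs_expected_alloc(int64_t st_size, int64_t length, int64_t st_blocks)
{
    assert(st_size >= 0 && length >= 0 && st_blocks >= 0);
    int64_t allocated = st_blocks * 512;  /* st_blocks is in 512-byte units */
    int64_t shortfall = st_size + length - allocated;
    return shortfall > 0 ? shortfall : 0;
}
```

Given the "before" columns from the logs in this message, this model reproduces the alloc column (fst_bytesalloc) for every row shown. That includes the 16384, 12288, 8192, 4096 and 0 values in the workaround run. What it cannot predict is the extra 16MB that APFS sometimes adds, which only shows up in st_blocks.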

This goes on for some time without any problems. Then, the surprise...

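           <------ before F_PREALLOCATE request ------>  <- request/result -->  <--------------- after request -------------->
        #        leof    st_size st_blocks       (*512)  fst_length      alloc     st_size st_blocks       (*512) blocks-delta
     4845    99205120   99205120    195768 ( 100233216)     1048576      20480    99205120    195808 ( 100253696)        20480
     4846    99225600   99225600    195808 ( 100253696)     1048576      20480    99225600    195848 ( 100274176)        20480
     4847    99246080   99246080    195848 ( 100274176)     1048576      20480    99246080    228616 ( 117051392)     16777216
    preallocation failed: errno=22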

Request #4,846 asks for 1MB, allocates 20K, the file's allocation grows by 20K, and all is good. Request #4,847 asks for exactly the same, but this time the preallocation reports that it allocated 20K when it actually allocated 16,777,216 bytes (16MB), judging by the change in st_blocks.

The next request for 1MB fails with errno 22 (EINVAL). As explained earlier, this is likely because st_size plus the amount requested (1MB) is now less than the space already allocated to the file.

Here's my theory: APFS applies its own preemptive preallocation logic based on how the file is being written. At some point it internally decides to preallocate more space so that all of those little write() calls won't produce a horribly fragmented file. But because F_PREALLOCATE on APFS now rejects requests that don't reach beyond what's already allocated, the next call fails.

I've found combinations of write and preallocation sizes that run indefinitely without any problem and others that meet a similar fate. I'll be uploading my bug report with this information shortly.

#2 HFS+ is different

I've also discovered, using the same code, that HFS+ treats F_PREALLOCATE differently. Specifically, it always adds whatever is requested to the file's allocation. Two successive 1MB preallocation requests will allocate 1MB of space on APFS, but 2MB on HFS+. This also explains why my code never failed on HFS+: it doesn't matter how much space has already been preallocated, because F_PREALLOCATE just adds to it.
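The difference between the two file systems can be sketched as two simplified models of the observed behavior. These are not documented semantics. They rest on these assumptions:

- Sizes are block-aligned.
- The extra space APFS sometimes allocates on its own is ignored.
- Where the real APFS call would fail with EINVAL, the APFS model simply keeps the current allocation.

Function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* APFS model: grow the allocation up to st_size + length, never shrink it.
 * Where the real call would fail with EINVAL (target below the current
 * allocation), this model just keeps the current allocation. */
int64_t apfs_allocated_after(int64_t st_size, int64_t allocated, int64_t length)
{
    assert(length >= 0);
    int64_t target = st_size + length;
    return target > allocated ? target : allocated;
}

/* HFS+ model: always add the requested length to the existing allocation. */
int64_t hfs_allocated_after(int64_t allocated, int64_t length)
{
    assert(length >= 0);
    return allocated + length;
}
```

Two successive 1MB requests against an empty file give 1MB under the APFS model and 2MB under the HFS+ model. That matches the behavior described above.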

#3 My workaround

My workaround, which seems to work under limited testing, is to not trust fst_bytesalloc to report how much space is/was actually allocated. By using fstat(), I can determine how much space is/was actually allocated and adjust my logic so I don't make "short" allocation requests. This leads to some strange allocation results (including requests that allocate nothing), but no errors so far:

           <------ before F_PREALLOCATE request ------>  <- request/result -->  <--------------- after request -------------->
        #        leof    st_size st_blocks       (*512)  fst_length      alloc     st_size st_blocks       (*512) blocks-delta
     5850   119787520  119787520    235968 ( 120815616)     1048576      20480   119787520    236008 ( 120836096)        20480
     5851   119808000  119808000    236008 ( 120836096)     1048576      20480   119808000    236048 ( 120856576)        20480
     5852   119828480  119828480    236048 ( 120856576)     1048576      20480   119828480    268816 ( 137633792)     16777216
     5853   136601600  136601600    268816 ( 137633792)     1048576      16384   136601600    301584 ( 154411008)     16777216
     5854   153374720  153374720    301584 ( 154411008)     1048576      12288   153374720    334352 ( 171188224)     16777216
     5855   170147840  170147840    334352 ( 171188224)     1048576       8192   170147840    367120 ( 187965440)     16777216
     5856   186920960  186920960    367120 ( 187965440)     1048576       4096   186920960    399888 ( 204742656)     16777216
     5857   203694080  203694080    399888 ( 204742656)     1048576          0   203694080    399888 ( 204742656)            0
     5858   203714560  203714560    399888 ( 204742656)     1048576      20480   203714560    432656 ( 221519872)     16777216
     5859   220487680  220487680    432656 ( 221519872)     1048576      16384   220487680    465424 ( 238297088)     16777216
     5860   237260800  237260800    465424 ( 238297088)     1048576      12288   237260800    498192 ( 255074304)     16777216
     5861   254033920  254033920    498192 ( 255074304)     1048576       8192   254033920    530960 ( 271851520)     16777216
     5862   270807040  270807040    530960 ( 271851520)     1048576       4096   270807040    563728 ( 288628736)     16777216
     5863   287580160  287580160    563728 ( 288628736)     1048576          0   287580160    563728 ( 288628736)            0

Bonus: this logic should work for both APFS and HFS+.


 _______________________________________________
Filesystem-dev mailing list      (Filesystem-dev@lists.apple.com)