On 08/18/2017 07:43 PM, Josef Bacik wrote:
> On Fri, Aug 18, 2017 at 06:23:18PM +0200, Goffredo Baroncelli wrote:
>> On 08/18/2017 01:39 AM, Josef Bacik wrote:
>> [...]
>>> This is happening because the app (the guest OS in this case; we saw this a lot
>>> with Windows guests) is changing the pages while they are in flight.  We
>>> calculate the checksum of the page before it's written, so if it changes while
>>> in flight we'll end up with a csum mismatch.
>>>
>>> To fix this, change kvm to not use O_DIRECT, or set NODATASUM on your qcow2 image.
>>> You'll have to re-create the image because NODATASUM won't apply to the already
>>> invalid checksums.  Thanks,
>>
>> Hi Josef,
>>
>> could you elaborate: are you saying that using O_DIRECT is incompatible with
>> DATASUM?
>>
> 
> No, I'm saying that using O_DIRECT with applications that don't protect in-flight
> memory is incompatible with DATASUM.

This is what I call an 'incompatibility'. Even if it is a "corner" case, it is
still an incompatibility. And to be honest, it is hard to call a "VM" a "corner"
case.

> We have no way of making sure nobody touches the page while we're writing it out,
> so after we calculate the checksum any changes to the page are going to cause a
> checksum mismatch.  O_DIRECT pages are user space pages; there's nothing we can
> do to stop user space from doing stupid things.
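The ordering problem described above can be modelled in a few lines. This is a minimal user-space sketch, not btrfs code: zlib.crc32 stands in for the real data checksum (btrfs uses crc32c by default), and simulate_direct_write, leave_alone and scribble are hypothetical names. It only shows that the checksum is taken from the page first and the device copies the page later, so any change in between yields a mismatch on read-back.

```python
import zlib


def checksum(data: bytes) -> int:
    # Stand-in only: btrfs defaults to crc32c, zlib.crc32 is plain CRC-32.
    return zlib.crc32(data) & 0xFFFFFFFF


def simulate_direct_write(page: bytearray, app_touches_page) -> tuple:
    """Model an O_DIRECT write whose checksum is taken before the device copy.

    Returns (stored_csum, csum_of_what_reached_disk)."""
    stored_csum = checksum(bytes(page))  # 1. csum computed from the user page
    app_touches_page(page)               # 2. the app/guest may write to the page meanwhile
    on_disk = bytes(page)                # 3. the device copies the page as it is *now*
    return stored_csum, checksum(on_disk)


def leave_alone(page: bytearray) -> None:
    pass  # well-behaved app: does not touch the buffer while the write is in flight


def scribble(page: bytearray) -> None:
    page[0] ^= 0xFF  # misbehaving app: modifies the buffer while it is in flight


if __name__ == "__main__":
    stored, actual = simulate_direct_write(bytearray(4096), leave_alone)
    print(stored == actual)  # True
    stored, actual = simulate_direct_write(bytearray(4096), scribble)
    print(stored == actual)  # False: a later read would report a csum mismatch
```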

I understand the technical difficulties; however, I can't agree about "user
space [...] doing *stupid* things". If it is not explicitly forbidden, it is
legal, not "stupid".

How does the application know that the pages aren't in flight anymore? Is it
sufficient to wait for the end of the write() syscall? Or does it have to wait
for the end of an fsync()?
 
> The options I looked into before were things like detecting the page had changed
> since we calculated the checksum, and re-submitting the write.  This punishes
> applications that do the right thing (databases for example) by forcing us to
> calculate checksums twice.
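The detect-and-resubmit option can be sketched as follows. This is a hypothetical user-space model, not the code Josef looked at: SimulatedDevice and write_with_recheck are invented names, zlib.crc32 stands in for crc32c, and the device copy is treated as atomic. It shows the cost he mentions: even a well-behaved application pays two checksums per write instead of one.

```python
import zlib


def checksum(data: bytes) -> int:
    # Stand-in only: btrfs defaults to crc32c, zlib.crc32 is plain CRC-32.
    return zlib.crc32(data) & 0xFFFFFFFF


class SimulatedDevice:
    """Hypothetical device model: copies the page on submit, and lets a
    'guest' modify the page right after the copy for the first N submits."""

    def __init__(self, scribbles: int = 0):
        self.scribbles = scribbles

    def submit(self, page: bytearray) -> bytes:
        data = bytes(page)  # what actually reaches the disk
        if self.scribbles > 0:
            self.scribbles -= 1
            page[0] = (page[0] + 1) % 256  # guest changes the page in flight
        return data


def write_with_recheck(page: bytearray, dev: SimulatedDevice, max_retries: int = 3):
    """Checksum, submit, re-checksum; resubmit if the page changed.

    Returns (on_disk, csum, checksums_computed). A page that changes and then
    changes back (A-B-A) during submit would slip past this check."""
    computed = 0
    for _ in range(max_retries + 1):
        before = checksum(bytes(page))
        on_disk = dev.submit(page)
        after = checksum(bytes(page))
        computed += 2
        if before == after:
            return on_disk, before, computed
    raise RuntimeError("page kept changing while in flight")
```

With SimulatedDevice(0) (a well-behaved application) the write succeeds after two checksums; with SimulatedDevice(1) it is resubmitted once, four checksums are computed, and the stored checksum matches what reached the disk.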

Are there other cases where the same problem can happen? Is it the same for
mmap()?

> 
> This is a shit situation because users aren't going to understand this
> limitation, and it bites them in the ass with all these weird errors.  I think
> maybe we need to go back to the double-checksum thing by default, and have a
> flag or something for users to set if they know their application behaves
> properly.  

Or... disable checksums for O_DIRECT writes... If you can't trust the checksums
100%, they don't make sense.
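A minimal sketch of what this proposal trades, assuming a hypothetical per-write policy flag (skip_csum_for_direct is an invented name, not a btrfs option) and using zlib.crc32 as a stand-in checksum. Extents written this way can no longer produce false mismatches, but later corruption of those extents is also returned without complaint, much as with a NODATASUM file.

```python
import errno
import zlib
from typing import Optional


def checksum(data: bytes) -> int:
    # Stand-in only: btrfs defaults to crc32c, zlib.crc32 is plain CRC-32.
    return zlib.crc32(data) & 0xFFFFFFFF


def store(data: bytes, o_direct: bool, skip_csum_for_direct: bool) -> dict:
    """Hypothetical policy: store no checksum for O_DIRECT writes if asked to."""
    csum: Optional[int] = None
    if not (o_direct and skip_csum_for_direct):
        csum = checksum(data)
    return {"data": bytearray(data), "csum": csum}


def read(extent: dict) -> bytes:
    """Verify against the stored checksum when there is one."""
    data = bytes(extent["data"])
    if extent["csum"] is not None and checksum(data) != extent["csum"]:
        raise OSError(errno.EIO, "csum mismatch")
    return data


def flip_first_byte(extent: dict) -> None:
    extent["data"][0] ^= 0xFF  # later media corruption, unrelated to O_DIRECT


if __name__ == "__main__":
    unchecked = store(bytes(4096), o_direct=True, skip_csum_for_direct=True)
    flip_first_byte(unchecked)
    print(read(unchecked)[0])  # 255: corrupted data returned without complaint

    checked = store(bytes(4096), o_direct=True, skip_csum_for_direct=False)
    flip_first_byte(checked)
    try:
        read(checked)
    except OSError as e:
        print(e.errno == errno.EIO)  # True
```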

> 
> Josef
> 


-- 
gpg @keyserver.linux.it: Goffredo Baroncelli <kreijackATinwind.it>
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5