Great work, thank you for tracking this down Nicolas!

Best,
-Nikolaus

--
GPG Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F

             »Time flies like an arrow, fruit flies like a Banana.«



On Thu, 17 Sep 2020, at 17:26, Nicolas Deschildre wrote:
> Hello,
> 
> Ok I found it!! (Or at least, one of them).
> And a sure way to reproduce the crash.
> All this time I was looking at a async/await bug... but it was finally 
> another kind of concurrency issue :-)
> 
> The bug :
> If you move a file from a folder to another, and the inode cache happens to 
> be in a very specific condition (file to be copied uncached, destination 
> folder in cache about to be evicted on next inode cache miss, origin folder 
> in cache about to be evicted next after).
> --> fs.py:654 Reduce a lot the occurence of this bug by ensuring the origin 
> and destination folder are in inode cache
> --> fs.py:658 self._lookup() load the inode of the file to be copied, 
> evicting the destination folder inode.
> --> fs.py:719 self.inodes[id_p_old] Load the origin folder which was still in 
> cache
> --> fs.py:720 self.inodes[id_p_new] Load the destination folder, evicts the 
> origin folder inode.
> --> fs.py:721/723 Changes are made on the origin folder inode, which is no 
> longer handled by the inode cache. We have the bug.
> 
> Fix here : https://github.com/s3ql/s3ql/pull/199
> 
> Thanks for the pointers!
> 
> Nicolas Deschildre
> 
> On Friday, September 11, 2020 at 8:50:24 AM UTC+2 Nicolas Deschildre wrote:
>> Hello,
>> 
>> Thanks, Ok, I now understand the code better.
>> InodeCache holds 100 Inodes as cache in a ring. When a new inode not in 
>> cache is requested, an inode in the cache is flushed, and the new inode is 
>> stored in the cache instead.
>> The bug : Race condition : 2 inodes are requested from InodeCache at the 
>> same time. Thread 1 requesting inode 1 flush and remove inode 2 from cache. 
>> Thread 2 got inode 2 before it was removed from cache by Thread 1, but makes 
>> changes after it was removed and flushed by Thread 1. Thead 2 ends, there 
>> are no longer references to inode 2, python garbage collect it, and this 
>> trigger the bug.
>> 
>> I see 2 possibles solutions : 
>> 1/ https://github.com/s3ql/s3ql/pull/196 : The _Inode class keeps a 
>> reference to the InodeCache. On __del__, if we encouter the above bug, we 
>> flush ourselves. The problem (and I'm not familiar enough with python) : I 
>> guess garbage collection could happen at shutdown, when the InodeCache SQL 
>> connection is no longer valid. Do you see a way to make this approach work?
>> 2/ (Not coded yet) : The InodeCache is a ring, first in first out. What if 
>> we store access time on InodeCache.__getitem__, and the inode to be removed 
>> is the most old accessed one? This solution should reduce a lot (but not 
>> eliminate) the race condition. What do you think?
>> 
>> Finally : I tried to reproduce locally, with a unencrypted, uncompressed 
>> local backend, with mass parralel attribute changes, but I was not able to 
>> reproduce the bug. 
>> 
>> Thanks,
>> Nicolas Deschildre
>> 
>> On Wednesday, September 9, 2020 at 5:50:19 PM UTC+2 dan...@jagszent.de wrote:
>>> Hello Nicolas,
>>>  
>>> 
>>>> 
>>>>> S3QL somehow manages to delete/garbage collects an _Inode object that is 
>>>>> dirty (i.e. has an attribute modification that is not persisted in the 
>>>>> S3QL database) 
>>>> 
>>>> So, if I understand correctly, since it is a pending modification on a now 
>>>> deleted inode, this is not really a problem, right? Said otherwise, the 
>>>> filesystem is not corrupted? [...]
>>> 
>>> No, the inode/file does not have to be deleted. There is a pending metadata 
>>> modification (access time, modification time, file size, file mode) that 
>>> should have been persisted (written in the Sqlite-Database) but it did not 
>>> made it in the database.
>>> File data is not corrupted, but some metadata of your files might not be in 
>>> the correct state.
>>> 
>>> 
> 

> --
> You received this message because you are subscribed to the Google Groups 
> "s3ql" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to s3ql+unsubscr...@googlegroups.com.
> To view this discussion on the web visit 
> https://groups.google.com/d/msgid/s3ql/e1882d14-f582-42ce-8468-1d07d0eada3cn%40googlegroups.com
>  
> <https://groups.google.com/d/msgid/s3ql/e1882d14-f582-42ce-8468-1d07d0eada3cn%40googlegroups.com?utm_medium=email&utm_source=footer>.

-- 
You received this message because you are subscribed to the Google Groups 
"s3ql" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to s3ql+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/s3ql/c482cb9e-a295-45d6-b4c8-8137029da6f3%40www.fastmail.com.

Reply via email to