On 04.08.2014 at 18:28, Hager, Roland <[email protected]> wrote:

> [...]
> The error occurs on all webservers and for different users. It seems that 
> some users (~100) are affected more often, while the majority (~6,500) is 
> not affected at all. Affected users tend to be quite active, having more 
> than several thousand files / directories, some of them even several 
> hundred thousand. Affected users seem to have entries in oc_filecache with 
> a size of "-1", which also leads to a wrong quota being displayed, and in 
> general fewer entries than there are files in the filesystem. After running 
> "./occ files:scan" for the affected users, the size and the number of 
> entries were corrected and the error messages disappeared for some time, 
> but came back after a while. There seem to be certain conditions triggering 
> those errors, but I could not reproduce or identify them yet. Maybe some 
> long-running background tasks (file scans?) are interfering with user 
> actions? The "./occ files:scan" command took up to 30-60 minutes for some 
> users.
> [...]
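
(Side note: the "-1" sizes mentioned above can be listed straight from the
database. A rough, untested sketch in Python, assuming the MySQLdb module, a
MySQL backend, the default "oc_" table prefix and made-up credentials:)

    import MySQLdb

    # placeholder credentials - adjust for your setup
    db = MySQLdb.connect(host="localhost", user="owncloud",
                         passwd="secret", db="owncloud")
    cur = db.cursor()

    # count oc_filecache entries with size = -1, grouped per storage
    cur.execute("""
        SELECT s.id, COUNT(*) AS entries
          FROM oc_filecache f
          JOIN oc_storages s ON s.numeric_id = f.storage
         WHERE f.size = -1
         GROUP BY s.id
         ORDER BY entries DESC
    """)
    for storage_id, entries in cur.fetchall():
        print("%s\t%s" % (storage_id, entries))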

I just checked how many users are really affected. In the last 34 hours, 43 
users had one or more "duplicate key" errors. That is less than 1% of our 
users, but still more than you want to have on the phone.
I also have new feedback from an affected user. He was uploading multiple 
files into several folders with his sync client. Within one folder he got 
the "duplicate key" error for four files. After some time those files got 
synced without any further action by an admin. So "./occ files:scan" did 
clean up the user's filecache, but it might have been unnecessary.

My guess, for now, is the following cascade (a toy simulation follows below 
the list):
        1. A file is about to be uploaded, scanned or something similar.
        2. A cache entry has to be updated or inserted.
        3. A SQL query is executed to check whether that cache entry already 
           exists.
        4. This query is aborted because of a deadlock, OR there is no entry 
           yet, but another process creates it before the next step.
        5. Since there is no row in the result set, OC assumes the cache 
           entry does not exist.
        6. An INSERT query is executed, which leads to a duplicate key error.
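
In code, the gap between steps 3 and 6 would look roughly like this. It is
only a toy simulation (Python/sqlite3, one connection standing in for two
processes), not ownCloud's actual PHP code, and the (storage, path_hash)
unique key is my assumption about which key the "duplicate entry" refers to:

    import sqlite3

    db = sqlite3.connect(":memory:")
    # stripped-down stand-in for oc_filecache with its unique key
    db.execute("CREATE TABLE filecache (storage INTEGER, path_hash TEXT,"
               " size INTEGER, UNIQUE (storage, path_hash))")

    # step 3: process A checks whether the entry exists - it does not yet
    a_sees = db.execute("SELECT 1 FROM filecache"
                        " WHERE storage = 1 AND path_hash = 'abc'").fetchone()

    # step 4: before A continues, process B creates the very same entry
    db.execute("INSERT INTO filecache VALUES (1, 'abc', -1)")

    # steps 5/6: A still believes the entry is missing and runs its INSERT
    if a_sees is None:
        try:
            db.execute("INSERT INTO filecache VALUES (1, 'abc', 0)")
        except sqlite3.IntegrityError as e:
            print("duplicate key: %s" % e)   # the error we see in the logs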

But if a query is aborted because of a deadlock, shouldn't that lead to an 
exception and interrupt the normal code flow?
What about the case I mentioned in my last mail, where the "duplicate entry" 
actually did not exist right after the error message but was created some 
minutes later? Is it possible that two or more processes are trying to 
create the same cache entry, one of them has already created it, but both 
somehow end up in a deadlock situation where this entry is discarded 
afterwards?
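
If it really is two writers racing on the same key, the usual way out on the
writing side would be to treat the duplicate key error as "someone else just
created it" and fall back to an update (or use an atomic insert-or-update).
Very rough sketch of the pattern, again Python/sqlite3 with the toy table
from above, not a patch against the OC code:

    import sqlite3

    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE filecache (storage INTEGER, path_hash TEXT,"
               " size INTEGER, UNIQUE (storage, path_hash))")

    def put_cache_entry(conn, storage, path_hash, size):
        try:
            conn.execute("INSERT INTO filecache (storage, path_hash, size)"
                         " VALUES (?, ?, ?)", (storage, path_hash, size))
        except sqlite3.IntegrityError:
            # a concurrent process beat us to it: the entry exists now,
            # so update it instead of failing with a duplicate key error
            conn.execute("UPDATE filecache SET size = ?"
                         " WHERE storage = ? AND path_hash = ?",
                         (size, storage, path_hash))

    put_cache_entry(db, 1, 'abc', 0)
    put_cache_entry(db, 1, 'abc', 4096)   # second call updates, does not fail

On MySQL the same effect could be had with INSERT ... ON DUPLICATE KEY
UPDATE, which closes the window entirely.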

Is anybody aware of a change in the code between 5.0.14 and 5.0.17 that 
might cause race conditions between concurrent processes? Something like 
"scanner vs. user action"?


Best regards
Roland Hager

