On 04.08.2014 at 18:28, Hager, Roland <[email protected]> wrote:
> [...]
> The errors occur on all webservers and for different users. It seems that
> there are some users (~100) who are affected more often, while the majority
> (~6500) is not affected. Affected users seem to be quite active, having more
> than several thousand files/directories, some of them even more than
> 100,000 files/directories. Affected users seem to have entries in
> oc_filecache with a size of "-1", which also leads to a wrong quota being
> displayed, and in general fewer entries than files in the filesystem. After
> running "./occ files:scan" for the affected users, the size and number of
> entries were corrected and the error messages disappeared for some time, but
> came back after a while. There seem to be certain conditions triggering those
> errors, but I have not been able to reproduce or identify them yet. Maybe some
> long-running background tasks (file scans?) are interfering with user actions?
> The "./occ files:scan" command took up to 30-60 minutes for some users.
> [...]
I just checked how many users are really affected. In the last 34 hours, 43
users had one or more "duplicate key" errors. That is less than 1% of our
users, but still more than you want to have on the phone.
I have more feedback from an affected user. He was uploading multiple files
in several folders with his sync client. Within one folder he got the
"duplicate key" error for four files. After some time those files got synced
without any further action by an admin. So "./occ files:scan" did clean up
the user's filecache, but it might have been unnecessary.
My guess, for now, is the following cascade (a sketch of the pattern follows
the list):
1. A file is being uploaded, scanned, or something similar.
2. A cache entry has to be updated or inserted.
3. An SQL query is executed to check whether that cache entry already exists.
4. This query is aborted because of a deadlock, OR there is no entry yet but
   another process creates it before the next step.
5. Since there is no row in the result set, OC assumes the cache entry does
   not exist.
6. An insert query is executed, which leads to a duplicate key error.
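To make steps 3-6 concrete, here is a minimal sketch of that check-then-insert
pattern (Python with sqlite3 purely for brevity; ownCloud itself is PHP on
MySQL, and the columns are a simplification, not the real oc_filecache schema):

    import sqlite3

    def setup(conn):
        # Simplified stand-in for oc_filecache; it only models the unique
        # index that the duplicate key error fires on.
        conn.execute(
            "CREATE TABLE IF NOT EXISTS oc_filecache ("
            " fileid INTEGER PRIMARY KEY,"
            " storage INTEGER, path TEXT, size INTEGER,"
            " UNIQUE (storage, path))"
        )

    def upsert_cache_entry(conn, storage, path, size):
        # Step 3: check whether the cache entry already exists.
        row = conn.execute(
            "SELECT fileid FROM oc_filecache WHERE storage = ? AND path = ?",
            (storage, path),
        ).fetchone()
        # <-- race window: between the SELECT above and the INSERT below,
        #     another process can create the same (storage, path) row.
        if row is None:
            # Steps 5/6: no row was seen, so we INSERT -- and hit a
            # duplicate key error if the row appeared in the meantime.
            conn.execute(
                "INSERT INTO oc_filecache (storage, path, size)"
                " VALUES (?, ?, ?)",
                (storage, path, size),
            )
        else:
            conn.execute(
                "UPDATE oc_filecache SET size = ? WHERE fileid = ?",
                (size, row[0]),
            )
        conn.commit()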
But if a query is aborted because of a deadlock, shouldn't that lead to an
exception and an interruption of the normal code flow?
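If such an exception were swallowed somewhere along the way, the deadlock
would silently degrade into step 5. Purely as an assumption about the
surrounding code, not the actual ownCloud implementation, a catch-all around
the lookup would be enough:

    def cache_entry_exists(conn, storage, path):
        # Hypothetical wrapper -- NOT actual ownCloud code.
        try:
            row = conn.execute(
                "SELECT fileid FROM oc_filecache"
                " WHERE storage = ? AND path = ?",
                (storage, path),
            ).fetchone()
            return row is not None
        except Exception:
            # A deadlock abort swallowed here looks exactly like
            # "no such entry"; the caller proceeds to the INSERT.
            return False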
What about the case I mentioned in my last mail, where the "duplicate entry"
actually did not exist right after the error message but was created some
minutes later? Is it possible that two or more processes were trying to create
the same cache entry, one of them had already created it, but both somehow got
into a deadlock situation where this entry was discarded afterwards?
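To illustrate the mechanism I mean (a minimal sketch, again with sqlite3
standing in for MySQL): an insert inside a transaction that is rolled back
afterwards, e.g. after a deadlock abort, leaves no row behind, even though a
concurrent process may already have collided with the uncommitted row:

    import sqlite3

    conn = sqlite3.connect(":memory:", isolation_level=None)  # autocommit
    conn.execute("CREATE TABLE oc_filecache"
                 " (storage INTEGER, path TEXT, UNIQUE (storage, path))")
    try:
        conn.execute("BEGIN")
        conn.execute("INSERT INTO oc_filecache VALUES (1, 'files/a.txt')")
        # A concurrent process could collide with this uncommitted row
        # right here (depending on the engine's locking) ...
        raise RuntimeError("simulated deadlock abort")
    except RuntimeError:
        conn.rollback()  # ... but the rollback discards the insert,
    # so the "duplicate" row no longer exists afterwards:
    print(conn.execute("SELECT COUNT(*) FROM oc_filecache").fetchone()[0])  # 0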
Is anybody aware of a code change between 5.0.14 and 5.0.17 that might cause
race conditions between concurrent processes? Something like "scanner vs.
user action"?
Best regards
Roland Hager
