If you're using the defaults, isn't delayed_commits still set to true?
Can't that lead to exactly this type of data corruption? I'd like to see
this run with delayed_commits = false to check whether it still happens.
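For anyone wanting to try that, here's a minimal Python sketch. It assumes a CouchDB 1.x node, where settings can be changed at runtime through the /_config HTTP API and values are sent as JSON strings. The base URL is a placeholder, and a real server will also want admin credentials. The same setting can instead go in local.ini under the [couchdb] section.

```python
import json
import urllib.request


def delayed_commits_request(base_url: str, enabled: bool) -> urllib.request.Request:
    """Build, but do not send, a PUT to the CouchDB 1.x /_config API."""
    # Config values travel as JSON strings, so False becomes the body "false" (quoted).
    body = json.dumps("true" if enabled else "false").encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/_config/couchdb/delayed_commits",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Placeholder address; add admin auth before actually sending it with
# urllib.request.urlopen(req).
req = delayed_commits_request("http://127.0.0.1:5984", enabled=False)
print(req.get_method(), req.full_url, req.data)
```

The request is only built, not sent, so you can check it before pointing it at a production node.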
I'd also be keen to see this data replicated to a different piece of
hardware with the same compaction schedule, to find out whether the issue
persists. I'm inclined to point the finger at a hard disk issue, but
would like some confirmation that this can be reproduced with the exact
same code on different hardware.
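A rough sketch of kicking off that replication from Python, assuming the standard /_replicate endpoint. The host names and target URL are hypothetical placeholders, and create_target asks CouchDB to create the target database if it doesn't already exist.

```python
import json
import urllib.request


def replicate_request(base_url: str, source: str, target: str) -> urllib.request.Request:
    """Build, but do not send, a one-shot replication request to /_replicate."""
    body = {
        "source": source,        # local database name or full URL
        "target": target,        # database on the other machine
        "create_target": True,   # let CouchDB create the target if missing
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/_replicate",
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/json"},
    )


# Hypothetical hosts: replicate prod-folder from this node to a spare box.
req = replicate_request(
    "http://127.0.0.1:5984",
    source="prod-folder",
    target="http://spare-host:5984/prod-folder",
)
print(req.get_method(), req.full_url)
```

Once the copy is on the other machine, running the same compaction schedule there is what would show whether the corruption follows the data or stays with the original hardware.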
I've run this same version heavily in production on several different
systems doing essentially the same thing and have never seen any data
corruption. The main difference is that I always use delayed_commits = false.
Wendall
On 04/19/2013 01:31 AM, Dave Cottlehuber wrote:
On 19 April 2013 00:41, Victor Nicollet wrote:
I searched the logs for any signs of error. The operations performed on the
prod-folder database in the two hours before the first crash were:
https://gist.github.com/VictorNicollet/878d0176960cc71d9ac1
The compaction at 10:54:08 finished without a hitch.
The compaction at 11:54:07 finished with:
https://gist.github.com/VictorNicollet/4d6ccd60bec2ae922a32
Hi Victor,
thanks for that information.
Can we get a working copy of the database, so we can compare the
corrupt compressed documents with the working ones and see if there's
any pattern?
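As a rough illustration of that comparison, here's a small hypothetical Python helper (not part of CouchDB) that lists the byte ranges where a good copy and a corrupt copy of the same document differ. It assumes you already have both copies as raw bytes.

```python
from typing import List, Tuple


def differing_runs(good: bytes, bad: bytes) -> List[Tuple[int, int]]:
    """Return (start, end) byte ranges, end exclusive, where the two copies differ.

    Any bytes past the end of the shorter copy count as differing.
    """
    runs = []
    start = None
    length = max(len(good), len(bad))
    for i in range(length):
        same = i < len(good) and i < len(bad) and good[i] == bad[i]
        if not same and start is None:
            start = i          # a differing run begins here
        elif same and start is not None:
            runs.append((start, i))
            start = None       # the run ended at the previous byte
    if start is not None:
        runs.append((start, length))
    return runs


print(differing_runs(b"abcdef", b"abXXef"))  # [(2, 4)]
```

If you compare raw file contents at their real file offsets, runs that start and end on disk block boundaries (4096 bytes is a common, but here assumed, block size) might hint at the storage layer rather than CouchDB itself.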
I recommend you assume there's some storage system issue and:
- check dmesg / syslog for disk related errors
- fsck the filesystem where the couches are
- if this is a managed / hosted server you might want to get the
supplier to check if there are any disk / storage issues
- if it's not virtualised hardware, see if smartmontools tells you
anything useful
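For the first item, a quick Python filter over dmesg or syslog output can help. The patterns below are assumptions: common Linux kernel wording for storage trouble, not an exhaustive list, so adjust them for your kernel and filesystem.

```python
import re
from typing import Iterable, List

# Assumed patterns: common Linux kernel wording for disk trouble. Not exhaustive.
DISK_ERROR_PATTERNS = (
    "I/O error",
    "EXT4-fs error",
    "medium error",
    "end_request",
)


def disk_error_lines(log_text: str, patterns: Iterable[str] = DISK_ERROR_PATTERNS) -> List[str]:
    """Return the log lines that mention any of the patterns (case-insensitive)."""
    regex = re.compile("|".join(re.escape(p) for p in patterns), re.IGNORECASE)
    return [line for line in log_text.splitlines() if regex.search(line)]


# Made-up sample lines; in practice feed it `dmesg` output or /var/log/syslog.
sample = (
    "[ 10.1] eth0: link up\n"
    "[ 99.2] end_request: I/O error, dev sda, sector 123456\n"
    "[ 99.3] EXT4-fs error (device sda1): ext4_find_entry\n"
)
for line in disk_error_lines(sample):
    print(line)
```

An empty result doesn't clear the disk, so it's still worth running fsck and smartmontools as listed above.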
If you wish, you can encrypt the files using my public key (dch@apache.org),
available from http://www.apache.org/dist/couchdb/KEYS.
A+
Dave