With the defaults, isn't this still set to delayed_commits = true? Couldn't that lead to exactly this type of data corruption? I'd like to see this run with delayed_commits = false to check whether it still happens.

I'd also be keen on seeing this data replicated to a different piece of hardware with the same compaction schedule and see if the issue persists. I'm inclined to point the finger at a hard disk issue, but would like to see some confirmation that this can be reproduced with the same exact code on different hardware.
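
A minimal sketch of how that replication could be requested, assuming CouchDB's standard POST /_replicate endpoint. The host URLs and the function name are hypothetical placeholders; the database name is the one mentioned in the quoted message below. The function only builds the request, so it can be inspected before anything is sent.

```python
import json
import urllib.request


def build_replication_request(source_couch, target_couch, db_name):
    """Build (but do not send) a POST /_replicate request.

    source_couch and target_couch are base URLs of two CouchDB servers;
    both are hypothetical placeholders, not values from this thread.
    """
    body = {
        "source": db_name,  # local database on the source server
        "target": f"{target_couch.rstrip('/')}/{db_name}",  # remote copy
        "create_target": True,  # create the target database if it is missing
    }
    return urllib.request.Request(
        url=f"{source_couch.rstrip('/')}/_replicate",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# To actually start the replication (may require admin credentials):
# urllib.request.urlopen(build_replication_request(
#     "http://127.0.0.1:5984", "http://other-host:5984", "prod-folder"))
```

Once the copy exists on the second machine, the same compaction schedule can be run against it to see whether the corruption follows the data or stays with the original hardware.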

I've run this same version heavily in production on several different systems doing essentially the same thing and have never seen any data corruption. The main difference is that I always use delayed_commits = false.
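
For reference, a rough sketch of switching the setting at runtime, assuming a CouchDB 1.x server that exposes the /_config HTTP endpoint and keeps delayed_commits in the [couchdb] section. The server URL and the function name are hypothetical. The same change can be made persistently by setting delayed_commits = false under [couchdb] in local.ini.

```python
import json
import urllib.request


def build_delayed_commits_request(couch_url, enabled=False):
    """Build (but do not send) a PUT that sets couchdb/delayed_commits.

    Assumes the CouchDB 1.x /_config endpoint, which expects the new
    value as a JSON string, e.g. "false" (including the quotes).
    """
    value = "true" if enabled else "false"
    return urllib.request.Request(
        url=f"{couch_url.rstrip('/')}/_config/couchdb/delayed_commits",
        data=json.dumps(value).encode("utf-8"),
        method="PUT",
    )


# To apply it (may require admin credentials on a secured server):
# urllib.request.urlopen(build_delayed_commits_request("http://127.0.0.1:5984"))
```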

Wendall

On 04/19/2013 01:31 AM, Dave Cottlehuber wrote:
On 19 April 2013 00:41, Victor Nicollet <[email protected]> wrote:
I searched the logs for any signs of error. The operations performed on the
prod-folder database in the two hours before the first crash were:

https://gist.github.com/VictorNicollet/878d0176960cc71d9ac1

The compact at 10:54:08 finished without a hitch.
The compact at 11:54:07 finished with:

https://gist.github.com/VictorNicollet/4d6ccd60bec2ae922a32

Hi Victor,

thanks for that information.

Can we get a working copy of the database, so we can compare the
corrupt compressed documents with the working ones and see if there's
any pattern?

I recommend you assume there's some storage system issue and:

- check dmesg / syslog for disk related errors
- fsck the filesystem where the couches are
- if this is a managed / hosted server you might want to get the
supplier to check if there are any disk / storage issues
- if it's not virtualised hardware, see if smartmontools tells you
anything useful
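
As a starting point for the first check above, here is a small sketch that filters kernel or syslog lines for messages that often accompany disk trouble. The keyword list is an assumption chosen for illustration, not an exhaustive or authoritative set, and the log path in the usage comment is only an example; a clean result does not prove the disk is healthy.

```python
import re

# Hypothetical, non-exhaustive patterns that commonly show up in Linux
# kernel logs when a disk or filesystem is misbehaving.
DISK_ERROR_PATTERNS = [
    r"I/O error",
    r"medium error",
    r"bad sector",
    r"EXT[234]-fs error",
    r"ata\d+(?:\.\d+)?: .*(?:error|failed)",
]

_DISK_ERROR_RE = re.compile(
    "|".join(f"(?:{p})" for p in DISK_ERROR_PATTERNS), re.IGNORECASE
)


def find_disk_errors(lines):
    """Return the lines that match any of the suspicious patterns."""
    return [line.rstrip("\n") for line in lines if _DISK_ERROR_RE.search(line)]


# Example usage against a log file (the path varies by distribution):
# with open("/var/log/syslog", errors="replace") as log:
#     for hit in find_disk_errors(log):
#         print(hit)
```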

If you wish, you can encrypt the files using my public key (dch@ apache.org),
available at http://www.apache.org/dist/couchdb/KEYS.

A+
Dave
