[ 
https://issues.apache.org/jira/browse/COUCHDB-2040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13885340#comment-13885340
 ] 

Robert Newson commented on COUCHDB-2040:
----------------------------------------

Ah, I'm not comfortable letting compaction silently drop data it can't read, 
for any reason. The better fix is to add parity to the file and attempt repair 
during compaction. Since dual-syndrome is the minimum for detecting and 
correcting the kind of error I'm speculating you suffered from, that's a bit of 
a task. Btw, a good backstop against bit rot itself is regular compaction. We 
can and should improve the error handling, though.

So you've deleted the 'fixed' version of your database and kept the corrupted 
and uncompactable one? :)

Compaction should still shrink a database even if you never delete documents, 
as the file will contain old btree nodes representing now-unreachable previous 
states of the database. The replicated target should be closer to the compacted 
size, as it obviously omits those.
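
To illustrate the point about old btree nodes, here is a minimal sketch of an 
append-only, copy-on-write tree. It is a toy model only: it uses a plain binary 
search tree rather than CouchDB's actual B+tree or file format, and the names 
(Node, AppendOnlyTree, reachable) are made up for this example. Every insert 
appends fresh copies of the nodes on the path to the root, so the log grows 
faster than the live tree even though nothing is ever deleted.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Node:
    key: str
    left: Optional[int]   # position of the left child in the log, or None
    right: Optional[int]  # position of the right child in the log, or None


class AppendOnlyTree:
    """Toy append-only tree: nodes are never modified in place (hypothetical model)."""

    def __init__(self):
        self.log = []     # every node ever written, in write order
        self.root = None  # log position of the current root

    def _append(self, node):
        self.log.append(node)
        return len(self.log) - 1

    def insert(self, key):
        self.root = self._insert(self.root, key)

    def _insert(self, pos, key):
        if pos is None:
            return self._append(Node(key, None, None))
        node = self.log[pos]
        if key < node.key:
            # Rewrite this node so it points at the new left subtree.
            return self._append(Node(node.key, self._insert(node.left, key), node.right))
        if key > node.key:
            return self._append(Node(node.key, node.left, self._insert(node.right, key)))
        return pos  # key already present, nothing to write

    def reachable(self):
        """Count nodes reachable from the current root (what a compaction would copy)."""
        count, stack = 0, [self.root]
        while stack:
            pos = stack.pop()
            if pos is not None:
                count += 1
                node = self.log[pos]
                stack += [node.left, node.right]
        return count


tree = AppendOnlyTree()
for key in ["m", "f", "t", "b", "h", "p", "w"]:
    tree.insert(key)

print(len(tree.log), "nodes written,", tree.reachable(), "still reachable")
```

In this model, a compaction would copy only the reachable nodes into a new 
file, which is why the result is smaller even without any deletions.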



> Compaction fails when copying attachment
> ----------------------------------------
>
>                 Key: COUCHDB-2040
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-2040
>             Project: CouchDB
>          Issue Type: Bug
>          Components: Database Core
>            Reporter: Igor Klimer
>
> Original discussion from the user mailing list: 
> http://mail-archives.apache.org/mod_mbox/couchdb-user/201401.mbox/%3cd14f971a540b974bb75adc55f00f34ca69a35...@sex1.getback.ad2008r2.corp%3e
> Digest:
> During database compaction, the process fails at about 50% with the following 
> error: http://pastebin.com/qeaZNHMj (CouchDB 1.2.0, Windows Server 2008 R2 
> Enterprise).
> After server and CouchDB upgrade the error is still the same: 
> http://pastebin.com/feJWu7bN (CouchDB 1.5.0, Ubuntu 12.04.3 LTS (GNU/Linux 
> 3.8.0-33-generic x86_64)).
> There was one prior attempt at compaction that failed because of insufficient 
> disk space: http://pastebin.com/S1URXN0p
> After this initial failure, I've made sure that there's sufficient disk space 
> for the .compact file.
> The .compact file was always removed before trying compaction again.
> At the request of Robert Samuel Newson, I've also tried with an empty 
> .compact file - the results were the same: http://pastebin.com/MJCgGM8C.
> Our I/O subsystem consists of some RAID5 arrays - the admins claim that 
> they've been running error-free since inception ;) We have yet to run a 
> parity check, since that'd require taking the array offline and I'd rather 
> not do that without exhausting other options.
> Config files from the 1.2.0/Windows server (since that's where the fault must 
> have occurred):
> default.ini: http://pastebin.com/kUz0qyNk
> local.ini: http://pastebin.com/srZUMwzB
> Other than the default delayed_commits set to true, there are no options that 
> could affect fsync()ing and such.
> I've run:
> curl localhost:5984/ecrepo/_changes?include_docs=true
> curl localhost:5984/ecrepo/_all_docs?include_docs=true
> and both calls succeeded, which would suggest that a faulty attachment 
> (incorrect checksum/length) is at fault somewhere.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
