[ https://issues.apache.org/jira/browse/COUCHDB-2040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13884269#comment-13884269 ]
Robert Newson commented on COUCHDB-2040:
----------------------------------------

The next step is to build a 1.5.0 Windows package with more logging statements, assuming Igor is prepared to run it. If so, we'll need to know which version of Windows and whether it's 32- or 64-bit. I'll defer that to [~dch], because I only know the kind of windows that have curtains.

While the title is accurate, it's not the case that we expect to find a bug in compaction per se. This appears to be a corrupted file; the compactor simply has to read all live data, so it's probably just the messenger. We'll know more with more debug output.

> Compaction fails when copying attachment
> ----------------------------------------
>
>                 Key: COUCHDB-2040
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-2040
>             Project: CouchDB
>          Issue Type: Bug
>          Components: Database Core
>            Reporter: Igor Klimer
>
> Original discussion from the user mailing list:
> http://mail-archives.apache.org/mod_mbox/couchdb-user/201401.mbox/%3cd14f971a540b974bb75adc55f00f34ca69a35...@sex1.getback.ad2008r2.corp%3e
>
> Digest:
> During database compaction, the process fails at about 50% with the following error: http://pastebin.com/qeaZNHMj (CouchDB 1.2.0, Windows Server 2008 R2 Enterprise).
> After a server and CouchDB upgrade, the error is still the same: http://pastebin.com/feJWu7bN (CouchDB 1.5.0, Ubuntu 12.04.3 LTS (GNU/Linux 3.8.0-33-generic x86_64)).
> There was one prior attempt at compaction that failed because of insufficient disk space: http://pastebin.com/S1URXN0p
> After this initial failure, I've made sure that there's sufficient disk space for the .compact file.
> The .compact file was always removed before trying compaction again.
> At the request of Robert Samuel Newson, I've also tried with an empty .compact file - the results were the same: http://pastebin.com/MJCgGM8C.
> Our I/O subsystem consists of some RAID5 arrays - the admins claim that they've been running error-free since inception ;) We have yet to run a parity check, since that would require taking the array offline, and I'd rather not do that without exhausting other options.
> Config files from the 1.2.0/Windows server (since that's where the fault must have occurred):
> default.ini: http://pastebin.com/kUz0qyNk
> local.ini: http://pastebin.com/srZUMwzB
> Other than the default delayed_commits set to true, there are no options that could affect fsync()ing and such.
> I've run:
> curl localhost:5984/ecrepo/_changes?include_docs=true
> curl localhost:5984/ecrepo/_all_docs?include_docs=true
> and both calls succeeded, which would suggest that a faulty attachment (incorrect checksum/length) is at fault somewhere.

--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
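Since `_changes` and `_all_docs` only return document bodies (attachment stubs, not attachment data), they would not touch a corrupted attachment; one way to narrow the fault down is to enumerate every attachment from the `_all_docs?include_docs=true` output and fetch each one individually. Below is a minimal sketch of the enumeration step; the JSON sample is hypothetical but follows the shape CouchDB returns for that endpoint, and the `ecrepo` database name is taken from the report above.

```python
import json

# Hypothetical sample of a CouchDB _all_docs?include_docs=true response,
# as would be produced by e.g.:
#   curl localhost:5984/ecrepo/_all_docs?include_docs=true
sample = json.loads("""
{
  "total_rows": 2,
  "rows": [
    {"id": "doc1",
     "doc": {"_id": "doc1",
             "_attachments": {"report.pdf": {"length": 1024, "stub": true}}}},
    {"id": "doc2",
     "doc": {"_id": "doc2"}}
  ]
}
""")

def list_attachments(all_docs):
    """Yield (doc_id, attachment_name) for every attachment stub
    found in an _all_docs?include_docs=true response."""
    for row in all_docs.get("rows", []):
        doc = row.get("doc") or {}
        for name in doc.get("_attachments", {}):
            yield (doc["_id"], name)

pairs = list(list_attachments(sample))
print(pairs)  # [('doc1', 'report.pdf')]
```

Each (doc, attachment) pair could then be fetched with `curl localhost:5984/ecrepo/<doc>/<attachment>`; the request that fails would point at the corrupted attachment without waiting for compaction to hit it.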