Hi Jean-Louis, I thought I had applied the patches on this machine also, but it turns out I didn't (sorry about that). I applied the patches and ran a new dump. This time all 4 file systems succeeded, though /usr got the same "strange" message as on the other machine:
1251752282.483677: sendbackup: pid 15574 ruid 150 euid 150 version 2.6.2alpha: start at Mon Aug 31 14:58:02 2009 1251752282.484788: sendbackup: Version 2.6.2alpha 1251752282.523952: sendbackup: pid 15574 ruid 150 euid 150 version 2.6.2alpha: rename at Mon Aug 31 14:58:02 2009 1251752282.527795: sendbackup: Parsed request as: program `DUMP' 1251752282.527880: sendbackup: disk `/usr' 1251752282.527931: sendbackup: device `/usr' 1251752282.527978: sendbackup: level 0 1251752282.528025: sendbackup: since NODATE 1251752282.528072: sendbackup: options `' 1251752282.533704: sendbackup: start: bromine.example.com:/usr lev 0 1251752282.537190: sendbackup: dumping device '/dev/rsd0d' with 'ffs' 1251752282.543664: sendbackup: Spawning "/sbin/dump dump 0usf 1048576 - /dev/rsd0d" in pipeline 1251752282.548532: sendbackup: Started backup 1251752282.561558: sendbackup: fd 3 is set with O_NONBLOCK before dup2 1251752282.625055: sendbackup: Started index creator: "/sbin/restore -tvf - 2>&1 | sed -e ' s/^leaf[ ]*[0-9]*[ ]*\.// t /^dir[ ]/ { s/^dir[ ]*[0-9]*[ ]*\.// s%$%/% t } d '" 1251752282.627853: sendbackup: 90: normal(|): DUMP: Date of this level 0 dump: Mon Aug 31 14:58:02 2009 1251752282.641040: sendbackup: 90: normal(|): DUMP: Date of last level 0 dump: the epoch 1251752282.754862: sendbackup: 90: normal(|): DUMP: Dumping /dev/rsd0d (/usr) to standard output 1251752282.792706: sendbackup: 90: normal(|): DUMP: mapping (Pass I) [regular files] 1251752286.777085: sendbackup: 90: normal(|): DUMP: mapping (Pass II) [directories] 1251752286.779538: sendbackup: 90: normal(|): DUMP: estimated 653358 tape blocks. 1251752286.789190: sendbackup: 90: normal(|): DUMP: Volume 1 started at: Mon Aug 31 14:58:06 2009 1251752286.827782: sendbackup: 90: normal(|): DUMP: dumping (Pass III) [directories] 1251752290.646004: sendbackup: 90: normal(|): DUMP: dumping (Pass IV) [regular files] 1251752290.822019: sendbackup: 115: strange(?): sed: stdout: Resource temporarily unavailable 1251752424.852658: sendbackup: 43: size(|): DUMP: 664333 tape blocks 1251752424.854957: sendbackup: 90: normal(|): DUMP: Date of this level 0 dump: Mon Aug 31 14:58:02 2009 1251752424.857083: sendbackup: 90: normal(|): DUMP: Volume 1 completed at: Mon Aug 31 15:00:24 2009 1251752424.859064: sendbackup: 90: normal(|): DUMP: Volume 1 took 0:02:18 1251752424.860994: sendbackup: 90: normal(|): DUMP: Volume 1 transfer rate: 4814 KB/s 1251752424.863042: sendbackup: 90: normal(|): DUMP: Date this dump completed: Mon Aug 31 15:00:24 2009 1251752424.864993: sendbackup: 90: normal(|): DUMP: Average transfer rate: 4814 KB/s 1251752424.869138: sendbackup: 90: normal(|): DUMP: level 0 dump on Mon Aug 31 14:58:02 2009 1251752424.878026: sendbackup: 90: normal(|): DUMP: DUMP IS DONE 1251752424.888324: sendbackup: Index pipe exited with status 1 1251752424.906986: sendbackup: Parsed backup messages 1251752424.907361: sendbackup: pid 15574 finish time Mon Aug 31 15:00:24 2009 So now I've seen the same behavior that Stan noted: applying the debug patch seems to "fix" the dump. Also, in case you didn't see it before, here is a snippet from an email thread Stan started on an OpenBSD list: > > I am trying to get Amanda of a recent vintage working on 4.5. The developrs > > don't understand how writing to a blocking pipe can return EAGAIN. > > Should the write just block? > > Yes, it should, and a quick, about-to-go-to-bed readthrough of the > pipe write path shows that it shouldn't; is Amanda doing fcntls to > the fd? If so, check those to be absolutely sure that they're not > doing something hinkey. Thanks, Michael On Mon, Aug 31, 2009 at 2:41 PM, Jean-Louis Martineau <martin...@zmanda.com>wrote: > Do you have the patch I sent to stan on this system? > > The patch check before and after the write if the pipe is in O_NONBLOCK or > not and give an error if it is. > > I'm totally lost since it is in blocking mode and you get EAGAIN, which is > impossible!!!! > > Jean-louis > > Michael Burk wrote: > >> Here are the two sendbackup.*.debug files for the / fs. >> >> First: >> >> 1251693102.343436: sendbackup: pid 29087 ruid 150 euid 150 version >> 2.6.2alpha: start at Sun Aug 30 22:31:42 2009 >> 1251693102.345326: sendbackup: Version 2.6.2alpha >> 1251693102.379331: sendbackup: pid 29087 ruid 150 euid 150 version >> 2.6.2alpha: rename at Sun Aug 30 22:31:42 2009 >> 1251693102.382860: sendbackup: Parsed request as: program `DUMP' >> 1251693102.382940: sendbackup: disk `/' >> 1251693102.382991: sendbackup: device `/' >> 1251693102.383039: sendbackup: level 0 >> 1251693102.383086: sendbackup: since NODATE >> 1251693102.383134: sendbackup: options `' >> 1251693102.388833: sendbackup: start: bromine.example.com:/ lev 0 >> 1251693102.391475: sendbackup: dumping device '/dev/rsd0a' with 'ffs' >> 1251693102.398325: sendbackup: Spawning "/sbin/dump dump 0usf 1048576 - >> /dev/rsd0a" in pipeline >> 1251693102.403120: sendbackup: Started backup >> 1251693102.461663: sendbackup: 90: normal(|): DUMP: Date of this level >> 0 dump: Sun Aug 30 22:31:42 2009 >> 1251693102.468669: sendbackup: 90: normal(|): DUMP: Date of last level >> 0 dump: the epoch >> 1251693102.487536: sendbackup: Started index creator: "/sbin/restore -tvf >> - 2>&1 | sed -e ' >> s/^leaf[ ]*[0-9]*[ ]*\.// >> t >> /^dir[ ]/ { >> s/^dir[ ]*[0-9]*[ ]*\.// >> s%$%/% >> t >> } >> d >> '" >> 1251693102.519939: sendbackup: 90: normal(|): DUMP: Dumping /dev/rsd0a >> (/) to standard output >> 1251693102.588086: sendbackup: 90: normal(|): DUMP: mapping (Pass I) >> [regular files] >> 1251693104.583392: sendbackup: 90: normal(|): DUMP: mapping (Pass II) >> [directories] >> 1251693104.585824: sendbackup: 90: normal(|): DUMP: estimated 35134 >> tape blocks. >> 1251693104.595314: sendbackup: 90: normal(|): DUMP: Volume 1 started >> at: Sun Aug 30 22:31:44 2009 >> 1251693104.615444: sendbackup: 90: normal(|): DUMP: dumping (Pass III) >> [directories] >> 1251693104.919691: sendbackup: 90: normal(|): DUMP: dumping (Pass IV) >> [regular files] >> 1251693105.800795: sendbackup: critical (fatal): index tee cannot write >> [Resource temporarily unavailable] >> 1251693105.803486: sendbackup: 115: strange(?): sendbackup: index tee >> cannot write [Resource temporarily unavailable] >> 1251693105.827339: sendbackup: 90: normal(|): DUMP: Broken pipe >> 1251693105.832068: sendbackup: 90: normal(|): DUMP: The ENTIRE dump is >> aborted. >> 1251693105.839675: sendbackup: critical (fatal): error [dump (7257) >> /sbin/dump returned 3] >> >> Second: >> >> 1251693123.840560: sendbackup: pid 32144 ruid 150 euid 150 version >> 2.6.2alpha: start at Sun Aug 30 22:32:03 2009 >> 1251693123.841613: sendbackup: Version 2.6.2alpha >> 1251693123.892773: sendbackup: pid 32144 ruid 150 euid 150 version >> 2.6.2alpha: rename at Sun Aug 30 22:32:03 2009 >> 1251693123.896360: sendbackup: Parsed request as: program `DUMP' >> 1251693123.896437: sendbackup: disk `/' >> 1251693123.896487: sendbackup: device `/' >> 1251693123.896535: sendbackup: level 0 >> 1251693123.896584: sendbackup: since NODATE >> 1251693123.896632: sendbackup: options `' >> 1251693123.902332: sendbackup: start: bromine.example.com:/ lev 0 >> 1251693123.905012: sendbackup: dumping device '/dev/rsd0a' with 'ffs' >> 1251693123.911778: sendbackup: Spawning "/sbin/dump dump 0usf 1048576 - >> /dev/rsd0a" in pipeline >> 1251693123.916860: sendbackup: Started backup >> 1251693123.992261: sendbackup: Started index creator: "/sbin/restore -tvf >> - 2>&1 | sed -e ' >> s/^leaf[ ]*[0-9]*[ ]*\.// >> t >> /^dir[ ]/ { >> s/^dir[ ]*[0-9]*[ ]*\.// >> s%$%/% >> t >> } >> d >> '" >> 1251693123.995743: sendbackup: 90: normal(|): DUMP: Date of this level >> 0 dump: Sun Aug 30 22:32:03 2009 >> 1251693124.009013: sendbackup: 90: normal(|): DUMP: Date of last level >> 0 dump: the epoch >> 1251693124.038977: sendbackup: 90: normal(|): DUMP: Dumping /dev/rsd0a >> (/) to standard output >> 1251693124.143454: sendbackup: 90: normal(|): DUMP: mapping (Pass I) >> [regular files] >> 1251693126.173916: sendbackup: 90: normal(|): DUMP: mapping (Pass II) >> [directories] >> 1251693126.176287: sendbackup: 90: normal(|): DUMP: estimated 35134 >> tape blocks. >> 1251693126.185677: sendbackup: 90: normal(|): DUMP: Volume 1 started >> at: Sun Aug 30 22:32:06 2009 >> 1251693126.205305: sendbackup: 90: normal(|): DUMP: dumping (Pass III) >> [directories] >> 1251693126.505055: sendbackup: 90: normal(|): DUMP: dumping (Pass IV) >> [regular files] >> 1251693126.797271: sendbackup: critical (fatal): index tee cannot write >> [Resource temporarily unavailable] >> 1251693126.799929: sendbackup: 115: strange(?): sendbackup: index tee >> cannot write [Resource temporarily unavailable] >> 1251693126.821488: sendbackup: 90: normal(|): DUMP: Broken pipe >> 1251693126.825377: sendbackup: 90: normal(|): DUMP: The ENTIRE dump is >> aborted. >> 1251693126.833573: sendbackup: critical (fatal): error [dump (22791) >> /sbin/dump returned 3] >> >> Thanks, >> Michael >> >> On Mon, Aug 31, 2009 at 12:49 PM, Jean-Louis Martineau < >> martin...@zmanda.com <mailto:martin...@zmanda.com>> wrote: >> >> You get the error in the index pipe instead of the data path. The >> backup is correct but your index is empty. That's why you get a >> STRANGE result instead of a failure. >> >> Can you post a sendbackup.*.debug for a dle that failed? >> >> Jean-Louis >> >> Michael Burk wrote: >> >> Hello, >> >> I applied the patches to the 20090827 snapshot. I tried it on >> two OpenBSD 4.5 sparc64 systems, forcing both to do full >> backups. One system seemed to work on all 3 file systems, the >> other failed on all 4 file systems with the same errors as >> before (exactly like what Stan reported). I'm using bsdtcp auth. >> >> The system that worked, however, did give a "strange" result >> in the report on one filesystem. Here's the corresponding >> "sendbackup" debug file: >> >> 1251692700.529842: sendbackup: pid 30459 ruid 150 euid 150 >> version 2.6.2alpha: start at Sun Aug 30 2 >> 2:25:00 2009 >> 1251692700.530411: sendbackup: Version 2.6.2alpha >> 1251692700.553135: sendbackup: pid 30459 ruid 150 euid 150 >> version 2.6.2alpha: rename at Sun Aug 30 >> 22:25:00 2009 >> 1251692700.555394: sendbackup: Parsed request as: program `DUMP' >> 1251692700.555435: sendbackup: disk `/usr' >> 1251692700.555461: sendbackup: device `/usr' >> 1251692700.555484: sendbackup: level 0 >> 1251692700.555507: sendbackup: since NODATE >> 1251692700.555530: sendbackup: options `' >> 1251692700.556750: sendbackup: start: >> zirconium.example.com:/usr lev 0 >> 1251692700.559495: sendbackup: dumping device '/dev/rsd1d' >> with 'ffs' >> 1251692700.566094: sendbackup: Spawning "/sbin/dump dump 0usf >> 1048576 - /dev/rsd1d" in pipeline >> 1251692700.571185: sendbackup: Started backup >> 1251692700.572425: sendbackup: fd 3 is set with O_NONBLOCK >> before dup2 >> 1251692700.597388: sendbackup: Started index creator: >> "/sbin/restore -tvf - 2>&1 | sed -e ' >> s/^leaf[ ]*[0-9]*[ ]*\.// >> t >> /^dir[ ]/ { >> s/^dir[ ]*[0-9]*[ ]*\.// >> s%$%/% >> t >> } >> d >> '" >> 1251692700.611526: sendbackup: 90: normal(|): DUMP: Date >> of this level 0 dump: Sun Aug 30 22:25: >> 00 2009 >> 1251692700.669703: sendbackup: 90: normal(|): DUMP: Date >> of last level 0 dump: the epoch >> 1251692700.670724: sendbackup: 90: normal(|): DUMP: >> Dumping /dev/rsd1d (/usr) to standard output >> 1251692700.763600: sendbackup: 90: normal(|): DUMP: >> mapping (Pass I) [regular files] >> 1251692705.618345: sendbackup: 90: normal(|): DUMP: >> mapping (Pass II) [directories] >> 1251692705.620040: sendbackup: 90: normal(|): DUMP: >> estimated 1330085 tape blocks. >> 1251692705.623976: sendbackup: 90: normal(|): DUMP: Volume >> 1 started at: Sun Aug 30 22:25:05 2009 >> 1251692705.650687: sendbackup: 90: normal(|): DUMP: >> dumping (Pass III) [directories] >> 1251692745.051547: sendbackup: 90: normal(|): DUMP: >> dumping (Pass IV) [regular files] >> 1251692747.915767: sendbackup: 115: strange(?): sed: stdout: >> Resource temporarily unavailable >> 1251692954.633083: sendbackup: 43: size(|): DUMP: >> 1437829 tape blocks >> 1251692954.637414: sendbackup: 90: normal(|): DUMP: Date >> of this level 0 dump: Sun Aug 30 22:25:00 2009 >> 1251692954.641414: sendbackup: 90: normal(|): DUMP: Volume >> 1 completed at: Sun Aug 30 22:29:14 2009 >> 1251692954.641858: sendbackup: Index pipe exited with status 1 >> 1251692954.642719: sendbackup: 90: normal(|): DUMP: Volume >> 1 took 0:04:09 >> 1251692954.643918: sendbackup: 90: normal(|): DUMP: Volume >> 1 transfer rate: 5774 KB/s >> 1251692954.645095: sendbackup: 90: normal(|): DUMP: Date >> this dump completed: Sun Aug 30 22:29:14 2009 >> 1251692954.646239: sendbackup: 90: normal(|): DUMP: >> Average transfer rate: 5774 KB/s >> 1251692954.647430: sendbackup: 90: normal(|): DUMP: level >> 0 dump on Sun Aug 30 22:25:00 2009 >> 1251692954.653480: sendbackup: 90: normal(|): DUMP: DUMP >> IS DONE >> 1251692954.653841: sendbackup: Parsed backup messages >> 1251692954.654032: sendbackup: pid 30459 finish time Sun Aug >> 30 22:29:14 2009 >> >> -- Michael >> >> On Fri, Aug 28, 2009 at 5:22 AM, stan <st...@panix.com >> <mailto:st...@panix.com> <mailto:st...@panix.com >> >> <mailto:st...@panix.com>>> wrote: >> >> On Thu, Aug 27, 2009 at 03:35:47PM -0600, Michael Burk wrote: >> > Cool - can you send me the patch? Or is it already in the >> 0826 >> snapshot? >> > >> > FYI - I got 2.5.0p2 working, but could never get 2.5.1p3 >> working, even with >> > the same config. No idea why. >> > I started looking at the source code last night in the 0825 >> snapshot. I'll >> > discontinue that search if there's a patch. >> > >> This is interesting. I sent Jean-Louis Martineau an email, >> saying >> teaks for >> fixing things, and he replied that the patch he sent me >> just added >> debugging, >> it was not intended to fix anything. >> >> Bit, using this snapshot 20090813, and the patch, which I will >> send to you. >> I have it working on my 3 OpenBSD 4.5 machines that I have >> upgraded so far. >> I ran several test yesterday, and a full fledged backup run >> last >> night, >> and I have not seen any issues since installing this code. >> >> I am going to attach the patch to this email. >> >> I would greatly appreciate it, if you would apply this patch to >> the above >> snapshot, test, and tell me what your results are. >> >> I am very puzzled at the moment. >> >> BTW, is your failure the same as mine? That is amcheck >> passes, and >> most >> times you get a PARTIAL backup? >> >> BTW, I am using bsdtcp "auth" and client compression now, but I >> have made >> this work with this code, with bsd auth, and no >> compression, so I >> don't >> think that has any affect on this issue. >> >> Thanks for the help! >> >> -- >> One of the main causes of the fall of the roman empire was >> that, >> lacking >> zero, they had no way to indicate successful termination of >> their C >> programs. >> >> >> >> >> >