Re: Backup issues with OpenBSD 4.5 machines
This was a good idea; I tried it with one modification. I had determined earlier that the failure also happens without indexing, so I added just the line:

    fcntl(datafd, F_GETFL, 0);

and that fixed the problem as well. So I guess this is truly the minimal patch!

-- Michael

On Tue, Sep 1, 2009 at 1:43 PM, Nathan Stratton Treadway natha...@ontko.com wrote:

On Tue, Sep 01, 2009 at 11:31:26 -0600, Michael Burk wrote:

I applied the 3-line patch to the 0831 snapshot and ran a full backup on both machines, with 4 file systems each. All 8 completed successfully with no strange messages. Next, I commented out the 3 new lines and tried the backup again on one of the machines. This time all 4 file systems failed; e.g.: [...]

So it seems reliable that those 3 lines fix the problem somehow. Anything else you want to try before I ask for help on the OpenBSD list?

I'm no expert on this topic, but if I were investigating something like this, I'd be curious to know if all three of the lines in the patch were necessary for the fix. Since, as you pointed out, the error seems to be tied to the indexing subprocess, I wonder what would happen if you included only the one

    fcntl(indexfd, F_GETFL, 0);

line of the patch, but not the other two.

Nathan

Nathan Stratton Treadway - natha...@ontko.com - Mid-Atlantic region
Ray Ontko Co. - Software consulting services - http://www.ontko.com/
GPG Key: http://www.ontko.com/~nathanst/gpg_key.txt ID: 1023D/ECFB6239
Key fingerprint = 6AD8 485E 20B9 5C71 231C 0C32 15F3 ADCD ECFB 6239
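For readers unfamiliar with the call in the one-line patch: fcntl(fd, F_GETFL, 0) only reads a descriptor's file status flags and does not change them, which is why it is surprising that adding it alters behavior. The following is a minimal sketch of what the call reports. The helper names fd_is_open and fd_is_nonblocking are made up for illustration and are not taken from Amanda or from the patch.

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper (not Amanda code): returns 1 if fd refers to an
 * open descriptor, 0 otherwise. F_GETFL only queries the file status
 * flags; it does not modify the descriptor. */
int fd_is_open(int fd)
{
    return fcntl(fd, F_GETFL, 0) != -1;
}

/* Hypothetical helper (not Amanda code): returns 1 if fd has O_NONBLOCK
 * set, 0 if it does not, and -1 if the flags cannot be read (for
 * example because fd is closed). */
int fd_is_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);

    if (flags == -1)
        return -1;
    return (flags & O_NONBLOCK) != 0;
}
```

Calling fd_is_open(datafd) just before the child is launched would show whether the descriptor is still valid at that point; it does not explain why the query itself makes the failure go away.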
Re: Backup issues with OpenBSD 4.5 machines
I modified sendbackup-dump.c to run ktrace, e.g.:

    /usr/bin/ktrace -id -t censw -f /tmp/sendbackup.trc /sbin/dump 0usf 1048576 - /dev/rsd0d

Unfortunately, I don't get a backup, even with the patch applied. The trace output shows write errors because of a broken pipe, with or without the patches, like this:

    21835 dump CALL write(0x2,0x53fbec,0x8)
    21835 dump RET write -1 errno 32 Broken pipe

So I suspect my approach is not correct. Any other ideas how I might get some useful trace output?

Thanks,
Michael

On Tue, Sep 1, 2009 at 1:25 PM, Dustin J. Mitchell dus...@zmanda.com wrote:

On Tue, Sep 1, 2009 at 3:18 PM, Jean-Louis Martineau martin...@zmanda.com wrote:

I have nothing else to try. The order of system calls is as follows:

If it's not too hard, it would be nice to have a ktrace or equivalent of this, first to look at here, and second to take to the OpenBSD list. I know that's tricky since this is a daemon process.

Dustin

-- Open Source Storage Engineer http://www.zmanda.com
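As background for the trace lines above: errno 32 is EPIPE, which write() returns when the reading end of a pipe has been closed and SIGPIPE is ignored or blocked. The first argument, 0x2, is the descriptor being written, which is stderr by convention. The sketch below reproduces that error in isolation. The function name is made up for illustration, and the sketch does not show which reader went away in the ktrace-wrapped dump.

```c
#include <errno.h>
#include <signal.h>
#include <unistd.h>

/* Hypothetical demonstration (not Amanda code): write to a pipe whose
 * read end is already closed. Returns the errno from the failed write,
 * 0 if the write unexpectedly succeeded, or -1 if the pipe could not
 * be created. */
int errno_from_write_to_closed_pipe(void)
{
    int fds[2];
    int saved_errno = 0;
    void (*old_handler)(int);

    if (pipe(fds) == -1)
        return -1;

    /* Ignore SIGPIPE so that write() reports EPIPE instead of the
     * signal terminating the process. */
    old_handler = signal(SIGPIPE, SIG_IGN);

    close(fds[0]);                      /* the reader goes away */
    if (write(fds[1], "x", 1) == -1)
        saved_errno = errno;

    close(fds[1]);
    if (old_handler != SIG_ERR)
        signal(SIGPIPE, old_handler);   /* restore the previous handler */

    return saved_errno;
}
```

One possible reading, offered only as an assumption, is that wrapping dump in ktrace changed how its output pipes were connected, so the broken pipe may be an effect of the tracing setup and not the original failure.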
Re: Backup issues with OpenBSD 4.5 machines
On Wed, Sep 2, 2009 at 1:07 PM, Michael Burk bur...@gmail.com wrote:

So I suspect my approach is not correct. Any other ideas how I might get some useful trace output?

Can you have amandad sleep for, say, 120 seconds just before it launches sendbackup, and somehow notify you of the pid to which you should attach ktrace? Maybe by writing it to /dev/console or to syslog?

By the way, it's the launching of sendbackup that seems to be failing, not the launching of dump.

Dustin

-- Open Source Storage Engineer http://www.zmanda.com
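The sleep-and-notify idea could look roughly like the sketch below. It is an illustration only: announce_pid_and_wait and the "amandad-debug" syslog tag are made-up names, and where such a call would go inside amandad is not something this thread establishes.

```c
#include <sys/types.h>
#include <syslog.h>
#include <unistd.h>

/* Hypothetical debugging helper (not part of Amanda): log the current
 * pid to syslog, then pause so that a tracer can be attached before
 * the child is launched. Returns the pid that was logged. */
pid_t announce_pid_and_wait(unsigned int seconds)
{
    pid_t pid = getpid();

    openlog("amandad-debug", LOG_PID, LOG_DAEMON);
    syslog(LOG_NOTICE,
           "pid %ld pausing %u seconds before launching sendbackup",
           (long)pid, seconds);
    closelog();

    /* sleep() returns the unslept time if a signal interrupts it,
     * so keep sleeping until the full delay has elapsed. */
    while (seconds > 0)
        seconds = sleep(seconds);

    return pid;
}
```

With a call such as announce_pid_and_wait(120) placed just before the launch, the logged pid can be passed to ktrace with -p during the pause; adding -i, as in the earlier command, makes children created afterwards inherit the tracing.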
Estimate timeout
Amanda 2.6.1 on Solaris 10/Sparc
Amanda 2.6.1, Solaris 10 x86

Server has 21 clients with a total of 109 DLEs. One of the client systems has 51 DLEs: 1 ufs and 50 zfs partitions. The partitions/DLEs are all part of the same ZFS pool, which I believe (going by another discussion earlier this week) means they are checked sequentially.

We seem to be exceeding a timeout limit. etimeout is for size estimates, so I don't think it applies. We have switched to server estimate for zfs-dump.

Is there a per-client amcheck estimate timeout, not based on the number of client DLEs?

    Amanda Backup Client Hosts Check
    WARNING: finsen: selfcheck request failed: timeout waiting for REP
    Client check: 21 hosts checked in 91.125 seconds. 1 problem found.

thank you,
Brian

---
Brian R Cuttler brian.cutt...@wadsworth.org
Computer Systems Support (v) 518 486-1697
Wadsworth Center (f) 518 473-6384
NYS Department of Health Help Desk 518 473-0773
Re: very slow dumper (42.7KB/s)
Dustin J. Mitchell wrote:

On Mon, Aug 31, 2009 at 11:51 PM, Tom Robinson tom.robin...@motec.com.au wrote:

While the disk is reaching saturation (and recovering quickly) I'm thinking that all the retransmissions would be slowing things down more. I don't see any errors on the client interface but there are four on the server interface over the last four days.

Hmm, the causation may be going the other way -- if the disk is generating too many IRQs for the CPU to handle, then network packets might get dropped. Alternately, perhaps the PCI bus is maxed out? Anyway, this sounds like a problem local to the client.

Is there a way to slow down the disk IO so that it doesn't wedge the machine?

Thanks Dustin,

I've found that our very old system (RH7.1 Seawolf), running a very old kernel (2.4.20), has a bug in the IDE driver. I can't say categorically that this is the root cause of the dump issue I saw but, finally, I've got permission to move forward with a planned upgrade that I've been pushing for some time now.

For those who are interested, I suspect this is the problem:

https://bugzilla.redhat.com/show_bug.cgi?id=134579

Thanks for all the help.

Regards,
Tom

--
Tom Robinson
System Administrator
MoTeC
121 Merrindale Drive
Croydon South 3136
Victoria, Australia
T: +61 3 9761 5050
F: +61 3 9761 5051
M: +61 4 3268 7026
E: tom.robin...@motec.com.au