Re: Backup issues with OpenBSD 4.5 machines

2009-09-02 Thread Michael Burk
This was a good idea; I tried it with one modification: I had determined
earlier that the failure also happens without indexing, so I added just the
line

  fcntl(datafd, F_GETFL, 0);

and that fixed the problem as well. So I guess this is truly the minimal
patch!
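The thread never establishes why this single call avoids the failure. On its own, fcntl(fd, F_GETFL, 0) only reads a descriptor's file status flags (access mode, O_NONBLOCK, O_APPEND and so on) and modifies nothing. A minimal sketch of such a query follows; the helper names are hypothetical and are not part of Amanda:

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: return the file status flags of fd, or -1 on
 * error.  F_GETFL is a read-only query; it does not change the flags. */
int get_status_flags(int fd)
{
    return fcntl(fd, F_GETFL, 0);
}

/* Hypothetical helper: 1 if O_NONBLOCK is set on fd, 0 if clear,
 * -1 if the flags could not be read (e.g. fd is not open). */
int is_nonblocking(int fd)
{
    int flags = get_status_flags(fd);
    if (flags == -1)
        return -1;
    return (flags & O_NONBLOCK) ? 1 : 0;
}
```

Calling get_status_flags() twice on the same descriptor returns the same value, since the query has no side effect on the flags.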

-- Michael

On Tue, Sep 1, 2009 at 1:43 PM, Nathan Stratton Treadway natha...@ontko.com wrote:

 On Tue, Sep 01, 2009 at 11:31:26 -0600, Michael Burk wrote:
  I applied the 3-line patch to the 0831 snapshot and ran a full backup on
  both machines, with 4 file systems each. All 8 completed successfully with
  no strange messages.
 
  Next, I commented out the 3 new lines and tried the backup again on one of
  the machines. This time all 4 file systems failed; e.g.:
 [...]
  So it seems reliable that those 3 lines fix the problem somehow.
  Anything else you want to try before I ask for help on the OpenBSD list?

 I'm no expert on this topic, but if I were investigating something like
 this, I'd be curious to know if all three of the lines in the patch were
 necessary for the fix.

 Since, as you pointed out, the error seems to be tied to the indexing
 subprocess, I wonder what would happen if you included only the one

   fcntl(indexfd, F_GETFL, 0);

 line of the patch, but not the other two.


Nathan


 
 Nathan Stratton Treadway  -  natha...@ontko.com  -  Mid-Atlantic region
 Ray Ontko & Co.  -  Software consulting services  -  http://www.ontko.com/
  GPG Key: http://www.ontko.com/~nathanst/gpg_key.txt   ID: 1023D/ECFB6239
  Key fingerprint = 6AD8 485E 20B9 5C71 231C  0C32 15F3 ADCD ECFB 6239



Re: Backup issues with OpenBSD 4.5 machines

2009-09-02 Thread Michael Burk
I modified sendbackup-dump.c to run ktrace, e.g.:
/usr/bin/ktrace -id -t censw -f /tmp/sendbackup.trc /sbin/dump 0usf 1048576 - /dev/rsd0d

Unfortunately, I don't get a backup, even with the patch applied. The trace
output shows write errors because of a broken pipe with or without the
patches, like this:
 21835 dump CALL  write(0x2,0x53fbec,0x8)
 21835 dump RET   write -1 errno 32 Broken pipe
So I suspect my approach is not correct. Any other ideas how I might get
some useful trace output?
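The trace lines show dump writing 8 bytes to descriptor 2 (stderr) and getting errno 32, EPIPE, which write(2) returns when the read end of the pipe has already been closed and SIGPIPE is not delivered. A minimal sketch that reproduces that errno in isolation, assuming nothing about how Amanda actually sets up its pipes (the helper name is hypothetical):

```c
#include <errno.h>
#include <signal.h>
#include <unistd.h>

/* Hypothetical helper: write to a pipe whose read end is already
 * closed and return the resulting errno (0 if the write succeeded).
 * SIGPIPE is ignored (process-wide, and left ignored) so that write(2)
 * fails with EPIPE ("Broken pipe", 32 on OpenBSD) instead of the
 * process being killed by the signal. */
int write_to_closed_pipe(void)
{
    int fds[2];
    if (pipe(fds) == -1)
        return errno;
    signal(SIGPIPE, SIG_IGN);
    close(fds[0]);                          /* no reader is left */
    ssize_t n = write(fds[1], "12345678", 8);
    int err = (n == -1) ? errno : 0;
    close(fds[1]);
    return err;
}
```

Seeing this errno from dump suggests only that whatever was reading its stderr had gone away; it does not by itself say why.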

Thanks,
Michael

On Tue, Sep 1, 2009 at 1:25 PM, Dustin J. Mitchell dus...@zmanda.com wrote:

 On Tue, Sep 1, 2009 at 3:18 PM, Jean-Louis Martineau martin...@zmanda.com wrote:
  I have nothing else to try.
  The order of system calls is as follows:

 If it's not too hard, it would be nice to have a ktrace or equivalent
 of this, first to look at here, and second to take to the OpenBSD
 list.  I know that's tricky since this is a daemon process.

 Dustin

 --
 Open Source Storage Engineer
 http://www.zmanda.com



Re: Backup issues with OpenBSD 4.5 machines

2009-09-02 Thread Dustin J. Mitchell
On Wed, Sep 2, 2009 at 1:07 PM, Michael Burk bur...@gmail.com wrote:
 So I suspect my approach is not correct. Any other ideas how I might get
 some useful trace output?

Can you have amandad sleep for, say, 120 seconds just before it
launches sendbackup, and somehow notify you of the pid to which you
should attach ktrace?  Maybe by writing it to /dev/console or to
syslog?
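One way to realize this suggestion, sketched under assumptions: rather than pausing amandad itself, fork the child first, have the child announce its own pid via syslog(3) and sleep, and only then exec the program, leaving time to attach ktrace (for example with its -p option) before the exec happens. The helper name and log message are hypothetical, and this is not how amandad actually launches sendbackup:

```c
#include <sys/types.h>
#include <sys/wait.h>   /* for callers that reap the child with waitpid() */
#include <syslog.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that logs its pid, sleeps for
 * delay_secs (a window for attaching a tracer), then execs path with
 * argv.  The parent gets the child pid back, or -1 if fork() failed. */
pid_t spawn_with_attach_window(const char *path, char *const argv[],
                               unsigned int delay_secs)
{
    pid_t pid = fork();
    if (pid != 0)
        return pid;                  /* parent: child pid or -1 */

    /* child: announce the pid, wait, then become the target program */
    syslog(LOG_NOTICE, "pid %ld will exec %s in %u seconds",
           (long)getpid(), path, delay_secs);
    unsigned int left = delay_secs;
    while (left > 0)
        left = sleep(left);          /* sleep() returns unslept seconds */
    execv(path, argv);
    _exit(127);                      /* only reached if execv() failed */
}
```

In the scenario above, path would be the sendbackup binary and delay_secs something like the suggested 120 seconds.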

By the way, it's the launching of sendbackup that seems to be failing,
not the launching of dump.

Dustin



Estimate timeout

2009-09-02 Thread Brian Cuttler

Amanda 2.6.1 on Solaris 10/SPARC
Amanda 2.6.1 on Solaris 10/x86

Server has 21 clients with a total of 109 DLEs.
One of the client systems has 51 DLEs, 1 ufs and 50 zfs partitions.

The partitions/DLEs are all part of the same ZFS pool, and I
believe (from another discussion earlier this week) that they
are checked sequentially.

We seem to be exceeding a timeout limit.

etimeout is for size estimates, so I don't think it applies.

We have switched to server estimate for zfs-dump.

Is there a per-client amcheck estimate timeout that is not based on
the number of client DLEs?
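To make the concern concrete: if a client checks its DLEs one after another and a single fixed limit applies per client, the host with 51 DLEs needs roughly 51 times the per-DLE time, while a host with a few DLEs needs only a few times it. A toy model with hypothetical function names; the figures used with it are assumptions, not measurements from this setup, and Amanda's real timeout handling is not modeled:

```c
/* Hypothetical model: DLEs are checked sequentially, so the client's
 * reply arrives after about n_dles * per_dle_secs.  Returns 1 if that
 * exceeds one fixed per-client limit, 0 otherwise. */
int sequential_check_times_out(unsigned int n_dles, double per_dle_secs,
                               double limit_secs)
{
    return (double)n_dles * per_dle_secs > limit_secs;
}

/* Largest per-DLE check time that still fits under limit_secs. */
double max_per_dle_secs(unsigned int n_dles, double limit_secs)
{
    return n_dles ? limit_secs / n_dles : limit_secs;
}
```

With an assumed 30-second limit, a host with 51 DLEs would time out once each check takes more than about 0.59 seconds (30 / 51), while a host with 3 DLEs could spend up to 10 seconds on each.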


Amanda Backup Client Hosts Check

WARNING: finsen: selfcheck request failed: timeout waiting for REP
Client check: 21 hosts checked in 91.125 seconds.  1 problem found.

thank you,

Brian
---
   Brian R Cuttler                 brian.cutt...@wadsworth.org
   Computer Systems Support        (v) 518 486-1697
   Wadsworth Center                (f) 518 473-6384
   NYS Department of Health        Help Desk 518 473-0773







Re: very slow dumper (42.7KB/s)

2009-09-02 Thread Tom Robinson
Dustin J. Mitchell wrote:
 On Mon, Aug 31, 2009 at 11:51 PM, Tom Robinson tom.robin...@motec.com.au wrote:
   
 While the disk is reaching saturation (and recovering quickly) I'm
 thinking that all the retransmissions would be slowing things down more.

 I don't see any errors on the client interface but there are four on the
 server interface over the last four days.
 

 Hmm, the causation may be going the other way -- if the disk is
 generating too many IRQs for the CPU to handle, then network packets
 might get dropped.  Alternatively, perhaps the PCI bus is maxed out?
 Anyway, this sounds like a problem local to the client.  Is there a
 way to slow down the disk IO so that it doesn't wedge the machine?
   
Thanks Dustin,

I've found that our very old machine (RH7.1 Seawolf), running a very old
kernel (2.4.20), has a bug in the IDE driver. I can't say categorically
that this is the root cause of the dump issue I saw, but I've finally got
permission to move forward with a planned upgrade that I've been pushing
for some time now.

For those that are interested, I suspect this is the problem:
https://bugzilla.redhat.com/show_bug.cgi?id=134579

Thanks for all the help

Regards,

Tom

-- 

Tom Robinson
System Administrator

MoTeC

121 Merrindale Drive
Croydon South
3136 Victoria
Australia

T: +61 3 9761 5050
F: +61 3 9761 5051   
M: +61 4 3268 7026
E: tom.robin...@motec.com.au