Re: BUG
--On January 5, 2006 11:05:44 AM -0700 John E Hein <[EMAIL PROTECTED]> wrote:

> I still think we need to be able to break up estimate requests into
> multiple chunks if necessary.  I never got around to making a patch
> for that.

Yeah, and amandad would have to properly understand breaking up its REPs (PREPs...?).

-- 
"Genius might be described as a supreme capacity for getting its
possessors into trouble of all kinds."  -- Samuel Butler
Re: BUG
John E Hein wrote:
> I don't know if this is exactly the same as a problem I found and
> patched, but it smells similar.  See the attached emails (from emails
> I sent to hackers last July).  I patched my amanda (2.4.5) and have
> been using it without problems since.  I don't know if this was ever
> checked in or put on the patches page.

This patch was already merged in 2.4.5. These patches actually introduce the problem that Michael encounters: planner now splits the "REQ" type packets into chunks of at most MAX_DGRAM/2 bytes, but amandad.c is not patched to allow such "REQ" packets to span more than one UDP dgram.

===
RCS file: /cvsroot/amanda/amanda/common-src/dgram.h,v
retrieving revision 1.6.2.2
diff -u -r1.6.2.2 dgram.h
--- common-src/dgram.h	1999/09/01 17:58:39	1.6.2.2
+++ common-src/dgram.h	2001/06/29 17:52:34
@@ -34,7 +34,12 @@
 #include "amanda.h"
 
-#define MAX_DGRAM	(64*1024)
+/*
+ * Maximum datagram (UDP packet) we can generate.  Size is limited by
+ * a 16 bit length field in an IPv4 header (65535), which must include
+ * the 24 byte IP header and the 8 byte UDP header.
+ */
+#define MAX_DGRAM	(((1<<16)-1)-24-8)

A better definition of the payload of a 64K packet indeed...

[...]
RCS file: /cvsroot/amanda/amanda/server-src/amcheck.c,v
retrieving revision 1.50.2.20
diff -u -r1.50.2.20 amcheck.c
[...]
+	    if(req_len + l_len > MAX_DGRAM / 2) { /* allow 2X for err response */

amcheck has the same code as planner when assembling the packet for the remote amandad client. Good!

[...]
===
RCS file: /cvsroot/amanda/amanda/server-src/planner.c,v
retrieving revision 1.76.2.20
diff -u -r1.76.2.20 planner.c
[...]
+	if(req_len + s_len > MAX_DGRAM / 2) { /* allow 2X for err response */
+	    amfree(s);
+	    break;
+	}

This is where planner limits the size of the REQ packet to 32K.

[...]
But there is no patch for amandad.c to allow for more than one REQ packet.

FreeBSD has a max udp datagram size of 9216 ('sysctl -a | grep dgram' or reading sys/netinet/udp_usrreq.c shows it).
So calling sendto(2) with a message larger than this causes EMSGSIZE. Dumping a host that has 42 disklist entries causes that for us. The sendsize request message gets large: there are requests for information about multiple dump levels per disklist entry, and we are seeing nominally 2-3 requests per DLE (42 * 2.5 * ~130 chars/line = 13650).

Below is a patch to use setsockopt to bump that limit up to MAX_DGRAM for the amanda udp sockets. This should last a lot longer, until we hit the ~64 KiB limit. There's still a problem with jrj's dgram split fix from 2001 once you hit that limit - specifically, a truncation of the list rather than sending multiple estimate request chunks. So you get 'missing results' for the DLEs that sendsize is never asked to report estimates for. I'll address that in a separate email if I get the time to come up with a patch.

This patch compiles on FreeBSD, but should work on other OSes too, since SO_SNDBUF is a standard option. In fact, this option is already used in stream.c to set socket buffer sizes. We could, in theory, use the same function (try_socksize) to set the UDP socket buffer size: it just takes a socket handle and doesn't care whether it's udp/tcp. If someone wants to tweak the patch to share try_socksize() for dgram.c as well, that should work also. That would involve making try_socksize() non-static (possibly moving it to a different .c file if desired) and putting the prototype in a .h file that both dgram.c and stream.c could include.

Lots of chatter for a little patch...
--- common-src/dgram.c.orig	Tue Nov 12 12:19:03 2002
+++ common-src/dgram.c	Wed Jul 13 12:29:19 2005
@@ -53,6 +53,7 @@
     socklen_t len;
     struct sockaddr_in name;
     int save_errno;
+    int size = MAX_DGRAM;
 
     if((s = socket(AF_INET, SOCK_DGRAM, 0)) == -1) {
 	save_errno = errno;
@@ -70,6 +71,10 @@
 	errno = EMFILE;				/* out of range */
 	return -1;
     }
+    if (setsockopt(s, SOL_SOCKET, SO_SNDBUF, (void *) &size,
+		   sizeof(size)) < 0)
+	dbprintf(("%s: dgram_bind: could not set udp send buffer to %d\n",
+		  debug_prefix(NULL), size));
 
     memset(&name, 0, sizeof(name));
     name.sin_family = AF_INET;

The above patch is not implemented. But the installation notes for FreeBSD do mention increasing the UDP dgram size using the sysctl command or the /etc/sysctl.conf file.

-- 
Paul Bijnens, Xplanation                          Tel  +32 16 397.511
Technologielaan 21 bus 2, B-3001 Leuven, BELGIUM  Fax  +32 16 397.512
http://www.xplanation.com/                email: [EMAIL PROTECTED]
Re: BUG (was: Re: Handitarded....odd (partial) estimate timeout errors.)
--On January 5, 2006 4:49:53 PM +0100 Paul Bijnens <[EMAIL PROTECTED]> wrote:

> Michael Loftis wrote:
>> Paul asked for the logs, it seems like there's an amanda bug.  The units
>
> Yes, indeed, there is a bug in Amanda!  You have 236 DLE's for that
> host, and from my reading of the code the REQuest UDP packet is
> limited to 32K instead of 64K (see planner.c lines 1377-1383).
> (Need to update the documentation!)

Woot, I'm NOT crazy! :D ...did I just say woot? My apologies.

> It seems that planner splits up the REQuest packet into separate UDP
> packets when exceeding MAX_DGRAM/2, i.e. 32K.  Your first request was
> 32580 bytes.  Adding the next string to that request would have
> exceeded the 32768 limit.  The reason for the division by 2 seems to
> be to reserve space for error replies on each of those.

I knew it was size related, but my packets were significantly less than MAX_DGRAM. This definitely explains it.

> However, the amandad client expects one and only one REQuest packet.
> Any other packet coming from the same connection (5-tuple: protocol,
> remotehost, remoteport, localhost, localport) and having a type "REQ"
> is considered a duplicate.  It should actually test for the handle
> and sequence to be identical too.  It does not.
>
> It's not fixed quickly either: when receiving the first "REQ" packet,
> the amandad client forks and execs the request program (sendsize in
> this case) and reads the results from a pipe.  By the time the
> second, non-identical request comes in (with a different handle and
> sequence -- which is currently not checked), sendsize is already
> started and cannot be given additional DLE's to estimate.
>
> As a temporary workaround, you could shorten the exclude-list string
> for that host by creating a symlink:
>
>    ln -s /etc/amanda/exclude.gtar /.excl

Yeah... this will help for a time. Hopefully long enough for a patch to fix amandad. I'll have to create a separate type for this server, since we've got well over a hundred now and they all share that main backup type.
I figured shortening the UDP packets somehow would help; I knew it was just odd that it wasn't quite right and I seemed to be running into the problem way too early :)

> and use that as exclude-list: this shortens each line by 20 bytes,
> which would shrink the package enough to fit again.  (236 DLE's * 20
> = 4720 bytes less in a REQuest UDP for that host!)

Anyway, I'm getting a headache thinking about it :) All my other DLEs seem ok for that host, and the ones that it misses are not always exactly the same, but all seem to be non-calcsize estimated.

> Just bad luck for those entries that happen to go at the end of the
> queue.  On the other hand, when really unlucky, you could have up to
> three estimates for each DLE, overflowing even the 4K we saved by
> shrinking the exclude string...

Like I said, hopefully by then either the hackers (or myself) will have put together a patch.

... I see three ways to fix this, one of which I don't know will actually fix it: what about turning wait=yes to wait=no in my xinetd.conf? Not sure what that would break. The other two involve code: multiple sendsizes, *or* a protocol change to wait for a 'final start' packet, or an amandad change to wait a few extra seconds before starting the actual sendsize, coalescing the results. And you're right, the other ways aren't easy; one involves possibly breaking the protocol too.
Re: new feature: client-side, server-side encryption dumptype option
> > I think it would be helpful for you to write out your assumptions
> > about threats.  I am relatively unconcerned with people getting
> > access to my tapes - they are locked up as well as the computers.
>
> They are locked up _today_.  Do you know what will happen to them in
> a couple of months/years?  I remember at least two cases where big
> banks have lost tapes with sensitive data on them and no one knows
> where the tapes are or who has had access to them.  How do you know
> that this will not happen to your tapes?

I just stumbled over yet another (current) case of lost backup tapes with sensitive data on them. Look at http://www.heise.de/newsticker/meldung/67824

Since this page is in German, here's a short summary: Marriott Hotels has lost backups with address and credit card data of 206,000 customers. [ ... ] A couple of weeks ago the bank ABN Amro lost a backup tape with data of 2 million credit card users.

> > Really, I am trying to ask you to think about keeping transport and
> > storage encryption conceptually separate, even if you have a
> > mechanism that does both without any bits on the server.

The above examples show that having unencrypted backups is not really a good idea. So please think once more about it. The only sane way is the client-encrypted public-key method.

-- 
No software patents in Europe -- http://nosoftwarepatents.com
-- Josef Wolf -- [EMAIL PROTECTED] --
Re: amflush 2.4.3 hanging?
On Thursday 05 January 2006 05:05, Wim Baetens wrote:
>Hi,
>
>Is anyone aware of amflush 2.4.3 not doing what it's supposed to do?

First, 2.4.3 is now _very_ long in the tooth, having been released on 2002-10-07. Second, a buglet that might have resembled this was fixed back in version 2.4.4 on 2004-04-23. From the ChangeLog:

2004-04-23  Eric Siegerman <[EMAIL PROTECTED]>

	Bug fix: amflush would run, and consume a tape, even if there
	were no Amanda directories waiting to be flushed:
	* common-src/sl.c (is_empty_sl): New function to test whether
	  a list is empty.
	* common-src/sl.h (is_empty_sl): Prototype.
	* server-src/amflush.sh (main): Use is_empty_sl(), rather than
	  ==NULL, to test emptiness of datestamp_list.

2003-06-05  Jean-Louis Martineau <[EMAIL PROTECTED]>

	* server-src/amflush.c: Implement new -b and -s options.
	* man/amflush.8.in: Document it.

2003-06-02  Jean-Louis Martineau <[EMAIL PROTECTED]>

	Patch by Paul Bijnens <[EMAIL PROTECTED]>
	* server-src/amstatus.pl.in: Fix for missing amdump.1 or amflush.1.

I don't know if that takes us all the way back to 2.4.3, though; I don't think it does.

>Here's the situation: I have a few backups on the holding disk that
>failed to write to tape. I made sure that the correct tape is loaded,
>and amcheck runs as expected. I launched amflush 2.4.3 in foreground
>mode, which gives me a correct list of backups. I select the first
>backup to be flushed and confirm to proceed.

And you did this as the correct user? root won't do.

>amflush launches, writes the tape header, but then just sits there,
>consuming about 97% of the system resources. After a long time, the
>only solution is to kill amflush and do an amcleanup. Last night's
>backup was successful, but I'm still stuck with the three backups
>from the past days that need to be flushed. I admit I have no clue
>of how to proceed - any help would be welcome.
In the current snapshot, and for about the last 2 years, there has been a keyword you could add to your amanda.conf that would cause any files that need to be flushed to be included in the current backup run. From the current srcdir/example/amanda.conf:

autoflush no
#
# if autoflush is set to yes, then amdump will schedule all dump on
# holding disks to be flush to tape during the run.

And from the ChangeLog:

2004-11-08  Jean-Louis Martineau <[EMAIL PROTECTED]>

	Patch by Orion Poplawski <[EMAIL PROTECTED]>
	* server-src/amstatus.pl.in: a FLUSH command can't be in an
	  estimate phase.
	* server-src/driver.c: Start autoflush while waiting for estimate.
	* server-src/planner.c: Write FLUSH line before estimate.

2001-11-08  Jean-Louis Martineau <[EMAIL PROTECTED]>

	* server-src/conffile.c (autoflush): New configuration options.
	* server-src/conffile.h (autoflush): New configuration options.

So this was in 2.4.3, being a late 2.4.2-era addition.

I personally have zip experience with amflush, since I gave up on tapes and converted to vtapes via the FILE: directive. I bought a big hard drive and it's been maybe 0.1 percent of the trouble I was having with tapes - but then I was using DDS2's, all an old fart on SS could afford, and cheap tapes get you headachey backups, TANSTAAFL. This hard drive is internal, so it wouldn't do to remove it for offsite storage. OTOH, I'm not running a major public agency here either.

If you know the configuration used to build the version you are running, then it's extremely simple to build and install the current snapshot; I do it here for every new one in the 2.4.x release series. It takes me less than 15 minutes, as I long since converted my configuration options to a script that configures and builds it, then a quick 'su -' to install it, run ldconfig, back to the user 'amanda', and run an amcheck.
That script is available in this list's archives; it was posted here several times, but I stopped posting it when I converted to FILE: devices, as it became a bit simpler. Back up about 2 years to look for it. Or possibly someone else has adapted it for their system and could post theirs as an example.

It makes installing and updating an amanda install into something that's not much more complex than installing from your distro's packaging system. But when you install from a packaged repo, you are stuck with the options compiled into it, and I think every packager has a different idea of where to put stuff - a not very effective way to maintain compatibility across upgrades, IMO.

Amanda is being developed, with new features all the time, but with the provision that anything done is backwards compatible, so old clients run with new server releases and vice versa. So methinks it's time to update. Amanda is getting better all the time. Current snapshots are available in tarball form at:
BUG (was: Re: Handitarded....odd (partial) estimate timeout errors.)
Michael Loftis wrote:
> Paul asked for the logs, it seems like there's an amanda bug.  The units

Yes, indeed, there is a bug in Amanda!

You have 236 DLE's for that host, and from my reading of the code the REQuest UDP packet is limited to 32K instead of 64K (see planner.c lines 1377-1383). (Need to update the documentation!)

It seems that planner splits up the REQuest packet into separate UDP packets when exceeding MAX_DGRAM/2, i.e. 32K. Your first request was 32580 bytes; adding the next string to that request would have exceeded the 32768 limit. The reason for the division by 2 seems to be to reserve space for error replies on each of those.

However, the amandad client expects one and only one REQuest packet. Any other packet coming from the same connection (5-tuple: protocol, remotehost, remoteport, localhost, localport) and having a type "REQ" is considered a duplicate. It should actually test for the handle and sequence to be identical too. It does not.

It's not fixed quickly either: when receiving the first "REQ" packet, the amandad client forks and execs the request program (sendsize in this case) and reads the results from a pipe. By the time the second, non-identical request comes in (with a different handle and sequence -- which is currently not checked), sendsize is already started and cannot be given additional DLE's to estimate.

As a temporary workaround, you could shorten the exclude-list string for that host by creating a symlink:

   ln -s /etc/amanda/exclude.gtar /.excl

and use that as exclude-list: this shortens each line by 20 bytes, which would shrink the package enough to fit again. (236 DLE's * 20 = 4720 bytes less in a REQuest UDP for that host!)

> Anyway I'm getting a headache thinking about it :) all my other DLEs
> seem ok for that host, and the ones that it misses are not always
> exactly the same, but all seem to be non-calcsize estimated.

Just bad luck for those entries that happen to go at the end of the queue.
On the other hand, when really unlucky, you could have up to three estimates for each DLE, overflowing even the 4K we saved by shrinking the exclude string...

-- 
Paul Bijnens, Xplanation                          Tel  +32 16 397.511
Technologielaan 21 bus 2, B-3001 Leuven, BELGIUM  Fax  +32 16 397.512
http://www.xplanation.com/                email: [EMAIL PROTECTED]
***********************************************************************
* I think I've got the hang of it now:  exit, ^D, ^C, ^\, ^Z, ^Q, ^^, *
* F6, quit, ZZ, :q, :q!, M-Z, ^X^C, logoff, logout, close, bye, /bye, *
* stop, end, F3, ~., ^]c, +++ ATH, disconnect, halt, abort, hangup,   *
* PF4, F20, ^X^X, :D::D, KJOB, F14-f-e, F8-e, kill -1 $$, shutdown,   *
* init 0, kill -9 1, Alt-F4, Ctrl-Alt-Del, AltGr-NumLock, Stop-A, ... *
* ... "Are you sure?" ... YES ... Phew ... I'm out                    *
***********************************************************************
amflush 2.4.3 hanging?
Hi,

Is anyone aware of amflush 2.4.3 not doing what it's supposed to do?

Here's the situation: I have a few backups on the holding disk that failed to write to tape. I made sure that the correct tape is loaded, and amcheck runs as expected. I launched amflush 2.4.3 in foreground mode, which gives me a correct list of backups. I select the first backup to be flushed and confirm to proceed.

amflush launches, writes the tape header, but then just sits there, consuming about 97% of the system resources. After a long time, the only solution is to kill amflush and do an amcleanup. Last night's backup was successful, but I'm still stuck with the three backups from the past days that need to be flushed. I admit I have no clue of how to proceed - any help would be welcome.

Thanks!
Wim