Re: BUG

2006-01-05 Thread Michael Loftis



--On January 5, 2006 11:05:44 AM -0700 John E Hein <[EMAIL PROTECTED]> wrote:


I still think we need to be able to break up estimate requests into
multiple chunks if necessary.  I never got around to making a patch
for that.


Yeah, and amandad would have to properly understand breaking up its REPs 
(PREPs...?).


--
"Genius might be described as a supreme capacity for getting its possessors
into trouble of all kinds."
-- Samuel Butler


Re: BUG

2006-01-05 Thread Paul Bijnens

John E Hein wrote:

I don't know if this is exactly the same as a problem I found and
patched, but it smells similar.

See the attached emails (from emails I sent to hackers last July).  I
patched my amanda (2.4.5) and have been using it without problems
since.  I don't know if this was ever checked in or put on the patches
page.



This patch was already merged in 2.4.5.

These patches actually introduce the problem that Michael encounters:
planner now limits "REQ" packets to at most MAX_DGRAM/2 bytes, but
amandad.c is not patched to allow such a "REQ" to span more than one
UDP dgram.



===
RCS file: /cvsroot/amanda/amanda/common-src/dgram.h,v
retrieving revision 1.6.2.2
diff -u -r1.6.2.2 dgram.h
--- common-src/dgram.h  1999/09/01 17:58:39 1.6.2.2
+++ common-src/dgram.h  2001/06/29 17:52:34
@@ -34,7 +34,12 @@
 
 #include "amanda.h"
 
-#define MAX_DGRAM  (64*1024)

+/*
+ * Maximum datagram (UDP packet) we can generate.  Size is limited by
+ * a 16 bit length field in an IPv4 header (65535), which must include
+ * the 24 byte IP header and the 8 byte UDP header.
+ */
+#define MAX_DGRAM  (((1<<16)-1)-24-8)


Better definition of the payload of a 64K packet indeed...

[...]

RCS file: /cvsroot/amanda/amanda/server-src/amcheck.c,v
retrieving revision 1.50.2.20
diff -u -r1.50.2.20 amcheck.c

[...]

+   if(req_len + l_len > MAX_DGRAM / 2) {/* allow 2X for err response */


amcheck has the same code as planner when assembling the packet for the 
remote amandad client.  Good!



[...]

===
RCS file: /cvsroot/amanda/amanda/server-src/planner.c,v
retrieving revision 1.76.2.20
diff -u -r1.76.2.20 planner.c

[...]

+   if(req_len + s_len > MAX_DGRAM / 2) {/* allow 2X for err response */
+   amfree(s);
+   break;
+   }



This is where planner limits the size of the REQ packet to 32K.

[...]

But there is no patch for amandad.c to allow for more than one REQ packet.




FreeBSD has a maximum UDP datagram size of 9216 ('sysctl -a | grep dgram'
or a look at sys/netinet/udp_usrreq.c shows it).

So calling sendto(2) with a message larger than this causes EMSGSIZE.

Dumping a host that has 42 disklist entries causes that for us.  The
sendsize request message gets large.  There are requests for
information about multiple dump levels per disklist entry, and we are
seeing nominally 2-3 requests per DLE (42 * 2.5 * ~130 chars/line =
13650).

Below is a patch to use setsockopt to bump that up to MAX_DGRAM for
the amanda udp sockets.  This should last a lot longer until we hit
the ~64 KiB limit.

There's still a problem with jrj's dgram split fix from 2001 once you
hit that limit: the list is truncated rather than sent as multiple
estimate request chunks.  So you get 'missing results' for the DLEs
that sendsize is never asked to report estimates for.  I'll address
that in a separate email if I get the time to come up with a patch.

This patch compiles on FreeBSD, but should work on other OSes too,
since SO_SNDBUF is a standard option.

In fact, this option is used in stream.c to set socket buffer sizes.
We could, in theory, use the same function (try_socksize) to set the
UDP socket buffer size.  It just uses a socket handle and doesn't care
if it's udp/tcp.  If someone wants to tweak the patch to share
try_socksize() for dgram.c as well, that should work also.

That would involve making try_socksize() non-static (possibly moving
it to a different .c file if desired) and putting the prototype in a
.h file that both dgram.c and stream.c could include.

Lots of chatter for a little patch...

--- common-src/dgram.c.orig Tue Nov 12 12:19:03 2002
+++ common-src/dgram.c  Wed Jul 13 12:29:19 2005
@@ -53,6 +53,7 @@
 socklen_t len;
 struct sockaddr_in name;
 int save_errno;
+int size = MAX_DGRAM;
 
 if((s = socket(AF_INET, SOCK_DGRAM, 0)) == -1) {

save_errno = errno;
@@ -70,6 +71,10 @@
errno = EMFILE; /* out of range */
return -1;
 }
+if (setsockopt(s, SOL_SOCKET, SO_SNDBUF, (void *) &size,
+   sizeof(size)) < 0)
+   dbprintf(("%s: dgram_bind: could not set udp send buffer to %d\n",
+ debug_prefix(NULL), size));
 
 memset(&name, 0, sizeof(name));

 name.sin_family = AF_INET;


The above patch was not integrated.  But the installation notes for
FreeBSD do mention increasing the UDP dgram size using the sysctl
command or the /etc/sysctl.conf file.


--
Paul Bijnens, XplanationTel  +32 16 397.511
Technologielaan 21 bus 2, B-3001 Leuven, BELGIUMFax  +32 16 397.512
http://www.xplanation.com/  email:  [EMAIL PROTECTED]

Re: BUG (was: Re: Handitarded....odd (partial) estimate timeout errors.)

2006-01-05 Thread Michael Loftis



--On January 5, 2006 4:49:53 PM +0100 Paul Bijnens 
<[EMAIL PROTECTED]> wrote:



Michael Loftis wrote:



Paul asked for the logs, it seems like there's an amanda bug.  The units


Yes, indeed, there is a bug in Amanda!
You have 236 DLE's for that host, and from my reading of the code
the REQuest UDP packet is limited to 32K instead of 64K (see planner.c
lines 1377-1383)  (Need to update the documentation!)


Woot, I'm NOT crazy! :D

...did I just say woot?  My apologies.


It seems that planner splits up the REQuest packet into separate
UDP packets when exceeding MAX_DGRAM/2, i.e. 32K.
Your first request was 32580 bytes.  Adding the next string to that
request would have exceeded the 32768 limit.
The reason for the division by 2 seems to be to reserve space for
error replies to each of those.


I knew it was size related, but my packets were significantly smaller than 
MAX_DGRAM.  This definitely explains it.



However, the amandad client expects one and only one REQuest packet.
Any other REQuest packet coming from the same connection (5-tuple:
protocol, remotehost, remoteport, localhost, localport) and having
a type "REQ" is considered a duplicate.
It should actually test for the handle and sequence to be identical
too. It does not.

It's not fixed quickly either:  when receiving the first "REQ" packet,
the amandad client forks and execs the request program (sendsize in
this case) and reads the results from a pipe.

By the time the second, non-identical request comes in (with different
handle, sequence -- which is currently not checked), sendsize is already
started and cannot be given additional DLE's to estimate.


As a temporary workaround, you could shorten the exclude-list string for
that host by creating a symlink:

ln -s /etc/amanda/exclude.gtar /.excl


Yeah... this will help for a time.  Hopefully long enough for a patch to fix 
amandad.  I'll have to create a separate dumptype for this server, since we've 
got well over a hundred now and they all share that main backup type.  I 
figured shortening the UDP packets somehow would help; it was just odd that 
it wasn't quite right and that I seemed to be running into the problem way 
too early :)



and use that as the exclude-list: this shortens each line by 20 bytes,
which would shrink the packet to fit again. (236 DLE's * 20 = 4720 bytes
less in a REQuest UDP packet for that host!)



Anyway... I'm getting a headache thinking about it :)  All my other DLEs
seem OK for that host, and the ones that it misses are not always
exactly the same, but all seem to be non-calcsize estimated.


Just bad luck for those entries that happen to go in the end of the
queue.  On the other hand, when really unlucky, you could have up to
three estimates for each DLE, overflowing even the 4K we saved by
shrinking the exclude string...


Like I said, hopefully by then either the hackers (or myself) will have put 
together a patch.  ...  I see three ways to fix this, one of which I don't 
know will work: what about turning wait=yes to wait=no in my xinetd.conf? 
Not sure what that would break.  The others involve code: multiple 
sendsizes, *or* a protocol change to wait for a 'final start' packet, or 
an amandad change to wait a few extra seconds before starting the actual 
sendsize, coalescing the results.


And you're right, the other ways aren't easy... one of them possibly 
involves breaking the protocol too.






Re: new feature: client-side, server-side encryption dumptype option

2006-01-05 Thread Josef Wolf

> > I think it would be helpful for you to write out your assumptions
> > about threats.  I am relatively unconcerned with people getting access
> > to my tapes - they are locked up as well as the computers.
> 
> They are locked up _today_.  Do you know what will happen to them in a
> couple of months/years?  I remember at least two cases where big banks
> have lost tapes with sensitive data on them and no one knows where the
> tapes are or who have/had access to them.  How do you know that this
> will not happen to your tapes?

I just stumbled over yet another (current) case of lost backup tapes with
sensitive data on them: look at http://www.heise.de/newsticker/meldung/67824
Since this page is in German, here's a short summary:

 Marriott Hotels has lost backups with address and credit-card data of
 206,000 customers.  [ ... ] A couple of weeks ago the bank ABN Amro
 lost a backup tape with data of 2 million credit users.

> > Really, I am trying to ask you to think about keeping transport and
> > storage encryption conceptually separate, even if you have a mechanism
> > that does both without any bits on the server.

The above examples show that having unencrypted backups is not really a
good idea.  So please think about it once more.  The only sane way is
the client-encrypted public-key method.

-- 
No software patents in Europe -- http://nosoftwarepatents.com
-- Josef Wolf -- [EMAIL PROTECTED] --


Re: amflush 2.4.3 hanging?

2006-01-05 Thread Gene Heskett
On Thursday 05 January 2006 05:05, Wim Baetens wrote:
>Hi,
>
>Is anyone aware of amflush 2.4.3 not doing what it's supposed to do?

First, 2.4.3 is now _very_ long in the tooth, having been released on 
2002-10-07.

Second, it had a buglet fixed that might have resembled this back in 
version 2.4.4 on 2004-04-23, from the ChangeLog:

2004-04-23  Eric Siegerman <[EMAIL PROTECTED]>
Bug fix: amflush would run, and consume a tape, even if there
were no Amanda directories waiting to be flushed:
* common-src/sl.c (is_empty_sl): New function to test
  whether a list is empty.
* common-src/sl.h (is_empty_sl): Prototype.
* server-src/amflush.sh (main): Use is_empty_sl(),
  rather than ==NULL, to test emptiness of datestamp_list.

2003-06-05  Jean-Louis Martineau <[EMAIL PROTECTED]>

* server-src/amflush.c: Implement new -b and -s options.
* man/amflush.8.in: Document it.

2003-06-02  Jean-Louis Martineau <[EMAIL PROTECTED]>
Patch by Paul Bijnens <[EMAIL PROTECTED]>

* server-src/amstatus.pl.in: Fix for missing amdump.1 or 
amflush.1.


I don't know if that takes us all the way back to 2.4.3 though.  I 
don't think it does.  No, 2.4.3 was released 2002-10-07.

>Here's the situation: I have a few backups on the holding disk that
>failed to write to tape. I made sure that the correct tape is loaded,
>and amcheck runs as expected. I launched amflush 2.4.3 in foreground
>mode, which gives me the correct list of backups. I select the first
>backup to be flushed and confirm to proceed.

And you did this as the correct user?  root won't do.

>amflush launches, writes the tape header, but then just sits there,
>consuming about 97% of the system resources. After a long time, the
>only solution is to kill amflush and do an amcleanup. Last night's
>backup was successful, but I'm still stuck with the three backups
> from the past days that need to be flushed. I admit I have no clue of
> how to proceed - any help would be welcome.

In the current snapshot, and for about the last two years, there has been 
a keyword you could add to your amanda.conf that causes any files that 
need to be flushed to be included in the current backup run.

From the current srcdir/example/amanda.conf:
autoflush no #
# if autoflush is set to yes, then amdump will schedule all dump on
# holding disks to be flush to tape during the run.

And from the ChangeLog:
2004-11-08  Jean-Louis Martineau <[EMAIL PROTECTED]>
Patch by Orion Poplawski <[EMAIL PROTECTED]>

* server-src/amstatus.pl.in: a FLUSH command can't be in an 
estimate
  phase.
* server-src/driver.c: Start autoflush while waiting for 
estimate.
* server-src/planner.c: Write FLUSH line before estimate.
2001-11-08  Jean-Louis Martineau <[EMAIL PROTECTED]>

* server-src/conffile.c (autoflush): New configuration options.
* server-src/conffile.h (autoflush): New configuration options.

So this was in 2.4.3, being a late 2.4.2 addition.

I personally have zip experience with amflush since I gave up on tapes 
and converted to vtapes via the FILE: directive.  I bought a big hard 
drive and it's been maybe 0.1 percent of the trouble I was having 
with tapes, but then I was using DDS2s, all an old fart on SS could 
afford, and cheap tapes get you headachey backups, TANSTAAFL.

This hard drive is internal, so it wouldn't do to remove it for offsite 
storage.  OTOH, I'm not running a major public agency here either.

If you know the configuration used to build the version you are 
running, then it's extremely simple to build and install the current 
snapshot; I do it here for every new one in the 2.4.x release series.  
It takes me less than 15 minutes, as I long since converted my 
configuration options to a script that configures and builds it, then 
a quick 'su -' to install it, run ldconfig, back to the user 'amanda', 
and run an amcheck.

That script is available in this list's archives; it was posted here 
several times, but I stopped when I converted to FILE: devices as it 
became a bit simpler.  Back up about 2 years to look for it.  Or 
possibly someone else has adapted it for their system and could post 
theirs as an example.

It makes installing and updating an amanda install into something 
that's not much more complex than installing from your distro's 
packaging system.  But when you install from a packaged repo, you are 
stuck with the options compiled into it, and I think every packager 
has a different idea of where to put stuff, which is not a very 
effective way to maintain compatibility across upgrades IMO.

Amanda is being developed, with new features all the time, but with the 
provision that anything done is backwards compatible, so old clients 
run with new server releases and vice versa.

So methinks it's time to update.  Amanda is getting better all the time.
Current snapshots are available in tarball form at:



BUG (was: Re: Handitarded....odd (partial) estimate timeout errors.)

2006-01-05 Thread Paul Bijnens

Michael Loftis wrote:



Paul asked for the logs, it seems like there's an amanda bug.  The units 


Yes, indeed, there is a bug in Amanda!
You have 236 DLE's for that host, and from my reading of the code
the REQuest UDP packet is limited to 32K instead of 64K (see planner.c
lines 1377-1383)  (Need to update the documentation!)

It seems that planner splits up the REQuest packet into separate
UDP packets when exceeding MAX_DGRAM/2, i.e. 32K.
Your first request was 32580 bytes.  Adding the next string to that
request would have exceeded the 32768 limit.
The reason for the division by 2 seems to be to reserve space for
error replies to each of those.

However, the amandad client expects one and only one REQuest packet.
Any other REQuest packet coming from the same connection (5-tuple:
protocol, remotehost, remoteport, localhost, localport) and having
a type "REQ" is considered a duplicate.
It should actually test for the handle and sequence to be identical
too. It does not.

It's not fixed quickly either:  when receiving the first "REQ" packet,
the amandad client forks and execs the request program (sendsize in
this case) and reads the results from a pipe.

By the time the second, non-identical request comes in (with different
handle, sequence -- which is currently not checked), sendsize is already
started and cannot be given additional DLE's to estimate.


As a temporary workaround, you could shorten the exclude-list string for 
that host by creating a symlink:


   ln -s /etc/amanda/exclude.gtar /.excl

and use that as the exclude-list: this shortens each line by 20 bytes,
which would shrink the packet to fit again. (236 DLE's * 20 = 4720 bytes
less in a REQuest UDP packet for that host!)


Anyway... I'm getting a headache thinking about it :)  All my other DLEs 
seem OK for that host, and the ones that it misses are not always 
exactly the same, but all seem to be non-calcsize estimated.


Just bad luck for those entries that happen to go in the end of the
queue.  On the other hand, when really unlucky, you could have up to 
three estimates for each DLE, overflowing even the 4K we saved by 
shrinking the exclude string...



--
Paul Bijnens, XplanationTel  +32 16 397.511
Technologielaan 21 bus 2, B-3001 Leuven, BELGIUMFax  +32 16 397.512
http://www.xplanation.com/  email:  [EMAIL PROTECTED]
***
* I think I've got the hang of it now:  exit, ^D, ^C, ^\, ^Z, ^Q, ^^, *
* F6, quit, ZZ, :q, :q!, M-Z, ^X^C, logoff, logout, close, bye, /bye, *
* stop, end, F3, ~., ^]c, +++ ATH, disconnect, halt,  abort,  hangup, *
* PF4, F20, ^X^X, :D::D, KJOB, F14-f-e, F8-e,  kill -1 $$,  shutdown, *
* init 0, kill -9 1, Alt-F4, Ctrl-Alt-Del, AltGr-NumLock, Stop-A, ... *
* ...  "Are you sure?"  ...   YES   ...   Phew ...   I'm out  *
***




amflush 2.4.3 hanging?

2006-01-05 Thread Wim Baetens

Hi,

Is anyone aware of amflush 2.4.3 not doing what it's supposed to do?

Here's the situation: I have a few backups on the holding disk that
failed to write to tape. I made sure that the correct tape is loaded,
and amcheck runs as expected. I launched amflush 2.4.3 in foreground
mode, which gives me the correct list of backups. I select the first
backup to be flushed and confirm to proceed.

amflush launches, writes the tape header, but then just sits there,
consuming about 97% of the system resources. After a long time, the
only solution is to kill amflush and do an amcleanup. Last night's
backup was successful, but I'm still stuck with the three backups from
the past days that need to be flushed. I admit I have no clue of how to
proceed - any help would be welcome.

Thanks!

Wim