Re: Amanda "info on small DLEs not saved" bug -- proposed patch

2018-11-09 Thread Gene Heskett
On Friday 09 November 2018 18:00:41 Nathan Stratton Treadway wrote:

> On Fri, Nov 02, 2018 at 13:00:38 -0400, Nathan Stratton Treadway wrote:
> > On Fri, Nov 02, 2018 at 12:35:06 -0400, Gene Heskett wrote:
> > > Fri Nov 02 03:42:06.085962877 2018: pid 13139: thd-0x9824e00:
> > > driver: not updating because origsize or dumpsize is 0
> > > Fri Nov 02 03:42:06.086158866 2018: pid 13139: thd-0x9824e00:
> > > driver: Building type FILE header of 32768-32768 bytes with
> > > name='coyote' disk='/usr/games' dumplevel=1 and blocksize=0
> > > Fri Nov 02 03:42:06.086317001 2018: pid 13139: thd-0x9824e00:
> > > driver: Building type FILE header of 32768-32768 bytes with
> > > name='coyote' disk='/usr/games' dumplevel=1 and blocksize=0
> > > Fri Nov 02 03:42:06.092807051 2018: pid 13139: thd-0x9824e00:
> > > driver: driver: send-cmd time 2461.017 to taper0: FILE-WRITE
> > > worker0-0 00-00130 /usr/dumps/20181102030105/coyote._usr_games.1
> > > coyote /usr/games 1 20181102030105 "" "" "" 1 "" "" "" "" 10
> >
> > Ah, interesting, looks like this applies to level 1 dumps as well. 
> > What does "amadmin CONFIG info coyote /usr/games shop
> > /usr/lib/amanda" output?
>
> Okay, I believe I have tracked down this "info on small DLEs not
> saved" bug...
>
>
> Summary:
>
> The short-ish summary is that in Amanda 3.4 and 3.5, any dump which
> ends up being shorter than 1024 bytes long (after compression) is
> treated as not having happened at all, as far as Amanda's recording of
> "info" statistics goes.
>
> This is most significant for full dumps (i.e. of very small
> partitions), because it causes Amanda to think that DLE has never been
> dumped (or, that the last time it was dumped was on the final run made
> using Amanda 3.3 or older), and thus it schedules that DLE for a full
> dump on every run.
>
> However, the bug actually also affects incremental dumps (as we
> discussed in the above-quoted message) -- so DLEs that don't change
> much on a particular day and thus end up with very tiny incrementals
> end up recording those dumps as having taken place on 1969/12/31
> rather than the day they actually occurred.
>
>
> Neither of the above situations is "fatal" as far as preventing Amanda
> from actually backing up your data, but for people (such as Gene) who
> are affected, a workaround is simply to make sure that the
> dump on a particular day is at least 1024 bytes.
>
> For full dumps, you can do this just by creating a "dummy" file on the
> otherwise-very-empty partition in question, using data that's already
> compressed so that the dump file is still big enough after Amanda
> compresses it.  (In my tests, I just used the bytes off the front of
> the amanda_3.4.2.orig.tar.gz file I happened to have sitting around.)
>
> (For incrementals the trick is to make sure there is enough changing
> on the partition that the incremental dump is over the cutoff size;
> the best way to do that will depend on what data is on that partition,
> etc.)
>
>
>
> Internal details and history:
>
> The bug happens because the messages that the chunker and driver
> processes use to communicate with each other specify the size of the
> files transferred in integral units of "kb" (=1024 bytes), and thus
> the size given for very small files is "0" -- but the code in the
> driver that handles updating the info database has a specific check
> for zero-length dumps and refuses to update the database in that case.
> (This check is found in driverio.c:update_info_dumper() .)
>
> It appears that the bug has existed since Amanda 3.4, when the old
> chunker.c version of the chunker program was replaced with a new Perl
> version.
>
> Both server-src/chunker.c as found in Amanda 3.3 and
> server-src/dumper.c as it exists in 3.5 take special care to round
> the "kb" value passed for short-but-not-empty files up to
> "1" -- but that logic was not implemented in the Perl chunker code
> when it was created.
>
> (Interestingly, if I am reading the old chunker.c code correctly, it
> used to round up not just very-small-but-not-empty files to 1 kb, but
> actually rounded the kb figure up to count any trailing partial
> kilobytes at the end of the file... while the new program seems to
> just ignore those trailing partial kilobytes.  Presumably this
> difference simply doesn't matter -- except when the size is
> rounded down to zero.)
>
>
> Proposed patch:
>
> This is where having an actual Amanda developer would be very handy...
> but given that planner.c and old-chunker.c both have special handling
> for small-but-not-empty files, I figured that adding a similar check
> to the new Chunker implementation is probably the best fix for this
> situation, and that hopefully doing so would be safe from unwanted
> side-effects.
>
> So, I edited the perl/Amanda/Chunker/Controller.pm to implement such a
> check, as shown in the attached patch.  I've been running with this
> patch in place for a couple days now, and so far it seems 

Re: Amanda "info on small DLEs not saved" bug -- proposed patch

2018-11-09 Thread Nathan Stratton Treadway
On Fri, Nov 02, 2018 at 13:00:38 -0400, Nathan Stratton Treadway wrote:
> On Fri, Nov 02, 2018 at 12:35:06 -0400, Gene Heskett wrote:
> 
> > Fri Nov 02 03:42:06.085962877 2018: pid 13139: thd-0x9824e00: driver: not 
> > updating because origsize or dumpsize is 0
> > Fri Nov 02 03:42:06.086158866 2018: pid 13139: thd-0x9824e00: driver: 
> > Building type FILE header of 32768-32768 bytes with name='coyote' 
> > disk='/usr/games' 
> > dumplevel=1 and blocksize=0
> > Fri Nov 02 03:42:06.086317001 2018: pid 13139: thd-0x9824e00: driver: 
> > Building type FILE header of 32768-32768 bytes with name='coyote' 
> > disk='/usr/games' 
> > dumplevel=1 and blocksize=0
> > Fri Nov 02 03:42:06.092807051 2018: pid 13139: thd-0x9824e00: driver: 
> > driver: send-cmd time 2461.017 to taper0: FILE-WRITE worker0-0 
> > 00-00130 /usr/dumps/20181102030105/coyote._usr_games.1 coyote /usr/games 1 
> > 20181102030105 "" "" "" 1 "" "" "" "" 10
> > 
> 
> Ah, interesting, looks like this applies to level 1 dumps as well.  What
> does "amadmin CONFIG info coyote /usr/games shop /usr/lib/amanda" output?



Okay, I believe I have tracked down this "info on small DLEs not saved"
bug...


Summary:

The short-ish summary is that in Amanda 3.4 and 3.5, any dump which ends
up being shorter than 1024 bytes long (after compression) is treated as
not having happened at all, as far as Amanda's recording of "info"
statistics goes.

This is most significant for full dumps (i.e. of very small partitions),
because it causes Amanda to think that DLE has never been dumped (or,
that the last time it was dumped was on the final run made using Amanda
3.3 or older), and thus it schedules that DLE for a full dump on every
run.

However, the bug actually also affects incremental dumps (as we
discussed in the above-quoted message) -- so DLEs that don't change much
on a particular day and thus end up with very tiny incrementals end up
recording those dumps as having taken place on 1969/12/31 rather than
the day they actually occurred.


Neither of the above situations is "fatal" as far as preventing Amanda
from actually backing up your data, but for people (such as Gene) who are
affected, a workaround is simply to make sure that the dump
on a particular day is at least 1024 bytes.

For full dumps, you can do this just by creating a "dummy" file on the
otherwise-very-empty partition in question, using data that's already
compressed so that the dump file is still big enough after Amanda
compresses it.  (In my tests, I just used the bytes off the front of the
amanda_3.4.2.orig.tar.gz file I happened to have sitting around.)
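
A minimal sketch of the dummy-file workaround described above, in Python. The
function name, the file path and the 4096-byte size are assumptions made for
illustration; random bytes stand in for the "already compressed" data, since
they will not shrink under gzip.

```python
import gzip
import os


def make_incompressible_padding(path, size=4096):
    """Write `size` random bytes to `path` and return the gzip-compressed size.

    Random bytes are effectively incompressible, so they play the same role
    as the head of an existing .tar.gz file. `path` and `size` are
    illustrative choices, not anything Amanda requires.
    """
    data = os.urandom(size)
    with open(path, "wb") as handle:
        handle.write(data)
    # Compress a copy in memory to confirm the data stays above the cutoff.
    return len(gzip.compress(data))
```

Calling it with a path on the nearly empty partition (for example a
hypothetical /usr/games/.amanda-padding) should return a figure well above
1024.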

(For incrementals the trick is to make sure there is enough changing on
the partition that the incremental dump is over the cutoff size; the
best way to do that will depend on what data is on that partition, etc.)



Internal details and history: 

The bug happens because the messages that the chunker and driver
processes use to communicate with each other specify the size of the
files transferred in integral units of "kb" (=1024 bytes), and thus the
size given for very small files is "0" -- but the code in the driver
that handles updating the info database has a specific check for
zero-length dumps and refuses to update the database in that case. 
(This check is found in driverio.c:update_info_dumper() .)
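
To illustrate the mechanism just described, here is a small Python sketch. It
is not Amanda's code: the function names are hypothetical and only mimic the
whole-kb size report and the driver's zero-size check.

```python
def size_in_kb_truncated(nbytes):
    """Report a size in whole kb, discarding any partial kilobyte."""
    return nbytes // 1024


def info_would_be_updated(origsize_kb, dumpsize_kb):
    """Mimic the driver's guard: skip the update if either size is 0."""
    return origsize_kb != 0 and dumpsize_kb != 0


# A 700-byte compressed dump of a 10 kb original is reported as 0 kb...
dumpsize_kb = size_in_kb_truncated(700)
# ...so the guard treats it like a dump that never happened.
updated = info_would_be_updated(10, dumpsize_kb)
```

This matches the "not updating because origsize or dumpsize is 0" line in
Gene's log.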

It appears that the bug has existed since Amanda 3.4, when the old
chunker.c version of the chunker program was replaced with a new Perl
version.  

Both server-src/chunker.c as found in Amanda 3.3 and server-src/dumper.c
as it exists in 3.5 take special care to round the "kb" value passed for
short-but-not-empty files up to "1" -- but that logic
was not implemented in the Perl chunker code when it was created.

(Interestingly, if I am reading the old chunker.c code correctly, it
used to round up not just very-small-but-not-empty files to 1 kb, but
actually rounded the kb figure up to count any trailing partial
kilobytes at the end of the file... while the new program seems to just
ignore those trailing partial kilobytes.  Presumably this difference
simply doesn't matter -- except when the size is rounded down to
zero.)
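
The three rounding behaviours discussed above can be compared side by side.
This Python sketch is only an illustration; the function names are
hypothetical and none of this is taken from chunker.c or Controller.pm.

```python
def kb_floor(nbytes):
    """Drop trailing partial kilobytes (what the new chunker appears to do)."""
    return nbytes // 1024


def kb_ceil(nbytes):
    """Count a trailing partial kilobyte as a full one (old chunker.c, as read above)."""
    return (nbytes + 1023) // 1024


def kb_floor_min_one(nbytes):
    """Drop partial kilobytes, but never report 0 for a non-empty file."""
    kb = nbytes // 1024
    if nbytes > 0 and kb == 0:
        return 1
    return kb


# Print each byte count with its kb figure under the three policies.
for nbytes in (0, 700, 1024, 1500):
    print(nbytes, kb_floor(nbytes), kb_ceil(nbytes), kb_floor_min_one(nbytes))
```

Only the 700-byte case comes out as 0 under plain truncation while being
non-empty, which is the case the info update rejects.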


Proposed patch:

This is where having an actual Amanda developer would be very handy...
but given that planner.c and old-chunker.c both have special handling
for small-but-not-empty files, I figured that adding a similar check to
the new Chunker implementation is probably the best fix for this
situation, and that hopefully doing so would be safe from unwanted
side-effects.

So, I edited the perl/Amanda/Chunker/Controller.pm to implement such a
check, as shown in the attached patch.  I've been running with this
patch in place for a couple days now, and so far it seems to have
resolved the issue for me.


Nathan




Nathan Stratton Treadway  -  

Re: Breaking DLEs up

2018-11-09 Thread Chris Hoogendyk

Too many comments on this thread to read them all (too many other things that 
need attention).

However, I just wanted to note that I had been having trouble with include and exclude to break up 
DLEs some time ago. When I posted asking for help, JLM replied back and told me that I really needed 
to use Application amgtar for my backups. If I were just using the program gnutar, then to implement 
my general expressions, the amanda user needed read access all the way down to the level in the 
subdirectories where the expressions were being applied. Otherwise, it would not be able to do it, 
and would come up with an empty list. I switched all my backups to Application amgtar, and have had 
no problems since.


I should note that some of my LVMs are broken up into dozens of DLEs, even though I'm using LTO7. 
Back when I was using AIT5, I tried to keep DLEs less than 100GB. Then with LTO6, I loosened it up 
to 300GB, with some larger. Now, with LTO7, I let them approach 1T sometimes, but use hardware 
compression so that I'm not eating all kinds of CPU with gzip processes. This has worked really well.


Researchers' data is often restricted-access, and may be divided up into subdirectories by lab 
personnel. Many of those take up many terabytes. So, I end up having to go down into their 
subdirectories sorting out how to divide them into DLEs. I have about 100TB of storage on each of 
two different departments' servers. We're currently building arrays with 10TB HGST helium-filled 
drives, Supermicro servers with Ubuntu 14.04 and 16.04, mdadm and LVM. I've found that I can build a 
RAID5 with 2 drives and then grow it by adding drives as needed. When I reach 5 drives, I request 
that the next addition be 2 drives so that I can convert it to RAID6 and grow it by 1 drive. All of 
that can be done live, although it can get scary sometimes.


My approach to DLEs is as follows. The name after the "/./" in the first line of each DLE is an 
arbitrary name for the DLE.


localhost    /data/professorA/./catchall    /data/professorA {
                gnutar-lto7-local
                exclude append "./directory1"
                exclude append "./directory2"
                # etc.
                } -1
localhost    /data/professorA/./directory1-catchall /data/professorA    {
                gnutar-lto7-local
                include "./directory1"
                exclude append "./directory1/EMR/CDH[1-9]*"
                # etc.
                } -1
localhost    /data/professorA/./directory1a    /data/professorA {
                gnutar-lto7-local
                include "./directory1/EMR/CDH[1-4]*"
                } -1
localhost    /data/professorA/./directory1b /data/professorA    {
                gnutar-lto7-local
                include "./directory1/EMR/CDH[5-9]*"
                } -1
localhost /data/professorA/./directory2-catchall    /data/professorA {

            # and so on, in a hierarchical top down structure.


The above example was constructed for this email message (so as to anonymize) based on real examples 
in my disklist. As I said earlier, some of my actual examples end up breaking up an LVM (which would 
correspond to, e.g., /data/professorA) into dozens of DLEs.
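
When a directory is split this way, it is easy to leave an entry out of every DLE or to back it up 
twice. The following Python sketch checks a set of names against the CDH patterns from the example. 
It assumes Python's fnmatch globbing is close enough to what Amanda and GNU tar do for these simple 
patterns, which may not hold in general; the helper names and sample names are made up.

```python
from fnmatch import fnmatchcase

# Include patterns of the two split-out DLEs, relative to directory1/EMR.
SPLIT_DLES = {
    "directory1a": "CDH[1-4]*",
    "directory1b": "CDH[5-9]*",
}
# The catch-all DLE excludes everything the split DLEs cover.
CATCHALL_EXCLUDE = "CDH[1-9]*"


def dles_for(name):
    """Return the DLE names that would pick up `name`."""
    hits = [dle for dle, pat in SPLIT_DLES.items() if fnmatchcase(name, pat)]
    if not fnmatchcase(name, CATCHALL_EXCLUDE):
        hits.append("directory1-catchall")
    return hits


def check_coverage(names):
    """Map each name that is not in exactly one DLE to its list of DLEs."""
    return {n: dles_for(n) for n in names if len(dles_for(n)) != 1}
```

An empty result from check_coverage() means every name lands in exactly one DLE; anything else lists 
the names that are missed or duplicated.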



On 11/8/18 5:30 PM, Nathan Stratton Treadway wrote:

On Thu, Nov 08, 2018 at 15:21:00 -0500, Chris Nighswonger wrote:

On Thu, Nov 8, 2018 at 1:56 PM Cuttler, Brian R (HEALTH) <
brian.cutt...@health.ny.gov> wrote:



Your syntax



fileserver "/netdrives/CAMPUS/af" "/netdrives/CAMPUS" {
   comp-tar
   include "./[a-f]*"
   estimate server
}


[...]

Well, this fixes my problem, though why I do not know.

fileserver CAMPUS_a-f /netdrives/CAMPUS {
   comp-tar
   exclude file "./[g-z]*"
   estimate server
} 1

It seems a bit of work compared to the include directive. I tried "include
file" to no avail.

I haven't quite followed this whole thread, either, but have you taken a
look at
   http://wiki.zmanda.com/index.php/How_To:Split_DLEs_With_Exclude_Lists
? It may help explain some of the nuances in how things work.

As Stefan mentioned, it would be helpful to know what amanda version
you are using, and which dump program.  If your client supports it,
using APPLICATION amgtar (rather than GNUTAR) is definitely a good idea.

But, especially if you are using GNUTAR, an important thing to keep in
mind is that the exclude list is processed by passing the whole list
using an --exclude option to GNU tar, which then processes that list as
it does the dump -- while in contrast the "include" options are
processed ahead of time by Amanda to build a list of directories to pass
on the command line (thus making up tar's list of "what should I be
backing up"?).  The tricky thing is that the tar process runs with root
privilege, so it can see any directory out there... but the "include"
work is done by an unprivileged process and so the directory *above* the
include pattern must be readable by the amanda user...
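
A quick way to test this is to run a check as the unprivileged amanda user.
The Python sketch below is a hypothetical helper, not part of Amanda; it
walks from a DLE's top directory down to the directory containing the
include pattern and reports any level that the current user cannot read and
traverse.

```python
import os


def unreadable_levels(top, pattern):
    """List directories between `top` and the include pattern's parent
    that the current user cannot both read and enter.

    `pattern` is an include string such as "./directory1/EMR/CDH[1-4]*";
    only its directory part is inspected, the glob itself is ignored.
    """
    rel_dir = os.path.dirname(pattern)
    parts = [p for p in rel_dir.split("/") if p not in ("", ".")]
    blocked = []
    current = top
    # Check `top` itself first (part is None), then each level below it.
    for part in [None] + parts:
        if part is not None:
            current = os.path.join(current, part)
        if not os.access(current, os.R_OK | os.X_OK):
            blocked.append(current)
    return blocked
```

Run as the amanda user, an empty list suggests the include expansion can at
least see the directory; run as root it will report nothing useful, since
root can read everything.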

So, what are the permissions on the 

Re: Breaking DLEs up

2018-11-09 Thread Chris Nighswonger
On Thu, Nov 8, 2018 at 10:57 PM Nathan Stratton Treadway 
wrote:

> On Thu, Nov 08, 2018 at 21:24:04 -0500, Chris Nighswonger wrote:
> > Dump program is GNUTAR. I can switch to amgtar.
> >
>
> Amgtar is more flexible to use (and easier to maintain, should
> development on Amanda ever resume), so while you're rolling this out
> it's probably worth switching.  (And it should let you work around this
> permission problem, too.)
>

This makes the most sense as it will hopefully require less maintenance.

Thanks to everyone for the help. It has been quite educational in many ways.

Kind regards,
Chris


Re: Breaking DLEs up

2018-11-09 Thread Olivier
Jon LaBadie  writes:

> On Thu, Nov 08, 2018 at 09:06:47PM +, Debra S Baddorf wrote:
>> Ah, good!   What does “file”  do in your include line?
>> Deb
>> 
> Include (and exclude) can take a first argument of "file" or "list".
> With "file" the following string is a "glob" expression.
> "file" is the default but I like to specify it anyway.
>
> With "include list" the string that follows is the name of a file
> containing "globs", one per line.
>
> You can have multiple "include file" globs if all but the first
> are "include file append".  So
>
>   include file "./[a-gA-G]*"
>
> could also be specified as
>
>   include file "./[a-g]*"
>   include file append "./[A-G]*"
>
> Jon

A long, long time ago, I wrote a script to test the exclude files. I
believe it could be adapted to test the include files too.

I noticed the script was not online anymore, so I put it back:
http://www.cs.ait.ac.th/~on/testgtar

Best regards,

Olivier


>> 
>> 
>> > On Nov 8, 2018, at 2:54 PM, Jon LaBadie  wrote:
>> > 
>> > On Thu, Nov 08, 2018 at 08:43:49PM +, Debra S Baddorf wrote:
>> >> Yeah, I do use includes,  but I only do a single letter at a time
>> >>   include "./a*"
>> >> 
>> >> Perhaps the problem is with the syntax of doing more than one letter.
>> >> I only do   [a-f]   on my excludes.   Weird!
>> >> 
>> >> Deb Baddorf
>> > 
>> > I have a working entry that matches the OP.
>> > 
>> >include file "./201[7-9]*"
>> > 
>> > Jon
>> >> 

-- 



Re: Breaking DLEs up

2018-11-09 Thread Jon LaBadie
On Thu, Nov 08, 2018 at 09:06:47PM +, Debra S Baddorf wrote:
> Ah, good!   What does “file”  do in your include line?
> Deb
> 
Include (and exclude) can take a first argument of "file" or "list".
With "file" the following string is a "glob" expression.
"file" is the default but I like to specify it anyway.

With "include list" the string that follows is the name of a file
containing "globs", one per line.

You can have multiple "include file" globs if all but the first
are "include file append".  So

  include file "./[a-gA-G]*"

could also be specified as

  include file "./[a-g]*"
  include file append "./[A-G]*"
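
As a quick sanity check of the equivalence, the Python sketch below compares
the single glob with the two appended globs. It uses Python's fnmatch as a
stand-in for the globbing Amanda performs, which is an assumption; the
helper and the sample names are invented.

```python
from fnmatch import fnmatchcase


def matches_any(name, patterns):
    """True if `name` matches at least one glob in `patterns`."""
    return any(fnmatchcase(name, pat) for pat in patterns)


single = ["[a-gA-G]*"]
appended = ["[a-g]*", "[A-G]*"]   # "include file" plus "include file append"

names = ["alpha", "Gamma", "hotel", "Zulu", "2018-11"]
# True when both forms select exactly the same names.
same = all(matches_any(n, single) == matches_any(n, appended) for n in names)
```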

Jon
> 
> 
> > On Nov 8, 2018, at 2:54 PM, Jon LaBadie  wrote:
> > 
> > On Thu, Nov 08, 2018 at 08:43:49PM +, Debra S Baddorf wrote:
> >> Yeah, I do use includes,  but I only do a single letter at a time
> >>   include "./a*"
> >> 
> >> Perhaps the problem is with the syntax of doing more than one letter.
> >> I only do   [a-f]   on my excludes.   Weird!
> >> 
> >> Deb Baddorf
> > 
> > I have a working entry that matches the OP.
> > 
> >include file "./201[7-9]*"
> > 
> > Jon
> >> 

-- 
Jon H. LaBadie j...@jgcomp.com
 11226 South Shore Rd.  (703) 787-0688 (H)
 Reston, VA  20190  (703) 935-6720 (C)