Re: tape filling algorithm? (flush, or dump)

2002-11-21 Thread Deb Baddorf


>>>On Wed, Nov 20, 2002 at 01:47:50PM -0600, Deb Baddorf wrote:
 What kind of optimiaztion does amanda use while flushing
 dumps to tape?

>>>There is no optimazation, first in, first out.
>>>That's a need feature.
>>>
>>>Jean-Louis



Here's a basic algorithm  (in pseudo-code).
Is this useful to somebody who is already working on this code?
Or should I attempt to fit it into the actual   taper  code myself?

Deb Baddorf

=
This *should*  handle both the FLUSH case  (all data is available from
the start)   and the on-going AMDUMP case  (data is added as we go).
I may notice more errors but I think I got them all.

To "grok"  the basic algorithm  (ridiculously simple)   ignore
or white out the  "SPIN"  paragraphs which wait for more data to
arrive.
===

stuff = collection of items to be put into knapsack.
   Dumpfiles in this case (includes tarfiles too)
Stuff is probably a doubly(?) linked list.  Operations required:
status = insert(item)
status = remove(ptr)  or maybe remove(item)
ptr = get_largest()
#boolean = is_empty()
boolean = has_data() #opposite sense to   is_empty()
boolean = more_coming()  #somebody elsewhere needs to tell us when no more 
dumps are coming down the pipe

item
number = size(item)
ptr = get_next_smaller()# return item next smaller than self
 (maybe should be a function of the COLLECTION instead of the item, 
for project OO code)
next_ptr #if you use a doubly linked list
prev_ptr #if you use a doubly linked list
string = get_name(item)   ??  for the index,  or some such.  Maybe for 
error message when dump won't fit.

knapsack#(or,  tape!)
number = size_remaining()
status = add(item) # responsible to DELETE from holding disk after 
successful  & from COLLECTION above
make_full()# decree it "full"  when no available items can fit 
in the remaining space
boolean = has_space()# is not yet full


bunch-o-knapsacks   #(or,  how many tapes are available?)
ptr = gimme_one()
boolean = is_more()
==

PACK_ALL_STUFF   #i.e.  start taping
{
mystuff = new stuff()   # filled somewhere else, or here, but before 
algorithm below
allsacks = new bunch-o-knapsacks()  #number comes from   RUNTAPES parameter

yada yada yada
maybe twiddle thumbs till significant data arrives, if holding disk allows


while ( allsacks.is_more()   # stop when we run out of sacks (tapes)
or  (not mystuff.has_data() # or when we run out of data
 and  not mystuff.more_coming() ) )# and know that no more 
is coming
 {
   this_sack = allsacks.gimme_one()

   while ( not mystuff.has_data()  )   #wait for some data to arrive
 {spin
  check "more coming" in case a client dies
 }


   while ( mystuff.has_data()and #there is still data waiting to 
be taped
   this_sack.has_space() ) #the sack (tape) has room left
   {
   #remember, in "timed" version, a bigger one may be added during 
any pass
   #so, we start with biggest each time and add the first that 
will fit
  fill_sack (this_sack, mystuff.get_largest())

  while ( not mystuff.has_data()  )   #wait for some more data to arrive
 {spin
  check "more coming" in case a client dies
 }
   }
 }
}
==
fill_sack(sackptr, item_ptr)
{
   if (item_ptr is NULL) #there are no more smaller items
{
sackptr.make_full()  #decree sack (tape) to be full, so we can 
move on to another
return
}

   isize=item_ptr.size();  ssize=sackptr.size_remaining();   #store for 
possible error msg
   #or assign to variable IN the if statement,  but too messy to 
do in psuedo-code here!

   if ( item_ptr.size() <= sackptr.size_remaining() )#if item fits, 
according to sack specs,
 {  #  put it in the sack
status = sackptr.add(item_ptr)# add MUST delete from holding 
disk,  if it finishes successfully
 # add MUST also adjust 
sack's  size_remaining value
  # it writes the INDEX, and does 
other bookkeeping too
  # Also removes ITEM from "stuff" 
list of available dumps to flush

if ( not status)  #  failed -- tape size must've been 
slightly wrong.  That's life.
 {
sackptr.make_full() #  we've already written to EOT so 
can't try another size item
write message "Sack(tape) wasn't as big as " isize + ssize.
write message "May want to correct sack specs  (tapesize)"
 }

 }
   else #find next smaller item and try that one
 {
fill_sack (sackptr, item_ptr.get_next_smaller())
 }

}





Re: tape filling algorithm? (flush, or dump)

2002-11-21 Thread Gene Heskett
On Thursday 21 November 2002 00:47, Gene Heskett wrote:
>On Wednesday 20 November 2002 22:41, Gene Heskett wrote:
>
>Yeah, I know, I'm talking to myself :-)  In some circles, thats
>considered to be 'poor form'.  But, see below.

Continued, with another "ps:"

>>On Wednesday 20 November 2002 17:36, Jean-Louis Martineau wrote:
>>>On Wed, Nov 20, 2002 at 01:47:50PM -0600, Deb Baddorf wrote:
 What kind of optimiaztion does amanda use while flushing
 dumps to tape?

 While doing the dump initially,  she seems to
 optimize to get the small jobs done first, and thus they
 go onto tape in a similar order.   But I'm currently without
 a stacker.  So only the first part of the dumps make it
 to tape "live"  (so to speak)  and the rest get flushed as I
 manually mount tapes.   I'm hoping that some kind of
 knapsack-packing algorithm is used . to fit the largest
 files on the tape first,   but then to add smaller ones to
 fill the top of the "sack".
  My archival "do all the level 0's" is taking up 5 tapes,
 hence the optimization question!
>>>
>>>There is no optimazation, first in, first out.
>>>That's a need feature.
>>>
>>>Jean-Louis
>>
>>Humm, from a recent example/amanda.conf:
>>-
>>dumporder "sssS"# specify the priority order of each
>> dumper #   s -> smallest size
>>#   S -> biggest size
>>#   t -> smallest time
>>#   T -> biggest time
>>#   b -> smallest bandwitdh
>>#   B -> biggest bandwitdh
>># try "BTBTBTBTBTBT" if you are not
>> holding # disk constrained
>>
>>I'm assuming that when Deb asked about it, using the word
>>'flushing', that Deb meant a normal amdump run as opposed to a
>> run of amflush, or an autoflush.  I've no idea whether its
>> active for a flush, or just an amdump run.  I have no idea if it
>> even works, but I'll find out tonight if I can stay awake till
>> then.  I just modified mine to do the biggest ones first, and
>> amstatus should be able to see if its working that way pretty
>> shortly after the estimates are done.
>
>Yes, its working according to an 'SSSsss' setting from the looks
> of it, it started with /usr/src, nearly the last entry in the
> disklist which is nearly 2.7 gigs ATM.  It also tells me its
> going to skip 2 entries tonight as they are too big, 1.6gb of
> music and 90 megs of /usr/X11R6 will get delayed till tomorrow
> night.  Obviously I'm just now getting amanda restarted after a
> disk crash.  I suppose it will take a week to get back to
> 'normal', whatever that is...

Ok, it appears that with the compression being done in software, it 
smunched that 2.6 gigs worth of /usr/src down to 1.6 gigs.  By the 
time it was done, it had written only 2.6 gigs in real data down 
the cable to the tape.  There was I believe, room enough to fit the 
original, uncompressed 1.6 gigs worth of music, and certainly room 
enough for a compressed rendition of the 90 megs worth of 
/usr/X11R6.

Since at the estmate phase, it hasn't compressed anything, is it 
permissable to expand the value tapetype returns in order to 
prevent this erronious FAILED message?  And if so, do we believe 
the vendors propaganda and arbitrarily double it?   Or does amanda 
do something similar internally?  Here is why I ask, from the 
email:
--
STATISTICS:
  Total   Full  Daily
      
Estimate Time (hrs:min)0:23
Run Time (hrs:min) 3:03
Dump Time (hrs:min)1:09   0:54   0:15
Output Size (meg)2688.6 2499.4  189.2
Original Size (meg)  7214.6 6792.5  422.2
Avg Compressed Size (%)33.2   33.3   32.1   
(level:#disks ...)
Filesystems Dumped   29  6 23   (1:23)
Avg Dump Rate (k/s)   660.9  788.9  210.2

Tape Time (hrs:min)1:58   1:49   0:08
Tape Size (meg)  2690.1 2499.7  190.3
Tape Used (%)  70.8   65.85.0   
(level:#disks ...)
Filesystems Taped29  6 23   (1:23)
Avg Tp Write Rate (k/s)   390.5  390.3  391.8

Note that the reported Original Size is well above the roughly 3.8 
gigs set in the DDS2 tapetype, and yet amdump knew it would fit 
once compressed.  I'm beginning to feel a bit like pooh, the bear 
of very little brain here...

-- 
Cheers, Gene
AMD K6-III@500mhz 320M
Athlon1600XP@1400mhz  512M
99.19% setiathome rank, not too shabby for a WV hillbilly



Re: tape filling algorithm? (flush, or dump)

2002-11-20 Thread Gene Heskett
On Wednesday 20 November 2002 22:41, Gene Heskett wrote:

Yeah, I know, I'm talking to myself :-)  In some circles, thats 
considered to be 'poor form'.  But, see below.

>On Wednesday 20 November 2002 17:36, Jean-Louis Martineau wrote:
>>On Wed, Nov 20, 2002 at 01:47:50PM -0600, Deb Baddorf wrote:
>>> What kind of optimiaztion does amanda use while flushing
>>> dumps to tape?
>>>
>>> While doing the dump initially,  she seems to
>>> optimize to get the small jobs done first, and thus they
>>> go onto tape in a similar order.   But I'm currently without
>>> a stacker.  So only the first part of the dumps make it
>>> to tape "live"  (so to speak)  and the rest get flushed as I
>>> manually mount tapes.   I'm hoping that some kind of
>>> knapsack-packing algorithm is used . to fit the largest
>>> files on the tape first,   but then to add smaller ones to
>>> fill the top of the "sack".
>>>  My archival "do all the level 0's" is taking up 5 tapes,
>>> hence the optimization question!
>>
>>There is no optimazation, first in, first out.
>>That's a need feature.
>>
>>Jean-Louis
>
>Humm, from a recent example/amanda.conf:
>-
>dumporder "sssS"# specify the priority order of each
> dumper #   s -> smallest size
>#   S -> biggest size
>#   t -> smallest time
>#   T -> biggest time
>#   b -> smallest bandwitdh
>#   B -> biggest bandwitdh
># try "BTBTBTBTBTBT" if you are not
> holding # disk constrained
>
>I'm assuming that when Deb asked about it, using the word
>'flushing', that Deb meant a normal amdump run as opposed to a run
>of amflush, or an autoflush.  I've no idea whether its active for
> a flush, or just an amdump run.  I have no idea if it even works,
> but I'll find out tonight if I can stay awake till then.  I just
> modified mine to do the biggest ones first, and amstatus should
> be able to see if its working that way pretty shortly after the
> estimates are done.

Yes, its working according to an 'SSSsss' setting from the looks of 
it, it started with /usr/src, nearly the last entry in the disklist 
which is nearly 2.7 gigs ATM.  It also tells me its going to skip 2 
entries tonight as they are too big, 1.6gb of music and 90 megs of 
/usr/X11R6 will get delayed till tomorrow night.  Obviously I'm 
just now getting amanda restarted after a disk crash.  I suppose it 
will take a week to get back to 'normal', whatever that is...

-- 
Cheers, Gene
AMD K6-III@500mhz 320M
Athlon1600XP@1400mhz  512M
99.19% setiathome rank, not too shabby for a WV hillbilly



Re: tape filling algorithm? (flush, or dump)

2002-11-20 Thread Gene Heskett
On Wednesday 20 November 2002 17:36, Jean-Louis Martineau wrote:
>On Wed, Nov 20, 2002 at 01:47:50PM -0600, Deb Baddorf wrote:
>> What kind of optimiaztion does amanda use while flushing
>> dumps to tape?
>>
>> While doing the dump initially,  she seems to
>> optimize to get the small jobs done first, and thus they
>> go onto tape in a similar order.   But I'm currently without
>> a stacker.  So only the first part of the dumps make it
>> to tape "live"  (so to speak)  and the rest get flushed as I
>> manually mount tapes.   I'm hoping that some kind of
>> knapsack-packing algorithm is used . to fit the largest
>> files on the tape first,   but then to add smaller ones to
>> fill the top of the "sack".
>>  My archival "do all the level 0's" is taking up 5 tapes,
>> hence the optimization question!
>
>There is no optimazation, first in, first out.
>That's a need feature.
>
>Jean-Louis

Humm, from a recent example/amanda.conf:
-
dumporder "sssS"# specify the priority order of each dumper
#   s -> smallest size
#   S -> biggest size
#   t -> smallest time
#   T -> biggest time
#   b -> smallest bandwitdh
#   B -> biggest bandwitdh
# try "BTBTBTBTBTBT" if you are not holding
# disk constrained

I'm assuming that when Deb asked about it, using the word 
'flushing', that Deb meant a normal amdump run as opposed to a run 
of amflush, or an autoflush.  I've no idea whether its active for a 
flush, or just an amdump run.  I have no idea if it even works, but 
I'll find out tonight if I can stay awake till then.  I just 
modified mine to do the biggest ones first, and amstatus should be 
able to see if its working that way pretty shortly after the 
estimates are done.

-- 
Cheers, Gene
AMD K6-III@500mhz 320M
Athlon1600XP@1400mhz  512M
99.19% setiathome rank, not too shabby for a WV hillbilly



Re: tape filling algorithm? (flush, or dump)

2002-11-20 Thread Deb Baddorf


> > What kind of optimiaztion does amanda use while flushing
> > dumps to tape?
> >
> > ... I'm hoping that some kind of
> > knapsack-packing algorithm is used . to fit the largest
> > files on the tape first,   but then to add smaller ones to
> > fill the top of the "sack".

Isn't the general knapsack problem one of these P==NP cases
like the traveling salesman problem? This might be challenging
even for the Amanda developers. I don't want the packing estimate
time to explode combinatorially, it takes long enough to back
up my terabytes over 100BaseT. Perhaps FNAL has enough
compute power to throw at it, but we don't. Not yet.


LOL!   Yeah,  but I could swear I remember some simple
"greedy" algorithm from class,  which produces a reasonable
knapsack filling  without spending too much time doing it.
I'll think about it .
Deb Baddorf
---
Deb Baddorf [EMAIL PROTECTED]  840-2289
"You can't help getting older, but you don't have to get old."
- George Burns  <






Re: tape filling algorithm? (flush, or dump)

2002-11-20 Thread C. Chan
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Also Sprach Jean-Louis Martineau:

> On Wed, Nov 20, 2002 at 01:47:50PM -0600, Deb Baddorf wrote:
> > What kind of optimiaztion does amanda use while flushing
> > dumps to tape?
> >
> > ... I'm hoping that some kind of
> > knapsack-packing algorithm is used . to fit the largest
> > files on the tape first,   but then to add smaller ones to
> > fill the top of the "sack".
> >  My archival "do all the level 0's" is taking up 5 tapes,
> > hence the optimization question!
>
> There is no optimazation, first in, first out.
> That's a need feature.
>
> Jean-Louis

Isn't the general knapsack problem one of these P==NP cases
like the traveling salesman problem? This might be challenging
even for the Amanda developers. I don't want the packing estimate
time to explode combinatorially, it takes long enough to back
up my terabytes over 100BaseT. Perhaps FNAL has enough
compute power to throw at it, but we don't. Not yet.

- 
C. Chan <[EMAIL PROTECTED]>
GPG Public Key  ( pgp.mit.edu | finger [EMAIL PROTECTED] )
"Lif is laene: eal sceaceth, leoht ond lif samod"

-BEGIN PGP SIGNATURE-
Version: GnuPG v1.0.6 (GNU/Linux)
Comment: For info see http://www.gnupg.org

iD8DBQE93BwwP6a1eh6rrMQRAsRiAJwJk5CY1a0ug4FrFc4C6MjdXpZrVACgnbQB
oAWTXIUYeNkT82LL+HABlwM=
=H+pH
-END PGP SIGNATURE-




Re: tape filling algorithm? (flush, or dump)

2002-11-20 Thread Jean-Louis Martineau
On Wed, Nov 20, 2002 at 01:47:50PM -0600, Deb Baddorf wrote:
> What kind of optimiaztion does amanda use while flushing
> dumps to tape?
> 
> While doing the dump initially,  she seems to
> optimize to get the small jobs done first, and thus they
> go onto tape in a similar order.   But I'm currently without
> a stacker.  So only the first part of the dumps make it
> to tape "live"  (so to speak)  and the rest get flushed as I
> manually mount tapes.   I'm hoping that some kind of
> knapsack-packing algorithm is used . to fit the largest
> files on the tape first,   but then to add smaller ones to
> fill the top of the "sack".
>  My archival "do all the level 0's" is taking up 5 tapes,
> hence the optimization question!

There is no optimazation, first in, first out.
That's a need feature.

Jean-Louis
-- 
Jean-Louis Martineau email: [EMAIL PROTECTED] 
Departement IRO, Universite de Montreal
C.P. 6128, Succ. CENTRE-VILLETel: (514) 343-6111 ext. 3529
Montreal, Canada, H3C 3J7Fax: (514) 343-5834



Re: tape filling algorithm? (flush, or dump)

2002-11-20 Thread Galen Johnson
Deb Baddorf wrote:


What kind of optimiaztion does amanda use while flushing
dumps to tape?

While doing the dump initially,  she seems to
optimize to get the small jobs done first, and thus they
go onto tape in a similar order.   But I'm currently without
a stacker.  So only the first part of the dumps make it
to tape "live"  (so to speak)  and the rest get flushed as I
manually mount tapes.   I'm hoping that some kind of
knapsack-packing algorithm is used . to fit the largest
files on the tape first,   but then to add smaller ones to
fill the top of the "sack".
My archival "do all the level 0's" is taking up 5 tapes,
hence the optimization question!

Deb Baddorf
---
Deb Baddorf [EMAIL PROTECTED]  840-2289
"You can't help getting older, but you don't have to get old."
- George Burns  <




look in the amanda.conf file...there are several algorithms for you to 
use...it's fairly well documented there

=G=




tape filling algorithm? (flush, or dump)

2002-11-20 Thread Deb Baddorf
What kind of optimiaztion does amanda use while flushing
dumps to tape?

While doing the dump initially,  she seems to
optimize to get the small jobs done first, and thus they
go onto tape in a similar order.   But I'm currently without
a stacker.  So only the first part of the dumps make it
to tape "live"  (so to speak)  and the rest get flushed as I
manually mount tapes.   I'm hoping that some kind of
knapsack-packing algorithm is used . to fit the largest
files on the tape first,   but then to add smaller ones to
fill the top of the "sack".
My archival "do all the level 0's" is taking up 5 tapes,
hence the optimization question!

Deb Baddorf
---
Deb Baddorf [EMAIL PROTECTED]  840-2289
"You can't help getting older, but you don't have to get old."
- George Burns  <