Re: tape filling algorithm? (flush, or dump)
>>>On Wed, Nov 20, 2002 at 01:47:50PM -0600, Deb Baddorf wrote: What kind of optimiaztion does amanda use while flushing dumps to tape? >>>There is no optimazation, first in, first out. >>>That's a need feature. >>> >>>Jean-Louis Here's a basic algorithm (in pseudo-code). Is this useful to somebody who is already working on this code? Or should I attempt to fit it into the actual taper code myself? Deb Baddorf = This *should* handle both the FLUSH case (all data is available from the start) and the on-going AMDUMP case (data is added as we go). I may notice more errors but I think I got them all. To "grok" the basic algorithm (ridiculously simple) ignore or white out the "SPIN" paragraphs which wait for more data to arrive. === stuff = collection of items to be put into knapsack. Dumpfiles in this case (includes tarfiles too) Stuff is probably a doubly(?) linked list. Operations required: status = insert(item) status = remove(ptr) or maybe remove(item) ptr = get_largest() #boolean = is_empty() boolean = has_data() #opposite sense to is_empty() boolean = more_coming() #somebody elsewhere needs to tell us when no more dumps are coming down the pipe item number = size(item) ptr = get_next_smaller()# return item next smaller than self (maybe should be a function of the COLLECTION instead of the item, for project OO code) next_ptr #if you use a doubly linked list prev_ptr #if you use a doubly linked list string = get_name(item) ?? for the index, or some such. Maybe for error message when dump won't fit. knapsack#(or, tape!) number = size_remaining() status = add(item) # responsible to DELETE from holding disk after successful & from COLLECTION above make_full()# decree it "full" when no available items can fit in the remaining space boolean = has_space()# is not yet full bunch-o-knapsacks #(or, how many tapes are available?) ptr = gimme_one() boolean = is_more() == PACK_ALL_STUFF #i.e. start taping { mystuff = new stuff() # filled somewhere else, or here, but before algorithm below allsacks = new bunch-o-knapsacks() #number comes from RUNTAPES parameter yada yada yada maybe twiddle thumbs till significant data arrives, if holding disk allows while ( allsacks.is_more() # stop when we run out of sacks (tapes) or (not mystuff.has_data() # or when we run out of data and not mystuff.more_coming() ) )# and know that no more is coming { this_sack = allsacks.gimme_one() while ( not mystuff.has_data() ) #wait for some data to arrive {spin check "more coming" in case a client dies } while ( mystuff.has_data()and #there is still data waiting to be taped this_sack.has_space() ) #the sack (tape) has room left { #remember, in "timed" version, a bigger one may be added during any pass #so, we start with biggest each time and add the first that will fit fill_sack (this_sack, mystuff.get_largest()) while ( not mystuff.has_data() ) #wait for some more data to arrive {spin check "more coming" in case a client dies } } } } == fill_sack(sackptr, item_ptr) { if (item_ptr is NULL) #there are no more smaller items { sackptr.make_full() #decree sack (tape) to be full, so we can move on to another return } isize=item_ptr.size(); ssize=sackptr.size_remaining(); #store for possible error msg #or assign to variable IN the if statement, but too messy to do in psuedo-code here! if ( item_ptr.size() <= sackptr.size_remaining() )#if item fits, according to sack specs, { # put it in the sack status = sackptr.add(item_ptr)# add MUST delete from holding disk, if it finishes successfully # add MUST also adjust sack's size_remaining value # it writes the INDEX, and does other bookkeeping too # Also removes ITEM from "stuff" list of available dumps to flush if ( not status) # failed -- tape size must've been slightly wrong. That's life. { sackptr.make_full() # we've already written to EOT so can't try another size item write message "Sack(tape) wasn't as big as " isize + ssize. write message "May want to correct sack specs (tapesize)" } } else #find next smaller item and try that one { fill_sack (sackptr, item_ptr.get_next_smaller()) } }
Re: tape filling algorithm? (flush, or dump)
On Thursday 21 November 2002 00:47, Gene Heskett wrote: >On Wednesday 20 November 2002 22:41, Gene Heskett wrote: > >Yeah, I know, I'm talking to myself :-) In some circles, thats >considered to be 'poor form'. But, see below. Continued, with another "ps:" >>On Wednesday 20 November 2002 17:36, Jean-Louis Martineau wrote: >>>On Wed, Nov 20, 2002 at 01:47:50PM -0600, Deb Baddorf wrote: What kind of optimiaztion does amanda use while flushing dumps to tape? While doing the dump initially, she seems to optimize to get the small jobs done first, and thus they go onto tape in a similar order. But I'm currently without a stacker. So only the first part of the dumps make it to tape "live" (so to speak) and the rest get flushed as I manually mount tapes. I'm hoping that some kind of knapsack-packing algorithm is used . to fit the largest files on the tape first, but then to add smaller ones to fill the top of the "sack". My archival "do all the level 0's" is taking up 5 tapes, hence the optimization question! >>> >>>There is no optimazation, first in, first out. >>>That's a need feature. >>> >>>Jean-Louis >> >>Humm, from a recent example/amanda.conf: >>- >>dumporder "sssS"# specify the priority order of each >> dumper # s -> smallest size >># S -> biggest size >># t -> smallest time >># T -> biggest time >># b -> smallest bandwitdh >># B -> biggest bandwitdh >># try "BTBTBTBTBTBT" if you are not >> holding # disk constrained >> >>I'm assuming that when Deb asked about it, using the word >>'flushing', that Deb meant a normal amdump run as opposed to a >> run of amflush, or an autoflush. I've no idea whether its >> active for a flush, or just an amdump run. I have no idea if it >> even works, but I'll find out tonight if I can stay awake till >> then. I just modified mine to do the biggest ones first, and >> amstatus should be able to see if its working that way pretty >> shortly after the estimates are done. > >Yes, its working according to an 'SSSsss' setting from the looks > of it, it started with /usr/src, nearly the last entry in the > disklist which is nearly 2.7 gigs ATM. It also tells me its > going to skip 2 entries tonight as they are too big, 1.6gb of > music and 90 megs of /usr/X11R6 will get delayed till tomorrow > night. Obviously I'm just now getting amanda restarted after a > disk crash. I suppose it will take a week to get back to > 'normal', whatever that is... Ok, it appears that with the compression being done in software, it smunched that 2.6 gigs worth of /usr/src down to 1.6 gigs. By the time it was done, it had written only 2.6 gigs in real data down the cable to the tape. There was I believe, room enough to fit the original, uncompressed 1.6 gigs worth of music, and certainly room enough for a compressed rendition of the 90 megs worth of /usr/X11R6. Since at the estmate phase, it hasn't compressed anything, is it permissable to expand the value tapetype returns in order to prevent this erronious FAILED message? And if so, do we believe the vendors propaganda and arbitrarily double it? Or does amanda do something similar internally? Here is why I ask, from the email: -- STATISTICS: Total Full Daily Estimate Time (hrs:min)0:23 Run Time (hrs:min) 3:03 Dump Time (hrs:min)1:09 0:54 0:15 Output Size (meg)2688.6 2499.4 189.2 Original Size (meg) 7214.6 6792.5 422.2 Avg Compressed Size (%)33.2 33.3 32.1 (level:#disks ...) Filesystems Dumped 29 6 23 (1:23) Avg Dump Rate (k/s) 660.9 788.9 210.2 Tape Time (hrs:min)1:58 1:49 0:08 Tape Size (meg) 2690.1 2499.7 190.3 Tape Used (%) 70.8 65.85.0 (level:#disks ...) Filesystems Taped29 6 23 (1:23) Avg Tp Write Rate (k/s) 390.5 390.3 391.8 Note that the reported Original Size is well above the roughly 3.8 gigs set in the DDS2 tapetype, and yet amdump knew it would fit once compressed. I'm beginning to feel a bit like pooh, the bear of very little brain here... -- Cheers, Gene AMD K6-III@500mhz 320M Athlon1600XP@1400mhz 512M 99.19% setiathome rank, not too shabby for a WV hillbilly
Re: tape filling algorithm? (flush, or dump)
On Wednesday 20 November 2002 22:41, Gene Heskett wrote: Yeah, I know, I'm talking to myself :-) In some circles, thats considered to be 'poor form'. But, see below. >On Wednesday 20 November 2002 17:36, Jean-Louis Martineau wrote: >>On Wed, Nov 20, 2002 at 01:47:50PM -0600, Deb Baddorf wrote: >>> What kind of optimiaztion does amanda use while flushing >>> dumps to tape? >>> >>> While doing the dump initially, she seems to >>> optimize to get the small jobs done first, and thus they >>> go onto tape in a similar order. But I'm currently without >>> a stacker. So only the first part of the dumps make it >>> to tape "live" (so to speak) and the rest get flushed as I >>> manually mount tapes. I'm hoping that some kind of >>> knapsack-packing algorithm is used . to fit the largest >>> files on the tape first, but then to add smaller ones to >>> fill the top of the "sack". >>> My archival "do all the level 0's" is taking up 5 tapes, >>> hence the optimization question! >> >>There is no optimazation, first in, first out. >>That's a need feature. >> >>Jean-Louis > >Humm, from a recent example/amanda.conf: >- >dumporder "sssS"# specify the priority order of each > dumper # s -> smallest size ># S -> biggest size ># t -> smallest time ># T -> biggest time ># b -> smallest bandwitdh ># B -> biggest bandwitdh ># try "BTBTBTBTBTBT" if you are not > holding # disk constrained > >I'm assuming that when Deb asked about it, using the word >'flushing', that Deb meant a normal amdump run as opposed to a run >of amflush, or an autoflush. I've no idea whether its active for > a flush, or just an amdump run. I have no idea if it even works, > but I'll find out tonight if I can stay awake till then. I just > modified mine to do the biggest ones first, and amstatus should > be able to see if its working that way pretty shortly after the > estimates are done. Yes, its working according to an 'SSSsss' setting from the looks of it, it started with /usr/src, nearly the last entry in the disklist which is nearly 2.7 gigs ATM. It also tells me its going to skip 2 entries tonight as they are too big, 1.6gb of music and 90 megs of /usr/X11R6 will get delayed till tomorrow night. Obviously I'm just now getting amanda restarted after a disk crash. I suppose it will take a week to get back to 'normal', whatever that is... -- Cheers, Gene AMD K6-III@500mhz 320M Athlon1600XP@1400mhz 512M 99.19% setiathome rank, not too shabby for a WV hillbilly
Re: tape filling algorithm? (flush, or dump)
On Wednesday 20 November 2002 17:36, Jean-Louis Martineau wrote: >On Wed, Nov 20, 2002 at 01:47:50PM -0600, Deb Baddorf wrote: >> What kind of optimiaztion does amanda use while flushing >> dumps to tape? >> >> While doing the dump initially, she seems to >> optimize to get the small jobs done first, and thus they >> go onto tape in a similar order. But I'm currently without >> a stacker. So only the first part of the dumps make it >> to tape "live" (so to speak) and the rest get flushed as I >> manually mount tapes. I'm hoping that some kind of >> knapsack-packing algorithm is used . to fit the largest >> files on the tape first, but then to add smaller ones to >> fill the top of the "sack". >> My archival "do all the level 0's" is taking up 5 tapes, >> hence the optimization question! > >There is no optimazation, first in, first out. >That's a need feature. > >Jean-Louis Humm, from a recent example/amanda.conf: - dumporder "sssS"# specify the priority order of each dumper # s -> smallest size # S -> biggest size # t -> smallest time # T -> biggest time # b -> smallest bandwitdh # B -> biggest bandwitdh # try "BTBTBTBTBTBT" if you are not holding # disk constrained I'm assuming that when Deb asked about it, using the word 'flushing', that Deb meant a normal amdump run as opposed to a run of amflush, or an autoflush. I've no idea whether its active for a flush, or just an amdump run. I have no idea if it even works, but I'll find out tonight if I can stay awake till then. I just modified mine to do the biggest ones first, and amstatus should be able to see if its working that way pretty shortly after the estimates are done. -- Cheers, Gene AMD K6-III@500mhz 320M Athlon1600XP@1400mhz 512M 99.19% setiathome rank, not too shabby for a WV hillbilly
Re: tape filling algorithm? (flush, or dump)
> > What kind of optimiaztion does amanda use while flushing > > dumps to tape? > > > > ... I'm hoping that some kind of > > knapsack-packing algorithm is used . to fit the largest > > files on the tape first, but then to add smaller ones to > > fill the top of the "sack". Isn't the general knapsack problem one of these P==NP cases like the traveling salesman problem? This might be challenging even for the Amanda developers. I don't want the packing estimate time to explode combinatorially, it takes long enough to back up my terabytes over 100BaseT. Perhaps FNAL has enough compute power to throw at it, but we don't. Not yet. LOL! Yeah, but I could swear I remember some simple "greedy" algorithm from class, which produces a reasonable knapsack filling without spending too much time doing it. I'll think about it . Deb Baddorf --- Deb Baddorf [EMAIL PROTECTED] 840-2289 "You can't help getting older, but you don't have to get old." - George Burns <
Re: tape filling algorithm? (flush, or dump)
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Also Sprach Jean-Louis Martineau: > On Wed, Nov 20, 2002 at 01:47:50PM -0600, Deb Baddorf wrote: > > What kind of optimiaztion does amanda use while flushing > > dumps to tape? > > > > ... I'm hoping that some kind of > > knapsack-packing algorithm is used . to fit the largest > > files on the tape first, but then to add smaller ones to > > fill the top of the "sack". > > My archival "do all the level 0's" is taking up 5 tapes, > > hence the optimization question! > > There is no optimazation, first in, first out. > That's a need feature. > > Jean-Louis Isn't the general knapsack problem one of these P==NP cases like the traveling salesman problem? This might be challenging even for the Amanda developers. I don't want the packing estimate time to explode combinatorially, it takes long enough to back up my terabytes over 100BaseT. Perhaps FNAL has enough compute power to throw at it, but we don't. Not yet. - C. Chan <[EMAIL PROTECTED]> GPG Public Key ( pgp.mit.edu | finger [EMAIL PROTECTED] ) "Lif is laene: eal sceaceth, leoht ond lif samod" -BEGIN PGP SIGNATURE- Version: GnuPG v1.0.6 (GNU/Linux) Comment: For info see http://www.gnupg.org iD8DBQE93BwwP6a1eh6rrMQRAsRiAJwJk5CY1a0ug4FrFc4C6MjdXpZrVACgnbQB oAWTXIUYeNkT82LL+HABlwM= =H+pH -END PGP SIGNATURE-
Re: tape filling algorithm? (flush, or dump)
On Wed, Nov 20, 2002 at 01:47:50PM -0600, Deb Baddorf wrote: > What kind of optimiaztion does amanda use while flushing > dumps to tape? > > While doing the dump initially, she seems to > optimize to get the small jobs done first, and thus they > go onto tape in a similar order. But I'm currently without > a stacker. So only the first part of the dumps make it > to tape "live" (so to speak) and the rest get flushed as I > manually mount tapes. I'm hoping that some kind of > knapsack-packing algorithm is used . to fit the largest > files on the tape first, but then to add smaller ones to > fill the top of the "sack". > My archival "do all the level 0's" is taking up 5 tapes, > hence the optimization question! There is no optimazation, first in, first out. That's a need feature. Jean-Louis -- Jean-Louis Martineau email: [EMAIL PROTECTED] Departement IRO, Universite de Montreal C.P. 6128, Succ. CENTRE-VILLETel: (514) 343-6111 ext. 3529 Montreal, Canada, H3C 3J7Fax: (514) 343-5834
Re: tape filling algorithm? (flush, or dump)
Deb Baddorf wrote: What kind of optimiaztion does amanda use while flushing dumps to tape? While doing the dump initially, she seems to optimize to get the small jobs done first, and thus they go onto tape in a similar order. But I'm currently without a stacker. So only the first part of the dumps make it to tape "live" (so to speak) and the rest get flushed as I manually mount tapes. I'm hoping that some kind of knapsack-packing algorithm is used . to fit the largest files on the tape first, but then to add smaller ones to fill the top of the "sack". My archival "do all the level 0's" is taking up 5 tapes, hence the optimization question! Deb Baddorf --- Deb Baddorf [EMAIL PROTECTED] 840-2289 "You can't help getting older, but you don't have to get old." - George Burns < look in the amanda.conf file...there are several algorithms for you to use...it's fairly well documented there =G=
tape filling algorithm? (flush, or dump)
What kind of optimiaztion does amanda use while flushing dumps to tape? While doing the dump initially, she seems to optimize to get the small jobs done first, and thus they go onto tape in a similar order. But I'm currently without a stacker. So only the first part of the dumps make it to tape "live" (so to speak) and the rest get flushed as I manually mount tapes. I'm hoping that some kind of knapsack-packing algorithm is used . to fit the largest files on the tape first, but then to add smaller ones to fill the top of the "sack". My archival "do all the level 0's" is taking up 5 tapes, hence the optimization question! Deb Baddorf --- Deb Baddorf [EMAIL PROTECTED] 840-2289 "You can't help getting older, but you don't have to get old." - George Burns <