On 05.08.2010 21:56, Henry Yen wrote:
> On Thu, Aug 05, 2010 at 17:17:39 +0200, Christian Gaul wrote:
>> On 05.08.2010 16:57, Henry Yen wrote:
>
> First, I welcome this discussion, however arcane (as long as the
> list permits it, of course) -- I am happy to discover if I'm wrong
> in my thinking. That said, I'm not (yet) convinced.
Discussion is always welcome.. but I'll try to get back on track at the end of
my mail.

> This part in particular I stand by, as a response to the notion of
> using *either* /dev/random *or* /dev/urandom:
>
>> Even when catting to /dev/dsp I use /dev/urandom.. Blocking on /dev/random
>> happens much too quickly.. and when do you really need that much randomness.
>
>>> Again, on Linux, you generally can't use /dev/random at all -- it will
>>> block after reading just a few dozen bytes. /dev/urandom won't block,
>>> but your suggestion of creating a large file from it is very sensible.
>
> For this part, however, I don't agree with your assertion, for two reasons:
>
>>> /dev/urandom seems to measure about 3MB/sec or thereabouts, so creating
>>> a large "uncompressible" file could be done sort of like:
>>>
>>> dd if=/dev/urandom of=tempchunk count=1048576
>>> cat tempchunk tempchunk tempchunk tempchunk tempchunk tempchunk > bigfile
>>
>> cat-ting random data a couple of times to make one big random file won't
>> really work, unless the size of the chunks is way bigger than the
>> "library" size of the compression algorithm.
>
> Reason 1: the example I gave yields a file size for "tempchunk" of 512MB,
> not 1MB, as given in your counter-example. I agree that (at least nowadays)
> catting 1MB chunks into a 6MB chunk is likely (although not assured)
> to lead to a greatly reduced size during later compression, but I disagree
> that catting 512MB chunks into a 3GB chunk is likely to be compressible
> by any general-purpose compressor.

Which is what I meant with "way bigger than the library size of the
algorithm". Mostly my "information" was about pitfalls to look out for when
testing the speed of your equipment: if you went ahead and cat-ted 3000 x 1MB,
I believe the hardware compression would make something highly compressed out
of it. My guess is it would work for most chunks around half as large as the
buffer size of the drive (totally guessing).

>> Also, afaik tar has an "optimization" when outputting to /dev/null; better
>> output to /dev/zero instead if using tar to check possible speeds.
>
> (Yes, although there is considerable disagreement over this (mis)feature;
> my take is that the consensus is "yes, probably bad, definitely
> under-documented (the behavior does match the "info" document), but
> too late to change now".)

Pretty much. But new users starting with Bacula might not know this. If the
user followed the advice of a previous post, "tar -cf /dev/null
<path/to/directory>", he would most likely be surprised.

> Reason 2: Although the compression of data on a general-purpose machine
> will certainly get faster and more capable of detecting duplication inside
> larger and larger chunks, I daresay that this ability with regard to
> hardware compression is unlikely to increase dramatically. For instance,
> the lzma of that 3GB file as shown above ran for about 30 minutes. By
> contrast, with a 27MB/sec physical write speed, that same 3GB would only
> take about 2 minutes to actually write. Even at a 6:1 compression ratio
> (necessarily limited to that exact number because of this example), it
> would still take more than twice as long just to analyze the data to yield
> that compression than to write the uncompressed stream. Put another way,
> I don't see tape drives (currently in the several-hundred-gigabyte range)
> increasing their compression buffer sizes or CPU capabilities to analyze
> more than a couple dozen megabytes, at most, anytime in the near future.
>
> That is why I think that generating a "test" file for write-throughput
> testing that's a few GBs or so in length, made up from chunks that are
> a few hundred MBs (larger chunks take more and more time to create),
> is quite sufficient.

Your analysis of current hardware compression (and, for large enough chunk
sizes, software compression also) is most likely correct; I was merely
pointing out the "obvious" problems that can lead a new user to misinterpret
his testing. If a novice user tried creating a "random" file from /dev/random
he would most likely not wait multiple days to create 500MB chunks, and
creating a 1MB random chunk from /dev/random would lead to a drastically
wrong speed estimate of the drive.
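For example, something along these lines should show the raw streaming speed
of the drive itself (only a sketch: the chunk size, the number of copies and
the /dev/nst0 device name are examples that would need adjusting, and bigfile
should sit on storage that can be read faster than the drive can write):

  # one 512MB chunk of random data (a few minutes at ~3MB/s from /dev/urandom)
  dd if=/dev/urandom of=tempchunk bs=1M count=512

  # six copies -> roughly 3GB that the hardware compression cannot shrink
  cat tempchunk tempchunk tempchunk tempchunk tempchunk tempchunk > bigfile

  # stream it straight to the (example) tape device, bypassing Bacula entirely
  time dd if=bigfile of=/dev/nst0 bs=256k
  mt -f /dev/nst0 rewind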
Getting back on track: the actual problem of the original post was the user
expecting around 50-100MB/s throughput from his drives. I believe the job
statistics show the start and end times and the total size written, and the
"Rate" is calculated as "SD Bytes Written / (end time - start time)". There
are possibly a lot of things going on in that time frame which do not utilize
the drive at all: spooling the data at the same speed the tape drive is
capable of would, for example, pretty much double the running time of the job
and would then show only half the speed in the statistics. Also, inserting the
attributes into the catalog can take quite a while, which will also add to the
run time of the job, thereby "decreasing" the speed of the drive.

If the user wants to test the maximum speed he can expect, a
"tar -cf /dev/zero /his/netapp/mountpoint" would allow him to guesstimate.
I personally use spooling as much as I can; if the user did likewise, the
spooling speed from the netapp were ~80MB/s and he was using LTO3 drives, he
could expect maximum speeds in the job statistics of around 35-40MB/s if all
other components can keep up.

Since he reports 22mb (I am hoping he means 22MB/s), and since he thinks it is
Bacula's fault, I guess he means the Rate in the job statistics. I believe
that with spooling, and maybe software compression and MD5/SHA checksums, that
speed is a little low, but not terrible. Depending on the FD doing the
checksums and compression, he might simply be CPU bound.

Christian
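P.S.: To narrow down whether the FD is the bottleneck, something like the
following might help (a rough sketch: the mount point is just the example
path from above, the gzip level is only an assumption, and this measures the
client outside of Bacula rather than Bacula itself):

  # read speed of the data alone (/dev/zero, so tar really reads the files)
  time tar -cf /dev/zero /his/netapp/mountpoint

  # roughly the extra work the FD does per byte: checksumming and compression
  time tar -cf - /his/netapp/mountpoint | md5sum > /dev/null
  time tar -cf - /his/netapp/mountpoint | gzip -6 > /dev/null

If the second and third runs are much slower than the first, the client CPU
rather than the drive or the network is likely the limit.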