Well done on taking the initiative. When you do this, it would also help those interested in reviewing the code if you based your work on the upstream project before committing your changes. That way it is clear what has changed, and a link to the diff can be shared.
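For example, roughly like this (a sketch only; the path to the upstream tree is a placeholder):

  # Commit a pristine copy of the upstream sources first, so your own
  # changes land as separate commits on top of it.
  git init lz4 && cd lz4
  cp -r /path/to/upstream-lz4/. .     # placeholder: unmodified upstream tree
  git add -A
  git commit -m "Import upstream lz4"

  # Now make your changes and commit them separately. Reviewers can then
  # see exactly what changed with:
  git log -p

With the import as its own commit, the diff between it and HEAD is exactly your patch, and that is what you can link to.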
Cheers,
Alex

On 8 July 2011 22:36, Don Bindner <[email protected]> wrote:
> Oh, and you shouldn't use 'std' for stdin and stdout; you should use '-'.
> That's what many programs do (gzip, for example), and the hyphen will be
> more familiar to experienced users since it's already an established
> interface convention.
>
> Don
>
> On Fri, Jul 8, 2011 at 4:31 PM, Don Bindner <[email protected]> wrote:
>>
>> Did you remember to run your tests repeatedly, in different orders, to
>> minimize the effects that caching might have on your results?
>>
>> Don
>>
>> On Fri, Jul 8, 2011 at 4:19 PM, Huan Truong <[email protected]> wrote:
>>>
>>> I recently heard a complaint about gzip on another mailing list. The
>>> poster was backing up tens of GB of data every day, and tar-gzipping it
>>> (tar czvf) was unacceptably slow.
>>>
>>> I once faced the same problem when I needed to create hard drive
>>> snapshots for computers. Naturally I wanted to save bandwidth, so that
>>> I wouldn't have to transfer a lot of data over a 100 Mbps line.
>>>
>>> Suppose compression saves 5 GB on a 15 GB file. Transferring 15 GB
>>> takes 15,000 MB / (100/8) MB/s = 1,200 s = 20 minutes on a perfect
>>> network. On the Truman network (cross-building) it usually takes three
>>> times that, so realistically we need 60 minutes to transfer a 15 GB
>>> snapshot image. The compressed 10 GB file would take only 40 minutes
>>> to transfer. Good deal? No.
>>>
>>> It *didn't help*. Compressing the file takes more than an hour, so the
>>> upload as a whole takes even longer. The clients (Pentium 4 2.8 GHz HT)
>>> also struggle to decompress the file, so the result comes out even. So
>>> why the hassle? My conclusion: it's better *not* to compress the image
>>> with gzip at all. This is even clearer on a fast connection: whatever
>>> you gain in I/O you lose to CPU time, and the result comes out worse.
>>>
>>> It turns out that gzip, and likewise bzip2 and zip, are terrible in
>>> CPU usage: they take a lot of time to compress and decompress. There
>>> are other algorithms that compress a little worse than gzip but are
>>> much easier on the CPU (most of them based on the Lempel-Ziv
>>> algorithm): LZO, Google's Snappy, LZF, and LZ4. LZ4 is crazily fast.
>>>
>>> I did some quick benchmarking with the Linux source tree:
>>>
>>> 1634!ht:~/src/lz4-read-only$ time ./tar-none.sh ../linux-3.0-rc6 linux-s
>>> real 0m4.390s
>>> user 0m0.620s
>>> sys 0m0.870s
>>>
>>> 1635!ht:~/src/lz4-read-only$ time ./tar-gzip.sh ../linux-3.0-rc6 linux-s
>>> real 0m43.683s
>>> user 0m40.901s
>>> sys 0m0.319s
>>>
>>> 1636!ht:~/src/lz4-read-only$ time ./tar-lz4.sh ../linux-3.0-rc6 linux-s
>>> real 0m5.568s
>>> user 0m4.831s
>>> sys 0m0.272s
>>>
>>> A clear win for lz4! (I used a pipe, so in theory it could do even
>>> better.)
>>>
>>> I have patched the lz4 utility so that it happily accepts 'std' as the
>>> infile argument for stdin and as the outfile argument for stdout, so
>>> you can pipe from whatever program you like.
>>>
>>> git clone [email protected]:htruong/lz4.git for the utility.
>>>
>>>
>>> Cheers, nice weekend,
>>> - Huan.
>>> --
>>> Huan Truong
>>> 600-988-9066
>>> http://tnhh.net/
>>>
>>
>
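For anyone who wants to reproduce the benchmark: the tar-*.sh wrappers above weren't posted. Presumably they are thin pipelines along these lines; the script contents, and the infile/outfile calling convention of the patched lz4, are my guesses:

  #!/bin/sh
  # tar-lz4.sh <srcdir> <prefix>: tar the directory and compress the stream.
  # 'std' asks the patched lz4 to read the archive stream from stdin.
  tar cf - "$1" | ./lz4 std "$2.tar.lz4"

tar-gzip.sh would swap the compressor, roughly "tar cf - $1 | gzip -c > $2.tar.gz", and tar-none.sh would be a plain "tar cf $2.tar $1" with no compressor in the pipeline.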
