Bzip2, gzip,
Why do you guys keep quoting those totally outdated compressors :)
There is 7-zip for Linux; it's open source and based on LZMA. On
average the resulting archives are about 2x smaller than what
gzip/bzip2 produce (so bzip2/gzip are a factor of 2 worse).
7-zip can also compress in parallel, though I'm not sure whether the
parallel mode works on Linux. 7za is the command-line version.
Linux distributions should include it by default.
It also offers PPMd, a newer context-modeling compressor that old
tools like bzip2/gzip don't have.
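For anyone who wants to check the "2x smaller" claim on their own
files, Python's standard library happens to ship both codec families,
so a rough comparison takes a few lines (synthetic data below stands
in for a real file; actual 7za settings and speed will differ):

```python
import gzip
import lzma

# Synthetic redundant data standing in for a real sample file.
data = b"sample scanline data 0123456789 " * 50_000

gz = gzip.compress(data, compresslevel=9)   # deflate, gzip's algorithm
xz = lzma.compress(data, preset=9)          # LZMA, the family 7-zip uses

print(f"original: {len(data)}  gzip: {len(gz)}  lzma: {len(xz)}")
```

Swapping in a slice of real data gives a much more honest answer than
any general-purpose benchmark.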
TIFF files that are already compressed internally compress very badly,
of course. It may help to convert them to an uncompressed format
first; that increases their size, but the raw data then compresses
much better with PPM or LZMA.
When googling for the best compressors, don't bother with PAQ; that's
a benchmark compressor. It was worse for my terabyte of data than
even 7-zip (which is by no means the best PPM compressor, but it is
open source).
Vincent
On Oct 3, 2008, at 3:11 AM, Bill Broadley wrote:
Xu, Jerry wrote:
Hello, Currently I generate nearly one TB of data every few days and
I need to pass it along the enterprise network to the storage center
attached to my HPC system. I am thinking about compressing it (mostly
TIFF-format image data)
TIFF uncompressed, or TIFF compressed files? If uncompressed, I'd
guess that bzip2 might do well with them.
as much as I can, as fast as I can, before I send it across the
network ... So, I am wondering whether anyone is familiar with any
hardware-based accelerator which can dramatically improve the
compression procedure..
Improve? You mean compression ratio? Wall clock time? CPU
utilization?
Adding forward error correction?
Suggestions for any file system architecture will be appreciated
too..
Er, hard to imagine a reasonable recommendation without much more
information.
Organization, databases (if needed), filenames and related metadata
are rather
specific to the circumstances. Access patterns, retention time,
backups, and many other issues would need consideration.
I have a couple of contacts from some vendors but am not sure whether
it works as I expect, so if anyone has experience with it and wants
to share, it will be really appreciated!
Why hardware? I have some python code that managed 10MB/sec per CPU
(or 80MB/sec on 8 CPUs if you prefer) that compresses with zlib,
hashes with sha256, and encrypts with AES (256-bit key). Assuming the
compression you want isn't substantially harder than doing zlib,
sha256, and AES, a single core from a dual or quad core chip sold in
the last few years should do fine.
1TB every 2 days = 6MB/sec or approximately 15% of a quad core or
60% of a
single core for my compress, hash and encrypt in python.
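Bill's actual code isn't shown here, but the compress-and-hash part
of such a pipeline can be sketched with the standard library alone
(AES needs a third-party library, so it's left out of this sketch;
the chunked loop and throughput math are the point):

```python
import hashlib
import time
import zlib

def compress_and_hash(chunks):
    """Compress each chunk with zlib and hash the whole stream with sha256."""
    h = hashlib.sha256()
    out_bytes = 0
    for chunk in chunks:
        h.update(chunk)
        out_bytes += len(zlib.compress(chunk, 6))
    return h.hexdigest(), out_bytes

# Feed it 64MB of synthetic data and measure throughput.
chunk = b"\x00" * (1 << 20)  # 1 MB per chunk
t0 = time.perf_counter()
digest, out = compress_and_hash([chunk] * 64)
mb_per_sec = 64 / (time.perf_counter() - t0)
print(f"{mb_per_sec:.0f} MB/sec, compressed to {out} bytes")
```

On any recent core this comfortably clears the ~6MB/sec the original
problem requires; real image data will compress slower than zeros,
so measure with real chunks before trusting the number.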
Considering how
cheap cores are (quad desktops are often under $1k) I'm not sure
what would
justify an accelerator card. Not to mention picking the particular
algorithm
could make a huge difference to the CPU and compression ratio
achieved. I'd
recommend taking a stack of real data and trying out different
compression
tools and settings.
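A quick harness for that kind of bake-off might look like the
following, with stdlib codecs standing in for the real candidate
tools and synthetic bytes standing in for a slice of real TIFF data:

```python
import bz2
import lzma
import time
import zlib

# Substitute a chunk of your real data here.
sample = b"detector frame 000123 dark noise \x01\x02\x03 " * 20_000

candidates = {
    "zlib -1": lambda d: zlib.compress(d, 1),
    "zlib -9": lambda d: zlib.compress(d, 9),
    "bz2  -9": lambda d: bz2.compress(d, 9),
    "lzma -6": lambda d: lzma.compress(d, preset=6),
}

for name, fn in candidates.items():
    t0 = time.perf_counter()
    out = fn(sample)
    dt = time.perf_counter() - t0
    print(f"{name:8s} ratio {len(sample) / len(out):6.1f}x in {dt:.3f}s")
```

Ratio-per-second on your own data, not a published benchmark, is what
decides whether a fancier algorithm (or any accelerator) is worth it.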
In any case 6MB/sec of compression isn't particularly hard these
days.... even
in python on a 1-2 year old mid range cpu.
_______________________________________________
Beowulf mailing list, [email protected]
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf