On Sat, 25 Dec 2010 14:41:29 -0500, Roy Smith wrote:

>> XML works extremely well for large datasets.

One advantage it has over many legacy formats is that there are no
inherent 2^31/2^32 limitations. Many binary formats inherently cannot
support files larger than 2GiB or 4GiB due to the use of 32-bit offsets
in indices.

> Of course, one advantage of XML is that with so much redundant text, it
> compresses well. We typically see gzip compression ratios of 20:1.
> But, that just means you can archive them efficiently; you can't do
> anything useful until you unzip them.

XML is typically processed sequentially, so you don't need to create a
decompressed copy of the file before you start processing it.

If file size is that much of an issue, eventually we'll see a standard
for compressing XML. This could easily result in smaller files than
using a dedicated format compressed with general-purpose compression
algorithms, as a widely-used format such as XML merits more effort than
any application-specific format.
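
To illustrate the point about sequential processing: a minimal sketch,
assuming a gzip-compressed file and the standard library's gzip and
xml.etree.ElementTree modules. The function name count_records, the
<record> tag and the sample data are made up for the example; the point is
only that the parser reads from the decompressing stream on demand, so no
decompressed copy is ever written out.

```python
import gzip
import io
import xml.etree.ElementTree as ET


def count_records(fileobj, tag="record"):
    """Count <tag> elements in a gzip-compressed XML stream.

    The decompressed document is never stored as a whole; the parser
    pulls bytes from the gzip stream as it needs them.
    """
    count = 0
    with gzip.open(fileobj, "rb") as stream:
        for event, elem in ET.iterparse(stream, events=("end",)):
            if elem.tag == tag:
                count += 1
                # Discard the element's text, attributes and children
                # once we are done with it, to keep memory use down.
                elem.clear()
    return count


# Hypothetical sample data, built in memory so the sketch is self-contained.
xml_bytes = b"<data>" + b"".join(
    b"<record id='%d'>x</record>" % i for i in range(1000)
) + b"</data>"
compressed = gzip.compress(xml_bytes)

print(count_records(io.BytesIO(compressed)))
```

The same function should work with an open file on disk in place of the
BytesIO object. Note that the root element still holds a reference to
each emptied child, so for truly huge files you would also want to drop
those from the root as you go.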