> > Oh, I misunderstood your question. I am switching buffer serialization to
> > just plain bytes except for 5 characters that are escaped (essentially
> > similar to string serialization as if the string were iso-8859-1.)

I'm not sure I follow. I think string serialization implies UTF-8 encoding? That means bytes in the range 128-255 would take 2 bytes each. If we assume that all byte values in a buffer are equally probable, then the average space per byte for CSV serialization would be 1.5 bytes, or 12 bits. Right? Actually a little more, because you escape those 5 characters too.
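A quick check of the 1.5-byte figure, as a sketch: it assumes each buffer byte is read as the ISO-8859-1 character with the same value and then UTF-8 encoded. The cost of the 5 escaped characters is left out, since the escape format isn't spelled out here.

```python
def avg_utf8_bytes_per_byte() -> float:
    """Average UTF-8 length of one byte value read as an ISO-8859-1 character.

    Assumes all 256 byte values are equally likely; escaping is ignored.
    """
    # ISO-8859-1 maps byte value b to code point U+00XX with the same value,
    # so chr(b) is that character. Values 0-127 encode to 1 byte, 128-255 to 2.
    total = sum(len(chr(b).encode("utf-8")) for b in range(256))
    return total / 256


avg = avg_utf8_bytes_per_byte()
print(avg, avg * 8)  # 1.5 12.0  (bytes and bits per input byte)
```

Half the values cost 1 byte and half cost 2, so the mean is (128 * 1 + 128 * 2) / 256 = 1.5.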
So why not use base64 encoding? The expansion factor would be smaller, about 4/3 (roughly 1.33 bytes per input byte), since it essentially uses 8 bits to represent 6. It also avoids control characters, which I think would be a problem with what you're suggesting: we need the CSV files to be human readable, so I think you'd have to escape those too. Or else just leave the encoding as two hex digits per byte (2 bytes per input byte).
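A rough comparison of the three options on a buffer where every byte value occurs equally often. This is only an illustration under that uniform-bytes assumption. The "plain bytes" line models the ISO-8859-1-as-UTF-8 idea before any escaping, and the check at the end shows why control characters matter for readability.

```python
import base64
import string

# Every byte value appears equally often (the uniform assumption above).
# 768 is a multiple of 3, so base64 needs no '=' padding here.
data = bytes(range(256)) * 3


def per_byte(encoded_len: int) -> float:
    """Encoded bytes per input byte."""
    return encoded_len / len(data)


latin1_utf8 = data.decode("latin-1").encode("utf-8")  # plain-bytes idea, before escaping
b64 = base64.b64encode(data)
hex_digits = data.hex()

print("iso-8859-1 as UTF-8:", per_byte(len(latin1_utf8)))  # 1.5
print("base64:", per_byte(len(b64)))                       # about 1.333
print("hex:", per_byte(len(hex_digits)))                   # 2.0

# base64 output uses only A-Z, a-z, 0-9, '+', '/' (and '=' padding),
# so it never contains control characters.
allowed = set((string.ascii_letters + string.digits + "+/=").encode("ascii"))
print("base64 printable only:", set(b64) <= allowed)  # True

# The plain-bytes version passes bytes 0x00-0x1F straight through.
print("plain bytes has controls:", any(c < 0x20 for c in latin1_utf8))  # True
```

Ignoring padding, base64 costs 4 output bytes per 3 input bytes, which sits between the UTF-8 approach (1.5) and hex (2.0).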
