On Thu, Jan 26, 2006 at 07:18:55PM +0000, Nicholas Clark wrote:
> On Thu, Jan 26, 2006 at 03:22:11PM +0800, Steve Gunnell wrote:
>
> > 5) Seeking through an encoding filter could be highly problematic.
> > Filters such as "utf8" that have a non-deterministic byte per character
> > ratio should politely refuse seeks.
>
> Clearly as you say, fixed width encodings are fine, when dealing with an
> entire file. But if you push a UCS32 filter onto a stream after reading an
> odd number of bytes, valid seek positions aren't going to be multiples of 4.
> I guess a seek validator can be coded to know this, but it starts getting
> fiddly. The other alternative would be that seek/tell locations are always
> in bytes in the underlying stream, and purposefully ignore any many-to-1
> filters atop them.

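To make the misalignment case concrete, here is a rough sketch in Python (not Parrot code) of the bookkeeping a fixed-width filter would need if it were pushed partway into a stream. The class name FixedWidthSeekFilter, its methods, and the choice of ValueError are all hypothetical, picked only for illustration; nothing like this exists in the I/O layer yet.

```python
class FixedWidthSeekFilter:
    """Hypothetical fixed-width decoding filter layered over a byte stream.

    Assumption: the filter remembers the byte offset at which it was
    pushed, so valid character boundaries are base + k * width rather
    than plain multiples of width.
    """

    def __init__(self, base, width=4):
        self.base = base    # byte offset in the underlying stream at push time
        self.width = width  # bytes per character (4 for a UCS-4 style filter)

    def char_to_byte(self, char_index):
        """Translate a character index into an underlying byte position."""
        if char_index < 0:
            raise ValueError("character index must be non-negative")
        return self.base + char_index * self.width

    def byte_to_char(self, byte_pos):
        """Translate a byte position into a character index.

        Raises ValueError if the position is before the filter's start
        or does not fall on a character boundary.
        """
        offset = byte_pos - self.base
        if offset < 0 or offset % self.width != 0:
            raise ValueError(
                "byte position %d is not on a character boundary" % byte_pos
            )
        return offset // self.width


# A filter pushed after 3 bytes had already been read:
f = FixedWidthSeekFilter(base=3)
print(f.char_to_byte(2))   # byte position of the third character
print(f.byte_to_char(11))  # character index at byte position 11
```

With base=3, byte position 8 is a multiple of 4 but is not a character boundary, so byte_to_char(8) raises. A more lenient filter could instead honour the byte seek as given and leave decoding from that point to the caller.
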
In the case of misalignment, I think it would be entirely reasonable to give the user exactly what they asked for (if possible); failing that, the filter can throw an exception. It also sounds like we want to be able to seek/tell both by character and by byte.

Cheers,
-J