Hi!

On Fri, May 30, 2008 at 10:38 AM, Ketil Malde <[EMAIL PROTECTED]> wrote:
> "Johan Tibell" <[EMAIL PROTECTED]> writes:
>> The intent of the not-yet-existing Unicode string is to represent
>> text not bytes.
>
> Right, so this will replace the .Char8 modules as well?  What confused
> me was my misunderstanding Duncan to mean that Unicode text would
> somehow imply shorter strings than non-Unicode (i.e. 8-bit) text.

Yes.

>> To give just one example, short (Unicode) strings are common as keys
>> in associative data structures like maps
>
> I guess typically, you'd break things down to words, so strings of
> length 4-10 or so.  BS uses three words and LBS four (IIRC), so the
> cost of sharing typically outweighs the benefit.

I'm not sure you would get much sharing in a map, as the keys will be
unique.

>> Can I also here insert a plea for keeping lazy I/O out of the new
>> Unicode module?
>
> I use ByteString.Lazy almost exclusively.  I realize that there's a
> penalty in time and space, but the ability to write applications that
> stream over multi-Gb files is essential.

Lazy I/O comes with a penalty in terms of correctness! Pretending that
I/O and the underlying resource allocations (e.g. file handles) aren't
observable is bad. Lazy I/O is kinda, maybe usable for small scripts
that read a file or two and spit out a result, but for servers it
doesn't work at all.

Lazy I/O requires unsafe* functions and is therefore unsafe. The
finalizers required can be arbitrarily complex depending on what kind of
resources need to be allocated. The simple case is a file handle, but
there's no reason we might not need sockets, locks, etc. to create the
lazy ByteString.

Here are two possible interfaces for safe I/O. One is a stream-based
one with an explicit close and the other a fold-based one (i.e.
inversion of control):

> import Prelude hiding (read)  -- the class below defines its own 'read'
> import qualified Data.ByteString as S
> import Data.Word (Word8)
>
> -- Stream based I/O.
> class InputStream s where
>     read  :: s -> IO Word8
>     readN :: s -> Int -> IO S.ByteString  -- efficient block reads
>     close :: s -> IO ()
>
> openBinaryFile :: InputStream s => FilePath -> IO s

or a left fold over the file's content. The 'foldBytes' function can
close the file at EOF.

> -- Left fold/callback based I/O.
> foldBytes :: FilePath -> (seed -> Word8 -> Either seed seed) -> seed
>           -> IO seed
> -- Efficient block reads.
> foldChunks :: FilePath -> (seed -> S.ByteString -> Either seed seed) -> seed
>            -> IO seed

On top of this you might want monadic versions of the above two
functions. The case for a Unicode type is analogous.

> Of course, these applications couldn't care less about Unicode, so
> perhaps the usage is different.

The issue of lazy I/O is orthogonal to ByteString vs Unicode(String).

Cheers,

Johan
_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe
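
P.S. To make the fold-based interface above a bit more concrete, here is
a minimal sketch of how 'foldChunks' and 'foldBytes' might be written on
top of strict ByteStrings. Several things here are assumptions rather
than part of the proposal: that 'Left' means "stop early with this
result" and 'Right' means "continue with this seed", the 32 KB chunk
size, and the 'countBytes' helper, which is purely illustrative. The
handle is closed by 'withBinaryFile' at EOF, on early exit, and on
exceptions.

```haskell
import qualified Data.ByteString as S
import Data.Word (Word8)
import System.IO (IOMode (ReadMode), withBinaryFile)

-- | Left fold over a file's contents in strict chunks.
-- Assumption: 'Left' stops the fold early, 'Right' continues.
foldChunks :: FilePath -> (seed -> S.ByteString -> Either seed seed) -> seed
           -> IO seed
foldChunks path step seed0 =
    -- withBinaryFile closes the handle when the fold finishes or throws.
    withBinaryFile path ReadMode (\h -> loop h seed0)
  where
    chunkSize = 32 * 1024  -- arbitrary choice for this sketch

    loop h seed = do
        chunk <- S.hGetSome h chunkSize
        if S.null chunk
            then return seed                 -- EOF
            else case step seed chunk of
                Left done  -> return done    -- early exit requested
                Right next -> next `seq` loop h next

-- | Byte-at-a-time fold, expressed in terms of 'foldChunks'.
foldBytes :: FilePath -> (seed -> Word8 -> Either seed seed) -> seed
          -> IO seed
foldBytes path step = foldChunks path stepChunk
  where
    stepChunk seed chunk = go seed (S.unpack chunk)

    go seed []       = Right seed
    go seed (w : ws) = case step seed w of
        Left done  -> Left done
        Right next -> next `seq` go next ws

-- | Hypothetical usage example: count the bytes in a file.
countBytes :: FilePath -> IO Int
countBytes path = foldChunks path (\n chunk -> Right (n + S.length chunk)) 0
```

Because the whole fold runs inside 'withBinaryFile', no chunk can be
demanded after the handle has been closed, which is the property lazy
I/O cannot guarantee.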