[Haskell-cafe] Fun with ByteStrings [was: A very edgy language]

Andrew Coppin Sun, 08 Jul 2007 04:07:44 -0700

Donald Bruce Stewart wrote:

andrewcoppin:
Does anybody have any clue why ByteStrings are actually faster? (And why this information isn't easily findable anywhere - must surely be a VFAQ.)
It's well documented in the API documentation for bytestrings.

Start here:
    http://www.cse.unsw.edu.au/~dons/fps/Data-ByteString-Lazy.html


I've read the API and still left wondering...

And then read:
    http://www.cse.unsw.edu.au/~dons/papers/CSL06.html


Now *that* is far more useful... (And interesting.)

So what you're saying is that whereas "fusion" is usually used to mean "that amazing technology that will one day supply us with unlimited amounts of cheap clean energy [shame it doesn't actually work]", in the context of Haskell it seems "fusion" means "that technology that makes all your code 98% faster for free"? ;-)

I guess the question that's really burning in my mind is "if ByteString is so much faster than [x], why can't you just do the same optimisations to [x]?" In other words, "why should I need to alter my code to get all this fusion goodness?"
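
To make "fusion" concrete, here is a minimal sketch of the kind of equivalence a fusion rule relies on: two maps in a row can be replaced by one map of the composed function, so no intermediate structure needs to be built. The names twoPasses and onePass are made up for illustration. Whether the compiler actually performs this rewrite depends on the library version and optimisation settings; the sketch only shows that the two forms give the same answer.

```haskell
import qualified Data.ByteString as B

-- Two traversals as written: the inner map conceptually produces an
-- intermediate ByteString that the outer map then consumes.
twoPasses :: B.ByteString -> B.ByteString
twoPasses = B.map (+ 1) . B.map (* 2)

-- One traversal: the form a map/map fusion rule aims to produce.
onePass :: B.ByteString -> B.ByteString
onePass = B.map ((+ 1) . (* 2))
```

The same identity holds for ordinary lists (map f . map g = map (f . g)), which is why the question of applying it to [x] is a reasonable one.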

Now, as I understand it, a ByteString is a kind of unboxed array (= big RAM savings + big CPU time savings for not building it + big GC savings for not processing millions of list nodes + better cache performance). Or at least, a *strict* ByteString is; I'm very very fuzzy on exactly how a *lazy* ByteString is any different to a normal list. From my reading today, I take it the only real difference is that one is a linked list, whereas the other is a (boxed?) array, so it's smaller. (?)
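
On the strict/lazy question, a small sketch may help. In the bytestring library, a lazy ByteString can be taken apart into (and rebuilt from) a list of strict ByteString chunks via toChunks and fromChunks, so it behaves like a lazy list whose elements are whole unboxed blocks of bytes rather than single values. The helper names chunkSizes and fromPieces are invented for this example.

```haskell
import qualified Data.ByteString as S
import qualified Data.ByteString.Lazy as L
import Data.Word (Word8)

-- View a lazy ByteString as its strict chunks and report each chunk's size.
chunkSizes :: L.ByteString -> [Int]
chunkSizes = map S.length . L.toChunks

-- Build a lazy ByteString from explicit pieces, one strict chunk per piece.
fromPieces :: [[Word8]] -> L.ByteString
fromPieces = L.fromChunks . map S.pack
```

For instance, fromPieces [[1,2,3],[4,5]] is a five-byte lazy ByteString held as two chunks; the per-element list overhead is paid once per chunk, not once per byte.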

At any rate, currently all my toy compression algorithms run at respectable speeds using [Word8], *except* for the BWT, which is absurdly slow. I've done everything I can think of to it, and it's still too slow. It's no use, I'm going to have to use ByteStrings to get any kind of performance out of it. I'm just wondering whether using getContents :: [Char] and then packing that into a ByteString is going to completely negate all the speed advantages. (I'm really not keen to completely mangle the entire toolbox just to make this one algorithm hurry up.)
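
The two ways of getting input into a ByteString can be sketched as below. This is not a measurement, only an illustration of the difference: the first route materialises a [Char] before packing, the second reads the bytes directly using the library's own getContents. The names viaString, direct, toWords and fromWords are hypothetical; the last two show one way a [Word8]-based toolbox could stay as it is while only the slow stage works on a ByteString.

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as C
import Data.Word (Word8)

-- Route 1: read stdin as a String, then pack it. Every character passes
-- through a list cell first. Char8.pack keeps only the low 8 bits of each Char.
viaString :: IO B.ByteString
viaString = fmap C.pack getContents

-- Route 2: read stdin straight into a strict ByteString, with no list in between.
direct :: IO B.ByteString
direct = B.getContents

-- Boundary adapters, so list-based stages can feed a ByteString-based stage.
toWords :: B.ByteString -> [Word8]
toWords = B.unpack

fromWords :: [Word8] -> B.ByteString
fromWords = B.pack
```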

Also, while I'm here... I can see a sorting algorithm implemented in Data.List, but I don't see any for other structures. For example, there doesn't appear to be any sorting functions for any kind of array. There doesn't appear to be anything for ByteString either. I'd like to see such a thing in the libraries. Yes, you can sort things using (say) Data.Map. But what if you have some data that's already *in* an array or a ByteString and you just want to sort it? (I also notice that the mutable arrays don't seem to provide an in-place map function. Obviously that's only ever going to work for a function that doesn't change the value's type, but even so...)
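
Both wishes can be sketched with existing pieces. The bytestring package as currently distributed exports a sort function for strict ByteStrings (whether the version in use at the time of writing had it is not something this message establishes). An in-place map over a mutable unboxed array can be written by hand with getBounds, readArray and writeArray; mapInPlace and incrementAll below are hypothetical helpers, not library functions.

```haskell
import Control.Monad (forM_)
import Control.Monad.ST (ST)
import Data.Array.ST (STUArray, getBounds, newListArray, readArray, runSTUArray, writeArray)
import Data.Array.Unboxed (elems)
import qualified Data.ByteString as B
import Data.Word (Word8)

-- Sort the bytes of a strict ByteString using the library's own sort.
sortBytes :: B.ByteString -> B.ByteString
sortBytes = B.sort

-- Hypothetical helper: overwrite every element with f applied to it.
-- The element type cannot change, since the array is updated in place.
mapInPlace :: (Word8 -> Word8) -> STUArray s Int Word8 -> ST s ()
mapInPlace f arr = do
  (lo, hi) <- getBounds arr
  forM_ [lo .. hi] $ \i -> do
    x <- readArray arr i
    writeArray arr i (f x)

-- Usage: copy a list into a mutable array, bump each byte, freeze the result.
incrementAll :: [Word8] -> [Word8]
incrementAll xs = elems (runSTUArray (do
  arr <- newListArray (0, length xs - 1) xs
  mapInPlace (+ 1) arr
  return arr))
```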

Finally, I don't see anything in the libraries that would efficiently sort (large) strings. Data.List.sort and Data.Map.Map both use an Ord context, and we have instance (Ord x) => Ord [x], but one would think that a [potentially large] list of values could be sorted more efficiently using a radix sort than a quicksort...? An implementation of Data.Map especially for the (common?) case of the keys being a *list* of Ord items would seem useful... (And in my current program, I'm probably going to implement a radix sort on lists of ByteStrings.)
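
For what it's worth, a radix sort on a list of ByteStrings could look roughly like the sketch below. It is a most-significant-byte-first variant, chosen here only for illustration and not taken from any library: keys are bucketed by the byte at position d, and each bucket is then sorted on position d + 1. The name radixSort and the use of an IntMap for the buckets are assumptions of this sketch, and no claim is made about how it compares in speed with Data.List.sort.

```haskell
import qualified Data.ByteString as B
import qualified Data.IntMap.Strict as IM
import Data.List (partition)

-- Hypothetical MSD radix sort over strict ByteStrings.
radixSort :: [B.ByteString] -> [B.ByteString]
radixSort = go 0
  where
    go :: Int -> [B.ByteString] -> [B.ByteString]
    go _ []  = []
    go _ [x] = [x]
    go d xs  =
      let -- Keys that end at position d share the whole prefix, so they are
          -- all equal and sort before every longer key in this group.
          (done, rest) = partition (\s -> B.length s <= d) xs
          -- Bucket the remaining keys by their byte at position d. Order
          -- inside a bucket does not matter; it is fixed by the recursion.
          buckets = IM.fromListWith (++)
                      [ (fromIntegral (B.index s d), [s]) | s <- rest ]
      in done ++ concatMap (go (d + 1)) (IM.elems buckets)
```

IM.elems returns the buckets in ascending key order, which is what makes the concatenation come out sorted.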


