Donald Bruce Stewart wrote:
andrewcoppin:
Does anybody have any clue why ByteStrings are actually faster? (And why this information isn't easily findable anywhere - must shorly be a VFAQ.)


It's well documented in the API documentation for bytestrings.

Start here:
    http://www.cse.unsw.edu.au/~dons/fps/Data-ByteString-Lazy.html

I've read the API and still left wondering...

And then read:
    http://www.cse.unsw.edu.au/~dons/papers/CSL06.html

Now *that* is far more useful... (And interesting.)

So what you're saying is that whereas "fusion" is usually used to mean "that amazing technology that will one day supply us with unlimited amounts of cheap clean energy [shame it doesn't actually work]", in the context of Haskell it seems "fusion" means "that technology that makes all your code 98% faster for free"? ;-)

I guess the question that's really burning in my mind is "if ByteString is so much faster than [x], why can't you just do the same optimisations to [x]?" In other words, "why should I need to alter my code to get all this fusion goodness?"

Now, as I understand it, a ByteString is a kind of unboxed array (= big RAM savings + big CPU time savings for not building it + big GC savings for not processing millions of list nodes + better cache performance). Or at least, a *strict* ByteString is; I'm very very fuzzy on exactly how a *lazy* ByteString is any different to a normal list. From my reading today, I take it the only real difference is that one is a linked list, whereas the other is a (boxed?) array, so it's smaller. (?)

At any rate, currently all my toy compression algorithms run at respectable speeds using [Word8], *except* for the BWT, which is absurdly slow. I've done everything I can think of to it, and it's still too slow. It's no use, I'm going to have to use ByteStrings to get any kind of performance out of it. I'm just wondering whether using getContents :: [Char] and then packing that into a ByteString is going to completely negate all the speed advantages. (I'm really not keen to completely mangle the entire toolbox just to make this one algorithm hurry up.)


Also, while I'm here... I can see a sorting algorithm implemented in Data.List, but I don't see any for other structures. For example, there doesn't appear to be any sorting functions for any kind of array. There doesn't appear to be anything for ByteString either. I'd like to see such a thing in the libraries. Yes, you can sort things using (say) Data.Map. But what if you have some data that's already *in* an array or a ByteString and you just want to sort it? (I also notice that the mutable arrays don't seem to provide an in-place map function. Obviously that's only ever going to work for a function that doesn't change the value's type, but even so...)

Finally, I don't see anything in the libraries that would efficiently sort (large) strings. Data.List.sort and Data.Map.Map both use an Ord context, and we have instance (Ord x) => Ord [x], but one would think that a [potentially large] list of values could be sorted more efficiently using a radix sort than a quicksort...? An implementation of Data.Map especially for the (common?) case of the keys being a *list* of Ord items would seem useful... (And in my current program, I'm probably going to implement a radix sort on lists of ByteStrings.)

_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Reply via email to