On Monday 05 December 2011, 14:14:56, John Sneer wrote:
> I've used Haskell and GHC to solve particular real life application. 4
> tools were developed and their function is almost the same - they
> modify textual input according to patterns found in the text. Thus, it

Hmm, modification can be a problem for ByteStrings, since it entails
copying. That could be worse for strict ByteStrings than for lazy ones,
if in the lazy ByteString you can reuse many chunks.

> is something like a compiler, the result is also a text and it is not
> parsed to tokens as patterns appear on a different level.
>
> The tools differ in tasks and number of modifications performed,
> otherwise, in principle, they are very much similar.
>
> I used lists (Prelude, Data.List) to develop the tools. After
> successfully completing the development, I've started to optimize the
> code to make the tools faster. After modification of some algorithms
> (which dropped the processing time notably), I started to change data
> structures. I swapped lists with lazy bytestrings. Nevertheless, what
> an unpleasant surprise, the processing speed dropped down,
> significantly (more than 30% time needed).

Two main possibilities:
1. your algorithm isn't suited for ByteStrings
2. you're doing it wrong

The above indicates 1., but without a more detailed description and/or
code, it's impossible to tell.

>
> So my questions follow:
> - What kind of application is lazy bytestring suitable for?

Anything that involves examining large sequences of bytes (or ASCII
[latin1/other single-byte encoding] text) basically sequentially (it's
not good if you have to jump forwards and backwards a lot and far).
Also some types of modification of such data.

> - Would it be worth using strict bytestring even if input files may be
> large? (They would fit in memory, but may consume whole)

Probably not, see above: modification entails copying, and that tends
to hurt strict ByteStrings more. But, also as above, it depends on what
exactly your algorithm does.

> - If bytestring is not suitable for text manipulation, is there
> something faster than lists?

The text package has already been mentioned, but again, there are types
of manipulation it's not well-suited for and where a linked list may be
superior.

> - It would be nice to have native sort for lazy bytestring - would it be
> slower than pack $ Data.List.sort $ unpack ?

The natural sort for ByteStrings would be a counting sort,
O(alphabet size + length), so for long ByteStrings, it should be
significantly faster than pack . sort . unpack, but for short ones, it
would be significantly slower.

> - If bytestring is suitable for text manipulation could we have some
> hGetTextualContents which translates Windows EOL (CR+LF) to LF?

Doing such a transformation would be kind of against the purpose of
ByteStrings, I think. Isn't the point of ByteStrings to get the raw
bytes as efficiently as possible?
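
For illustration, here is a minimal sketch of the counting sort mentioned
above, written for lazy ByteStrings. The name countingSort is made up for
this example and is not part of the bytestring package. As far as I know,
the strict Data.ByteString module already exports a sort of its own, while
the lazy module does not. The sketch counts how often each of the 256
possible byte values occurs and then emits each value that many times, so
the fixed cost of walking the 256 counters is what makes it unattractive
for short inputs.

```haskell
import qualified Data.ByteString      as B
import qualified Data.ByteString.Lazy as BL
import Data.Array.Unboxed (UArray, accumArray, assocs)
import Data.Word (Word8)

-- | Hypothetical helper (not a library function): sort the bytes of a
-- lazy ByteString with a counting sort, O(256 + length).
countingSort :: BL.ByteString -> BL.ByteString
countingSort bs =
    -- One strict chunk per byte value that actually occurs, in ascending order.
    BL.fromChunks [ B.replicate n w | (w, n) <- assocs counts, n > 0 ]
  where
    -- Histogram over all 256 byte values; the unboxed array keeps the
    -- counters strict while the input is consumed.
    counts :: UArray Word8 Int
    counts = accumArray (+) 0 (minBound, maxBound)
                        [ (w, 1) | w <- BL.unpack bs ]
```

Note that the result has at most 256 chunks, so for a very large input
dominated by a single byte value one chunk can become large. Whether this
beats pack . sort . unpack for your inputs is something only measuring
will tell.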