Re: [Haskell-cafe] Efficient string output
ketil:
> Hi,
>
> I'm currently working on a program that parses a large binary file and
> produces various textual outputs extracted from it. Simple enough.
>
> But: since we're talking large amounts of data, I'd like to have
> reasonable performance.
>
> Reading the binary file is very efficient thanks to Data.Binary.
> However, output is a different matter. Currently, my code looks
> something like:
>
>   summarize :: Foo -> ByteString
>   summarize f = let f1 = accessor f
>                     f2 = expression f
>                     :
>                 in B.concat [f1,pack "\t",pack (show f2),...]
>
> which isn't particularly elegant, and builds a temporary ByteString
> that usually only get passed to B.putStrLn. I can suffer the
> inelegance were it only fast - but this ends up taking the better part
> of the execution time.

Why not use Data.Binary for output too? It is rather efficient at
output -- using a continuation-like system to fill buffers gradually.

-- Don

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Efficient string output
Duncan Coutts writes:

> Have you considered using Data.Binary to output the data too? It has a
> pretty efficient underlying monoid for accumulating output data in a
> buffer. You'd want some wrapper functions over the top to make it a bit
> nicer for your use case, but it should work and should be quick.

I've used Data.Binary.Builder to generate the output, which is quite
nice as an interface. Currently, I've managed to shave a few percent
off the time - nothing radical yet, but there's a lot of room for
tuning various convenience functions in there.

-k
--
If I haven't seen further, it is by standing in the footprints of giants
Re: [Haskell-cafe] Efficient string output
On Mon, Feb 9, 2009 at 1:22 PM, Ketil Malde wrote:
> Johan Tibell writes:
>> If so, you might want to use `writev` to avoid extra copying.
>
> Is there a Haskell binding somewhere, or do I need to FFI the system
> call? Googling 'writev haskell' didn't turn up anything useful.

To my knowledge there's no binding out there. We will include one for
sockets in the next release of network-bytestring. You might find the
code here useful if you want to write your own:

http://github.com/tibbe/network-bytestring/blob/c13d8fab5179e6afbcdebac95d4993ac57f04689/Network/Socket/ByteString/Internal.hs

Cheers,
Johan
Re: [Haskell-cafe] Efficient string output
Bulat Ziganshin writes:

>> in B.concat [f1,pack "\t",pack (show f2),...]

> i'm not a BS expert but it seems that you produce Strings using show
> and then convert them to BS. of course this is inefficient - you need
> to replace show with BS analog

Do these analogous functions exist, or must I roll my own? I've also
looked a bit at Data.Binary.Builder; perhaps this is the way to go?
Will look more closely.

-k
--
If I haven't seen further, it is by standing in the footprints of giants
Re: [Haskell-cafe] Efficient string output
On Mon, 2009-02-09 at 12:49 +0100, Ketil Malde wrote:
> Hi,
>
> I'm currently working on a program that parses a large binary file and
> produces various textual outputs extracted from it. Simple enough.
>
> But: since we're talking large amounts of data, I'd like to have
> reasonable performance.
>
> Reading the binary file is very efficient thanks to Data.Binary.
> However, output is a different matter. Currently, my code looks
> something like:

Have you considered using Data.Binary to output the data too? It has a
pretty efficient underlying monoid for accumulating output data in a
buffer. You'd want some wrapper functions over the top to make it a bit
nicer for your use case, but it should work and should be quick.

It generates a lazy bytestring, but does so with a few large chunks so
the IO will still be quick.

Duncan
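A minimal sketch of the "wrapper functions over the monoid" idea, using Data.Binary.Builder from the binary package. The `Foo` record, its fields, and the `tab`/`newline` helpers are hypothetical stand-ins, since the thread never shows the real type. Note that the number still goes through `show` here, so this only removes the intermediate `B.concat` copy.

```haskell
import Data.Binary.Builder (Builder, fromByteString, singleton, toLazyByteString)
import qualified Data.ByteString.Char8 as B
import qualified Data.ByteString.Lazy.Char8 as L

-- Hypothetical record standing in for the thread's unspecified Foo.
data Foo = Foo { accessor :: B.ByteString, expression :: Int }

-- Small wrappers so call sites stay readable.
tab, newline :: Builder
tab     = singleton 9   -- ASCII tab
newline = singleton 10  -- ASCII line feed

-- Return a Builder instead of a finished ByteString, so that many
-- rows can be appended into the same output buffers.
summarize :: Foo -> Builder
summarize f = mconcat
  [ fromByteString (accessor f)
  , tab
  , fromByteString (B.pack (show (expression f)))  -- still goes via String
  , newline
  ]

main :: IO ()
main = L.putStr (toLazyByteString (mconcat (map summarize foos)))
  where
    foos = [Foo (B.pack "alpha") 1, Foo (B.pack "beta") 22]
```

Returning a `Builder` rather than a `ByteString` from `summarize` is the design choice that matters: the conversion to a lazy ByteString happens once, at the point of output.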
Re: [Haskell-cafe] Efficient string output
+1; it's obviously the packing that causes sloth. Memoize the
`pack "\t"` etc. stuff, and write bytestring replacements for show for
your data.

I guess you can use the Put monad instead of B.concat for that, by the
way.

2009/2/9 Bulat Ziganshin:
> Hello Ketil,
>
> Monday, February 9, 2009, 2:49:05 PM, you wrote:
>
>> in B.concat [f1,pack "\t",pack (show f2),...]
>
>> inelegance were it only fast - but this ends up taking the better part
>> of the execution time.
>
> i'm not a BS expert but it seems that you produce Strings using show
> and then convert them to BS. of course this is inefficient - you need
> to replace show with BS analog
>
> --
> Best regards,
> Bulat mailto:bulat.zigans...@gmail.com
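As a rough illustration of the two suggestions above (pack the separator once, and use the Put monad in place of B.concat), here is a sketch using Data.Binary.Put. The `putRow` helper and the sample rows are assumptions made for the example, not code from the thread.

```haskell
import Data.Binary.Put (Put, putByteString, putWord8, runPut)
import qualified Data.ByteString.Char8 as B
import qualified Data.ByteString.Lazy.Char8 as L
import Data.List (intersperse)

-- Packed once as a top-level constant rather than at every use site.
tab :: B.ByteString
tab = B.pack "\t"

-- Hypothetical helper: write the fields separated by tabs, then a newline.
putRow :: [B.ByteString] -> Put
putRow fields = do
  sequence_ (intersperse (putByteString tab) (map putByteString fields))
  putWord8 10  -- ASCII line feed

main :: IO ()
main = L.putStr (runPut (mapM_ putRow rows))
  where
    rows = [map B.pack ["a", "1"], map B.pack ["b", "2"]]
```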
Re: [Haskell-cafe] Efficient string output
Johan Tibell writes:

> Is building the strict ByteString what takes the most time?

Yes.

> If so, you might want to use `writev` to avoid extra copying.

Is there a Haskell binding somewhere, or do I need to FFI the system
call? Googling 'writev haskell' didn't turn up anything useful.

> Does your data support incremental processing so that you could
> produce output before all input has been parsed?

Typically, yes.

-k
--
If I haven't seen further, it is by standing in the footprints of giants
Re: [Haskell-cafe] Efficient string output
Hello Ketil,

Monday, February 9, 2009, 2:49:05 PM, you wrote:

> in B.concat [f1,pack "\t",pack (show f2),...]

> inelegance were it only fast - but this ends up taking the better part
> of the execution time.

i'm not a BS expert but it seems that you produce Strings using show
and then convert them to BS. of course this is inefficient - you need
to replace show with BS analog

--
Best regards,
Bulat mailto:bulat.zigans...@gmail.com
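One way to avoid the `show`-then-`pack` round trip is sketched below. It assumes a newer bytestring package than existed when this thread was written: Data.ByteString.Builder provides `intDec`, which writes the decimal digits of an `Int` straight into the output buffer. The `Foo` record is again a hypothetical stand-in.

```haskell
import qualified Data.ByteString.Builder as BB
import qualified Data.ByteString.Char8 as B
import System.IO (stdout)

-- Hypothetical record standing in for the thread's unspecified Foo.
data Foo = Foo { accessor :: B.ByteString, expression :: Int }

-- No String is built: intDec encodes the number directly.
summarize :: Foo -> BB.Builder
summarize f = mconcat
  [ BB.byteString (accessor f)
  , BB.char7 '\t'
  , BB.intDec (expression f)
  , BB.char7 '\n'
  ]

main :: IO ()
main = BB.hPutBuilder stdout (summarize (Foo (B.pack "alpha") 42))
```

For field types without a ready-made encoder, a hand-written function returning a `Builder` would play the same role as a ByteString analog of `show`.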
Re: [Haskell-cafe] Efficient string output
On Mon, Feb 9, 2009 at 12:49 PM, Ketil Malde wrote:
> Reading the binary file is very efficient thanks to Data.Binary.
> However, output is a different matter. Currently, my code looks
> something like:
>
>   summarize :: Foo -> ByteString
>   summarize f = let f1 = accessor f
>                     f2 = expression f
>                     :
>                 in B.concat [f1,pack "\t",pack (show f2),...]
>
> which isn't particularly elegant, and builds a temporary ByteString
> that usually only get passed to B.putStrLn. I can suffer the
> inelegance were it only fast - but this ends up taking the better part
> of the execution time.

Is building the strict ByteString what takes the most time? If so, you
might want to use `writev` to avoid extra copying.

Does your data support incremental processing so that you could
produce output before all input has been parsed?

Cheers,
Johan
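The incremental-processing question can be illustrated with a sketch that emits each record as soon as it is available instead of assembling the whole output first. It does not use `writev`; it relies on `hPutBuilder` from the newer Data.ByteString.Builder module, which fills the handle's own buffer. The `records` list and `render` function are assumptions standing in for lazily parsed input.

```haskell
import qualified Data.ByteString.Builder as BB
import System.IO (BufferMode (BlockBuffering), hSetBuffering, stdout)

-- Hypothetical stand-in for records produced lazily by a parser.
records :: [(String, Int)]
records = [("rec" ++ show i, i * i) | i <- [1 .. 5]]

-- Render one record as a tab-separated line.
render :: (String, Int) -> BB.Builder
render (name, n) =
  mconcat [BB.string7 name, BB.char7 '\t', BB.intDec n, BB.char7 '\n']

main :: IO ()
main = do
  -- Block buffering lets many small rows share one write to the handle.
  hSetBuffering stdout (BlockBuffering Nothing)
  -- Each record is written as it is consumed; no complete output
  -- ByteString is ever held in memory.
  mapM_ (BB.hPutBuilder stdout . render) records
```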