Re: [Haskell-cafe] Attoparsec concatenating combinator
2011/6/7 Bryan O'Sullivan : > On Tue, Jun 7, 2011 at 1:40 AM, Simon Meier wrote: >> >> Why would you need 'unsafePerformIO'. You can scrutinise the 'PS' >> constructors of the slice without dropping down to IO. > > True. Oops :-) > >> >> Using a Builder for concatentation makes sense, if you want to exploit >> that copying a slice of the input array is cheaper right after it has >> been inspected (its fully cached) than later (as it is done when >> collecting slices in a list). > > When I've measured this in the past, I've found that it's often faster to > accumulate a list and then run concat at the end than to use blaze-builder > directly. That was certainly the case wit GHC 6.12; I haven't remeasured > with 7.0. That's why you'll see that some places in the aeson JSON library > use blaze-builder, while others manipulate bytestrings directly. When creating a Builder that you run afterwards, then you essentially create a list of bytestring concatenations as a closure. It makes sense that you don't win with such an approach, as the concatenation still only happens after then end of this list is reached. What you'd need is to nest attoparsec's Parser monad with the 'Put' monad provided in the blaze-builder internals [1]. However, I'm not yet sure how to achieve this in a modular fashion. Perhaps, using iteratee's the right way might provide a good answer, but that's just guesswork. The performance characteristics of blaze-builder are such that for short output sequences working with bytestrings directly is sometimes favorable, as the setup overhead before the Builder can be executed needs to be amortized. However, Builders work with a single pass through the input data, while many bytestring operations require two passes. For example, 'Data.ByteString.pack' first determines the length of the input and only afterwards writes the data to memory. Using blaze-builder's 'fromWord8s' requires only a single traversal and is faster for lists of length >64 on my 32-bit Core 2 Duo machine. I expect a similar effect for concatenating lists of strict bytestrings. [1] http://hackage.haskell.org/packages/archive/blaze-builder/0.3.0.1/doc/html/Blaze-ByteString-Builder-Internal.html#g:3 ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Attoparsec concatenating combinator
On Tue, Jun 7, 2011 at 1:40 AM, Simon Meier wrote: > Why would you need 'unsafePerformIO'. You can scrutinise the 'PS' > constructors of the slice without dropping down to IO. True. Oops :-) > Using a Builder for concatentation makes sense, if you want to exploit > that copying a slice of the input array is cheaper right after it has > been inspected (its fully cached) than later (as it is done when > collecting slices in a list). When I've measured this in the past, I've found that it's often faster to accumulate a list and then run concat at the end than to use blaze-builder directly. That was certainly the case wit GHC 6.12; I haven't remeasured with 7.0. That's why you'll see that some places in the aeson JSON library use blaze-builder, while others manipulate bytestrings directly. ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Attoparsec concatenating combinator
Bryan O'Sullivan wrote: >> Now that I think of it: in principle, you could >> write a specialised concat that would check the pointer/offset/length >> combinations of its arguments and, if they all abutted perfectly, would just >> return a new view into that same array, sans copying. Gregory Collins wrote: >>> The blaze-builder might work for this also, this is exactly the >>> problem it's designed for. Simon Meier wrote: > Using a Builder for concatentation makes sense... > However, ...some low-level meddling is probably required... > I'm inclined look into that... These are great ideas for further exploiting the bytestring/text specialization of this parser library for super speed. Thanks! Regards, Yitz ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Attoparsec concatenating combinator
2011/6/6 Bryan O'Sullivan : > On Sun, Jun 5, 2011 at 11:00 AM, Yitzchak Gale wrote: >> >> If behind the scenes the concat is copying directly from slices of the >> original >> input, then no, in principle we're not saving much then. >> I thought there were *two* copies going on. > > If you're using the specialised functions like attoparsec's takeWhile, then > all they do is return a view into the underlying array. No copying occurs > until the concat itself. Now that I think of it: in principle, you could > write a specialised concat that would check the pointer/offset/length > combinations of its arguments and, if they all abutted perfectly, would just > return a new view into that same array, sans copying. (You'd have to hide it > behind unsafePerformIO, of course.) Why would you need 'unsafePerformIO'. You can scrutinise the 'PS' constructors of the slice without dropping down to IO. The required 'Eq' instance on 'ForeignPtr' is also present. Using a Builder for concatentation makes sense, if you want to exploit that copying a slice of the input array is cheaper right after it has been inspected (its fully cached) than later (as it is done when collecting slices in a list). However, you can only have one Builder at a time and some low-level meddling is probably required to interleave the feeding of the Parser with input arrays with the feeding of the Builder with free buffers. Nevertheless, for something like parsing Chunked HTTP content it would make a lot of sense. I'm inclined look into that once I finished porting the blaze Builder to the 'bytestring' library. ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Attoparsec concatenating combinator
On Sun, Jun 5, 2011 at 11:00 AM, Yitzchak Gale wrote: > If behind the scenes the concat is copying directly from slices of the > original > input, then no, in principle we're not saving much then. > I thought there were *two* copies going on. > If you're using the specialised functions like attoparsec's takeWhile, then all they do is return a view into the underlying array. No copying occurs until the concat itself. Now that I think of it: in principle, you could write a specialised concat that would check the pointer/offset/length combinations of its arguments and, if they all abutted perfectly, would just return a new view into that same array, sans copying. (You'd have to hide it behind unsafePerformIO, of course.) ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Attoparsec concatenating combinator
I wrote: >> I was thinking of even lower level: allocating a moderate chunk of >> memory and writing the results directly into it consecutively as a >> special case. Bryan O'Sullivan wrote: > Surely that would save only one copy compared to creating a list of results > and then concatenating them, no? I'd be a little surprised if it proved > worthwhile. If behind the scenes the concat is copying directly from slices of the original input, then no, in principle we're not saving much then. I thought there were *two* copies going on. It might be possible to keep the byte count only in the special case of a concatenating combinator, but that would require some work to implement. Thanks as usual for the fantastic work, Yitz ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Attoparsec concatenating combinator
On Fri, Jun 3, 2011 at 2:52 AM, Yitzchak Gale wrote: > I was thinking of even lower level: allocating a moderate chunk of > memory and writing the results directly into it consecutively as a > special case. > Surely that would save only one copy compared to creating a list of results and then concatenating them, no? I'd be a little surprised if it proved worthwhile. ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Attoparsec concatenating combinator
On Fri, Jun 3, 2011 at 11:52 AM, Yitzchak Gale wrote: > Bryan O'Sullivan wrote: >> I'd like a no-copy combinator for the same reasons, but I think it's >> impossible to do without some low-level support. > > I wrote: >>> ...does the internal representation easily admit such a combinator? > >> Not very easily. Internally, attoparsec maintains just three pieces of data >> for its state... If >> there was a "bytes consumed" counter, it would be possible to write a >> "try"-like combinator > > I was thinking of even lower level: allocating a moderate chunk of > memory and writing the results directly into it consecutively as a > special case. The blaze-builder might work for this also, this is exactly the problem it's designed for. G -- Gregory Collins ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Attoparsec concatenating combinator
On 11-06-03 06:00 AM, Yitzchak Gale wrote: Mario Blažević wrote: I don't know if this helps, but the incremental-parser library has exactly the combinator you're looking for. Wow, that is a beautiful implementation of a general parser library. So much simpler than Parsec. Thanks for pointing it out. Thanks. I guess I should get to work fixing its deficiencies then. Why are you hiding those nice Monoid classes in the parser package? Shouldn't it be a separate package? I considered it, and I'll do it if there's interest, but for the first release I decided to keep them close to where they're needed. There was less work that way. I'd hate to release a standalone package with only half the instance implementations. Edward Kmett has also been adding some nice Monoid abstractions lately. I haven't been following all of it. I wonder how yours and his relate. If you mean semigroups, they are related but only through Monoid. Semigroup is a superclass of Monoid, whereas CancellativeMonoid and FunctorialMonoid are its subclasses. CancellativeMonoid is lying between a Monoid and Group in power. It would make more sense to merge my classes into some group package, though I don't see any obvious candidate on Hackage right now. ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Attoparsec concatenating combinator
Mario Blažević wrote: > I don't know if this helps, but the incremental-parser library has > exactly the combinator you're looking for. Wow, that is a beautiful implementation of a general parser library. So much simpler than Parsec. Thanks for pointing it out. Why are you hiding those nice Monoid classes in the parser package? Shouldn't it be a separate package? Edward Kmett has also been adding some nice Monoid abstractions lately. I haven't been following all of it. I wonder how yours and his relate. Thanks, Yitz ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Attoparsec concatenating combinator
Bryan O'Sullivan wrote: > I'd like a no-copy combinator for the same reasons, but I think it's > impossible to do without some low-level support. I wrote: >> ...does the internal representation easily admit such a combinator? > Not very easily. Internally, attoparsec maintains just three pieces of data > for its state... If > there was a "bytes consumed" counter, it would be possible to write a > "try"-like combinator I was thinking of even lower level: allocating a moderate chunk of memory and writing the results directly into it consecutively as a special case. I think Data.ByteString.Internal.create might do the trick. In fact, some of the existing basic attoparsec combinators, like takeWhile, could use that kind of treatment. The question is whether you want to dip that low into the ByteString implementation. Part of the problem is that there doesn't seem to be any way to allocate contiguous memory in GHC and then release only part of it. So even ByteString itself is doing extra copying. That is another reason why I think there may be some serious performance gains to be had by exposing those internals in attoparsec. [Duncan: Did you notice that the Haddocks for Data.ByteString.Internals and a few others haven't been building lately?] Thanks, Yitz ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Attoparsec concatenating combinator
On Thu, Jun 2, 2011 at 10:02 AM, Yitzchak Gale wrote: > I often find while using attoparsec and attoparsec-text that I need to > match a number of text parsers consecutively and concatenate the > result. By "text parser" I mean "Parser ByteString" for attoparsec and > "Parser Text" for attoparsec-text. I don't know if this helps, but the incremental-parser library has exactly the combinator you're looking for. ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Attoparsec concatenating combinator
On Thu, Jun 2, 2011 at 7:02 AM, Yitzchak Gale wrote: > It seems the best I can do is to collect them all in a list and then > apply concat. But that still copies the text several times. > Right. I'd like a no-copy combinator for the same reasons, but I think it's impossible to do without some low-level support. > If not, does the internal representation easily admit such a combinator? > Not very easily. Internally, attoparsec maintains just three pieces of data for its state: - The current input - Any input we received via continuations, in case we need to backtrack - A flag that denotes whether we were fed EOF by a continuation There are no line numbers, no "bytes consumed" counters, nothing else. If there was a "bytes consumed" counter, it would be possible to write a "try"-like combinator that would hold onto the current input, run a parser, tack on any input received via continuations to the original input, and then use the counter to slice off a portion of that bytestring without copying. I can't think of another way to do it. Adding that counter would be a moderate amount of work, and would presumably have a negative effect on performance. ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
[Haskell-cafe] Attoparsec concatenating combinator
I often find while using attoparsec and attoparsec-text that I need to match a number of text parsers consecutively and concatenate the result. By "text parser" I mean "Parser ByteString" for attoparsec and "Parser Text" for attoparsec-text. It seems the best I can do is to collect them all in a list and then apply concat. But that still copies the text several times. Is there a combinator that does this without all that copying? If not, does the internal representation easily admit such a combinator? Thanks, Yitz ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe