On Wed, 08 Feb 2012 20:49:48 -0600, "Robert Jacques" <sandf...@jhu.edu> wrote:
> On Wed, 08 Feb 2012 02:12:57 -0600, Johannes Pfau <nos...@example.com>
> wrote:
>> On Tue, 07 Feb 2012 20:44:08 -0500, "Jonathan M Davis"
>> <jmdavisp...@gmx.com> wrote:
>>> On Tuesday, February 07, 2012 00:56:40 Adam D. Ruppe wrote:
>>>> On Monday, 6 February 2012 at 23:47:08 UTC, Jonathan M Davis
[snip]
>>
>> Using ranges of dchar directly can be horribly inefficient in some
>> cases; you'll need at least some kind of buffered dchar range. Some
>> std.json replacement code tried to use only dchar ranges and had to
>> reassemble strings character by character using Appender. That sucks,
>> especially if you're only interested in a small part of the data and
>> don't care about the rest.
>>
>> So for pull/SAX parsers: use buffering and return strings (better:
>> w/d/char[]) as slices into that buffer. If the user needs to keep a
>> string, he can still copy it. (String decoding should also be done
>> on-demand only.)
>
> Speaking as the one proposing said Json replacement, I'd like to point
> out that JSON strings != UTF strings: manual conversion is required
> some of the time. And I use appender as a dynamic buffer in exactly
> the manner you suggest. There's even an option to use a string cache
> to minimize total memory usage. (Hmm... that functionality should
> probably be re-factored out and made into its own utility.) That said,
> I do end up doing a bunch of useless encodes and decodes, so I'm going
> to special-case those away and add slicing support for strings.
> wstrings and dstrings will still need to be converted, as currently
> Json values only accept strings and therefore Json tokens also only
> support strings. As a potential user of the sax/pull interface, would
> you prefer the extra clutter of special side channels for zero-copy
> wstrings and dstrings?

Regarding wstrings and dstrings: well, JSON seems to be UTF-8 in almost
all cases, so it's not that important.
But I think it should be possible to use templates to implement
identical parsers for d/w/strings.

Regarding the use of Appender: long text ahead ;-)

I think pull parsers should really be as fast as possible and
low-level. For easy-to-use high-level stuff there's always DOM, and a
safe, high-level serialization API should be implemented on top of the
PullParser as well. The serialization API would read only the requested
data, skipping the rest:

----------------
struct Data
{
    string link;
}

auto data = unserialize!Data(json);
----------------

So in the PullParser we should avoid memory allocation whenever
possible; I think we can even avoid it completely. dchar ranges are
just the wrong input type for parsers: parsers should use buffered
ranges or streams (which would be basically the same). We could then
use a generic BufferedRange with real dchar ranges. This BufferedRange
could use a static buffer, so there's no need to allocate anything.

The pull parser should return slices into the original string (if the
input is a string) or slices into the Range/Stream's buffer. Of course,
such a slice is only valid until the pull parser is advanced again. The
slice also wouldn't be decoded yet. And a sliced string could only be
as long as the buffer, but I don't think this is an issue: a 512KB
buffer can already store 524288 characters. If the user wants to keep a
string, he should really do decodeJSONString(data).idup.

There's a little more opportunity for optimization: as long as a
decoded JSON string is always smaller than the encoded one (I don't
know if that always holds), we could have a decodeJSONString function
which overwrites the original buffer --> no memory allocation. If
that's not the case, decodeJSONString has to allocate iff the decoded
string differs from the original. So we need one function which always
returns the decoded string as a safe-to-keep copy, and one which
returns the decoded string as a slice whenever the decoded string is
the same as the original.
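To make the idea concrete, here's a minimal sketch of the in-place
variant (the name tempDecodeJSON and the simplification to the short
escapes are my assumptions; \uXXXX handling is omitted). Since every
escape sequence is at least two characters but decodes to fewer, the
output always fits in the input buffer:

----------------
/// Hypothetical sketch: unescape a JSON string in place and return a
/// slice of `buf`. Decoding only ever shrinks the data ("\n" is two
/// chars in, one char out), so no allocation is needed. Only the
/// single-character escapes are handled here; \uXXXX is omitted.
char[] tempDecodeJSON(char[] buf)
{
    size_t w = 0;
    for (size_t r = 0; r < buf.length; ++r, ++w)
    {
        if (buf[r] == '\\' && r + 1 < buf.length)
        {
            ++r;
            switch (buf[r])
            {
                case 'n':  buf[w] = '\n'; break;
                case 't':  buf[w] = '\t'; break;
                case '"':  buf[w] = '"';  break;
                case '\\': buf[w] = '\\'; break;
                default:   buf[w] = buf[r]; break;
            }
        }
        else
            buf[w] = buf[r];
    }
    return buf[0 .. w]; // slice of the caller's buffer, not a copy
}

unittest
{
    char[] s = "line\\nbreak".dup;
    assert(tempDecodeJSON(s) == "line\nbreak");
}
----------------

The safe-to-keep variant would just be a wrapper that returns the input
unchanged when no escapes are present and otherwise allocates the
decoded copy.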
An example:

----------------
string json = `{
    "link":"http://www.google.com",
    "useless_data":"lorem ipsum",
    "more":{
        "not interested":"yes"
    }
}`;
----------------

Now I'm only interested in the link. It should be possible to parse
that with zero memory allocations:

----------------
auto parser = Parser(json);
parser.popFront();
while (!parser.empty)
{
    if (parser.front.type == KEY
        && tempDecodeJSON(parser.front.value) == "link")
    {
        parser.popFront();
        assert(!parser.empty && parser.front.type == VALUE);
        return decodeJSON(parser.front.value); // Should return a slice
    }
    // Skip everything else
    parser.popFront();
}
----------------

tempDecodeJSON returns a decoded string which (usually) isn't safe to
store: it can/should be a slice into the internal buffer. Here it's a
slice into the original string, so it could be stored, but there's no
guarantee. In this case the call to tempDecodeJSON could even be left
out, as we only search for "link", which doesn't contain any escapes.
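For completeness, the BufferedRange mentioned above could look roughly
like this (the name, the layout, and the refill policy are all my
assumptions; this is a sketch of the idea, not a proposed API):

----------------
/// Hypothetical sketch of a buffered wrapper over a dchar range: it
/// fills a fixed buffer and hands out slices into it. A slice is only
/// valid until the next refill, mirroring the pull parser's "valid
/// until the next popFront" rule.
struct BufferedRange(R)
{
    R input;
    char[512 * 1024] buffer; // fixed buffer, no GC allocation
    size_t filled;

    /// Refill the buffer; invalidates all previously returned slices.
    void refill()
    {
        import std.utf : encode;
        filled = 0;
        char[4] tmp;
        while (!input.empty && filled + 4 <= buffer.length)
        {
            auto n = encode(tmp, input.front);
            buffer[filled .. filled + n] = tmp[0 .. n];
            filled += n;
            input.popFront();
        }
    }

    /// Slice into the internal buffer -- do not store it!
    char[] data() { return buffer[0 .. filled]; }
}
----------------

The parser would then slice `data()` directly, and the "valid until the
next popFront" contract falls out of the refill semantics for free.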