On 3/22/2010 14:34, Jason Dagit wrote:


On Mon, Mar 22, 2010 at 11:02 AM, Max Battcher <[email protected]
<mailto:[email protected]>> wrote:

    Long term I'd like a pony, but more importantly for darcs patches to
    be in some easy to parse markup format like JSON, perhaps.


What is the concern you're addressing?  What if we made the darcs patch
parser callable from C?  Would that be good enough?  Why do you want the
patches to be easy to parse in a markup format?

Like I said, it is a wish for a pony. I don't have any specific need in mind, but "wouldn't it be nice if...". Given a long term format change, I would always prefer something standardized and well known over something proprietary and possibly prone to break in unexpected ways. Certainly if a standard markup format were already in use by darcs we wouldn't have as many problems adding metadata or changing patch formats to meet the needs of today.

So, be it JSON or YAML or Binary XML or Google Protocol Buffers or something else I haven't considered, it doesn't really matter: the intent is that it should be something usefully extensible with known efficient parsers and known operating requirements. Which is to say that I appreciate your arguments for efficiency, Jason, but precisely because of those arguments I've come to strongly appreciate well known parsers over hand-built ones, because I know the "operating efficiencies"...

(As in, I know the relative strengths and weaknesses of the various XML parsers at my disposal in Python or C#. I know which ones call C backing libraries, and I know which ones I'd pick for ease of use and which ones for power and which ones for optimal speed/memory. I can choose one to use based on the requirements of the current project. Same for YAML or JSON... But each and every "special" or "proprietary" parser brings its own learning curve.)

    When Ignore-this was first implemented the medium term solution of
    using a full RFC822 email-like header was broached. Of course,
    RFC822 is full of loopholes and surprisingly hard to parse in
    reality, but the obvious point that Ignore-this: xxx does indeed
    look like an email header still stands. (I'd like to remain on the
    record that I'd still prefer a better name like "Patch conflict
    avoidance hash" than Ignore-this, by the way.)


Yeah.  I think that's fair.  Are there no parsers for RFC822 on
Hackage?  I see this:
http://hackage.haskell.org/packages/archive/mime/0.3.2/doc/html/Codec-MIME-Parse.html

Does that provide the type of parser you're looking for?

RFC822 is an ugly standard to parse: headers end at the first empty line, except in the case when a malformed gateway adds extra spaces everywhere, in which case it might be any invisible line that "seems correct"... RFC822 is still a better standard than the current lack of a standard for Ignore-this headers, but not by much.
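To make the "headers end at the first empty line" rule concrete, here is a minimal sketch using the header parser from Python's standard library. The sample input and the function name parse_patch_headers are made up for illustration; this is not how darcs reads Ignore-this today, only what an email-like header block would look like to an off-the-shelf parser.

```python
from email.parser import HeaderParser


def parse_patch_headers(text):
    """Split an email-like block into (headers, body).

    Header parsing stops at the first empty line; everything after
    that line is returned untouched as the body.
    """
    msg = HeaderParser().parsestr(text)
    return dict(msg.items()), msg.get_payload()


# Hypothetical patch text: one header, a blank line, then the body.
sample = (
    "Ignore-this: 1a2b3c\n"
    "\n"
    "patch body starts here\n"
)

headers, body = parse_patch_headers(sample)
print(headers["Ignore-this"])
print(body, end="")
```

The parser choice is the point here: the awkward cases (folded lines, stray whitespace) become the library's problem rather than ours.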

    I've been thinking on this some, and I think I have a reasonable
    suggestion that is easier to parse than RFC822, but carries a
    similar effect: YAML formatted darcs comments.

The YAML snippets seem pretty reasonable as long as they don't require
the parser to hit an ending tag while parsing the patches themselves
(seems reasonable for a short-ish section of headers though).

YAML was designed for streaming, definitely. In particular, even the most inefficient parser should respect the explicit end of document marker (...) and not need to parse past it before returning results. All of the YAML parsers I've seen are generally much more efficient than that, of course, and I think the YAML specs make it relatively clear how self-contained and easy to parse all of the markup is.
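A rough sketch of what "stop at the end of document marker" means in practice. This is not a YAML parser: it only shows that a reader can hand back the header text without consuming anything past the "..." line, and a real YAML library would then parse that text. The function name and the sample patch content are assumptions for illustration, and the sketch only recognises a bare "..." line (not one followed by a comment).

```python
import io


def read_yaml_header(stream):
    """Collect lines up to, but not including, a bare '...' line.

    readline() is used instead of iterating over the stream so that
    the stream position stays exactly after the marker line.
    """
    lines = []
    for line in iter(stream.readline, ""):
        if line.rstrip("\r\n") == "...":
            break
        lines.append(line)
    return "".join(lines)


# Hypothetical patch: a YAML header document followed by patch data.
stream = io.StringIO(
    "---\n"
    "ignore-this: 1a2b3c\n"
    "...\n"
    "hunk ./file 1\n"
)

header_text = read_yaml_header(stream)
rest = stream.read()  # the patch data is still unread at this point
```

After the call, header_text holds the two header lines and rest holds the untouched patch data, so the header reader never needs to look at the patch contents.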

For the
patches I really think we want a format that is more amenable to
streaming or seeking.  You could imagine it having a "table of contents"
section with offsets that can be seek'd to.  I guess strictly speaking
that is doable in an XML schema, but perhaps uncommon.

Seeking probably would be a good property to include on the list of features to prefer when searching for a new long term patch format.
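For the sake of discussion, the "table of contents with offsets" idea could be sketched like this. The container layout (an 8-byte length, a JSON table of contents, then the patch bodies) is entirely hypothetical and only meant to show the seek, not to propose an actual format.

```python
import io
import json
import struct


def write_bundle(out, patches):
    """Write [8-byte TOC length][JSON TOC][patch bodies] to out."""
    toc, offset = {}, 0
    for name, data in patches.items():
        toc[name] = (offset, len(data))  # offset is relative to the body area
        offset += len(data)
    toc_bytes = json.dumps(toc).encode("utf-8")
    out.write(struct.pack(">Q", len(toc_bytes)))
    out.write(toc_bytes)
    for data in patches.values():
        out.write(data)


def read_patch(f, name):
    """Read one patch by name, seeking straight to its offset."""
    f.seek(0)
    (toc_len,) = struct.unpack(">Q", f.read(8))
    toc = json.loads(f.read(toc_len))
    offset, length = toc[name]
    f.seek(8 + toc_len + offset)
    return f.read(length)


bundle = io.BytesIO()
write_bundle(bundle, {"a": b"hunk one\n", "b": b"hunk two\n"})
second = read_patch(bundle, "b")
```

Only the table of contents and the requested patch are read; the other patch bodies are skipped by the seek.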

--
--Max Battcher--
http://worldmaker.net
_______________________________________________
darcs-users mailing list
[email protected]
http://lists.osuosl.org/mailman/listinfo/darcs-users
