On Mon, Mar 22, 2010 at 11:02 AM, Max Battcher <[email protected]> wrote:

> Eric Kow wrote:
>
>> Long term [Darcs 3]
>> -------------------
>> A new patch format in general may be interesting for the long term.
>> http://bugs.darcs.net/patch1096 appears to be a step in that direction.
>>
>
> Long term I'd like a pony, but more importantly for darcs patches to be in
> some easy to parse markup format like JSON, perhaps.


What is the concern you're addressing?  What if we made the darcs patch
parser callable from C?  Would that be good enough?  Why do you want the
patches to be easy to parse in a markup format?

One very real danger with using, say, XML for patches is that many parsers
don't let you know if you've got a correct parse until you reach the ending
tag.  That means it's easy to end up in a situation where a non-specialized
parser holds the whole input in memory just to get you some little bit of
data from near the end.  I'm not sure if it makes life worse when you're
doing a linear pass and extracting the contents, but I suspect it would make
our life difficult with the current way of doing things.  Meaning, if a
patch had a large hunk in the middle, XML may make life really difficult
and we'd need to put a reference to the hunk in the XML and store the hunk
elsewhere.  It would probably only be safe to parse large patch bundles
using a SAX style parser (my hunch).
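
To make the SAX point concrete, here is a minimal Python sketch of a streaming pass over an XML patch bundle.  The element names (bundle, patch, hunk) and the name attribute are invented for illustration; darcs has no such XML format.  The handler records the size of each hunk as the text streams past, so a large hunk in the middle never has to sit in memory as a whole.

```python
import io
import xml.sax


class HunkSizeHandler(xml.sax.ContentHandler):
    """Record the size of every hunk without keeping the hunk text."""

    def __init__(self):
        super().__init__()
        self.sizes = []        # one (patch name, hunk size in characters) per hunk
        self._patch = None     # name of the patch currently being read
        self._in_hunk = False
        self._count = 0

    def startElement(self, name, attrs):
        if name == "patch":
            self._patch = attrs.get("name")
        elif name == "hunk":
            self._in_hunk = True
            self._count = 0

    def characters(self, content):
        # The parser may deliver hunk text in several chunks; only count it.
        if self._in_hunk:
            self._count += len(content)

    def endElement(self, name):
        if name == "hunk":
            self.sizes.append((self._patch, self._count))
            self._in_hunk = False


def hunk_sizes(stream):
    """Stream a (hypothetical) XML patch bundle and return the hunk sizes."""
    handler = HunkSizeHandler()
    xml.sax.parse(stream, handler)
    return handler.sizes


# Hypothetical bundle layout, used only for this sketch.
SAMPLE = (
    b'<bundle>'
    b'<patch name="p1"><hunk>aaaa</hunk><hunk>bb</hunk></patch>'
    b'<patch name="p2"><hunk>c</hunk></patch>'
    b'</bundle>'
)

sizes = hunk_sizes(io.BytesIO(SAMPLE))
```

Note that a malformed bundle is still only reported when the parser reaches the bad spot, so anything collected before that point has to be treated as provisional.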


>
>
>  Medium term [Polished Darcs 2]
>> ------------------------------
>> I claim that coming up with this new patch format is unrealistic
>> for the medium term (defined as post-performance-obsession and
>> pre-Darcs-3).  If we were to use anything better, it'd have to be
>> backwards-compatible (ie. using the patch long comment?)
>>
>> Therefore, it would be interesting to determine:
>>
>> 1. If a new backwards compatible format will be useful in the medium
>>   term [which could last for many years mind you, if you also add in
>>   the short-term], or if we can get away with using Ignore-this for
>>   that time
>>
>> 2. If the new format could just start with "Ignore-this:"
>>
>> 3. What the new format would actually look like
>>
>> We don't have to open this discussion now, but it's now being tracked as
>> a potential project in <http://bugs.darcs.net/issue1787>.  My request is
>> for whoever launches the third salvo in this discussion to please research
>> the past threads (e.g. when we introduced the Ignore-this salt for
>> issue27?) and link them here.
>>
>> There's also some very interesting future work on patch annotations
>> <http://bugs.darcs.net/issue1613> for optional metadata.  It may even
>> be medium-term if we're lucky.
>>
>
> When Ignore-this was first implemented the medium term solution of using a
> full RFC822 email-like header was broached. Of course, RFC822 is full of
> loopholes and surprisingly hard to parse in reality, but the obvious point
> that Ignore-this: xxx does indeed look like an email header still stands.
> (I'd like to remain on the record that I'd still prefer a better name like
> "Patch conflict avoidance hash" than Ignore-this, by the way.)
>

Yeah.  I think that's fair.  Are there no parsers for RFC822 on Hackage?  I
see this:
http://hackage.haskell.org/packages/archive/mime/0.3.2/doc/html/Codec-MIME-Parse.html

Does that provide the type of parser you're looking for?
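
For comparison outside Haskell, here is a small sketch using Python's standard email package to read a long comment as if it were an RFC822-style message.  The sample comment text and the split_rfc822 helper are made up for illustration; this is not how darcs itself handles Ignore-this.

```python
from email import message_from_string

# Hypothetical long comment: header lines, a blank line, then the user's text.
LONG_COMMENT = (
    "Ignore-this: xxx\n"
    "Encoding: UTF-8\n"
    "\n"
    "The user's actual long comment.\n"
)


def split_rfc822(long_comment):
    """Split a long comment into (headers dict, body) the RFC822 way."""
    msg = message_from_string(long_comment)
    return dict(msg.items()), msg.get_payload()


headers, body = split_rfc822(LONG_COMMENT)
```

The separator between headers and body here is just the first blank line, which is the implicit marker the proposal quoted next tries to avoid.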


>
> I've been thinking on this some, and I think I have a reasonable suggestion
> that is easier to parse than RFC822, but carries a similar effect: YAML
> formatted darcs comments.
>
> YAML (yaml.org) is a JSON superset that was designed to be more
> human-readable/human-editable than JSON. Since long comments are still meant
> to be examined (and perhaps amended) by us humans, I'm all for keeping
> markup to a reasonable minimum. However, YAML is still easy to parse, with
> libraries in many languages.
>
> Here's Ignore-this wrapped in an explicit YAML document:
>
>  %YAML 1.2 # YAML version directive, can be used as indicator
>  --- # document start
>  Ignore-this: xxx # same as currently, but now in a YAML mapping
>  ... # document end
>
> We could argue the usefulness of the explicit YAML directive and document
> start (---), but explicit document end (...) makes a clear separation
> between any darcs-interesting metadata and a user's actual content: both to
> simple regex searching, and to YAML parsers (which have the concept of
> "parse the first document" and "parse past the first document"). (Certainly
> an explicit marker is better than RFC822's implicit, sometimes hard-to-spot
> marker.)
>
> Of course, the above example doesn't seem too great with just Ignore-this,
> so here's a better example:
>
>  %YAML 1.2
>  ---
>  Ignore-this: yyy
>  Encoding: UTF-8
>  Patch version: 2.0+YAML
>  X-Musdex version: 10.03.22
>  ...
>
>
> So, backwards compatibility issues: much the same as with Ignore-this.
> Patches whose long comments have YAML headers get the headers output in
> versions of darcs prior to the switchover point. This may not be a big
> problem; for instance, the above example in darcs 2.4 changes output seems
> reasonable:
>
>  %YAML 1.2
>  ---
>  Encoding: UTF-8
>  Patch version: 2.0+YAML
>  X-Musdex version: 10.03.22
>  ...
>
> The big gain is the forwards compatibility for arbitrary headers without
> special casing each and every one or prefixing them all with the silly
> "Ignore-this:" tag. It would also presumably be forwards compatible with
> some nice long term future version of darcs where arbitrary metadata headers
> can be moved out of the long comment to somewhere more preferable.
>
> An additional gain is that ignorable header lines now have two strongly
> consistent ways of being handled by scripts: 1) parse the first YAML
> document in the long comment to get the headers, 2) ignore everything up to
> the first line that begins with an ellipsis (...) to get to the user comment. In
> both cases a first line beginning with %YAML can be used to denote that
> there is any header at all.
>
> So that's my current suggestion. Feel free to tear it apart.
>
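
The two script-side strategies quoted above can be sketched in a few lines of Python without any YAML library.  The function names are hypothetical, and simple_headers is a naive reader for flat "Key: value" lines only; it does not handle YAML comments, quoting or nesting, so a real tool would hand the header lines to a proper YAML parser instead.

```python
def split_long_comment(long_comment):
    """Return (header_lines, user_comment) for a darcs long comment.

    A header is assumed to be present only when the first line starts with
    %YAML, and to end at the first line that starts with "...".
    """
    lines = long_comment.splitlines()
    if not lines or not lines[0].startswith("%YAML"):
        return [], long_comment
    for i, line in enumerate(lines):
        if line.startswith("..."):
            return lines[:i + 1], "\n".join(lines[i + 1:])
    # No document end marker found: treat the whole text as user comment.
    return [], long_comment


def simple_headers(header_lines):
    """Naively read flat 'Key: value' lines; this is not a YAML parser."""
    headers = {}
    for line in header_lines:
        if line.startswith(("%", "---", "...")):
            continue  # directive, document start, document end
        key, sep, value = line.partition(":")
        if sep:
            headers[key.strip()] = value.strip()
    return headers


# The second example from the proposal, followed by a made-up user comment.
EXAMPLE = (
    "%YAML 1.2\n"
    "---\n"
    "Ignore-this: yyy\n"
    "Encoding: UTF-8\n"
    "Patch version: 2.0+YAML\n"
    "X-Musdex version: 10.03.22\n"
    "...\n"
    "Fix the frobnicator.\n"
)

header_lines, comment = split_long_comment(EXAMPLE)
headers = simple_headers(header_lines)
```

A long comment without a leading %YAML line comes back untouched with an empty header list, which matches the "no header at all" case.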

Those YAML snippets seem pretty reasonable as long as they don't require the
parser to hit an ending tag while parsing the patches themselves (seems
reasonable for a short-ish section of headers though).  For the patches I
really think we want a format that is more amenable to streaming or
seeking.  You could imagine it having a "table of contents" section with
offsets that can be seek'd to.  I guess strictly speaking that is doable in
an XML schema, but perhaps uncommon.
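
As a rough illustration of the "table of contents" idea, here is a Python sketch of a made-up bundle layout: one "name offset length" line per patch, a blank line, then the raw patch bodies.  The layout and the function names are assumptions for this sketch only (patch names must not contain whitespace), not an existing or proposed darcs format.

```python
import io


def write_bundle(patches):
    """Build a bundle from a list of (name, body bytes), TOC first."""
    toc_lines = []
    offset = 0
    for name, body in patches:
        toc_lines.append(f"{name} {offset} {len(body)}\n")
        offset += len(body)
    # A blank line ends the TOC; offsets are relative to the data section.
    toc = "".join(toc_lines).encode("utf-8") + b"\n"
    return toc + b"".join(body for _, body in patches)


def read_toc(stream):
    """Read the TOC; return ({name: (offset, length)}, data section start)."""
    entries = {}
    while True:
        line = stream.readline()
        if line in (b"\n", b""):
            break
        name, offset, length = line.decode("utf-8").split()
        entries[name] = (int(offset), int(length))
    return entries, stream.tell()


def read_patch(stream, entries, data_start, name):
    """Seek straight to one patch body without reading the others."""
    offset, length = entries[name]
    stream.seek(data_start + offset)
    return stream.read(length)


bundle = write_bundle([
    ("a", b"hunk one\n"),
    ("b", b"x" * 1000),   # stands in for a large hunk in the middle
    ("c", b"tail"),
])
stream = io.BytesIO(bundle)
entries, data_start = read_toc(stream)
last = read_patch(stream, entries, data_start, "c")
```

Reading patch "c" only touches the short TOC and the final four bytes, so the large body in the middle is never loaded.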

Jason
_______________________________________________
darcs-users mailing list
[email protected]
http://lists.osuosl.org/mailman/listinfo/darcs-users
