Re: [darcs-users] patch metadata, annotations, Ignore-this, tagging, etc

Max Battcher Mon, 22 Mar 2010 21:49:48 -0700

On 3/23/2010 0:06, Jason Dagit wrote:

I have too many side projects for the amount of time I give them, but
one idea that keeps coming back up in my brain is to use
criterion/progression to benchmark various parsers for the current darcs
format.  I was thinking of pitting attoparsec vs. darcs source vs.
attoparsec-iteratee vs. pure iteratee vs. database backend vs. ??.
Comparing memory usage would be in there too, but I don't think
criterion has a way to do that yet.


Would that, along with some asymptotic memory/time analyses, satisfy
your craving?  I ask because it seems like knowing a particular
parser/format works well enough for general purpose usage isn't as good
as having evidence that it works well on a specific specialized task.


Mostly.

    I can choose one to use based on the requirements of the current
    project. Same for YAML or JSON... But each and every "special" or
    "proprietary" parser brings its own learning curve.)


Which one would you pick for a YAML patch format?  Suppose Haskell isn't
a consideration.

For YAML there are predominantly two standard parsers available in most languages: a language-specific parser and a binding around libsyck, the C SAX-like parser. Most of the language-specific parsers have SAX-like modes of operation, to further complicate things. I'd start with the language-specific parser and migrate to the libsyck-based one if necessary, but it might not be necessary, depending on the language I'm working with, of course.

I'm a little confused by the flow of the conversation here.  Are you
implying that even if we had a tested/robust RFC822 parser in Haskell
you'd rather we didn't use that format?

Given the choice between parsing YAML or RFC822, as a third-party consumer of darcs patches/information/metadata, I'd rather parse YAML.

I'm not completely opposed to RFC822-style patch metadata formatting, but I definitely think there are better formats worth considering first. I brought up YAML in particular because I think it can be good for RFC822-like "style", when read by human eyes, while having an overall more explicitly defined markup and data structure.
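To make the "explicitly defined data structure" point concrete, here is a minimal Python sketch of consuming RFC822-style patch metadata with the standard library's `email.parser.HeaderParser`. The field names and the comma-separated `Tags` convention are hypothetical, not an existing darcs format. Every header value comes back as a flat string, so any list-valued or nested field needs an ad hoc convention that each consumer has to reimplement; a YAML document would carry that structure itself.

```python
from email.parser import HeaderParser

# Hypothetical RFC822-style patch metadata; the field names are illustrative only.
RAW = """\
Author: Jane Doe
Date: Mon, 22 Mar 2010 21:49:48 -0700
Tags: v1.0, stable

Fix the frobnicator.
"""


def parse_metadata(raw: str) -> dict:
    """Parse RFC822-style headers; every header value comes back as a str."""
    msg = HeaderParser().parsestr(raw)
    fields = dict(msg.items())
    # RFC822 has no list type, so list-valued fields need an ad hoc convention
    # (here: comma-separated) that every consumer must know about.
    fields["Tags"] = [t.strip() for t in fields.get("Tags", "").split(",") if t.strip()]
    # Everything after the first blank line is left unparsed as the body.
    fields["Description"] = msg.get_payload().strip()
    return fields


meta = parse_metadata(RAW)
print(meta["Tags"])  # ['v1.0', 'stable']
```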

Just some musings about a pony format:

Yes, this and keeping as much on disk as possible while inspecting a
patch sequence led me recently to wonder again about using a 3rd party
database as the storage.  Sqlite is easy, but not my favorite (Mainly I
dislike the lack of foreign keys and type enforcement.  Those are merely
annoying but not show stoppers due to features like triggers and using a
typed programming language to interact with sqlite).

I think I mentioned once before that I do think Sqlite could make a very nice backend for some potential future darcs/darcs-offspring format. At the very least it would be something interesting to experiment with. Sqlite is particularly appealing because it is a single-file DB format and can be transmitted easily over the wire. Of course, that file could grow fairly large and you'd end up needing some smart protocol for push/pull handshakes to avoid having to download an entire DB every time... You could possibly break it into sections like the current inventories, but I'd assume you would lose some of the advantages of using a DB format in the first place the smaller you chunk the inventories.
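One way such a push/pull handshake could avoid shipping the whole file is to compare patch hashes first and transfer only the missing rows. The sketch below uses Python's built-in `sqlite3`, with in-memory databases standing in for two repositories; the `patch` table and the `open_repo`/`push` helpers are hypothetical. It deliberately ignores patch ordering and commutation, which a real darcs push would have to respect, and it still sends every hash, which is where chunking along the lines of the current inventories could come back in.

```python
import sqlite3

# Hypothetical minimal schema: one row per patch, keyed by its content hash.
SCHEMA = "CREATE TABLE IF NOT EXISTS patch (hash TEXT PRIMARY KEY, name TEXT NOT NULL)"


def open_repo(rows):
    """Create an in-memory stand-in for one repository's single-file DB."""
    db = sqlite3.connect(":memory:")
    db.execute(SCHEMA)
    db.executemany("INSERT INTO patch (hash, name) VALUES (?, ?)", rows)
    db.commit()
    return db


def patch_hashes(db):
    """The set of patch hashes a repository holds."""
    return {h for (h,) in db.execute("SELECT hash FROM patch")}


def push(local, remote):
    """Exchange hash sets first, then transfer only the rows the remote lacks."""
    missing = patch_hashes(local) - patch_hashes(remote)
    rows = [row for row in local.execute("SELECT hash, name FROM patch")
            if row[0] in missing]
    remote.executemany("INSERT INTO patch (hash, name) VALUES (?, ?)", rows)
    remote.commit()
    return sorted(missing)


local = open_repo([("a1", "init"), ("b2", "add README"), ("c3", "fix build")])
remote = open_repo([("a1", "init")])
print(push(local, remote))  # ['b2', 'c3']
```

A second `push` with nothing new transfers no rows and returns an empty list.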

It seems like if we used a relational db we'd be forced to store patch
hunks in the filesystem, but that's probably for the best anyway.  With
the hunks stored on disk separately you'd almost never need to have them
in memory (I think).  I guess maybe the initial diff that created the
patch or a replace patch might require it.  Perhaps some conflicts.
Basically the patch inventory would be in a table and indexed so that
hopefully we'd see good performance when interacting with it.
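A minimal sketch of an indexed inventory table whose rows only point at hunks kept on disk, using Python's built-in `sqlite3`. The columns, the `hunk_ref` convention and the `record`/`names_by_author` helpers are assumptions for illustration. The point is that a metadata query such as "all patches by this author, in sequence order" is answered from the table and its index without reading any hunk data.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE patch (
    seq      INTEGER PRIMARY KEY,   -- position in the patch sequence
    hash     TEXT NOT NULL UNIQUE,  -- content hash identifying the patch
    name     TEXT NOT NULL,
    author   TEXT NOT NULL,
    hunk_ref TEXT NOT NULL          -- key of the hunk data stored outside the DB
);
CREATE INDEX patch_by_author ON patch(author);
""")


def record(hash_, name, author, hunk_ref):
    """Append a patch's metadata; the hunks themselves never enter the DB."""
    db.execute(
        "INSERT INTO patch (hash, name, author, hunk_ref) VALUES (?, ?, ?, ?)",
        (hash_, name, author, hunk_ref),
    )
    db.commit()


def names_by_author(author):
    """Inventory-only query: answered from the table and index, no hunk I/O."""
    return [name for (name,) in db.execute(
        "SELECT name FROM patch WHERE author = ? ORDER BY seq", (author,))]


record("a1", "init", "alice", "hunks/a1")
record("b2", "add parser benchmarks", "bob", "hunks/b2")
record("c3", "tweak sqlite schema", "alice", "hunks/c3")
print(names_by_author("alice"))  # ['init', 'tweak sqlite schema']
```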

Of course, you'd certainly want a hashed, packed format for storing all of those hunks, rather than storing them individually.
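A hashed, packed hunk store could be as simple as an append-only pack of compressed hunks plus an index from content hash to offset. The sketch below is an in-memory stand-in (a real one would append to a pack file and persist the index); the `HunkPack` class and the choice of SHA-256 and zlib are assumptions made for this sketch. Because hunks are addressed by their hash, identical hunks are stored once and every read can be verified against its key.

```python
import hashlib
import zlib


class HunkPack:
    """Append-only pack: compressed hunks concatenated, indexed by SHA-256 digest."""

    def __init__(self):
        self.data = bytearray()  # stands in for a single pack file on disk
        self.index = {}          # hex digest -> (offset, compressed length)

    def put(self, hunk: bytes) -> str:
        key = hashlib.sha256(hunk).hexdigest()
        if key not in self.index:  # identical hunks are stored only once
            blob = zlib.compress(hunk)
            self.index[key] = (len(self.data), len(blob))
            self.data += blob
        return key

    def get(self, key: str) -> bytes:
        offset, length = self.index[key]
        hunk = zlib.decompress(bytes(self.data[offset:offset + length]))
        # Content addressing doubles as an integrity check on every read.
        if hashlib.sha256(hunk).hexdigest() != key:
            raise ValueError("corrupt hunk: " + key)
        return hunk


pack = HunkPack()
k1 = pack.put(b"-old line\n+new line\n")
k2 = pack.put(b"-old line\n+new line\n")  # duplicate, not stored again
k3 = pack.put(b"+another line\n")
print(k1 == k2, len(pack.index))  # True 2
```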

I expect we'd still need hashed-storage to efficiently query/update the
filesystem and we'd probably also want the filecache (not sure).  So I'm
really only talking about storing the patch metadata in the database.

I would expect the filecache wouldn't be necessary with a relational database: the filecache is a cached mapping between a file (name) and the patches that modify that file. Given a relational DB, it's simply a join of the patch_summary table and the patch table: SELECT * FROM patch WHERE patch.hash IN (SELECT hash FROM patch_summary WHERE filename = '...')
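A sketch of that lookup in Python's built-in `sqlite3`, written as a join. The table and column names follow the query in the text; everything else (the sample rows, the `patches_touching` helper, the index) is hypothetical. It also turns on `PRAGMA foreign_keys`: SQLite has supported foreign-key enforcement since 3.6.19, but leaves it off by default on each connection, which bears on the earlier remark about missing foreign keys.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when this is on
db.executescript("""
CREATE TABLE patch (
    hash TEXT PRIMARY KEY,
    name TEXT NOT NULL
);
-- One row per (patch, file) pair: the information a filecache would cache.
CREATE TABLE patch_summary (
    hash     TEXT NOT NULL REFERENCES patch(hash),
    filename TEXT NOT NULL,
    PRIMARY KEY (hash, filename)
);
CREATE INDEX summary_by_file ON patch_summary(filename);
""")
db.executemany("INSERT INTO patch VALUES (?, ?)",
               [("a1", "init"), ("b2", "fix parser"), ("c3", "docs")])
db.executemany("INSERT INTO patch_summary VALUES (?, ?)",
               [("a1", "Parser.hs"), ("a1", "README"),
                ("b2", "Parser.hs"), ("c3", "README")])
db.commit()


def patches_touching(filename):
    """All patches that modify `filename`, via a join instead of a cache."""
    return [name for (name,) in db.execute(
        "SELECT p.name FROM patch AS p"
        " JOIN patch_summary AS s ON s.hash = p.hash"
        " WHERE s.filename = ? ORDER BY p.rowid", (filename,))]


print(patches_touching("Parser.hs"))  # ['init', 'fix parser']
```

With enforcement on, a summary row naming a patch hash that does not exist is rejected with `sqlite3.IntegrityError`.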

A similar pony repository format idea might be to try experimenting with one of the new, hip document databases like couchdb. I've thought at times that hashed-storage already seems to be converging in the direction of a document database... Interesting thought, a couchdb-based darcs...


--
--Max Battcher--
http://worldmaker.net
_______________________________________________
darcs-users mailing list
[email protected]
http://lists.osuosl.org/mailman/listinfo/darcs-users
