On 4/22/06, Tommy Pettersson <[EMAIL PROTECTED]> wrote:
> On Thu, Apr 20, 2006 at 10:56:09PM -0700, Jason Dagit wrote:
> > 1) How many bytes do line endings add to the length of the old or new
> > content? Is it okay to assume line endings are exactly one byte in
> > patches? I know this will hold in unix-land, but what about win32?
>
> Darcs doesn't do different line endings. \n is a line ending,
> \r\n is a line with a \r as the last char, \r (old Mac) is not a
> line ending. Any conversions will be done by external filters
> when they are implemented.
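
To make that rule concrete, here is a minimal sketch. It assumes plain Strings and the Prelude's lines rather than darcs's own string types, one byte per character, and a '\n' after every line. The name hunkLength is hypothetical and does not come from the darcs source.

```haskell
-- Illustrative only: plain Strings stand in for darcs's patch data.
-- Only '\n' ends a line; a '\r' is ordinary content of the line.

-- Hypothetical helper: byte length of a hunk's content, assuming one
-- byte per character and exactly one '\n' byte after every line.
hunkLength :: [String] -> Int
hunkLength ls = sum (map length ls) + length ls

main :: IO ()
main = do
  let unixText = "foo\nbar\n"
      dosText  = "foo\r\nbar\r\n"
  -- The Prelude's lines splits on '\n' only, so '\r' stays in the line.
  print (lines unixText)
  print (lines dosText)
  -- The computed length matches the raw text for both inputs.
  print (hunkLength (lines unixText) == length unixText)
  print (hunkLength (lines dosText) == length dosText)
```

Under these assumptions the length calculation never needs to know which platform wrote the file, because a '\r' is simply counted as part of its line.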

Alright, in that case I think the way I calculate the length of the
old and new content has a hope of working on all the platforms.

> > 2) Currently when using darcs interactively (in darcs record for
> > example) what you see on the screen is a dump of what goes into the
> > patch file. So the direct result of my new patch format is that the
> > patch goes from being easily readable by humans to a bunch of garbage
> > all lumped together.
>
> I think the original thought was to have the patch file format
> be very human readable (and editable/repairable) and use it also
> as screen format, only slightly improved with colors and such,
> so that patches looked the same everywhere. Now efficiency has
> become more important.

I did a very simple record benchmark of my new code vs. the status
quo, and I was very surprised at what I found. I tried recording a
360 MB patch: it took ~40 minutes with the status quo and ~130 minutes
with the new format. Recording takes only ~6 minutes if I disable the
reading of the patch immediately after the record, so reading the
patch accounts for roughly 34 and 124 minutes respectively. It just
blows my mind sometimes how unintuitive performance tuning can be
with Haskell.

My new hunk reading code is essentially:

    lines (take n s)
      where n = length in bytes of either old or new
            s = patch data as a Stringalike

Of course it's a bit different because I use the sal_foo functions,
return the unused portion of s, and I had to write my own sal_lines
because I didn't see one (but I modeled it after a definition of lines
that I found in the prelude).

I had thought this would be a really efficient strategy. Guess I was
wrong.

> One option is to try to balance so that
> the format is both efficient and human readable.
> Otherwise there
> has to be either some conversion, probably from file format to
> screen format, as the coloring already does, but per patch type,
> or the patch interface needs to have two different write
> functions. The latter is probably better, maybe it's possible to
> use a class to default screen write to file write for all old
> patch types?

Alright, if I can get the performance of the patch reading to be
'acceptable' then I'll consider this as it sounds like a really good
idea. Although, unless I can get the patch reading to be more
efficient I doubt I'll bother.

Thanks,
Jason

_______________________________________________
darcs-devel mailing list
darcs-devel@darcs.net
http://www.abridgegame.org/cgi-bin/mailman/listinfo/darcs-devel