Splitting op_depth into base_op_depth and work_op_depth

2010-10-09 Thread Philip Martin
phi...@apache.org writes:

 Author: philip
 Date: Fri Oct  8 09:53:19 2010
 New Revision: 1005751
  
 +  PM: Yes, we have overwrite semantics.  The FS layer on the server has
 +  magic that converts the copy of the r12 descendant into a replace if
 +  the descendant exists in r10.  The client does not send a delete.
 +
 +  This magic applies to copies, not deletes, so there is a problem
 +  when the descendant is deleted in the mixed-revision copy in the
 +  working copy.  When faced with a copy of the subtree at r10 and a
 +  delete of a descendant at r12 the commit doesn't work at present.
 +  Deleting the descendant is wrong if it does not exist in r10, but
 +  not deleting it is wrong if it does exist.  I suppose the client
 +  could ask the server, or perhaps use multiple layers of BASE to
 +  track mixed-revisions (argh!).

Suppose we were to split NODES.op_depth into base_op_depth and
work_op_depth, one of which is always set and one of which is always
NULL.  Then we could represent a mixed revision working copy as a
layering of base_op_depth. (work_op_depth=0 might be allowed, but
otherwise work_op_depth would be much like op_depth>0.)

Having layers of base_op_depth would allow us to represent a mixed-rev
copy as layers of work_op_depth and so should solve the delete problem
above.  We could also use layers of base_op_depth to represent
switched subtrees, and that would probably allow us to handle deletes
of the root of the switch (as a tree conflict perhaps?).  Layers of
base_op_depth would also allow us to represent externals as a single
working copy: a base_op_depth=1 without base_op_depth=0 would be an
external.
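To make the proposal a little more concrete, here is a rough C model of the split. Everything in it is hypothetical: NODES is an SQLite table rather than a C struct, and the names node_layer_t, layer_is_valid() and is_external() are illustrative only. SQL NULL is modelled with a has_* flag, and the external rule follows the description above (a base layer at depth 1 with no base layer at depth 0 for the same path).

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical in-memory view of one NODES row under the proposed split.
   SQL NULL is modelled by the has_* flags. */
typedef struct node_layer_t {
  bool has_base;      /* base_op_depth is non-NULL */
  int  base_op_depth;
  bool has_work;      /* work_op_depth is non-NULL */
  int  work_op_depth;
} node_layer_t;

/* The proposal requires exactly one of the two columns to be set. */
static bool layer_is_valid(const node_layer_t *layer)
{
  return layer->has_base != layer->has_work;
}

/* Given all layers recorded for one path: a base layer at depth 1 with
   no base layer at depth 0 would mark the path as an external. */
static bool is_external(const node_layer_t *layers, size_t n)
{
  bool base0 = false, base1 = false;

  for (size_t i = 0; i < n; i++)
    {
      if (!layers[i].has_base)
        continue;
      if (layers[i].base_op_depth == 0)
        base0 = true;
      else if (layers[i].base_op_depth == 1)
        base1 = true;
    }
  return base1 && !base0;
}
```

A path with both base depth 0 and base depth 1 would instead be a candidate for the switched-subtree case mentioned above; the sketch does not try to distinguish that further.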

This is probably not something for 1.7; I'd really like to get 1.7
released rather than spend forever redesigning it.  But perhaps this
is something for 1.8?

-- 
Philip


Re: [PATCH] Use neon's system proxy detection if not explicitly specified

2010-10-09 Thread Gavin Beau Baumanis
Ping. This patch has received no further comments.


Gavin Beau Baumanis


On 29/09/2010, at 8:48 PM, Dominique Leuenberger wrote:

 On Wed, 2010-09-29 at 12:42 +0200, Daniel Shahaf wrote:
 
 For me either way is fine: I can update the patch to also detect newer
 versions as suggested by you, which in turn will still break all the
 other detections of SVN_NEON_0_28 and older. Or we keep them 'in sync'
 and fix them all together at a later stage.
 
 The latter please; when Neon 0.39 comes around we'll fix all checks at
 the same time.
 
 In this case I would consider my patch complete, as this is already what
 has been submitted.
 
 Thanks for your confirmation. Is there any further action to be taken by
 myself for this to happen?
 
 Dominique
 
 


Re: [WIP PATCH] Make svn_diff_diff skip identical prefix and suffix to make diff and blame faster

2010-10-09 Thread Johan Corveleyn
On Sat, Oct 9, 2010 at 2:57 AM, Julian Foad julian.f...@wandisco.com wrote:
 On Sat, 2010-10-09, Johan Corveleyn wrote:
 Ok, third iteration of the patch in attachment. It passes make check.

 As discussed in [1], this version keeps 50 lines of the identical
 suffix around, to give the algorithm a good chance to generate a diff
 output of good quality (in all but the most extreme cases, this will
 be the same as with the original svn_diff algorithm).

 That's about the only difference with the previous iteration. So for
 now, I'm submitting this for review. Any feedback is very welcome :-).
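A simplified, line-based sketch of the idea described above: skip the identical prefix, skip the identical suffix, but hand the last 50 identical lines back to the diff algorithm. This is not the patch's code, which scans bytes in 128K chunks; the function names and the SUFFIX_LINES_TO_KEEP constant are assumptions for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Assumed constant mirroring the "keep 50 lines of the identical suffix"
   compromise described above. */
#define SUFFIX_LINES_TO_KEEP 50

/* Number of leading lines that are identical in both files. */
static size_t identical_prefix(const char **a, size_t na,
                               const char **b, size_t nb)
{
  size_t i = 0;

  while (i < na && i < nb && strcmp(a[i], b[i]) == 0)
    i++;
  return i;
}

/* Number of trailing lines that may be skipped.  The scan never reaches
   into the prefix already claimed (requires prefix <= na and <= nb), and
   up to SUFFIX_LINES_TO_KEEP identical lines are left for the diff. */
static size_t identical_suffix_to_skip(const char **a, size_t na,
                                       const char **b, size_t nb,
                                       size_t prefix)
{
  size_t s = 0;

  while (s < na - prefix && s < nb - prefix
         && strcmp(a[na - 1 - s], b[nb - 1 - s]) == 0)
    s++;
  return s > SUFFIX_LINES_TO_KEEP ? s - SUFFIX_LINES_TO_KEEP : 0;
}
```

With 60 identical trailing lines, only 10 would be skipped; the remaining 50 stay visible to the LCS algorithm so it can still pick a good alignment.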

 Hi Johan.

Hi Julian,

Thanks for taking a look.

 I haven't reviewed it, but after seeing today's discussion I had just
 scrolled quickly through the previous version of this patch.  I noticed
 that the two main functions - find_identical_prefix and
 find_identical_suffix - are both quite similar (but not quite similar
 enough to make them easily share implementation) and both quite long,
 and I noticed you wrote in an earlier email that you were finding it
 hard to make the code readable.  I have a suggestion that may help.

 I think the existing structure of the svn_diff__file_baton_t is
 unhelpful:
 typedef struct svn_diff__file_baton_t
 {
  const svn_diff_file_options_t *options;
  const char *path[4];

  apr_file_t *file[4];
  apr_off_t size[4];

  int chunk[4];
  char *buffer[4];
  char *curp[4];
  char *endp[4];

  /* List of free tokens that may be reused. */
  svn_diff__file_token_t *tokens;

  svn_diff__normalize_state_t normalize_state[4];

  apr_pool_t *pool;
 } svn_diff__file_baton_t;

 All those array[4] fields are logically related, but this layout forces
 the programmer to address them individually.

 So I wrote a patch - attached - that refactors this into an array of 4
 sub-structures, and simplifies all the code that uses them.

 I think this will help you to get better code clarity because then your
 increment_pointer_or_chunk() for example will be able to take a single
 pointer to a file_info structure instead of a lot of pointers to
 individual members of the same.
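A minimal sketch of what the suggested layout could look like, assuming a hypothetical file_info struct. To stay self-contained it replaces apr_file_t and real chunk reads with an in-memory string, and uses a tiny CHUNK_SIZE of 4 instead of 128K; the field names follow the baton above, but load_chunk() and this body of increment_pointer_or_chunk() are illustrative, not the patch's code.

```c
#define CHUNK_SIZE 4  /* tiny for illustration; the patch uses 128K chunks */

/* Hypothetical per-datasource state: one struct per file instead of
   parallel [4] arrays.  An in-memory string stands in for apr_file_t. */
typedef struct file_info {
  const char *data;  /* whole file contents (stand-in for file reads) */
  long size;         /* total length of data */
  long chunk;        /* index of the chunk currently "buffered" */
  const char *curp;  /* current byte inside the chunk */
  const char *endp;  /* one past the last byte of the chunk */
} file_info_t;

/* Point the [curp, endp) window at chunk number `chunk`. */
static void load_chunk(file_info_t *f, long chunk)
{
  long start = chunk * CHUNK_SIZE;
  long len = f->size - start < CHUNK_SIZE ? f->size - start : CHUNK_SIZE;

  f->chunk = chunk;
  f->curp = f->data + start;
  f->endp = f->curp + len;
}

/* Advance one byte, switching to the next chunk when the current one is
   exhausted.  Returns 0 if curp now points at a byte, 1 at end of file.
   One file_info pointer replaces separate pointers to curp, endp, chunk. */
static int increment_pointer_or_chunk(file_info_t *f)
{
  if (f->endp - f->curp > 1)
    {
      f->curp++;
      return 0;
    }
  if ((f->chunk + 1) * CHUNK_SIZE < f->size)
    {
      load_chunk(f, f->chunk + 1);
      return 0;
    }
  f->curp = f->endp;  /* end of file */
  return 1;
}
```

The caller then loops over `baton->files[i]` and passes `&baton->files[i]`, rather than threading four or five separate pointers through every call.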

 Would you take a look and let me know if you agree.  If so, I can commit
 the refactoring straight away.

Yes, great idea! That would indeed vastly simplify a lot of the code.
So please go ahead and commit the refactoring.

Also, maybe the last_chunk number could be included in the file_info
struct? Now it's calculated in several places: last_chunk0 =
offset_to_chunk(file_baton->size[idx0]), or I have to pass it on
every time as an extra argument. Seems like the sort of info that
could be part of file_info.
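Roughly what caching last_chunk could look like. The CHUNK_SHIFT value of 17 (128K chunks, matching the "2 chunks of 128K" mentioned later in this thread) and this definition of offset_to_chunk() are assumptions; only the name offset_to_chunk comes from the mail. file_info_init() and at_last_chunk() are hypothetical helpers.

```c
#include <stdint.h>

/* Assumed chunk geometry: 128K chunks. */
#define CHUNK_SHIFT 17
#define CHUNK_SIZE  (1L << CHUNK_SHIFT)
#define offset_to_chunk(off) ((off) >> CHUNK_SHIFT)

/* Hypothetical file_info carrying a cached last_chunk. */
typedef struct file_info {
  int64_t size;        /* apr_off_t in the real code */
  int64_t chunk;       /* chunk currently buffered */
  int64_t last_chunk;  /* offset_to_chunk(size), computed once */
} file_info_t;

/* Compute last_chunk once, when the datasource is opened. */
static void file_info_init(file_info_t *f, int64_t size)
{
  f->size = size;
  f->chunk = 0;
  f->last_chunk = offset_to_chunk(size);
}

/* Example consumer: no extra last_chunk argument is needed. */
static int at_last_chunk(const file_info_t *f)
{
  return f->chunk == f->last_chunk;
}
```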

One more thing: you might have noticed that for find_identical_suffix
I use different buffers, chunks, curp's, endp's, ... than for the prefix.
For prefix scanning I can just use the stuff from the diff_baton,
because after prefix scanning has finished, everything is buffered and
pointing correctly for the normal algorithm to continue (i.e.
everything points at the first byte of the first non-identical line).
For suffix scanning I need to use other structures (newly alloc'd
buffer etc), so as to not mess with those pointers/buffers from the
diff_baton.

So: I think I'll need the file_info struct to be available outside of
the diff_baton_t struct as well, so I can use it in suffix scanning
too.
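A sketch of how suffix scanning could work on its own file_info copies, so that the baton's cursors (left at the first non-identical line by prefix scanning) are never touched. The in-memory buffers, diff_baton_t and this find_identical_suffix() body are hypothetical simplifications; the real code reads chunks from disk into newly allocated buffers.

```c
#include <stddef.h>

/* Hypothetical, self-contained stand-in for file_info. */
typedef struct file_info {
  const char *buffer;  /* whole file contents */
  size_t size;
  const char *curp;    /* cursor */
} file_info_t;

typedef struct diff_baton {
  file_info_t files[2];  /* state the normal diff algorithm continues from */
} diff_baton_t;

/* Count identical trailing bytes using local copies of file_info.
   The scan stops at the prefix boundary, i.e. at each file's curp,
   so prefix and suffix never overlap. */
static size_t find_identical_suffix(const diff_baton_t *baton)
{
  file_info_t suffix[2] = { baton->files[0], baton->files[1] };
  ptrdiff_t prefix0 = baton->files[0].curp - baton->files[0].buffer;
  ptrdiff_t prefix1 = baton->files[1].curp - baton->files[1].buffer;
  size_t len = 0;

  suffix[0].curp = suffix[0].buffer + suffix[0].size;
  suffix[1].curp = suffix[1].buffer + suffix[1].size;
  while (suffix[0].curp - suffix[0].buffer > prefix0
         && suffix[1].curp - suffix[1].buffer > prefix1
         && suffix[0].curp[-1] == suffix[1].curp[-1])
    {
      suffix[0].curp--;
      suffix[1].curp--;
      len++;
    }
  return len;
}
```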

(side-note: I considered first doing suffix scanning, then prefix
scanning, so I could reuse the buffers/pointers from diff_baton all
the time, and still have everything pointing correctly after
eliminating prefix/suffix. But that could give vastly different
results in some cases, for instance when the original file is entirely
identical to both the prefix and the suffix of the modified file. So I
decided it's best to stick with first prefix, then suffix).

 Responding to some of the other points you mentioned in a much earlier
 mail:

 3) It's only implemented for 2 files. I'd like to generalize this for
 an array of datasources, so it can also be used for diff3 and diff4.

 4) As a small hack, I had to add a flag datasource_opened to
 token.c#svn_diff__get_tokens, so I could have different behavior for
 regular diff vs. diff3 and diff4. If 3) gets implemented, this hack is
 no longer needed.

 Yes, I'd like to see 3), and so hack 4) will go away.

I'm wondering though how I should represent the datasources to pass
into datasources_open. An array combined with a length parameter?
Something like:

static svn_error_t *
datasources_open(void *baton, apr_off_t *prefix_lines,
                 svn_diff_datasource_e datasources[], int datasources_len);

Would that work? And then I'd use for loops everywhere I now do things
twice for the two datasources?
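A sketch of what "a for loop instead of doing things twice" could mean for prefix scanning over N datasources (N = 3 for diff3, 4 for diff4). It works on hypothetical in-memory buffers rather than svn_diff_datasource_e handles, and identical_prefix_n() is an illustrative name. It also backs off to the start of the first differing line, since the normal algorithm is meant to continue from the first byte of the first non-identical line.

```c
#include <stddef.h>

/* Length of the identical prefix shared by all n buffers, rounded down
   to a line boundary.  Each "do it for both files" step becomes a loop
   over the array.  A final unterminated line is left to the diff. */
static size_t identical_prefix_n(const char *const bufs[],
                                 const size_t sizes[], size_t n)
{
  size_t pos = 0;
  int done = (n == 0);

  while (!done)
    {
      for (size_t i = 0; i < n; i++)
        if (pos >= sizes[i] || bufs[i][pos] != bufs[0][pos])
          {
            done = 1;
            break;
          }
      if (!done)
        pos++;
    }

  /* Back off to the start of the line containing the first difference. */
  while (pos > 0 && bufs[0][pos - 1] != '\n')
    pos--;
  return pos;
}
```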

 5) I've struggled with making the code readable/understandable. It's
 likely that there is still a lot of room for improvement. I also
 probably need to document it some more.

 You need to write a full doc string for datasources_open(), at least.
 It needs especially to say how it relates to datasource_open() - why
 should the caller call this 

Re: [WIP PATCH] Make svn_diff_diff skip identical prefix and suffix to make diff and blame faster

2010-10-09 Thread Daniel Shahaf
Johan Corveleyn wrote on Sat, Oct 09, 2010 at 14:21:09 +0200:
 (side-note: I considered first doing suffix scanning, then prefix
 scanning, so I could reuse the buffers/pointers from diff_baton all
 the time, and still have everything pointing correctly after
 eliminating prefix/suffix. But that could give vastly different
 results in some cases, for instance when the original file is entirely
 identical to both the prefix and the suffix of the modified file. So I
 decided it's best to stick with first prefix, then suffix).

What Hyrum said.  How common /is/ this case?  And, anyway, in that case
both "everything was appended" and "everything was prepended" are
equally legitimate diffs.
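The case under discussion can be shown with a tiny line-based model (not the patch's chunked byte scanning; all names are hypothetical). The original file is one line, and the modified file is that same line twice. Prefix-first trims the first line and leaves the second one over ("appended"); suffix-first trims the last line and leaves the first one over ("prepended"). Both describe the same change.

```c
#include <stddef.h>
#include <string.h>

static size_t common_prefix(const char **a, size_t na,
                            const char **b, size_t nb)
{
  size_t i = 0;

  while (i < na && i < nb && strcmp(a[i], b[i]) == 0)
    i++;
  return i;
}

static size_t common_suffix(const char **a, size_t na,
                            const char **b, size_t nb)
{
  size_t s = 0;

  while (s < na && s < nb && strcmp(a[na - 1 - s], b[nb - 1 - s]) == 0)
    s++;
  return s;
}

typedef struct { size_t prefix, suffix; } trim_t;

/* Prefix first: the suffix scan only sees what the prefix left over. */
static trim_t prefix_then_suffix(const char **a, size_t na,
                                 const char **b, size_t nb)
{
  trim_t t;

  t.prefix = common_prefix(a, na, b, nb);
  t.suffix = common_suffix(a + t.prefix, na - t.prefix,
                           b + t.prefix, nb - t.prefix);
  return t;
}

/* Suffix first: the prefix scan only sees what the suffix left over. */
static trim_t suffix_then_prefix(const char **a, size_t na,
                                 const char **b, size_t nb)
{
  trim_t t;

  t.suffix = common_suffix(a, na, b, nb);
  t.prefix = common_prefix(a, na - t.suffix, b, nb - t.suffix);
  return t;
}
```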


BitTorrent RA layer

2010-10-09 Thread Ozzie Chan
Hi all,

I've recently been contemplating implementing an RA layer using the
bittorrent protocol in order to speed up large repository checkouts.

The primary impetus for this feature is to get a large, geographically
co-located development group up and running quickly. The code base is
large (~1 GB), and initial checkouts are a major pain. If we could harness
peer-to-peer downloads, most of this pain would go away.

Has anyone thought about this before? How difficult would it be? Is anyone
perhaps interested in coordinating an effort to do this?


Re: BitTorrent RA layer

2010-10-09 Thread Peter Samuelson

[Ozzie Chan]
 I've recently been contemplating implementing an RA layer using the
 bittorrent protocol in order to speed up large repository checkouts.

I don't think it would fit the RA layer very well, honestly.  I think
what you'd do instead is seed a torrent of a full checkout, or perhaps
of an svn dump file.

Or you could come up with a protocol that is somewhat, but not
entirely, like bittorrent: each client seeks dumpfiles of all the
revisions in the repository.  They exchange these much like normal
bittorrent payloads, except that there's probably no way to come up
with the checksums in advance, so the clients would all have to trust
each other.  The repository would serve as the initial seed, and each
client would use 'svnrdump' (a tool to generate a dumpfile over the RA
layer) to retrieve new revisions from the repository that are not
already in the BT network.
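The per-revision decision in that scheme could be sketched as follows. This is only an illustration under stated assumptions: plan_fetches(), the fetch_source_t enum and the boolean "have" arrays are hypothetical, peer_has[] means "held by at least one peer", and revisions are numbered 1..head for simplicity. As noted above, peer-supplied dumpfiles would have to be trusted, since there are no precomputed checksums to verify them against.

```c
#include <stdbool.h>

typedef enum { HAVE_ALREADY, FROM_PEER, FROM_REPOSITORY } fetch_source_t;

/* For each revision the client lacks, prefer the BT network; only
   revisions no peer holds yet come from the repository (e.g. via
   svnrdump).  Arrays are indexed by revision; index 0 is unused. */
static void plan_fetches(long head, const bool *have, const bool *peer_has,
                         fetch_source_t *plan)
{
  for (long rev = 1; rev <= head; rev++)
    {
      if (have[rev])
        plan[rev] = HAVE_ALREADY;
      else if (peer_has[rev])
        plan[rev] = FROM_PEER;
      else
        plan[rev] = FROM_REPOSITORY;
    }
}
```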

I note that this gives you a copy of the repository, which is a
superset of a checkout and may be many times larger.  It can also be
useful to set up a local write-through proxy via mod_dav_svn and keep
it up to date with svnsync.

Refer also to Luke Leighton's recent git-BT gateway proof of concept:

http://lists.debian.org/debian-devel/2010/09/msg9.html and following
http://gitorious.org/python-libbittorrent/pybtlib

I note that git is probably better suited to the bittorrent gateway
concept than svn is, since it is changeset-oriented, and each changeset
contains and is uniquely identified by a cryptographic hash.

-- 
Peter Samuelson | org-tld!p12n!peter | http://p12n.org/


Re: [WIP PATCH] Make svn_diff_diff skip identical prefix and suffix to make diff and blame faster

2010-10-09 Thread Johan Corveleyn
On Sat, Oct 9, 2010 at 5:19 PM, Daniel Shahaf d...@daniel.shahaf.name wrote:
 Johan Corveleyn wrote on Sat, Oct 09, 2010 at 14:21:09 +0200:
 (side-note: I considered first doing suffix scanning, then prefix
 scanning, so I could reuse the buffers/pointers from diff_baton all
 the time, and still have everything pointing correctly after
 eliminating prefix/suffix. But that could give vastly different
 results in some cases, for instance when the original file is entirely
 identical to both the prefix and the suffix of the modified file. So I
 decided it's best to stick with first prefix, then suffix).

 What Hyrum said.  How common /is/ this case?  And, anyway, in that case
 both "everything was appended" and "everything was prepended" are
 equally legitimate diffs.

Hm, I'm not sure about this one. I just wanted to do as much as
reasonably possible to keep the results identical to what they were.
Using another buffer for suffix scanning didn't seem that big of a
deal (only a slight increase in memory use: 2 chunks of 128K in the
current implementation). I made that decision pretty early, before I
knew of the other problem with suffix scanning and the
keep-50-suffix-lines compromise we decided upon.

There may be more subtle cases than the one I described; I don't know.
OTOH, now that we have the keep-50-suffix-lines, that may also help in
this case. I'll have to think about that. Maybe I can give it a go,
first suffix then prefix, and see if I can find real-life problems ...

-- 
Johan