Re: AW: How to find out the rev number where a file was deleted?

2010-11-29 Thread Johan Corveleyn
[ moving to dev@ ]

Following up on a discussion on the users list about the lack of a way
to easily find the rev number in which a file was deleted...

Already referred to issue #3627 (FS API support for oldest-to-youngest
history traversal) and FS-NG, as mentioned on the roadmap. But the
discussion continued about why this is so hard right now, and if there
are alternative approaches. See below...

On Mon, Nov 29, 2010 at 3:51 AM, Daniel Shahaf d...@daniel.shahaf.name wrote:
 Johan Corveleyn wrote on Sun, Nov 28, 2010 at 21:20:28 +0100:
 On Sun, Nov 28, 2010 at 6:35 PM, Daniel Shahaf d...@daniel.shahaf.name wrote:
  Stefan Sperling wrote on Sun, Nov 28, 2010 at 16:48:30 +0100:
  The real problem is that we want to be able to answer these questions
  very fast, and some design aspects work against this. For instance,
  FSFS by design does not allow modifying old revisions. So where do
  we store the copy-to information for a given p...@n?
 
  copy-to information is immutable (never changes once created), so we
  could add another hierarchy (parallel to revs/ and revprops/) in which
  to store that information.  Any 'cp f...@n bar' operation would need to
  create/append a file in that hierarchy.
 
  Open question: how to organize $new_hierarchy/16/16384/** to make it
  efficiently appendable and queryable (and for what queries? Iterating
  over all copied-to places is one).
 
  Makes sense?

 I'm not sure. But there is another alternative: while we wait for
 FS-NG (or another solution like you propose), one could implement the
 slow algorithm within the current design.

 Are you advocating to implement it in the core (as an svn_fs_* API) or
 as a third-party script?  The latter is certainly fine, but regarding
 the former I don't see the point of adding an API that cannot be
 implemented efficiently at this time.

Why not in the core? "We can't do this quickly, so we don't do it" is
not a very strong argument against having this very useful
functionality IMHO.

Having it in the core is vastly more useful for people like me (and my
colleagues): works on Windows, regardless of whether or not one has
perl/python installed, no need to distribute an additional script,
guaranteed to be available everywhere an svn client is installed, ...

It's actually quite similar to the way blame is implemented
currently: we don't really have the design (line-based information) to
do this quickly, but we calculate it from the other information that
we have available (in a way that could also be done by a script on the
client: diffing every interesting revision against the next,
remembering the lines that were added/removed in every step). Can you
imagine not having blame in svn core just because we can't do it
quickly? Ok, blame may be a more important use case than finding the
rev number where a file was deleted, but still ...
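
To make the comparison concrete, here is a rough sketch of the
"diff every revision against the next and remember which lines were
added" idea. It is only an illustration in Python using the standard
difflib module, not how svn_client_blame is actually implemented; the
function name and the (revnum, lines) input format are made up for
this example.

```python
from difflib import SequenceMatcher
from typing import List, Sequence, Tuple


def blame(revisions: Sequence[Tuple[int, List[str]]]) -> List[Tuple[int, str]]:
    """Attribute each line of the youngest text to the revision that added it.

    `revisions` is a hypothetical input format: (revnum, lines) pairs for
    every interesting revision of one file, ordered oldest to youngest.
    """
    if not revisions:
        return []
    first_rev, prev_lines = revisions[0]
    # Every line of the oldest text is attributed to the oldest revision.
    prev_blame = [first_rev] * len(prev_lines)
    for rev, lines in revisions[1:]:
        new_blame: List[int] = []
        matcher = SequenceMatcher(a=prev_lines, b=lines, autojunk=False)
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            if tag == "equal":
                # Unchanged lines keep the revision they already had.
                new_blame.extend(prev_blame[i1:i2])
            else:
                # Inserted or replaced lines belong to this revision;
                # a pure deletion has j1 == j2 and adds nothing.
                new_blame.extend([rev] * (j2 - j1))
        prev_lines, prev_blame = lines, new_blame
    return list(zip(prev_blame, prev_lines))
```

The cost is one diff per interesting revision, which is exactly why
it is not fast, yet it needs nothing beyond the file contents we can
already retrieve.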

So I still think it's definitely worth it to have this in the core and
offer an API, and implement it slowly now because that's the only way
we can do it (besides, I don't think it will be *that* slow). And
optimize it later when we have FS-NG, or another way to retrieve
this info quickly...

However, having said all that, it doesn't change the fact that someone
still needs to implement it, and I must admit I don't have the cycles
for that currently :-(.

Cheers,
Johan

 Just automating what a
 user (or script) currently does when looking for this information,
 i.e. a binary search.

 Of course it would be slow, but it would certainly already provide
 value. At the very least, it saves users a lot of time searching FAQs
 and list archives, wondering why this doesn't work, understanding the
 design limitations, and then finally implementing their own script or
 doing a one-time manual search.

 Then, when FS-NG arrives, or someone comes up with a way to index this
 information, it can be optimized.

 I don't know if there would be fundamental problems with that, apart
 from the fact that someone still needs to implement it of course ...

 Cheers,
 --
 Johan
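
For reference, the binary search mentioned in the quoted text above
could look roughly like the following Python sketch. It assumes the
path exists at the starting revision and is absent at the ending
revision; if the path was deleted and re-added several times in
between, it finds one of the deleting revisions, not necessarily the
last one. Both function names are hypothetical, and svn_path_exists
simply treats any failure of 'svn info URL@REV' (including network
errors) as "does not exist", which a real implementation would have
to handle more carefully.

```python
import subprocess
from typing import Callable


def find_deleting_revision(exists: Callable[[int], bool], lo: int, hi: int) -> int:
    """Return a revision r in (lo, hi] where exists(r - 1) and not exists(r)."""
    if not exists(lo):
        raise ValueError("path must exist at revision lo")
    if exists(hi):
        raise ValueError("path must be absent at revision hi")
    # Invariant: the path exists at lo and does not exist at hi.
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if exists(mid):
            lo = mid
        else:
            hi = mid
    return hi


def svn_path_exists(url: str, rev: int) -> bool:
    """Hypothetical helper: probe url at peg revision rev via the svn client.

    Assumption: a non-zero exit status means the path is absent at rev.
    """
    result = subprocess.run(
        ["svn", "info", f"{url}@{rev}"],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0
```

Usage would be something like
find_deleting_revision(lambda r: svn_path_exists(url, r), known_good_rev, head_rev),
which needs about log2(head_rev - known_good_rev) round trips to the
server.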



Re: AW: How to find out the rev number where a file was deleted?

2010-11-29 Thread Daniel Shahaf
Johan Corveleyn wrote on Mon, Nov 29, 2010 at 10:14:01 +0100:
 Having it in the core is vastly more useful for people like me (and my
 colleagues): works on Windows, regardless of whether or not one has
 perl/python installed, no need to distribute an additional script,
 guaranteed to be available everywhere an svn client is installed, ...
 

You are talking about having the functionality supported by the svn*
binaries.  I was talking about having the functionality supported by
the svn_fs_* API.

I agree these questions are related, but they aren't precisely the same
question.

 It's actually quite similar to the way blame is implemented
 currently: we don't really have the design (line-based information) to
 do this quickly, but we calculate it from the other information that
 we have available (in a way that could also be done by a script on the
 client: diffing every interesting revision against the next,
 remembering the lines that were added/removed in every step).
 

If svn_client_blameN() re-uses its RA session, then it has an advantage
over a shell script that calls 'svn diff' repeatedly.  I agree it still
doesn't have an advantage over a C bindings script that calls
svn_client_diffN() repeatedly.