Re: RFC: git cat-file --follow-symlinks?

2015-04-30 Thread Jeff King
On Thu, Apr 30, 2015 at 11:44:50AM -0700, David Turner wrote:

  git ls-tree HEAD -- BUILD ?
 
 This does not actually seem to work (even with -r); it only recurses
 into directories that are named BUILD, rather than being equivalent to
 git ls-tree -r HEAD |grep /BUILD$.

Ah, I thought that was what you wanted (to find specific files, not a
pattern). I think `ls-tree` doesn't understand our normal pathspecs, for
historical reasons.

 Also, BUILD files are scattered throughout the tree, so the entire tree
 would still need to be traversed.  At present, our monorepo is not quite
 large enough for this to matter (a full ls-tree only takes me 0.6s), but
 it is growing.

But aren't you asking git to do that internally? I.e., it can limit the
traversal for a prefix-match, but it cannot do so for an arbitrary
filename. It has to open every tree. So the extra expense is really just
the I/O over the pipe. That's not optimal, but it is a constant factor
slowdown from what git would do internally.
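
A rough illustration of that cost difference, using a toy in-memory tree rather than real git objects (the layout and both helper names are made up for this sketch): a lookup by full path opens only the trees along that path, while a search by basename has to open every tree.

```python
# Toy tree: dicts stand in for tree objects, strings for blobs.
# The layout is hypothetical and only serves the comparison.
TREE = {
    "foo": {"BUILD": "x", "bar": {"BUILD": "y", "src": {"a.c": "z"}}},
    "morx": {"fleem": {"BUILD": "w"}},
    "gibberish": {"deep": {"deeper": {"README": "r"}}},
}


def find_by_basename(tree, name, prefix=""):
    """Return (matching_paths, trees_opened); every tree gets opened."""
    matches, opened = [], 1
    for entry, value in sorted(tree.items()):
        path = prefix + entry
        if isinstance(value, dict):
            sub_matches, sub_opened = find_by_basename(value, name, path + "/")
            matches += sub_matches
            opened += sub_opened
        elif entry == name:
            matches.append(path)
    return matches, opened


def lookup_by_path(tree, path):
    """Return (entry_or_None, trees_opened); only trees on the path are opened."""
    node, opened = tree, 0
    for part in path.split("/"):
        if not isinstance(node, dict):
            return None, opened
        opened += 1
        node = node.get(part)
    return node, opened


print(find_by_basename(TREE, "BUILD"))
print(lookup_by_path(TREE, "morx/fleem/BUILD"))
```

In this toy tree the basename search opens all nine trees, while the path lookup opens three.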

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: RFC: git cat-file --follow-symlinks?

2015-04-30 Thread Jeff King
On Thu, Apr 30, 2015 at 12:00:22PM -0700, David Turner wrote:

   Also, BUILD files are scattered throughout the tree, so the entire tree
   would still need to be traversed.  At present, our monorepo is not quite
   large enough for this to matter (a full ls-tree only takes me 0.6s), but
   it is growing.
  
  But aren't you asking git to do that internally? I.e., it can limit the
  traversal for a prefix-match, but it cannot do so for an arbitrary
  filename. It has to open every tree. So the extra expense is really just
  the I/O over the pipe. That's not optimal, but it is a constant factor
  slowdown from what git would do internally.
 
 No, I'm not trying to find all BUILD files -- only ones that are in the
 transitive dependency tree of the target I'm trying to sparsely check
 out. So if the target foo/bar/baz depends on morx/fleem, and morx/fleem
 depends on plugh/xyzzy, then I have to examine those three places only.
 I don't have to examine anything in the gibberish/ subtree, for
 instance.  

OK, let me see if I understand your use case by parroting it back.

You _don't_ want to feed git a "find all BUILD" pattern, which is good
(because it doesn't work ;) ). You do want to feed it a set of raw
paths to find, because you're going to discover the paths yourself at
each step as you recurse through the dependency-chain of build files. 
You don't actually care about feeding those paths to ls-tree at all.
You care only about the _content_ at each path (and will parse that
content to see if you need to take a further recursive step).
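
A minimal sketch of that walk, with loud assumptions: the BLOBS dict stands in for whatever `git cat-file` would return for each path, and the "deps:" line format is invented here (the thread does not say what a BUILD file looks like). The point is only that each step reads one known path and discovers the next paths from its content.

```python
# Hypothetical snapshot: path -> blob content. In real use each read would
# be a cat-file lookup of HEAD:<path>; here it is a dict so the sketch runs.
BLOBS = {
    "foo/bar/baz/BUILD": "deps: morx/fleem\n",
    "morx/fleem/BUILD": "deps: plugh/xyzzy\n",
    "plugh/xyzzy/BUILD": "deps:\n",
    "gibberish/BUILD": "deps: foo/bar/baz\n",
}
READS = []  # records which paths were actually read


def read_blob(path):
    READS.append(path)
    return BLOBS[path]


def parse_deps(content):
    """Parse the made-up 'deps: a b c' line into a list of target dirs."""
    for line in content.splitlines():
        if line.startswith("deps:"):
            return line[len("deps:"):].split()
    return []


def transitive_targets(root):
    """Collect root plus everything it depends on, reading only needed BUILDs."""
    seen, stack = [], [root]
    while stack:
        target = stack.pop()
        if target in seen:
            continue
        seen.append(target)
        stack.extend(parse_deps(read_blob(target + "/BUILD")))
    return seen


print(transitive_targets("foo/bar/baz"))
print(READS)  # gibberish/BUILD never appears here
```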

So I think git out-of-the-box supports that pretty well (via cat-file).
And your sticking point is that some of the paths may involve symlinks
in the tree, so you want cat-file to answer "if I had checked this out
and typed 'cat /some/path/to/BUILD', what content would I get?". Which
brings us back to the original symlink question.

Is that all accurate?

I'm not sure that helps with the "how to handle symlinks" discussion,
but at least your goals make sense to me at this point.

-Peff


Re: RFC: git cat-file --follow-symlinks?

2015-04-30 Thread Jeff King
On Thu, Apr 30, 2015 at 08:29:14PM -0700, David Turner wrote:

4. Return the last object we could resolve, as I described. So
 [...]
 
 Actually, I think 4 has an insurmountable problem.  Here's the case I'm
 thinking of:
 
 ln -s ..  morx
 
 Imagine that we go to look up 'morx/fleem'.  Now morx is the last
 object we could resolve, but we don't know how much of our input has
 been consumed at this point.  So consumers don't know that after they
 exit the repo, they still need to find fleem next to it.

Yes, agreed (my list was written before Andreas brought up the idea of
symlinks in the intermediate paths). I think to let the caller pick up
where you left off, you would have to create a new string that has the
remainder concatenated to it.
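
One possible reading of "a new string that has the remainder concatenated to it", sketched over a toy table of entries (the ENTRIES layout, the ("in", ...)/("out", ...) return shape, and the hop limit are all assumptions of this sketch, not anything git implements): when resolution walks out of the root, the unconsumed components are appended to the escaping path, so the caller knows to look for fleem next to the repository.

```python
import posixpath

# Hypothetical tree: path -> ("blob", data) | ("link", target) | ("tree", None)
ENTRIES = {
    "dir": ("tree", None),
    "dir/morx": ("link", ".."),   # stays in-tree: dir/.. is the root
    "morx": ("link", ".."),       # leaves the tree: root/.. is outside
    "fleem": ("blob", "in-tree fleem\n"),
}


def resolve(path, max_hops=10):
    """Return ("in", path) or ("out", path relative to the root)."""
    done = []                 # components resolved so far, relative to root
    todo = path.split("/")    # components not yet consumed
    hops = 0
    while todo:
        part = todo.pop(0)
        if part in ("", "."):
            continue
        if part == "..":
            if not done:
                # Walked above the root: hand back ".." plus the remainder.
                return ("out", "/".join([".."] + todo))
            done.pop()
            continue
        kind, value = ENTRIES.get("/".join(done + [part]), (None, None))
        if kind == "link":
            hops += 1
            if hops > max_hops:
                raise RuntimeError("too many levels of symbolic links")
            if value.startswith("/"):
                # Absolute target: also outside; keep the remainder attached.
                return ("out", posixpath.join(value, *todo))
            # Splice the link target in front of what is left to consume.
            todo = value.split("/") + todo
        else:
            done.append(part)
    return ("in", "/".join(done))


print(resolve("morx/fleem"))      # ('out', '../fleem')
print(resolve("dir/morx/fleem"))  # ('in', 'fleem')
```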

-Peff


Re: RFC: git cat-file --follow-symlinks?

2015-04-29 Thread David Turner
On Wed, 2015-04-29 at 14:16 -0700, Jonathan Nieder wrote:
 Hi,
 
 David Turner wrote:
 
  Instead, it would be cool if cat-file had a mode in which it would
  follow symlinks.
 
 Makes sense.
 
  The major wrinkle is that symlinks can point outside the repository --
  either because they are absolute paths, or because they are relative
  paths with enough ../ in them.  For this case, I propose that
  --follow-symlinks should output "[sha] symlink [target]" instead of the
  usual "[sha] blob [bytes]".
 
 What happens when the symlink payload contains a newline?

Oh, right.
So, how about "[sha] symlink [bytes] \n [target]" instead?
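
A small sketch of why the length prefix helps, assuming the record is laid out like ordinary batch output (header line, then exactly [bytes] bytes of payload, then a newline). The "symlink" type is only the proposal from this thread, and write_record/read_record are names made up here. A reader that counts bytes never has to guess where a target containing a newline ends.

```python
import io


def write_record(out, sha, target):
    """Emit '<sha> symlink <bytes>\\n<target>\\n' (the proposed layout)."""
    out.write(b"%s symlink %d\n" % (sha, len(target)))
    out.write(target + b"\n")


def read_record(stream):
    """Read one record back; the size field tells us how much to consume."""
    header = stream.readline().rstrip(b"\n")
    sha, kind, size = header.split(b" ")
    target = stream.read(int(size))
    stream.read(1)  # the newline that terminates the payload
    return sha, kind, target


buf = io.BytesIO()
write_record(buf, b"0" * 40, b"../odd\nname")   # target contains a newline
write_record(buf, b"1" * 40, b"/etc/passwd")
buf.seek(0)
print(read_record(buf))
print(read_record(buf))
```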




Re: RFC: git cat-file --follow-symlinks?

2015-04-29 Thread Jonathan Nieder
Jeff King wrote:

   1. Git has to make a decision about what to do in corner cases. What
  is our cwd for relative links? The project root?

I don't follow.  Isn't symlink resolution always relative to the
symlink, regardless of cwd?
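
For a single link, that rule can be sketched as follows (link_destination is a hypothetical helper; the lexical normpath step is only valid when no intermediate component is itself a symlink, which this sketch assumes). The base for a relative target is the directory containing the link, so the current directory never enters into it.

```python
import posixpath


def link_destination(link_path, target):
    """Where a link at link_path points, relative to the tree root."""
    if target.startswith("/"):
        return target  # absolute targets ignore the link's location
    base = posixpath.dirname(link_path)
    return posixpath.normpath(posixpath.join(base, target))


print(link_destination("link", "dest"))                       # dest
print(link_destination("foo/bar/baz", "../qux"))              # foo/qux
print(link_destination("foo/bar/baz", "../../../external"))   # ../external
```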

Thanks,
Jonathan


Re: RFC: git cat-file --follow-symlinks?

2015-04-29 Thread Junio C Hamano
Jeff King p...@peff.net writes:

 I had imagined we would stop resolution and you would just get the last
 peeled object. Combined with teaching cat-file to show more
 object context, doing:

   echo content >dest ;# actual blob
   ln -s dest link    ;# link to blob
   ln -s foo broken   ;# broken link
   ln -s ../foo out   ;# out-of-tree link
   git add . && git commit -m foo
   for i in link broken out; do
       echo "HEAD^{resolve}:$i"
   done |
   git cat-file --batch="%(intreemode) %(size)"

 would yield:

  (1)   100644 8
        content
  (2)   040000 3
        foo
  (3)   040000 6
        ../foo

 where the left-margin numbers are for reference:

   1. We dereference a real symlink, and pretend like we actually asked
  for its referent.

   2. For a broken link, we can't dereference, so we return the link
  itself. You can tell by the mode, and the content tells you what
  would have been dereferenced.

   3. Ditto for out-of-tree. Note that this would be the _raw_ symlink
  contents, not any kind of simplification (so if you asked for
  foo/bar/baz and it was ../../../../out, you would get the full path
  with all those dots, not a simplified ../out, which I think is
  what you were trying to show in earlier examples).

s/040000/160000/ I would think (if you really meant to expose a
tree, write it as "40000" instead, so that people will not get a wrong
impression and reimplement a broken tree object encoding some popular
Git hosting site broke their customer projects with ;-).

I am not sure $treeish^{resolve} is a great syntax, but I like the
concept and agree that it is a lot more sensible to handle this at
the level of sha1_name.c layer than an ad-hoc solution in the
cat-file layer.


Re: RFC: git cat-file --follow-symlinks?

2015-04-29 Thread David Turner
On Wed, 2015-04-29 at 21:16 -0400, Jeff King wrote:
 On Wed, Apr 29, 2015 at 06:06:23PM -0700, David Turner wrote:
   3. Ditto for out-of-tree. Note that this would be the _raw_ symlink
  contents, not any kind of simplification (so if you asked for
  foo/bar/baz and it was ../../../../out, you would get the full path
  with all those dots, not a simplified ../out, which I think is
  what you were trying to show in earlier examples).

Unfortunately, we need the simplified version, because we otherwise
don't know what the ..s are relative to in the case of a link to a link:

  echo content >dest ;# actual blob
  mkdir -p foo/bar
  ln -s foo/bar/baz fleem # in-tree link-to-link 
  ln -s ../../../external foo/bar/baz # out-of-tree link

If echo HEAD^{resolve}:fleem were to return ../../../external (after
following the first symlink to the second), we would have lost
information.
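
Working the example through by hand, as a sketch (the LINKS table mirrors the two ln -s commands above; the follow helper and its hop limit are made up, and the lexical normpath step assumes no directory on the way is itself a symlink):

```python
import posixpath

# link path -> raw link contents, as created by the commands above
LINKS = {
    "fleem": "foo/bar/baz",
    "foo/bar/baz": "../../../external",
}


def follow(path, max_hops=10):
    """Follow links from path; stop once the result leaves the tree."""
    for _ in range(max_hops):
        target = LINKS.get(path)
        if target is None:
            return path
        # Each target is interpreted relative to the directory of its own link.
        path = posixpath.normpath(posixpath.join(posixpath.dirname(path), target))
        if path == ".." or path.startswith("../"):
            return path  # out of tree, expressed relative to the root
    raise RuntimeError("too many levels of symbolic links")


print(follow("fleem"))         # ../external
print(LINKS["foo/bar/baz"])    # ../../../external
```

The raw contents of the second link, ../../../external, only mean something relative to foo/bar/. A caller who asked for fleem and got that string back would have no way to know the base, whereas ../external is unambiguous relative to the root.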



Re: RFC: git cat-file --follow-symlinks?

2015-04-29 Thread David Turner
On Wed, 2015-04-29 at 20:37 -0400, Jeff King wrote:
 On Wed, Apr 29, 2015 at 07:11:50PM -0400, Jeff King wrote:
 
  Yeah, I agree if you let git punt on leaving the filesystem, most of the
  complicated problems go away. It still feels a bit more magical than I
  expect out of cat-file, and there are still corner cases (e.g., do we do
  cycle detection? Or just have a limit to the recursion depth?)
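
  The two options can be compared on a toy table of flat link names
  (everything here is hypothetical, including the default limit of 40,
  which is an arbitrary choice for the sketch). A hop limit needs no
  extra state; explicit cycle detection remembers every name visited.

```python
# name -> link target; a, b and c form a cycle, x is an ordinary link
LINKS = {"a": "b", "b": "c", "c": "a", "x": "y"}


def follow_with_limit(name, max_hops=40):
    """Give up (return None) after max_hops links, cycle or not."""
    for _ in range(max_hops):
        if name not in LINKS:
            return name
        name = LINKS[name]
    return name if name not in LINKS else None


def follow_with_cycle_check(name):
    """Give up (return None) only when a name is visited twice."""
    seen = set()
    while name in LINKS:
        if name in seen:
            return None
        seen.add(name)
        name = LINKS[name]
    return name


print(follow_with_limit("a"), follow_with_cycle_check("a"))   # None None
print(follow_with_limit("x"), follow_with_cycle_check("x"))   # y y
```

  The limit also rejects long but finite chains, while the cycle check
  accepts them at the cost of remembering the path taken.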
 
 I was pondering the "magical" above. I think what bugs me is that it
 seems like a feature that is implemented as part of one random bit of
 plumbing, but not available elsewhere.
 
 Conceptually, this is like peeling object names. You may give a tag
 name, but if you ask for a tree we will peel the tag to a commit,
 and the commit to a tree. This is sort of the same thing; you give a
 path within a tree, and we will peel until we hit a real non-symlink
 object.
 
 I don't know what the syntax would look like. To match foo^{tree} it
 would be something like:
 
   HEAD:foo/bar^{resolve}
 
 or something like that. Except that it is a bad idea to allow ^{}
 syntax on the right-hand side of a colon, as it is ambiguous with
 filenames that contain ^{resolve}. So it would have to look something
 like:
 
   HEAD^{resolve}:foo/bar
 
 which is a _little_ weird, but actually kind of makes sense. The
 resolve operation inherently is not just about the filename, but also
 uses HEAD^{tree} as the root context.
 
 So I dunno. This pushes the resolving logic even _lower_ in the stack
 than it would be in cat-file. So why do I like it more? Cognitive
 dissonance? I guess the appeal to me is that it:
 
   1. Makes the concept available more generally (you can rev-parse it,
  you can git show it, etc). It also lets you _name_ the object in
  question, so you can ask for other things besides its contents (like
  its name, its type, etc).
 
   2. Positions it alongside other peeling name-resolution functions.

Just to clarify: if you do git rev-parse, and the result is an
out-of-tree symlink, you see "/foo" or "../foo" instead of a sha?  And if
you git show it, it says "symlink HEAD:../foo"?

This seems totally reasonable to me, and solves my problem.
