Re: RFC: git cat-file --follow-symlinks?
On Thu, Apr 30, 2015 at 11:44:50AM -0700, David Turner wrote:

> > git ls-tree HEAD -- BUILD ?
>
> This does not actually seem to work (even with -r); it only recurses into directories that are named BUILD, rather than being equivalent to git ls-tree -r HEAD |grep /BUILD$.

Ah, I thought that was what you wanted (to find specific files, not a pattern). I think `ls-tree` doesn't understand our normal pathspecs, for historical reasons.

> Also, BUILD files are scattered throughout the tree, so the entire tree would still need to be traversed. At present, our monorepo is not quite large enough for this to matter (a full ls-tree only takes me 0.6s), but it is growing.

But aren't you asking git to do that internally? I.e., it can limit the traversal for a prefix-match, but it cannot do so for an arbitrary filename. It has to open every tree. So the extra expense is really just the I/O over the pipe. That's not optimal, but it is a constant factor slowdown from what git would do internally.

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: RFC: git cat-file --follow-symlinks?
On Thu, Apr 30, 2015 at 12:00:22PM -0700, David Turner wrote:

> > > Also, BUILD files are scattered throughout the tree, so the entire tree would still need to be traversed. At present, our monorepo is not quite large enough for this to matter (a full ls-tree only takes me 0.6s), but it is growing.
> >
> > But aren't you asking git to do that internally? I.e., it can limit the traversal for a prefix-match, but it cannot do so for an arbitrary filename. It has to open every tree. So the extra expense is really just the I/O over the pipe. That's not optimal, but it is a constant factor slowdown from what git would do internally.
>
> No, I'm not trying to find all BUILD files -- only ones that are in the transitive dependency tree of the target I'm trying to sparsely check out. So if the target foo/bar/baz depends on morx/fleem, and morx/fleem depends on plugh/xyzzy, then I have to examine those three places only. I don't have to examine anything in the gibberish/ subtree, for instance.

OK, let me see if I understand your use case by parroting it back.

You _don't_ want to feed git a "find all BUILD" pattern, which is good (because it doesn't work ;) ).

You do want to feed it a set of raw paths to find, because you're going to discover the paths yourself at each step as you recurse through the dependency-chain of build files.

You don't actually care about feeding those paths to ls-tree at all. You care only about the _content_ at each path (and will parse that content to see if you need to take a further recursive step).

So I think git out-of-the-box supports that pretty well (via cat-file). And your sticking point is that some of the paths may involve symlinks in the tree, so you want cat-file to answer "if I had checked this out and typed cat /some/path/to/BUILD, what content would I get".

Which brings us back to the original symlink question. Is that all accurate? I'm not sure that helps with the "how to handle symlinks" discussion, but at least your goals make sense to me at this point.
-Peff
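The workflow parroted back above (read the BUILD file at a known path, parse it, recurse into whatever it names) can be sketched with plain cat-file. This is only an illustration: the one-dependency-per-line BUILD format, the throwaway repository, and the variable names are assumptions, since the thread does not describe the real BUILD syntax. It uses no symlinks, so it shows the part git already supports.

```shell
#!/bin/sh
# Sketch only. ASSUMPTION: a BUILD file lists one dependency directory per
# line; the thread does not describe the real BUILD format.
set -eu

repo=$(mktemp -d)
git init -q "$repo"
mkdir -p "$repo/foo/bar/baz" "$repo/morx/fleem" "$repo/plugh/xyzzy" "$repo/gibberish"
echo morx/fleem  > "$repo/foo/bar/baz/BUILD"
echo plugh/xyzzy > "$repo/morx/fleem/BUILD"
: > "$repo/plugh/xyzzy/BUILD"            # leaf target, no dependencies
echo unrelated > "$repo/gibberish/BUILD" # never read by the walk below
git -C "$repo" add .
git -C "$repo" -c user.name=demo -c user.email=demo@example.com commit -q -m demo

# The work queue lives in the positional parameters.
set -- foo/bar/baz
visited=
while [ $# -gt 0 ]; do
    target=$1
    shift
    case " $visited " in
        *" $target "*) continue ;;       # already examined
    esac
    visited="${visited:+$visited }$target"
    # Read the BUILD blob from the commit, never from the working tree.
    # Word splitting of the output is intentional: one queue entry per line.
    set -- "$@" $(git -C "$repo" cat-file -p "HEAD:$target/BUILD")
done
echo "$visited"
```

Only the three directories on the dependency chain are opened; gibberish/ is never touched, which is the point David makes about not needing a full traversal.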
Re: RFC: git cat-file --follow-symlinks?
On Thu, Apr 30, 2015 at 08:29:14PM -0700, David Turner wrote:

> > 4. Return the last object we could resolve, as I described. So [...]
>
> Actually, I think 4 has an insurmountable problem. Here's the case I'm thinking of:
>
>     ln -s .. morx
>
> Imagine that we go to look up 'morx/fleem'. Now morx is the last object we could resolve, but we don't know how much of our input has been consumed at this point. So consumers don't know that after they exit the repo, they still need to find fleem next to it.

Yes, agreed (my list was written before Andreas brought up the idea of symlinks in the intermediate paths). I think to let the caller pick up where you left off, you would have to create a new string that has the remainder concatenated to it.

-Peff
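The morx/fleem case can be reproduced on an ordinary filesystem, which may make the "remainder" problem easier to see. This is a sketch with made-up directory names (top, repo); it does not involve git at all, and the last step only illustrates the "concatenate the remainder" idea rather than any agreed-upon behaviour.

```shell
#!/bin/sh
set -eu

top=$(mktemp -d)
mkdir "$top/repo"
echo outside > "$top/fleem"          # lives next to the "repository", not in it
ln -s .. "$top/repo/morx"

# What the filesystem does with the full path morx/fleem:
content=$(cat "$top/repo/morx/fleem")

# What "return the last object we could resolve" would hand back: just the link.
last=$(readlink "$top/repo/morx")

# Hypothetical fix: append the unconsumed remainder so the caller can resume.
remainder=fleem
resume="$last/$remainder"

printf '%s\n' "$last" "$resume" "$content"
```

With only `$last` the caller sees `..` and has no way to know that `fleem` is still pending; `$resume` carries both pieces.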
Re: RFC: git cat-file --follow-symlinks?
On Wed, 2015-04-29 at 14:16 -0700, Jonathan Nieder wrote:

> Hi,
>
> David Turner wrote:
>
> > Instead, it would be cool if cat-file had a mode in which it would follow symlinks.
>
> Makes sense.
>
> > The major wrinkle is that symlinks can point outside the repository -- either because they are absolute paths, or because they are relative paths with enough ../ in them. For this case, I propose that --follow-symlinks should output [sha] symlink [target] instead of the usual [sha] blob [bytes].
>
> What happens when the symlink payload contains a newline?

Oh, right. So, how about [sha] symlink [bytes] \n [target] instead?
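To see why the size field answers the newline question, here is a small sketch of writing and reading one record in the proposed `[sha] symlink [bytes] \n [target]` shape. Nothing here calls git: the all-zero sha is a placeholder, and the reader is a hypothetical consumer, not an existing tool.

```shell
#!/bin/sh
set -eu

target=$(printf 'up/one\nline')      # link payload containing a newline
sha=0000000000000000000000000000000000000000   # placeholder, not a real object id

tmp=$(mktemp)
# Writer: the header carries the payload size, the payload follows on the next line.
# ${#target} counts characters; that equals bytes here because the payload is ASCII.
printf '%s symlink %s\n%s\n' "$sha" "${#target}" "$target" > "$tmp"

# Reader: parse the header, then take exactly that many bytes.
{
    read -r got_sha got_type got_bytes
    # Append a sentinel so $(...) does not strip trailing newlines.
    payload=$(head -c "$got_bytes"; printf x)
    payload=${payload%x}
} < "$tmp"
rm -f "$tmp"

printf '%s %s\n' "$got_type" "$got_bytes"
```

Because the reader counts bytes instead of looking for a line terminator, a newline inside the target cannot be mistaken for the end of the record.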
Re: RFC: git cat-file --follow-symlinks?
Jeff King wrote:

> 1. Git has to make a decision about what to do in corner cases. What is our cwd for relative links? The project root?

I don't follow. Isn't symlink resolution always relative to the symlink, regardless of cwd?

Thanks,
Jonathan
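Jonathan's point about ordinary filesystem behaviour can be checked with a short sketch: a relative link is resolved against the directory that contains the link, so the caller's cwd does not change the result. The directory layout below is made up for the demonstration.

```shell
#!/bin/sh
set -eu

top=$(mktemp -d)
mkdir -p "$top/a/b"
echo near > "$top/a/target"          # what ../target means relative to a/b/
echo far  > "$top/target"            # what ../target would mean relative to a/
ln -s ../target "$top/a/b/link"

# Read the same link from two different working directories.
from_top=$(cd "$top" && cat a/b/link)
from_b=$(cd "$top/a/b" && cat link)

printf '%s %s\n' "$from_top" "$from_b"
```

Both reads land on a/target, because `../target` is interpreted starting from a/b/, where the link lives.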
Re: RFC: git cat-file --follow-symlinks?
Jeff King p...@peff.net writes:

> I had imagined we would stop resolution and you would just get the last peeled object. Combined with teaching cat-file to show more object context, doing:
>
>     echo content >dest ;# actual blob
>     ln -s dest link    ;# link to blob
>     ln -s foo broken   ;# broken link
>     ln -s ../foo out   ;# out-of-tree link
>     git add .
>     git commit -m foo
>     for i in link broken out; do
>       echo "HEAD^{resolve}:$i"
>     done |
>     git cat-file --batch="%(intreemode) %(size)"
>
> would yield:
>
>     (1) 100644 8 content
>     (2) 04 3 foo
>     (3) 04 6 ../foo
>
> where the left-margin numbers are for reference:
>
> 1. We dereference a real symlink, and pretend like we actually asked for its referent.
>
> 2. For a broken link, we can't dereference, so we return the link itself. You can tell by the mode, and the content tells you what would have been dereferenced.
>
> 3. Ditto for out-of-tree. Note that this would be the _raw_ symlink contents, not any kind of simplification (so if you asked for foo/bar/baz and it was ../../../../out, you would get the full path with all those dots, not a simplified ../out, which I think is what you were trying to show in earlier examples).

s/04/16/ I would think (if you really meant to expose a tree, write it as 4 instead, so that people will not get a wrong impression and reimplement a broken tree object encoding some popular Git hosting site broke their customer projects with ;-).

I am not sure $treeish^{resolve} is a great syntax, but I like the concept and agree that it is a lot more sensible to handle this at the level of sha1_name.c layer than an ad-hoc solution in the cat-file layer.
Re: RFC: git cat-file --follow-symlinks?
On Wed, 2015-04-29 at 21:16 -0400, Jeff King wrote:
On Wed, Apr 29, 2015 at 06:06:23PM -0700, David Turner wrote:

> 3. Ditto for out-of-tree. Note that this would be the _raw_ symlink contents, not any kind of simplification (so if you asked for foo/bar/baz and it was ../../../../out, you would get the full path with all those dots, not a simplified ../out, which I think is what you were trying to show in earlier examples).

Unfortunately, we need the simplified version, because we otherwise don't know what the ..s are relative to in the case of a link to a link:

    echo content >dest ;# actual blob
    mkdir -p foo/bar
    ln -s foo/bar/baz fleem # in-tree link-to-link
    ln -s ../../../external foo/bar/baz # out-of-tree link

If echo HEAD^{resolve}:fleem were to return ../../../external (after following the first symlink to the second), we would have lost information.
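The information loss in the link-to-link case can be made concrete with a purely lexical sketch. The `normalize` helper below is hypothetical (it is not a git function); it folds `.` and `..` in a path expressed relative to the tree root, and it assumes no intermediate component is itself a symlink.

```shell
#!/bin/sh
set -eu

# Lexically fold "." and ".." in a path that is relative to the tree root.
# Leading "../" in the result means the path escapes the tree.
# ASSUMPTION: no intermediate component is itself a symlink.
normalize() {
    out=
    up=
    old_ifs=$IFS
    IFS=/
    set -f                           # no globbing while we split on "/"
    for part in $1; do
        case $part in
            ''|.) ;;
            ..)
                if [ -n "$out" ]; then
                    case $out in
                        */*) out=${out%/*} ;;
                        *)   out= ;;
                    esac
                else
                    up="../$up"
                fi ;;
            *) out=${out:+$out/}$part ;;
        esac
    done
    set +f
    IFS=$old_ifs
    printf '%s\n' "$up$out"
}

link=foo/bar/baz                     # second link in the chain from the example
raw=../../../external                # its raw payload
simplified=$(normalize "${link%/*}/$raw")
printf 'raw: %s\nsimplified: %s\n' "$raw" "$simplified"
```

A caller who asked for fleem and got back only the raw `../../../external` would have to know that the payload came from foo/bar/ to interpret it; the simplified `../external` is already relative to the tree root.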
Re: RFC: git cat-file --follow-symlinks?
On Wed, 2015-04-29 at 20:37 -0400, Jeff King wrote:
> On Wed, Apr 29, 2015 at 07:11:50PM -0400, Jeff King wrote:
>
> > Yeah, I agree if you let git punt on leaving the filesystem, most of the complicated problems go away. It still feels a bit more magical than I expect out of cat-file, and there are still corner cases (e.g., do we do cycle detection? Or just have a limit to the recursion depth?)
>
> I was pondering the "magical" above. I think what bugs me is that it seems like a feature that is implemented as part of one random bit of plumbing, but not available elsewhere.
>
> Conceptually, this is like peeling object names. You may give a tag name, but if you ask for a tree we will peel the tag to a commit, and the commit to a tree. This is sort of the same thing; you give a path within a tree, and we will peel until we hit a real non-symlink object.
>
> I don't know what the syntax would look like. To match foo^{tree} it would be something like:
>
>     HEAD:foo/bar^{resolve}
>
> or something like that. Except that it is a bad idea to allow ^{} syntax on the right-hand side of a colon, as it is ambiguous with filenames that contain ^{resolve}. So it would have to look something like:
>
>     HEAD^{resolve}:foo/bar
>
> which is a _little_ weird, but actually kind of makes sense. The resolve operation inherently is not just about the filename, but uses HEAD^{tree} as the root context.
>
> So I dunno. This pushes the resolving logic even _lower_ in the stack than it would be in cat-file. So why do I like it more? Cognitive dissonance? I guess the appeal to me is that it:
>
> 1. Makes the concept available more generally (you can rev-parse it, you can git show it, etc). It also lets you _name_ the object in question, so you can ask for other things besides its contents (like its name, its type, etc).
>
> 2. Positions it alongside other peeling name-resolution functions.

Just to clarify: if you do git rev-parse, and the result is an out-of-tree symlink, you see /foo or ../foo instead of a sha? And if you git show it it says symlink HEAD:../foo?

This seems totally reasonable to me, and solves my problem.
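For readers unfamiliar with the existing peeling operators that the proposed ^{resolve} is modelled on, here is a sketch of tag-to-commit-to-tree peeling in a throwaway repository. Only syntax git already has is used (`^{commit}`, `^{tree}`); `^{resolve}` itself is a proposal in this thread and is not exercised. The repository contents and tag name are made up.

```shell
#!/bin/sh
set -eu

repo=$(mktemp -d)
git init -q "$repo"
echo content > "$repo/dest"
git -C "$repo" add dest
git -C "$repo" -c user.name=demo -c user.email=demo@example.com commit -q -m demo
# An annotated tag is its own object, so there is something to peel.
git -C "$repo" -c user.name=demo -c user.email=demo@example.com tag -a -m v1 v1

tag_obj=$(git -C "$repo" rev-parse v1)
commit=$(git -C "$repo" rev-parse 'v1^{commit}')
tree=$(git -C "$repo" rev-parse 'v1^{tree}')

# Show the object type each spelling resolves to.
for name in v1 'v1^{commit}' 'v1^{tree}'; do
    printf '%s %s\n' "$(git -C "$repo" cat-file -t "$name")" "$name"
done
```

Because the peeling happens in name resolution, the same spelling works with rev-parse, cat-file, show and so on, which is the generality argued for in point 1 above.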