Re: git smudge filter fails
Stephen Morton writes:

> It's perhaps beyond the scope of my original question, but for
> situations where I need a "last change date" embedded in a file (e.g.
> because a protocol standard requires it), is there any recommended way
> to do so? We've learned the hard way that hardcoding makes
> merging/cherry-picking a bit of a nightmare and should be avoided.

Does that "last change date" have to be embedded in a file with other
stuff in there, or can it be a standalone file by itself (which may be
used by other things via linking or inclusion)?

If it can be a standalone file, a custom ll-merge driver that knows
what your datestring looks like and takes the later of the versions in
the two branches being merged would not be too hard to write to
eliminate the "nightmare", I would think.

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
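The standalone-file idea above could be sketched roughly as follows. This is only an illustration under assumptions that are not in the thread: the file holds a single timestamp in UTC ISO 8601 form (e.g. 2016-03-10T22:04:00Z), so that plain string order matches chronological order, and the script name later-date.sh is hypothetical. Git runs a merge driver with the ancestor, current, and other versions (%O, %A, %B) and expects the merged result to be left in the %A file, with exit status 0 for a clean merge.

```shell
#!/bin/sh
# later-date.sh (hypothetical name): merge driver that keeps the later of
# two date stamps. Assumes each file holds one UTC ISO 8601 timestamp such
# as 2016-03-10T22:04:00Z, so string order equals chronological order.

pick_later_date() {
    # $1 = our version (%A, also receives the result)
    # $2 = their version (%B)
    ours=$(cat "$1")
    theirs=$(cat "$2")
    # Under the format assumption above, sorting puts the later stamp last.
    later=$(printf '%s\n%s\n' "$ours" "$theirs" | LC_ALL=C sort | tail -n 1)
    printf '%s\n' "$later" > "$1"
}

# Git calls the driver as: later-date.sh %O %A %B
# The ancestor (%O) is not needed for a "take the later one" policy.
if [ "$#" -eq 3 ]; then
    pick_later_date "$2" "$3"
fi
```

To try it, the driver would be registered with something like git config merge.laterdate.driver './later-date.sh %O %A %B' and attached to the file with a line such as 'LAST_CHANGE merge=laterdate' in .gitattributes; the driver name "laterdate" and the file name LAST_CHANGE are placeholders. A date format that carries timezone offsets would need a real date comparison instead of a string sort.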
Re: git smudge filter fails
On Thu, Mar 10, 2016 at 5:04 PM, Junio C Hamano wrote:
> Jeff King writes:
>
>> On Thu, Mar 10, 2016 at 09:45:19AM -0500, Stephen Morton wrote:
>>
>>> I am a bit confused because this is basically the example used in
>>> ProGit [1] and it is fundamentally broken. In fact, if I understand
>>> correctly, this means that smudge filters cannot be relied upon to
>>> provide any 'keyword expansion' type tasks because they will all by
>>> nature have to query the file with 'git log'.
>>
>> Interesting. Perhaps I am missing something (I am far from an expert in
>> clean/smudge filters, which I do not generally use myself), but the
>> example in ProGit looks kind of bogus to me. I don't think it ever would
>> have worked reliably, under any version of git.
>>
>>> (Note that although in my example I used 'git checkout', with an only
>>> slightly more complicated example I can make it fail on 'git pull'
>>> which is perhaps a much more realistic use case. That was probably
>>> implied in your answer, I just wanted to mention it.)
>>
>> Yeah, I think the issue is basically the same for several commands which
>> update the worktree and the HEAD. Most of them are going to do the
>> worktree first.
>
> You can have a pair of branches A and B that have forked a long time
> ago, and have a path F that has been changed identically since they
> forked, perhaps by cherry-picking the same change. This happens all
> the time.
>
> If there were some mechanism that modifies the checked out version
> of F with information that depends on the history that leads to A
> (e.g. "which commit that is an ancestor of A last modified F?")
> when you check out branch A, it will become invalid when you do "git
> checkout B", because the path F will not change because they are the
> same between the branches. In short, CVS $Id$-style substitutions
> that depend on the history fundamentally do not work, unless you
> are willing to always rewrite the whole working tree every time you
> switch branches.
>
> The smudge and clean filters are given _only_ the blob contents and
> nothing else, not "which commit (or tree) the blob is taken from",
> not "which path this blob sits in that tree-ish", not "what branch
> am I on" and this is a very much deliberate design decision made in
> order to avoid leading people to a misguided attempt to mimic CVS
> $Id$-style substitutions.
>

I will raise an Issue with ProGit.

It's perhaps beyond the scope of my original question, but for
situations where I need a "last change date" embedded in a file (e.g.
because a protocol standard requires it), is there any recommended way
to do so? We've learned the hard way that hardcoding makes
merging/cherry-picking a bit of a nightmare and should be avoided. Is
a post-checkout hook the way to go? I've actually found the smudge
filter to be very slow for this application as each file is processed
in series; a post-commit hook that could operate on files in parallel
would likely be substantially faster.

Stephen

(Sorry about the earlier top-posting. I didn't realize what gmail was
doing until after it had happened.)
Re: git smudge filter fails
Jeff King writes:

> On Thu, Mar 10, 2016 at 09:45:19AM -0500, Stephen Morton wrote:
>
>> I am a bit confused because this is basically the example used in
>> ProGit [1] and it is fundamentally broken. In fact, if I understand
>> correctly, this means that smudge filters cannot be relied upon to
>> provide any 'keyword expansion' type tasks because they will all by
>> nature have to query the file with 'git log'.
>
> Interesting. Perhaps I am missing something (I am far from an expert in
> clean/smudge filters, which I do not generally use myself), but the
> example in ProGit looks kind of bogus to me. I don't think it ever would
> have worked reliably, under any version of git.
>
>> (Note that although in my example I used 'git checkout', with an only
>> slightly more complicated example I can make it fail on 'git pull'
>> which is perhaps a much more realistic use case. That was probably
>> implied in your answer, I just wanted to mention it.)
>
> Yeah, I think the issue is basically the same for several commands which
> update the worktree and the HEAD. Most of them are going to do the
> worktree first.

You can have a pair of branches A and B that have forked a long time
ago, and have a path F that has been changed identically since they
forked, perhaps by cherry-picking the same change. This happens all
the time.

If there were some mechanism that modifies the checked out version
of F with information that depends on the history that leads to A
(e.g. "which commit that is an ancestor of A last modified F?")
when you check out branch A, it will become invalid when you do "git
checkout B", because the path F will not change because they are the
same between the branches. In short, CVS $Id$-style substitutions
that depend on the history fundamentally do not work, unless you
are willing to always rewrite the whole working tree every time you
switch branches.

The smudge and clean filters are given _only_ the blob contents and
nothing else, not "which commit (or tree) the blob is taken from",
not "which path this blob sits in that tree-ish", not "what branch
am I on" and this is a very much deliberate design decision made in
order to avoid leading people to a misguided attempt to mimic CVS
$Id$-style substitutions.
Re: git smudge filter fails
On Thu, Mar 10, 2016 at 09:45:19AM -0500, Stephen Morton wrote:

> I am a bit confused because this is basically the example used in
> ProGit [1] and it is fundamentally broken. In fact, if I understand
> correctly, this means that smudge filters cannot be relied upon to
> provide any 'keyword expansion' type tasks because they will all by
> nature have to query the file with 'git log'.

Interesting. Perhaps I am missing something (I am far from an expert in
clean/smudge filters, which I do not generally use myself), but the
example in ProGit looks kind of bogus to me. I don't think it ever would
have worked reliably, under any version of git.

> (Note that although in my example I used 'git checkout', with an only
> slightly more complicated example I can make it fail on 'git pull'
> which is perhaps a much more realistic use case. That was probably
> implied in your answer, I just wanted to mention it.)

Yeah, I think the issue is basically the same for several commands which
update the worktree and the HEAD. Most of them are going to do the
worktree first.

-Peff
Re: git smudge filter fails
I am a bit confused because this is basically the example used in
ProGit [1] and it is fundamentally broken. In fact, if I understand
correctly, this means that smudge filters cannot be relied upon to
provide any 'keyword expansion' type tasks because they will all by
nature have to query the file with 'git log'.

(Note that although in my example I used 'git checkout', with an only
slightly more complicated example I can make it fail on 'git pull'
which is perhaps a much more realistic use case. That was probably
implied in your answer, I just wanted to mention it.)

Steve

[1] https://git-scm.com/book/en/v2/Customizing-Git-Git-Attributes

On Wed, Mar 9, 2016 at 8:59 PM, Jeff King wrote:
> On Wed, Mar 09, 2016 at 01:29:31PM -0500, Stephen Morton wrote:
>
>> git config --local filter.dater.smudge 'myDate=`git log
>> --pretty=format:"%cd" --date=iso -1 -- %f`; sed -e
>> "s/\(\\$\)Date[^\\$]*\\$/\1Date: $myDate \\$/g"'
>
> Your filter is running "git log" without a revision parameter, which
> means it is looking at HEAD.
>
> And here
>
>> git checkout no_foo
>> git checkout master
>> cat foo.c
>> #observe keyword expansion lost
>
> You are expecting this second one to do:
>
>   1. Switch HEAD to "master".
>
>   2. Checkout files which need updating. Looking at HEAD in your filter
>      then examines "master", and you see the commit timestamp of the
>      destination.
>
> But that isn't how it is implemented. Checkout will handle the file
> checkout _first_, as that is the part that is likely to run into
> problems (e.g., rejecting a switch because it would lose changes in the
> working tree). Only at the very end, after the change to the working
> tree has succeeded, do we update HEAD.
>
> I think the order you are expecting is conceptually cleaner, but I don't
> think we would want to switch it around (for reasons of efficiency and
> robustness). And I don't think we would want to make a promise about the
> ordering to callers either way, as it binds our implementation.
>
> So is there a way to reliably know the destination of a checkout? My
> first thought was that we could add a placeholder similar to "%f" that
> your filter could use. I'm not sure how we would handle the corner cases
> there, though (e.g., do we always have a "destination" to report? If
> not, what do we give the script?).
>
> I suspect you could also hack something together with a post-checkout
> script, though it would probably be a lot less efficient (and might also
> have some weird corner cases).
>
> -Peff
Re: git smudge filter fails
On Wed, Mar 09, 2016 at 01:29:31PM -0500, Stephen Morton wrote:

> git config --local filter.dater.smudge 'myDate=`git log
> --pretty=format:"%cd" --date=iso -1 -- %f`; sed -e
> "s/\(\\$\)Date[^\\$]*\\$/\1Date: $myDate \\$/g"'

Your filter is running "git log" without a revision parameter, which
means it is looking at HEAD.

And here

> git checkout no_foo
> git checkout master
> cat foo.c
> #observe keyword expansion lost

You are expecting this second one to do:

  1. Switch HEAD to "master".

  2. Checkout files which need updating. Looking at HEAD in your filter
     then examines "master", and you see the commit timestamp of the
     destination.

But that isn't how it is implemented. Checkout will handle the file
checkout _first_, as that is the part that is likely to run into
problems (e.g., rejecting a switch because it would lose changes in the
working tree). Only at the very end, after the change to the working
tree has succeeded, do we update HEAD.

I think the order you are expecting is conceptually cleaner, but I don't
think we would want to switch it around (for reasons of efficiency and
robustness). And I don't think we would want to make a promise about the
ordering to callers either way, as it binds our implementation.

So is there a way to reliably know the destination of a checkout? My
first thought was that we could add a placeholder similar to "%f" that
your filter could use. I'm not sure how we would handle the corner cases
there, though (e.g., do we always have a "destination" to report? If
not, what do we give the script?).

I suspect you could also hack something together with a post-checkout
script, though it would probably be a lot less efficient (and might also
have some weird corner cases).

-Peff
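The post-checkout idea mentioned at the end of the message above might look something like the sketch below. It is not from the thread and has not been tuned: the helper names expand_date and refresh_dates are made up for illustration, the "dater" attribute name is borrowed from the original recipe, and the date string is assumed not to contain "/", "&" or "\". Git passes the hook the previous HEAD, the new HEAD, and a flag that is 1 for a branch checkout. Because the hook names the new commit explicitly, it does not depend on when HEAD is updated, but it has to visit every filtered path on each switch, which is the inefficiency warned about above.

```shell
#!/bin/sh
# .git/hooks/post-checkout sketch. Git passes: $1 = previous HEAD,
# $2 = new HEAD, $3 = 1 for a branch checkout (0 for a file checkout).

expand_date() {
    # $1 = file to rewrite in place, $2 = date string to embed
    tmp="$1.tmp.$$"
    # Replace $Date$ or an already expanded $Date: ... $ with the new stamp.
    sed -e 's/\$Date[^$]*\$/$Date: '"$2"' $/g' "$1" > "$tmp" &&
        cat "$tmp" > "$1" &&   # write back through cat to keep the file mode
        rm -f "$tmp"
}

refresh_dates() {
    # $1 = commit that was just checked out
    git ls-files | while IFS= read -r path; do
        # Only touch paths that carry the "dater" filter attribute.
        case $(git check-attr filter -- "$path") in
            *": filter: dater") ;;
            *) continue ;;
        esac
        # Name the destination commit explicitly instead of relying on HEAD.
        when=$(git log -1 --pretty=format:%cd --date=iso "$1" -- "$path")
        if [ -n "$when" ]; then
            expand_date "$path" "$when"
        fi
    done
}

if [ "${3:-0}" = 1 ]; then
    refresh_dates "$2"
fi
```

This assumes the clean filter from the original recipe stays configured, so that the expanded text is collapsed again before it reaches the index. Commands such as 'git pull' that merge rather than check out would presumably need the same logic in a post-merge hook as well.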
git smudge filter fails
A git smudge filter, at least one that relies on the results from 'git
log' does not seem to work on file A when doing a 'git update' from a
revision where file A doesn't exist to a revision where it does exist.

Below is a simple recipe to reproduce. This appears to me to be a bug.
If not, why is it expected and is there anything I can do to work
around this behaviour?

Steve

mkdir git_test
cd git_test/
git init .
touch bar.c
git add .
git commit -am "Initial commit. foo.c not here yet."
git tag no_foo
touch foo.c
git add .
git commit -am "Add foo, no content"
echo 'Date is $Date$' >> foo.c
git commit -am "Add date to foo.c"
echo 'foo.c filter=dater' > .git/info/attributes
git config --local filter.dater.smudge 'myDate=`git log --pretty=format:"%cd" --date=iso -1 -- %f`; sed -e "s/\(\\$\)Date[^\\$]*\\$/\1Date: $myDate \\$/g"'
git config --local filter.dater.clean 'sed -e "s/\(\\$\)Date[^\\$]*\\$/\1Date\\$/g"'
rm -f foo.c
git checkout -- foo.c
cat foo.c
# observe keyword expansion
git checkout no_foo
git checkout master
cat foo.c
#observe keyword expansion lost