Re: [PATCH 0/5] Abide by our own rules regarding line endings
Hi Junio, On Sun, 7 May 2017, Junio C Hamano wrote: > Johannes Schindelinwrites: > > > Let me repeat myself: > > Don't. I had to. You did not understand me. > Instead, read through what you are responding to the end before start > typing a byte. In case you didn't do that, in the end I agree with the > direction of the series ;-). And I am still convinced that you do not understand me. At least on the point that I tried to repeat in a different way, in an attempt to help you to understand me. What makes me so convinced that you do not understand me? The part of your mail that you assumed I did not read. (Disclaimer: I did read it.) Ciao, Dscho
Re: [PATCH 0/5] Abide by our own rules regarding line endings
Johannes Schindelinwrites: > Let me repeat myself: Don't. Instead, read through what you are responding to the end before start typing a byte. In case you didn't do that, in the end I agree with the direction of the series ;-).
Re: [PATCH 0/5] Abide by our own rules regarding line endings
Hi Junio, On Thu, 4 May 2017, Junio C Hamano wrote: > Johannes Schindelinwrites: > > > Well, a couple of comments about your comment: > > > > - we say "shell scripts", but we're sloppy there: they are "Unix shell > > scripts", as they are executed by Unix shells. As such, it is pretty > > obvious that they favor Unix line endings, right? And that they do not > > really handle anything else well, right? > > Not really. I would expect that > > cat A B C > > with CRLF line endings (i.e. "cat A B C") on platforms whose > eol convention is LF to barf due to the missing file whose name is > "C", while on platforms whose eol convention is CRLF, I do > expect the contents of file C comes at the end of the output. Let me repeat myself: `cat` is a Unix utility. Unix line endings are LF-only. So when you say "platform whose eol convention is CRLF", you speak about a platform that is not Unix. If you want to run `cat` on Linux, you have to have a Unix-y environment. Ergo: LF-only line endings. The real problem arises only because I decided to ship Git for Windows with a POSIX emulating environment to execute the Unix shell and Perl scripts. The real solution would have been to push harder for "builtinification", but you know yourself how hard of an uphill effort that is, as the idea is still prevalent that having a large part of Git be implemented as shell/Perl scripts is not only okay, but even a Good Thing (and portability is then a matter of making every contributor know what constructs are portable and to avoid, say, Bashisms or GNU sed's options that are incompatible with BSD sed's options). Ciao, Dscho
Re: [PATCH 0/5] Abide by our own rules regarding line endings
Johannes Schindelinwrites: > Well, a couple of comments about your comment: > > - we say "shell scripts", but we're sloppy there: they are "Unix shell > scripts", as they are executed by Unix shells. As such, it is pretty > obvious that they favor Unix line endings, right? And that they do not > really handle anything else well, right? Not really. I would expect that cat A B C with CRLF line endings (i.e. "cat A B C") on platforms whose eol convention is LF to barf due to the missing file whose name is "C", while on platforms whose eol convention is CRLF, I do expect the contents of file C comes at the end of the output. > - You try to say that it is not okay for shell scripts to be checked > out as LF-only when the platform convention for *text* files is CR/LF, > right? I didn't try to and I didn't say that, I hope. I think it is not OK for shell scripts to be checked out with when the platform convention for text is , though. It may be possible for an implementation on CRLF platform to tolerate missing (i.e. a file in LF convention--instead of treating a lone as a non-eol but just a regular "funny" byte, treat it as a normal eol), but that would be a quality-of-implementation thing. So when the platform convention for text files is , I would have thought that it was more natural to check out shell scripts as such. However, I did not think things through when I said "I am not sure if it is OK for shell scripts not to honor the platform convention, though." In a sense, the "cat A B C" example does not have to worry about the platform convention issue very much. In real scripts, we have things like a string literal that spans lines (i.e. do we have a LF, or a CRLF pair, in between these two lines?) and here documents (ditto). To handle these "correctly" while treating shell scripts mere "text" files, a shell implementation on CRLF platform has to accept CRLF, pretend as if it saw only LF, do all the processing as if the eol convention were LF, and convert LF in the output from the script to CRLF. I think that is _too_ much to ask for an implementation, and such an attempt would get corner cases wrong. IOW, I was not sure if it is OK for scripts not to honor the platorm convention when I wrote the message you are responding to, but I think it probably is not just OK but is the simplest/cleanest for shell implementations not to honor the platform convention and instead pretend that they always live in LF-only world. And all of the above ultimately does not matter. If the shell you have on Windows cannot take CRLF files, then our shell script must be checked out as LF files even on Windows. My "I am not sure" was mostly from not knowing how ingrained that "cannot take CRLF files" was (i.e. I didn't know if there may be some knobs similar to core.crlf we have that you can toggle for your shell to honor the platform convention).
Re: [PATCH 0/5] Abide by our own rules regarding line endings
Johannes Schindelinwrites: > For starters, those files include shell scripts: the most prevalent > shell interpreter in use (and certainly used in Git for Windows) is > Bash, and Bash does not handle CR/LF line endings gracefully. Good to know. I am not sure if it is OK for shell scripts not to honor the platform convention, though. Stated from the opposite angle, I would not be surprised if your shell scripts do not work on Linux if you set core.autocrlf to true. Git may honor it, but shells on Linux (or BSD for that matter) do not pay attention to core.autocrlf and they are within their rights to complain on an extra CR at the end of the line. IOW, I would doubt that it should be our goal to set core.autocrlf on a platform whose native line endings is LF and make the tests to pass. > Related to shell scripts: when generating common-cmds.h, we use tools > that generally operate on the assumption that input and output > deliminate their lines using LF-only line endings. Consequently, they > would happily copy the CR byte verbatim into the strings in > common-cmds.h, which in turn makes the C preprocessor barf (that > interprets them as MacOS-style line endings). This indeed is a problem. "add\r" is not a name of a common command, obviously, regardless of how the text file that lists the names of the commands is encoded. I am undecided if it is a problem in the source text (i.e. command-list.txt is not a platform neutral "text" but has to be encoded with LF endings) or the bug in the tools used in the generate-cmdlist.sh script, though. Shouldn't the tools be aware of the platform convention of what text files are and how their eol looks like?
Re: [PATCH 0/5] Abide by our own rules regarding line endings
Hi Jonathan, On Tue, 2 May 2017, Jonathan Nieder wrote: > Johannes Schindelin wrote: > > > Over the past decade, there have been a couple of attempts to remedy the > [...] > > I'm intentionally skimming this cover letter, since anything important > it says needs to also be in the commit messages. Sure, makes sense. I tried to do that, too, before sending out my patch series. > [...] > > Without these fixes, Git will fail to build and pass the test suite, as > > can be verified even on Linux using this cadence: > > > > git config core.autocrlf true > > rm .git/index && git stash > > make DEVELOPER=1 -j15 test > > This should go in a commit message (or perhaps in all of them). Hmm, okay. I wanted to keep it out of them, as commit messages are (in my mind) more about the why?, and a little less about the how? when not obvious from the diff. I added a variation to the first patch (because the tests would still fail, I skipped the `test` from the `make` invocation) and the unmodified cadence to the "Fix the remaining tests..." patch. > [...] > > Johannes Schindelin (5): > > Fix build with core.autocrlf=true > > git-new-workdir: mark script as LF-only > > completion: mark bash script as LF-only > > Fix the remaining tests that failed with core.autocrlf=true > > t4051: mark supporting files as requiring LF-only line endings > > With or without that change, > Reviewed-by: Jonathan NiederThanks. I added that footer to the patches (not to the two new ones, though). > The t/README bit in patch 4/5 is sad (I want to be able to open > t/README using an old version of Notepad) but changing that test to > use another file seems out of scope for what this series does. Yes, it is sad. Maybe I should list the tests that do this (use files outside any t/ directory): t4003-diff-rename-1.sh (uses t/diff-lib/COPYING) t4005-diff-rename-2.sh (uses t/diff-lib/COPYING) t4007-rename-3.sh (uses t/diff-lib/COPYING) t4008-diff-break-rewrite.sh (uses t/diff-lib/README and t/diff-lib/COPYING) t4009-diff-rename-4.sh (uses t/diff-lib/COPYING) t4022-diff-rewrite.sh (uses COPYING) t4023-diff-rename-typechange.sh (uses COPYING) t7001-mv.sh (uses README.md (!!!) and COPYING) t7060-wtstatus.sh (uses t/README) t7101-reset-empty-subdirs.sh (uses COPYING) Note most of these tests may *use* those files, but do *not* assume that they have Unix line endings! Only a couple test compare SHA-1s to hardcoded values (which, if you ask me, is a bit fragile, given that files outside the tests' control are used). Interesting side note: t0022-crlf-rename.sh copies *itself* to the trash* directory where it is then used to perform tests. So while this test uses "an outside file", that file happens to be a .sh file which we already have to mark LF-only for different reasons (Bash likes its input CR/LF-free). Another interesting side note: the convention appears to dictate that supporting files should be either generated in the test script itself, or be committed into t/t/ directories (where matches the respective test script's number, or reuses another test script's supporting files). A notable exception is t3901 which has the supporting files t3901-8859-1.txt and t3901-utf8.txt. I would wageer that this is just a remnant of ancient times before the current convention, judging by the date of the commit that added these files: a731ec5eb82 (t3901: test "format-patch | am" pipe with i18n, 2007-01-13). The scripts t0203-gettext-setlocale-sanity.sh, t9350-fast-export.sh and t9500-gitweb-standalone-no-errors.sh merely reuse those files, and when those scripts started using those files, they did not change their location. I made this move a preparatory step in this patch series. Back to the question what to do about those tests that make improper assumptions about line endings of files outside the tests' realm: I think I can do this more cleverly, as it would appear that only four test scripts make hard assumptions about the exact byte sequence of the COPYING file, and I simply turned those `cp` (and even `cat`!) calls into `tr -d "\015"` calls. I will Cc: you on v2. Ciao, Dscho
Re: [PATCH 0/5] Abide by our own rules regarding line endings
Hi, Johannes Schindelin wrote: > Over the past decade, there have been a couple of attempts to remedy the [...] I'm intentionally skimming this cover letter, since anything important it says needs to also be in the commit messages. [...] > Without these fixes, Git will fail to build and pass the test suite, as > can be verified even on Linux using this cadence: > > git config core.autocrlf true > rm .git/index && git stash > make DEVELOPER=1 -j15 test This should go in a commit message (or perhaps in all of them). [...] > Johannes Schindelin (5): > Fix build with core.autocrlf=true > git-new-workdir: mark script as LF-only > completion: mark bash script as LF-only > Fix the remaining tests that failed with core.autocrlf=true > t4051: mark supporting files as requiring LF-only line endings With or without that change, Reviewed-by: Jonathan NiederThe t/README bit in patch 4/5 is sad (I want to be able to open t/README using an old version of Notepad) but changing that test to use another file seems out of scope for what this series does. Thanks, Jonathan
[PATCH 0/5] Abide by our own rules regarding line endings
Over the past decade, there have been a couple of attempts to remedy the situation regarding line endings (Windows/DOS line endings are traditionally CR/LF even if many current applications can handle LF, too, Linux/MacOSX/*BSD/Unix uses LF-only line handlings, and old MacOS software used to use CR-only line endings). The current idea seems to be that the default line endings differ depending on the platform, and for files that should be exempt from that default, we strongly recommend using .gitattributes to define what the source code requires. It is time to heed our own recommendation and mark the files that require LF-only line endings in our very own .gitattributes. For starters, those files include shell scripts: the most prevalent shell interpreter in use (and certainly used in Git for Windows) is Bash, and Bash does not handle CR/LF line endings gracefully. There are even two shell scripts that are used in the test suite even if they are not technically supposed to be part of core Git, as indicated by their habitat inside contrib/: git-new-workdir and git-completion.bash. Related to shell scripts: when generating common-cmds.h, we use tools that generally operate on the assumption that input and output deliminate their lines using LF-only line endings. Consequently, they would happily copy the CR byte verbatim into the strings in common-cmds.h, which in turn makes the C preprocessor barf (that interprets them as MacOS-style line endings). Further, the most obvious required fixes concern tests' support files committed verbatim, to be compared to Git's output, which is always LF-only. There are a few SVN dump files, too, supporting the Subversion-related tests, requiring LF-only line endings. Finally, the test suite makes use of text files that are not obviously supporting tests, such as t/README, comparing them to LF-only versions (and consequently failing if the files have been checked out with CR/LF line endings). Therefore we need to mark those as LF-only in the .gitattributes, too. Without these fixes, Git will fail to build and pass the test suite, as can be verified even on Linux using this cadence: git config core.autocrlf true rm .git/index && git stash make DEVELOPER=1 -j15 test Note: I separated out the change marking t/t4051/* as LF-only into an individual commit for one reason: it would appear that the test passes if checked out using core.autocrlf=true on Linux, but on Windows the test fails. In that respect, this test is special, as the other changes can be easily validated even without using Windows. Johannes Schindelin (5): Fix build with core.autocrlf=true git-new-workdir: mark script as LF-only completion: mark bash script as LF-only Fix the remaining tests that failed with core.autocrlf=true t4051: mark supporting files as requiring LF-only line endings .gitattributes| 8 +++- contrib/completion/.gitattributes | 1 + contrib/workdir/.gitattributes| 1 + git-gui/.gitattributes| 1 + t/.gitattributes | 29 - 5 files changed, 38 insertions(+), 2 deletions(-) create mode 100644 contrib/completion/.gitattributes create mode 100644 contrib/workdir/.gitattributes base-commit: d2bbb7c2bcf6e77ebfcabf4e12110fe6d5c91de6 Published-As: https://github.com/dscho/git/releases/tag/lf-attrs-v1 Fetch-It-Via: git fetch https://github.com/dscho/git lf-attrs-v1 -- 2.12.2.windows.2.800.gede8f145e06