Re: [PATCH 0/5] Abide by our own rules regarding line endings

2017-05-08 Thread Johannes Schindelin
Hi Junio,

On Sun, 7 May 2017, Junio C Hamano wrote:

> Johannes Schindelin  writes:
> 
> > Let me repeat myself:
> 
> Don't.

I had to. You did not understand me.

> Instead, read through what you are responding to the end before start
> typing a byte.  In case you didn't do that, in the end I agree with the
> direction of the series ;-).

And I am still convinced that you do not understand me. At least on the
point that I tried to repeat in a different way, in an attempt to help you
to understand me.

What makes me so convinced that you do not understand me? The part of your
mail that you assumed I did not read. (Disclaimer: I did read it.)

Ciao,
Dscho


Re: [PATCH 0/5] Abide by our own rules regarding line endings

2017-05-07 Thread Junio C Hamano
Johannes Schindelin  writes:

> Let me repeat myself:

Don't. Instead, read through what you are responding to the end
before start typing a byte.  In case you didn't do that, in the end
I agree with the direction of the series ;-).


Re: [PATCH 0/5] Abide by our own rules regarding line endings

2017-05-04 Thread Johannes Schindelin
Hi Junio,

On Thu, 4 May 2017, Junio C Hamano wrote:

> Johannes Schindelin  writes:
> 
> > Well, a couple of comments about your comment:
> >
> > - we say "shell scripts", but we're sloppy there: they are "Unix shell
> >   scripts", as they are executed by Unix shells. As such, it is pretty
> >   obvious that they favor Unix line endings, right? And that they do not
> >   really handle anything else well, right?
> 
> Not really.  I would expect that
> 
>   cat A B C
> 
> with CRLF line endings (i.e. "cat A B C") on platforms whose
> eol convention is LF to barf due to the missing file whose name is
> "C", while on platforms whose eol convention is CRLF, I do
> expect the contents of file C comes at the end of the output.

Let me repeat myself: `cat` is a Unix utility. Unix line endings are
LF-only.

So when you say "platform whose eol convention is CRLF", you speak about a
platform that is not Unix.

If you want to run `cat` on Linux, you have to have a Unix-y environment.
Ergo: LF-only line endings.

The real problem arises only because I decided to ship Git for Windows
with a POSIX emulating environment to execute the Unix shell and Perl
scripts.

The real solution would have been to push harder for "builtinification",
but you know yourself how hard of an uphill effort that is, as the idea is
still prevalent that having a large part of Git be implemented as
shell/Perl scripts is not only okay, but even a Good Thing (and
portability is then a matter of making every contributor know what
constructs are portable and to avoid, say, Bashisms or GNU sed's options
that are incompatible with BSD sed's options).

Ciao,
Dscho


Re: [PATCH 0/5] Abide by our own rules regarding line endings

2017-05-04 Thread Junio C Hamano
Johannes Schindelin  writes:

> Well, a couple of comments about your comment:
>
> - we say "shell scripts", but we're sloppy there: they are "Unix shell
>   scripts", as they are executed by Unix shells. As such, it is pretty
>   obvious that they favor Unix line endings, right? And that they do not
>   really handle anything else well, right?

Not really.  I would expect that

cat A B C

with CRLF line endings (i.e. "cat A B C") on platforms whose
eol convention is LF to barf due to the missing file whose name is
"C", while on platforms whose eol convention is CRLF, I do
expect the contents of file C comes at the end of the output.

> - You try to say that it is not okay for shell scripts to be checked
>   out as LF-only when the platform convention for *text* files is CR/LF,
>   right? 

I didn't try to and I didn't say that, I hope.  I think it is not OK
for shell scripts to be checked out with  when the platform
convention for text is , though.  It may be possible for an
implementation on CRLF platform to tolerate missing  (i.e. a
file in LF convention--instead of treating a lone  as a non-eol
but just a regular "funny" byte, treat it as a normal eol), but that
would be a quality-of-implementation thing.  So when the platform
convention for text files is , I would have thought that it
was more natural to check out shell scripts as such.

However, I did not think things through when I said "I am not sure
if it is OK for shell scripts not to honor the platform convention,
though."  In a sense, the "cat A B C" example does not have to worry
about the platform convention issue very much.  In real scripts, we
have things like a string literal that spans lines (i.e. do we have
a LF, or a CRLF pair, in between these two lines?) and here
documents (ditto).  To handle these "correctly" while treating shell
scripts mere "text" files, a shell implementation on CRLF platform
has to accept CRLF, pretend as if it saw only LF, do all the
processing as if the eol convention were LF, and convert LF in the
output from the script to CRLF.  I think that is _too_ much to ask
for an implementation, and such an attempt would get corner cases
wrong.  IOW, I was not sure if it is OK for scripts not to honor the
platorm convention when I wrote the message you are responding to,
but I think it probably is not just OK but is the simplest/cleanest
for shell implementations not to honor the platform convention and
instead pretend that they always live in LF-only world.

And all of the above ultimately does not matter.  If the shell you
have on Windows cannot take CRLF files, then our shell script must
be checked out as LF files even on Windows.  My "I am not sure" was
mostly from not knowing how ingrained that "cannot take CRLF files"
was (i.e. I didn't know if there may be some knobs similar to
core.crlf we have that you can toggle for your shell to honor the
platform convention).




Re: [PATCH 0/5] Abide by our own rules regarding line endings

2017-05-03 Thread Junio C Hamano
Johannes Schindelin  writes:

> For starters, those files include shell scripts: the most prevalent
> shell interpreter in use (and certainly used in Git for Windows) is
> Bash, and Bash does not handle CR/LF line endings gracefully.

Good to know.  I am not sure if it is OK for shell scripts not to
honor the platform convention, though.

Stated from the opposite angle, I would not be surprised if your
shell scripts do not work on Linux if you set core.autocrlf to true.
Git may honor it, but shells on Linux (or BSD for that matter) do
not pay attention to core.autocrlf and they are within their rights
to complain on an extra CR at the end of the line.  IOW, I would
doubt that it should be our goal to set core.autocrlf on a platform
whose native line endings is LF and make the tests to pass.

> Related to shell scripts: when generating common-cmds.h, we use tools
> that generally operate on the assumption that input and output
> deliminate their lines using LF-only line endings. Consequently, they
> would happily copy the CR byte verbatim into the strings in
> common-cmds.h, which in turn makes the C preprocessor barf (that
> interprets them as MacOS-style line endings).

This indeed is a problem.  "add\r" is not a name of a common
command, obviously, regardless of how the text file that lists the
names of the commands is encoded.  I am undecided if it is a problem
in the source text (i.e. command-list.txt is not a platform neutral
"text" but has to be encoded with LF endings) or the bug in the
tools used in the generate-cmdlist.sh script, though.  Shouldn't the
tools be aware of the platform convention of what text files are and
how their eol looks like?




Re: [PATCH 0/5] Abide by our own rules regarding line endings

2017-05-03 Thread Johannes Schindelin
Hi Jonathan,

On Tue, 2 May 2017, Jonathan Nieder wrote:

> Johannes Schindelin wrote:
> 
> > Over the past decade, there have been a couple of attempts to remedy the
> [...]
> 
> I'm intentionally skimming this cover letter, since anything important
> it says needs to also be in the commit messages.

Sure, makes sense. I tried to do that, too, before sending out my patch
series.

> [...]
> > Without these fixes, Git will fail to build and pass the test suite, as
> > can be verified even on Linux using this cadence:
> >
> > git config core.autocrlf true
> > rm .git/index && git stash
> > make DEVELOPER=1 -j15 test
> 
> This should go in a commit message (or perhaps in all of them).

Hmm, okay. I wanted to keep it out of them, as commit messages are (in my
mind) more about the why?, and a little less about the how? when not
obvious from the diff.

I added a variation to the first patch (because the tests would still
fail, I skipped the `test` from the `make` invocation) and the unmodified
cadence to the "Fix the remaining tests..." patch.

> [...]
> > Johannes Schindelin (5):
> >   Fix build with core.autocrlf=true
> >   git-new-workdir: mark script as LF-only
> >   completion: mark bash script as LF-only
> >   Fix the remaining tests that failed with core.autocrlf=true
> >   t4051: mark supporting files as requiring LF-only line endings
> 
> With or without that change,
> Reviewed-by: Jonathan Nieder 

Thanks.

I added that footer to the patches (not to the two new ones, though).

> The t/README bit in patch 4/5 is sad (I want to be able to open
> t/README using an old version of Notepad) but changing that test to
> use another file seems out of scope for what this series does.

Yes, it is sad. Maybe I should list the tests that do this (use files
outside any t/ directory):

t4003-diff-rename-1.sh (uses t/diff-lib/COPYING)
t4005-diff-rename-2.sh (uses t/diff-lib/COPYING)
t4007-rename-3.sh (uses t/diff-lib/COPYING)
t4008-diff-break-rewrite.sh (uses t/diff-lib/README and t/diff-lib/COPYING)
t4009-diff-rename-4.sh (uses t/diff-lib/COPYING)
t4022-diff-rewrite.sh (uses COPYING)
t4023-diff-rename-typechange.sh (uses COPYING)
t7001-mv.sh (uses README.md (!!!) and COPYING)
t7060-wtstatus.sh (uses t/README)
t7101-reset-empty-subdirs.sh (uses COPYING)

Note most of these tests may *use* those files, but do *not* assume that
they have Unix line endings! Only a couple test compare SHA-1s to
hardcoded values (which, if you ask me, is a bit fragile, given that files
outside the tests' control are used).

Interesting side note: t0022-crlf-rename.sh copies *itself* to the trash*
directory where it is then used to perform tests. So while this test uses
"an outside file", that file happens to be a .sh file which we already
have to mark LF-only for different reasons (Bash likes its input
CR/LF-free).

Another interesting side note: the convention appears to dictate that
supporting files should be either generated in the test script itself, or
be committed into t/t/ directories (where  matches the respective
test script's number, or reuses another test script's supporting files). A
notable exception is t3901 which has the supporting files t3901-8859-1.txt
and t3901-utf8.txt. I would wageer that this is just a remnant of ancient
times before the current convention, judging by the date of the commit
that added these files: a731ec5eb82 (t3901: test "format-patch | am" pipe
with i18n, 2007-01-13). The scripts t0203-gettext-setlocale-sanity.sh,
t9350-fast-export.sh and t9500-gitweb-standalone-no-errors.sh merely reuse
those files, and when those scripts started using those files, they did
not change their location. I made this move a preparatory step in this
patch series.

Back to the question what to do about those tests that make improper
assumptions about line endings of files outside the tests' realm: I think
I can do this more cleverly, as it would appear that only four test
scripts make hard assumptions about the exact byte sequence of the COPYING
file, and I simply turned those `cp` (and even `cat`!) calls into `tr -d
"\015"` calls.

I will Cc: you on v2.

Ciao,
Dscho


Re: [PATCH 0/5] Abide by our own rules regarding line endings

2017-05-02 Thread Jonathan Nieder
Hi,

Johannes Schindelin wrote:

> Over the past decade, there have been a couple of attempts to remedy the
[...]

I'm intentionally skimming this cover letter, since anything important
it says needs to also be in the commit messages.

[...]
> Without these fixes, Git will fail to build and pass the test suite, as
> can be verified even on Linux using this cadence:
>
>   git config core.autocrlf true
>   rm .git/index && git stash
>   make DEVELOPER=1 -j15 test

This should go in a commit message (or perhaps in all of them).

[...]
> Johannes Schindelin (5):
>   Fix build with core.autocrlf=true
>   git-new-workdir: mark script as LF-only
>   completion: mark bash script as LF-only
>   Fix the remaining tests that failed with core.autocrlf=true
>   t4051: mark supporting files as requiring LF-only line endings

With or without that change,
Reviewed-by: Jonathan Nieder 

The t/README bit in patch 4/5 is sad (I want to be able to open
t/README using an old version of Notepad) but changing that test to
use another file seems out of scope for what this series does.

Thanks,
Jonathan


[PATCH 0/5] Abide by our own rules regarding line endings

2017-05-02 Thread Johannes Schindelin
Over the past decade, there have been a couple of attempts to remedy the
situation regarding line endings (Windows/DOS line endings are
traditionally CR/LF even if many current applications can handle LF,
too, Linux/MacOSX/*BSD/Unix uses LF-only line handlings, and old MacOS
software used to use CR-only line endings).

The current idea seems to be that the default line endings differ
depending on the platform, and for files that should be exempt from that
default, we strongly recommend using .gitattributes to define what the
source code requires.

It is time to heed our own recommendation and mark the files that
require LF-only line endings in our very own .gitattributes.

For starters, those files include shell scripts: the most prevalent
shell interpreter in use (and certainly used in Git for Windows) is
Bash, and Bash does not handle CR/LF line endings gracefully.

There are even two shell scripts that are used in the test suite even if
they are not technically supposed to be part of core Git, as indicated
by their habitat inside contrib/: git-new-workdir and
git-completion.bash.

Related to shell scripts: when generating common-cmds.h, we use tools
that generally operate on the assumption that input and output
deliminate their lines using LF-only line endings. Consequently, they
would happily copy the CR byte verbatim into the strings in
common-cmds.h, which in turn makes the C preprocessor barf (that
interprets them as MacOS-style line endings).

Further, the most obvious required fixes concern tests' support files
committed verbatim, to be compared to Git's output, which is always
LF-only.

There are a few SVN dump files, too, supporting the Subversion-related
tests, requiring LF-only line endings.

Finally, the test suite makes use of text files that are not obviously
supporting tests, such as t/README, comparing them to LF-only versions
(and consequently failing if the files have been checked out with CR/LF
line endings). Therefore we need to mark those as LF-only in the
.gitattributes, too.

Without these fixes, Git will fail to build and pass the test suite, as
can be verified even on Linux using this cadence:

git config core.autocrlf true
rm .git/index && git stash
make DEVELOPER=1 -j15 test

Note: I separated out the change marking t/t4051/* as LF-only into an
individual commit for one reason: it would appear that the test passes
if checked out using core.autocrlf=true on Linux, but on Windows the
test fails. In that respect, this test is special, as the other changes
can be easily validated even without using Windows.


Johannes Schindelin (5):
  Fix build with core.autocrlf=true
  git-new-workdir: mark script as LF-only
  completion: mark bash script as LF-only
  Fix the remaining tests that failed with core.autocrlf=true
  t4051: mark supporting files as requiring LF-only line endings

 .gitattributes|  8 +++-
 contrib/completion/.gitattributes |  1 +
 contrib/workdir/.gitattributes|  1 +
 git-gui/.gitattributes|  1 +
 t/.gitattributes  | 29 -
 5 files changed, 38 insertions(+), 2 deletions(-)
 create mode 100644 contrib/completion/.gitattributes
 create mode 100644 contrib/workdir/.gitattributes


base-commit: d2bbb7c2bcf6e77ebfcabf4e12110fe6d5c91de6
Published-As: https://github.com/dscho/git/releases/tag/lf-attrs-v1
Fetch-It-Via: git fetch https://github.com/dscho/git lf-attrs-v1

-- 
2.12.2.windows.2.800.gede8f145e06