Re: [PATCH 0/2] Re* Consequences of CRLF in index?

2017-10-31 Thread Jeff King
On Tue, Oct 31, 2017 at 09:41:25AM -0700, Stefan Beller wrote:

> On Mon, Oct 30, 2017 at 7:44 PM, Junio C Hamano  wrote:
> > Stefan Beller  writes:
> >
> >> (I note this as you regard your patches as a lunch time hack
> >> in the cooking email; I am serious about these patches though.)
> >
> > We do not want to touch borrowed code unnecessarily.  Are these
> > lines and bits hampering further progress we are actively trying to
> > make right now?
> 
> No they are not, you are correct.
> 
> I differ in opinion on 'borrowed code'. The latest release of xdiff
> (v0.23) was in Nov 13, 2008 according to 
> http://freecode.com/projects/xdiff-lib
> or on March 23, 2006 according to https://directory.fsf.org/wiki/LibXDiff
> and given that we incorporated so many changes already to xdiff,
> I would argue it is sufficiently different from the original, we'll probably
> never import another upstream version (if there will be a release at all).
> 
> So the code was rather taken and now we are the bag holders in
> maintaining it, so we can make it pretty even only for the sake of
> pleasing ourselves (or rather: not confusing ourselves with too
> many unused flags).

Yes, I'd agree. For what it's worth both sides of this argument played
out in my head when I saw your patches, and I ended up at the same "we
are the bag holders" place. And there's certainly precedent for touching
that code and cleaning it up to make it easier to work with (just look
at at "git log -p xdiff").

I don't know that I would support a full-scale rewrite (for the same
reason that I wouldn't on the rest of the code base -- avoiding churn).
But deleting unused bits seems like an easy win for readability.

-Peff


Re: [PATCH 0/2] Re* Consequences of CRLF in index?

2017-10-31 Thread Stefan Beller
On Mon, Oct 30, 2017 at 7:44 PM, Junio C Hamano  wrote:
> Stefan Beller  writes:
>
>> (I note this as you regard your patches as a lunch time hack
>> in the cooking email; I am serious about these patches though.)
>
> We do not want to touch borrowed code unnecessarily.  Are these
> lines and bits hampering further progress we are actively trying to
> make right now?

No they are not, you are correct.

I differ in opinion on 'borrowed code'. The latest release of xdiff
(v0.23) was in Nov 13, 2008 according to http://freecode.com/projects/xdiff-lib
or on March 23, 2006 according to https://directory.fsf.org/wiki/LibXDiff
and given that we incorporated so many changes already to xdiff,
I would argue it is sufficiently different from the original, we'll probably
never import another upstream version (if there will be a release at all).

So the code was rather taken and now we are the bag holders in
maintaining it, so we can make it pretty even only for the sake of
pleasing ourselves (or rather: not confusing ourselves with too
many unused flags).

Thanks,
Stefan


Re: [PATCH 0/2] Re* Consequences of CRLF in index?

2017-10-30 Thread Junio C Hamano
Stefan Beller  writes:

> (I note this as you regard your patches as a lunch time hack
> in the cooking email; I am serious about these patches though.)

We do not want to touch borrowed code unnecessarily.  Are these
lines and bits hampering further progress we are actively trying to
make right now?



Re: [PATCH 0/2] Re* Consequences of CRLF in index?

2017-10-30 Thread Stefan Beller
On Fri, Oct 27, 2017 at 10:07 AM, Stefan Beller  wrote:
>> Let's do this bit-shuffling as a preliminary clean-up.
>
> These 2 patches can go on top of that as well.

Actually these textually do not conflict with your patch,
and they can be picked independently, e.g. they could
go on top of sb/diff-color-moved-use-xdl-recmatch
as a diff cleanup.

(I note this as you regard your patches as a lunch time hack
in the cooking email; I am serious about these patches though.)

Thanks,
Stefan


[PATCH 0/2] Re* Consequences of CRLF in index?

2017-10-27 Thread Stefan Beller
> Let's do this bit-shuffling as a preliminary clean-up.

These 2 patches can go on top of that as well.

Thanks,
Stefan

Stefan Beller (2):
  xdiff/xdiff.h: remove unused flags
  xdiff/xdiffi.c: remove unneeded function declarations

 xdiff/xdiff.h  |  8 
 xdiff/xdiffi.c | 17 -
 2 files changed, 25 deletions(-)

-- 
2.15.0.rc2.443.gfcc3b81c0a



Re: Re* Consequences of CRLF in index?

2017-10-27 Thread Stefan Beller
On Thu, Oct 26, 2017 at 11:13 PM, Junio C Hamano  wrote:
> Subject: [PATCH] xdiff: reassign xpparm_t.flags bits
>
> We have packed the bits too tightly in such a way that it is not
> easy to add a new type of whitespace ignoring option, a new type
> of LCS algorithm, or a new type of post-cleanup heuristics.
>
> Reorder bits a bit to give room for these three classes of options
> to grow.
>
> While at it, add a comment in front of the bit definitions to
> clarify in which structure these defined bits may appear.
>
> Signed-off-by: Junio C Hamano 

Reviewed-by: Stefan Beller 

> ---
>  xdiff/xdiff.h | 14 --
>  1 file changed, 8 insertions(+), 6 deletions(-)
>
> diff --git a/xdiff/xdiff.h b/xdiff/xdiff.h
> index b090ad8eac..457cac32d8 100644
> --- a/xdiff/xdiff.h
> +++ b/xdiff/xdiff.h
> @@ -27,22 +27,24 @@
>  extern "C" {
>  #endif /* #ifdef __cplusplus */
>
> +/* xpparm_t.flags */
> +#define XDF_NEED_MINIMAL (1 << 0)
>
> -#define XDF_NEED_MINIMAL (1 << 1)

The whitespace flags are also stored in xpparm_t, though
these flags can also come in from other sources (e.g. we
just pass it in manually via the interface).

>  #define XDF_IGNORE_WHITESPACE (1 << 2)
>  #define XDF_IGNORE_WHITESPACE_CHANGE (1 << 3)
>  #define XDF_IGNORE_WHITESPACE_AT_EOL (1 << 4)
>  #define XDF_WHITESPACE_FLAGS (XDF_IGNORE_WHITESPACE | 
> XDF_IGNORE_WHITESPACE_CHANGE | XDF_IGNORE_WHITESPACE_AT_EOL)
>
> -#define XDF_PATIENCE_DIFF (1 << 5)
> -#define XDF_HISTOGRAM_DIFF (1 << 6)
> +#define XDF_IGNORE_BLANK_LINES (1 << 7)

Is XDF_IGNORE_BLANK_LINES a whitespace option, just
on the lines level instead of the inside one line? I think
it still makes sense to keep it here, slightly separated from
the IGNORE flags.

> +#define XDF_PATIENCE_DIFF (1 << 14)
> +#define XDF_HISTOGRAM_DIFF (1 << 15)
>  #define XDF_DIFF_ALGORITHM_MASK (XDF_PATIENCE_DIFF | XDF_HISTOGRAM_DIFF)
>  #define XDF_DIFF_ALG(x) ((x) & XDF_DIFF_ALGORITHM_MASK)
>
> -#define XDF_IGNORE_BLANK_LINES (1 << 7)
> -
> -#define XDF_INDENT_HEURISTIC (1 << 8)
> +#define XDF_INDENT_HEURISTIC (1 << 23)
>
> +/* xdemitconf_t.flags */
>  #define XDL_EMIT_FUNCNAMES (1 << 0)
>  #define XDL_EMIT_FUNCCONTEXT (1 << 2)

This looked like it was carefully crafted to avoid accidental bit overlap
with XDF_NEED_MINIMAL; but these are in different flag fields, this
should be fine.

Thanks,
Stefan


Re: Consequences of CRLF in index?

2017-10-27 Thread Johannes Schindelin
Hi Lars,

On Thu, 26 Oct 2017, Lars Schneider wrote:

> > On 26 Oct 2017, at 09:09, Johannes Sixt  wrote:
> > 
> > Am 25.10.2017 um 14:19 schrieb Johannes Schindelin:
> >> I envy you for the blessing of such a clean C++ source that you do not
> >> have any, say, Unix shell script in it. Try this, and weep:
> >>$ printf 'echo \\\r\n\t123\r\n' >a1
> >>$ sh a1
> >>a1: 2: a1: 123: not found
> > 
> > I was bitten by that, too. For this reason, I ensure that shell
> > scripts and Makefiles begin their life on Linux. Fortunately, modern
> > editors on Windows, includ^Wand vi, do not force CRLF line breaks, and

^^

This put a well-needed smile on my face. Thanks.

> > such files can be edited on Windows, too.
> 
> Wouldn't this kind of .gitattributes setup solve the problem?
> 
> * -text
> *.sh   text eol=lf

If you look at the commit I mentioned, you will see examples where it
breaks down: when using Unix shell scripts *without* .sh file extension.
Most notable offender: GIT-VERSION-GEN.

Ciao,
Dscho


Re* Consequences of CRLF in index?

2017-10-27 Thread Junio C Hamano
Junio C Hamano  writes:

> Stefan Beller  writes:
>
>> (1<<5) is taken twice now.
>
> Good eyes.  I think we use bits #1-#8 now (bit #0 is vacant, so are
> #9-#31).

Let's do this bit-shuffling as a preliminary clean-up.

-- >8 --
Subject: [PATCH] xdiff: reassign xpparm_t.flags bits

We have packed the bits too tightly in such a way that it is not
easy to add a new type of whitespace ignoring option, a new type
of LCS algorithm, or a new type of post-cleanup heuristics.

Reorder bits a bit to give room for these three classes of options
to grow.

While at it, add a comment in front of the bit definitions to
clarify in which structure these defined bits may appear.

Signed-off-by: Junio C Hamano 
---
 xdiff/xdiff.h | 14 --
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/xdiff/xdiff.h b/xdiff/xdiff.h
index b090ad8eac..457cac32d8 100644
--- a/xdiff/xdiff.h
+++ b/xdiff/xdiff.h
@@ -27,22 +27,24 @@
 extern "C" {
 #endif /* #ifdef __cplusplus */
 
+/* xpparm_t.flags */
+#define XDF_NEED_MINIMAL (1 << 0)
 
-#define XDF_NEED_MINIMAL (1 << 1)
 #define XDF_IGNORE_WHITESPACE (1 << 2)
 #define XDF_IGNORE_WHITESPACE_CHANGE (1 << 3)
 #define XDF_IGNORE_WHITESPACE_AT_EOL (1 << 4)
 #define XDF_WHITESPACE_FLAGS (XDF_IGNORE_WHITESPACE | 
XDF_IGNORE_WHITESPACE_CHANGE | XDF_IGNORE_WHITESPACE_AT_EOL)
 
-#define XDF_PATIENCE_DIFF (1 << 5)
-#define XDF_HISTOGRAM_DIFF (1 << 6)
+#define XDF_IGNORE_BLANK_LINES (1 << 7)
+
+#define XDF_PATIENCE_DIFF (1 << 14)
+#define XDF_HISTOGRAM_DIFF (1 << 15)
 #define XDF_DIFF_ALGORITHM_MASK (XDF_PATIENCE_DIFF | XDF_HISTOGRAM_DIFF)
 #define XDF_DIFF_ALG(x) ((x) & XDF_DIFF_ALGORITHM_MASK)
 
-#define XDF_IGNORE_BLANK_LINES (1 << 7)
-
-#define XDF_INDENT_HEURISTIC (1 << 8)
+#define XDF_INDENT_HEURISTIC (1 << 23)
 
+/* xdemitconf_t.flags */
 #define XDL_EMIT_FUNCNAMES (1 << 0)
 #define XDL_EMIT_FUNCCONTEXT (1 << 2)
 
-- 
2.15.0-rc2-266-g8f92d095f4



Re: Consequences of CRLF in index?

2017-10-26 Thread Junio C Hamano
Ross Kabus  writes:

> Is "* -text" in any way different than "-text" (without the * asterisk)? All
> of my .gitattributes files have "-text" (no * asterisk) and I haven't noticed
> any difference but could I be missing something subtle?
>
> ~Ross

A line in the .gitattibute file is of the form

   ...

i.e. a pattern to match paths, with a list of attribute definitions.
The asterisk they are showing in their description is the 
part, i.e. "apply the '-text' thing to paths that match '*'", which
is equivalent to saying "set text attribute to false for all paths".



Re: Consequences of CRLF in index?

2017-10-26 Thread Ross Kabus
Is "* -text" in any way different than "-text" (without the * asterisk)? All
of my .gitattributes files have "-text" (no * asterisk) and I haven't noticed
any difference but could I be missing something subtle?

~Ross


Re: Consequences of CRLF in index?

2017-10-26 Thread Johannes Sixt

Thank you for the clarification, it's much appreciated.

-- Hannes



Re: Consequences of CRLF in index?

2017-10-26 Thread Jonathan Nieder
Johannes Sixt wrote:
> Am 26.10.2017 um 13:01 schrieb Lars Schneider:

>> * -text
>> *.sh   text eol=lf
>
> Why would that be necessary? I cannot have CRLF in shell scripts etc., not
> even on Windows. (And in addition I do not mind CR in C++ source code.) All
> I want is: Git, please, by all means, do not mess with my files, ever. Isn't
> the simplest way to achieve that to set *nothing* at all? Not even
> core.autocrlf?

That's why Lars suggests "* -text" in .gitattributes.  That way, if
some user of the repository *does* set core.autocrlf, you still get
the "do not mess with my files" semantics you want.

If you control the configuration of all users of the repository, you
don't need that, but it doesn't hurt, either.

With that "* -text", you do not need "*.sh text eol=lf", but maybe
you'd want it in order to get some warnings when someone changes the
line endings by mistake without running tests.  (Better to have a
culture of running tests, you might say.  Fair enough, but as with the
other .gitattributes line, it doesn't hurt.)

Jonathan


Re: Consequences of CRLF in index?

2017-10-26 Thread Johannes Sixt

Am 26.10.2017 um 13:01 schrieb Lars Schneider:

On 26 Oct 2017, at 09:09, Johannes Sixt  wrote:
Am 25.10.2017 um 14:19 schrieb Johannes Schindelin:

I envy you for the blessing of such a clean C++ source that you do not
have any, say, Unix shell script in it. Try this, and weep:
$ printf 'echo \\\r\n\t123\r\n' >a1
$ sh a1
a1: 2: a1: 123: not found


I was bitten by that, too. For this reason, I ensure that shell scripts and 
Makefiles begin their life on Linux. Fortunately, modern editors on Windows, 
includ^Wand vi, do not force CRLF line breaks, and such files can be edited on 
Windows, too.


Wouldn't this kind of .gitattributes setup solve the problem?


But there is no problem. Don't have CRs in shell scripts and be done 
with it.




* -text
*.sh   text eol=lf


Why would that be necessary? I cannot have CRLF in shell scripts etc., 
not even on Windows. (And in addition I do not mind CR in C++ source 
code.) All I want is: Git, please, by all means, do not mess with my 
files, ever. Isn't the simplest way to achieve that to set *nothing* at 
all? Not even core.autocrlf?


-- Hannes


Re: Consequences of CRLF in index?

2017-10-26 Thread Torsten Bögershausen
On Thu, Oct 26, 2017 at 01:01:25PM +0200, Lars Schneider wrote:
> 
> > On 26 Oct 2017, at 09:09, Johannes Sixt  wrote:
> > 
> > Am 25.10.2017 um 14:19 schrieb Johannes Schindelin:
> >> I envy you for the blessing of such a clean C++ source that you do not
> >> have any, say, Unix shell script in it. Try this, and weep:
> >>$ printf 'echo \\\r\n\t123\r\n' >a1
> >>$ sh a1
> >>a1: 2: a1: 123: not found
> > 
> > I was bitten by that, too. For this reason, I ensure that shell scripts and 
> > Makefiles begin their life on Linux. Fortunately, modern editors on 
> > Windows, includ^Wand vi, do not force CRLF line breaks, and such files can 
> > be edited on Windows, too.
> 
> Wouldn't this kind of .gitattributes setup solve the problem?
> 
> * -text
> *.sh   text eol=lf
> 

Yes, exactly. and for the snake-lovers:

*.py   text eol=lf


Re: Consequences of CRLF in index?

2017-10-26 Thread Torsten Bögershausen
On Wed, Oct 25, 2017 at 10:13:57AM -0700, Jonathan Nieder wrote:
> Hi again,
> 
> Lars Schneider wrote:
> >> On 24 Oct 2017, at 20:14, Jonathan Nieder  wrote:
> 
> >> In any event, you also probably want to declare what you're doing
> >> using .gitattributes.  By checking in the files as CRLF, you are
> >> declaring that you do *not* want Git to treat them as text files
> >> (i.e., you do not want Git to change the line endings), so something
> >> as simple as
> >>
> >>* -text
> >
> > That's sounds good. Does "-text" have any other implications?
> > For whatever reason I always thought this is the way to tell
> > Git that a particular file is binary with the implication that
> > Git should not attempt to diff it.
> 
> No other implications.  You're thinking of "-diff".  There is also a
> shortcut "binary" which simply means "-text -diff".

Not 100% the same, as far as I know.
"binary" means: Don't convert line endings, and there is now way to
do a readable diff.
The only thing to tell the user is: The binary blobs are different.

Then we have "text". The "old" version of "text" was "crlf", which
for some people was more intuitive, and less intuitive for others.
"* crlf" is the same as "* text" and means please convert line endings.
And yes, the file is still line oriented.
"* -crlf" means don't touch the line endings, the file is
line-orinted and diff and  merge will work.
"* -text" is the same as "* -crlf"

> 
> Ideas for wording improvements to gitattributes(5) on this subject?

None from me at the moment.

> 
> Thanks,
> Jonathan


Re: Consequences of CRLF in index?

2017-10-26 Thread Lars Schneider

> On 25 Oct 2017, at 19:13, Jonathan Nieder  wrote:
> 
> Hi again,
> 
> Lars Schneider wrote:
>>> On 24 Oct 2017, at 20:14, Jonathan Nieder  wrote:
> 
>>> In any event, you also probably want to declare what you're doing
>>> using .gitattributes.  By checking in the files as CRLF, you are
>>> declaring that you do *not* want Git to treat them as text files
>>> (i.e., you do not want Git to change the line endings), so something
>>> as simple as
>>> 
>>> * -text
>> 
>> That's sounds good. Does "-text" have any other implications?
>> For whatever reason I always thought this is the way to tell
>> Git that a particular file is binary with the implication that
>> Git should not attempt to diff it.
> 
> No other implications.  You're thinking of "-diff".  There is also a
> shortcut "binary" which simply means "-text -diff".

Yeah. Well, when I read "-text" then I think "no text" and that makes
me think "is binary".


> Ideas for wording improvements to gitattributes(5) on this subject?

I think the wording in the docs is good. It is just the "text" keyword
that confused me. Maybe this could have been names "eolnorm" and
"-eolnorm" or something. But it is too late for that now I guess :-)

Thanks,
Lars


Re: Consequences of CRLF in index?

2017-10-26 Thread Lars Schneider

> On 26 Oct 2017, at 09:09, Johannes Sixt  wrote:
> 
> Am 25.10.2017 um 14:19 schrieb Johannes Schindelin:
>> I envy you for the blessing of such a clean C++ source that you do not
>> have any, say, Unix shell script in it. Try this, and weep:
>>  $ printf 'echo \\\r\n\t123\r\n' >a1
>>  $ sh a1
>>  a1: 2: a1: 123: not found
> 
> I was bitten by that, too. For this reason, I ensure that shell scripts and 
> Makefiles begin their life on Linux. Fortunately, modern editors on Windows, 
> includ^Wand vi, do not force CRLF line breaks, and such files can be edited 
> on Windows, too.

Wouldn't this kind of .gitattributes setup solve the problem?

* -text
*.sh   text eol=lf

Thanks,
Lars



Re: Consequences of CRLF in index?

2017-10-26 Thread Johannes Sixt

Am 25.10.2017 um 14:19 schrieb Johannes Schindelin:

I envy you for the blessing of such a clean C++ source that you do not
have any, say, Unix shell script in it. Try this, and weep:

$ printf 'echo \\\r\n\t123\r\n' >a1

$ sh a1

a1: 2: a1: 123: not found


I was bitten by that, too. For this reason, I ensure that shell scripts 
and Makefiles begin their life on Linux. Fortunately, modern editors on 
Windows, includ^Wand vi, do not force CRLF line breaks, and such files 
can be edited on Windows, too.


Of course, I do not set core.autocrlf anywhere to avoid any changes 
behind my back.



For the same reason (Unix shell not handling CR/LF gracefull), I went
through that painful work that finally landed as 00ddc9d13ca (Fix build
with core.autocrlf=true, 2017-05-09).


That's much appreciated!

-- Hannes


Re: Consequences of CRLF in index?

2017-10-25 Thread Junio C Hamano
Stefan Beller  writes:

>> diff --git a/diff.c b/diff.c
>> index 6fd288420b..eeca0fd3b8 100644
>> --- a/diff.c
>> +++ b/diff.c
>> @@ -4202,7 +4202,8 @@ void diff_setup_done(struct diff_options *options)
>>
>> if (DIFF_XDL_TST(options, IGNORE_WHITESPACE) ||
>> DIFF_XDL_TST(options, IGNORE_WHITESPACE_CHANGE) ||
>> -   DIFF_XDL_TST(options, IGNORE_WHITESPACE_AT_EOL))
>> +   DIFF_XDL_TST(options, IGNORE_WHITESPACE_AT_EOL) ||
>> +   DIFF_XDL_TST(options, IGNORE_CR_AT_EOL))
>
> This highlights another part of the flag macros, that could be made nicer.
> All these flags combined are XDF_WHITESPACE_FLAGS, so this
> if could be written without the macros as
>
>   if (options->xdl_ops & XDF_WHITESPACE_FLAGS)

Yes, and I think the codepath that matters most already uses that
form.  Perhaps it is a good idea to do the rewrite without a macro
(XDF_WHITESPACE_FLAGS is already a macro enough).

> (1<<5) is taken twice now.

Good eyes.  I think we use bits #1-#8 now (bit #0 is vacant, so are
#9-#31).

>> diff --git a/xdiff/xutils.c b/xdiff/xutils.c
>> index 04d7b32e4e..8720e272b9 100644
>> --- a/xdiff/xutils.c
>> +++ b/xdiff/xutils.c
>> @@ -156,6 +156,21 @@ int xdl_blankline(const char *line, long size, long 
>> flags)
>> return (i == size);
>>  }
>>
>> +/*
>> + * Have we eaten everything on the line, except for an optional
>> + * CR at the very end?
>> + */
>> +static int ends_with_optional_cr(const char *l, long s, long i)
>> +{
>> +   if (s && l[s-1] == '\n')
>> +   s--;
>
> so first we cut off the '\n',

> That seems correct after some thought;

I added the "trim the LF at the end" to the beginning of the
function at the last minute to cheat, and it is debatable if it is
entirely correct on an incomplete line.

The byte at the end of line, i.e. l[s-1], could be either '\n' or
something else, and the latter is an incompete line at the end of
the file.  When we trimmed the LF and decremented s, CR at l[s-1]
is the CR in CRLF, which we do want to ignore.  If we didn't, then
what is CR sitting at l[s-1]?  It is a lone CR at the end of file,
not a part of CRLF.  Do we really want to ignore it?

If we take the name of the option "ignore-cr-at-eol" literally, yes,
it is a CR sitting at the end of a line, which happens to be an
incomplete one, so we do want to ignore.  But if we think about the
reason why we are adding the option (i.e. to help conversion between
CRLF and LF), it is somewhat iffy.  The lone CR at the end of file
cannot be something that came from CRLF<->LF conversion, and ignoring
it would hide possible problems in conversion or the original data.

> I might offer
> another easier to understand (for me) solution,
> ...
> Though this seems even more complicated
> after having it written down.

This happens to me quite often and my solution to it is to remove
the alternative I tried to formulate after convincing myself that it
is not that much of an improvement ;-).


Re: Consequences of CRLF in index?

2017-10-25 Thread Jonathan Nieder
Hi again,

Lars Schneider wrote:
>> On 24 Oct 2017, at 20:14, Jonathan Nieder  wrote:

>> In any event, you also probably want to declare what you're doing
>> using .gitattributes.  By checking in the files as CRLF, you are
>> declaring that you do *not* want Git to treat them as text files
>> (i.e., you do not want Git to change the line endings), so something
>> as simple as
>>
>>  * -text
>
> That's sounds good. Does "-text" have any other implications?
> For whatever reason I always thought this is the way to tell
> Git that a particular file is binary with the implication that
> Git should not attempt to diff it.

No other implications.  You're thinking of "-diff".  There is also a
shortcut "binary" which simply means "-text -diff".

Ideas for wording improvements to gitattributes(5) on this subject?

Thanks,
Jonathan


Re: Consequences of CRLF in index?

2017-10-25 Thread Lars Schneider

> On 24 Oct 2017, at 20:14, Jonathan Nieder  wrote:
> 
> Hi,
> 
> Lars Schneider wrote:
> 
>> I've migrated a large repo (110k+ files) with a lot of history (177k commits)
>> and a lot of users (200+) to Git. Unfortunately, all text files in the index
>> of the repo have CRLF line endings. In general this seems not to be a problem
>> as the project is developed exclusively on Windows.
> 
> Sounds good.
> 
>> However, I wonder if there are any "hidden consequences" of this setup?
>> If there are consequences, then I see two options. Either I rebase the repo
>> and fix the line endings for all commits or I add a single commit that fixes
>> the line endings for all files. Both actions require coordination with the
>> users to avoid repo trouble/merge conflicts. The "single fixup commit" 
>> options
>> would also make diffs into the past look bad. Would a single large commit 
>> have
>> any impact on the performance of standard Git operations?
> 
> There are no hidden consequences that I'm aware of.  If you later
> decide that you want to become a cross-platform project, then you may
> want to switch to LF endings, in which case I suggest the "single
> fixup commit" strategy.
> 
> In any event, you also probably want to declare what you're doing
> using .gitattributes.  By checking in the files as CRLF, you are
> declaring that you do *not* want Git to treat them as text files
> (i.e., you do not want Git to change the line endings), so something
> as simple as
> 
>   * -text

That's sounds good. Does "-text" have any other implications?
For whatever reason I always thought this is the way to tell
Git that a particular file is binary with the implication that
Git should not attempt to diff it.

Thanks,
Lars

Re: Consequences of CRLF in index?

2017-10-25 Thread Stefan Beller
On Tue, Oct 24, 2017 at 9:53 PM, Junio C Hamano  wrote:
>
> Here is a lunch-time hack to add that option.  As you can see from
> the lack of docs, tests and a proper log message, I haven't played
> with it long enough to know how buggy it is, though.  I am not all
> that interested in polishing it to completion myself and prefer to
> leave it as #leftoverbits ;-)

Ok, nevertheless a review pointing out a couple things would be
useful for those who pick it up later, I assume.

> Also I didn't bother teaching this to Stefan's color-moved thing, as
> the line comparator it uses will hopefully be unified with the one I
> am touching in xdiff/ with this patch.

which will be rerolled shortly fixing just the parameter names as Eric
mentioned.

>  diff.c|  5 -
>  merge-recursive.c |  2 ++
>  xdiff/xdiff.h |  3 ++-
>  xdiff/xutils.c| 34 --
>  4 files changed, 40 insertions(+), 4 deletions(-)
>
> diff --git a/diff.c b/diff.c
> index 6fd288420b..eeca0fd3b8 100644
> --- a/diff.c
> +++ b/diff.c
> @@ -4202,7 +4202,8 @@ void diff_setup_done(struct diff_options *options)
>
> if (DIFF_XDL_TST(options, IGNORE_WHITESPACE) ||
> DIFF_XDL_TST(options, IGNORE_WHITESPACE_CHANGE) ||
> -   DIFF_XDL_TST(options, IGNORE_WHITESPACE_AT_EOL))
> +   DIFF_XDL_TST(options, IGNORE_WHITESPACE_AT_EOL) ||
> +   DIFF_XDL_TST(options, IGNORE_CR_AT_EOL))

This highlights another part of the flag macros, that could be made nicer.
All these flags combined are XDF_WHITESPACE_FLAGS, so this
if could be written without the macros as

  if (options->xdl_ops & XDF_WHITESPACE_FLAGS)

which we might want to hide in a macro

  DIFF_XDL_MASK_TST(options, mask)

or such?


>  #define XDF_IGNORE_WHITESPACE_AT_EOL (1 << 4)
> -#define XDF_WHITESPACE_FLAGS (XDF_IGNORE_WHITESPACE | 
> XDF_IGNORE_WHITESPACE_CHANGE | XDF_IGNORE_WHITESPACE_AT_EOL)
> +#define XDF_IGNORE_CR_AT_EOL (1 << 5)
> +#define XDF_WHITESPACE_FLAGS (XDF_IGNORE_WHITESPACE | 
> XDF_IGNORE_WHITESPACE_CHANGE | XDF_IGNORE_WHITESPACE_AT_EOL | 
> XDF_IGNORE_CR_AT_EOL)
>
>  #define XDF_PATIENCE_DIFF (1 << 5)

(1<<5) is taken twice now. Currently there is only one
unused free bit (but that was used once upon a time);
so we have to think how we revamp the flag system to
support more than 32 bits.

See also the answers to
https://public-inbox.org/git/20171024000931.14814-1-sbel...@google.com/
as that started this discussion already.

>  #define XDF_HISTOGRAM_DIFF (1 << 6)
> diff --git a/xdiff/xutils.c b/xdiff/xutils.c
> index 04d7b32e4e..8720e272b9 100644
> --- a/xdiff/xutils.c
> +++ b/xdiff/xutils.c
> @@ -156,6 +156,21 @@ int xdl_blankline(const char *line, long size, long 
> flags)
> return (i == size);
>  }
>
> +/*
> + * Have we eaten everything on the line, except for an optional
> + * CR at the very end?
> + */
> +static int ends_with_optional_cr(const char *l, long s, long i)
> +{
> +   if (s && l[s-1] == '\n')
> +   s--;

so first we cut off the '\n',

> +   if (s == i)
> +   return 1;

then we either have an ending without

> +   if (s == i + 1 && l[i] == '\r')
> +   return 1;

or with a '\r' before.

That seems correct after some thought; I might offer
another easier to understand (for me) solution,
which is
{
   /* cut of ending LF */
   if (s && l[s-1] == '\n')
   s--;
  /* do we only need to cut LF? */
  if (i == s)
return 1;

   /* cut of ending CR */
   if (s && l[s-1] == '\r')
   s--;
  /* was this everything to cut? */
  return i == s
}

Though this seems even more complicated
after having it written down.

>  * Each flavor of ignoring needs different logic to skip whitespaces
>  * while we have both sides to compare.
> @@ -204,6 +220,14 @@ int xdl_recmatch(const char *l1, long s1, const char 
> *l2, long s2, long flags)
> i1++;
> i2++;
> }
> +   } else if (flags & XDF_IGNORE_CR_AT_EOL) {
> +   /* Find the first difference and see how the line ends */
> +   while (i1 < s1 && i2 < s2 && l1[i1] == l2[i2]) {
> +   i1++;
> +   i2++;
> +   }
> +   return (ends_with_optional_cr(l1, s1, i1) &&
> +   ends_with_optional_cr(l2, s2, i2));
> }
>
> /*
> @@ -230,9 +254,15 @@ static unsigned long 
> xdl_hash_record_with_whitespace(char const **data,
> char const *top, long flags) {
> unsigned long ha = 5381;
> char const *ptr = *data;
> +   int cr_at_eol_only = (flags & XDF_WHITESPACE_FLAGS) == 
> XDF_IGNORE_CR_AT_EOL;
>
> for (; ptr < top && *ptr != '\n'; ptr++) {
> -   if (XDL_ISSPACE(*ptr)) {
> +   if (cr_at_eol_only) {
> +   if (*ptr == '\r' &&
> +   

Re: Consequences of CRLF in index?

2017-10-25 Thread Johannes Schindelin
Hi Hannes,

On Tue, 24 Oct 2017, Johannes Sixt wrote:

> Am 24.10.2017 um 19:48 schrieb Lars Schneider:
> > I've migrated a large repo (110k+ files) with a lot of history (177k
> > commits)
> > and a lot of users (200+) to Git. Unfortunately, all text files in the index
> > of the repo have CRLF line endings. In general this seems not to be a
> > problem
> > as the project is developed exclusively on Windows.
> > 
> > However, I wonder if there are any "hidden consequences" of this setup?
> 
> I've been working on a project with CRLF in every source file for a decade
> now. It's C++ source, and it isn't even Windows-only: when checked out on
> Linux, there are CRs in the files, with no bad consequences so far. GCC is
> happy with them.

I envy you for the blessing of such a clean C++ source that you do not
have any, say, Unix shell script in it. Try this, and weep:

$ printf 'echo \\\r\n\t123\r\n' >a1

$ sh a1

a1: 2: a1: 123: not found

For the same reason (Unix shell not handling CR/LF gracefull), I went
through that painful work that finally landed as 00ddc9d13ca (Fix build
with core.autocrlf=true, 2017-05-09).

Ciao,
Dscho


Re: Consequences of CRLF in index?

2017-10-25 Thread Johannes Schindelin
Hi,

On Tue, 24 Oct 2017, Torsten Bögershausen wrote:

> The penalty may be low, but Dscho once reported that it [line endings
> conversion] is measurable & painful on a "big repo".

Yes, I do not remember the details, but it must have been around 5-15%
speed improvement to prevent the autoCRLF stuff from doing its thing.

At work, we always switch it off, for that reason.

Ciao,
Dscho

Re: Consequences of CRLF in index?

2017-10-24 Thread Junio C Hamano
Junio C Hamano  writes:

> Jonathan Nieder  writes:
>
>> I'd be interested to hear what happens when diff-ing across a line
>> ending fixup commit.  Is this an area where Git needs some
>> improvement?  "git merge" knows an -Xrenormalize option to deal with a
>> related problem --- it's possible that "git diff" needs to learn a
>> similar trick.
>
> My knee-jerk reaction is that (1) the end user definitely wants to
> see preimage and postimage lines are different in such a commit by
> default, one side has and the other side lacks ^M at the end., and
> (2) one of the "whitespace ignoring" options (namely, "ignore space
> at eol") may suffice, but if not, it should be easy to invent a new
> one "ignore CR at eol".

Here is a lunch-time hack to add that option.  As you can see from
the lack of docs, tests and a proper log message, I haven't played
with it long enough to know how buggy it is, though.  I am not all
that interested in polishing it to completion myself and prefer to
leave it as #leftoverbits ;-)

Also I didn't bother teaching this to Stefan's color-moved thing, as
the line comparator it uses will hopefully be unified with the one I
am touching in xdiff/ with this patch.

Have fun.

 diff.c|  5 -
 merge-recursive.c |  2 ++
 xdiff/xdiff.h |  3 ++-
 xdiff/xutils.c| 34 --
 4 files changed, 40 insertions(+), 4 deletions(-)

diff --git a/diff.c b/diff.c
index 6fd288420b..eeca0fd3b8 100644
--- a/diff.c
+++ b/diff.c
@@ -4202,7 +4202,8 @@ void diff_setup_done(struct diff_options *options)
 
if (DIFF_XDL_TST(options, IGNORE_WHITESPACE) ||
DIFF_XDL_TST(options, IGNORE_WHITESPACE_CHANGE) ||
-   DIFF_XDL_TST(options, IGNORE_WHITESPACE_AT_EOL))
+   DIFF_XDL_TST(options, IGNORE_WHITESPACE_AT_EOL) ||
+   DIFF_XDL_TST(options, IGNORE_CR_AT_EOL))
DIFF_OPT_SET(options, DIFF_FROM_CONTENTS);
else
DIFF_OPT_CLR(options, DIFF_FROM_CONTENTS);
@@ -4659,6 +4660,8 @@ int diff_opt_parse(struct diff_options *options,
DIFF_XDL_SET(options, IGNORE_WHITESPACE_CHANGE);
else if (!strcmp(arg, "--ignore-space-at-eol"))
DIFF_XDL_SET(options, IGNORE_WHITESPACE_AT_EOL);
+   else if (!strcmp(arg, "--ignore-cr-at-eol"))
+   DIFF_XDL_SET(options, IGNORE_CR_AT_EOL);
else if (!strcmp(arg, "--ignore-blank-lines"))
DIFF_XDL_SET(options, IGNORE_BLANK_LINES);
else if (!strcmp(arg, "--indent-heuristic"))
diff --git a/merge-recursive.c b/merge-recursive.c
index 1d3f8f0d22..4a49ec93b1 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -2251,6 +2251,8 @@ int parse_merge_opt(struct merge_options *o, const char 
*s)
DIFF_XDL_SET(o, IGNORE_WHITESPACE);
else if (!strcmp(s, "ignore-space-at-eol"))
DIFF_XDL_SET(o, IGNORE_WHITESPACE_AT_EOL);
+   else if (!strcmp(s, "ignore-cr-at-eol"))
+   DIFF_XDL_SET(o, IGNORE_CR_AT_EOL);
else if (!strcmp(s, "renormalize"))
o->renormalize = 1;
else if (!strcmp(s, "no-renormalize"))
diff --git a/xdiff/xdiff.h b/xdiff/xdiff.h
index b090ad8eac..ff16243da9 100644
--- a/xdiff/xdiff.h
+++ b/xdiff/xdiff.h
@@ -32,7 +32,8 @@ extern "C" {
 #define XDF_IGNORE_WHITESPACE (1 << 2)
 #define XDF_IGNORE_WHITESPACE_CHANGE (1 << 3)
 #define XDF_IGNORE_WHITESPACE_AT_EOL (1 << 4)
-#define XDF_WHITESPACE_FLAGS (XDF_IGNORE_WHITESPACE | 
XDF_IGNORE_WHITESPACE_CHANGE | XDF_IGNORE_WHITESPACE_AT_EOL)
+#define XDF_IGNORE_CR_AT_EOL (1 << 5)
+#define XDF_WHITESPACE_FLAGS (XDF_IGNORE_WHITESPACE | 
XDF_IGNORE_WHITESPACE_CHANGE | XDF_IGNORE_WHITESPACE_AT_EOL | 
XDF_IGNORE_CR_AT_EOL)
 
 #define XDF_PATIENCE_DIFF (1 << 5)
 #define XDF_HISTOGRAM_DIFF (1 << 6)
diff --git a/xdiff/xutils.c b/xdiff/xutils.c
index 04d7b32e4e..8720e272b9 100644
--- a/xdiff/xutils.c
+++ b/xdiff/xutils.c
@@ -156,6 +156,21 @@ int xdl_blankline(const char *line, long size, long flags)
return (i == size);
 }
 
+/*
+ * Have we eaten everything on the line, except for an optional
+ * CR at the very end?
+ */
+static int ends_with_optional_cr(const char *l, long s, long i)
+{
+   if (s && l[s-1] == '\n')
+   s--;
+   if (s == i)
+   return 1;
+   if (s == i + 1 && l[i] == '\r')
+   return 1;
+   return 0;
+}
+
 int xdl_recmatch(const char *l1, long s1, const char *l2, long s2, long flags)
 {
int i1, i2;
@@ -170,7 +185,8 @@ int xdl_recmatch(const char *l1, long s1, const char *l2, 
long s2, long flags)
 
/*
 * -w matches everything that matches with -b, and -b in turn
-* matches everything that matches with --ignore-space-at-eol.
+* matches everything that matches with --ignore-space-at-eol,
+* which in turn matches everything that matches with 
--ignore-cr-at-eol.
 *
 * Each flavor of ignoring needs 

Re: Consequences of CRLF in index?

2017-10-24 Thread Junio C Hamano
Jonathan Nieder  writes:

> I'd be interested to hear what happens when diff-ing across a line
> ending fixup commit.  Is this an area where Git needs some
> improvement?  "git merge" knows an -Xrenormalize option to deal with a
> related problem --- it's possible that "git diff" needs to learn a
> similar trick.

My knee-jerk reaction is that (1) the end user definitely wants to
see preimage and postimage lines are different in such a commit by
default, one side has and the other side lacks ^M at the end., and
(2) one of the "whitespace ignoring" options (namely, "ignore space
at eol") may suffice, but if not, it should be easy to invent a new
one "ignore CR at eol".


Re: Consequences of CRLF in index?

2017-10-24 Thread Johannes Sixt

Am 24.10.2017 um 19:48 schrieb Lars Schneider:

I've migrated a large repo (110k+ files) with a lot of history (177k commits)
and a lot of users (200+) to Git. Unfortunately, all text files in the index
of the repo have CRLF line endings. In general this seems not to be a problem
as the project is developed exclusively on Windows.

However, I wonder if there are any "hidden consequences" of this setup?


I've been working on a project with CRLF in every source file for a 
decade now. It's C++ source, and it isn't even Windows-only: when 
checked out on Linux, there are CRs in the files, with no bad 
consequences so far. GCC is happy with them.


-- Hannes


Re: Consequences of CRLF in index?

2017-10-24 Thread Torsten Bögershausen
On Tue, Oct 24, 2017 at 11:14:15AM -0700, Jonathan Nieder wrote:
> Hi,
> 
> Lars Schneider wrote:
> 
> > I've migrated a large repo (110k+ files) with a lot of history (177k 
> > commits)
> > and a lot of users (200+) to Git. Unfortunately, all text files in the index
> > of the repo have CRLF line endings. In general this seems not to be a 
> > problem
> > as the project is developed exclusively on Windows.
> 
> Sounds good.
> 
> > However, I wonder if there are any "hidden consequences" of this setup?
> > If there are consequences, then I see two options. Either I rebase the repo
> > and fix the line endings for all commits or I add a single commit that fixes
> > the line endings for all files. Both actions require coordination with the
> > users to avoid repo trouble/merge conflicts. The "single fixup commit" 
> > options
> > would also make diffs into the past look bad. Would a single large commit 
> > have
> > any impact on the performance of standard Git operations?
> 
> There are no hidden consequences that I'm aware of.  If you later
> decide that you want to become a cross-platform project, then you may
> want to switch to LF endings, in which case I suggest the "single
> fixup commit" strategy.
> 
> In any event, you also probably want to declare what you're doing
> using .gitattributes.  By checking in the files as CRLF, you are
> declaring that you do *not* want Git to treat them as text files
> (i.e., you do not want Git to change the line endings), so something
> as simple as
> 
>   * -text
> 
> should do the trick.  See gitattributes(5) for details.

Exactly.
The "hidden consequence" may be the other way around:
If you don't specify .gitattributes, then all people who have core.autocrlf=true
will suffer from a runtime penalty.
core.autocrlf=true means "auto".
At each checkout Git needs to figure out that the file has CRLF in the repo,
so that there is no conversion done.
The same happens on "git add" or "git commit" (and sometimes on "git status").

The penalty may be low, but Dscho once reported that it is measurable & painful
on a "big repo".

The other advantage of "* -text" in .gitattributes is that new files are handled
consistantly accross developers.

Those who have "core.autocrlf=false" would produce commits with CRLF for new
files, and those developpers who have core.autocrlf=true would produce files
with LF in the index and CRLF in the worktree.
This may (most probably will) cause confusion later, when things are pushed and
pulled.

> 
> I'd be interested to hear what happens when diff-ing across a line
> ending fixup commit.  Is this an area where Git needs some
> improvement?  "git merge" knows an -Xrenormalize option to deal with a
> related problem --- it's possible that "git diff" needs to learn a
> similar trick.

That is a tricky thing.
Sometimes you want to see the CLRF - LF as a diff, (represented as "^M"),
and sometimes not.

All in all "* -text" is a good choice for Windows-only projects.

> 
> Thanks and hope that helps,
> Jonathan


Re: Consequences of CRLF in index?

2017-10-24 Thread Jonathan Nieder
Hi,

Lars Schneider wrote:

> I've migrated a large repo (110k+ files) with a lot of history (177k commits)
> and a lot of users (200+) to Git. Unfortunately, all text files in the index
> of the repo have CRLF line endings. In general this seems not to be a problem
> as the project is developed exclusively on Windows.

Sounds good.

> However, I wonder if there are any "hidden consequences" of this setup?
> If there are consequences, then I see two options. Either I rebase the repo
> and fix the line endings for all commits or I add a single commit that fixes
> the line endings for all files. Both actions require coordination with the
> users to avoid repo trouble/merge conflicts. The "single fixup commit" options
> would also make diffs into the past look bad. Would a single large commit have
> any impact on the performance of standard Git operations?

There are no hidden consequences that I'm aware of.  If you later
decide that you want to become a cross-platform project, then you may
want to switch to LF endings, in which case I suggest the "single
fixup commit" strategy.

In any event, you also probably want to declare what you're doing
using .gitattributes.  By checking in the files as CRLF, you are
declaring that you do *not* want Git to treat them as text files
(i.e., you do not want Git to change the line endings), so something
as simple as

* -text

should do the trick.  See gitattributes(5) for details.

I'd be interested to hear what happens when diff-ing across a line
ending fixup commit.  Is this an area where Git needs some
improvement?  "git merge" knows an -Xrenormalize option to deal with a
related problem --- it's possible that "git diff" needs to learn a
similar trick.

Thanks and hope that helps,
Jonathan


Consequences of CRLF in index?

2017-10-24 Thread Lars Schneider
Hi,

I've migrated a large repo (110k+ files) with a lot of history (177k commits) 
and a lot of users (200+) to Git. Unfortunately, all text files in the index
of the repo have CRLF line endings. In general this seems not to be a problem 
as the project is developed exclusively on Windows.

However, I wonder if there are any "hidden consequences" of this setup?
If there are consequences, then I see two options. Either I rebase the repo 
and fix the line endings for all commits or I add a single commit that fixes 
the line endings for all files. Both actions require coordination with the 
users to avoid repo trouble/merge conflicts. The "single fixup commit" options 
would also make diffs into the past look bad. Would a single large commit have
any impact on the performance of standard Git operations?

Thanks,
Lars