Re: [PATCH v7] ls-files: Add eol diagnostics

2015-12-22 Thread Junio C Hamano
Torsten Bögershausen  writes:

> When working in a cross-platform environment, a user wants to
> check if text files are stored normalized in the repository and if
> .gitattributes are set appropriately.
>
> Make it possible to let Git show the line endings in the index and
> in the working tree and the effective text/eol attributes.
>
> The end of line ("eolinfo") are shown like this:
> "binary"   binary file
> "text-no-eol"  text file without any EOL
> "text-lf"  text file with LF
> "text-crlf"text file with CRLF
> "text-crlf-lf" text file with mixed line endings.
>
> The effective text/eol attribute is one of these:
> "", "-text", "text", "text=auto", "eol=lf", "eol=crlf"
>
> git ls-files --eol gives an output like this:
>
> i/text-no-eol   w/text-no-eol   attr/text=auto t/t5100/empty
> i/binaryw/binaryattr/-text t/test-binary-2.png
> i/text-lf   w/text-lf   attr/eol=lft/t5100/rfc2047-info-0007
> i/text-lf   w/text-crlf attr/eol=crlf  doit.bat
> i/text-crlf-lf  w/text-crlf-lf  attr/  locale/XX.po
>
> Note that the output is meant to be human-readable and may change.

Wait, what?  

I've been assuming that these output being designed was to be read
by machine, because this new feature is implemented as a part of the
command "ls-files", which is plumbing whose output is meant for
script consumption.

> +--eol::
> + Show line endings ("eolinfo") and the text/eol attributes 
> ("texteolattr") of files.
> + "eolinfo" is the file content identification used by Git when
> + the "text" attribute is "auto", or core.autocrlf != false.
> +
> + "eolinfo" is either "" (when the the info is not available"), or one of 
> "binary",
> + "text-no-eol", "text-lf", "text-crlf" or "text-crlf-lf".
> + The "texteolattr" can be "", "-text", "text", "text=auto", "eol=lf", 
> "eol=crlf".
> +
> + Both the content in the index ("i/") and the content in the working 
> tree ("w/")
> + are shown for regular files, followed by the "texteolattr ("attr/").
> +

Does this format well, or would the second and third paragraph be
format in a funny way?
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v7] ls-files: Add eol diagnostics

2015-12-22 Thread Torsten Bögershausen
When working in a cross-platform environment, a user wants to
check if text files are stored normalized in the repository and if
.gitattributes are set appropriately.

Make it possible to let Git show the line endings in the index and
in the working tree and the effective text/eol attributes.

The end of line ("eolinfo") are shown like this:
"binary"   binary file
"text-no-eol"  text file without any EOL
"text-lf"  text file with LF
"text-crlf"text file with CRLF
"text-crlf-lf" text file with mixed line endings.

The effective text/eol attribute is one of these:
"", "-text", "text", "text=auto", "eol=lf", "eol=crlf"

git ls-files --eol gives an output like this:

i/text-no-eol   w/text-no-eol   attr/text=auto t/t5100/empty
i/binaryw/binaryattr/-text t/test-binary-2.png
i/text-lf   w/text-lf   attr/eol=lft/t5100/rfc2047-info-0007
i/text-lf   w/text-crlf attr/eol=crlf  doit.bat
i/text-crlf-lf  w/text-crlf-lf  attr/  locale/XX.po

Note that the output is meant to be human-readable and may change.

Add test cases in t0027, thanks to Junio C Hamano for the optimized
grep-less sed expression.

Helped-By: Eric Sunshine 
Signed-off-by: Torsten Bögershausen 
---
Changes against v6 on pu:
- convert.c: fixed possible mem leak in get_wt_convert_stats_ascii()
- builtin/ls-files.c: removed not needed ()
- t0027: Re-added empty line
Don't use grep/sed, but sed with an address /(Thanks Junio)
More test_when_finished, other small things from the review
Add comment on the last TC, which used ls-files -d
The last TC may still be classified as fragile, but I couln't
motivate myself to use a separated creation/commit for this
TC and decided to safe some CPU cycles and re-use what we have
in the repo.
Documentation/git-ls-files.txt |  22 
builtin/ls-files.c |  19 +++
convert.c  |  85 +++
convert.h  |   3 ++
t/t0027-auto-crlf.sh   | 112 -
5 files changed, 229 insertions(+), 12 deletions(-)

diff --git a/Documentation/git-ls-files.txt b/Documentation/git-ls-files.txt
index e26f01f..8f29c99 100644
--- a/Documentation/git-ls-files.txt
+++ b/Documentation/git-ls-files.txt
@@ -12,6 +12,7 @@ SYNOPSIS
'git ls-files' [-z] [-t] [-v]

(--[cached|deleted|others|ignored|stage|unmerged|killed|modified])*
(-[c|d|o|i|s|u|k|m])*
+   [--eol]
[-x |--exclude=]
[-X |--exclude-from=]
[--exclude-per-directory=]
@@ -147,6 +148,18 @@ a space) at the start of each line:
possible for manual inspection; the exact format may change at
any time.

+--eol::
+   Show line endings ("eolinfo") and the text/eol attributes 
("texteolattr") of files.
+   "eolinfo" is the file content identification used by Git when
+   the "text" attribute is "auto", or core.autocrlf != false.
+
+   "eolinfo" is either "" (when the the info is not available"), or one of 
"binary",
+   "text-no-eol", "text-lf", "text-crlf" or "text-crlf-lf".
+   The "texteolattr" can be "", "-text", "text", "text=auto", "eol=lf", 
"eol=crlf".
+
+   Both the content in the index ("i/") and the content in the working 
tree ("w/")
+   are shown for regular files, followed by the "texteolattr ("attr/").
+
\--::
Do not interpret any more arguments as options.

@@ -161,6 +174,15 @@ which case it outputs:

   [ ]   

+'git ls-files --eol' will show
+   i/ w/ attr/ 
+
+'git ls-files --eol -o' will show
+   i/  w/ attr/ 
+
+'git ls-files --eol -s' will show
+[ ]   i/ w/ attr/ 
+
'git ls-files --unmerged' and 'git ls-files --stage' can be used to examine
detailed information on unmerged paths.

diff --git a/builtin/ls-files.c b/builtin/ls-files.c
index b6a7cb0..9eacc64 100644
--- a/builtin/ls-files.c
+++ b/builtin/ls-files.c
@@ -27,6 +27,7 @@ static int show_killed;
static int show_valid_bit;
static int line_terminator = '\n';
static int debug_mode;
+static int show_eol;

static const char *prefix;
static int max_prefix_len;
@@ -47,6 +48,21 @@ static const char *tag_modified = "";
static const char *tag_skip_worktree = "";
static const char *tag_resolve_undo = "";

+static void write_eolinfo(const struct cache_entry *ce, const char *path)
+{
+   struct stat st;
+   const char *i_txt = "";
+   const char *w_txt = "";
+   if (!show_eol)
+   return;
+   if (ce && S_ISREG(ce->ce_mode))
+   i_txt = get_cached_convert_stats_ascii(ce->name);
+   if (!lstat(path, ) && S_ISREG(st.st_mode))
+   w_txt = get_wt_convert_stats_ascii(path);
+   printf("i/%-13s w/%-13s attr/%-9s ", i_txt, w_txt,
+get_convert_attr_ascii(path));
+}
+
static void write_name(const char *name)
{
 

[PATCH v7] ls-files: Add eol diagnostics

2015-12-07 Thread Torsten Bögershausen
When working in a cross-platform environment, a user wants to
check if text files are stored normalized in the repository and if
.gitattributes are set appropriately.

Make it possible to let Git show the line endings in the index and
in the working tree and the effective text/eol attributes.

The end of line ("eolinfo") are shown like this:
"binary"   binary file
"text-no-eol"  text file without any EOL
"text-lf"  text file with LF
"text-crlf"text file with CRLF
"text-crlf-lf" text file with mixed line endings.

The effective text/eol attribute is one of these:
"", "-text", "text", "text=auto", "eol=lf", "eol=crlf"

git ls-files --eol gives an output like this:

i/text-no-eol   w/text-no-eol   attr/text=auto t/t5100/empty
i/binaryw/binaryattr/-text t/test-binary-2.png
i/text-lf   w/text-lf   attr/eol=lft/t5100/rfc2047-info-0007
i/text-lf   w/text-crlf attr/eol=crlf  doit.bat
i/text-crlf-lf  w/text-crlf-lf  attr/  locale/XX.po

Note that the output is meant to be human-readable and may change.

Helped-By: Eric Sunshine 
Signed-off-by: Torsten Bögershausen 
---
Changes since v6:
- Fixed potential memory leak in convert.c, when strbuf_read_file()
  fails.
- t0027:
  Cleanups (empty lines, egrep, un-needed quoting)
  test_when_finished 'rm e expect actual' 
  There doesn't seem to be 100% consistency when and how to remove files.
  (I think if we create files, we should be able to remove them:
  use "rm" rather than "rm -f")
  Add comment about the "last test case", which removes file to run
  'git ls-files -d"

 Documentation/git-ls-files.txt |  22 +
 builtin/ls-files.c |  19 
 convert.c  |  85 
 convert.h  |   3 ++
 t/t0027-auto-crlf.sh   | 108 -
 5 files changed, 226 insertions(+), 11 deletions(-)

diff --git a/Documentation/git-ls-files.txt b/Documentation/git-ls-files.txt
index e26f01f..8f29c99 100644
--- a/Documentation/git-ls-files.txt
+++ b/Documentation/git-ls-files.txt
@@ -12,6 +12,7 @@ SYNOPSIS
 'git ls-files' [-z] [-t] [-v]

(--[cached|deleted|others|ignored|stage|unmerged|killed|modified])*
(-[c|d|o|i|s|u|k|m])*
+   [--eol]
[-x |--exclude=]
[-X |--exclude-from=]
[--exclude-per-directory=]
@@ -147,6 +148,18 @@ a space) at the start of each line:
possible for manual inspection; the exact format may change at
any time.
 
+--eol::
+   Show line endings ("eolinfo") and the text/eol attributes 
("texteolattr") of files.
+   "eolinfo" is the file content identification used by Git when
+   the "text" attribute is "auto", or core.autocrlf != false.
+
+   "eolinfo" is either "" (when the the info is not available"), or one of 
"binary",
+   "text-no-eol", "text-lf", "text-crlf" or "text-crlf-lf".
+   The "texteolattr" can be "", "-text", "text", "text=auto", "eol=lf", 
"eol=crlf".
+
+   Both the content in the index ("i/") and the content in the working 
tree ("w/")
+   are shown for regular files, followed by the "texteolattr ("attr/").
+
 \--::
Do not interpret any more arguments as options.
 
@@ -161,6 +174,15 @@ which case it outputs:
 
 [ ]   
 
+'git ls-files --eol' will show
+   i/ w/ attr/ 
+
+'git ls-files --eol -o' will show
+   i/  w/ attr/ 
+
+'git ls-files --eol -s' will show
+[ ]   i/ w/ attr/ 
+
 'git ls-files --unmerged' and 'git ls-files --stage' can be used to examine
 detailed information on unmerged paths.
 
diff --git a/builtin/ls-files.c b/builtin/ls-files.c
index b6a7cb0..ef892bc 100644
--- a/builtin/ls-files.c
+++ b/builtin/ls-files.c
@@ -27,6 +27,7 @@ static int show_killed;
 static int show_valid_bit;
 static int line_terminator = '\n';
 static int debug_mode;
+static int show_eol;
 
 static const char *prefix;
 static int max_prefix_len;
@@ -47,6 +48,21 @@ static const char *tag_modified = "";
 static const char *tag_skip_worktree = "";
 static const char *tag_resolve_undo = "";
 
+static void write_eolinfo(const struct cache_entry *ce, const char *path)
+{
+   struct stat st;
+   const char *i_txt = "";
+   const char *w_txt = "";
+   if (!show_eol)
+   return;
+   if (ce && S_ISREG(ce->ce_mode))
+   i_txt = get_cached_convert_stats_ascii(ce->name);
+   if (!lstat(path, ) && (S_ISREG(st.st_mode)))
+   w_txt = get_wt_convert_stats_ascii(path);
+   printf("i/%-13s w/%-13s attr/%-9s ", i_txt, w_txt,
+get_convert_attr_ascii(path));
+}
+
 static void write_name(const char *name)
 {
/*
@@ -68,6 +84,7 @@ static void show_dir_entry(const char *tag, struct dir_entry 
*ent)
return;
 
fputs(tag, stdout);
+   write_eolinfo(NULL, ent->name);
   

Re: [PATCH v7] ls-files: Add eol diagnostics

2015-12-07 Thread Philip Oakley

From: "Torsten Bögershausen" 

When working in a cross-platform environment, a user wants to
check if text files are stored normalized in the repository and if
.gitattributes are set appropriately.



The need for this came up again on the Git Users list 
(https://groups.google.com/d/msg/git-users/jC-mngwVYo4/Sr4JXgpGJVYJ). It 
will definitely be useful in mixed environments wher users can get rather 
confused.


I've added a couple of bikeshed comments on some of the wider issues this 
bumps into.



Make it possible to let Git show the line endings in the index and
in the working tree and the effective text/eol attributes.

The end of line ("eolinfo") are shown like this:
"binary"   binary file
"text-no-eol"  text file without any EOL
"text-lf"  text file with LF
"text-crlf"text file with CRLF
"text-crlf-lf" text file with mixed line endings.


 This "text-crlf-lf" can easily occur for eol=crlf files which 
have merge conflict markers, which are always eol=lf. It may be that the 
conflict markers should use the eol setting that is in place for the file 
being merged. e.g. 
http://stackoverflow.com/questions/17832616/make-git-use-crlf-on-its-head-merge-lines 
(which links to https://github.com/git/git/blob/master/xdiff/xmerge.c#L173, 
and 3 other places in fill_conflict_hunk). The issue of how to mark EOL 
conflicts in-text is tricky as it's a non-obvious white space issue - it 
doesn't fit well with the normal >>> === <<< conflict markers.




The effective text/eol attribute is one of these:
"", "-text", "text", "text=auto", "eol=lf", "eol=crlf"


 the "-text" attribute isn't explained in the gitattributes(5) 
page, except by implication  from one of the examples.


git ls-files --eol gives an output like this:

i/text-no-eol   w/text-no-eol   attr/text=auto t/t5100/empty
i/binaryw/binaryattr/-text t/test-binary-2.png
i/text-lf   w/text-lf   attr/eol=lft/t5100/rfc2047-info-0007
i/text-lf   w/text-crlf attr/eol=crlf  doit.bat
i/text-crlf-lf  w/text-crlf-lf  attr/  locale/XX.po

Note that the output is meant to be human-readable and may change.

Helped-By: Eric Sunshine 
Signed-off-by: Torsten Bögershausen 
---
Changes since v6:
- Fixed potential memory leak in convert.c, when strbuf_read_file()
 fails.
- t0027:
 Cleanups (empty lines, egrep, un-needed quoting)
 test_when_finished 'rm e expect actual'
 There doesn't seem to be 100% consistency when and how to remove files.
 (I think if we create files, we should be able to remove them:
 use "rm" rather than "rm -f")
 Add comment about the "last test case", which removes file to run
 'git ls-files -d"

Documentation/git-ls-files.txt |  22 +
builtin/ls-files.c |  19 
convert.c  |  85 
convert.h  |   3 ++
t/t0027-auto-crlf.sh   | 108 
-

5 files changed, 226 insertions(+), 11 deletions(-)

diff --git a/Documentation/git-ls-files.txt 
b/Documentation/git-ls-files.txt

index e26f01f..8f29c99 100644
--- a/Documentation/git-ls-files.txt
+++ b/Documentation/git-ls-files.txt
@@ -12,6 +12,7 @@ SYNOPSIS
'git ls-files' [-z] [-t] [-v]
 (--[cached|deleted|others|ignored|stage|unmerged|killed|modified])*
 (-[c|d|o|i|s|u|k|m])*
+ [--eol]
 [-x |--exclude=]
 [-X |--exclude-from=]
 [--exclude-per-directory=]
@@ -147,6 +148,18 @@ a space) at the start of each line:
 possible for manual inspection; the exact format may change at
 any time.

+--eol::
+ Show line endings ("eolinfo") and the text/eol attributes 
("texteolattr") of files.

+ "eolinfo" is the file content identification used by Git when
+ the "text" attribute is "auto", or core.autocrlf != false.
+
+ "eolinfo" is either "" (when the the info is not available"), or one of 
"binary",

+ "text-no-eol", "text-lf", "text-crlf" or "text-crlf-lf".
+ The "texteolattr" can be "", "-text", "text", "text=auto", "eol=lf", 
"eol=crlf".


perhaps add  'see also linkgit:gitattributes[5] .' though it may be too 
tangential.



+
+ Both the content in the index ("i/") and the content in the working tree 
("w/")

+ are shown for regular files, followed by the "texteolattr ("attr/").
+
\--::
 Do not interpret any more arguments as options.

@@ -161,6 +174,15 @@ which case it outputs:

[ ]   

+'git ls-files --eol' will show
+ i/ w/ attr/ 
+
+'git ls-files --eol -o' will show
+ i/  w/ attr/ 
+
+'git ls-files --eol -s' will show
+[ ]   i/ w/ attr/ 


+
'git ls-files --unmerged' and 'git ls-files --stage' can be used to 
examine

detailed information on unmerged paths.

diff --git a/builtin/ls-files.c b/builtin/ls-files.c
index b6a7cb0..ef892bc 100644
--- a/builtin/ls-files.c
+++ b/builtin/ls-files.c
@@ -27,6 +27,7 @@ static int show_killed;
static int show_valid_bit;
static int line_terminator = '\n';
static int debug_mode;
+static int show_eol;

static const 

Re: [PATCH v7] ls-files: Add eol diagnostics

2015-12-07 Thread Eric Sunshine
On Monday, December 7, 2015, Torsten Bögershausen  wrote:
> When working in a cross-platform environment, a user wants to
> check if text files are stored normalized in the repository and if
> .gitattributes are set appropriately.[...]

A few style comments this time around...

> Helped-By: Eric Sunshine 
> Signed-off-by: Torsten Bögershausen 
> ---
> diff --git a/builtin/ls-files.c b/builtin/ls-files.c
> @@ -47,6 +48,21 @@ static const char *tag_modified = "";
> +static void write_eolinfo(const struct cache_entry *ce, const char *path)
> +{
> +   struct stat st;
> +   const char *i_txt = "";
> +   const char *w_txt = "";
> +   if (!show_eol)
> +   return;
> +   if (ce && S_ISREG(ce->ce_mode))
> +   i_txt = get_cached_convert_stats_ascii(ce->name);
> +   if (!lstat(path, ) && (S_ISREG(st.st_mode)))

Style: unnecessary parentheses around S_ISREG(), which is inconsistent
with S_ISREG() two lines above.

> +   w_txt = get_wt_convert_stats_ascii(path);
> +   printf("i/%-13s w/%-13s attr/%-9s ", i_txt, w_txt,
> +get_convert_attr_ascii(path));
> +}
> diff --git a/convert.c b/convert.c
> @@ -95,6 +100,62 @@ static int is_binary(unsigned long size, struct text_stat 
> *stats)
> +static unsigned int gather_convert_stats(const char *data, unsigned long 
> size)
> +{
> +   struct text_stat stats;
> +   if (!data || !size)
> +   return 0;
> +   gather_stats(data, size, );
> +   if (is_binary(size, ) || stats.cr != stats.crlf)
> +   return CONVERT_STAT_BITS_BIN;
> +   else if (stats.crlf && (stats.crlf == stats.lf))

Style: unnecessary parentheses around 'foo == bar'

> +   return CONVERT_STAT_BITS_TXT_CRLF;
> +   else if (stats.crlf && stats.lf)
> +   return CONVERT_STAT_BITS_TXT_CRLF | CONVERT_STAT_BITS_TXT_LF;
> +   else if (stats.lf)
> +   return CONVERT_STAT_BITS_TXT_LF;
> +   else
> +   return 0;
> +}
> +
> +static const char *gather_convert_stats_ascii(const char *data, unsigned 
> long size)
> +{
> +   unsigned int convert_stats = gather_convert_stats(data, size);
> +
> +   if (convert_stats & CONVERT_STAT_BITS_BIN)
> +   return "binary";
> +   switch (convert_stats) {
> +   case CONVERT_STAT_BITS_TXT_LF:
> +   return("text-lf");

Style: space after "return".

Also, can we lose the unnecessary noisy parentheses? (I recall
mentioning this in my first review.)

Same for "return" statements below.

> +   case CONVERT_STAT_BITS_TXT_CRLF:
> +   return("text-crlf");
> +   case CONVERT_STAT_BITS_TXT_LF | CONVERT_STAT_BITS_TXT_CRLF:
> +   return("text-crlf-lf");
> +   default:
> +   return ("text-no-eol");
> +   }
> +}
> diff --git a/t/t0027-auto-crlf.sh b/t/t0027-auto-crlf.sh
> @@ -214,6 +239,20 @@ checkout_files () {
> fi
> done
>
> +   test_expect_success "ls-files --eol $lfname ${pfx}LF.txt" "
> + test_when_finished 'rm e expect actual' &&

Style: test_when_finished is incorrectly indented with tab+spaces
rather than only tabs

> +   cat >e <<-EOF &&
> +   i/text-crlf w/$(stats_ascii $crlfname) ${src}CRLF.txt
> +   i/text-crlf-lf w/$(stats_ascii $lfmixcrlf) 
> ${src}CRLF_mix_LF.txt
> +   i/text-lf w/$(stats_ascii $lfname) ${src}LF.txt
> +   i/binary w/$(stats_ascii $lfmixcr) ${src}LF_mix_CR.txt
> +   i/binary w/$(stats_ascii $crlfnul) ${src}CRLF_nul.txt
> +   i/binary w/$(stats_ascii $crlfnul) ${src}LF_nul.txt
> +   EOF
> +   sort expect &&
> +   git ls-files --eol $src* | sed -e 's!attr/[=a-z-]*!!g' -e 's/ 
>  */ /g' | sort >actual &&
> +   test_cmp expect actual
> +   "
> @@ -480,4 +550,20 @@ checkout_filesnative  true  "lf"  LFCRLF  
> CRLF_mix_LF  LF_mix_CR
>  checkout_filesnative  false "crlf"CRLF  CRLF  CRLF 
> CRLF_mix_CR  CRLF_nul
>  checkout_filesnative  true  "crlf"CRLF  CRLF  CRLF 
> CRLF_mix_CR  CRLF_nul
>
> +

Style: unnecessary blank line

> +# Should be the last test case: remove some files from the worktree
> +# run 'git ls-files -d'

This seems kind of fragile. Might it be possible to either recreate
those files when this test finishes or instead provide a function
which creates them on-demand for tests which need them? My concern is
that there is a good chance that someone later adding tests will
overlook this comment, especially since most people just plop new
tests at the bottom of the file.

> +test_expect_success 'ls-files --eol -d' "
> +   rm  crlf_false_attr__CRLF.txt crlf_false_attr__CRLF_mix_LF.txt 
> crlf_false_attr__LF.txt .gitattributes &&

Style: too many spaces