Re: [PATCH] dir: allow a BOM at the beginning of exclude files

2015-04-16 Thread Carlos Martín Nieto
On Thu, 2015-04-16 at 10:16 -0700, Junio C Hamano wrote:
> Jeff King  writes:
> 
> > On Thu, Apr 16, 2015 at 08:39:55AM -0700, Junio C Hamano wrote:
> >
> >> >  test_expect_success 'status untracked directory with --ignored' '
> >> >  echo "ignored" >.gitignore &&
> >> > +sed -e "s/^/\xef\xbb\xbf/" .gitignore >.gitignore.new &&
> >> > +mv .gitignore.new .gitignore &&
> >> 
> >> Is this "write literal in \xHEX on the replacement side of sed
> >> substitution" potable?  In any case, replacing the above three with
> >> something like:
> >> 
> >>printf "ignored\n" >.gitignore
> >> 
> >> may be more sensible, no?
> >
> > I'm not sure about sed, but I agree it is suspect. And note that printf
> > with hex codes is not portable, either You have to use octal:
> >
> >   printf '\357\273\277ignored\n' >.gitignore
> >
> > Also, as a nit, I'd much rather see this in its own test rather than
> > crammed into another test_expect_success. It's much easier to diagnose
> > failures if the test description mentions the goal, and it is not tied
> > up with testing other parts that might fail.
> 
> Yeah, I totally agree.
> 
> Carlos, something like this squashed in, perhaps?
> 
>  t/t7061-wtstatus-ignore.sh | 11 +--
>  1 file changed, 9 insertions(+), 2 deletions(-)
> 
> diff --git a/t/t7061-wtstatus-ignore.sh b/t/t7061-wtstatus-ignore.sh
> index 0a06fbf..cdc0747 100755
> --- a/t/t7061-wtstatus-ignore.sh
> +++ b/t/t7061-wtstatus-ignore.sh
> @@ -13,8 +13,6 @@ EOF
>  
>  test_expect_success 'status untracked directory with --ignored' '
>   echo "ignored" >.gitignore &&
> - sed -e "s/^/\xef\xbb\xbf/" .gitignore >.gitignore.new &&
> - mv .gitignore.new .gitignore &&
>   mkdir untracked &&
>   : >untracked/ignored &&
>   : >untracked/uncommitted &&
> @@ -22,6 +20,15 @@ test_expect_success 'status untracked directory with 
> --ignored' '
>   test_cmp expected actual
>  '
>  
> +test_expect_success 'same with gitignore starting with BOM' '
> + printf "\357\273\277ignored\n" >.gitignore &&
> + mkdir -p untracked &&
> + : >untracked/ignored &&
> + : >untracked/uncommitted &&
> + git status --porcelain --ignored >actual &&
> + test_cmp expected actual
> +'
> +
>  cat >expected <<\EOF
>  ?? .gitignore
>  ?? actual
> 

Yeah, that makes sense. I had something similar in my patch at one point
before going with modifying the current one.

   cmn


--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] dir: allow a BOM at the beginning of exclude files

2015-04-16 Thread Junio C Hamano
Jeff King  writes:

> On Thu, Apr 16, 2015 at 08:39:55AM -0700, Junio C Hamano wrote:
>
>> >  test_expect_success 'status untracked directory with --ignored' '
>> >echo "ignored" >.gitignore &&
>> > +  sed -e "s/^/\xef\xbb\xbf/" .gitignore >.gitignore.new &&
>> > +  mv .gitignore.new .gitignore &&
>> 
>> Is this "write literal in \xHEX on the replacement side of sed
>> substitution" potable?  In any case, replacing the above three with
>> something like:
>> 
>>  printf "ignored\n" >.gitignore
>> 
>> may be more sensible, no?
>
> I'm not sure about sed, but I agree it is suspect. And note that printf
> with hex codes is not portable, either You have to use octal:
>
>   printf '\357\273\277ignored\n' >.gitignore
>
> Also, as a nit, I'd much rather see this in its own test rather than
> crammed into another test_expect_success. It's much easier to diagnose
> failures if the test description mentions the goal, and it is not tied
> up with testing other parts that might fail.

Yeah, I totally agree.

Carlos, something like this squashed in, perhaps?

 t/t7061-wtstatus-ignore.sh | 11 +--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/t/t7061-wtstatus-ignore.sh b/t/t7061-wtstatus-ignore.sh
index 0a06fbf..cdc0747 100755
--- a/t/t7061-wtstatus-ignore.sh
+++ b/t/t7061-wtstatus-ignore.sh
@@ -13,8 +13,6 @@ EOF
 
 test_expect_success 'status untracked directory with --ignored' '
echo "ignored" >.gitignore &&
-   sed -e "s/^/\xef\xbb\xbf/" .gitignore >.gitignore.new &&
-   mv .gitignore.new .gitignore &&
mkdir untracked &&
: >untracked/ignored &&
: >untracked/uncommitted &&
@@ -22,6 +20,15 @@ test_expect_success 'status untracked directory with 
--ignored' '
test_cmp expected actual
 '
 
+test_expect_success 'same with gitignore starting with BOM' '
+   printf "\357\273\277ignored\n" >.gitignore &&
+   mkdir -p untracked &&
+   : >untracked/ignored &&
+   : >untracked/uncommitted &&
+   git status --porcelain --ignored >actual &&
+   test_cmp expected actual
+'
+
 cat >expected <<\EOF
 ?? .gitignore
 ?? actual
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] dir: allow a BOM at the beginning of exclude files

2015-04-16 Thread Torsten Bögershausen
On 2015-04-16 16.05, Carlos Martín Nieto wrote:
[]
May be it is easier to move this into an own function, like remove_utf8_bom() ?

>  dir.c  | 8 +++-
>  t/t7061-wtstatus-ignore.sh | 2 ++
>  2 files changed, 9 insertions(+), 1 deletion(-)
> 
> diff --git a/dir.c b/dir.c
> index 0943a81..6368247 100644
> --- a/dir.c
> +++ b/dir.c
> @@ -581,6 +581,7 @@ int add_excludes_from_file_to_list(const char *fname,
>   struct stat st;
>   int fd, i, lineno = 1;
>   size_t size = 0;
> + static const unsigned char *utf8_bom = (unsigned char *) "\xef\xbb\xbf";
Do we really need to cast here (and if, is the cast dropping the "const" ?)

Another suggestion, see below:
either:
static const size_t bom_len = 3;
or
static const size_t bom_len = strlen(utf8_bom);

>   char *buf, *entry;
>  
>   fd = open(fname, O_RDONLY);
> @@ -617,7 +618,12 @@ int add_excludes_from_file_to_list(const char *fname,
>   }
>  
>   el->filebuf = buf;
> - entry = buf;
> +
And now we can avoid magic numbers:
if (size >= bom_len && !memcmp(buf, utf8_bom, bom_len))
entry = buf + bom_len;
else
entry = buf;
[]
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] dir: allow a BOM at the beginning of exclude files

2015-04-16 Thread Johannes Schindelin
Hi,

On 2015-04-16 17:39, Junio C Hamano wrote:

> Also do we need a similar change to the attribute side, or are we
> already covered and we forgot to do the same for the ignore files?

I fear so: https://github.com/git/git/blob/v2.3.5/attr.c#L359-L376

As for the config, we are safe: 
https://github.com/git/git/blob/v2.3.5/config.c#L419-L439

Ciao,
Dscho
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] dir: allow a BOM at the beginning of exclude files

2015-04-16 Thread Jeff King
On Thu, Apr 16, 2015 at 08:39:55AM -0700, Junio C Hamano wrote:

> >  test_expect_success 'status untracked directory with --ignored' '
> > echo "ignored" >.gitignore &&
> > +   sed -e "s/^/\xef\xbb\xbf/" .gitignore >.gitignore.new &&
> > +   mv .gitignore.new .gitignore &&
> 
> Is this "write literal in \xHEX on the replacement side of sed
> substitution" potable?  In any case, replacing the above three with
> something like:
> 
>   printf "ignored\n" >.gitignore
> 
> may be more sensible, no?

I'm not sure about sed, but I agree it is suspect. And note that printf
with hex codes is not portable, either You have to use octal:

  printf '\357\273\277ignored\n' >.gitignore

Also, as a nit, I'd much rather see this in its own test rather than
crammed into another test_expect_success. It's much easier to diagnose
failures if the test description mentions the goal, and it is not tied
up with testing other parts that might fail.

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] dir: allow a BOM at the beginning of exclude files

2015-04-16 Thread Junio C Hamano
Carlos Martín Nieto  writes:

> diff --git a/dir.c b/dir.c
> index 0943a81..6368247 100644
> --- a/dir.c
> +++ b/dir.c
> @@ -581,6 +581,7 @@ int add_excludes_from_file_to_list(const char *fname,
>   struct stat st;
>   int fd, i, lineno = 1;
>   size_t size = 0;
> + static const unsigned char *utf8_bom = (unsigned char *) "\xef\xbb\xbf";
>   char *buf, *entry;
>  
>   fd = open(fname, O_RDONLY);
> @@ -617,7 +618,12 @@ int add_excludes_from_file_to_list(const char *fname,
>   }
>  
>   el->filebuf = buf;
> - entry = buf;
> +
> + if (size >= 3 && !memcmp(buf, utf8_bom, 3))

OK.

> + entry = buf + 3;
> + else
> + entry = buf;
> +
>   for (i = 0; i < size; i++) {
>   if (buf[i] == '\n') {
>   if (entry != buf + i && entry[0] != '#') {
> diff --git a/t/t7061-wtstatus-ignore.sh b/t/t7061-wtstatus-ignore.sh
> index 460789b..0a06fbf 100755
> --- a/t/t7061-wtstatus-ignore.sh
> +++ b/t/t7061-wtstatus-ignore.sh
> @@ -13,6 +13,8 @@ EOF
>  
>  test_expect_success 'status untracked directory with --ignored' '
>   echo "ignored" >.gitignore &&
> + sed -e "s/^/\xef\xbb\xbf/" .gitignore >.gitignore.new &&
> + mv .gitignore.new .gitignore &&

Is this "write literal in \xHEX on the replacement side of sed
substitution" potable?  In any case, replacing the above three with
something like:

printf "ignored\n" >.gitignore

may be more sensible, no?

Also do we need a similar change to the attribute side, or are we
already covered and we forgot to do the same for the ignore files?


>   mkdir untracked &&
>   : >untracked/ignored &&
>   : >untracked/uncommitted &&
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] dir: allow a BOM at the beginning of exclude files

2015-04-16 Thread Carlos Martín Nieto
On Thu, 2015-04-16 at 16:05 +0200, Carlos Martín Nieto wrote:
> Some text editors like Notepad or LibreOffice write an UTF-8 BOM in
> order to indicate that the file is Unicode text rather than whatever the
> current locale would indicate.
> 
> If someone uses such an editor to edit a gitignore file, we are left
> with those three bytes at the beginning of the file. If we do not skip
> them, we will attempt to match a filename with the BOM as prefix, which
> won't match the files the user is expecting.

Signed-off-by: Carlos Martín Nieto 

which I keep forgetting.

> 
> ---
> 
> If you're wondering how I came up with LibreOffice, I was doing a
> workshop recently and one of the participants was not content with the
> choice of vim or nano, so he opened LibreOffice to edit the gitignore
> file with confusing consequences.
> 
> This codepath doesn't go as far as the config code in validating that
> we do not have a partial BOM which would mean there's some invalid
> content, but we don't really have invalid content any other way, as
> we're just dealing with a list of paths in the file.
> 
>  dir.c  | 8 +++-
>  t/t7061-wtstatus-ignore.sh | 2 ++
>  2 files changed, 9 insertions(+), 1 deletion(-)
> 
> diff --git a/dir.c b/dir.c
> index 0943a81..6368247 100644
> --- a/dir.c
> +++ b/dir.c
> @@ -581,6 +581,7 @@ int add_excludes_from_file_to_list(const char *fname,
>   struct stat st;
>   int fd, i, lineno = 1;
>   size_t size = 0;
> + static const unsigned char *utf8_bom = (unsigned char *) "\xef\xbb\xbf";
>   char *buf, *entry;
>  
>   fd = open(fname, O_RDONLY);
> @@ -617,7 +618,12 @@ int add_excludes_from_file_to_list(const char *fname,
>   }
>  
>   el->filebuf = buf;
> - entry = buf;
> +
> + if (size >= 3 && !memcmp(buf, utf8_bom, 3))
> + entry = buf + 3;
> + else
> + entry = buf;
> +
>   for (i = 0; i < size; i++) {
>   if (buf[i] == '\n') {
>   if (entry != buf + i && entry[0] != '#') {
> diff --git a/t/t7061-wtstatus-ignore.sh b/t/t7061-wtstatus-ignore.sh
> index 460789b..0a06fbf 100755
> --- a/t/t7061-wtstatus-ignore.sh
> +++ b/t/t7061-wtstatus-ignore.sh
> @@ -13,6 +13,8 @@ EOF
>  
>  test_expect_success 'status untracked directory with --ignored' '
>   echo "ignored" >.gitignore &&
> + sed -e "s/^/\xef\xbb\xbf/" .gitignore >.gitignore.new &&
> + mv .gitignore.new .gitignore &&
>   mkdir untracked &&
>   : >untracked/ignored &&
>   : >untracked/uncommitted &&



--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] dir: allow a BOM at the beginning of exclude files

2015-04-16 Thread Carlos Martín Nieto
On Thu, 2015-04-16 at 17:03 +0200, Johannes Schindelin wrote:
> Hi Carlos,
> 
> On 2015-04-16 16:05, Carlos Martín Nieto wrote:
> > Some text editors like Notepad or LibreOffice write an UTF-8 BOM in
> > order to indicate that the file is Unicode text rather than whatever the
> > current locale would indicate.
> > 
> > If someone uses such an editor to edit a gitignore file, we are left
> > with those three bytes at the beginning of the file. If we do not skip
> > them, we will attempt to match a filename with the BOM as prefix, which
> > won't match the files the user is expecting.
> > 
> > ---
> > 
> > If you're wondering how I came up with LibreOffice, I was doing a
> > workshop recently and one of the participants was not content with the
> > choice of vim or nano, so he opened LibreOffice to edit the gitignore
> > file with confusing consequences.
> > 
> > This codepath doesn't go as far as the config code in validating that
> > we do not have a partial BOM which would mean there's some invalid
> > content, but we don't really have invalid content any other way, as
> > we're just dealing with a list of paths in the file.
> 
> Yeah, users are entertaining!
> 
> I agree that this is a good patch. *Maybe* we would need the same handling in 
> more places, in which case it might make sense to refactor the test into its 
> own function.

Yes, this was my train of thought. If we (discover that) need it in a
third place, then we can unify the test and skip. For two places I
reckoned it was fine if we duplicated a bit.

> 
> In any case, though, the Git project requires a [Developer's Certificate of 
> Origin](https://github.com/git/git/blob/v2.3.5/Documentation/SubmittingPatches#L234-L277);
>  Would you mind adding that?
> 

Yeah, sorry. I keep forgetting to do that. I'll reply to my original
e-mail with it.

Cheers,
   cmn


--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] dir: allow a BOM at the beginning of exclude files

2015-04-16 Thread Johannes Schindelin
Hi Carlos,

On 2015-04-16 16:05, Carlos Martín Nieto wrote:
> Some text editors like Notepad or LibreOffice write an UTF-8 BOM in
> order to indicate that the file is Unicode text rather than whatever the
> current locale would indicate.
> 
> If someone uses such an editor to edit a gitignore file, we are left
> with those three bytes at the beginning of the file. If we do not skip
> them, we will attempt to match a filename with the BOM as prefix, which
> won't match the files the user is expecting.
> 
> ---
> 
> If you're wondering how I came up with LibreOffice, I was doing a
> workshop recently and one of the participants was not content with the
> choice of vim or nano, so he opened LibreOffice to edit the gitignore
> file with confusing consequences.
> 
> This codepath doesn't go as far as the config code in validating that
> we do not have a partial BOM which would mean there's some invalid
> content, but we don't really have invalid content any other way, as
> we're just dealing with a list of paths in the file.

Yeah, users are entertaining!

I agree that this is a good patch. *Maybe* we would need the same handling in 
more places, in which case it might make sense to refactor the test into its 
own function.

In any case, though, the Git project requires a [Developer's Certificate of 
Origin](https://github.com/git/git/blob/v2.3.5/Documentation/SubmittingPatches#L234-L277);
 Would you mind adding that?

Thanks,
Dscho
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html