Re: Git *accepts* a branch name, it can't identify in the future?

2017-08-20 Thread Kaartic Sivaraam
Thanks, but Johannes has already found the issue and given a solution.
Regardless, replying to the questions just for the record.

On Sun, 2017-08-20 at 04:33 -0400, Jeff King wrote:
> What does "git for-each-ref" say about which branches you _do_ have?
> 
> Also, what platform are you on?
> 

I use "Debian GNU/Linux buster/sid 64-bit".

> I'm wondering specifically if you have a filesystem (like HFS+ on MacOS)
> that silently rewrites invalid unicode in filenames we create. That
> would mean your branches are still there, but probably with some funny
> filename like "done/%xxdoc-fix". Git wouldn't know that name because the
> filesystem rewriting happened behind its back (though I'd think that a
> further open() call would find the same file, so maybe this is barking
> up the wrong tree).
> 

That sounds dangerous!


> Another line of thinking: are you sure the � you are writing on the
> command line is identical to the one generated by the corruption (and if
> you cut and paste, is perhaps a generic glyph placed in the buffer by
> your terminal to replace an invalid codepoint, rather than the actual
> bytes)?
> 

This was the issue. I wasn't giving git the actual bytes that my
sloppy script had produced.


>   [you didn't say how your script works, so let's use git to rename]

I know of no other way to rename a branch, so I didn't mention it :)


>   $ broken=$(printf '\223')
> 
>   [and we can rename it using that knowledge]
>   $ git branch -m ${broken}doc-fix doc-fix
> 

Johannes has already given a solution; this one works too.


-- 
Kaartic


Re: Git *accepts* a branch name, it can't identify in the future?

2017-08-20 Thread Kaartic Sivaraam
On Sun, 2017-08-20 at 10:20 +0200, Johannes Sixt wrote:
> It is not Git's fault that your terminal converts an invalid UTF-8 
> sequence (that your script produces) to �. Nor is it when you paste that 
> character onto the command line, that it is passed as a (correct) UTF-8 
> character.
> 

You're right. I just now realise how I missed the line between "what's
seen by us" and "what's seen by the program".


> Perhaps this helps (untested):
> 
> $ git branch -m done/$(printf '\x93')doc-fix done/dic-fix
> 

This one helped, thanks.

-- 
Kaartic


Re: Git *accepts* a branch name, it can't identify in the future?

2017-08-20 Thread Jeff King
On Sun, Aug 20, 2017 at 01:21:29PM +0530, Kaartic Sivaraam wrote:

> I made a small assumption in the script which turned out to be false. I
> thought the unicode prefixes I used corresponded to only two bytes.
> This led to the issue. The unicode character '✓' corresponds to three
> bytes and as a result, instead of removing it, my script replaced
> it with the unknown character '�'. So, the branch named '✓doc-fix'
> became 'done/�doc-fix'. Here's the issue. I couldn't use 
> 
> $ git branch -m done/�doc-fix done/dic-fix 
> 
> to rename the branch. Nor could I refer to it in any way. Git simply
> says,
> 
> error: pathspec 'done/�doc-fix' did not match any file(s) known to git.

What does "git for-each-ref" say about which branches you _do_ have?

Also, what platform are you on?

I'm wondering specifically if you have a filesystem (like HFS+ on MacOS)
that silently rewrites invalid unicode in filenames we create. That
would mean your branches are still there, but probably with some funny
filename like "done/%xxdoc-fix". Git wouldn't know that name because the
filesystem rewriting happened behind its back (though I'd think that a
further open() call would find the same file, so maybe this is barking
up the wrong tree).

Another line of thinking: are you sure the � you are writing on the
command line is identical to the one generated by the corruption (and if
you cut and paste, is perhaps a generic glyph placed in the buffer by
your terminal to replace an invalid codepoint, rather than the actual
bytes)?

> I just wanted to know why git accepted a branch name which it can't
> identify later?
> 
> If it had rejected that name in the first place it would have been
> better. In case you would like to know how I got that weird name,
> here's a way to get that
> 
> $ echo '✓doc-fix' | cut -c3-100

  [a few defines to make it easy to prod git]
  $ check=$(printf '\342\234\223')
  $ broken=$(printf '\223')

  [this is your starting state, a branch with the unicode name]
  $ git branch ${check}doc-fix

  [you didn't say how your script works, so let's use git to rename]
  $ git branch -m ${check}doc-fix ${broken}doc-fix

  [my terminal doesn't show the unknown-character glyph, but we
   can see the funny character with "cat -A"]:
  $ git for-each-ref --format='%(refname)' | cat -A
  refs/heads/master$
  refs/heads/M-^Sdoc-fix$

  [and we can rename it using that knowledge]
  $ git branch -m ${broken}doc-fix doc-fix
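
  [a sketch of the same idea without typing the odd byte at all: let
   the shell pick the name out of git's own output, so the exact bytes
   are passed back unchanged. The helper name names_ending_with is made
   up for illustration, and it assumes only one branch ends in the
   given suffix]

```shell
# Print every name read from stdin that ends with the given suffix.
# The bytes pass through untouched, so the result can be handed back
# to git without retyping the unprintable character.
names_ending_with() {
    suffix=$1
    while IFS= read -r name; do
        case $name in
            *"$suffix") printf '%s\n' "$name" ;;
        esac
    done
}

# Hypothetical usage inside a repository:
#   old=$(git for-each-ref --format='%(refname:short)' refs/heads |
#         names_ending_with doc-fix)
#   git branch -m "$old" doc-fix
```

  [if more than one branch matches, $old holds several lines and the
   rename should not be attempted blindly]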

-Peff


Re: Git *accepts* a branch name, it can't identify in the future?

2017-08-20 Thread Johannes Sixt

On 20.08.2017 at 09:51, Kaartic Sivaraam wrote:

> I made a small assumption in the script which turned out to be false. I
> thought the unicode prefixes I used corresponded to only two bytes.
> This led to the issue. The unicode character '✓' corresponds to three
> bytes and as a result, instead of removing it, my script replaced
> it with the unknown character '�'. So, the branch named '✓doc-fix'
> became 'done/�doc-fix'. Here's the issue. I couldn't use
>
> $ git branch -m done/�doc-fix done/dic-fix
>
> to rename the branch. Nor could I refer to it in any way. Git simply
> says,
>
> error: pathspec 'done/�doc-fix' did not match any file(s) known to git.
>
> It's not a big issue as I haven't lost anything out of it. The branches
> have been merged into 'master'.
>
> I just wanted to know why git accepted a branch name which it can't
> identify later?
>
> If it had rejected that name in the first place it would have been
> better. In case you would like to know how I got that weird name,
> here's a way to get that
>
> $ echo '✓doc-fix' | cut -c3-100



See, these two are different:

$ echo '✓doc-fix' | cut -c3-100 | od -t x1
000 93 64 6f 63 2d 66 69 78 0a
011
$ echo '�doc-fix' | od -t x1
000 ef bf bd 64 6f 63 2d 66 69 78 0a
013

It is not Git's fault that your terminal converts an invalid UTF-8 
sequence (that your script produces) to �. Nor is it when you paste that 
character onto the command line, that it is passed as a (correct) UTF-8 
character.


Perhaps this helps (untested):

$ git branch -m done/$(printf '\x93')doc-fix done/dic-fix

In Git's database, branch names are just sequences of bytes. It is 
outside the scope to verify that all input is encoded correctly.
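
Since Git stores whatever bytes it is given, a script that generates
branch names can check the encoding itself before calling git. A minimal
sketch, assuming iconv is installed; the function name is_valid_utf8 is
made up for illustration and is not part of Git:

```shell
# Succeed only if the argument is a well-formed UTF-8 byte sequence.
# iconv exits with a non-zero status when it meets an invalid sequence.
is_valid_utf8() {
    printf '%s' "$1" | iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1
}

good=$(printf '\342\234\223')doc-fix   # check mark, bytes e2 9c 93
bad=$(printf '\223')doc-fix            # lone continuation byte 0x93

for name in "$good" "$bad"; do
    if is_valid_utf8 "$name"; then
        echo "ok: would rename"
    else
        echo "refusing: name is not valid UTF-8" >&2
    fi
done
```

A rename script could call such a check on each new name and skip the
"git branch -m" when it fails.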


-- Hannes


Git *accepts* a branch name, it can't identify in the future?

2017-08-20 Thread Kaartic Sivaraam
Hello all,

First of all, I would like to say that this happened completely by
accident and it's partly my mistake. Here's what happened.

I recently started creating 'feature branches' a lot for the few
patches that I sent to this mailing list. To identify the status of the
patch corresponding to that branch I prefixed them with special unicode
characters like ✓, ˅ etc. instead of using conventional hierarchical
names like, 'done/', 'archived/'.

Then I started finding it difficult to distinguish these unicode-
prefixed names probably because they had only one unicode character in
common. So, I thought of switching to the conventional way of using
scoped branch names (old is gold, you see). I wrote a tiny script to
rename the branches by replacing a specific unicode prefix with a
corresponding hierarchy. For example, the script would convert a branch
named '✓doc-fix' to 'done/doc-fix'.

I made a small assumption in the script which turned out to be false. I
thought the unicode prefixes I used corresponded to only two bytes.
This led to the issue. The unicode character '✓' corresponds to three
bytes and as a result, instead of removing it, my script replaced
it with the unknown character '�'. So, the branch named '✓doc-fix'
became 'done/�doc-fix'. Here's the issue. I couldn't use 

$ git branch -m done/�doc-fix done/dic-fix 

to rename the branch. Nor could I refer to it in any way. Git simply
says,

error: pathspec 'done/�doc-fix' did not match any file(s) known to git.

It's not a big issue as I haven't lost anything out of it. The branches
have been merged into 'master'.

I just wanted to know why git accepted a branch name which it can't
identify later?

If it had rejected that name in the first place it would have been
better. In case you would like to know how I got that weird name,
here's a way to get that

$ echo '✓doc-fix' | cut -c3-100
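
The byte-level difference between cutting at a fixed position and
stripping the whole prefix can be sketched as follows. This assumes cut
counts bytes (forced here with -b and LC_ALL=C, which is how GNU cut
appears to treat -c as well); the variable names are only illustrative:

```shell
# Build the name from explicit octal bytes so the result does not depend
# on the terminal or the locale.
check=$(printf '\342\234\223')   # U+2713 CHECK MARK: bytes e2 9c 93
name="${check}doc-fix"

# Cutting from byte 3 drops e2 9c but keeps the lone 0x93.
broken=$(printf '%s\n' "$name" | LC_ALL=C cut -b3-)

# Stripping the prefix with parameter expansion removes all three bytes.
fixed=${name#"$check"}

printf '%s' "$broken" | od -An -t x1   # hex bytes: 93 64 6f 63 2d 66 69 78
printf '%s\n' "$fixed"                 # prints: doc-fix
```

Using the prefix itself as the pattern means the script never has to
know how many bytes the character occupies.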

-- 
Kaartic