Re: change the filetype from binary to text after the file is committed to a git repo

2017-07-24 Thread tonka3...@gmail.com
Thanks Jeff, I will try it tomorrow.

> On 24.07.2017 at 22:32, Jeff King wrote:
> 
> On Mon, Jul 24, 2017 at 10:26:22PM +0200, tonka3...@gmail.com wrote:
> 
>>> I'm not sure exactly what you're trying to accomplish. If you're unhappy
>>> with the file as utf-16, then you should probably convert to utf-8 as a
>>> single commit (since the diff will otherwise be unreadable) and then
>>> make further changes in utf-8.
> 
>> That was exactly what I was looking for. The utf-16 encoding was
>> introduced by accident back in the day (thanks to Visual Studio). So
>> if the last commit and the actual change are both utf-8, the diff
>> should work again. Just for my understanding: Git just takes the
>> bytes of the whole file on every commit, so there is no general
>> problem with that; the utf-16 version is just twice as big as the
>> utf-8 one, is that correct?
> 
> Right. The diff switching the encodings will be listed as "binary" (and
> you should write a good commit message explaining what's going on!), but
> then after that the changes to the utf-8 version will display as normal
> text.  Git only looks at the actual bytes being diffed, not older
> versions of the file.
> 
> -Peff


Re: change the filetype from binary to text after the file is committed to a git repo

2017-07-24 Thread Jeff King
On Mon, Jul 24, 2017 at 10:26:22PM +0200, tonka3...@gmail.com wrote:

> > I'm not sure exactly what you're trying to accomplish. If you're unhappy
> > with the file as utf-16, then you should probably convert to utf-8 as a
> > single commit (since the diff will otherwise be unreadable) and then
> > make further changes in utf-8.

> That was exactly what I was looking for. The utf-16 encoding was
> introduced by accident back in the day (thanks to Visual Studio). So
> if the last commit and the actual change are both utf-8, the diff
> should work again. Just for my understanding: Git just takes the
> bytes of the whole file on every commit, so there is no general
> problem with that; the utf-16 version is just twice as big as the
> utf-8 one, is that correct?

Right. The diff switching the encodings will be listed as "binary" (and
you should write a good commit message explaining what's going on!), but
then after that the changes to the utf-8 version will display as normal
text.  Git only looks at the actual bytes being diffed, not older
versions of the file.

-Peff


Re: change the filetype from binary to text after the file is committed to a git repo

2017-07-24 Thread Jeff King
On Mon, Jul 24, 2017 at 09:02:12PM +0200, tonka3...@gmail.com wrote:

> There is no .gitattributes file in the repo. I think the git
> heuristic also detects utf-16 files as binary (on Windows), so I
> think that is the reason why my file is binary (I have to check that
> tomorrow).

Correct. UTF-16 _is_ binary, if you are trying to include it alongside
ASCII content (like the rest of the text diff headers). The two cannot
mix.

> If I add a .gitattributes file, I have the problem that git
> diff will treat the old and the new blob as utf-8, which generates
> garbage.

Git's diff doesn't look at encodings at all; it does a diff of the
actual bytes without respect to any encoding. So yes, if you use "-a" or
a gitattribute to ask git to show you the bytes, the UTF-16 is likely to
look like garbage (and a commit rewriting from utf-16 to utf-8 will
basically be a rewrite of the whole file contents).

> Do you have another idea?  Could it be possible to add only a space in
> code (utf-8) and then add the real content in a second commit, so the
> old and the new one are both utf-8?

I'm not sure exactly what you're trying to accomplish. If you're unhappy
with the file as utf-16, then you should probably convert to utf-8 as a
single commit (since the diff will otherwise be unreadable) and then
make further changes in utf-8.
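
A minimal sketch of that one-off conversion, assuming an "iconv" that
knows the UTF-16 and UTF-8 encoding names and a file that starts with a
byte-order mark (as files saved by Visual Studio usually do). The
to_utf8 helper, the path, and the commit message are made up for
illustration:

```shell
#!/bin/sh
# Sketch: re-encode a UTF-16 file as UTF-8 in place, so that the
# conversion can be committed on its own, separate from real changes.

to_utf8() {
    tmp=$(mktemp) || return 1
    # "-f UTF-16" relies on the byte-order mark to pick the endianness;
    # a file without one may need UTF-16LE or UTF-16BE instead.
    if iconv -f UTF-16 -t UTF-8 "$1" > "$tmp"; then
        # Write back with cat (not mv) so the file keeps its permissions.
        cat "$tmp" > "$1"
        status=$?
    else
        status=1
    fi
    rm -f "$tmp"
    return "$status"
}

# Intended use inside the repository (hypothetical path and message):
#   to_utf8 src/legacy.cpp
#   git add src/legacy.cpp
#   git commit -m "Convert src/legacy.cpp from UTF-16 to UTF-8 (no content changes)"
```

Keeping the conversion in its own commit means the one unreadable
"binary" diff carries no real changes, and every later diff is plain
text.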

If you need the file to remain utf-16 but you want more readable diffs
for those versions, you can ask git to convert to utf-8 before
performing the diff. Such a diff couldn't be applied, but would be
useful for reading. E.g., try:

  echo 'file diff=utf16' >.gitattributes
  git config diff.utf16.textconv 'iconv -f utf16 -t utf8'

You can read more about how this works in the "textconv" section of "git
help attributes".

Note that I'm relying on the external "iconv" tool to do the conversion
there. It's pretty standard on most Unix systems, but I don't know what
would be the best tool on Windows.

-Peff


Re: change the filetype from binary to text after the file is committed to a git repo

2017-07-24 Thread tonka3...@gmail.com
Hey Jeff,

Thanks for your answer.

There is no .gitattributes file in the repo. I think the git heuristic
also detects utf-16 files as binary (on Windows), so I think that is the
reason why my file is binary (I have to check that tomorrow). If I add a
.gitattributes file, I have the problem that git diff will treat the old
and the new blob as utf-8, which generates garbage.

Do you have another idea?
Could it be possible to add only a space in the code (utf-8) and then add
the real content in a second commit, so the old and the new one are both
utf-8?

> On 24.07.2017 at 20:18, Jeff King wrote:
> 
>> On Mon, Jul 24, 2017 at 07:11:06AM +0200, tonka tonka wrote:
>> 
>> I have a problem with a file already committed to my repo. This git
>> repo was converted from svn to git some years ago. Last week I
>> changed some lines in a file and saw in the diff that it is marked as
>> binary (it's a simple .cpp file). I think on the first commit it was
>> detected as a utf-16 file (on Windows). But no matter what I do, I
>> can't get it back to a "normal text" file (git does not detect
>> that), even though it is now only utf-8. I even replaced the whole
>> content of the file with just 'a' and git still says it's binary.
> 
> Git doesn't store a flag for "binary-ness" on each file (though see
> below). As the diffs are generated on the fly when you ask to compare
> two versions, so too is the determination of "is this binary".
> 
> The default heuristic looks at file size (anything over
> core.bigFileThreshold, 512MB by default, is considered binary) and
> whether it has any zero-byte characters in the first few kilobytes.
> But note that if _either_ side of a diff is considered binary, then
> Git won't show a text diff.
> 
> If you want a particular diff to show all content, even if it doesn't
> look like text, add "-a" to your git invocation (e.g., "git show -a").
> 
> That said, you can also use .gitattributes (see "git help attributes")
> to mark a file as binary or not-binary, skipping the heuristic check.
> I'm guessing since you converted from svn that you don't have a
> .gitattributes file, but it's possible that somebody later added one
> that marks the file as binary (and so the solution would be to drop that
> entry).
> 
> -Peff


Re: change the filetype from binary to text after the file is committed to a git repo

2017-07-24 Thread Jeff King
On Mon, Jul 24, 2017 at 07:11:06AM +0200, tonka tonka wrote:

> I have a problem with a file already committed to my repo. This git
> repo was converted from svn to git some years ago. Last week I
> changed some lines in a file and saw in the diff that it is marked as
> binary (it's a simple .cpp file). I think on the first commit it was
> detected as a utf-16 file (on Windows). But no matter what I do, I
> can't get it back to a "normal text" file (git does not detect
> that), even though it is now only utf-8. I even replaced the whole
> content of the file with just 'a' and git still says it's binary.

Git doesn't store a flag for "binary-ness" on each file (though see
below). As the diffs are generated on the fly when you ask to compare
two versions, so too is the determination of "is this binary".

The default heuristic looks at file size (anything over
core.bigFileThreshold, 512MB by default, is considered binary) and
whether it has any zero-byte characters in the first few kilobytes.
But note that if _either_ side of a diff is considered binary, then
Git won't show a text diff.
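
The content half of that heuristic can be approximated in a few lines
of shell. This is only a rough sketch, not Git's actual code: the
looks_binary name is made up, and the 8000-byte window is an assumption
standing in for "the first few kilobytes".

```shell
#!/bin/sh
# Sketch: report a file as "binary" if a NUL byte appears near its start.
# The 8000-byte window is an assumed value for illustration.

looks_binary() {
    # Count the leading bytes with and without NUL bytes; $(( )) also
    # strips the padding some wc implementations print.
    total=$(( $(head -c 8000 "$1" | wc -c) ))
    without_nul=$(( $(head -c 8000 "$1" | tr -d '\000' | wc -c) ))
    [ "$total" -ne "$without_nul" ]
}

dir=$(mktemp -d)
# Plain ASCII/UTF-8 text: no NUL bytes.
printf 'a\n' > "$dir/utf8.cpp"
# The same text as UTF-16LE with a byte-order mark: every ASCII
# character is followed by a NUL byte.
printf '\377\376a\000\n\000' > "$dir/utf16.cpp"

for f in "$dir/utf8.cpp" "$dir/utf16.cpp"; do
    if looks_binary "$f"; then
        echo "$(basename "$f"): binary"
    else
        echo "$(basename "$f"): text"
    fi
done
```

This is why a UTF-16 source file trips the check even though it looks
like ordinary text in an editor, while the same content in UTF-8 does
not.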

If you want a particular diff to show all content, even if it doesn't
look like text, add "-a" to your git invocation (e.g., "git show -a").

That said, you can also use .gitattributes (see "git help attributes")
to mark a file as binary or not-binary, skipping the heuristic check.
I'm guessing since you converted from svn that you don't have a
.gitattributes file, but it's possible that somebody later added one
that marks the file as binary (and so the solution would be to drop that
entry).
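
As a small sketch of what such entries look like, the script below
writes a .gitattributes file in a throwaway repository and asks Git how
it resolves the "diff" attribute. Both paths are hypothetical; in a real
repository only the .gitattributes lines would be needed.

```shell
#!/bin/sh
# Sketch: force text or binary treatment per path via .gitattributes.
# "src/legacy.cpp" and "assets/logo.dat" are hypothetical paths.
set -e

repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"

cat > .gitattributes <<'EOF'
# Always diff this file as text, skipping the heuristic check.
src/legacy.cpp diff
# Always treat this file as binary ("binary" is a built-in macro
# that expands to -diff -merge -text).
assets/logo.dat binary
EOF

# Show how Git resolves the "diff" attribute for each path.
git check-attr diff -- src/legacy.cpp assets/logo.dat
```

This should report "diff: set" for the first path and "diff: unset" for
the second. Note that forcing a text diff does not change how the bytes
are interpreted, so a file that is still UTF-16 will look like garbage.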

-Peff


change the filetype from binary to text after the file is committed to a git repo

2017-07-23 Thread tonka tonka
Hey everybody,

I have a problem with a file already committed to my repo. This git
repo was converted from svn to git some years ago. Last week I
changed some lines in a file and saw in the diff that it is marked as
binary (it's a simple .cpp file). I think on the first commit it was
detected as a utf-16 file (on Windows). But no matter what I do, I
can't get it back to a "normal text" file (git does not detect
that), even though it is now only utf-8. I even replaced the whole
content of the file with just 'a' and git still says it's binary.


Is this the only way to get it back to text mode?
* copy a utf-8 version of the original file
* delete the file
* make a commit
* add the old file as a new one

I think that will work but it will also break my history.

Is there a better way to get this behavior without losing history?

Best regards
Tonka