Re: CRLF, LF ... CR ?

2012-09-27 Thread Junio C Hamano
David Aguilar dav...@gmail.com writes:

> That said, perhaps the autocrlf code is simple enough that it
> could be easily tweaked to also handle this special case,...

I wouldn't be surprised if it is quite simple.

We (actually Linus, IIRC) simply declared from the get-go that it is
not worth spending any line of code only to worry about pre OSX
Macintosh when we did the end-of-line stuff, and nobody so far
showed any need.

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: CRLF, LF ... CR ?

2012-09-27 Thread Jens Bauer
Hi Junio and David.

The rule is in fact quite simple.
If it's a text file and it contains an LF, a CRLF or a CR, then that's a
line break. :)
-So everywhere an LF is checked for, a CR should most likely be checked for too.
Usually, when checking for CRLF, one is looking for the LF. If a CR precedes
the LF, the CR is discarded.
It's done this way in case we have a small buffer (say 100 bytes) and we read
the file only 100 bytes at a time.
If we searched for the CR instead, we couldn't check the next character without
a lot of clumsy or slow coding, because that character might only arrive with
the next read.
-But even today, I don't believe that a line would exceed 2000 characters;
although, one could have a 16K buffer, then *if* the last character in that
buffer is a CR, read ahead (just a single byte read would be acceptable in that
case).
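
A minimal sketch of the chunked case described above, assuming the goal is only
to count line breaks. Instead of reading one byte ahead when a chunk ends in a
CR, it carries a flag from one chunk to the next; that is a swapped-in
technique for illustration, not what git does. The names eol_counter and
eol_feed are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* State carried between chunks (hypothetical name, not a git structure). */
struct eol_counter {
    size_t lines;    /* line breaks seen so far */
    int pending_cr;  /* nonzero if the previous byte was a CR */
};

/* Feed one chunk. LF, CRLF and lone CR each count as exactly one line break,
 * even when a CRLF pair is split across two chunks. */
void eol_feed(struct eol_counter *st, const char *chunk, size_t n)
{
    size_t i;

    assert(st != NULL && (chunk != NULL || n == 0));
    for (i = 0; i < n; i++) {
        char c = chunk[i];
        if (c == '\012') {
            /* An LF right after a CR completes a CRLF that was already
             * counted when the CR was seen. */
            if (!st->pending_cr)
                st->lines++;
            st->pending_cr = 0;
        } else if (c == '\015') {
            st->lines++;        /* count at the CR itself */
            st->pending_cr = 1;
        } else {
            st->pending_cr = 0;
        }
    }
}
```

Feeding "a" CR as one chunk and then LF "b" as the next yields a single line
break, because the LF that opens the second chunk is absorbed into the CR that
ended the first.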

Unfortunately, it seems that it's not only old Mac OS users that have the 
problem:
http://stackoverflow.com/questions/10491564/git-and-cr-vs-lf-but-not-crlf

...That's a Windows user, who seems to be quite lazy - but if he already has a
huge repository filled with CR, I do understand why he doesn't want to change
things.
Here, adding scanning for CR later can still solve his problem, because the
files are not modified.

But on Mac OS X, there are also problems. If an application was written long 
ago (for Mac OS 9 or Carbon), it might still use CR instead of LF, if it's 
ported.

Anyway, Linus is right about not just popping code in, because someone, 
somewhere needs it for a single use - even if that is me. ;)
It's not leading to any kind of crashes if there's no scan for CR.
But on the other hand, it would save the community from FAQs about how to get it
working, and it's always an advantage not to modify files more than necessary
when dealing with a VCS/SCM.

The safest way is to hunt for the LF, remembering the previous character. The 
following is just some untested code I wrote as an example on how it could 
approximately be done - it does not have to use this many lines. This is a 
buffer-based example:

#include <stddef.h>

#define LF 0x0a
#define CR 0x0d

void scan_lines(const char *buffer, size_t length)
{
    const char  *b;     /* beginning of line */
    const char  *s;     /* source */
    const char  *e;     /* end pointer */
    const char  *l;     /* end of line contents (exclusive) */
    char        c;      /* current character */
    char        last;   /* previous character */

    s = buffer;         /* buffer containing entire text file */
    e = &s[length];
    b = s;
    c = 0;
    while(s < e)
    {
        last = c;
        c = *s++;
        if(LF == c)                 /* deal with Linux, UNIX and DOS */
        {
            /* we have a linefeed - new line. */
            l = s - 1;              /* stop before the LF */
            if(CR == last)
            {
                l = s - 2;          /* CRLF: stop before the CR as well */
            }
            /* line contents are from b to l */
            b = s;                  /* next line starts after the LF */
        }
        else if(CR == last)         /* deal with old Mac OS and other weirdos. ;) */
        {
            l = s - 2;              /* stop before the lone CR */
            /* line contents are from b to l */
            b = s - 1;              /* c is the first character of the next line */
        }
    }
}

-As written above, it can be optimized. There's a small bug though; it doesn't 
scan if the very last character in the file is a CR.
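
One way to close that gap, sketched under the assumption that the whole file is
in memory: peek at the next byte after a CR instead of remembering the previous
one, and also report a final line that has no terminator. for_each_line and
line_fn are hypothetical names, not git functions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback type: receives one line as [start, start + len). */
typedef void (*line_fn)(const char *start, size_t len, void *ctx);

/* Split buffer into lines, treating LF, CRLF and lone CR as terminators.
 * Calls fn (if not NULL) once per line and returns the number of lines. */
size_t for_each_line(const char *buffer, size_t length, line_fn fn, void *ctx)
{
    const char *b;      /* beginning of current line */
    const char *s;      /* read pointer */
    const char *e;      /* end pointer */
    size_t count = 0;

    assert(buffer != NULL);
    b = s = buffer;
    e = buffer + length;
    while (s < e) {
        char c = *s++;
        if (c != '\012' && c != '\015')
            continue;
        /* s - 1 points at the terminator, so the line is [b, s - 1). */
        if (fn)
            fn(b, (size_t)(s - 1 - b), ctx);
        count++;
        /* Swallow the LF of a CRLF pair. Peeking is safe here because the
         * bounds check comes first. */
        if (c == '\015' && s < e && *s == '\012')
            s++;
        b = s;
    }
    if (b < e) {        /* last line has no terminator */
        if (fn)
            fn(b, (size_t)(e - b), ctx);
        count++;
    }
    return count;
}
```

A CR as the very last byte is handled by the same branch as any other
terminator; the bounds check simply skips the peek.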


Love
Jens

On Wed, 26 Sep 2012 23:16:38 -0700, Junio C Hamano wrote:
> David Aguilar dav...@gmail.com writes:
>
>> That said, perhaps the autocrlf code is simple enough that it
>> could be easily tweaked to also handle this special case,...
>
> I wouldn't be surprised if it is quite simple.
>
> We (actually Linus, IIRC) simply declared from the get-go that it is
> not worth spending any line of code only to worry about pre OSX
> Macintosh when we did the end-of-line stuff, and nobody so far
> showed any need.
 


Re: CRLF, LF ... CR ?

2012-09-26 Thread David Aguilar
On Thu, Sep 13, 2012 at 9:51 PM, Junio C Hamano gits...@pobox.com wrote:
> David Aguilar dav...@gmail.com writes:
>
>> git doesn't really even support LF.
>
> At the storage level that is correct, but the above is a bit of
> stretch.  It may not be "support", but git _does_ rely on LF when
> running many text oriented operations (a rough rule of thumb is
> "does 'a line' in a file matter to the operation?").  Think about
> "git diff" and "git blame".

Thanks for the thorough explanation.  You're 100% correct, as always.

I'll be honest: I had a small bias when responding.
I didn't want anyone to think an "autocr" feature would be useful,
so I played the "git is really simple" angle hoping it would
put a kibosh on the idea.  That was a little silly of me.

That said, perhaps the autocrlf code is simple enough that it
could be easily tweaked to also handle this special case, but
I am not familiar with the code enough to say.  My gut feeling
was that it was too narrow a use case.  I guess if someone[*]
wanted to whip up a patch then it would be a different story,
but it doesn't seem to be the itch of anyone around here so far.

[*] Jens, that could be you ;-)

cheers,
-- 
David


Re: CRLF, LF ... CR ?

2012-09-26 Thread Jens Bauer
Hi David and Junio.

At first, I was planning to reply that I'd probably not be qualified for that.
But to tell the truth, I have been writing a lot of CR/LF/CRLF code throughout 
the years, so maybe I could do it.
Unfortunately, I have to go slow about programming, because I have burned myself
out a number of times, so programming as work is not compatible with me, though
programming as a hobby is.

The implementation would be dependent on how git is currently handling lines.
Worst case is, if it's mixed CR, LF and CRLF, such as a text-file, that 
contains all 3 kinds of line endings (because 3 different people have been 
editing the file).


Love
Jens

On Wed, 26 Sep 2012 01:42:02 -0700, David Aguilar wrote:
> On Thu, Sep 13, 2012 at 9:51 PM, Junio C Hamano gits...@pobox.com wrote:
>> David Aguilar dav...@gmail.com writes:
>>
>>> git doesn't really even support LF.
>>
>> At the storage level that is correct, but the above is a bit of
>> stretch.  It may not be "support", but git _does_ rely on LF when
>> running many text oriented operations (a rough rule of thumb is
>> "does 'a line' in a file matter to the operation?").  Think about
>> "git diff" and "git blame".
>
> Thanks for the thorough explanation.  You're 100% correct, as always.
>
> I'll be honest: I had a small bias when responding.
> I didn't want anyone to think an "autocr" feature would be useful,
> so I played the "git is really simple" angle hoping it would
> put a kibosh on the idea.  That was a little silly of me.
>
> That said, perhaps the autocrlf code is simple enough that it
> could be easily tweaked to also handle this special case, but
> I am not familiar with the code enough to say.  My gut feeling
> was that it was too narrow a use case.  I guess if someone[*]
> wanted to whip up a patch then it would be a different story,
> but it doesn't seem to be the itch of anyone around here so far.
>
> [*] Jens, that could be you ;-)
>
> cheers,
> -- 
> David


Re: CRLF, LF ... CR ?

2012-09-26 Thread Jens Bauer
Hi David and Junio.

Whoops, that's what happens when deleting a block of lines in a message...

The CR/LF/CRLF implementation depends a lot on whether git is reading a stream
or reading from memory.

I'd like to correct the last line to read...
The worst case is a file with mixed line endings, such as a text file that
contains all 3 kinds (CR, LF and CRLF) because 3 different people have been
editing the file.
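
To see whether a file is in that worst case, one could tally each terminator
style first. A rough sketch, assuming the file is already in memory; eol_stats,
count_eols and has_mixed_eols are hypothetical names:

```c
#include <assert.h>
#include <stddef.h>

/* Tally of each terminator style (hypothetical struct, for illustration). */
struct eol_stats {
    size_t lf;    /* lone LF (Linux, UNIX, Mac OS X) */
    size_t crlf;  /* CR LF   (DOS, Windows) */
    size_t cr;    /* lone CR (old Mac OS) */
};

/* Count how often each kind of line ending occurs in buf[0..len). */
struct eol_stats count_eols(const char *buf, size_t len)
{
    struct eol_stats st = {0, 0, 0};
    size_t i;

    assert(buf != NULL);
    for (i = 0; i < len; i++) {
        if (buf[i] == '\012') {
            st.lf++;
        } else if (buf[i] == '\015') {
            if (i + 1 < len && buf[i + 1] == '\012') {
                st.crlf++;
                i++;    /* skip the LF that belongs to this CRLF */
            } else {
                st.cr++;
            }
        }
    }
    return st;
}

/* Nonzero when more than one terminator style occurs. */
int has_mixed_eols(const struct eol_stats *st)
{
    return (st->lf != 0) + (st->crlf != 0) + (st->cr != 0) > 1;
}
```

A file edited by three people on three systems would show nonzero counts in all
three fields; a pure old-Mac file only in cr.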


Love
Jens

On Wed, 26 Sep 2012 12:12:39 +0200, Jens Bauer wrote:
> The implementation would be dependent on how git is currently
> handling lines.
> Worst case is, if it's mixed CR, LF and CRLF, such as a text-file,
> that contains all 3 kinds of line endings (because 3 different people
> have been editing the file).


Re: CRLF, LF ... CR ?

2012-09-13 Thread Jens Bauer
Hi Jeff and Drew.

I've been messing a little with clean/smudge filters; I think I understand them 
partly.

Let's call the file I have on the server, which has CR line endings,
mypcb.osm.
If I clone the project, and do the following...
$ cat mypcb.osm | tr '\r' '\n'
...I can read the file in the terminal window; otherwise it's just a
one-line file.

So far, so good.
In my home directory, I have a .gitconfig file, here's the interesting part:
[core]
        editor = nano
        excludesfile = /Users/jens/.gitexcludes
        attributesfile = /Users/jens/.gitattributes

[filter "cr"]
        clean = tr '\\r' '\\n'
        smudge = tr '\\n' '\\r'


In my home directory I added .gitattributes:
*.osm   filter=cr

I've verified that .gitattributes is read, because if I add two spaces, like
"*.osm filter = cr", I get an 'invalid filter name' error.
I've also verified that the clean/smudge lines are read; if I only have '\n'
for instance, I get an error.

Now, when I clone the project, make a change and then issue this command...
$ git diff mypcb.osm

...I get a strange diff. On line 3, one of the files shows a lot of control-m 
(cr) lines.
After that, I see lf lines, all prefixed with a '+', as if they were added.

I think I might be nearly there, just missing some obvious detail somewhere.
Any hints ?


Love
Jens

On Thu, 13 Sep 2012 17:53:00 +0200, Jens Bauer wrote:
> Hi Jeff and Drew.
>
> Thank you for your quick replies! :)
>
> The diffs look nasty yes; that's my main issue.
> It can be worked around in many ways; eg a simple (but time consuming) way:
> $ git diff mypcb.osm >mypcb.diff && nano mypcb.diff
>
> -It'd be better to just pipe it into a regex, which changes CR to LF
> on the fly.
>
> OsmondPCB is able to read files that have mixed LF and CR. (By mixed,
> I do not mean CRLF)
>
> The files do not need line-by-line diffing, but I think it would make
> it more readable.
> Thank you very much for the hint on the clean/smudge filters. I'll
> have a look at it. =)
>
>
> Love
> Jens
>
> On Thu, 13 Sep 2012 11:43:10 -0400, Jeff King wrote:
>> On Thu, Sep 13, 2012 at 11:34:50AM -0400, Drew Northup wrote:
>>
>>>> I've read that git supports two different line endings; either CRLF
>>>> or LF, but it does not support CR.
>>>> Would it make sense to add support for CR (if so, I hereby request
>>>> it as a new feature) ?
>>>
>>> Even if Git can't do CRLF/LF translation on a file it will still store
>>> and track the content of it just fine. In fact you probably want
>>> translation completely disabled in this case.
>>
>> Yeah. If the files always should just have CR, then just don't ask git
>> to do any translation (by not setting the text attribute, or even
>> setting -text if you have something like autocrlf turned on globally),
>> and it will preserve the bytes exactly. I suspect diffs will look nasty
>> because we won't interpret CR as a line-ending, though.
>>
>> Do the files actually need line-by-line diffing and merging? If not,
>> then you are fine.
>>
>> If so, then it would probably be nice to store them with a canonical LF
>> in the repository, but convert to CR on checkout. Git can't do that
>> internally, but you could define clean/smudge filters to do so (see the
>> section in "git help attributes" on "Checking-out and checking-in";
>> specifically the "filter" subsection).
>>
>> -Peff


Re: CRLF, LF ... CR ?

2012-09-13 Thread Jeff King
On Thu, Sep 13, 2012 at 08:17:20PM +0200, Jens Bauer wrote:

> In my home directory, I have a .gitconfig file, here's the interesting part:
> [core]
>         editor = nano
>         excludesfile = /Users/jens/.gitexcludes
>         attributesfile = /Users/jens/.gitattributes
>
> [filter "cr"]
>         clean = tr '\\r' '\\n'
>         smudge = tr '\\n' '\\r'
>
>
> In my home directory I added .gitattributes:
> *.osm   filter=cr

Looks right.

> Now, when I clone the project, make a change and then issue this command...
> $ git diff mypcb.osm
>
> ...I get a strange diff. On line 3, one of the files shows a lot of control-m
> (cr) lines.
> After that, I see lf lines, all prefixed with a '+', as if they were added.
>
> I think I might be nearly there, just missing some obvious detail somewhere.

Yes, that's expected.  The point of the clean filter is to convert
your working tree file into a canonical (lf-only) representation inside
the repository. But you've already made commits with the cr form in the
repository. So you can choose one of:

  1. Make a new commit with these settings, which will have the
     canonical format. Accept that the old history will be funny, but
     you will be OK from here on out.

  2. Rewrite the old history to pretend that it was always LF. This
     gives you a nice clean history, but if you are collaborating with
     other people, they will need to rebase their work on the new
     history. See "git help filter-branch" for details.

-Peff


Re: CRLF, LF ... CR ?

2012-09-13 Thread Jens Bauer
Hi Jeff and Drew.

Excellent. I now removed the repository from the server, removed it from my 
gitolite.conf, added it to gitolite.conf, re-initialized and it works.
git diff shows what I wanted.

Thank you *very* much for making my dream come true. :)
-And thank you all for all the hard work you're doing. -Git is how all other 
open-source projects should be: Well-written and well-defined (oh - and fast!). 
:)


Love
Jens

On Thu, 13 Sep 2012 14:23:44 -0400, Jeff King wrote:
> On Thu, Sep 13, 2012 at 08:17:20PM +0200, Jens Bauer wrote:
>
>> In my home directory, I have a .gitconfig file, here's the
>> interesting part:
>> [core]
>>         editor = nano
>>         excludesfile = /Users/jens/.gitexcludes
>>         attributesfile = /Users/jens/.gitattributes
>>
>> [filter "cr"]
>>         clean = tr '\\r' '\\n'
>>         smudge = tr '\\n' '\\r'
>>
>>
>> In my home directory I added .gitattributes:
>> *.osm   filter=cr
>
> Looks right.
>
>> Now, when I clone the project, make a change and then issue this command...
>> $ git diff mypcb.osm
>>
>> ...I get a strange diff. On line 3, one of the files shows a lot of
>> control-m (cr) lines.
>> After that, I see lf lines, all prefixed with a '+', as if they
>> were added.
>>
>> I think I might be nearly there, just missing some obvious detail
>> somewhere.
>
> Yes, that's expected.  The point of the clean filter is to convert
> your working tree file into a canonical (lf-only) representation inside
> the repository. But you've already made commits with the cr form in the
> repository. So you can choose one of:
>
>   1. Make a new commit with these settings, which will have the
>      canonical format. Accept that the old history will be funny, but
>      you will be OK from here on out.
>
>   2. Rewrite the old history to pretend that it was always LF. This
>      gives you a nice clean history, but if you are collaborating with
>      other people, they will need to rebase their work on the new
>      history. See "git help filter-branch" for details.
>
> -Peff


Re: CRLF, LF ... CR ?

2012-09-13 Thread Jens Bauer
Hi Johannes.

I've changed...
tr '\\r' '\\n'
...to...
tr '\\15' '\\12'

...As you are right in that it is more correct. (Then in theory, it would be 
portable).
[I once came across tftpd, tried compiling it on a Mac, but it failed to work, 
because \r and \n were swapped on the compiler, so I asked the author to use 
\15 and \12, which made it fully portable]

It now works even better. I can't and won't complain - thank you. =)
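
For reference, a C sketch of what the tr clean filter does to the data, written
with octal escapes for the same portability reason. cr_to_lf is a hypothetical
helper, not part of git; like tr, it would turn a CRLF pair into two LFs, so it
assumes the input contains no CRLF.

```c
#include <assert.h>
#include <stddef.h>

/* Replace every CR (octal 015) with LF (octal 012) in place and return how
 * many bytes were changed. Octal escapes avoid depending on what '\r' and
 * '\n' mean to a particular compiler. */
size_t cr_to_lf(char *buf, size_t len)
{
    size_t i;
    size_t changed = 0;

    assert(buf != NULL);
    for (i = 0; i < len; i++) {
        if (buf[i] == '\015') {
            buf[i] = '\012';
            changed++;
        }
    }
    return changed;
}
```

Bytes that are already LF are left alone, which is why a file with mixed LF and
CR comes out with LF only.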


Love
Jens

On Thu, 13 Sep 2012 20:34:08 +0200, Johannes Sixt wrote:
> On 13.09.2012 17:53, Jens Bauer wrote:
>> Hi Jeff and Drew.
>>
>> Thank you for your quick replies! :)
>>
>> The diffs look nasty yes; that's my main issue.
>> It can be worked around in many ways; eg a simple (but time consuming) way:
>> $ git diff mypcb.osm >mypcb.diff && nano mypcb.diff
>>
>> -It'd be better to just pipe it into a regex, which changes CR to LF
>> on the fly.
>>
>> OsmondPCB is able to read files that have mixed LF and CR. (By mixed,
>> I do not mean CRLF)
>
> That is good news. Just write a 'clean' filter that amounts to
>
>     tr '\015' '\012'
>
> You don't need a 'smudge' filter that reverts this conversion.
>
> -- Hannes
 


Re: CRLF, LF ... CR ?

2012-09-13 Thread David Aguilar
On Thu, Sep 13, 2012 at 8:09 AM, Jens Bauer jens-li...@gpio.dk wrote:
> Hi everyone.
>
> I'm quite fond of git, and have used it for a while.
> Recently, I've started making printed circuit boards (PCBs) using an
> application called OsmondPCB (for Mac), and I'd like to use git to track
> changes on these.
> This application was originally written for the old Mac OS (Mac OS 6 to Mac
> OS 9.2).
> The old Mac OS does not use LF, nor CRLF for line endings, but only CR.
>
> I've read that git supports two different line endings; either CRLF or LF,
> but it does not support CR.
> Would it make sense to add support for CR (if so, I hereby request it as a
> new feature) ?
> The alternative is to ask the developer if he would change the file format,
> so that new versions of his software would change the files to end in LF, but
> he'd have to be careful not to break compatibility.
> If the software is to be changed, this would not fix similar issues that
> other people might have.

Do you mean that you want automatic conversion from CR to LF?

What about just storing the files as-is,
with no conversion at all? (this is the default git behavior)

git doesn't really even support LF.  It stores content as-is which
means LF works just fine.  git prefers to not mess around with the content,
but we do have autocrlf to help folks stuck on Windows.

If you need to, you can use .gitattributes to add a clean/smudge filter
that does this conversion for you.

See the filter section for an example:

http://www.kernel.org/pub/software/scm/git/docs/gitattributes.html

If you're serious about wanting that feature then we'll
happily review any patches you might have.  That said, I don't really
think it's a common enough case for git to natively support, so
I'd recommend going with the .gitattributes filter.

good luck,
-- 
David


Re: CRLF, LF ... CR ?

2012-09-13 Thread Junio C Hamano
David Aguilar dav...@gmail.com writes:

> git doesn't really even support LF.

At the storage level that is correct, but the above is a bit of
stretch.  It may not be "support", but git _does_ rely on LF when
running many text oriented operations (a rough rule of thumb is
"does 'a line' in a file matter to the operation?").  Think about
"git diff" and "git blame".