Re: CRLF, LF ... CR ?
David Aguilar dav...@gmail.com writes:

> That said, perhaps the autocrlf code is simple enough that it could be easily tweaked to also handle this special case,...

I wouldn't be surprised if it is quite simple. We (actually Linus, IIRC) simply declared from the get-go that it is not worth spending any line of code only to worry about pre OSX Macintosh when we did the end-of-line stuff, and nobody so far showed any need.

--
To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: CRLF, LF ... CR ?
Hi Junio and David.

The rule is in fact quite simple. If it's a text file and it contains an LF, a CRLF or a CR, then that's a line break. :)
-So everywhere an LF is checked for, a CR should most likely be checked for as well.

Usually, when checking for CRLF, one is looking for the LF. If a CR precedes the LF, the CR is discarded. It's done this way in case we have a small buffer (say 100 bytes) and we read the file only 100 bytes at a time. If we searched for the CR instead, we couldn't check the next character without a lot of clumsy or slow coding.
-But even today, I don't believe that a line would exceed 2000 characters; although one could have a 16K buffer and then, *if* the last character in that buffer is a CR, read ahead (just a single-byte read would be acceptable in that case).

Unfortunately, it seems that it's not only old Mac OS users that have the problem:

http://stackoverflow.com/questions/10491564/git-and-cr-vs-lf-but-not-crlf

...That's a Windows user, who seems to be quite lazy - but if he already has a huge repository filled with CR, I do understand why he doesn't want to change things. Here, adding scanning for CR later can still solve his problem, because the files are not modified.

But on Mac OS X, there are also problems. If an application was written long ago (for Mac OS 9 or Carbon), it might still use CR instead of LF after it has been ported.

Anyway, Linus is right about not just popping code in because someone, somewhere needs it for a single use - even if that is me. ;)
Nothing crashes if there's no scan for CR. But on the other hand, scanning for CR saves the community from FAQs about how to get it working, and it's always an advantage not to modify files more than necessary when dealing with a VCS/SCM.

The safest way is to hunt for the LF, remembering the previous character. The following is just some untested code I wrote as an example of how it could approximately be done - it does not have to use this many lines.

This is a buffer-based example:

    #define LF 0x0a
    #define CR 0x0d

    const char *buffer; /* buffer containing entire text file */
    size_t length;      /* number of bytes in buffer */
    const char *b;      /* beginning of line */
    const char *s;      /* source */
    const char *e;      /* end pointer */
    const char *l;      /* end of line (one past its last character) */
    char c;
    char last;

    s = buffer;
    e = &s[length];
    b = s;
    c = 0;
    while (s < e) {
        last = c;
        c = *s++;
        if (LF == c) {              /* deal with Linux, UNIX and DOS */
            l = s - 1;              /* we have a linefeed - end of line */
            if (CR == last) {
                l = s - 2;          /* CRLF: drop the CR as well */
            }
            /* line contents are from b up to (not including) l */
            b = s;                  /* next line starts after the LF */
        } else if (CR == last) {    /* deal with old Mac OS and other weirdos. ;) */
            l = s - 2;              /* the lone CR ended the line */
            /* line contents are from b up to (not including) l */
            b = s - 1;              /* next line starts at the current character */
        }
    }

-As written above, it can be optimized. There's a small bug though; it doesn't scan if the very last character in the file is a CR.

Love
Jens

On Wed, 26 Sep 2012 23:16:38 -0700, Junio C Hamano wrote:
> David Aguilar dav...@gmail.com writes:
>
>> That said, perhaps the autocrlf code is simple enough that it could be easily tweaked to also handle this special case,...
>
> I wouldn't be surprised if it is quite simple. We (actually Linus, IIRC) simply declared from the get-go that it is not worth spending any line of code only to worry about pre OSX Macintosh when we did the end-of-line stuff, and nobody so far showed any need.
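The "remember the previous character" idea from the message above can also be carried across separately read buffers, which avoids the read-ahead when a buffer happens to end in a CR. What follows is only a minimal sketch under that assumption: the names eol_scanner, eol_scanner_init and eol_scanner_feed are made up for illustration and are not taken from git's source. It counts line breaks rather than extracting lines, and it also counts a CR that is the very last byte.

```c
#include <assert.h>
#include <stddef.h>

#define LF 0x0a
#define CR 0x0d

/* Hypothetical helper, not part of git: counts line breaks in a byte
 * stream that may mix LF, CRLF and lone CR, fed in arbitrary chunks. */
struct eol_scanner {
    size_t breaks;  /* line breaks seen so far */
    int prev_cr;    /* nonzero if the previous byte was a CR */
};

void eol_scanner_init(struct eol_scanner *sc)
{
    sc->breaks = 0;
    sc->prev_cr = 0;
}

void eol_scanner_feed(struct eol_scanner *sc, const char *buf, size_t len)
{
    size_t i;

    assert(buf != NULL || len == 0);
    for (i = 0; i < len; i++) {
        char c = buf[i];

        if (c == CR) {
            sc->breaks++;       /* a CR ends a line, with or without an LF after it */
            sc->prev_cr = 1;
        } else {
            if (c == LF && !sc->prev_cr)
                sc->breaks++;   /* lone LF; the LF of a CRLF was counted at its CR */
            sc->prev_cr = 0;
        }
    }
}
```

Because the only state is one flag, a CRLF split across two reads ("...\r" in one buffer, "\n..." in the next) is still counted once.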
Re: CRLF, LF ... CR ?
On Thu, Sep 13, 2012 at 9:51 PM, Junio C Hamano gits...@pobox.com wrote:
> David Aguilar dav...@gmail.com writes:
>
>> git doesn't really even support LF.
>
> At the storage level that is correct, but the above is a bit of stretch. It may not be "support", but git _does_ rely on LF when running many text oriented operations (a rough rule of thumb is "does 'a line' in a file matter to the operation?"). Think about "git diff" and "git blame".

Thanks for the thorough explanation. You're 100% correct, as always.

I'll be honest: I had a small bias when responding. I didn't want anyone to think an "autocr" feature would be useful, so I played the "git is really simple" angle hoping it would put a kibosh on the idea. That was a little silly of me.

That said, perhaps the autocrlf code is simple enough that it could be easily tweaked to also handle this special case, but I am not familiar with the code enough to say. My gut feeling was that it was too narrow a use case.

I guess if someone[*] wanted to whip up a patch then it would be a different story, but it doesn't seem to be the itch of anyone around here so far.

[*] Jens, that could be you ;-)

cheers,
--
David
Re: CRLF, LF ... CR ?
Hi David and Junio.

At first, I was planning to reply that I'd probably not be qualified for that. But to tell the truth, I have been writing a lot of CR/LF/CRLF code throughout the years, so maybe I could do it. Unfortunately, I have to go slow with programming, because I have burned myself out a number of times, so programming as work is not compatible with me, though programming as a hobby is.

The implementation would be dependent on how git is currently handling lines.
Worst case is, if it's mixed CR, LF and CRLF, such as a text file that contains all 3 kinds of line endings (because 3 different people have been editing the file).

Love
Jens

On Wed, 26 Sep 2012 01:42:02 -0700, David Aguilar wrote:
> On Thu, Sep 13, 2012 at 9:51 PM, Junio C Hamano gits...@pobox.com wrote:
>> David Aguilar dav...@gmail.com writes:
>>
>>> git doesn't really even support LF.
>>
>> At the storage level that is correct, but the above is a bit of stretch. It may not be "support", but git _does_ rely on LF when running many text oriented operations (a rough rule of thumb is "does 'a line' in a file matter to the operation?"). Think about "git diff" and "git blame".
>
> Thanks for the thorough explanation. You're 100% correct, as always.
>
> I'll be honest: I had a small bias when responding. I didn't want anyone to think an "autocr" feature would be useful, so I played the "git is really simple" angle hoping it would put a kibosh on the idea. That was a little silly of me.
>
> That said, perhaps the autocrlf code is simple enough that it could be easily tweaked to also handle this special case, but I am not familiar with the code enough to say. My gut feeling was that it was too narrow a use case.
>
> I guess if someone[*] wanted to whip up a patch then it would be a different story, but it doesn't seem to be the itch of anyone around here so far.
>
> [*] Jens, that could be you ;-)
>
> cheers,
> --
> David
Re: CRLF, LF ... CR ?
Hi David and Junio.

Woops, that's what happens when deleting a block of lines in a message...

The CR/LF/CRLF implementation depends a lot on whether git is reading a stream or reading from memory.

I'd like to correct the last line to read...
Worst case is, if a file contains mixed CR, LF and CRLF, such as a text file that contains all 3 kinds of line endings (because 3 different people have been editing the file).

Love
Jens

On Wed, 26 Sep 2012 12:12:39 +0200, Jens Bauer wrote:
> The implementation would be dependent on how git is currently handling lines.
> Worst case is, if it's mixed CR, LF and CRLF, such as a text file that contains all 3 kinds of line endings (because 3 different people have been editing the file).
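For the "worst case" of a file holding all three kinds of line endings, a first step could be simply finding out which kinds are present. The following is a rough sketch for a buffer that is entirely in memory; the names eol_tally, tally_eols and eols_are_mixed are invented for this example and do not come from git.

```c
#include <assert.h>
#include <stddef.h>

#define LF 0x0a
#define CR 0x0d

/* Hypothetical names for illustration; nothing here is taken from git. */
struct eol_tally {
    size_t lf;      /* LF not preceded by CR */
    size_t crlf;    /* CR immediately followed by LF */
    size_t cr;      /* CR not followed by LF */
};

void tally_eols(const char *buf, size_t len, struct eol_tally *t)
{
    size_t i;

    assert(t != NULL);
    assert(buf != NULL || len == 0);
    t->lf = t->crlf = t->cr = 0;
    for (i = 0; i < len; i++) {
        if (buf[i] == CR) {
            if (i + 1 < len && buf[i + 1] == LF) {
                t->crlf++;
                i++;        /* skip the LF that belongs to this CRLF */
            } else {
                t->cr++;
            }
        } else if (buf[i] == LF) {
            t->lf++;
        }
    }
}

/* Returns nonzero when more than one kind of line ending is present. */
int eols_are_mixed(const struct eol_tally *t)
{
    return (t->lf != 0) + (t->crlf != 0) + (t->cr != 0) > 1;
}
```

Looking one byte ahead is cheap here because the whole file is in the buffer; when reading a stream, the decision about a CR at the end of a chunk has to wait for the next chunk.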
Re: CRLF, LF ... CR ?
Hi Jeff and Drew.

I've been messing a little with clean/smudge filters; I think I understand them partly.

Let's call the file I have on the server that has CR line endings mypcb.osm.

If I clone the project, and do the following...

    $ cat mypcb.osm | tr '\r' '\n'

I can read the file in the terminal window; otherwise it's just a one-line file. So far, so good.

In my home directory, I have a .gitconfig file; here's the interesting part:

    [core]
        editor = nano
        excludesfile = /Users/jens/.gitexcludes
        attributesfile = /Users/jens/.gitattributes
    [filter "cr"]
        clean = tr '\\r' '\\n'
        smudge = tr '\\n' '\\r'

In my home directory I added .gitattributes:

    *.osm filter=cr

I've verified that .gitattributes is read, because if I add two spaces, like *.osm filter = cr, I get an 'invalid filter name' error. I've also verified that the clean/smudge lines are read; if I only have '\n' for instance, I get an error.

Now, when I clone the project, make a change and then issue this command...

    $ git diff mypcb.osm

...I get a strange diff. On line 3, one of the files shows a lot of control-M (CR) lines. After that, I see LF lines, all prefixed with a '+', as if they were added.

I think I might be nearly there, just missing some obvious detail somewhere. Any hints ?

Love
Jens

On Thu, 13 Sep 2012 17:53:00 +0200, Jens Bauer wrote:
> Hi Jeff and Drew.
>
> Thank you for your quick replies! :)
>
> The diffs look nasty, yes; that's my main issue. It can be worked around in many ways; e.g. a simple (but time consuming) way:
>
>     $ git diff mypcb.osm >mypcb.diff
>     $ nano mypcb.diff
>
> -It'd be better to just pipe it into a regex, which changes CR to LF on the fly.
>
> OsmondPCB is able to read files that have mixed LF and CR. (By mixed, I do not mean CRLF.)
>
> The files do not need line-by-line diffing, but I think it would make it more readable.
>
> Thank you very much for the hint on the clean/smudge filters. I'll have a look at it. =)
>
> Love
> Jens
>
> On Thu, 13 Sep 2012 11:43:10 -0400, Jeff King wrote:
>> On Thu, Sep 13, 2012 at 11:34:50AM -0400, Drew Northup wrote:
>>
>>>> I've read that git supports two different line endings; either CRLF or LF, but it does not support CR. Would it make sense to add support for CR (if so, I hereby request it as a new feature) ?
>>>
>>> Even if Git can't do CRLF/LF translation on a file it will still store and track the content of it just fine. In fact you probably want translation completely disabled in this case.
>>
>> Yeah. If the files always should just have CR, then just don't ask git to do any translation (by not setting the "text" attribute, or even setting "-text" if you have something like autocrlf turned on globally), and it will preserve the bytes exactly.
>>
>> I suspect diffs will look nasty because we won't interpret CR as a line-ending, though. Do the files actually need line-by-line diffing and merging? If not, then you are fine. If so, then it would probably be nice to store them with a canonical LF in the repository, but convert to CR on checkout. Git can't do that internally, but you could define clean/smudge filters to do so (see the section in "git help attributes" on "Checking-out and checking-in"; specifically the "filter" subsection).
>>
>> -Peff
Re: CRLF, LF ... CR ?
On Thu, Sep 13, 2012 at 08:17:20PM +0200, Jens Bauer wrote:

> In my home directory, I have a .gitconfig file, here's the interesting part:
>
>     [core]
>         editor = nano
>         excludesfile = /Users/jens/.gitexcludes
>         attributesfile = /Users/jens/.gitattributes
>     [filter "cr"]
>         clean = tr '\\r' '\\n'
>         smudge = tr '\\n' '\\r'
>
> In my home directory I added .gitattributes:
>
>     *.osm filter=cr

Looks right.

> Now, when I clone the project, make a change and then issue this command...
>
>     $ git diff mypcb.osm
>
> ...I get a strange diff. On line 3, one of the files shows a lot of control-m (cr) lines. After that, I see lf lines, all prefixed with a '+', as if they were added.
>
> I think I might be nearly there, just missing some obvious detail somewhere.

Yes, that's expected. The point of the clean filter is to convert your working tree file into a canonical (lf-only) representation inside the repository. But you've already made commits with the cr form in the repository. So you can choose one of:

  1. Make a new commit with these settings, which will have the canonical format. Accept that the old history will be funny, but you will be OK from here on out.

  2. Rewrite the old history to pretend that it was always LF. This gives you a nice clean history, but if you are collaborating with other people, they will need to rebase their work on the new history. See "git help filter-branch" for details.

-Peff
Re: CRLF, LF ... CR ?
Hi Jeff and Drew.

Excellent. I have now removed the repository from the server, removed it from my gitolite.conf, added it to gitolite.conf again and re-initialized it, and it works. git diff shows what I wanted.

Thank you *very* much for making my dream come true. :)
-And thank you all for all the hard work you're doing.
-Git is how all other open-source projects should be: well-written and well-defined (oh - and fast!). :)

Love
Jens

On Thu, 13 Sep 2012 14:23:44 -0400, Jeff King wrote:
> On Thu, Sep 13, 2012 at 08:17:20PM +0200, Jens Bauer wrote:
>
>> In my home directory, I have a .gitconfig file, here's the interesting part:
>>
>>     [core]
>>         editor = nano
>>         excludesfile = /Users/jens/.gitexcludes
>>         attributesfile = /Users/jens/.gitattributes
>>     [filter "cr"]
>>         clean = tr '\\r' '\\n'
>>         smudge = tr '\\n' '\\r'
>>
>> In my home directory I added .gitattributes:
>>
>>     *.osm filter=cr
>
> Looks right.
>
>> Now, when I clone the project, make a change and then issue this command...
>>
>>     $ git diff mypcb.osm
>>
>> ...I get a strange diff. On line 3, one of the files shows a lot of control-m (cr) lines. After that, I see lf lines, all prefixed with a '+', as if they were added.
>>
>> I think I might be nearly there, just missing some obvious detail somewhere.
>
> Yes, that's expected. The point of the clean filter is to convert your working tree file into a canonical (lf-only) representation inside the repository. But you've already made commits with the cr form in the repository. So you can choose one of:
>
>   1. Make a new commit with these settings, which will have the canonical format. Accept that the old history will be funny, but you will be OK from here on out.
>
>   2. Rewrite the old history to pretend that it was always LF. This gives you a nice clean history, but if you are collaborating with other people, they will need to rebase their work on the new history. See "git help filter-branch" for details.
>
> -Peff
Re: CRLF, LF ... CR ?
Hi Johannes.

I've changed...

    tr '\\r' '\\n'

...to...

    tr '\\15' '\\12'

...as you are right that it is more correct. (Then, in theory, it would be portable.)

[I once came across tftpd and tried compiling it on a Mac, but it failed to work because \r and \n were swapped by the compiler, so I asked the author to use \15 and \12, which made it fully portable.]

It now works even better. I can't and won't complain - thank you. =)

Love
Jens

On Thu, 13 Sep 2012 20:34:08 +0200, Johannes Sixt wrote:
> On 13.09.2012 17:53, Jens Bauer wrote:
>> Hi Jeff and Drew.
>>
>> Thank you for your quick replies! :)
>>
>> The diffs look nasty, yes; that's my main issue. It can be worked around in many ways; e.g. a simple (but time consuming) way:
>>
>>     $ git diff mypcb.osm >mypcb.diff
>>     $ nano mypcb.diff
>>
>> -It'd be better to just pipe it into a regex, which changes CR to LF on the fly.
>>
>> OsmondPCB is able to read files that have mixed LF and CR. (By mixed, I do not mean CRLF.)
>
> That is good news. Just write a 'clean' filter that amounts to
>
>     tr '\015' '\012'
>
> You don't need a 'smudge' filter that reverts this conversion.
>
> -- Hannes
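For readers who want to see what the suggested clean filter does to the bytes, here is a small sketch of the same mapping as tr '\015' '\012', applied in place to a buffer. The function name cr_to_lf is hypothetical, and the octal escapes are used for the same portability reason discussed in the message above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: applies the same byte mapping as tr '\015' '\012'.
 * Every CR (octal 015) becomes an LF (octal 012); all other bytes are
 * left alone. Returns the number of bytes that were changed. */
size_t cr_to_lf(char *buf, size_t len)
{
    size_t i;
    size_t changed = 0;

    assert(buf != NULL || len == 0);
    for (i = 0; i < len; i++) {
        if (buf[i] == '\015') {
            buf[i] = '\012';
            changed++;
        }
    }
    return changed;
}
```

Note that a plain byte mapping like this turns a CRLF pair into two LFs. That is harmless for the files in this thread, which mix CR and LF but contain no CRLF, but it would add empty lines to a file with DOS line endings.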
Re: CRLF, LF ... CR ?
On Thu, Sep 13, 2012 at 8:09 AM, Jens Bauer jens-li...@gpio.dk wrote:
> Hi everyone.
>
> I'm quite fond of git, and have used it for a while. Recently, I've started making printed circuit boards (PCBs) using an application called OsmondPCB (for Mac), and I'd like to use git to track changes on these. This application was originally written for the old Mac OS (Mac OS 6 to Mac OS 9.2). The old Mac OS does not use LF, nor CRLF for line endings, but only CR.
>
> I've read that git supports two different line endings; either CRLF or LF, but it does not support CR. Would it make sense to add support for CR (if so, I hereby request it as a new feature) ?
>
> The alternative is to ask the developer if he would change the file format, so that new versions of his software would change the files to end in LF, but he'd have to be careful not to break compatibility. If the software is to be changed, this would not fix similar issues that other people might have.

Do you mean that you want automatic conversion from CR to LF?

What about just storing the files as-is, with no conversion at all? (this is the default git behavior)

git doesn't really even support LF. It stores content as-is, which means LF works just fine. git prefers to not mess around with the content, but we do have autocrlf to help folks stuck on windows.

If you need to, you can use .gitattributes to add a clean/smudge filter that does this conversion for you. See the "filter" section for an example:

http://www.kernel.org/pub/software/scm/git/docs/gitattributes.html

If you're serious about wanting that feature then we'll happily review any patches you might have. That said, I don't really think it's a common enough case for git to natively support, so I'd recommend going with the .gitattributes filter.

good luck,
--
David
Re: CRLF, LF ... CR ?
David Aguilar dav...@gmail.com writes:

> git doesn't really even support LF.

At the storage level that is correct, but the above is a bit of stretch. It may not be "support", but git _does_ rely on LF when running many text oriented operations (a rough rule of thumb is "does 'a line' in a file matter to the operation?"). Think about "git diff" and "git blame".