Re: [Python-Dev] [python] Re: New lines, carriage returns, and Windows
Guido van Rossum [EMAIL PROTECTED] wrote:
> Does anyone else have the feeling that discussions with Mr. MacLaren don't usually bear any fruit?

Yes. I do. My ability to predict the (technical) future is good; my ability to persuade people of it is almost non-existent. However, when an almost identical thread to this one occurs in a decade's time, I shall be safely retired. And, assuming no changes to the basic models, when one occurs in twenty years' time, I shall be forgotten. Such is life.

Regards,
Nick Maclaren,
University of Cambridge Computing Service,
New Museums Site, Pembroke Street, Cambridge CB2 3QH, England.
Email: [EMAIL PROTECTED]
Tel.: +44 1223 334761
Fax: +44 1223 334679
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] [python] Re: New lines, carriage returns, and Windows
Greg Ewing [EMAIL PROTECTED] wrote:
>> I don't know PRECISELY what you mean by universal newlines mode
> I mean precisely what Python means by the term: any of \r, \n or \r\n represent a newline, and no distinction is made between them.

Excellent. While this over-simplifies the issue, let's stick to the over-simplified form, as we may be able to get somewhere.

The question is independent of what the outside system believes a text file should look like, and is solely what Python believes a sequence of characters should mean. For example, does 'A\r\nB' mean that B is separated from A by one newline or two?

The point is that, once we know that, we can design a translator to and from Python's conventions to any reasonable system (and, as I say, I have done it many times). But, if Python's own interpretation is ambiguous, it is a sure recipe for different translators being incompatible, even on the same system. Which is what has happened here.

So, damn the outside system, EXACTLY what does Python mean by such characters, and EXACTLY what uses of them are discouraged as having unspecified meanings? If we could get an answer to that precisely enough to write a parse tree with all terminals explicit, this problem would go away.

And that is all that I say can or should be done. The details of how to write the translators to other file systems are then a separate matter.

Regards,
Nick Maclaren.
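A minimal sketch, assuming Python 3's io module: with newline=None a text stream implements exactly the universal newlines definition quoted above, so for data read from a file the 'A\r\nB' question can be checked directly. The helper name read_universal is ours, and in-memory bytes stand in for a real file.

```python
import io

def read_universal(raw: bytes) -> str:
    """Decode raw bytes as a text file opened in universal newlines mode."""
    # newline=None: each of \r\n, \r and \n is translated to '\n' on input.
    stream = io.TextIOWrapper(io.BytesIO(raw), encoding="ascii", newline=None)
    return stream.read()

print(repr(read_universal(b"A\r\nB")))    # 'A\nB'   (one newline)
print(repr(read_universal(b"A\rB")))      # 'A\nB'   (one newline)
print(repr(read_universal(b"A\r\r\nB")))  # 'A\n\nB' (a lone \r, then \r\n)
```

Note that the last case shows why the answer depends on context: the same bytes contain a lone '\r' followed by a '\r\n', and universal newlines mode counts them as two line ends.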
Re: [Python-Dev] [python] Re: New lines, carriage returns, and Windows
On 01/10/2007, Nick Maclaren [EMAIL PROTECTED] wrote:
> So, damn the outside system, EXACTLY what does Python mean by such characters, and EXACTLY what uses of them are discouraged as having unspecified meanings? If we could get an answer to that precisely enough to write a parse tree with all terminals explicit, this problem would go away.

Python, the language, means nothing by the characters. They are bytes with defined values in a byte string (in 2.x; in 3.0 they are Unicode characters, but otherwise no difference). The *language* places no interpretation on them.

Certain library functions place an interpretation on the byte values, but you need to read the function definition for that. And (a) they may not all be consistent, and (b) they may say "follows platform behaviour", but that's the way it is, so you have to live with it.

Paul.
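A small illustration of Paul's distinction, assuming Python 3 str semantics: the language itself treats '\r' and '\n' as two ordinary characters, while individual string methods give them different meanings.

```python
s = "A\r\nB"

# To the language, '\r' and '\n' are simply two distinct characters.
assert len("\r\n") == 2
assert (ord("\r"), ord("\n")) == (13, 10)

# Individual library functions apply their own interpretations.
print(s.split("\n"))   # ['A\r', 'B']  only '\n' separates
print(s.splitlines())  # ['A', 'B']    '\r\n' counts as one line boundary
print(s.count("\n"))   # 1
```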
Re: [Python-Dev] [python] Re: New lines, carriage returns, and Windows
Paul Moore [EMAIL PROTECTED] wrote:
>> So, damn the outside system, EXACTLY what does Python mean by such characters, and EXACTLY what uses of them are discouraged as having unspecified meanings? If we could get an answer to that precisely enough to write a parse tree with all terminals explicit, this problem would go away.
> Python, the language, means nothing by the characters. They are bytes with defined values in a byte string (in 2.x; in 3.0 they are Unicode characters, but otherwise no difference). The *language* places no interpretation on them.

Actually, it's not that simple, because of the universal newline rule and the fact that both Unix/C ASCII and Unicode DO provide meanings for their characters, but let that pass. Your statement is not far off the situation.

> Certain library functions place an interpretation on the byte values, but you need to read the function definition for that. And (a) they may not all be consistent, and (b) they may say "follows platform behaviour", but that's the way it is, so you have to live with it.

And that is why there will continue to be confusion and inconsistency, and why there will be similar threads to this for the foreseeable future. If you regard continuing problems of this sort as acceptable, then fine, but I am pointing out that they are fairly easy to avoid. But only by specifying a precise Python model.

Incidentally, the response (b) you give is a common one, but isn't usually correct when it is given. It is, after all, the cause of the problem that started this thread.

Regards,
Nick Maclaren.
Re: [Python-Dev] [python] Re: New lines, carriage returns, and Windows
Michael Foord wrote:
> Steven Bethard wrote:
>> On 9/29/07, Michael Foord [EMAIL PROTECTED] wrote:
>>> Terry Reedy wrote:
>>>> There are two normal ways for internal Python text to have \r\n:
>>>> 1. Read from a file with \r\r\n. Then \r\r\n is correct output (on the same platform).
>>>> 2. Intentionally put there by a programmer. If s/he also chooses default \n translation on output, \r<translation of \n> is correct.
>>> Actually, I usually get these strings from Windows UI components. A file containing '\r\n' is read in with '\r\n' being translated to '\n'. New user input is added containing '\r\n' line endings. The file is written out and now contains a mix of '\r\n' and '\r\r\n'.
>> Out of curiosity, why don't the Python wrappers for your Windows UI components do the appropriate '\r\n' -> '\n' conversions?
> One of the great things about IronPython is that you don't *need* any wrappers - you access .NET objects natively (which in fact wrap the lower level win32 API) - and the .NET APIs are usually not as bad as you probably assume. ;-)

This thread might represent an argument that you *do* need wrappers ...

> You just have to be aware that line endings are '\r\n'. I'm not sure how or if pywin32 handles this.

Presumably that awareness should be implemented by the unnecessary wrappers.

regards
 Steve
--
Steve Holden +1 571 484 6266 +1 800 494 3119
Holden Web LLC/Ltd http://www.holdenweb.com
Skype: holdenweb http://del.icio.us/steve.holden
Sorry, the dog ate my .sigline
Re: [Python-Dev] [python] Re: New lines, carriage returns, and Windows
Steve Holden wrote:
> Michael Foord wrote:
>> Steven Bethard wrote:
>>> On 9/29/07, Michael Foord [EMAIL PROTECTED] wrote:
>>>> Terry Reedy wrote:
>>>>> There are two normal ways for internal Python text to have \r\n:
>>>>> 1. Read from a file with \r\r\n. Then \r\r\n is correct output (on the same platform).
>>>>> 2. Intentionally put there by a programmer. If s/he also chooses default \n translation on output, \r<translation of \n> is correct.
>>>> Actually, I usually get these strings from Windows UI components. A file containing '\r\n' is read in with '\r\n' being translated to '\n'. New user input is added containing '\r\n' line endings. The file is written out and now contains a mix of '\r\n' and '\r\r\n'.
>>> Out of curiosity, why don't the Python wrappers for your Windows UI components do the appropriate '\r\n' -> '\n' conversions?
>> One of the great things about IronPython is that you don't *need* any wrappers - you access .NET objects natively (which in fact wrap the lower level win32 API) - and the .NET APIs are usually not as bad as you probably assume. ;-)
> This thread might represent an argument that you *do* need wrappers ...
>> You just have to be aware that line endings are '\r\n'. I'm not sure how or if pywin32 handles this.
> Presumably that awareness should be implemented by the unnecessary wrappers.

Well, it's an OS level difference and I thought that in general Python *doesn't* try to protect you from OS differences. These different line endings are returned by the components - and making the string type aware of where it comes from and transform itself accordingly seems odd. It also leaves you with all sorts of other problems like string comparison (do you ignore differences in line endings?) and string length (on different sides of the .NET / IronPython boundary, the same string would report different lengths).

It is also different from how libraries like wxPython behave - where they *don't* protect you from OS differences and if a textbox has '\r\n' line endings - that is what you get...

Michael
http://www.manning.com/foord

> regards
> Steve
Re: [Python-Dev] [python] Re: New lines, carriage returns, and Windows
> Well, it's an OS level difference and I thought that in general Python *doesn't* try to protect you from OS differences.

I think that's the key point. In general, Python tries to present a translucent interface to the OS in which OS differences can show through, in contrast to other languages (Java?) which try to present a complete abstraction of the underlying environment. This makes Python in general more useful, though it also makes it harder to write portable code in Python, because you have to be aware of the potential differences (and they aren't particularly well documented -- it's not clear that they can be).

Bill
Re: [Python-Dev] [python] Re: New lines, carriage returns, and Windows
Guido van Rossum wrote:
> The best solution for IronPython is probably to have the occasional wrapper around .NET APIs that translates between \r\n and \n on the boundary between Python and .NET;

That's probably true. I was responding to the notion that IronPython shouldn't need any wrappers. To make that really true would require IronPython to become a different language that has a different canonical representation of newlines.

It's fine with me to keep things as they are.

--
Greg Ewing, Computer Science Dept, University of Canterbury,
Christchurch, New Zealand
[EMAIL PROTECTED]
Carpe post meridiem! (I'm not a morning person.)
Re: [Python-Dev] [python] Re: New lines, carriage returns, and Windows
Nick Maclaren wrote:
> if Python's own interpretation is ambiguous, it is a sure recipe for different translators being incompatible,

Python's own interpretation is not ambiguous. The problem at hand is people wanting to use some random mixture of Python and .NET conventions.

--
Greg Ewing
Re: [Python-Dev] [python] Re: New lines, carriage returns, and Windows
Michael Foord wrote:
> It is also different from how libraries like wxPython behave - where they *don't* protect you from OS differences and if a textbox has '\r\n' line endings - that is what you get...

That sounds like an undesirable deficiency of those library wrappers, especially cross-platform ones like wxPython.

--
Greg Ewing
Re: [Python-Dev] [python] Re: New lines, carriage returns, and Windows
Nick Maclaren [EMAIL PROTECTED] wrote in message news:[EMAIL PROTECTED]
| The question is independent of what the outside system believes a
| text file should look like, and is solely what Python believes a
| sequence of characters should mean. For example, does 'A\r\nB'
| mean that B is separated from A by one newline or two?

The grammar presupposes that Python code is divided into lines. Any successful interpreter must adjust to the external source's idea of line endings. This is implementation, not language definition.

The grammar itself has no notion of structure within Python string objects. The split method lets one define anything as chunk separators.

The builtin compile function that uses strings as code input specifies \n and only \n as a line ending. The universal line-ending model of string output to files does the same. So from either viewpoint, the unambiguous answer to your question is 'one'.

Terry Jan Reedy
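A sketch of Terry's two points, assuming Python 3: compile() accepts source whose lines end in '\n', and under the '\n'-only convention 'A\r\nB' contains exactly one newline. (Since Python 3.2, compile() also accepts '\r\n' and '\r' line endings, so this only exercises the '\n' case.)

```python
# compile() reads its source with '\n' as the line terminator.
source = "a = 1\nb = a + 1\n"
namespace = {}
exec(compile(source, "<demo>", "exec"), namespace)
print(namespace["b"])  # 2

# Under the '\n'-only convention, 'A\r\nB' holds exactly one newline;
# the '\r' is an ordinary character sitting in front of it.
s = "A\r\nB"
print(s.count("\n"))    # 1

# split() lets the caller choose any separator, e.g. the two-character '\r\n'.
print(s.split("\r\n"))  # ['A', 'B']
```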
Re: [Python-Dev] [python] Re: New lines, carriage returns, and Windows
Does anyone else have the feeling that discussions with Mr. MacLaren don't usually bear any fruit?

--
--Guido van Rossum (home page: http://www.python.org/~guido/)
Re: [Python-Dev] [python] Re: New lines, carriage returns, and Windows
Greg Ewing [EMAIL PROTECTED] wrote:
>> Grrk. That's the problem. You don't get back what you have written
> You do as long as you *don't* use universal newlines mode for reading. This is the best that can be done, because universal newlines are inherently ambiguous.

I don't know PRECISELY what you mean by universal newlines mode, and this issue is all about the details, so any response would merely enhance the confusion.

> If you want universal newlines, you just have to accept that you can't also have \r characters meaning something other than newlines in your files. This is true regardless of what programming language or I/O model is being used.

No, that is not true, and I have used more than one model where it wasn't. Let's stick to models where newlines are special characters - I prefer the ones where they are not, but that is by the way.

Model 1: certain characters can be used only in combination. E.g. \f must occur immediately before (or after) a \n, which it modifies. \r is either a newline-with-overprint or must be associated with a \n. In both cases, only ONE of the alternatives is permitted in the chosen model - the other use then becomes an error (and raises an exception).

Model 2: (BCPL) there are a variety of newline characters, \n for plain newline, \f for newline-with-form-feed and \r for newline-with-overprint. ALL cause a newline, with the associated property.

Note that the above is what the program sees - what is written to the outside world and how input is read is another matter. But I can assure you, from my own and many other people's experience, that neither of the above models causes the confusion being shown by the postings in this thread.

Regards,
Nick Maclaren.
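To make Model 2 concrete, here is a hypothetical parser (the names Break and split_model2 are ours, not from BCPL or any real library) in which '\n', '\f' and '\r' each end a line and carry a property. Under this model the answer to "does 'A\r\nB' contain one newline or two?" is unambiguously two.

```python
from enum import Enum
from typing import List, Optional, Tuple

class Break(Enum):
    """Hypothetical Model 2 line terminators: each one ends a line."""
    PLAIN = "\n"
    FORM_FEED = "\f"
    OVERPRINT = "\r"

def split_model2(text: str) -> List[Tuple[str, Optional[Break]]]:
    """Split text into (line, terminator) pairs; None marks an unterminated tail."""
    lines: List[Tuple[str, Optional[Break]]] = []
    current: List[str] = []
    for ch in text:
        try:
            kind = Break(ch)      # lookup by value: '\n' -> Break.PLAIN, etc.
        except ValueError:
            current.append(ch)    # an ordinary character
            continue
        lines.append(("".join(current), kind))
        current = []
    if current:
        lines.append(("".join(current), None))
    return lines

# 'A\r\nB' yields two line ends here: an overprint, then a plain newline.
print(split_model2("A\r\nB\fC"))
```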
Re: [Python-Dev] [python] Re: New lines, carriage returns, and Windows
Michael> Actually, I usually get these strings from Windows UI
Michael> components. A file containing '\r\n' is read in with '\r\n'
Michael> being translated to '\n'. New user input is added containing
Michael> '\r\n' line endings. The file is written out and now contains a
Michael> mix of '\r\n' and '\r\r\n'.

So you need a translation layer between the UI component and your code. Treat the component as a text file and perform the desired mapping. Yes?

Skip
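One way to read Skip's suggestion, assuming Python 3: io.StringIO accepts the same newline argument as a text file, so a control's text can be passed through it as if it were a file. The function names from_component and to_component are hypothetical, and '\r\n' as the control's line ending is an assumption.

```python
import io

def from_component(raw: str) -> str:
    """Read text from a UI control as if it were a file in universal newlines mode."""
    # newline=None translates \r\n, \r and \n to '\n'.
    return io.StringIO(raw, newline=None).read()

def to_component(text: str, newline: str = "\r\n") -> str:
    """Write text for a UI control as if it were a file with the given line ending."""
    out = io.StringIO(newline=newline)
    out.write(text)  # each '\n' becomes `newline`; a '\r' passes through unchanged
    return out.getvalue()

python_text = from_component("typed\r\nby user\r\n")
print(repr(python_text))                # 'typed\nby user\n'
print(repr(to_component(python_text)))  # 'typed\r\nby user\r\n'
```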
Re: [Python-Dev] [python] Re: New lines, carriage returns, and Windows
[EMAIL PROTECTED] wrote:
> Michael> Actually, I usually get these strings from Windows UI
> Michael> components. A file containing '\r\n' is read in with '\r\n'
> Michael> being translated to '\n'. New user input is added containing
> Michael> '\r\n' line endings. The file is written out and now contains a
> Michael> mix of '\r\n' and '\r\r\n'.
> So you need a translation layer between the UI component and your code. Treat the component as a text file and perform the desired mapping. Yes?

Actually the problem was reported by one of the IronPython developers on behalf of another user. We stick to using the .NET file I/O and so don't have a problem. The only time it is an issue for us is our tests, where we have string literals in our test code (where new lines are obviously '\n') and we do a manual 'replace'. Not very difficult.

It is just slightly ironic that the time Python 'gets it wrong' (for some value of wrong) is when you are using text mode for I/O :-)

Michael

> Skip
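A sketch of the kind of manual 'replace' Michael describes for tests. The helper names normalize_newlines and assert_same_text are hypothetical, and only '\r\n' (not a lone '\r') is normalised.

```python
def normalize_newlines(text: str) -> str:
    """Map CRLF line endings to Python's LF convention."""
    return text.replace("\r\n", "\n")

def assert_same_text(actual: str, expected: str) -> None:
    """Compare two strings while ignoring CRLF versus LF differences."""
    assert normalize_newlines(actual) == normalize_newlines(expected), (actual, expected)

# A value as a .NET-style API might return it, checked against a '\n' literal.
assert_same_text("first line\r\nsecond line", "first line\nsecond line")
```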
Re: [Python-Dev] [python] Re: New lines, carriage returns, and Windows
Michael Foord [EMAIL PROTECTED] wrote:
> [EMAIL PROTECTED] wrote:
>> Michael> Actually, I usually get these strings from Windows UI
>> Michael> components. A file containing '\r\n' is read in with '\r\n'
>> Michael> being translated to '\n'. New user input is added containing
>> Michael> '\r\n' line endings. The file is written out and now contains a
>> Michael> mix of '\r\n' and '\r\r\n'.
>> So you need a translation layer between the UI component and your code. Treat the component as a text file and perform the desired mapping. Yes?
> Actually the problem was reported by one of the IronPython developers on behalf of another user. We stick to using the .NET file I/O and so don't have a problem. The only time it is an issue for us is our tests, where we have string literals in our test code (where new lines are obviously '\n') and we do a manual 'replace'. Not very difficult.
> It is just slightly ironic that the time Python 'gets it wrong' (for some value of wrong) is when you are using text mode for I/O :-)

Plus ca change. That has been the problem for as long as I have been using the byte stream model (nearly 40 years now). Provided that you can get control, OR there are well-defined semantics, you can sort things out. The semantics "we define only the trivial case, and the programmer must do something arcane, undefined and system-dependent for the rest" means that it is impossible for an interface to do the 'right' translation unless it knows what each side of it is assuming.

As I say, there are solutions.

Regards,
Nick Maclaren.
Re: [Python-Dev] [python] Re: New lines, carriage returns, and Windows
Nick Maclaren wrote:
> I don't know PRECISELY what you mean by universal newlines mode

I mean precisely what Python means by the term: any of \r, \n or \r\n represent a newline, and no distinction is made between them.

You only need to use that if you don't know what convention is being used by the file you're reading. And if you don't know that, you've already lost information about what the contents of the file means, and there's nothing that any I/O system can do to get it back.

> Model 1: certain characters can be used only in combination. ...

That's all fine if you know the file adheres to those conventions. Just open it in binary mode and go for it. The I/O systems of C and/or Python are designed for environments where the files *don't* adhere to conventions as helpful as that. They're making the best of what they're given.

> Note that the above is what the program sees - what is written to the outside world and how input is read is another matter. But I can assure you, from my own and many other people's experience, that neither of the above models cause the confusion being shown by the postings in this thread.

There's no confusion about how newlines are represented *inside* a Python program. The convention is quite clear - a newline is \n and only \n. Confusion only arises when people try to process strings internally that don't adhere to that convention.

--
Greg Ewing
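A sketch of Greg's point about lost information, assuming Python 3's io module (the helper open_text is ours, and in-memory bytes stand in for files): in universal newlines mode two different files read back identically, while newline='' or binary mode keeps the original endings.

```python
import io

def open_text(raw: bytes, newline):
    """Wrap in-memory bytes in a text stream with the given newline policy."""
    return io.TextIOWrapper(io.BytesIO(raw), encoding="ascii", newline=newline)

# Universal newlines mode: two different files read back identically,
# so the original line-ending convention is lost.
assert open_text(b"A\r\nB", None).read() == open_text(b"A\nB", None).read() == "A\nB"

# newline='': lines are still recognised universally, but endings are kept.
print(open_text(b"A\r\nB\rC", "").readlines())  # ['A\r\n', 'B\r', 'C']

# Binary mode: the bytes exactly as stored.
print(io.BytesIO(b"A\r\nB\rC").read())          # b'A\r\nB\rC'
```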
Re: [Python-Dev] [python] Re: New lines, carriage returns, and Windows
Michael Foord wrote:
> We stick to using the .NET file I/O and so don't have a problem. The only time it is an issue for us is our tests, where we have string literals in our test code (where new lines are obviously '\n')

If you're going to do that, you really need to be consistent about it and have IronPython use \r\n internally for line endings *everywhere*, including string literals.

> It is just slightly ironic that the time Python 'gets it wrong' (for some value of wrong) is when you are using text mode for I/O :-)

I would say IronPython is getting it wrong by using inconsistent internal representations of line endings.

--
Greg Ewing
Re: [Python-Dev] [python] Re: New lines, carriage returns, and Windows
On 9/30/07, Greg Ewing [EMAIL PROTECTED] wrote:
> Michael Foord wrote:
>> We stick to using the .NET file I/O and so don't have a problem. The only time it is an issue for us is our tests, where we have string literals in our test code (where new lines are obviously '\n')
> If you're going to do that, you really need to be consistent about it and have IronPython use \r\n internally for line endings *everywhere*, including string literals.

I don't know what you mean by "internally". There's lots of portable code that uses the \n character in string literals (either to generate line endings or to recognize them). That code can't suddenly be made invalid. And changing all string literals that say \n to secretly become \r\n would be worse than the \r <--> \n swap that some old Apple tools used to do. (If len("\n") == 2, what would len("\r\n") be?)

>> It is just slightly ironic that the time Python 'gets it wrong' (for some value of wrong) is when you are using text mode for I/O :-)
> I would say IronPython is getting it wrong by using inconsistent internal representations of line endings.

Honestly, I find it hard to see much merit in this discussion. A number of Python libraries, including print() and io.py, use \n to represent line endings in memory, and translate these to/from platform-appropriate line endings when reading/writing text files. OTOH, some other APIs, for example, sockets talking various internet protocols (from SMTP to HTTP) as well as most (all?) native .NET APIs, use \r\n to represent line endings. There are any number of ways to convert between these conventions, including various invocations of s.replace() and s.splitlines() (the latter does a universal-newlines-like thing). Applications can take care of this, and APIs can choose to use either convention for line endings (or both, in the case of input). Yes, occasionally users get confused. Too bad. They'll have to learn about this issue.
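The s.replace() and s.splitlines() conversions mentioned above might look like this in Python 3; the function names are ours, and the CRLF side stands for protocols such as SMTP/HTTP or .NET APIs.

```python
def lf_to_crlf(text: str) -> str:
    # Normalise any existing CRLF first so it is not doubled to CR CR LF.
    return text.replace("\r\n", "\n").replace("\n", "\r\n")

def crlf_to_lf(text: str) -> str:
    return text.replace("\r\n", "\n")

print(repr(lf_to_crlf("a\nb\r\nc\n")))  # 'a\r\nb\r\nc\r\n'
print(repr(crlf_to_lf("a\r\nb\r\n")))   # 'a\nb\n'

# splitlines() recognises \r\n, \r and \n alike, but drops the final ending
# and also treats characters such as \f as line boundaries.
print("a\r\nb\rc\n".splitlines())       # ['a', 'b', 'c']
print("a\fb".splitlines())              # ['a', 'b']
```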
The issue isn't going away by wishing it to go away; it is a fundamental difference between Windows and Unix, and neither is likely to change or disappear. Changing Python to use the Windows convention internally isn't going to help one bit. Changing Python to use the platform's convention is impossible without introducing a new string escape that would mean \r\n on Windows and \n on Unix; and given that there are legitimate reasons to sometimes deal with \r\n explicitly even on Unix (and with just \n even on Windows) we wouldn't be completely isolated from the issue. Changing APIs to not represent the line ending as a character (as the Java I/O libraries do) would be too big a change (and how would we distinguish between readline() returning an empty line and EOF?) -- and I'm sure the issue still pops up in plenty of places in Java.

The best solution for IronPython is probably to have the occasional wrapper around .NET APIs that translates between \r\n and \n on the boundary between Python and .NET; but one must be able to turn this off or bypass the wrappers in cases where the data retrieved from one .NET API is just passed straight on to another .NET API (and the translation would just cause two redundant copies to be made).

Get used to it. End of discussion.

--
--Guido van Rossum
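A minimal sketch of such a boundary wrapper with an off switch, assuming the .NET side is reached through a getter and a setter. NewlineBoundary is a hypothetical name, and a plain dict stands in for a real .NET control.

```python
from typing import Callable

class NewlineBoundary:
    """Hypothetical adapter between LF-based Python code and a CRLF-based API."""

    def __init__(self, get_text: Callable[[], str],
                 set_text: Callable[[str], None], translate: bool = True) -> None:
        self._get_text = get_text
        self._set_text = set_text
        self.translate = translate  # set False to pass data through untouched

    def get(self) -> str:
        raw = self._get_text()
        return raw.replace("\r\n", "\n") if self.translate else raw

    def set(self, text: str) -> None:
        # Assumes text follows Python's convention ('\n' only).
        self._set_text(text.replace("\n", "\r\n") if self.translate else text)


# Stand-in for a .NET text box.
textbox = {"Text": "typed\r\nby user"}
wrapped = NewlineBoundary(lambda: textbox["Text"],
                          lambda value: textbox.__setitem__("Text", value))

content = wrapped.get()          # 'typed\nby user'
wrapped.set(content + "\nmore")
print(repr(textbox["Text"]))     # 'typed\r\nby user\r\nmore'

# Bypass: when data only travels from one CRLF API to another,
# skip both conversions and the copies they would make.
wrapped.translate = False
assert wrapped.get() == textbox["Text"]
```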
Re: [Python-Dev] [python] Re: New lines, carriage returns, and Windows
Guido van Rossum wrote:
[snip..]
> Python *does* have its own I/O model. There are binary files and text files. For binary files, you write bytes and the semantic model is that of an array of bytes; byte indices are seek positions.
>
> For text files, the contents is considered to be Unicode, encoded as bytes in a binary file. So a text file always has an underlying binary file. Two translations take place, both of which have defaults varying by platform. One translation is encoding Unicode text into bytes upon output, and decoding bytes to Unicode text upon input. This can use any encoding supported by the encodings package.
>
> The other translation deals with line endings. Upon input, any of \r\n, \r, or \n is translated to a single \n by default (this is the universal newlines algorithm from Python 2.x). This can be tweaked or disabled. Upon output, \n is translated into a platform specific string chosen from \r\n, \r, or \n. This can also be disabled or overridden. Note that \r, when written, is never treated specially; if you want special processing for \r on output, you can write your own translation layer.

So the question is, that when a string containing '\r\n' is written to a file in text mode on a Windows platform, should it be written with the encoded representation of '\r\n' or '\r\r\n'?

Purity would dictate the latter and practicality the former (IMO)... However, that would mean that round tripping a string would change it ('\r\n' would be written as '\r\n' and then read as '\n') - on the other hand (particularly given that we are treating the data as text and not a binary blob) I don't see how writing '\r\r\n' would ever actually be useful in text.

+1 on just writing '\r\n' from me.

Michael Foord

> That's all. There is nothing unimplementable or confusing in these specifications. Python doesn't care about record I/O on legacy OSes; it does care about variability found in practice between popular OSes.
>
> Note that \r, \n and friends in Python 3000 are either ASCII (in bytes literals) or Unicode (in text literals). Again, no support for legacy systems that don't use ASCII or a superset. Legacy OSes are called that for a reason.
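The two translations in Guido's description map onto the encoding and newline arguments of Python 3's io.TextIOWrapper. This sketch (the helper write_text is ours) writes to an in-memory buffer and shows the bytes produced for each newline setting, including the '\r\r\n' case Michael asks about.

```python
import io
import os

def write_text(text: str, newline, encoding: str = "utf-8") -> bytes:
    """Write text through a TextIOWrapper and return the bytes that reach the file."""
    buffer = io.BytesIO()
    stream = io.TextIOWrapper(buffer, encoding=encoding, newline=newline)
    stream.write(text)
    stream.flush()
    stream.detach()  # keep the BytesIO open after the wrapper goes away
    return buffer.getvalue()

# Encoding translation: text is encoded to bytes on output.
print(write_text("caf\u00e9\n", newline="\n"))  # b'caf\xc3\xa9\n'

# newline=None: '\n' is written as os.linesep (b'\r\n' on Windows).
print(write_text("a\n", newline=None))

# newline='': no line-ending translation at all.
print(write_text("a\r\nb\n", newline=""))       # b'a\r\nb\n'

# newline='\r\n': every '\n' gains a '\r'; an existing '\r' is not special,
# which is how '\r\n' in memory becomes '\r\r\n' on disk.
print(write_text("a\r\nb\n", newline="\r\n"))   # b'a\r\r\nb\r\n'
```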
Re: [Python-Dev] [python] Re: New lines, carriage returns, and Windows
Michael Foord [EMAIL PROTECTED] wrote in message news:[EMAIL PROTECTED]
| Guido van Rossum wrote:
[snip first part of nice summary of Python i/o model]
| The other translation deals with line endings. Upon input, any of
| \r\n, \r, or \n is translated to a single \n by default (this is nhe [sic]
| universal newlines algorithm from Python 2.x). This can be tweaked
| or disabled. Upon output, \n is translated into a platform specific
| string chosen from \r\n, \r, or \n. This can also be disabled or
| overridden. Note that \r, when written, is never treated specially; if
| you want special processing for \r on output, you can write your own
| translation layer.
| So the question is, that when a string containing '\r\n' is written to a
| file in text mode on a Windows platform, should it be written with the
| encoded representation of '\r\n' or '\r\r\n'?

I think Guido pretty clearly said that on output, the default behavior is that \r is nothing special. If you want a special case exception, write a special case translator. +1 from me. To propose otherwise is to propose that the default semantic meaning of Python text objects depend on the platform that it might be output-translated for. I believe the point of universal newline support was to get away from this.

| Purity would dictate the latter and practicality the former (IMO)...

I disagree. Special case exceptions complicate both learnability and code readability and maintainability. Simplicity is practicality. The symmetry of 'platform-line-endings =input=> \n =output=> platform-line-endings' is both pure and practical.

| However, that would mean that round tripping a string would change it
| ('\r\n' would be written as '\r\n' and then read as '\n')

Whereas \r\r\n would be read back as \r\n, which is what should happen. Round-trip-ability is practical to me.

| - on the other
| hand (particularly given that we are treating the data as text and not a
| binary blob) I don't see how writing '\r\r\n' would ever actually be
| useful in text.

There are two normal ways for internal Python text to have \r\n:
1. Read from a file with \r\r\n. Then \r\r\n is correct output (on the same platform).
2. Intentionally put there by a programmer. If s/he also chooses default \n translation on output, \r<translation of \n> is correct.

That leaves:
1. Bugs due to ignorance or accident. These should be repaired.
2. Other special situations, which can be handled by disabling, overriding, and layering the defaults.

This seems enough flexibility to me.

Terry Jan Reedy
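Terry's symmetric model can be sketched as a pair of plain string functions (hypothetical names), assuming the platform ending is '\r\n' and that input maps only '\r\n' back to '\n', as the C runtime's text mode on Windows does. Python 3's default universal newlines reading behaves differently: it would read '\r\r\n' as '\n\n', not '\r\n'.

```python
def to_platform(text: str, nl: str = "\r\n") -> str:
    # Output side: every '\n' becomes the platform line ending.
    return text.replace("\n", nl)

def from_platform(data: str, nl: str = "\r\n") -> str:
    # Input side: only the platform line ending is mapped back to '\n'.
    return data.replace(nl, "\n")

for s in ["a\nb", "a\r\nb", "a\rb\r\n"]:
    written = to_platform(s)
    assert from_platform(written) == s  # with nl='\r\n' the round trip is exact
    print(repr(s), "->", repr(written))
```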
Re: [Python-Dev] [python] Re: New lines, carriage returns, and Windows
Terry Reedy wrote: Michael Foord [EMAIL PROTECTED] wrote in message news:[EMAIL PROTECTED] | Guido van Rossum wrote: [snip first part of nice summary of Python i/o model] | The other translation deals with line endings. Upon input, any of | \r\n, \r, or \n is translated to a single \n by default (this is nhe [sic] | universal newlines algorithm from Python 2.x). This can be tweaked | or disabled. Upon output, \n is translated into a platform specific | string chosen from \r\n, \r, or \n. This can also be disabled or | overridden. Note that \r, when written, is never treated specially; if | you want special processing for \r on output, you can write your own | translation layer. | So the question is, that when a string containing '\r\n' is written to a | file in text mode on a Windows platform, should it be written with the | encoded representation of '\r\n' or '\r\r\n'? I think Guido pretty clearly said that on output, the default behavior is that \r is nothing special. If you want a special case exception, write a special case translator. +1 from me. To propose otherwise is to propose that the default semantic meaning of Python text objects depend on the platform that it might be output-translated for. I believe the point of universal newline support was to get away from this. | Purity would dictate the latter and practicality the former (IMO)... I disagree. Special case exceptions complicate both learnability and code readability and maintainability. Simplicity is practicality. The symmetry of 'platform-line-endings =input=> \n =output=> platform-line-endings' is both pure and practical. | However, that would mean that round tripping a string would change it | ('\r\n' would be written as '\r\n' and then read as '\n') Whereas \r\r\n would be read back as \r\n, which is what should happen. Round-trip-ability is practical to me.
| - on the other | hand (particularly given that we are treating the data as text and not a | binary blob) I don't see how writing '\r\r\n' would ever actually be | useful in text. There are two normal ways for internal Python text to have \r\n: 1. Read from a file with \r\r\n. Then \r\r\n is correct output (on the same platform). 2. Intentionally put there by a programmer. If s/he also chooses default \n translation on output, \r followed by the translation of \n is correct. Actually, I usually get these strings from Windows UI components. A file containing '\r\n' is read in with '\r\n' being translated to '\n'. New user input is added containing '\r\n' line endings. The file is written out and now contains a mix of '\r\n' and '\r\r\n'. Michael
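Michael's scenario can be reproduced in a few lines. This sketch assumes Python 3's `io` module, simulates Windows text mode with newline='\r\n' on an in-memory buffer, and stands in for the UI control with a plain string; the names `load`, `save` and `ui_text` are hypothetical.

```python
import io


def load(data):
    # Text-mode read with universal newlines: '\r\n' becomes '\n'.
    return io.TextIOWrapper(io.BytesIO(data), encoding="ascii", newline=None).read()


def save(text):
    # Windows-style text-mode write: every '\n' becomes '\r\n'.
    raw = io.BytesIO()
    f = io.TextIOWrapper(raw, encoding="ascii", newline="\r\n")
    f.write(text)
    f.flush()
    data = raw.getvalue()
    f.detach()  # keep the BytesIO open after the wrapper is gone
    return data


text = load(b"first\r\nsecond\r\n")  # 'first\nsecond\n'
ui_text = "third\r\n"  # hypothetical input from a Windows UI control

mixed = save(text + ui_text)
print(mixed)  # b'first\r\nsecond\r\nthird\r\r\n'

# Translating the UI text at the boundary keeps the file consistent.
clean = save(text + ui_text.replace("\r\n", "\n"))
print(clean)  # b'first\r\nsecond\r\nthird\r\n'
```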
Re: [Python-Dev] [python] Re: New lines, carriage returns, and Windows
On 9/29/07, Michael Foord [EMAIL PROTECTED] wrote: Terry Reedy wrote: There are two normal ways for internal Python text to have \r\n: 1. Read from a file with \r\r\n. Then \r\r\n is correct output (on the same platform). 2. Intentionally put there by a programmer. If s/he also chooses default \n translation on output, \r followed by the translation of \n is correct. Actually, I usually get these strings from Windows UI components. A file containing '\r\n' is read in with '\r\n' being translated to '\n'. New user input is added containing '\r\n' line endings. The file is written out and now contains a mix of '\r\n' and '\r\r\n'. Out of curiosity, why don't the Python wrappers for your Windows UI components do the appropriate '\r\n' -> '\n' conversions? STeVe -- I'm not *in*-sane. Indeed, I am so far *out* of sane that you appear a tiny blip on the distant coast of sanity. --- Bucky Katt, Get Fuzzy
Re: [Python-Dev] [python] Re: New lines, carriage returns, and Windows
Steven Bethard wrote: On 9/29/07, Michael Foord [EMAIL PROTECTED] wrote: Terry Reedy wrote: There are two normal ways for internal Python text to have \r\n: 1. Read from a file with \r\r\n. Then \r\r\n is correct output (on the same platform). 2. Intentionally put there by a programmer. If s/he also chooses default \n translation on output, \r followed by the translation of \n is correct. Actually, I usually get these strings from Windows UI components. A file containing '\r\n' is read in with '\r\n' being translated to '\n'. New user input is added containing '\r\n' line endings. The file is written out and now contains a mix of '\r\n' and '\r\r\n'. Out of curiosity, why don't the Python wrappers for your Windows UI components do the appropriate '\r\n' -> '\n' conversions? One of the great things about IronPython is that you don't *need* any wrappers - you access .NET objects natively (which in fact wrap the lower level win32 API) - and the .NET APIs are usually not as bad as you probably assume. ;-) You just have to be aware that line endings are '\r\n'. I'm not sure how or if pywin32 handles this. Michael STeVe
Re: [Python-Dev] [python] Re: New lines, carriage returns, and Windows
On 9/29/07, Michael Foord [EMAIL PROTECTED] wrote: Steven Bethard wrote: On 9/29/07, Michael Foord [EMAIL PROTECTED] wrote: Terry Reedy wrote: There are two normal ways for internal Python text to have \r\n: 1. Read from a file with \r\r\n. Then \r\r\n is correct output (on the same platform). 2. Intentionally put there by a programmer. If s/he also chooses default \n translation on output, \r followed by the translation of \n is correct. Actually, I usually get these strings from Windows UI components. A file containing '\r\n' is read in with '\r\n' being translated to '\n'. New user input is added containing '\r\n' line endings. The file is written out and now contains a mix of '\r\n' and '\r\r\n'. Out of curiosity, why don't the Python wrappers for your Windows UI components do the appropriate '\r\n' -> '\n' conversions? One of the great things about IronPython is that you don't *need* any wrappers - you access .NET objects natively (which in fact wrap the lower level win32 API) - and the .NET APIs are usually not as bad as you probably assume. ;-) You just have to be aware that line endings are '\r\n'. Ahh, I see. So all the .NET components function like Python 3.0's io.open(..., newline='\n'), where no translation of \n (to or from \r\n) is performed. STeVe
Re: [Python-Dev] [python] Re: New lines, carriage returns, and Windows
Steven Bethard wrote: On 9/29/07, Michael Foord [EMAIL PROTECTED] wrote: Steven Bethard wrote: On 9/29/07, Michael Foord [EMAIL PROTECTED] wrote: Terry Reedy wrote: There are two normal ways for internal Python text to have \r\n: 1. Read from a file with \r\r\n. Then \r\r\n is correct output (on the same platform). 2. Intentionally put there by a programmer. If s/he also chooses default \n translation on output, \r followed by the translation of \n is correct. Actually, I usually get these strings from Windows UI components. A file containing '\r\n' is read in with '\r\n' being translated to '\n'. New user input is added containing '\r\n' line endings. The file is written out and now contains a mix of '\r\n' and '\r\r\n'. Out of curiosity, why don't the Python wrappers for your Windows UI components do the appropriate '\r\n' -> '\n' conversions? One of the great things about IronPython is that you don't *need* any wrappers - you access .NET objects natively (which in fact wrap the lower level win32 API) - and the .NET APIs are usually not as bad as you probably assume. ;-) You just have to be aware that line endings are '\r\n'. Ahh, I see. So all the .NET components function like Python 3.0's io.open(..., newline='\n'), where no translation of \n (to or from \r\n) is performed. Effectively, yes. However, for Python compatibility, opening a file in text mode with Python's 'open' or 'file' behaves in the usual way. Michael STeVe
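Steven's comparison to newline='\n' can be seen directly. In this sketch (Python 3's `io` module over an in-memory buffer; the helper name `write_then_read` is hypothetical), newline='\n' leaves '\r\n' untouched in both directions, while the default newline=None translates it on input.

```python
import io


def write_then_read(text, newline):
    # Write through a text layer, then read the same bytes back as lines.
    raw = io.BytesIO()
    f = io.TextIOWrapper(raw, encoding="ascii", newline=newline)
    f.write(text)
    f.flush()
    written = raw.getvalue()
    f.seek(0)
    return written, f.readlines()


# newline='\n': nothing is translated on output or input, so '\r\n' survives.
print(write_then_read("one\r\ntwo\n", newline="\n"))
# (b'one\r\ntwo\n', ['one\r\n', 'two\n'])

# The default (newline=None) applies universal newlines when reading.
default = io.TextIOWrapper(io.BytesIO(b"one\r\ntwo\n"), encoding="ascii")
print(default.readlines())  # ['one\n', 'two\n']
```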
Re: [Python-Dev] [python] Re: New lines, carriage returns, and Windows
Actually, I usually get these strings from Windows UI components. A file containing '\r\n' is read in with '\r\n' being translated to '\n'. New user input is added containing '\r\n' line endings. The file is written out and now contains a mix of '\r\n' and '\r\r\n'. Out of curiosity, why don't the Python wrappers for your Windows UI components do the appropriate '\r\n' -> '\n' conversions? One of the great things about IronPython is that you don't *need* any wrappers - you access .NET objects natively (which in fact wrap the lower level win32 API) - and the .NET APIs are usually not as bad as you probably assume. ;-) Given the current lengthy discussion about newline translation, maybe it isn't such a great thing :-) Seriously, you do need a wrapper in this particular case - to convert the .NET line ending convention to Python's. The issue here is that such a wrapper is so trivial that it's usually easier to simply do the translation with ad hoc .replace('\r\n', '\n') calls. The problem comes when you accidentally forget a translation - then you get the clash between the .NET (\r\n) and Python (\n) models. But of course, the solution in that case is to simply add the omitted translation, not to change Python's IO model. Of course, all this grand theory is just that - theory. In my case, it helped me understand what's going on, but that's all. For real-life code, you just add the appropriate replace() calls. Whether theory helps you keep track of where replace() is needed, or whether you just know, doesn't really matter much. But regardless - the Python IO model doesn't need changing. (Not even 2.x, and the py3k model is even better in this regard). Paul.
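The "trivial wrapper" Paul mentions amounts to one translation at each boundary. The sketch below uses hypothetical names (`from_ui`, `to_ui`, `box_text`), and a plain string stands in for the Text property of a WinForms control.

```python
def from_ui(s):
    # .NET/Windows convention ('\r\n') -> Python convention ('\n').
    return s.replace("\r\n", "\n")


def to_ui(s):
    # Python convention -> .NET/Windows convention. Normalizing first means
    # text that already contains '\r\n' is not turned into '\r\r\n'.
    return from_ui(s).replace("\n", "\r\n")


# Hypothetical value read from a WinForms control's Text property.
box_text = "alpha\r\nbeta"

text = from_ui(box_text)  # 'alpha\nbeta', safe to mix with other Python text
text += "\ngamma"
box_text = to_ui(text)
print(repr(box_text))  # 'alpha\r\nbeta\r\ngamma'
```

Keeping both translations in one place is what makes a forgotten replace() call easier to spot.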
Re: [Python-Dev] [python] Re: New lines, carriage returns, and Windows
Michael Foord [EMAIL PROTECTED] wrote in message news:[EMAIL PROTECTED] | Terry Reedy wrote: | There are two normal ways for internal Python text to have \r\n: | 1. Read from a file with \r\r\n. Then \r\r\n is correct output (on the | same platform). | 2. Intentionally put there by a programmer. If s/he also chooses default \n | translation on output, \r followed by the translation of \n is correct. | | Actually, I usually get these strings from Windows UI components. A file | containing '\r\n' is read in with '\r\n' being translated to '\n'. New | user input is added containing '\r\n' line endings. The file is written | out and now contains a mix of '\r\n' and '\r\r\n'. I covered this in the part you snipped: 2. Other special situations, which can be handled by disabling, overriding, and layering the defaults. This seems enough flexibility to me. While mixing input like this may seem 'normal' to you, I believe it is 'special' considering the total Python community. I can think of at least 4 decent solutions, depending on the details of the input and what you do with it. tjr
Re: [Python-Dev] [python] Re: New lines, carriage returns, and Windows
Michael Foord wrote: One of the great things about IronPython is that you don't *need* any wrappers - you access .NET objects natively But it seems that you really *do* need wrappers to deal with the line endings problem, whether they're provided automatically or you do it yourself manually. This is reminiscent of the C-string vs. Pascal-string fiasco when Apple switched from Pascal to C as their main application programming language. Some development environments provided glue code that did the translation automatically; others required you to do it yourself, which was a huge nuisance. -- Greg
Re: [Python-Dev] [python] Re: New lines, carriage returns, and Windows
Dino Viehland wrote: My understanding is that users can write code that uses only \n and Python will write the end-of-line character(s) that are appropriate for the platform when writing to a file. That's what I meant by 'uses \n for everything internally'. But if you write \r\n to a file Python completely ignores the presence of the \r and transforms the \n into a \r\n anyway, hence the \r\r in the resulting stream. My last question is simply does anyone find writing \r\r\n when the original string contained \r\n a useful behavior - personally I don't see how it is. But Guido's response makes this sound like it's a problem w/ VC++ stdio implementation and not something that Python is explicitly doing. Anyway, it might be useful to have a text-mode file that you can write \r\n to and only get \r\n in the resulting file. But if the general sentiment is s.replace('\r', '') is the way to go we can advise our users of the behavior when interoperating w/ APIs that return \r\n in strings. We always do replace('\r\n','\n') but same difference... Michael -----Original Message----- From: Martin v. Löwis [mailto:[EMAIL PROTECTED] Sent: Wednesday, September 26, 2007 3:01 PM To: Dino Viehland Cc: python-dev@python.org Subject: Re: [Python-Dev] New lines, carriage returns, and Windows This works great as long as you stay within an entirely Python world. Because Python uses \n for everything internally I think you misunderstand fairly significantly how this all works together. Python does not use \n for everything internally. Python is well capable of representing \r separately, and does so if you ask it to. So I'm curious: Is there a reason this behavior is useful that I'm missing? I think you are missing how it works in the first place (or else you failed to communicate to me what precise behavior you find puzzling). Regards, Martin
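The two replace() calls mentioned in this exchange (Dino's s.replace('\r', '') and Michael's replace('\r\n','\n')) agree only when every '\r' in the string belongs to a '\r\n' pair, which is presumably the case for text from a Windows UI control. A minimal sketch with hypothetical function names:

```python
def strip_cr(s):
    # Drop every carriage return, paired or not.
    return s.replace("\r", "")


def crlf_to_lf(s):
    # Translate only '\r\n' pairs; a lone '\r' is kept.
    return s.replace("\r\n", "\n")


ui_text = "one\r\ntwo\r\n"
print(strip_cr(ui_text) == crlf_to_lf(ui_text))  # True

odd_text = "progress 50%\rprogress 100%\r\n"
print(repr(strip_cr(odd_text)))    # 'progress 50%progress 100%\n'
print(repr(crlf_to_lf(odd_text)))  # 'progress 50%\rprogress 100%\n'
```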
Re: [Python-Dev] [python] Re: New lines, carriage returns, and Windows
And if this is fine for you, given that you may have the largest WinForms / IronPython code base, I tend to think the replace may be reasonable. But we have had someone get surprised by this behavior. -----Original Message----- From: Michael Foord [mailto:[EMAIL PROTECTED] Sent: Wednesday, September 26, 2007 3:15 PM To: Dino Viehland Cc: python-dev@python.org Subject: Re: [python] Re: [Python-Dev] New lines, carriage returns, and Windows [snip]
Re: [Python-Dev] [python] Re: New lines, carriage returns, and Windows
Dino Viehland wrote: And if this is fine for you, given that you may have the largest WinForms / IronPython code base, I tend to think the replace may be reasonable. But we have had someone get surprised by this behavior. It is a slight impedance mismatch between Python and Windows - but isn't restricted to IronPython, so changing Python semantics doesn't seem like the right answer. Alternatively a more intelligent text mode (that writes '\n' as '\r\n' and '\r\n' as '\r\n' on Windows) doesn't sound like *such* a bad idea - but you will still get caught out by this. A string read in text mode will read '\r\n' as '\n'. Setting this on a WinForms component will still do the wrong thing. Better to be aware of the difference and use binary mode. Michael -----Original Message----- From: Michael Foord [mailto:[EMAIL PROTECTED] Sent: Wednesday, September 26, 2007 3:15 PM To: Dino Viehland Cc: python-dev@python.org Subject: Re: [python] Re: [Python-Dev] New lines, carriage returns, and Windows [snip]
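The "more intelligent text mode" Michael describes can be approximated by a thin layer on top of an ordinary text wrapper. This is a hypothetical sketch (`SmartCRLFWriter` is not an existing class), built on Python 3's `io.TextIOWrapper` with newline='\r\n'. It collapses '\r\n' to '\n' before the wrapper expands every '\n' back to '\r\n'.

```python
import io


class SmartCRLFWriter:
    """Hypothetical text writer: '\\n' and '\\r\\n' both come out as '\\r\\n'."""

    def __init__(self, binary_stream, encoding="utf-8"):
        # The underlying wrapper translates every '\n' to '\r\n' on output.
        self._f = io.TextIOWrapper(binary_stream, encoding=encoding, newline="\r\n")

    def write(self, s):
        # Collapse existing '\r\n' pairs first so they are not doubled.
        self._f.write(s.replace("\r\n", "\n"))
        return len(s)

    def flush(self):
        self._f.flush()


raw = io.BytesIO()
w = SmartCRLFWriter(raw)
w.write("from Python\n")
w.write("from a UI control\r\n")
w.flush()
print(raw.getvalue())  # b'from Python\r\nfrom a UI control\r\n'
```

As Michael notes, this fixes only the output side: reading the file back in text mode still yields '\n', so text bound for a WinForms control needs translating either way. This sketch also misses a '\r\n' pair split across two write() calls.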