Re: [Tutor] Why difference between printing string typing its object reference at the prompt?
David Hutto wrote:
> If your app has a standard usage of phrases, you can add a file that translates a tag into a particular language phrase.
>
>     if submit_tag_selection == 'english':
>         submit = 'Submit'
>     if submit_tag_selection == 'french':
>         submit = 'Soumettre'
>
> Of course this could be done without the if, you would just translate the normal selections within a file with the commonly used phrases in the app, and substitute it within a parse for:
>
>     x = open('translate_file_french', 'r')
>     for line in x:
>         if line.split('=')[0] == 'Submit':
>             print '%s' % (line.split('=')[1])
>
>     'Soumettre'
>
> *Untested, but should work

Now missing any context I am going to assume the topic shifted to how to do translations for an internationalized application/site. Feel free to ignore if I am wrong or OT.

I would do this, but not using line splitting. I would create (YAML) config files that contain translations of site text (i.e. Submit). You could do this with pickle too, but I think YAML files are better for humans to edit.

    text = 'SUBMIT_BUTTON_TEXT'
    with open('translate_file_fr') as f:
        # parse YAML into dictionary of { text_to_replace : text_to_replace_with }
        # some work
    for k,v in translation_key.iteritems():
        text = text.replace(k, v)

Alternately you could create a python module and just import the appropriate language.

    import importlib
    # __import__('a.b') returns the top-level package a, so use import_module
    translation_key = importlib.import_module('translation.' + language)
    text = ''.join([ '...', translation_key.submit_button_text, '...'])

Of course, I have no idea of the downsides of this approach as I have not had to do something like this before. I would be interested in whether this matches the standard approach and the up/down-sides to it.

Ramit Prasad
___ Tutor maillist - Tutor@python.org To unsubscribe or change subscription options: http://mail.python.org/mailman/listinfo/tutor
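A minimal Python 3 sketch of the lookup-table idea from the message above. JSON stands in for YAML only because it ships with the standard library; the table contents, the `TRANSLATIONS_JSON` name and the `translate` helper are hypothetical, not anything from the posters' code.

```python
import json

# Hypothetical translation table; in practice this might be one file per language.
TRANSLATIONS_JSON = '''
{
  "fr": {"SUBMIT_BUTTON_TEXT": "Soumettre", "CANCEL_BUTTON_TEXT": "Annuler"},
  "en": {"SUBMIT_BUTTON_TEXT": "Submit", "CANCEL_BUTTON_TEXT": "Cancel"}
}
'''


def translate(text, language, tables):
    """Replace every known tag in text with its phrase for the given language."""
    for tag, phrase in tables[language].items():
        text = text.replace(tag, phrase)
    return text


tables = json.loads(TRANSLATIONS_JSON)
button_fr = translate('[ SUBMIT_BUTTON_TEXT ]', 'fr', tables)
button_en = translate('[ SUBMIT_BUTTON_TEXT ]', 'en', tables)
print(button_fr, button_en)
```

For real applications the standard library's gettext module is the usual starting point; the sketch only illustrates the dictionary substitution being discussed.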
Re: [Tutor] Why difference between printing string typing its object reference at the prompt?
If your app has a standard usage of phrases, you can add a file that translates a tag into a particular language phrase.

    if submit_tag_selection == 'english':
        submit = 'Submit'
    if submit_tag_selection == 'french':
        submit = 'Soumettre'

Of course this could be done without the if, you would just translate the normal selections within a file with the commonly used phrases in the app, and substitute it within a parse for:

    x = open('translate_file_french', 'r')
    for line in x:
        if line.split('=')[0] == 'Submit':
            print '%s' % (line.split('=')[1])

    'Soumettre'

*Untested, but should work

--
Best Regards,
David Hutto
CEO: http://www.hitwebdevelopment.com
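The line-splitting approach above can be sketched in Python 3 roughly as follows. It assumes a file of `tag=phrase` lines; `load_phrases` is a hypothetical helper, and `io.StringIO` stands in for the real file so the example runs on its own.

```python
import io


def load_phrases(lines):
    """Build a {tag: phrase} dict from 'tag=phrase' lines, skipping other lines."""
    phrases = {}
    for line in lines:
        line = line.strip()
        if not line or '=' not in line:
            continue
        # partition splits on the first '=' only, so phrases may contain '='
        tag, _, phrase = line.partition('=')
        phrases[tag.strip()] = phrase.strip()
    return phrases


# Stand-in for open('translate_file_french'); the contents are made up.
fake_file = io.StringIO('Submit=Soumettre\nCancel=Annuler\n\n')
french = load_phrases(fake_file)
print(french['Submit'])
```

Stripping each line matters here: without it the phrase would keep the trailing newline read from the file.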
Re: [Tutor] Why difference between printing string typing its object reference at the prompt?
On 11/10/12 02:23, boB Stepp wrote:
>> bytes have string methods as a convenience, such as find, split, and partition. They also have the method decode(), which uses a specified encoding such as utf-8 to create a string from an encoded bytes sequence.
>
> What is the intended use of byte types?

One purpose is to facilitate the handling of raw data streams such as might be read from a binary file or over a network. If you are using locale settings with 16 bit characters, reading such a stream as a character string will result in you processing pairs of bytes at a time. Using a byte string you guarantee you process 8 bits at a time with no attempt at interpretation.

--
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
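A small sketch of the difference described above, assuming the raw data happens to be UTF-16 (little-endian) text. As bytes, each item is one 8-bit value; once decoded, the same data is consumed two bytes per character.

```python
# What might arrive from a binary file or a socket.
raw = 'hi'.encode('utf-16-le')

# As bytes: one 8-bit value at a time, no interpretation.
as_bytes = list(raw)

# As text: the decoder consumes two bytes per character for this encoding.
as_text = raw.decode('utf-16-le')

print(as_bytes, repr(as_text))
```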
Re: [Tutor] Why difference between printing string typing its object reference at the prompt?
On Wed, Oct 10, 2012 at 9:23 PM, boB Stepp robertvst...@gmail.com wrote:
>> >>> aꘌꘌb = True
>> >>> aꘌꘌb
>> True
>> >>> Ⅰ, Ⅱ, Ⅲ, Ⅳ, Ⅴ = range(1, 6)
>> >>> Ⅰ, Ⅱ, Ⅲ, Ⅳ, Ⅴ
>> (1, 2, 3, 4, 5)
>
> Is doing this considered good programming practice?

The examples were meant to highlight the absurdity of using letter modifiers and number letters in identifiers. I should have clearly stated that I think these names are bad.

>> bytes have string methods as a convenience, such as find, split, and partition. They also have the method decode(), which uses a specified encoding such as utf-8 to create a string from an encoded bytes sequence.
>
> What is the intended use of byte types?

bytes objects are important for low-level data processing, such as file and socket I/O. The fundamental addressable value in a computer is a byte (at least for all common, modern computers). When you write a string to a file or socket, it has to be encoded as a sequence of bytes.

For example, consider the character 𝟡 (MATHEMATICAL DOUBLE-STRUCK DIGIT NINE) with decimal code 120801 (0x1d7e1 in hexadecimal):

    >>> ord('𝟡')
    120801

Three common ways to encode this character are as UTF-32, UTF-16, and UTF-8.

The UTF-32 encoding is the UCS4 format used by strings in main memory on a wide build (Python 3.3 uses a more efficient scheme that uses 1, 2, or 4 bytes as required).

    >>> s = '𝟡'
    >>> s.encode('utf-32')
    b'\xff\xfe\x00\x00\xe1\xd7\x01\x00'

The utf-32 string encoder also includes a byte order mark (BOM) in the first 4 bytes of the encoded sequence (b'\xff\xfe\x00\x00'). The order of the BOM determines that this is a little-endian, 4-byte encoding.

http://en.wikipedia.org/wiki/Endianness

You can use int.from_bytes() to verify that b'\xe1\xd7\x01\x00' is the number 120801 stored as 4 bytes in little-endian order:

    >>> int.from_bytes(b'\xe1\xd7\x01\x00', 'little')
    120801

or crunch the numbers in a generator expression:

    >>> sum(x * 256**i for i,x in enumerate(b'\xe1\xd7\x01\x00'))
    120801

UTF-32 is an inefficient way to represent Unicode.
Characters in the BMP, which are by far the most common, only require at most 2 bytes. UTF-16 uses 2 bytes for BMP codes, like the original UCS2, and a 4-byte surrogate-pair encoding for characters in the supplementary planes. Here's the character 𝟡 encoded as UTF-16:

    >>> s = '𝟡'
    >>> list(map(hex, s.encode('utf-16')))
    ['0xff', '0xfe', '0x35', '0xd8', '0xe1', '0xdf']

Again there's a BOM, 0xfffe, which describes the order and number of bytes per code (i.e. 2 bytes, little endian). The character itself is stored as the surrogate pair [0xd835, 0xdfe1]. You can read more about surrogate pair encoding in the UTF-16 Wikipedia article:

http://en.wikipedia.org/wiki/UTF-16

A narrow build of Python uses UCS2 + surrogates. It's not quite UTF-16 since it doesn't treat a surrogate pair as a single character for iteration, string length, and indexing. Python 3.3 eliminates narrow builds.

Another common encoding is UTF-8. This maps each code to 1-4 bytes, without requiring a BOM (though the 3-byte BOM 0xefbbbf can be used when saving to a file). Since ASCII is so common, and since on many systems backward compatibility with ASCII is required, UTF-8 includes ASCII as a subset. In other words, codes below 128 are stored unmodified as a single byte. Non-ASCII codes are encoded as 2-4 bytes. See the UTF-8 Wikipedia article for the details:

http://en.wikipedia.org/wiki/UTF-8#Description

The character 𝟡 requires 4 bytes in UTF-8:

    >>> s = '𝟡'
    >>> sb = s.encode('utf-8')
    >>> sb
    b'\xf0\x9d\x9f\xa1'
    >>> list(sb)
    [240, 157, 159, 161]

If you iterate over the encoded bytestring, the numbers 240, 157, 159, and 161 -- taken separately -- have no special significance. Neither does the length of 4 tell you how many characters are in the bytestring. With a decoded string, in contrast, you know how many characters it has (assuming you've normalized to NFC format) and can iterate through the characters in a simple for loop.
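The three encodings discussed so far can be checked side by side. This is a sketch for Python 3 that uses the explicit little-endian codecs (`utf-32-le`, `utf-16-le`), so no BOM is emitted and the result does not depend on the machine's byte order; the surrogate pair is rebuilt by hand from the code point.

```python
ch = '\U0001d7e1'  # MATHEMATICAL DOUBLE-STRUCK DIGIT NINE

utf32 = ch.encode('utf-32-le')
utf16 = ch.encode('utf-16-le')
utf8 = ch.encode('utf-8')

# Rebuild the UTF-16 surrogate pair from the code point.
offset = ord(ch) - 0x10000
high = 0xD800 + (offset >> 10)     # top 10 bits
low = 0xDC00 + (offset & 0x3FF)    # bottom 10 bits

print(len(utf32), len(utf16), len(utf8))
print(hex(high), hex(low))
```

All three byte sequences decode back to the same one-character string; only the storage cost differs.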
If your terminal/console uses UTF-8, you can write the UTF-8 encoded bytes directly to the stdout buffer:

    >>> import sys
    >>> sys.stdout.buffer.write(b'\xf0\x9d\x9f\xa1' + b'\n')
    𝟡
    5

This wrote 5 bytes: 4 bytes for the 𝟡 character, plus b'\n' for a newline.

Strings in Python 2

In Python 2, str is a bytestring. Iterating over a 2.x str yields single-byte characters. However, these generally aren't 'characters' at all (this goes back to the C programming language char type), not unless you're working with a single-byte encoding such as ASCII or Latin-1. In Python 2, unicode is a separate type and unicode literals require a u prefix to distinguish them from bytestrings, just as bytes literals in Python 3 require a b prefix to distinguish them from strings.

Python 2.6 and 2.7 alias str to the name bytes, and they support the b prefix in literals. These were added to ease porting to Python 3, but bear in mind that it's still a classic bytestring, not a bytes object. For example, in 2.x you can use ord() with an item
Re: [Tutor] Why difference between printing string typing its object reference at the prompt?
On 10/11/2012 04:40 AM, eryksun wrote:
> On Wed, Oct 10, 2012 at 9:23 PM, boB Stepp robertvst...@gmail.com wrote:
>> What is the intended use of byte types?
>
> bytes objects are important for low-level data processing, such as file and socket I/O. The fundamental addressable value in a computer is a byte (at least for all common, modern computers). When you write a string to a file or socket, it has to be encoded as a sequence of bytes.
>
> SNIP
>
> Another common encoding is UTF-8. This maps each code to 1-4 bytes,

Actually, the upper limit for a decoded utf-8 character is at least 6 bytes. I think it's 6, but it's no less than 6.

> without requiring a BOM (though the 3-byte BOM 0xefbbbf can be used when saving to a file). Since ASCII is so common, and since on many systems backward compatibility with ASCII is required, UTF-8 includes ASCII as a subset. In other words, codes below 128 are stored unmodified as a single byte. Non-ASCII codes are encoded as 2-4 bytes. See the UTF-8 Wikipedia article for the details:
>
> http://en.wikipedia.org/wiki/UTF-8#Description

This shows cases for up to 6 bytes.

> snip

Three other things worth pointing out:

1) Python didn't define all these byte formats. These are standards which exist outside of the python world, and Python lets you coexist with them. If you want to create a text file that can be seen properly by an editor that only supports utf-8, you can't output UCS-4 and expect it to come up with anything but gibberish.

2) There are many more byte formats, most of them predating Unicode entirely. Many of these are specific to a particular language or national environment, and contain just those extensions to ASCII that the particular language deems useful. Python provides encoders and decoders to many of these as well.

3) There are many things read and written in byte format that have no relationship to characters. The notion of using text formats for all data (e.g. xml) is a fairly recent one. Binary files are quite common, and many devices require binary transfers to work at all. So byte strings are not necessarily strings at all.

--
DaveA
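Points 1) and 2) above can be illustrated with a short sketch: the same text produces different bytes under a legacy single-byte encoding (Latin-1 is used here as one example) and under UTF-8, and reading the bytes with the wrong decoder gives either gibberish or an error. The sample word is arbitrary.

```python
word = 'caf\u00e9'  # 'cafe' with an acute accent on the e

latin1 = word.encode('latin-1')   # one byte per character
utf8 = word.encode('utf-8')       # the accented letter takes two bytes

# Decoding UTF-8 bytes as Latin-1 "works" but yields the wrong characters.
misread = utf8.decode('latin-1')

# Decoding Latin-1 bytes as UTF-8 fails outright for this input.
try:
    latin1.decode('utf-8')
    strict_result = 'decoded'
except UnicodeDecodeError:
    strict_result = 'UnicodeDecodeError'

print(latin1, utf8, ascii(misread), strict_result)
```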
Re: [Tutor] Why difference between printing string typing its object reference at the prompt?
On Thu, Oct 11, 2012 at 5:04 AM, Dave Angel d...@davea.name wrote:
> Actually, the upper limit for a decoded utf-8 character is at least 6 bytes. I think it's 6, but it's no less than 6.

Yes, but what would be the point? Unicode only has 17 planes, up to code 0x10ffff. It's limited by UTF-16.

> 2) There are many more byte formats, most of them predating Unicode entirely. Many of these are specific to a particular language or national environment, and contain just those extensions to ASCII that the particular language deems useful. Python provides encoders and decoders to many of these as well.

I mentioned 3 common formats that can completely represent Unicode since this thread is mostly about Python 3 strings and repr -- at least it started that way.

> 3) There are many things read and written in byte format that have no relationship to characters. The notion of using text formats for all data (eg. xml) is a fairly recent one. Binary files are quite common, and many devices require binary transfers to work at all. So byte strings are not necessarily strings at all.

Sure, other than encoded strings, there are also more obvious examples of data represented as bytes -- at least I hope they're obvious -- such as multimedia audio/video/images, sensor data, spreadsheets, and so on. In main memory these exist as data structures/objects (bytes, but not generally in a form suitable for transmission or storage). Before being saved to files or network streams, the data is transformed to serialize and pack it as a byte stream (e.g. the struct module, or pickle which defaults to a binary protocol in Python 3), possibly compress it to a smaller size and add error correction (e.g. the gzip module), and possibly encrypt it for security (e.g. PyCrypto).
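A rough sketch of the serialize-then-compress pipeline described above, using only the struct and gzip modules. The sensor readings and the wire layout (a 32-bit count followed by 64-bit floats, little-endian) are made-up assumptions for illustration.

```python
import gzip
import struct

readings = [20.5, 21.0, 19.75]  # hypothetical sensor samples

# '<' = little-endian without padding, 'I' = unsigned 32-bit count,
# followed by one 64-bit float ('d') per sample.
fmt = '<I%dd' % len(readings)
packed = struct.pack(fmt, len(readings), *readings)
compressed = gzip.compress(packed)

# Reverse the steps on the receiving side.
unpacked = gzip.decompress(compressed)
(count,) = struct.unpack_from('<I', unpacked)
restored = list(struct.unpack_from('<%dd' % count, unpacked, offset=4))

print(len(packed), restored)
```

For a payload this small gzip's own header outweighs any saving; compression only pays off on larger or more repetitive data.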
Re: [Tutor] Why difference between printing string typing its object reference at the prompt?
On 10/11/2012 05:21 AM, eryksun wrote:
> On Thu, Oct 11, 2012 at 5:04 AM, Dave Angel d...@davea.name wrote:
>> Actually, the upper limit for a decoded utf-8 character is at least 6 bytes. I think it's 6, but it's no less than 6.
>
> Yes, but what would be the point? Unicode only has 17 planes, up to code 0x10ffff. It's limited by UTF-16.

More importantly, it was restricted by the 2003 rfc 3629, which I had completely missed. Last time I wrote a utf-8 encoder was before that, probably about 1997.

http://tools.ietf.org/html/rfc3629

Thanks for pointing it out.

--
DaveA
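The limits agreed on above are easy to confirm from Python 3. This sketch encodes the highest code point of each UTF-8 length class and shows that Python refuses code points beyond 0x10ffff.

```python
# Highest code point that fits in 1, 2, 3 and 4 UTF-8 bytes respectively.
boundaries = (0x7F, 0x7FF, 0xFFFF, 0x10FFFF)
encoded_lengths = [len(chr(cp).encode('utf-8')) for cp in boundaries]

# 17 planes of 65536 codes each end at 0x10FFFF; the next value is invalid.
try:
    chr(0x110000)
    beyond_range = 'accepted'
except ValueError:
    beyond_range = 'rejected'

print(encoded_lengths, beyond_range)
```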
Re: [Tutor] Why difference between printing string typing its object reference at the prompt?
On Tue, Oct 9, 2012 at 4:29 AM, eryksun eryk...@gmail.com wrote:
> snip
> Python 3 lets you use any Unicode letter as an identifier, including letter modifiers (Lm) and number letters (Nl). For example:
>
>     >>> aꘌꘌb = True
>     >>> aꘌꘌb
>     True
>
>     >>> Ⅰ, Ⅱ, Ⅲ, Ⅳ, Ⅴ = range(1, 6)
>     >>> Ⅰ, Ⅱ, Ⅲ, Ⅳ, Ⅴ
>     (1, 2, 3, 4, 5)

Is doing this considered good programming practice? I recall there was a recent discussion about using the actual characters in formulas instead of descriptive names, where this would make more sense to people knowledgeable in the field using the formulas; however, descriptive names might be better for those who don't have that specialty knowledge. Is there a Python community consensus on how and when it is appropriate (if ever) to use Unicode characters as identifiers?

> A potential gotcha in Unicode is the design choice to have both [C]omposed and [D]ecomposed forms of characters. For example:
>
>     >>> from unicodedata import name, normalize
>
>     >>> s1 = "ü"
>     >>> name(s1)
>     'LATIN SMALL LETTER U WITH DIAERESIS'
>
>     >>> s2 = normalize("NFD", s1)
>     >>> list(map(name, s2))
>     ['LATIN SMALL LETTER U', 'COMBINING DIAERESIS']
>
> These combine as one glyph when printed:
>
>     >>> print(s2)
>     ü
>
> Different forms of the 'same' character won't compare as equal unless you first normalize them to the same form:
>
>     >>> s1 == s2
>     False
>     >>> normalize("NFC", s1) == normalize("NFC", s2)
>     True

This looks to make alphabetical sorting potentially much more complex. I will have to give this some thought once I know more.

>> I don't see a mention of byte strings mentioned in the index of my text. Are these just the ASCII character set?

After seeing your explanation below, I was able to find the relevant material in my book. It was under bytes type and bytearray type. For some reason these categories did not click in my head as what Steve was addressing.

> A bytes object (and its mutable cousin bytearray) is a sequence of numbers, each in the range of a byte (0-255). bytes literals start with b, such as b'spam' and can only use ASCII characters, as does the repr of bytes. Slicing returns a new bytes object, but an index or iteration returns integer values:
>
>     >>> b'spam'[:3]
>     b'spa'
>     >>> b'spam'[0]
>     115
>     >>> list(b'spam')
>     [115, 112, 97, 109]
>
> bytes have string methods as a convenience, such as find, split, and partition. They also have the method decode(), which uses a specified encoding such as utf-8 to create a string from an encoded bytes sequence.

What is the intended use of byte types?

Thanks! This continues to be quite informative and this thread is greatly helping me to make better sense of the information that I am self-studying.

--
Cheers!
boB
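On the sorting concern raised above: one simple way to keep composed and decomposed spellings together is to normalize inside the sort key. The sketch below does that with NFC; `nfc_key` and the word list are illustrative. It still sorts by code point, so it is not locale-aware alphabetical order, which is a separate problem.

```python
from unicodedata import normalize

composed = '\u00fcber'                    # u-with-diaeresis as one code point
decomposed = normalize('NFD', composed)   # 'u' followed by COMBINING DIAERESIS
words = ['zebra', decomposed, 'apple', composed]


def nfc_key(word):
    """Sort key that treats composed and decomposed spellings as equal."""
    return normalize('NFC', word)


ordered = sorted(words, key=nfc_key)
print([ascii(w) for w in ordered])
```

Without the key, the decomposed form would sort among the plain 'u' words while the composed form sorts after 'z'.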
Re: [Tutor] Why difference between printing string typing its object reference at the prompt?
On 11/10/12 12:23, boB Stepp wrote:
> On Tue, Oct 9, 2012 at 4:29 AM, eryksun eryk...@gmail.com wrote:
>> snip
>> Python 3 lets you use any Unicode letter as an identifier, including letter modifiers (Lm) and number letters (Nl). For example:
>>
>>     >>> aꘌꘌb = True
>>     >>> aꘌꘌb
>>     True
>>
>>     >>> Ⅰ, Ⅱ, Ⅲ, Ⅳ, Ⅴ = range(1, 6)
>>     >>> Ⅰ, Ⅱ, Ⅲ, Ⅳ, Ⅴ
>>     (1, 2, 3, 4, 5)
>
> Is doing this considered good programming practice?

Not really, but it depends who is doing it and why.

If you have a piece of code that is only going to be maintained by people speaking French, with French keyboards, then why not use French words for identifiers? That includes those French letters with accents. Python 3 lets you do so.

Silly bits of code like Ⅳ = 4 (or worse, Ⅳ = 9) should be avoided because they are silly, not because they are illegal. That's about the same as using:

    eine, zwei, drei, vier, fünf = range(1, 6)

in code intended to be read by English speakers, only even harder to type.

Remember that programmers *discourage* most misspellings of words (with a few exceptions, usually abbreviations):

    number_of_pages = 42

is preferred to:

    nombar_off_paiges = 42

But for non-English speakers, most languages *force* them to either write code in Foreign (foreign *to them*), or to misspell words. Allowing Unicode identifiers means that they can write in their native tongue, using correct spelling, *if they so choose*.

Of course, if you want your code to be readable world-wide, stick to English :)

--
Steven
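Whether a given name is a legal Python 3 identifier can be checked without running it, using `str.isidentifier()`. The sketch below tests a few candidate names (chosen here for illustration) and looks up the Unicode category of the Roman numeral discussed above.

```python
import unicodedata

# 'f\u00fcnf' is the German word for five; '\u2163' is ROMAN NUMERAL FOUR.
candidates = ['f\u00fcnf', '\u2163', 'number_of_pages', '2fast', 'a-b']
report = {name: name.isidentifier() for name in candidates}

roman_four_category = unicodedata.category('\u2163')

for name, ok in report.items():
    print(ascii(name), ok)
print(roman_four_category)
```

Legal is not the same as advisable, which is the point made in the message above.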
Re: [Tutor] Why difference between printing string typing its object reference at the prompt?
On Mon, Oct 8, 2012 at 10:35 PM, boB Stepp robertvst...@gmail.com wrote:
> I am not up (yet) on the details of Unicode that Python 3 defaults to for strings, but I believe I comprehend the general concept. Looking at the string escape table of chapter 2 it appears that Unicode characters can be either 16-bit or 32-bit. That must be a lot of potential characters!

There are 1114112 possible codes (65536 codes/plane * 17 planes), but some are reserved, and only about 10% are assigned. Here's a list by category:

http://www.fileformat.info/info/unicode/category/index.htm

Python 3 lets you use any Unicode letter as an identifier, including letter modifiers (Lm) and number letters (Nl). For example:

    >>> aꘌꘌb = True
    >>> aꘌꘌb
    True

    >>> Ⅰ, Ⅱ, Ⅲ, Ⅳ, Ⅴ = range(1, 6)
    >>> Ⅰ, Ⅱ, Ⅲ, Ⅳ, Ⅴ
    (1, 2, 3, 4, 5)

A potential gotcha in Unicode is the design choice to have both [C]omposed and [D]ecomposed forms of characters. For example:

    >>> from unicodedata import name, normalize

    >>> s1 = "ü"
    >>> name(s1)
    'LATIN SMALL LETTER U WITH DIAERESIS'

    >>> s2 = normalize("NFD", s1)
    >>> list(map(name, s2))
    ['LATIN SMALL LETTER U', 'COMBINING DIAERESIS']

These combine as one glyph when printed:

    >>> print(s2)
    ü

Different forms of the 'same' character won't compare as equal unless you first normalize them to the same form:

    >>> s1 == s2
    False
    >>> normalize("NFC", s1) == normalize("NFC", s2)
    True

> I don't see a mention of byte strings mentioned in the index of my text. Are these just the ASCII character set?

A bytes object (and its mutable cousin bytearray) is a sequence of numbers, each in the range of a byte (0-255). bytes literals start with b, such as b'spam' and can only use ASCII characters, as does the repr of bytes. Slicing returns a new bytes object, but an index or iteration returns integer values:

    >>> b'spam'[:3]
    b'spa'
    >>> b'spam'[0]
    115
    >>> list(b'spam')
    [115, 112, 97, 109]

bytes have string methods as a convenience, such as find, split, and partition. They also have the method decode(), which uses a specified encoding such as utf-8 to create a string from an encoded bytes sequence.
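A short Python 3 sketch of the bytes and bytearray behaviour described above: indexing yields integers, slicing yields bytes, bytearray is the mutable variant, and decode() turns the result back into a string.

```python
data = b'spam'

first = data[0]          # indexing gives an int
head = data[:3]          # slicing gives a new bytes object

buf = bytearray(data)    # mutable copy of the same byte values
buf[0] = ord('S')        # items are assigned as integers
text = bytes(buf).decode('ascii')

# bytes itself is immutable, so item assignment fails.
try:
    data[0] = 83
    bytes_mutable = True
except TypeError:
    bytes_mutable = False

print(first, head, text, bytes_mutable)
```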
Re: [Tutor] Why difference between printing string typing its object reference at the prompt?
Steve,

On Thu, Oct 4, 2012 at 6:28 AM, Steven D'Aprano st...@pearwood.info wrote:
> snip
> Now, ask me about *raw strings*, and the difference between Unicode and byte strings :)

How can I resist asking! I am not in chapter 2 of my study text yet, but looking ahead raw strings seem to be a method of declaring everything within the quotes to be a literal string character including the backslash escape character. Apparently this is designated by using an r before the very first quote. Can this quote be single, double or triple?

I am not up (yet) on the details of Unicode that Python 3 defaults to for strings, but I believe I comprehend the general concept. Looking at the string escape table of chapter 2 it appears that Unicode characters can be either 16-bit or 32-bit. That must be a lot of potential characters! It will be interesting to look up the full Unicode tables. Quickly scanning the comparing strings section, I wonder if I should have been so quick to jump in with a couple of responses to the other thread going on recently!

I don't see a mention of byte strings mentioned in the index of my text. Are these just the ASCII character set?

Since I have not made it formally into this chapter yet, I don't really have specific questions, but I would be interested in anything you are willing to relate on these topics to complete my introduction to strings in Python. Or we can wait until I do get into the data types chapter that looks at these topics in detail and have specific questions.

--
Cheers!
boB
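On the raw-string question above, a quick sketch: the r prefix works with single, double and triple quotes alike, and the only effect is that backslashes are kept as literal characters. The sample path is arbitrary.

```python
single = r'C:\new\table'
double = r"C:\new\table"
triple = r'''C:\new\table'''

# The same text written as an ordinary string needs doubled backslashes.
escaped = 'C:\\new\\table'

cooked = 'a\tb'   # three characters: a, TAB, b
raw = r'a\tb'     # four characters: a, backslash, t, b

print(single == double == triple == escaped, len(cooked), len(raw))
```

One caveat worth knowing: a raw string literal cannot end in a single backslash, because the backslash still prevents the closing quote from ending the literal.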
Re: [Tutor] Why difference between printing string typing its object reference at the prompt?
On 10/03/2012 11:11 PM, boB Stepp wrote:
> Thanks to all who responded.
> SNIP
> What happens if str() or repr() is not supported by a particular object? Is an exception thrown, an empty string returned or something else I am not imagining?

Let's try it and see:

    >>> class A:pass
    ...
    >>> a = A()
    >>> a
    <__main__.A object at 0x16ae790>

This is generic information about an object with no methods at all, and in particular without a __repr__ method. It identifies the module where the class was defined, the name of the class, and the address the particular instance happens to be located at. (In CPython, that happens to be identical to id(a).) I'd be happier if it would just identify the number as the id, since ordinarily, the address is of no use.

BTW, as far as I know, there's no promise as to how this is formatted, so I wouldn't try to parse it with a program.

> SNIP
> What larger phrase does repr stand for? My text mentions representational form later in the book, which sounds similar in concept to what you are discussing.

That would be my guess. I don't recall seeing anything about it.

--
DaveA
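The experiment above can be repeated as a script. This sketch only prints the default repr next to `hex(id(a))` for comparison; that the two numbers match is a CPython detail, and the exact formatting is not guaranteed, so nothing here tries to parse it.

```python
class A:
    pass


a = A()
default_repr = repr(a)
default_str = str(a)   # falls back to the same text when __str__ is not defined

print(default_repr)
print(hex(id(a)))      # on CPython this matches the address shown above
```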
Re: [Tutor] Why difference between printing string typing its object reference at the prompt?
On 04/10/12 13:39, boB Stepp wrote:
>> But not always. For example:
>>
>> py> from decimal import Decimal as D
>> py> x = D("1.23")
>> py> print(str(x))
>> 1.23
>> py> print(repr(x))
>> Decimal('1.23')
>
> These contrasting examples are very illuminating. So in print(str(x)) the object, D("1.23"), is being converted into a readable string, which makes the most sense as 1.23. But print(repr(x)) is giving a string representation of the object as code, which is more than just 1.23, the Decimal('1.23'). Am I understanding this correctly?

Pretty close.

In the example above, the calls to print are only there to avoid distracting you with the string delimiters, the outer quote marks. It's str() and repr() that are doing the real work.

Apart from that, you've got it right. str(x) returns a human-readable version of x, which in this case is "1.23" (excluding the quote marks, of course). The designer of the Decimal class chose for repr() of a decimal to look as much as possible like the call to the class that created the object in the first place. (Or at least an equivalent call.) In this case, that is "Decimal('1.23')".

> Unfortunately, the difference between str() and repr() is kind of arbitrary and depends on the object. str() is supposed to return a human-readable version of the object, for display, while repr() is supposed to return a string which would work as code, but those are more guidelines than hard rules.
>
> Will these fine distinctions be easy for me to pick up on as I progress in my Python studies? I suspect that I am going to have to experiment with str() and repr() in each new situation to see what results.

*shrug* I've been programming in Python for over 10 years, and I still forget when str() is used and when repr() is used. I always have to check. But maybe that's just me.

Remember, there is no hard rule that tells you what the output of str() and repr() must be (apart from strings). Different programmers have different ideas of what is useful, meaningful, or possible.

[...]
>> But repr() of a string creates a new string showing the representation of the original string, that is, what you would need to type in source code to make that string. That means:
>>
>> 1) wrap the whole thing in delimiters (quotation marks)
>> 2) escaping special characters like tabs, newlines, and binary characters.
>
> As to point 2), will repr() insert \ (I am assuming Python uses a backslash like other languages to escape. I have not read about this in Python yet.) for these special characters? Will str() do the same?

Yes to repr(), no to str().

Remember, str() of a string is just the same string unchanged. If the input string contains a newline, the output will also contain a newline:

py> s = "abc" + chr(10) + "def"
py> print(s)
abc
def
py> print(str(s))
abc
def

But repr() will create a new string, and escape any non-printable character (and a few which are printable):

py> print(repr(s))
'abc\ndef'

So this shows us that instead of creating string s as I did above, by concatenating two substrings and a newline character, I could just as easily have created it in one go using a \n escape:

py> t = "abc\ndef"
py> s == t
True

Notice too that there is no difference between the two different flavours of single quote delimiters. Whether you write "a" or 'a' is entirely a matter of personal preference. Python accepts both to make it easy to input strings containing quote marks:

s = "this string contains ' a single-quote"
t = 'this string contains " a double-quote'

Such single quote strings must start and end on the same line. On the other hand, *triple-quote* delimiters """ or ''' are used for multiline strings. They can extend over multiple lines.

Now, ask me about *raw strings*, and the difference between Unicode and byte strings :)

--
Steven
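The str() versus repr() contrast above, collected into one runnable sketch. The `eval(repr(...))` round trip at the end is only there to illustrate the "works as code" guideline for these two types; it is not a general promise, and eval should not be used on untrusted input.

```python
from decimal import Decimal

x = Decimal('1.23')
s = 'abc' + chr(10) + 'def'   # chr(10) is a newline

pairs = {
    'decimal': (str(x), repr(x)),
    'string': (str(s), repr(s)),
}

# For these two objects the repr can be evaluated back into an equal object.
roundtrip = eval(repr(x)) == x and eval(repr(s)) == s

for label, (as_str, as_repr) in pairs.items():
    print(label, '| str:', as_str, '| repr:', as_repr)
print(roundtrip)
```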
Re: [Tutor] Why difference between printing string typing its object reference at the prompt?
On 04/10/12 13:11, boB Stepp wrote:
> What happens if str() or repr() is not supported by a particular object? Is an exception thrown, an empty string returned or something else I am not imagining?

I don't think that is possible, at least not by accident or neglect. In Python 3, everything inherits from object, which supports both str and repr, so everything else should too:

py> class MyClass:
...     pass
...
py> obj = MyClass()
py> str(obj)
'<__main__.MyClass object at 0xb7c8c9ac>'
py> repr(obj)
'<__main__.MyClass object at 0xb7c8c9ac>'

Not terribly exciting, but at least it tells you what the object is, and gives you enough information to distinguish it from other, similar, objects.

I suppose you could write a class that deliberately raised an exception when you called str() on it, in which case it would raise an exception when you called str() on it... :) Likewise for repr().

py> class Stupid:
...     def __str__(self):
...         raise TypeError('cannot stringify this object')
...
py> obj = Stupid()
py> str(obj)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 3, in __str__
TypeError: cannot stringify this object

> What larger phrase does repr stand for? My text mentions representational form later in the book, which sounds similar in concept to what you are discussing.

repr is short for "representation", as in "string representation".

> As I go along in my study of Python will it become clear to me when and how repr() and str() are being ...used, or implied in many places?

Generally, print and the interactive interpreter are the only implicit string conversions. At least the only ones I can think of right now... no, wait, there's another one, error messages.

print() displays the str() of the object. The interactive interpreter displays the repr() of the object. Error messages could do whatever they like. Anything else, you have to explicitly convert to a string using the form you want:

    s = repr(x).lower()
    t = str(y).replace('ss', 'ß')

or whatever.

--
Steven
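To make the implicit conversions above concrete, here is a sketch of a class that defines both methods. `Temperature` is a made-up example class; print() and `{}` formatting use `__str__`, while the interactive prompt and `{!r}` use `__repr__`.

```python
class Temperature:
    """Hypothetical example class with distinct str and repr forms."""

    def __init__(self, degrees):
        self.degrees = degrees

    def __repr__(self):
        # Aimed at programmers: looks like the call that creates the object.
        return 'Temperature(%r)' % self.degrees

    def __str__(self):
        # Aimed at end users: just the readable value.
        return '%s degrees' % self.degrees


t = Temperature(21.5)
shown_by_print = str(t)      # what print(t) displays
shown_at_prompt = repr(t)    # what typing t at the >>> prompt displays
formatted = '{} / {!r}'.format(t, t)

print(t)
print(formatted)
```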
Re: [Tutor] Why difference between printing string typing its object reference at the prompt?
On Wed, Oct 3, 2012 at 11:11 PM, boB Stepp robertvst...@gmail.com wrote:
> What happens if str() or repr() is not supported by a particular object? Is an exception thrown, an empty string returned or something else I am not imagining?

The __str__ method inherited from object calls __repr__.

For a class, __repr__ is inherited from type.__repr__, which returns "<class 'module_name.class_name'>".

For an instance, __repr__ is inherited from object.__repr__, which returns "<module_name.class_name object at address>".

If you override __str__ or __repr__, you must return a string. Else the interpreter will raise a TypeError.

Basic example:

    >>> class Test: ...

repr of the class:

    >>> repr(Test)
    "<class '__main__.Test'>"

repr of an instance:

    >>> repr(Test())
    '<__main__.Test object at 0x958670c>'

> As I go along in my study of Python will it become clear to me when and how repr() and str() are being ...used, or implied in many places?

str is Python's string type, while repr is a built-in function that returns a string suitable for debugging.

You can also call str without an argument to get an empty string, i.e. str() == ''. This is similar to other built-in types: int() == 0, float() == 0.0, complex() == 0j, tuple() == (), list() == [], and dict() == {}. The returned value is either 0 or empty -- and boolean False in all cases.

str also takes the optional arguments encoding and errors to decode an encoded string:

    >>> str(b'spam', encoding='ascii')
    'spam'

bytes and bytearray objects have a decode() method that offers the same functionality:

    >>> b'spam'.decode('ascii')
    'spam'

But other objects that support the buffer interface might not. For example, take the following array.array with the ASCII encoded bytes of spam:

    >>> import array
    >>> arr = array.array('B', b'spam')

Here's the repr:

    >>> arr
    array('B', [115, 112, 97, 109])

Without an encoding argument str just returns the repr of the array:

    >>> print(arr)
    array('B', [115, 112, 97, 109])

(The print function calls str.)

But we can tell str to treat the array as an ASCII encoded buffer:

    >>> print(str(arr, 'ascii'))
    spam
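Two claims from the message above, checked in one Python 3 sketch: a `__repr__` that returns a non-string makes repr() raise TypeError, and str() with an encoding argument can decode a buffer object such as an array. The `Broken` class is a deliberately wrong, made-up example.

```python
import array


class Broken:
    def __repr__(self):
        return 42   # not a string, so repr() is expected to fail


try:
    repr(Broken())
    outcome = 'no error'
except TypeError:
    outcome = 'TypeError'

arr = array.array('B', b'spam')
decoded = str(arr, 'ascii')               # decode via the buffer interface
via_bytes = bytes(arr).decode('ascii')    # equivalent route through bytes

print(outcome, decoded, via_bytes)
```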
Re: [Tutor] Why difference between printing string typing its object reference at the prompt?
Thanks to all who responded. There was much more going on here than I ever would have suspected. I am glad I asked the questions I did. This has been very informative.

On Tue, Oct 2, 2012 at 11:53 PM, Dave Angel d...@davea.name wrote:

> There are two operations supported by (most) objects that produce a string. One is exemplified by the str() function, which converts an object to a string. That's the one called implicitly by print(). This form just represents the data, in the form most likely to be needed by the end user.

What happens if str() or repr() is not supported by a particular object? Is an exception thrown, an empty string returned or something else I am not imagining?

> The other operation is repr(), which attempts to produce a string that could be used in a program to reproduce the actual object. So a repr() will have quote marks artificially added, or brackets, or commas, or whatever seems appropriate for the particular object. This is intended for the programmer's use, not for the end user.

What larger phrase does repr stand for? My text mentions "representational form" later in the book, which sounds similar in concept to what you are discussing.

[...]

> Your question was about string objects, but I tried to make the explanation as generic as possible. Those two functions, str() and repr(), are used, or implied in many places. For example, if you print a list, it'll call str() on the whole list. But the list object's logic will in turn call repr() on each of its elements, and put the whole thing together with square brackets and commas.

As I go along in my study of Python will it become clear to me when and how repr() and str() are being "...used, or implied in many places"?

Thanks!
boB
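
Editorial sketch (not part of the original message): the claim quoted above, that str() of a list uses repr() on each element, can be verified by rebuilding the list's text by hand. The sample values are arbitrary and Python 3 is assumed.

```python
from decimal import Decimal

items = ["spam", 42, Decimal("1.23")]

# What print(items) would display (print calls str on the list).
list_text = str(items)

# Rebuild the same text by hand: repr() of every element,
# joined with commas and wrapped in square brackets.
manual = "[" + ", ".join(repr(item) for item in items) + "]"

print(list_text)
print(manual == list_text)
```

This is why strings inside a printed list keep their quote marks even though printing the same string on its own does not show them.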
Re: [Tutor] Why difference between printing string typing its object reference at the prompt?
On Wed, Oct 3, 2012 at 1:38 AM, Steven D'Aprano st...@pearwood.info wrote:

<snip>

> The long answer is a bit more subtle, and rather long.

I had initial suspicions this would be the case. Thanks for yours and Dave's detailed exposition!

[...]

> Python is no different: words, text if you will, that are part of the code are written as normal:
>
>     # source code
>     class Test:
>         pass
>     x = Test  # Test here refers to the variable Test, a class
>
> But to create a string object, you use quotation marks to tell Python that this is data, not code, please create a string object:
>
>     x = "Test"  # "Test" here refers to a string, which is data
>
> Notice that the quotation marks are *delimiters*, they mark the start and end of the string, but aren't part of the string in any way. Python knows that the object is a string because you put it in string delimiters, but the delimiters are not part of the string.

I was not sure if the quotes were considered part of the string or not. Thanks for the clarification.

> Now, take a step back and consider objects in general. There are two things we might like to do to an arbitrary object:
>
> * display the object, which implicitly means turning it into a string, or at least getting some representation of that object as a string;
>
> * convert the object into a string.
>
> Python has two built-in functions for that:
>
> * repr, which takes any object and returns a string that represents that object;
>
> * str, which tries to convert an object into a string, if that makes sense.
>
> Often those will do the same thing. For example:
>
>     py> str(42) == repr(42) == "42"
>     True
>
> But not always. For example:
>
>     py> from decimal import Decimal as D
>     py> x = D("1.23")
>     py> print(str(x))
>     1.23
>     py> print(repr(x))
>     Decimal('1.23')

These contrasting examples are very illuminating. So in print(str(x)) the object, D("1.23"), is being converted into a readable string, which makes the most sense as 1.23. But print(repr(x)) is giving a string representation of the object as code, which is more than just 1.23, the Decimal('1.23').
Am I understanding this correctly?

> Unfortunately, the difference between str() and repr() is kind of arbitrary and depends on the object. str() is supposed to return a human-readable version of the object, for display, while repr() is supposed to return a string which would work as code, but those are more guidelines than hard rules.

Will these fine distinctions be easy for me to pick up on as I progress in my Python studies? I suspect that I am going to have to experiment with str() and repr() in each new situation to see what results.

> So we have two different ways of converting an object to a string. But strings themselves are objects too. What happens there?
>
>     py> s = "Hello world"  # remember the quotes are delimiters, not part of the string
>     py> print(str(s))
>     Hello world
>     py> print(repr(s))
>     'Hello world'
>
> str() of a string is unchanged (and why shouldn't it be? it's already a string, there's nothing to convert).
>
> But repr() of a string creates a new string showing the representation of the original string, that is, what you would need to type in source code to make that string. That means:
>
> 1) wrap the whole thing in delimiters (quotation marks)
>
> 2) escape special characters like tabs, newlines, and binary characters.

As to point 2), will repr() insert "\" (I am assuming Python uses a backslash like other languages to escape. I have not read about this in Python yet.) for these special characters? Will str() do the same?

> Notice that the string returned by repr() includes quote marks as part of the new string. Given the s above:
>
>     py> t = repr(s)
>     py> print(t)
>     'Hello world'
>     py> t
>     "'Hello world'"
>
> This tells us that the new string t includes single quote marks as the first and last character, so when you print it, the single quote marks are included in the output. But when you just display t interactively (see below), the delimiters are shown.

Another great example. I probably would have overlooked this.
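
Editorial sketch (not part of the original message): the question about backslashes can be answered by experiment. Under Python 3, repr() of a string writes tabs and newlines as backslash escapes and adds the delimiters, while str() returns the string unchanged. The sample text is arbitrary.

```python
s = "tab\there\nnew line"  # contains a real tab and a real newline

shown = repr(s)

# repr() adds quote delimiters and spells the tab and newline as
# two-character escapes (backslash + letter).
print(shown)

# str() of a string is the same string, with nothing escaped or added.
unchanged = str(s) == s

# The repr is longer: two quote characters plus one extra
# backslash for each escaped character.
extra_chars = len(shown) - len(s)

print(unchanged, extra_chars)
```

The escaped form is exactly what would have to be typed in source code to recreate the string, which is the guideline repr() follows for strings.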
> Now, at the interactive interpreter, evaluating an object on its own without saving the result anywhere displays the repr() to the screen. Why repr()? Well, why not? The decision was somewhat arbitrary.

So the designers of Python made this decision. I guess it had to be one way or the other.

> print, on the other hand, displays the str() of the object directly to the screen. For strings, that means the delimiters are not shown, because they are not part of the string itself. Why str() rather than repr()? Because that's what people mostly want, and if you want the other, you can just say print(repr(obj)).

So in the end it is a simple choice to give the users what they want and are already used to.

> Does this help, or are you more confused than ever?

This has been incredibly useful! Many thanks!!

boB
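
Editorial sketch (not part of the original message): the two display paths can be compared outside an interactive session. This assumes Python 3 and uses sys.__displayhook__, the standard hook the plain interactive prompt calls to echo a result. IDLE and other shells may install their own hook, so treat this as an approximation of what the prompt does.

```python
import io
import sys
from contextlib import redirect_stdout

x = "Test"

# Path 1: print() writes str(x) followed by a newline.
buf = io.StringIO()
with redirect_stdout(buf):
    print(x)
printed = buf.getvalue()

# Path 2: the default display hook writes repr(x) followed by a newline.
buf = io.StringIO()
with redirect_stdout(buf):
    sys.__displayhook__(x)
echoed = buf.getvalue()

# Show both captured outputs; the echoed one carries the quote marks.
print(repr(printed), repr(echoed))
```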
Re: [Tutor] Why difference between printing string typing its object reference at the prompt?
On 10/02/2012 11:15 PM, boB Stepp wrote:

> After much diddling around I have finally settled on a text to study (Programming in Python 3, 2nd edition, by Mark Summerfield) and have defaulted to using IDLE, deferring worrying about editors/IDEs until I feel comfortable in Python.
>
> I am puzzled by the results of the following:
>
>     >>> x = "Test"
>     >>> x
>     'Test'
>     >>> print(x)
>     Test
>
> I understand that 'Test' is the stored value in memory where the single quotes designate the value as being a string data type. So it makes sense to me that just typing the object reference for the string results in including the single quotes. But why does the print() strip the quotes off? Is it just as simple as people normally wanting the unadorned text when they print, so that is the behavior built into the print function? Or is there something more subtle going on that I am totally missing? If an explanation is in one of my several books, it is currently eluding me.

There are two operations supported by (most) objects that produce a string. One is exemplified by the str() function, which converts an object to a string. That's the one called implicitly by print(). This form just represents the data, in the form most likely to be needed by the end user.

The other operation is repr(), which attempts to produce a string that could be used in a program to reproduce the actual object. So a repr() will have quote marks artificially added, or brackets, or commas, or whatever seems appropriate for the particular object. This is intended for the programmer's use, not for the end user.

When you program x = "Test", the string object that is created does not have quote marks in it anywhere. It also doesn't care whether you produced it by single quotes, double quotes, triple quotes, or by some manipulation of one or more other objects. It has 4 characters in it. Period.
If you take that same string and do a repr() on it, it will produce another string that does have some form of quotes, though not necessarily the ones used originally.

In the interactive interpreter (I've never used IDLE), entering an expression without assigning it to anything will cause the result of the expression to be displayed with repr().

Your question was about string objects, but I tried to make the explanation as generic as possible. Those two functions, str() and repr(), are used, or implied in many places. For example, if you print a list, it'll call str() on the whole list. But the list object's logic will in turn call repr() on each of its elements, and put the whole thing together with square brackets and commas.

(Finer detail: There are special methods in the class for each object, __str__() and __repr__(), which actually have the code. But you should never call them directly, so you won't need to know about them till you start building your own classes.)

--
DaveA
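
Editorial sketch (not part of the original message): for readers who do get to the point of building their own classes, the two special methods mentioned above might look like this. The Temperature class is invented for the illustration and Python 3 is assumed.

```python
class Temperature:
    """Hypothetical class with separate end-user and programmer views."""

    def __init__(self, degrees):
        self.degrees = degrees

    def __str__(self):
        # Used by str() and print(): text meant for the end user.
        return f"{self.degrees} degrees"

    def __repr__(self):
        # Used by repr() and the interactive prompt: looks like source code.
        return f"Temperature({self.degrees!r})"


t = Temperature(21.5)

user_text = str(t)       # goes through __str__
code_text = repr(t)      # goes through __repr__
in_a_list = str([t])     # containers use the repr of their elements

print(user_text, code_text, in_a_list)
```

Note that the methods are reached through str() and repr() rather than called directly, as the message advises.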
Re: [Tutor] Why difference between printing string typing its object reference at the prompt?
On 2 Oct 2012 23:17, boB Stepp robertvst...@gmail.com wrote:

<snip>

> I am puzzled by the results of the following:
>
>     >>> x = "Test"
>     >>> x
>     'Test'
>     >>> print(x)
>     Test
>
> I understand that 'Test' is the stored value in memory where the single quotes designate the value as being a string data type. So it makes sense to me that just typing the object reference for the string results in including the single quotes. But why does the print() strip the quotes off? Is just as simple as

Hi boB,

Under the covers, in python 2.x, print x causes the human readable string representation of x to be output by calling x.__str__. In an interactive prompt, typing x displays the python representation of x by calling x.__repr__. These can be the same or quite similar or quite different. When possible, __repr__ special methods ought to be defined so x equals eval(x.__repr__()).

I believe, but don't warrant, that in this regard python 3.x behaves like 2.x (modulo the difference in the print syntax).

Best,

Brian vdB
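
Editorial sketch (not part of the original message): the guideline that x should equal eval(repr(x)) can be tried on a few values under Python 3. The sample values are arbitrary. The namespace passed to eval() is an assumption of this sketch, needed so that the name Decimal can be resolved.

```python
from decimal import Decimal

samples = ["Test", 42, 1.5, [1, "two"], Decimal("1.23")]

# Evaluate each repr as source code and compare with the original value.
namespace = {"Decimal": Decimal}
round_trips = []
for obj in samples:
    rebuilt = eval(repr(obj), namespace)
    round_trips.append(rebuilt == obj)


class Test:
    pass


# The default repr, such as <__main__.Test object at 0x...>, is not
# valid source code, so the round trip is only a guideline.
try:
    eval(repr(Test()))
    default_result = "evaluated"
except SyntaxError:
    default_result = "SyntaxError"

print(round_trips, default_result)
```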