Re: [Tutor] Why difference between printing string typing its object reference at the prompt?

2012-10-18 Thread Prasad, Ramit
David Hutto wrote:
 If your app has a standard usage of phrases, you can include a file
 that translates a tag into a particular language phrase.
 
 
 
 if submit_tag_selection == 'english':
     submit = 'Submit'
 if submit_tag_selection == 'french':
     submit = 'Soumettre'
 
 Of course this could be done without the if, you would just translate
 the normal selections within a file with the commonly used phrases in
 the app, and substitute it within a parse for:
 
 x = open('translate_file_french', 'r')
 for line in x:
     if line.split('=')[0] == 'Submit':
         print '%s' % (line.split('=')[1])
 
 'Soumettre'
 
 *Untested, but should work
 

Now, missing any context, I am going to assume the topic shifted to
how to do translations for an internationalized application/site.
Feel free to ignore if I am wrong or OT.

I would do this, but not using line splitting. I would
create (YAML) config files that contain translations of
site text (e.g. Submit). You could do this with pickle too,
but I think YAML files are better for humans to edit.

import yaml  # third-party PyYAML; assumed to be installed

text = 'SUBMIT_BUTTON_TEXT'
with open('translate_file_fr') as f:
    # parse YAML into dictionary of { text_to_replace : text_to_replace_with }
    translation_key = yaml.safe_load(f)

# some work
for k, v in translation_key.items():
    text = text.replace(k, v)

Alternately you could create a python module and just import
the appropriate language.

import importlib

# __import__('a.b') returns the top-level package 'a', so use import_module
translation_key = importlib.import_module('translation.' + language)
text = ''.join(['...', translation_key.submit_button_text, '...'])
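
A minimal sketch of the dictionary-based idea above, assuming the translations have already been parsed into plain dicts (the file-parsing step is left out, and the names TRANSLATIONS and translate are hypothetical, not from any library):

```python
# Hypothetical in-memory translation tables; in practice these would be
# loaded from per-language config files (YAML, JSON, ...).
TRANSLATIONS = {
    'en': {'SUBMIT_BUTTON_TEXT': 'Submit'},
    'fr': {'SUBMIT_BUTTON_TEXT': 'Soumettre'},
}


def translate(text, language):
    """Replace every known placeholder in text with its translation."""
    for key, value in TRANSLATIONS[language].items():
        text = text.replace(key, value)
    return text


print(translate('<button>SUBMIT_BUTTON_TEXT</button>', 'fr'))
```

Keeping one table per language means adding a language is a data change only; the lookup code stays the same.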


Of course, I have no idea of the downsides of this approach as I
have not had to do something like this before. I would be
interested in whether this matches the standard approach
and the up/down-sides to it.

Ramit Prasad


___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] Why difference between printing string typing its object reference at the prompt?

2012-10-17 Thread Dwight Hutto
If your app has a standard usage of phrases, you can include a file
that translates a tag into a particular language phrase.



if submit_tag_selection == 'english':
    submit = 'Submit'
if submit_tag_selection == 'french':
    submit = 'Soumettre'

Of course this could be done without the if, you would just translate
the normal selections within a file with the commonly used phrases in
the app, and substitute it within a parse for:

x = open('translate_file_french', 'r')
for line in x:
 if line.split('=')[0] == 'Submit':
   print '%s'   %   (line.split('=')[1])

'Soumettre'

*Untested, but should work


-- 
Best Regards,
David Hutto
CEO: http://www.hitwebdevelopment.com


Re: [Tutor] Why difference between printing string typing its object reference at the prompt?

2012-10-11 Thread Alan Gauld

On 11/10/12 02:23, boB Stepp wrote:


bytes have string methods as a convenience, such as find, split, and
partition. They also have the method decode(), which uses a specified
encoding such as utf-8 to create a string from an encoded bytes
sequence.


What is the intended use of byte types?


One purpose is to facilitate the handling of raw data streams such as
might be read from a binary file or over a network. If you are using
locale settings with 16-bit characters, reading such a stream as a
character string will result in you processing pairs of bytes at a time.
Using a byte string, you guarantee you process 8 bits at a time with no
attempt at interpretation.
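
A small sketch of that difference, using io.BytesIO as a stand-in for a binary file or network stream (the sample data is made up for illustration):

```python
import io

# Four bytes that happen to be two UTF-16 (little-endian) code units.
raw = 'hi'.encode('utf-16-le')
stream = io.BytesIO(raw)          # stands in for a binary file or socket

data = stream.read()
print(list(data))                 # one integer per byte, no interpretation
print(data.decode('utf-16-le'))   # interpretation only happens on decode
```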


--
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/



Re: [Tutor] Why difference between printing string typing its object reference at the prompt?

2012-10-11 Thread eryksun
On Wed, Oct 10, 2012 at 9:23 PM, boB Stepp robertvst...@gmail.com wrote:

 >>> aꘌꘌb = True
 >>> aꘌꘌb
 True

 >>> Ⅰ, Ⅱ, Ⅲ, Ⅳ, Ⅴ = range(1, 6)
 >>> Ⅰ, Ⅱ, Ⅲ, Ⅳ, Ⅴ
 (1, 2, 3, 4, 5)

 Is doing this considered good programming practice?


The examples were meant to highlight the absurdity of using letter
modifiers and number letters in identifiers. I should have clearly
stated that I think these names are bad.


 bytes have string methods as a convenience, such as find, split, and
 partition. They also have the method decode(), which uses a specified
 encoding such as utf-8 to create a string from an encoded bytes
 sequence.

 What is the intended use of byte types?


bytes objects are important for low-level data processing, such as
file and socket I/O. The fundamental addressable value in a computer
is a byte (at least for all common, modern computers). When you write
a string to a file or socket, it has to be encoded as a sequence of
bytes.
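
As a rough illustration of that encoding step, the sketch below writes a string in text mode and reads the same file back in binary mode. The file name demo.txt and the sample text are arbitrary choices for the example:

```python
import os
import tempfile

text = 'caf\u00e9'   # 'café', with the precomposed e-acute

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'demo.txt')
    # Text mode: Python encodes the str using the given encoding.
    with open(path, 'w', encoding='utf-8') as f:
        f.write(text)
    # Binary mode: we see the bytes that actually landed on disk.
    with open(path, 'rb') as f:
        on_disk = f.read()

print(on_disk)
print(on_disk.decode('utf-8') == text)
```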

For example, consider the character 𝟡 (MATHEMATICAL DOUBLE-STRUCK
DIGIT NINE) with decimal code 120801 (0x1d7e1 in hexadecimal):

>>> ord('𝟡')
120801

Three common ways to encode this character are as UTF-32, UTF-16, and UTF-8.

The UTF-32 encoding is the UCS4 format used by strings in main memory
on a wide build (Python 3.3 uses a more efficient scheme that uses
1, 2, or 4 bytes as required).

>>> s = '𝟡'
>>> s.encode('utf-32')
b'\xff\xfe\x00\x00\xe1\xd7\x01\x00'

The utf-32 string encoder also includes a byte order mark (BOM) in
the first 4 bytes of the encoded sequence (0xfffe0000). The order of
the BOM determines that this is a little-endian, 4-byte encoding.

http://en.wikipedia.org/wiki/Endianness

You can use int.from_bytes() to verify that b'\xe1\xd7\x01\x00' is the
number 120801 stored as 4 bytes in little-endian order:

>>> int.from_bytes(b'\xe1\xd7\x01\x00', 'little')
120801

or crunch the numbers in a generator expression:

>>> sum(x * 256**i for i,x in enumerate(b'\xe1\xd7\x01\x00'))
120801

UTF-32 is an inefficient way to represent Unicode. Characters in the
BMP, which are by far the most common, only require at most 2 bytes.

UTF-16 uses 2 bytes for BMP codes, like the original UCS2, and a
4-byte surrogate-pair encoding for characters in the supplementary
planes. Here's the character 𝟡 encoded as UTF-16:

>>> list(map(hex, s.encode('utf-16')))
['0xff', '0xfe', '0x35', '0xd8', '0xe1', '0xdf']

Again there's a BOM, 0xfffe, which describes the order and number of
bytes per code (i.e. 2 bytes, little endian). The character itself is
stored as the surrogate pair [0xd835, 0xdfe1]. You can read more about
surrogate pair encoding in the UTF-16 Wikipedia article:

http://en.wikipedia.org/wiki/UTF-16
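
The surrogate pair can also be computed by hand. The sketch below follows the standard UTF-16 arithmetic; the function name to_surrogates is made up for this example:

```python
def to_surrogates(code_point):
    """Split a supplementary-plane code point into a UTF-16 surrogate pair."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError('not a supplementary-plane code point')
    offset = code_point - 0x10000       # 20 significant bits remain
    high = 0xD800 + (offset >> 10)      # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)     # bottom 10 bits
    return high, low


high, low = to_surrogates(0x1D7E1)
print(hex(high), hex(low))
```

Stored little-endian, those two 16-bit values give the bytes 35 d8 e1 df that follow the BOM in the encoded output.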

A narrow build of Python uses UCS2 + surrogates. It's not quite
UTF-16 since it doesn't treat a surrogate pair as a single character
for iteration, string length, and indexing. Python 3.3 eliminates
narrow builds.

Another common encoding is UTF-8. This maps each code to 1-4 bytes,
without requiring a BOM (though the 3-byte BOM 0xefbbbf can be used
when saving to a file). Since ASCII is so common, and since on many
systems backward compatibility with ASCII is required, UTF-8 includes
ASCII as a subset. In other words, codes below 128 are stored
unmodified as a single byte. Non-ASCII codes are encoded as 2-4 bytes.
See the UTF-8 Wikipedia article for the details:

http://en.wikipedia.org/wiki/UTF-8#Description
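
A quick way to see the 1-4 byte range is to encode one sample character from each size class. The four characters below are arbitrary picks for illustration:

```python
# One sample per UTF-8 length: ASCII, Latin-1 range, rest of BMP, supplementary.
samples = ['A', '\u00fc', '\u20ac', '\U0001d7e1']

for ch in samples:
    encoded = ch.encode('utf-8')
    print('U+%04X' % ord(ch), len(encoded), encoded)

lengths = [len(ch.encode('utf-8')) for ch in samples]
```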

The character 𝟡 requires 4 bytes in UTF-8:

>>> s = '𝟡'
>>> sb = s.encode('utf-8')
>>> sb
b'\xf0\x9d\x9f\xa1'
>>> list(sb)
[240, 157, 159, 161]

If you iterate over the encoded bytestring, the numbers 240, 157, 159,
and 161 -- taken separately -- have no special significance. Neither
does the length of 4 tell you how many characters are in the
bytestring. With a decoded string, in contrast, you know how many
characters it has (assuming you've normalized to NFC format) and can
iterate through the characters in a simple for loop.
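
The NFC caveat can be seen with a decomposed u-umlaut. This is only a sketch of the point about lengths, using unicodedata.normalize from the standard library:

```python
from unicodedata import normalize

decomposed = 'u\u0308'                  # 'u' followed by COMBINING DIAERESIS
composed = normalize('NFC', decomposed)

print(len(decomposed), len(composed))   # code points before and after NFC
print(len(decomposed.encode('utf-8')), len(composed.encode('utf-8')))
```

Neither byte count matches the number of characters a reader would see, which is one.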

If your terminal/console uses UTF-8, you can write the UTF-8 encoded
bytes directly to the stdout buffer:

>>> import sys
>>> sys.stdout.buffer.write(b'\xf0\x9d\x9f\xa1' + b'\n')
𝟡
5

This wrote 5 bytes: 4 bytes for the 𝟡 character, plus b'\n' for a newline.


Strings in Python 2

In Python 2, str is a bytestring. Iterating over a 2.x str yields
single-byte characters. However, these generally aren't 'characters'
at all (this goes back to the C programming language char type), not
unless you're working with a single-byte encoding such as ASCII or
Latin-1. In Python 2, unicode is a separate type and unicode literals
require a u prefix to distinguish them from bytestrings, just as bytes
literals in Python 3 require a b prefix to distinguish them from
strings.

Python 2.6 and 2.7 alias str to the name bytes, and they support the
b prefix in literals. These were added to ease porting to Python 3,
but bear in mind that it's still a classic bytestring, not a bytes
object. For example, in 2.x you can use ord() with an item 

Re: [Tutor] Why difference between printing string typing its object reference at the prompt?

2012-10-11 Thread Dave Angel
On 10/11/2012 04:40 AM, eryksun wrote:
 On Wed, Oct 10, 2012 at 9:23 PM, boB Stepp robertvst...@gmail.com wrote:
 .
 What is the intended use of byte types?

 bytes objects are important for low-level data processing, such as
 file and socket I/O. The fundamental addressable value in a computer
 is a byte (at least for all common, modern computers). When you write
 a string to a file or socket, it has to be encoded as a sequence of
 bytes.

 SNIP

 Another common encoding is UTF-8. This maps each code to 1-4 bytes,

Actually, the upper limit for a decoded utf-8 character is at least 6
bytes.  I think it's 6, but it's no less than 6.

 without requiring a BOM (though the 3-byte BOM 0xefbbbf can be used
 when saving to a file). Since ASCII is so common, and since on many
 systems backward compatibility with ASCII is required, UTF-8 includes
 ASCII as a subset. In other words, codes below 128 are stored
 unmodified as a single byte. Non-ASCII codes are encoded as 2-4 bytes.
 See the UTF-8 Wikipedia article for the details:

 http://en.wikipedia.org/wiki/UTF-8#Description
This shows cases for up to 6 bytes.
 snip

Three other things worth pointing out:  1) Python didn't define all these
byte formats.  These are standards which exist outside of the python
world, and Python lets you coexist with them.  If you want to create a
text file that can be seen properly by an editor that only supports
utf-8, you can't output UCS-4 and expect it to come up with anything but
gibberish.

2) There are many more byte formats, most of them predating Unicode
entirely.  Many of these are specific to a particular language or
national environment, and contain just those extensions to ASCII that
the particular language deems useful.  Python provides encoders and
decoders to many of these as well.

3) There are many things read and written in byte format that have no
relationship to characters.  The notion of using text formats for all
data (e.g. XML) is a fairly recent one.  Binary files are quite common,
and many devices require binary transfers to work at all.  So byte
strings are not necessarily strings at all.
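
Points 2 and 3 can be illustrated with a single byte. The codecs below are just three examples of the legacy encodings that ship with Python:

```python
raw = b'\xe9'

# The same byte means different things under different legacy encodings.
decoded = {codec: raw.decode(codec) for codec in ('latin-1', 'cp437', 'koi8-r')}
for codec, char in decoded.items():
    print(codec, repr(char))

# Binary data with no text meaning at all: a 16-bit big-endian integer.
number = int.from_bytes(b'\x01\x00', 'big')
print(number)
```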

-- 

DaveA



Re: [Tutor] Why difference between printing string typing its object reference at the prompt?

2012-10-11 Thread eryksun
On Thu, Oct 11, 2012 at 5:04 AM, Dave Angel d...@davea.name wrote:

 Actually, the upper limit for a decoded utf-8 character is at least 6
 bytes.  I think it's 6, but it's no less than 6.

Yes, but what would be the point? Unicode only has 17 planes, up to
code 0x10ffff. It's limited by UTF-16.
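
A quick check of that limit from the interpreter. This is a sketch that assumes Python 3.3 or later, where sys.maxunicode is 0x10ffff on every build:

```python
import sys

print(hex(sys.maxunicode))    # highest code point this build supports
total_codes = 17 * 0x10000    # 17 planes of 65536 codes each
print(total_codes)

try:
    chr(0x110000)             # one past the last valid code point
    out_of_range_rejected = False
except ValueError as exc:
    out_of_range_rejected = True
    print('ValueError:', exc)
```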

 2) There are many more byte formats, most of them predating Unicode
 entirely.  Many of these are specific to a particular language or
 national environment, and contain just those extensions to ASCII that
 the particular language deems useful.  Python provides encoders and
 decoders to many of these as well.

I mentioned 3 common formats that can completely represent Unicode
since this thread is mostly about Python 3 strings and repr -- at
least it started that way.

 3) There are many things read and written in byte format that have no
 relationship to characters.  The notion of using text formats for all
 data (eg. xml) is a fairly recent one.  Binary files are quite common,
 and many devices require binary transfers to work at all.  So byte
 strings are not necessarily strings at all.

Sure, other than encoded strings, there are also more obvious examples
of data represented as bytes -- at least I hope they're obvious --
such as multimedia audio/video/images, sensor data, spreadsheets, and
so on. In main memory these exist as data structures/objects (bytes,
but not generally in a form suitable for transmission or storage).
Before being saved to files or network streams, the data is
transformed to serialize and pack it as a byte stream (e.g. the struct
module, or pickle which defaults to a binary protocol in Python 3),
possibly compress it to a smaller size and add error correction (e.g.
the gzip module), and possibly encrypt it for security (e.g.
PyCrypto).
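
A minimal sketch of that pipeline with the struct and gzip modules. The sensor readings and the '<Hf' record layout are invented for the example:

```python
import gzip
import struct

readings = [(1, 20.5), (2, 21.0), (3, 19.75)]   # (sensor id, temperature)

# '<Hf' = little-endian, unsigned 16-bit int followed by a 32-bit float.
record = '<Hf'
packed = b''.join(struct.pack(record, sensor, temp) for sensor, temp in readings)
compressed = gzip.compress(packed)

# Reverse the steps: decompress, then unpack one fixed-size record at a time.
restored = gzip.decompress(compressed)
size = struct.calcsize(record)
unpacked = [struct.unpack_from(record, restored, i)
            for i in range(0, len(restored), size)]
print(unpacked)
```

The temperatures were chosen to be exactly representable as 32-bit floats, so the round trip returns the original values.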


Re: [Tutor] Why difference between printing string typing its object reference at the prompt?

2012-10-11 Thread Dave Angel
On 10/11/2012 05:21 AM, eryksun wrote:
 On Thu, Oct 11, 2012 at 5:04 AM, Dave Angel d...@davea.name wrote:

 Actually, the upper limit for a decoded utf-8 character is at least 6
 bytes.  I think it's 6, but it's no less than 6.
 
 Yes, but what would be the point? Unicode only has 17 planes, up to
 code 0x10ffff. It's limited by UTF-16.

More importantly, it was restricted by the 2003 RFC 3629, which I had
completely missed.  The last time I wrote a utf-8 encoder was before that,
probably about 1997.

http://tools.ietf.org/html/rfc3629

Thanks for pointing it out.



-- 

DaveA


Re: [Tutor] Why difference between printing string typing its object reference at the prompt?

2012-10-10 Thread boB Stepp
On Tue, Oct 9, 2012 at 4:29 AM, eryksun eryk...@gmail.com wrote:
snip
 Python 3 lets you use any Unicode letter as an identifier, including
 letter modifiers (Lm) and number letters (Nl). For example:

 >>> aꘌꘌb = True
 >>> aꘌꘌb
 True

 >>> Ⅰ, Ⅱ, Ⅲ, Ⅳ, Ⅴ = range(1, 6)
 >>> Ⅰ, Ⅱ, Ⅲ, Ⅳ, Ⅴ
 (1, 2, 3, 4, 5)

Is doing this considered good programming practice? I recall there was
a recent discussion about using the actual characters in formulas
instead of descriptive names, where this would make more sense to
people knowledgeable in the field using the formulas; however,
descriptive names might be better for those who don't have that
specialty knowledge. Is there a Python community consensus on how and
when it is appropriate (if ever) to use Unicode characters as
identifiers?

 A potential gotcha in Unicode is the design choice to have both
 [C]omposed and [D]ecomposed forms of characters. For example:

 >>> from unicodedata import name, normalize

 >>> s1 = 'ü'
 >>> name(s1)
 'LATIN SMALL LETTER U WITH DIAERESIS'

 >>> s2 = normalize('NFD', s1)
 >>> list(map(name, s2))
 ['LATIN SMALL LETTER U', 'COMBINING DIAERESIS']

 These combine as one glyph when printed:

 >>> print(s2)
 ü

 Different forms of the 'same' character won't compare as equal unless
 you first normalize them to the same form:

 >>> s1 == s2
 False
 >>> normalize('NFC', s1) == normalize('NFC', s2)
 True

This looks to make alphabetical sorting potentially much more complex.
I will have to give this some thought once I know more.

 I don't see a mention of byte strings mentioned in the index of my
 text. Are these just the ASCII character set?

After seeing your explanation below, I was able to find the relevant
material in my book. It was under bytes type and bytearray type.
For some reason these categories did not click in my head as what
Steve was addressing.

 A bytes object (and its mutable cousin bytearray) is a sequence of
 numbers, each in the range of a byte (0-255). bytes literals start
 with b, such as b'spam' and can only use ASCII characters, as does the
 repr of bytes. Slicing returns a new bytes object, but an index or
 iteration returns integer values:

 >>> b'spam'[:3]
 b'spa'
 >>> b'spam'[0]
 115
 >>> list(b'spam')
 [115, 112, 97, 109]

 bytes have string methods as a convenience, such as find, split, and
 partition. They also have the method decode(), which uses a specified
 encoding such as utf-8 to create a string from an encoded bytes
 sequence.

What is the intended use of byte types?

Thanks! This continues to be quite informative and this thread is
greatly helping me to make better sense of the information that I am
self-studying.
-- 
Cheers!
boB


Re: [Tutor] Why difference between printing string typing its object reference at the prompt?

2012-10-10 Thread Steven D'Aprano

On 11/10/12 12:23, boB Stepp wrote:

On Tue, Oct 9, 2012 at 4:29 AM, eryksuneryk...@gmail.com  wrote:
snip

Python 3 lets you use any Unicode letter as an identifier, including
letter modifiers (Lm) and number letters (Nl). For example:

>>> aꘌꘌb = True
>>> aꘌꘌb
True

>>> Ⅰ, Ⅱ, Ⅲ, Ⅳ, Ⅴ = range(1, 6)
>>> Ⅰ, Ⅱ, Ⅲ, Ⅳ, Ⅴ
(1, 2, 3, 4, 5)


Is doing this considered good programming practice?


Not really, but it depends who is doing it and why.

If you have a piece of code that is only going to be maintained by people
speaking French, with French keyboards, then why not use French words for
identifiers? That includes those French letters with accents. Python 3
lets you do so.

Silly bits of code like Ⅳ = 4 (or worse, Ⅳ = 9) should be avoided because
they are silly, not because they are illegal. That's about the same as
using:

eine, zwei, drei, vier, fünf = range(1, 6)

in code intended to be read by English speakers, only even harder to type.

Remember that programmers *discourage* most misspellings of words (with a
few exceptions, usually abbreviations):

number_of_pages = 42

is preferred to:

nombar_off_paiges = 42


But for non-English speakers, most languages *force* them to either
write code in Foreign (foreign *to them*), or to misspell words. Allowing
Unicode identifiers means that they can write in their native tongue,
using correct spelling, *if they so choose*.

Of course, if you want your code to be readable world-wide, stick to
English :)



--
Steven


Re: [Tutor] Why difference between printing string typing its object reference at the prompt?

2012-10-09 Thread eryksun
On Mon, Oct 8, 2012 at 10:35 PM, boB Stepp robertvst...@gmail.com wrote:

 I am not up (yet) on the details of Unicode that Python 3 defaults to
 for strings, but I believe I comprehend the general concept. Looking
 at the string escape table of chapter 2 it appears that Unicode
 characters can be either 16-bit or 32-bit. That must be a lot of
 potential characters!

There are 1114112 possible codes (65536 codes/plane * 17 planes), but
some are reserved, and only about 10% are assigned. Here's a list by
category:

http://www.fileformat.info/info/unicode/category/index.htm

Python 3 lets you use any Unicode letter as an identifier, including
letter modifiers (Lm) and number letters (Nl). For example:

>>> aꘌꘌb = True
>>> aꘌꘌb
True

>>> Ⅰ, Ⅱ, Ⅲ, Ⅳ, Ⅴ = range(1, 6)
>>> Ⅰ, Ⅱ, Ⅲ, Ⅳ, Ⅴ
(1, 2, 3, 4, 5)
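
Whether a given string would be accepted as a name can be checked without assigning to it. The sketch below uses str.isidentifier() and unicodedata.category(); the candidate names are written with escapes so the example does not depend on the editor's font:

```python
import unicodedata

candidates = ['a\ua60c\ua60cb', '\u2163', 'f\u00fcnf', '4x']

for candidate in candidates:
    categories = [unicodedata.category(c) for c in candidate]
    print(ascii(candidate), candidate.isidentifier(), categories)

results = {c: c.isidentifier() for c in candidates}
```

Here '\ua60c' is the letter modifier from the first example and '\u2163' is the Roman numeral four; '4x' fails because a name cannot start with a decimal digit.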

A potential gotcha in Unicode is the design choice to have both
[C]omposed and [D]ecomposed forms of characters. For example:

>>> from unicodedata import name, normalize

>>> s1 = 'ü'
>>> name(s1)
'LATIN SMALL LETTER U WITH DIAERESIS'

>>> s2 = normalize('NFD', s1)
>>> list(map(name, s2))
['LATIN SMALL LETTER U', 'COMBINING DIAERESIS']

These combine as one glyph when printed:

>>> print(s2)
ü

Different forms of the 'same' character won't compare as equal unless
you first normalize them to the same form:

>>> s1 == s2
False
>>> normalize('NFC', s1) == normalize('NFC', s2)
True

 I don't see a mention of byte strings mentioned in the index of my
 text. Are these just the ASCII character set?

A bytes object (and its mutable cousin bytearray) is a sequence of
numbers, each in the range of a byte (0-255). bytes literals start
with b, such as b'spam' and can only use ASCII characters, as does the
repr of bytes. Slicing returns a new bytes object, but an index or
iteration returns integer values:

>>> b'spam'[:3]
b'spa'
>>> b'spam'[0]
115
>>> list(b'spam')
[115, 112, 97, 109]

bytes have string methods as a convenience, such as find, split, and
partition. They also have the method decode(), which uses a specified
encoding such as utf-8 to create a string from an encoded bytes
sequence.
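
A short sketch of those methods in use; the sample data b'spam=eggs' is made up for the example:

```python
data = b'spam=eggs'

# String-like methods work on bytes, but their arguments must be bytes too.
key, sep, value = data.partition(b'=')
position = data.find(b'eggs')
print(key, value, position)

# decode() turns the encoded bytes into a str.
text = value.decode('utf-8')
print(text)
```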


Re: [Tutor] Why difference between printing string typing its object reference at the prompt?

2012-10-08 Thread boB Stepp
Steve,

On Thu, Oct 4, 2012 at 6:28 AM, Steven D'Aprano st...@pearwood.info wrote:
snip

 Now, ask me about *raw strings*, and the difference between Unicode
 and byte strings :)

How can I resist asking! I am not in chapter 2 of my study text yet,
but looking ahead raw strings seem to be a method of declaring
everything within the quotes to be a literal string character
including the backslash escape character. Apparently this is
designated by using an r before the very first quote. Can this quote
be single, double or triple?

I am not up (yet) on the details of Unicode that Python 3 defaults to
for strings, but I believe I comprehend the general concept. Looking
at the string escape table of chapter 2 it appears that Unicode
characters can be either 16-bit or 32-bit. That must be a lot of
potential characters! It will be interesting to look up the full
Unicode tables. Quickly scanning the comparing strings section, I
wonder if I should have been so quick to jump in with a couple of
responses to the other thread going on recently!

I don't see a mention of byte strings mentioned in the index of my
text. Are these just the ASCII character set?

Since I have not made it formally into this chapter yet, I don't
really have specific questions, but I would be interested in anything
you are willing to relate on these topics to complete my introduction
to strings in Python. Or we can wait until I do get into the data
types chapter that looks at these topics in detail and have specific
questions.
-- 
Cheers!
boB


Re: [Tutor] Why difference between printing string typing its object reference at the prompt?

2012-10-04 Thread Dave Angel
On 10/03/2012 11:11 PM, boB Stepp wrote:
 Thanks to all who responded.
 SNIP.
 What happens if str() or repr() is not supported by a particular
 object? Is an exception thrown, an empty string returned or something
 else I am not imagining?

Let's try it and see:

>>> class A:pass
...
>>> a = A()
>>> a
<__main__.A object at 0x16ae790>

This is generic information about an object with no methods at all, and
in particular without a __repr__ method.  It identifies the module where
the class was defined, the name of the class, and the address the
particular instance happens to be located at.  (In CPython, that happens
to be identical to id(a).)  I'd be happier if it would just identify the
number as the id, since ordinarily, the address is of no use.  BTW, as
far as I know, there's no promise as to how this is formatted, so I
wouldn't try to parse it with a program.
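
For comparison, here is a sketch of a class that supplies its own __repr__ next to one that relies on the default. The classes Plain and Point are invented for the example:

```python
class Plain:
    pass


class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __repr__(self):
        # Mimic the constructor call, a common convention for __repr__.
        return 'Point(%r, %r)' % (self.x, self.y)


print(repr(Plain()))     # default form: module, class name and an address
print(repr(Point(1, 2)))
print(str(Point(1, 2)))  # no __str__ defined, so str() falls back to __repr__
```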

 SNIP
 What larger phrase does repr stand for? My text mentions
 representational form later in the book, which sounds similar in
 concept to what you are discussing.

That would be my guess.  I don't recall seeing anything about it.



-- 

DaveA



Re: [Tutor] Why difference between printing string typing its object reference at the prompt?

2012-10-04 Thread Steven D'Aprano

On 04/10/12 13:39, boB Stepp wrote:


But not always. For example:

py> from decimal import Decimal as D
py> x = D("1.23")
py> print(str(x))
1.23
py> print(repr(x))
Decimal('1.23')


These contrasting examples are very illuminating. So in print(str(x))
the object, D("1.23"), is being converted into a readable string,
which makes the most sense as 1.23. But print(repr(x)) is giving a
string representation of the object as code, which is more than just
1.23, the Decimal('1.23'). Am I understanding this correctly?


Pretty close.

In the example above, the calls to print are only there to avoid
distracting you with the string delimiters, the outer quote marks. It's
str() and repr() that are doing the real work.

Apart from that, you've got it right. str(x) returns a human-readable
version of x, which in this case is "1.23" (excluding the quote marks,
of course). The designer of the Decimal class chose for repr() of a
decimal to look as much as possible like the call to the class that
created the object in the first place. (Or at least an equivalent
call.) In this case, that is "Decimal('1.23')".
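
The same split shows up in other standard-library types. A small illustration using datetime.date, which is not mentioned in the thread and is chosen here only as an example:

```python
import datetime

d = datetime.date(2012, 10, 4)

print(str(d))    # human-readable form
print(repr(d))   # looks like the call that would recreate the object

# For this type the repr really does work as code.
rebuilt = eval(repr(d))
print(rebuilt == d)
```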



Unfortunately, the difference between str() and repr() is kind of
arbitrary and depends on the object. str() is supposed to return a
human-readable version of the object, for display, while repr() is
supposed to return a string which would work as code, but those are more
guidelines than hard rules.


Will these fine distinctions be easy for me to pick up on as I
progress in my Python studies? I suspect that I am going to have to
experiment with str() and repr() in each new situation to see what
results.


*shrug* I've been programming in Python for over 10 years, and I still
forget when str() is used and when repr() is used. I always have to check.
But maybe that's just me.

Remember, there is no hard rule that tells you what the output of str()
and repr() must be (apart from strings). Different programmers have
different ideas of what is useful, meaningful, or possible.


[...]

But repr() of a string creates a new string showing the representation
of the original string, that is, what you would need to type in source
code to make that string. That means:

1) wrap the whole thing in delimiters (quotation marks)
2) escaping special characters like tabs, newlines, and binary
characters.


As to point 2), will repr() insert \ (I am assuming Python uses a
backslash like other languages to escape. I have not read about this
in Python yet.) for these special characters? Will str() do the same?


Yes to repr(), no to str().

Remember, str() of a string is just the same string unchanged. If the
input string contains a newline, the output will also contain a newline:

py> s = "abc" + chr(10) + "def"
py> print(s)
abc
def
py> print(str(s))
abc
def


But repr() will create a new string, and escape any non-printable
character (and a few which are printable):

py> print(repr(s))
'abc\ndef'


So this shows us that instead of creating string s as I did above, by
concatenating two substrings and a newline character, I could just as
easily have created it in one go using a \n escape:

py> t = "abc\ndef"
py> s == t
True


Notice too that there is no difference between the two different
flavours of single quote delimiters. Whether you write "a" or 'a'
is entirely a matter of personal preference. Python accepts both to
make it easy to input strings containing quote marks:

s = "this string contains ' a single-quote"
t = 'this string contains " a double-quote'


Such single quote strings must start and end on the same line. On the
other hand, *triple-quote* delimiters """ or ''' are used for
multiline strings. They can extend over multiple lines.
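
A compact sketch of the three delimiter styles and what repr() makes of each; the sample strings are arbitrary:

```python
single = 'it said "hi"'     # double quotes inside single-quote delimiters
double = "it's here"        # apostrophe inside double-quote delimiters
multi = """line one
line two"""                 # triple quotes may span lines

print(repr(single))
print(repr(double))
print(repr(multi))          # the embedded newline shows up as \n
```

repr() picks whichever delimiter avoids escaping the quote marks inside the string.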


Now, ask me about *raw strings*, and the difference between Unicode
and byte strings :)


--
Steven


Re: [Tutor] Why difference between printing string typing its object reference at the prompt?

2012-10-04 Thread Steven D'Aprano

On 04/10/12 13:11, boB Stepp wrote:


What happens if str() or repr() is not supported by a particular
object? Is an exception thrown, an empty string returned or something
else I am not imagining?


I don't think that is possible, at least not by accident or neglect.
In Python 3, everything inherits from object, which supports both str
and repr, so everything else should too:

py> class MyClass:
...     pass
...
py> obj = MyClass()
py> str(obj)
'<__main__.MyClass object at 0xb7c8c9ac>'
py> repr(obj)
'<__main__.MyClass object at 0xb7c8c9ac>'


Not terribly exciting, but at least it tells you what the object is,
and gives you enough information to distinguish it from other, similar,
objects.

I suppose you could write a class that deliberately raised an exception
when you called str() on it, in which case it would raise an exception
when you called str() on it... :) Likewise for repr().


py> class Stupid:
...     def __str__(self):
...         raise TypeError('cannot stringify this object')
...
py> obj = Stupid()
py> str(obj)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 3, in __str__
TypeError: cannot stringify this object




What larger phrase does repr stand for? My text mentions
representational form later in the book, which sounds similar in
concept to what you are discussing.



repr is short for representation, as in string representation.




As I go along in my study of Python will it become clear to me when
and how repr() and str() are being ...used, or implied in many
places?


Generally, print and the interactive interpreter are the only implicit
string conversions. At least the only ones I can think of right now...
no, wait, there's another one, error messages.

print() displays the str() of the object. The interactive interpreter
displays the repr() of the object. Error messages could do whatever
they like. Anything else, you have to explicitly convert to a string
using the form you want:

s = repr(x).lower()
t = str(y).replace('ss', 'ß')


or whatever.
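
Format strings offer another explicit way to pick the form. A brief sketch, reusing Decimal from earlier in the thread:

```python
from decimal import Decimal

x = Decimal('1.23')

with_format = '{!s} vs {!r}'.format(x, x)   # !s calls str(), !r calls repr()
with_percent = '%s vs %r' % (x, x)          # %s and %r do the same
print(with_format)
print(with_percent)

# Containers display the repr() of their items, even under str().
print(str([x]))
```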


--
Steven


Re: [Tutor] Why difference between printing string typing its object reference at the prompt?

2012-10-04 Thread eryksun
On Wed, Oct 3, 2012 at 11:11 PM, boB Stepp robertvst...@gmail.com wrote:

 What happens if str() or repr() is not supported by a particular
 object? Is an exception thrown, an empty string returned or something
 else I am not imagining?

The __str__ method inherited from object calls __repr__.

For a class, __repr__ is inherited from type.__repr__, which returns
"<class 'module_name.class_name'>".

For an instance, __repr__ is inherited from object.__repr__, which returns
"<module_name.class_name object at address>".


If you override __str__ or __repr__, you must return a string. Else
the interpreter will raise a TypeError.
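
A minimal sketch of that failure mode; the class name Bad is invented for the example:

```python
class Bad:
    def __repr__(self):
        return 42   # wrong: __repr__ must return a str


try:
    repr(Bad())
    raised = False
except TypeError as exc:
    raised = True
    print('TypeError:', exc)
```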


Basic example:

>>> class Test:...

repr of the class:

>>> repr(Test)
"<class '__main__.Test'>"

repr of an instance:

>>> repr(Test())
'<__main__.Test object at 0x958670c>'


 As I go along in my study of Python will it become clear to me when
 and how repr() and str() are being ...used, or implied in many
 places?

str is Python's string type, while repr is a built-in function that
returns a string suitable for debugging.

You can also call str without an argument to get an empty string, i.e.
str() == ''. This is similar to other built-in types: int() == 0,
float() == 0.0, complex() == 0j, tuple() == (), list() == [], and
dict() == {}. The returned value is either 0 or empty -- and boolean
False in all cases.

str also takes the optional arguments encoding and errors to
decode an encoded string:

>>> str(b'spam', encoding='ascii')
'spam'

bytes and bytearray objects have a decode() method that offers the
same functionality:

>>> b'spam'.decode('ascii')
'spam'

But other objects that support the buffer interface might not. For
example, take the following array.array with the ASCII encoded bytes
of spam:

>>> import array
>>> arr = array.array('B', b'spam')

Here's the repr:

>>> arr
array('B', [115, 112, 97, 109])

Without an argument str just returns the repr of the array:

>>> print(arr)
array('B', [115, 112, 97, 109])

(The print function calls str.)

But we can tell str to treat the array as an ASCII encoded buffer:

>>> print(str(arr, 'ascii'))
spam


Re: [Tutor] Why difference between printing string typing its object reference at the prompt?

2012-10-03 Thread boB Stepp
Thanks to all who responded. There was much more going on here than I
ever would have suspected. I am glad I asked the questions I did. This
has been very informative.

On Tue, Oct 2, 2012 at 11:53 PM, Dave Angel d...@davea.name wrote:


 There are two operations supported by (most) objects that produce a
 string.  One is exemplified by the str() function, which converts an
 object to a string.  That's the one called implicitly by print().  This
 form just represents the data, in the form most likely to be needed by
 the end user.

What happens if str() or repr() is not supported by a particular
object? Is an exception thrown, an empty string returned or something
else I am not imagining?


 The other operation is repr(), which attempts to produce a string that
 could be used in a program to reproduce the actual object.  So a repr()
 will have quote marks artificially added, or brackets, or commas, or
 whatever seems appropriate for the particular object.  This is intended
 for the programmer's use, not for the end user.

What larger phrase does repr stand for? My text mentions
representational form later in the book, which sounds similar in
concept to what you are discussing.

[...]
 Your question was about string objects, but I tried to make the
 explanation as generic as possible.  Those two functions, str() and
 repr(), are used, or implied in many places.  For example, if you print
 a list, it'll call str() on the whole list.  But the list object's logic
 will in turn call repr() on each of its elements, and put the whole
 thing together with square brackets and commas.


As I go along in my study of Python will it become clear to me when
and how repr() and str() are being ...used, or implied in many
places?


Thanks!
boB


Re: [Tutor] Why difference between printing string typing its object reference at the prompt?

2012-10-03 Thread boB Stepp
On Wed, Oct 3, 2012 at 1:38 AM, Steven D'Aprano st...@pearwood.info wrote:

<snip>

 The long answer is a bit more subtle, and rather long.

I had initial suspicions this would be the case. Thanks for your and
Dave's detailed exposition!

[...]
 Python is no different: words, text if you will, that are part of the
 code are written as normal:

 # source code
 class Test:
     pass

 x = Test  # Test here refers to the variable Test, a class

 But to create a string object, you use quotation marks to tell Python
 that this is data, not code, please create a string object:

 x = "Test"  # "Test" here refers to a string, which is data

 Notice that the quotation marks are *delimiters*, they mark the start
 and end of the string, but aren't part of the string in any way. Python
 knows that the object is a string because you put it in string
 delimiters, but the delimiters are not part of the string.

I was not sure if the quotes were considered part of the string or
not. Thanks for the clarification.

 Now, take a step back and consider objects in general. There are two
 things we might like to do to an arbitrary object:

 * display the object, which implicitly means turning it into a
   string, or at least getting some representation of that object
   as a string;

 * convert the object into a string.

 Python has two built-in functions for that:

 * repr, which takes any object and returns a string that represents
   that object;

 * str, which tries to convert an object into a string, if that makes
   sense.

 Often those will do the same thing. For example:

 py> str(42) == repr(42) == "42"
 True

 But not always. For example:

 py> from decimal import Decimal as D
 py> x = D("1.23")
 py> print(str(x))
 1.23
 py> print(repr(x))
 Decimal('1.23')

These contrasting examples are very illuminating. So in print(str(x))
the object, D("1.23"), is being converted into a readable string,
which makes the most sense as 1.23. But print(repr(x)) is giving a
string representation of the object as code, which is more than just
1.23: it is Decimal('1.23'). Am I understanding this correctly?

 Unfortunately, the difference between str() and repr() is kind of
 arbitrary and depends on the object. str() is supposed to return a
 human-readable version of the object, for display, while repr() is
 supposed to return a string which would work as code, but those are more
 guidelines than hard rules.

Will these fine distinctions be easy for me to pick up on as I
progress in my Python studies? I suspect that I am going to have to
experiment with str() and repr() in each new situation to see what
results.

 So we have two different ways of converting an object to a string. But
 strings themselves are objects too. What happens there?

 py> s = "Hello world"  # remember the quotes are delimiters, not part of the string
 py> print(str(s))
 Hello world
 py> print(repr(s))
 'Hello world'

 str() of a string is unchanged (and why shouldn't it be? it's already a
 string, there's nothing to convert).

 But repr() of a string creates a new string showing the representation
 of the original string, that is, what you would need to type in source
 code to make that string. That means:

 1) wrap the whole thing in delimiters (quotation marks)
 2) escape special characters like tabs, newlines, and binary
    characters.
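
A small sketch of those two points, using a sample string chosen only for illustration. It suggests that repr() shows a tab and a newline as backslash escapes inside quotes, while str() leaves the string alone:

```python
s = "tab:\there\nnext line"

print(str(s))    # the tab and newline are written out as real whitespace
print(repr(s))   # the same characters appear as \t and \n, inside quotes

r = repr(s)
print(len(s), len(r))   # r is longer: two quotes plus two extra backslashes
```

Feeding r back through eval() should give a string equal to s, which is what "what you would need to type in source code" amounts to.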

As to point 2), will repr() insert \ (I am assuming Python uses a
backslash like other languages to escape. I have not read about this
in Python yet.) for these special characters? Will str() do the same?

 Notice that the string returned by repr() includes quote marks as part
 of the new string. Given the s above:

 py> t = repr(s)
 py> print(t)
 'Hello world'
 py> t
 "'Hello world'"

 This tells us that the new string t includes single quote marks as the
 first and last character, so when you print it, the single quote marks
 are included in the output. But when you just display t interactively
 (see below), the delimiters are shown.

Another great example. I probably would have overlooked this.

 Now, at the interactive interpreter, evaluating an object on its own
 without saving the result anywhere displays the repr() to the screen.
 Why repr()? Well, why not? The decision was somewhat arbitrary.

So the designers of Python made this decision. I guess it had to be
one way or the other.

 print, on the other hand, displays the str() of the object directly to
 the screen. For strings, that means the delimiters are not shown,
 because they are not part of the string itself. Why str() rather than
 repr()? Because that's what people mostly want, and if you want the
 other, you can just say print(repr(obj)).

So in the end it is a simple choice to give the users what they want
and are already used to.


 Does this help, or are you more confused than ever?

This has been incredibly useful! Many thanks!!

boB

Re: [Tutor] Why difference between printing string typing its object reference at the prompt?

2012-10-02 Thread Dave Angel
On 10/02/2012 11:15 PM, boB Stepp wrote:
 After much diddling around I have finally settled on a text to study
 (Programming in Python 3, 2nd edition, by Mark Summerfield) and have
 defaulted to using IDLE, deferring worrying about editors/IDEs until I
 feel comfortable in Python.

 I am puzzled by the results of the following:

 >>> x = "Test"
 >>> x
 'Test'
 >>> print(x)
 Test

 I understand that 'Test' is the stored value in memory where the
 single quotes designate the value as being a string data type. So it
 makes sense to me that just typing the object reference for the string
 results in including the single quotes. But why does the print() strip
 the quotes off? Is it just as simple as this: people performing a
 print normally just want the unadorned text, so that is the behavior
 built into the print function? Or is there something more subtle
 going on that I am totally missing? If an explanation is in one of my
 several books, it is currently eluding me.


There are two operations supported by (most) objects that produce a
string.  One is exemplified by the str() function, which converts an
object to a string.  That's the one called implicitly by print().  This
form just represents the data, in the form most likely to be needed by
the end user.

The other operation is repr(), which attempts to produce a string that
could be used in a program to reproduce the actual object.  So a repr()
will have quote marks artificially added, or brackets, or commas, or
whatever seems appropriate for the particular object.  This is intended
for the programmer's use, not for the end user.

When you program  x = "Test", the string object that is created does not
have quote marks in it anywhere.  It also doesn't care whether you
produced it by single quotes, double quotes, triple quotes, or by some
manipulation of one or more other objects.  It has 4 characters in it. 
Period.

If you take that same string and do a repr() on it, it will produce
another string that does have some form of quotes, though not
necessarily the ones used originally.
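
As a rough illustration (the sample strings are arbitrary), the quoting style used in the source leaves no trace in the object, and repr() picks its own quotes:

```python
a = 'Test'
b = "Test"
c = """Test"""

print(a == b == c)   # all three spellings build equal 4-character strings
print(len(a))

print(repr(b))       # written with double quotes, shown with single quotes
print(repr("it's"))  # repr() switches to double quotes to avoid escaping
```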

In the interactive interpreter (I've never used IDLE), entering in an
expression without assigning it to anything will cause the result of the
expression to be displayed with repr().

Your question was about string objects, but I tried to make the
explanation as generic as possible.  Those two functions, str() and
repr(), are used, or implied in many places.  For example, if you print
a list, it'll call str() on the whole list.  But the list object's logic
will in turn call repr() on each of its elements, and put the whole
thing together with square brackets and commas.
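
A quick sketch of the list case, with arbitrary sample values. The string element keeps its quotes inside the printed list because the list uses repr() on its elements:

```python
items = ["spam", 42, 1.5]

print(items)                        # str() of the list, repr() of each element
print(str(items) == repr(items))    # for a list the two forms are the same

# To get the str() of each element instead, convert them yourself:
joined = ", ".join(str(item) for item in items)
print(joined)
```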

(Finer detail:  There are special methods in the class for each object,
__str__() and __repr__(), which actually have the code.  But you should
never call them directly, so you won't need to know about them till you
start building your own classes)

-- 

DaveA



Re: [Tutor] Why difference between printing string typing its object reference at the prompt?

2012-10-02 Thread Brian van den Broek
On 2 Oct 2012 23:17, boB Stepp robertvst...@gmail.com wrote:

<snip>

 I am puzzled by the results of the following:

  >>> x = "Test"
  >>> x
 'Test'
  >>> print(x)
 Test

 I understand that 'Test' is the stored value in memory where the
 single quotes designate the value as being a string data type. So it
 makes sense to me that just typing the object reference for the string
 results in including the single quotes. But why does the print() strip
 the quotes off? Is just as simple as

Hi boB,

Under the covers, in python 2.x, print x causes the human readable string
representation of x to be output by calling x.__str__. In an interactive
prompt, typing x displays the python representation of x by calling
x.__repr__.  These can be the same or quite similar or quite different.
When possible, __repr__ special methods ought to be defined so x equals
eval(x.__repr__()).
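
A minimal sketch of that guideline, assuming a hypothetical Point class (written in Python 3 syntax, and calling repr() rather than __repr__ directly):

```python
class Point:
    """Hypothetical class whose repr() can be evaluated to rebuild it."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __repr__(self):
        # Looks like the constructor call that would recreate the object
        return "Point({!r}, {!r})".format(self.x, self.y)

    def __eq__(self, other):
        # Needed so that "x equals eval(repr(x))" can actually be tested
        return isinstance(other, Point) and (self.x, self.y) == (other.x, other.y)


p = Point(1, 2)
clone = eval(repr(p))   # a new object built from the repr text
print(repr(p))
print(clone == p)
```

This is only a convention. Many objects, such as the default instance repr with its memory address, cannot be round-tripped this way.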

I believe, but don't warrant, that in this regard Python 3.x behaves like
2.x (modulo the difference in the print syntax).

Best,

Brian vdB