expandtabs acts unexpectedly

2009-08-19 Thread digisat...@gmail.com
Python 2.6.2 (release26-maint, Apr 19 2009, 01:56:41)
[GCC 4.3.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> ' test\ttest'.expandtabs(4)
' test   test'
>>> 'test \ttest'.expandtabs(4)
'test    test'

1st example: I expected 4 spaces between the two 'test's, but 3 spaces
were returned.
2nd example: I expected 5 spaces between the two 'test's, but 4 spaces
were returned.

Is this a bug or something else? Please advise.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: expandtabs acts unexpectedly

2009-08-19 Thread digisat...@gmail.com
On Aug 19, 4:16 pm, Peter Brett pe...@peter-b.co.uk wrote:
 digisat...@gmail.com digisat...@gmail.com writes:
  Python 2.6.2 (release26-maint, Apr 19 2009, 01:56:41)
  [GCC 4.3.3] on linux2
  Type "help", "copyright", "credits" or "license" for more information.
  >>> ' test\ttest'.expandtabs(4)
  ' test   test'
  >>> 'test \ttest'.expandtabs(4)
  'test    test'

  1st example: I expected 4 spaces between the two 'test's, but 3 spaces
  were returned.
  2nd example: I expected 5 spaces between the two 'test's, but 4 spaces
  were returned.

  Is this a bug or something else? Please advise.

 Consider where the 4-space tabstops are relative to those strings:

  test   test
 test    test
 ^   ^   ^

 So no, it's not a bug.

 If you just want to replace the tab characters by spaces, use:

     >>> ' test\ttest'.replace('\t', '    ')
     ' test    test'
     >>> 'test \ttest'.replace('\t', '    ')
     'test     test'

 HTH,

                                Peter

 --
 Peter Brett pe...@peter-b.co.uk
 Remote Sensing Research Group
 Surrey Space Centre

You corrected my understanding of tab stops. Great explanation.
Thank you so much.
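As a minimal sketch (Python 3, not the actual CPython implementation), the tab-stop rule Peter describes can be written out directly: a tab advances to the next column that is a multiple of the tab size, so how many spaces it becomes depends on where it sits. The helper name `expand_tabs` is hypothetical; the built-in `str.expandtabs` and `str.replace` appear only for comparison.

```python
def expand_tabs(s: str, tabsize: int = 8) -> str:
    """Hypothetical re-implementation of str.expandtabs, assuming tabsize > 0.

    A tab advances to the next column that is a multiple of `tabsize`,
    so the number of spaces it produces depends on the current column.
    """
    out = []
    column = 0
    for ch in s:
        if ch == "\t":
            pad = tabsize - column % tabsize  # spaces needed to reach the next tab stop
            out.append(" " * pad)
            column += pad
        else:
            out.append(ch)
            # Like str.expandtabs, restart column counting after a line break.
            column = 0 if ch in "\r\n" else column + 1
    return "".join(out)


for s in (" test\ttest", "test \ttest"):
    # Tab-stop expansion vs. the built-in, vs. a plain one-for-four replacement.
    print(repr(expand_tabs(s, 4)), repr(s.expandtabs(4)), repr(s.replace("\t", "    ")))
```

In both strings the tab sits at column 5, so it expands to 3 spaces to reach the stop at column 8; the second string shows 4 spaces only because its own space precedes the tab.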


encoding problem

2008-12-19 Thread digisat...@gmail.com
The snippet below raises a UnicodeDecodeError:
#!/usr/bin/env python
#--*-- coding: utf-8 --*--
s = 'äöü'
u = unicode(s)


It seems that the system uses the default encoding (ASCII) to decode the
UTF-8 encoded string literal, and thus raises the error.

The question is why the Python interpreter uses the default encoding
instead of UTF-8, which I explicitly declared in the source.
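A minimal Python 3 sketch of the same failure, assuming the byte string holds the UTF-8 encoding of 'äöü' (which is what a plain, non-`u` literal holds in a Python 2 source file saved as UTF-8). In Python 2, `unicode(s)` on a byte string decodes with `sys.getdefaultencoding()`, which is ASCII unless changed; the usual fixes there are `s.decode('utf-8')` or a `u'äöü'` literal. The helper `decode_or_none` is hypothetical, for illustration only.

```python
def decode_or_none(raw: bytes, encoding: str):
    """Hypothetical helper: decode `raw` with `encoding`, or return None if that fails."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError:
        return None


# What s = 'äöü' holds in a UTF-8 source file under Python 2: six bytes.
data = "äöü".encode("utf-8")  # b'\xc3\xa4\xc3\xb6\xc3\xbc'

# Python 2's unicode(s) used the default (ASCII) codec, which rejects bytes >= 0x80.
print(decode_or_none(data, "ascii"))

# Naming the codec explicitly, like s.decode('utf-8') in Python 2, succeeds.
print(decode_or_none(data, "utf-8"))
```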


Re: encoding problem

2008-12-19 Thread digisat...@gmail.com
On Dec 19, 9:34 pm, Marc 'BlackJack' Rintsch bj_...@gmx.net wrote:
 On Fri, 19 Dec 2008 04:05:12 -0800, digisat...@gmail.com wrote:
  The snippet below raises a UnicodeDecodeError:
  #!/usr/bin/env python
  #--*-- coding: utf-8 --*--
  s = 'äöü'
  u = unicode(s)

  It seems that the system uses the default encoding (ASCII) to decode the
  UTF-8 encoded string literal, and thus raises the error.

  The question is why the Python interpreter uses the default encoding
  instead of UTF-8, which I explicitly declared in the source.

 Because the declaration is only for decoding unicode literals in that
 very source file.

 Ciao,
         Marc 'BlackJack' Rintsch

Thanks for the answer.
I believe the declaration is not only for unicode literals; it applies to
everything in the source file, even comments. Try running a source file
that has no encoding declaration and contains just one comment line with
non-ASCII characters: it raises a SyntaxError whose message points to the
PEP 263 URL.

I read PEP 263 and quote the relevant part below:

 Python's tokenizer/compiler combo will need to be updated to work as
 follows:
   1. read the file
   2. decode it into Unicode assuming a fixed per-file encoding
   3. convert it into a UTF-8 byte string
   4. tokenize the UTF-8 content
   5. compile it, creating Unicode objects from the given Unicode data
      and creating string objects from the Unicode literal data
      by first reencoding the UTF-8 data into 8-bit string data
      using the given file encoding

The process described above indicates that step 2 uses the declared
encoding to decode the whole source, including all literals, while step 5
involves re-encoding with that same encoding.

That is why we have to declare an encoding explicitly whenever the source
contains non-ASCII characters.
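The decode step of that pipeline can be sketched in Python 3, where `tokenize.detect_encoding` implements the PEP 263 cookie lookup. This is an illustration under stated assumptions, not the Python 2.6 internals: the source bytes below are a made-up file declaring Latin-1, and because plain string literals are already Unicode in Python 3, step 5's re-encoding into 8-bit strings no longer applies.

```python
import io
import tokenize

# Hypothetical file contents as raw bytes (step 1, "read the file").
# The cookie declares Latin-1, so 'ä' is stored as the single byte 0xE4,
# once in a comment and once inside a string literal.
source_bytes = (
    b"# -*- coding: latin-1 -*-\n"
    b"# a comment with a non-ASCII character: \xe4\n"
    b"s = '\xe4'\n"
)

# Find the declared per-file encoding using Python 3's PEP 263 cookie detection.
encoding, _ = tokenize.detect_encoding(io.BytesIO(source_bytes).readline)

# Step 2: decode the whole file, comments included, with that encoding.
source_text = source_bytes.decode(encoding)

# compile() applies the same cookie when given bytes, so the literal
# comes out as the Unicode string 'ä' rather than a raw byte.
namespace = {}
exec(compile(source_bytes, "<demo>", "exec"), namespace)

print(encoding)
print(source_text.splitlines()[1])
print(namespace["s"])
```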

Bruno gave a perfect explanation of why we need to specify an encoding
when decoding a byte string. Thank you very much.