Re: UnicodeEncodeError: 'ascii' codec can't encode character u'\xb7' in position 13: ordinal not in range(128)

2009-07-18 Thread akhil1988

Thanks David, it solved my problem immediately. 

I will follow your advice from next time, but honestly I am new to Python
and do not know much about text formats. The main portion of my project
was not about dealing with these, so I just wanted to get this solved, as I
had already been stuck on it for 2 days. If you think I am wrong in my
approach to getting problems solved, please let me know. Your advice would
be helpful to me in the future.

--Thanks Again,
Akhil 

Scott David Daniels wrote:
> 
> (1) Do not top post.
> (2) Try to fully understand the problem and proposed solution, rather
>     than trying to get people to tell you just enough to get your code
>     going.
> (3) The only way sys.stdin can possibly return unicode is to do some
>     decoding of its own.  Your job is to make sure it uses the correct
>     decoding.  So, if you know your source is always utf-8, try
>     something like:
> 
>     import sys
>     import io
> 
>     sys.stdin = io.TextIOWrapper(sys.stdin.detach(), encoding='utf8')
> 
>     for line in sys.stdin:
>         line = line.strip()
>         if line == '':
>             pass  # do something here
>  
> 
> --Scott David Daniels
> scott.dani...@acm.org
> -- 
> http://mail.python.org/mailman/listinfo/python-list
> 
> 

-- 
View this message in context: 
http://www.nabble.com/UnicodeEncodeError%3A-%27ascii%27-codec-can%27t-encode-character-u%27%5Cxb7%27-in-position-13%3A-ordinal-not-in-range%28128%29-tp24509879p24550540.html
Sent from the Python - python-list mailing list archive at Nabble.com.

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: UnicodeEncodeError: 'ascii' codec can't encode character u'\xb7' in position 13: ordinal not in range(128)

2009-07-18 Thread akhil1988

Thanks Nobody-38, it solved my problem immediately.

--Thanks Again,
Akhil

Nobody-38 wrote:
> 
> On Thu, 16 Jul 2009 20:26:39 -0700, akhil1988 wrote:
> 
>> Well, you were right: unintentionally I removed strip(). But the problem
>> does not end here:
>> 
>> I get this error now:
>> 
>>   File "./temp.py", line 488, in <module>
>>     main()
>>   File "./temp.py", line 475, in main
>>     for line in sys.stdin:
>>   File "/usr/local/lib/python3.1/codecs.py", line 300, in decode
>>     (result, consumed) = self._buffer_decode(data, self.errors, final)
>> UnicodeDecodeError: 'utf8' codec can't decode bytes in position 0-2:
>> invalid data
>> 
>> for this line:
>> â
> 
> Right. You're running in a locale whose encoding is UTF-8, but feeding
> data which isn't valid UTF-8 to stdin. If you want to use data with a
> different encoding, you need to replace sys.stdin, e.g.:
> 
> import sys
> import io
> sys.stdin = io.TextIOWrapper(sys.stdin.detach(), encoding = 'iso-8859-1')
> 
> -- 
> http://mail.python.org/mailman/listinfo/python-list
> 
> 



Re: UnicodeEncodeError: 'ascii' codec can't encode character u'\xb7' in position 13: ordinal not in range(128)

2009-07-17 Thread Scott David Daniels

akhil1988 wrote:
> Nobody-38 wrote:
>> On Thu, 16 Jul 2009 15:43:37 -0700, akhil1988 wrote:
>> ...
>>>> In Python 3 you can't decode strings because they are Unicode strings
>>>> and it doesn't make sense to decode a Unicode string. You can only
>>>> decode encoded things which are byte strings. So you are mixing up byte
>>>> strings and Unicode strings.
>>> ... I read a byte string from sys.stdin which needs to be converted to a
>>> unicode string for further processing.
>> In 3.x, sys.stdin (stdout, stderr) are text streams, which means that they
>> read and write Unicode strings, not byte strings.
>>
>>> I cannot just remove the decode statement and proceed?
>>> This is what it looks like:
>>> for line in sys.stdin:
>>>     line = line.decode('utf-8').strip()
>>>     if line == '': #do something here
>>>
>>> If I remove the decode statement, line == '' never gets true.
>> Did you inadvertently remove the strip() as well?
> ... unintentionally I removed strip()
> I get this error now:
>   File "./temp.py", line 488, in <module>
>     main()
>   File "./temp.py", line 475, in main
>     for line in sys.stdin:
>   File "/usr/local/lib/python3.1/codecs.py", line 300, in decode
>     (result, consumed) = self._buffer_decode(data, self.errors, final)
> UnicodeDecodeError: 'utf8' codec can't decode bytes in position 0-2: invalid
> data


(1) Do not top post.
(2) Try to fully understand the problem and proposed solution, rather
    than trying to get people to tell you just enough to get your code
    going.
(3) The only way sys.stdin can possibly return unicode is to do some
    decoding of its own.  Your job is to make sure it uses the correct
    decoding.  So, if you know your source is always utf-8, try
    something like:

    import sys
    import io

    sys.stdin = io.TextIOWrapper(sys.stdin.detach(), encoding='utf8')

    for line in sys.stdin:
        line = line.strip()
        if line == '':
            pass  # do something here


--Scott David Daniels
scott.dani...@acm.org


Re: UnicodeEncodeError: 'ascii' codec can't encode character u'\xb7' in position 13: ordinal not in range(128)

2009-07-17 Thread Piet van Oostrum
> akhil1988  (a) wrote:

>a> Well, you were right: unintentionally I removed strip(). But the problem
>a> does not end here:

>a> I get this error now:

>a>   File "./temp.py", line 488, in <module>
>a>     main()
>a>   File "./temp.py", line 475, in main
>a>     for line in sys.stdin:
>a>   File "/usr/local/lib/python3.1/codecs.py", line 300, in decode
>a>     (result, consumed) = self._buffer_decode(data, self.errors, final)
>a> UnicodeDecodeError: 'utf8' codec can't decode bytes in position 0-2: invalid
>a> data

>a> for this line:
>a> â

Your Python assumes stdin uses utf-8 encoding, probably because your
locale says so. But it seems the input is not really utf-8 but some
other encoding.
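
A small Python 3 sketch of that mismatch, decoding the same bytes with two codecs. The sample bytes are an assumption for illustration (the single character â as ISO-8859-1 would encode it), not the poster's actual input:

```python
# Hypothetical sample input: 'â' plus a newline, encoded as ISO-8859-1.
data = b'\xe2\n'

# As UTF-8, 0xE2 announces a three-byte sequence, but no valid
# continuation bytes follow, so decoding fails.
try:
    data.decode('utf-8')
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# Decoded with the codec the data was really written in, it works.
latin1_text = data.decode('iso-8859-1')
```

The bytes never change; only the codec used to interpret them decides whether decoding succeeds.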
-- 
Piet van Oostrum 
URL: http://pietvanoostrum.com [PGP 8DAE142BE17999C4]
Private email: p...@vanoostrum.org


Re: UnicodeEncodeError: 'ascii' codec can't encode character u'\xb7' in position 13: ordinal not in range(128)

2009-07-17 Thread Nobody
On Thu, 16 Jul 2009 20:26:39 -0700, akhil1988 wrote:

> Well, you were right: unintentionally I removed strip(). But the problem does
> not end here:
> 
> I get this error now:
> 
>   File "./temp.py", line 488, in <module>
>     main()
>   File "./temp.py", line 475, in main
>     for line in sys.stdin:
>   File "/usr/local/lib/python3.1/codecs.py", line 300, in decode
>     (result, consumed) = self._buffer_decode(data, self.errors, final)
> UnicodeDecodeError: 'utf8' codec can't decode bytes in position 0-2: invalid
> data
> 
> for this line:
> â

Right. You're running in a locale whose encoding is UTF-8, but feeding
data which isn't valid UTF-8 to stdin. If you want to use data with a
different encoding, you need to replace sys.stdin, e.g.:

import sys
import io
sys.stdin = io.TextIOWrapper(sys.stdin.detach(), encoding = 'iso-8859-1')
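
The same wrapping can be tried without touching the real stdin. This is only a sketch: reopen_as_text is a hypothetical helper name, and an io.BytesIO object stands in for the binary stream that sys.stdin.detach() would return:

```python
import io

def reopen_as_text(binary_stream, encoding):
    """Wrap a binary stream so that iterating over it yields str lines."""
    return io.TextIOWrapper(binary_stream, encoding=encoding)

# Stand-in for sys.stdin.detach(): bytes that are valid ISO-8859-1
# but not valid UTF-8.
fake_stdin = io.BytesIO(b'\xe2\nplain ascii\n')

lines = [line.strip() for line in reopen_as_text(fake_stdin, 'iso-8859-1')]
```

With a real script the call would be reopen_as_text(sys.stdin.detach(), 'iso-8859-1'), which is the replacement shown above.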



Re: UnicodeEncodeError: 'ascii' codec can't encode character u'\xb7' in position 13: ordinal not in range(128)

2009-07-16 Thread akhil1988

Well, you were right: unintentionally I removed strip(). But the problem does
not end here:

I get this error now:

  File "./temp.py", line 488, in <module>
    main()
  File "./temp.py", line 475, in main
    for line in sys.stdin:
  File "/usr/local/lib/python3.1/codecs.py", line 300, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf8' codec can't decode bytes in position 0-2: invalid
data

for this line:
â


--Akhil

Nobody-38 wrote:
> 
> On Thu, 16 Jul 2009 15:43:37 -0700, akhil1988 wrote:
> 
>>> In Python 3 you can't decode strings because they are Unicode strings
>>> and it doesn't make sense to decode a Unicode string. You can only
>>> decode encoded things which are byte strings. So you are mixing up byte
>>> strings and Unicode strings.
>>
>> Then, how should I do it?
>> I read a byte string from sys.stdin which needs to converted to unicode
>> string for further processing.
> 
> In 3.x, sys.stdin (stdout, stderr) are text streams, which means that they
> read and write Unicode strings, not byte strings.
> 
>> I cannot just remove the decode statement and
>> proceed?
>> 
>> This is it what it looks like:
>> 
>> for line in sys.stdin:
>> line = line.decode('utf-8').strip()
>> if line == '': #do something here
>> elsif #do something here
>> 
>> If I remove the decode statement, line == '' never gets true. 
> 
> Did you inadvertently remove the strip() as well?
> 
> -- 
> http://mail.python.org/mailman/listinfo/python-list
> 
> 



Re: UnicodeEncodeError: 'ascii' codec can't encode character u'\xb7' in position 13: ordinal not in range(128)

2009-07-16 Thread Nobody
On Thu, 16 Jul 2009 15:43:37 -0700, akhil1988 wrote:

>> In Python 3 you can't decode strings because they are Unicode strings
>> and it doesn't make sense to decode a Unicode string. You can only
>> decode encoded things which are byte strings. So you are mixing up byte
>> strings and Unicode strings.
>
> Then, how should I do it?
> I read a byte string from sys.stdin which needs to converted to unicode
> string for further processing.

In 3.x, sys.stdin (stdout, stderr) are text streams, which means that they
read and write Unicode strings, not byte strings.

> I cannot just remove the decode statement and
> proceed?
> 
> This is it what it looks like:
> 
> for line in sys.stdin:
> line = line.decode('utf-8').strip()
> if line == '': #do something here
> elsif #do something here
> 
> If I remove the decode statement, line == '' never gets true. 

Did you inadvertently remove the strip() as well?
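
To see why strip() matters here, a short Python 3 sketch. The io.StringIO object is only a stand-in for sys.stdin, which in 3.x yields str lines with the trailing newline still attached:

```python
import io

# Stand-in for sys.stdin: three lines, the middle one blank.
stream = io.StringIO('first\n\nlast\n')
raw_lines = list(stream)

# Without strip(), a blank line is '\n', so comparing to '' never matches.
blank_without_strip = [line == '' for line in raw_lines]

# With strip(), the blank line becomes '' and the comparison works.
blank_with_strip = [line.strip() == '' for line in raw_lines]
```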



Re: UnicodeEncodeError: 'ascii' codec can't encode character u'\xb7' in position 13: ordinal not in range(128)

2009-07-16 Thread akhil1988

Then, how should I do it?
I read a byte string from sys.stdin which needs to be converted to a unicode
string for further processing. Can I not just remove the decode statement and
proceed?

This is what it looks like:

for line in sys.stdin:
    line = line.decode('utf-8').strip()
    if line == '':
        pass  # do something here
    elif ...:
        pass  # do something here

If I remove the decode statement, line == '' never becomes true.

--Akhil


Piet van Oostrum wrote:
> 
>> akhil1988  (a) wrote:
> 
>>a> ok!
>>a> I got the indentation errors fixed. Bu I get another error:
> 
>>a> Traceback (most recent call last):
>>a>   File "./temp.py", line 484, in 
>>a> main()
>>a>   File "./temp.py", line 476, in main
>>a> line.decode('utf-8').strip()
>>a> AttributeError: 'str' object has no attribute 'decode'
> 
>>a> I am using Python3.1
> 
> In Python 3 you can't decode strings because they are Unicode strings
> and it doesn't make sense to decode a Unicode string. You can only
> decode encoded things which are byte strings. So you are mixing up byte
> strings and Unicode strings.
> -- 
> Piet van Oostrum 
> URL: http://pietvanoostrum.com [PGP 8DAE142BE17999C4]
> Private email: p...@vanoostrum.org
> -- 
> http://mail.python.org/mailman/listinfo/python-list
> 
> 



Re: UnicodeEncodeError: 'ascii' codec can't encode character u'\xb7' in position 13: ordinal not in range(128)

2009-07-16 Thread Piet van Oostrum
> akhil1988  (a) wrote:

>a> ok!
>a> I got the indentation errors fixed. But I get another error:

>a> Traceback (most recent call last):
>a>   File "./temp.py", line 484, in <module>
>a>     main()
>a>   File "./temp.py", line 476, in main
>a>     line.decode('utf-8').strip()
>a> AttributeError: 'str' object has no attribute 'decode'

>a> I am using Python3.1

In Python 3 you can't decode strings because they are Unicode strings
and it doesn't make sense to decode a Unicode string. You can only
decode encoded things which are byte strings. So you are mixing up byte
strings and Unicode strings.
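
A minimal Python 3 sketch of the distinction; the sample bytes are made up for illustration:

```python
# bytes holds encoded data; str holds Unicode text.
raw = b'caf\xc3\xa9'          # UTF-8 encoded bytes for 'café'

# Decoding goes from bytes to str.
text = raw.decode('utf-8')

# A str has no decode() method in Python 3, which is what the
# AttributeError in the traceback is reporting.
has_decode = hasattr(text, 'decode')

# Encoding goes the other way, from str back to bytes.
round_trip = text.encode('utf-8')
```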
-- 
Piet van Oostrum 
URL: http://pietvanoostrum.com [PGP 8DAE142BE17999C4]
Private email: p...@vanoostrum.org


Re: UnicodeEncodeError: 'ascii' codec can't encode character u'\xb7' in position 13: ordinal not in range(128)

2009-07-16 Thread akhil1988

ok!
I got the indentation errors fixed. But I get another error:

Traceback (most recent call last):
  File "./temp.py", line 484, in <module>
    main()
  File "./temp.py", line 476, in main
    line.decode('utf-8').strip()
AttributeError: 'str' object has no attribute 'decode'

I am using Python3.1

Thanks
Akhil




akhil1988 wrote:
> 
> Hi,
> 
> Thanks all for the replies.
> 
> I am working on a cluster of 15 nodes and I have now installed python 3.1
> on all of them. I tried installing python2.6 but there was some make
> error. So, I do not want to give more time in installing 2.4  and rather
> use 3.1 but for that I need to convert my 2.4 code to 3.1. 
> 
> I used 2to3 tool, and it did make many changes in the 2.4 code, but still
> there are some indentation errors that I am unable to resolve being new to
> python. I have attached my python code, can anyone please fix the
> indentation error in the code. I am using vi editor.
> 
> --Thanks a lot,
> Akhil http://www.nabble.com/file/p24522412/temp.py temp.py 
> 
> 
> alex23 wrote:
>> 
>> On Jul 16, 9:00 pm, akhil1988  wrote:
>>> I have switched to python 3.1 , but now I am getting some syntax errors
>>> in
>>> the code:
>> 
>> Python 3.x was a major release that endeavoured to clean up a number
>> of lingering issues with the language, the upshot being that it isn't
>> entirely backwards compatible with past versions. Unicode became the
>> default string type, which is what is causing the error here: the u-
>> prefix is no longer required (or even allowed).
>> 
>> However, Py3.x _does_ come with a handy tool for automatically
>> converting Python 2.x code to 3.x, called 2to3. One of the things it
>> should do is convert Py2.x unicode values into their correct
>> representation in 3.x.
>> 
>> With any luck, it should be able to convert the code you're using
>> entirely. Let us know how it goes.
>> -- 
>> http://mail.python.org/mailman/listinfo/python-list
>> 
>> 
> 
> 



Re: UnicodeEncodeError: 'ascii' codec can't encode character u'\xb7' in position 13: ordinal not in range(128)

2009-07-16 Thread akhil1988

Hi,

Thanks all for the replies.

I am working on a cluster of 15 nodes and I have now installed Python 3.1 on
all of them. I tried installing Python 2.6 but there was some make error. So,
I do not want to spend more time installing 2.4 and would rather use 3.1, but
for that I need to convert my 2.4 code to 3.1. 

I used the 2to3 tool, and it did make many changes in the 2.4 code, but there
are still some indentation errors that I am unable to resolve, being new to
Python. I have attached my Python code; can anyone please fix the
indentation errors in the code? I am using the vi editor.

--Thanks a lot,
Akhil http://www.nabble.com/file/p24522412/temp.py temp.py 


alex23 wrote:
> 
> On Jul 16, 9:00 pm, akhil1988  wrote:
>> I have switched to python 3.1 , but now I am getting some syntax errors
>> in
>> the code:
> 
> Python 3.x was a major release that endeavoured to clean up a number
> of lingering issues with the language, the upshot being that it isn't
> entirely backwards compatible with past versions. Unicode became the
> default string type, which is what is causing the error here: the u-
> prefix is no longer required (or even allowed).
> 
> However, Py3.x _does_ come with a handy tool for automatically
> converting Python 2.x code to 3.x, called 2to3. One of the things it
> should do is convert Py2.x unicode values into their correct
> representation in 3.x.
> 
> With any luck, it should be able to convert the code you're using
> entirely. Let us know how it goes.
> -- 
> http://mail.python.org/mailman/listinfo/python-list
> 
> 



Re: UnicodeEncodeError: 'ascii' codec can't encode character u'\xb7' in position 13: ordinal not in range(128)

2009-07-16 Thread John Machin
On Jul 16, 9:04 pm, akhil1988  wrote:
> Please click reply on the post and then read this reply in the editor.
> Actually, some sequences have been replaced to their graphical form when
> this post is published. So the python code is being displayed, what actually
> it is not.

What editor? I guess you mean that somebody's browser may be
interpreting the & blah ; thingies, but this is not relevant; your
syntax error problem is that 2.x unicode literals have a u in front
but 3.x str literals don't. IOW you need to lose the first u in
u'\u00A0'
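
As a rough Python 3 sketch of what the table could look like without the prefix: the entity-name keys below are an assumption (the archive rendered the original keys as characters), and char_entities is a hypothetical name. The u prefix was accepted again from Python 3.3 on, but it is never required there.

```python
from html.entities import name2codepoint

# Python 3 string literals are already Unicode, so no u prefix is needed.
char_entities = {
    '&nbsp;': '\u00A0',
    '&iexcl;': '\u00A1',
    '&cent;': '\u00A2',
    '&middot;': '\u00B7',
}

# The standard library ships a similar table, keyed by bare entity name
# and mapping to the code point as an integer.
middot = chr(name2codepoint['middot'])
```

Using the standard table avoids maintaining the mapping by hand, if the script's needs match it.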


Re: UnicodeEncodeError: 'ascii' codec can't encode character u'\xb7' in position 13: ordinal not in range(128)

2009-07-16 Thread Max Erickson
akhil1988  wrote:

> 
> akhil1988 wrote:
>> 
>> I have switched to python 3.1 , but now I am getting some syntax
>> errors in the code:
>> 
>> File "./customWikiExtractor.py", line 81
>> __char_entities =  {' '   :u'\u00A0', '¡'
>> :u'\u00A1', 
>> '¢':u'\u00A2',
>> ^

You may want to try 2.6. Python 3.1 is not syntax compatible with 2.5 
(so the u'' stuff won't work in 3.1):

http://docs.python.org/dev/py3k/whatsnew/3.0.html#removed-syntax



max



Re: UnicodeEncodeError: 'ascii' codec can't encode character u'\xb7' in position 13: ordinal not in range(128)

2009-07-16 Thread alex23
On Jul 16, 9:00 pm, akhil1988  wrote:
> I have switched to python 3.1 , but now I am getting some syntax errors in
> the code:

Python 3.x was a major release that endeavoured to clean up a number
of lingering issues with the language, the upshot being that it isn't
entirely backwards compatible with past versions. Unicode became the
default string type, which is what is causing the error here: the u-
prefix is no longer required (or even allowed).

However, Py3.x _does_ come with a handy tool for automatically
converting Python 2.x code to 3.x, called 2to3. One of the things it
should do is convert Py2.x unicode values into their correct
representation in 3.x.

With any luck, it should be able to convert the code you're using
entirely. Let us know how it goes.


Re: UnicodeEncodeError: 'ascii' codec can't encode character u'\xb7' in position 13: ordinal not in range(128)

2009-07-16 Thread akhil1988

Please click reply on the post and then read this reply in the editor.
Some character sequences were replaced by their rendered form when
this post was published, so the Python code is not displayed as it
actually is.

--Akhil



akhil1988 wrote:
> 
> I have switched to python 3.1 , but now I am getting some syntax errors in
> the code:
> 
> File "./customWikiExtractor.py", line 81
> __char_entities =  {' '   :u'\u00A0', '¡' :u'\u00A1',
> '¢':u'\u00A2',
> ^
> SyntaxError: invalid syntax
> 
> line 81 is:
> __char_entities =  {' '   :u'\u00A0', '¡' :u'\u00A1', '¢'   
> :u'\u00A2',
> '£'  :u'\u00A3', '¤':u'\u00A4',
> '¥' :u'\u00A5',
> '¦' :u'\u00A6', '§'  :u'\u00A7',
> '¨' :u'\u00A8',
> '©'   :u'\u00A9', 'ª'  :u'\u00AA',
> '«'   :u'\u00AB',
> '¬':u'\u00AC', '­'   :u'\u00AD',
> '®' :u'\u00AE',
> '¯'   :u'\u00AF', '°'   :u'\u00B0',
> '±'  :u'\u00B1',
> '²'   :u'\u00B2', '³'  :u'\u00B3',
> '´'   :u'\u00B4',
> 'µ'  :u'\u00B5', '¶'  :u'\u00B6',
> '·'  :u'\u00B7',
> '¸'  :u'\u00B8', '¹'  :u'\u00B9',
> 'º':u'\u00BA',}
> 
> --Akhil
> 
> 
> John Nagle-2 wrote:
>> 
>> akhil1988 wrote:
>>> Sorry, it is sgmllib.py and not sgmmlib.py
>> 
>> Oh, that bug again.  See
>> 
>>  http://bugs.python.org/issue1651995
>> 
>> It's a bug in SGMLParser.  When Python 2.5 restricted ASCII to 0..127,
>> SGMLParser needed to be modified, but wasn't.
>> 
>> I reported that bug in February 2007.  It was fixed in
>> Python 2.6 and 3.0 on March 31, 2009.
>> 
>>  John Nagle
>> -- 
>> http://mail.python.org/mailman/listinfo/python-list
>> 
>> 
> 
> 



Re: UnicodeEncodeError: 'ascii' codec can't encode character u'\xb7' in position 13: ordinal not in range(128)

2009-07-16 Thread akhil1988

I have switched to python 3.1 , but now I am getting some syntax errors in
the code:

File "./customWikiExtractor.py", line 81
__char_entities =  {' '   :u'\u00A0', '¡' :u'\u00A1',
'¢':u'\u00A2',
^
SyntaxError: invalid syntax

line 81 is:
__char_entities =  {' '   :u'\u00A0', '¡' :u'\u00A1', '¢'   
:u'\u00A2',
'£'  :u'\u00A3', '¤':u'\u00A4', '¥'
:u'\u00A5',
'¦' :u'\u00A6', '§'  :u'\u00A7', '¨'
:u'\u00A8',
'©'   :u'\u00A9', 'ª'  :u'\u00AA',
'«'   :u'\u00AB',
'¬':u'\u00AC', '­'   :u'\u00AD', '®'
:u'\u00AE',
'¯'   :u'\u00AF', '°'   :u'\u00B0',
'±'  :u'\u00B1',
'²'   :u'\u00B2', '³'  :u'\u00B3',
'´'   :u'\u00B4',
'µ'  :u'\u00B5', '¶'  :u'\u00B6',
'·'  :u'\u00B7',
'¸'  :u'\u00B8', '¹'  :u'\u00B9',
'º':u'\u00BA',}

--Akhil


John Nagle-2 wrote:
> 
> akhil1988 wrote:
>> Sorry, it is sgmllib.py and not sgmmlib.py
> 
> Oh, that bug again.  See
> 
>   http://bugs.python.org/issue1651995
> 
> It's a bug in SGMLParser.  When Python 2.5 restricted ASCII to 0..127,
> SGMLParser needed to be modified, but wasn't.
> 
> I reported that bug in February 2007.  It was fixed in
> Python 2.6 and 3.0 on March 31, 2009.
> 
>   John Nagle
> -- 
> http://mail.python.org/mailman/listinfo/python-list
> 
> 



Re: UnicodeEncodeError: 'ascii' codec can't encode character u'\xb7' in position 13: ordinal not in range(128)

2009-07-16 Thread Piet van Oostrum
> akhil1988  (a) wrote:

>a> Chris,

>a> Using 

>a> print (u'line: %s' % line).encode('utf-8')

>a> the 'line' gets printed, but actually this print statement I was using just
>a> for testing, actually my code operates on 'line', on which I use line =
>a> line.decode('utf-8') as 'line' is read as bytes from a stream.

>a> And if I use line = line.encode('utf-8'), 

>a> I start getting other error like
>a> UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 4561:
>a> ordinal not in range(128)
>a> at line = line.replace('<<', u'«').replace('>>', u'»')

You do a Unicode replace here, so line should be a unicode string.
Therefore you have to do this before the line.encode('utf-8'), but after
the decode('utf-8'). 

It might be better to use different variables for Unicode strings and
byte strings to prevent confusion, like:

# 'line' is read as bytes from a stream
uline = line.decode('utf-8')
uline = uline.replace('<<', u'«').replace('>>', u'»')
line = uline.encode('utf-8')
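
For comparison, the same decode, replace, encode order as a Python 3 sketch. The function name replace_angle_quotes and the sample input are hypothetical, chosen only for illustration:

```python
def replace_angle_quotes(raw_line: bytes) -> bytes:
    """Decode UTF-8 bytes, do the text replacement, and encode again."""
    text = raw_line.decode('utf-8')                      # bytes -> str
    text = text.replace('<<', '«').replace('>>', '»')    # work on text only
    return text.encode('utf-8')                          # str -> bytes

result = replace_angle_quotes(b'<<caf\xc3\xa9>>')
```

Keeping all text manipulation between the decode and the encode means bytes and text are never mixed in one operation.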
-- 
Piet van Oostrum 
URL: http://pietvanoostrum.com [PGP 8DAE142BE17999C4]
Private email: p...@vanoostrum.org


Re: UnicodeEncodeError: 'ascii' codec can't encode character u'\xb7' in position 13: ordinal not in range(128)

2009-07-16 Thread John Nagle

akhil1988 wrote:

Sorry, it is sgmllib.py and not sgmmlib.py


   Oh, that bug again.  See

http://bugs.python.org/issue1651995

It's a bug in SGMLParser.  When Python 2.5 restricted ASCII to 0..127,
SGMLParser needed to be modified, but wasn't.

I reported that bug in February 2007.  It was fixed in
Python 2.6 and 3.0 on March 31, 2009.

John Nagle


Re: UnicodeEncodeError: 'ascii' codec can't encode character u'\xb7' in position 13: ordinal not in range(128)

2009-07-15 Thread akhil1988

Chris,

Using 

print (u'line: %s' % line).encode('utf-8')

the 'line' gets printed, but I was using this print statement just for
testing. My code actually operates on 'line', on which I use line =
line.decode('utf-8'), as 'line' is read as bytes from a stream.

And if I use line = line.encode('utf-8'), 

I start getting other errors like
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 4561:
ordinal not in range(128)
at line = line.replace('<<', u'«').replace('>>', u'»')


--Akhil

Chris Rebert-6 wrote:
> 
>> Chris Rebert-6 wrote:
>>>
>>> On Wed, Jul 15, 2009 at 9:34 PM, akhil1988 wrote:
>>>>
>>>> Hi!
>>>>
>>>> Can anyone please help me getting rid of this error:
>>>> UnicodeEncodeError: 'ascii' codec can't encode character u'\xb7' in
>>>> position
>>>> 13: ordinal not in range(128)
>>>>
>>>> I am not a python programmer (though intend to start learning this
>>>> wonderful
>>>> language), I am just using a python script.
>>>>
>>>> After doing some search, I found that 0xb7 is a 'middle dot character'
>>>> that
>>>> is not interpreted by the python.
>>>> Even after inserting text = text.replace('\u00b7', '') in the script,
>>>> the
>>>> problem still persists.
>>>>
>>>> Can anyone please tell me the easiest way to get rid of this?
>>>
>>> We'll need the full error traceback. The error message at the end is
>>> just not enough information.
>>> As to fixing it, google for "UnicodeEncodeError". You should find
>>> about a million mailinglist threads on it.
> On Wed, Jul 15, 2009 at 10:05 PM, akhil1988 wrote:
>>
>> Well,
>> All I get is this traceback:
>>
>> File "./customWikiExtractor.py", line 492, in ?
>> main()
>> File "./customWikiExtractor.py", line 480, in main
>>print >> sys.stdout, 'line: %s' % line
>> UnicodeEncodeError: 'ascii' codec can't encode character u'\xb7' in
>> position
>> 13: ordinal not in range(128)
>>
>> I am giving a string to the python code as input, and python processes it
>> like this:
>>
>> line = line.decode('utf-8').strip()
>>
>> After this when I do,
>> print >> sys.stdout, 'line: %s' % line
>> I get this Unicode error.
> 
> Try this instead (the ">> sys.stdout" part is redundant):
> print (u'line: %s' % line).encode('utf8')
> #if your system doesn't use UTF-8, change as necessary
> 
> Cheers,
> Chris
> -- 
> http://blog.rebertia.com
> -- 
> http://mail.python.org/mailman/listinfo/python-list
> 
> 



Re: UnicodeEncodeError: 'ascii' codec can't encode character u'\xb7' in position 13: ordinal not in range(128)

2009-07-15 Thread Chris Rebert
> Chris Rebert-6 wrote:
>>
>> On Wed, Jul 15, 2009 at 9:34 PM, akhil1988 wrote:
>>>
>>> Hi!
>>>
>>> Can anyone please help me getting rid of this error:
>>> UnicodeEncodeError: 'ascii' codec can't encode character u'\xb7' in
>>> position
>>> 13: ordinal not in range(128)
>>>
>>> I am not a python programmer (though intend to start learning this
>>> wonderful
>>> language), I am just using a python script.
>>>
>>> After doing some search, I found that 0xb7 is a 'middle dot character'
>>> that
>>> is not interpreted by the python.
>>> Even after inserting text = text.replace('\u00b7', '') in the script, the
>>> problem still persists.
>>>
>>> Can anyone please tell me the easiest way to get rid of this?
>>
>> We'll need the full error traceback. The error message at the end is
>> just not enough information.
>> As to fixing it, google for "UnicodeEncodeError". You should find
>> about a million mailinglist threads on it.
On Wed, Jul 15, 2009 at 10:05 PM, akhil1988 wrote:
>
> Well,
> All I get is this traceback:
>
> File "./customWikiExtractor.py", line 492, in ?
> main()
> File "./customWikiExtractor.py", line 480, in main
>print >> sys.stdout, 'line: %s' % line
> UnicodeEncodeError: 'ascii' codec can't encode character u'\xb7' in position
> 13: ordinal not in range(128)
>
> I am giving a string to the python code as input, and python processes it
> like this:
>
> line = line.decode('utf-8').strip()
>
> After this when I do,
> print >> sys.stdout, 'line: %s' % line
> I get this Unicode error.

Try this instead (the ">> sys.stdout" part is redundant):
print (u'line: %s' % line).encode('utf8')
#if your system doesn't use UTF-8, change as necessary
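
The underlying failure can be reproduced in isolation. This is a Python 3 sketch rather than the 2.4 code in question, and the sample string is made up so that the middle dot lands at index 13:

```python
# Sample text with U+00B7 (middle dot) at index 13.
text = 'position 13: \u00b7'

# The ASCII codec only covers code points 0..127, so this raises.
try:
    text.encode('ascii')
    ascii_ok = True
except UnicodeEncodeError as exc:
    ascii_ok = False
    bad_char = exc.object[exc.start:exc.end]   # the character that failed

# UTF-8 can represent every code point; the dot becomes two bytes.
utf8_bytes = text.encode('utf-8')
```

An explicit encode with a codec that covers the character, as suggested above, sidesteps the implicit ASCII encoding.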

Cheers,
Chris
-- 
http://blog.rebertia.com


Re: UnicodeEncodeError: 'ascii' codec can't encode character u'\xb7' in position 13: ordinal not in range(128)

2009-07-15 Thread akhil1988

Sorry, it is sgmllib.py and not sgmmlib.py

-- Akhil

akhil1988 wrote:
> 
> Well, 
> All I get is this traceback:
> 
> File "./customWikiExtractor.py", line 492, in ?
>  main()
> File "./customWikiExtractor.py", line 480, in main
>     print >> sys.stdout, 'line: %s' % line
> UnicodeEncodeError: 'ascii' codec can't encode character u'\xb7' in
> position 13: ordinal not in range(128)
> 
> I am giving a string to the python code as input, and python processes it
> like this:
> 
> line = line.decode('utf-8').strip()
> 
> After this when I do, 
> print >> sys.stdout, 'line: %s' % line
> I get this Unicode error.
> 
> I tried a few repairs, but they did not work. For example, in sgmmlib.py
> (/usr/lib64/python2.4/sgmmlib.py) I changed
> if not 0 < n <= 255
> to
> if not 0 < n <= 127
> 
> But since this did not work, I have changed it back to its original form.
> 
> --Thanks,
> Akhil
> 
> 
> Chris Rebert-6 wrote:
>> 
>> On Wed, Jul 15, 2009 at 9:34 PM, akhil1988 wrote:
>>>
>>> Hi!
>>>
>>> Can anyone please help me get rid of this error:
>>> UnicodeEncodeError: 'ascii' codec can't encode character u'\xb7' in
>>> position 13: ordinal not in range(128)
>>>
>>> I am not a Python programmer (though I intend to start learning this
>>> wonderful language); I am just using a Python script.
>>>
>>> After some searching, I found that 0xb7 is the 'middle dot' character,
>>> which Python fails to interpret.
>>> Even after inserting text = text.replace('\u00b7', '') in the script,
>>> the problem still persists.
>>>
>>> Can anyone please tell me the easiest way to get rid of this?
>> 
>> We'll need the full error traceback. The error message at the end is
>> just not enough information.
>> As to fixing it, google for "UnicodeEncodeError". You should find
>> about a million mailinglist threads on it.
>> 
>> Cheers,
>> Chris
>> -- 
>> http://blog.rebertia.com
>> -- 
>> http://mail.python.org/mailman/listinfo/python-list
>> 
>> 
> 
> 

-- 
View this message in context: 
http://www.nabble.com/UnicodeEncodeError%3A-%27ascii%27-codec-can%27t-encode-character-u%27%5Cxb7%27-in-position-13%3A-ordinal-not-in-range%28128%29-tp24509879p24510252.html
Sent from the Python - python-list mailing list archive at Nabble.com.



Re: UnicodeEncodeError: 'ascii' codec can't encode character u'\xb7' in position 13: ordinal not in range(128)

2009-07-15 Thread akhil1988

Well, 
All I get is this traceback:

File "./customWikiExtractor.py", line 492, in ?
 main()
File "./customWikiExtractor.py", line 480, in main
print >> sys.stdout, 'line: %s' % line
UnicodeEncodeError: 'ascii' codec can't encode character u'\xb7' in position
13: ordinal not in range(128)

I am giving a string to the python code as input, and python processes it
like this:

line = line.decode('utf-8').strip()

After this when I do, 
print >> sys.stdout, 'line: %s' % line
I get this Unicode error.

I tried a few repairs, but they did not work. For example, in sgmmlib.py
(/usr/lib64/python2.4/sgmmlib.py) I changed
if not 0 < n <= 255
to
if not 0 < n <= 127

But since this did not work, I have changed it back to its original form.

--Thanks,
Akhil


Chris Rebert-6 wrote:
> 
> On Wed, Jul 15, 2009 at 9:34 PM, akhil1988 wrote:
>>
>> Hi!
>>
>> Can anyone please help me get rid of this error:
>> UnicodeEncodeError: 'ascii' codec can't encode character u'\xb7' in
>> position 13: ordinal not in range(128)
>>
>> I am not a Python programmer (though I intend to start learning this
>> wonderful language); I am just using a Python script.
>>
>> After some searching, I found that 0xb7 is the 'middle dot' character,
>> which Python fails to interpret.
>> Even after inserting text = text.replace('\u00b7', '') in the script, the
>> problem still persists.
>>
>> Can anyone please tell me the easiest way to get rid of this?
> 
> We'll need the full error traceback. The error message at the end is
> just not enough information.
> As to fixing it, google for "UnicodeEncodeError". You should find
> about a million mailinglist threads on it.
> 
> Cheers,
> Chris
> -- 
> http://blog.rebertia.com
> -- 
> http://mail.python.org/mailman/listinfo/python-list
> 
> 

-- 
View this message in context: 
http://www.nabble.com/UnicodeEncodeError%3A-%27ascii%27-codec-can%27t-encode-character-u%27%5Cxb7%27-in-position-13%3A-ordinal-not-in-range%28128%29-tp24509879p24510222.html
Sent from the Python - python-list mailing list archive at Nabble.com.



Re: UnicodeEncodeError: 'ascii' codec can't encode character u'\xb7' in position 13: ordinal not in range(128)

2009-07-15 Thread Chris Rebert
On Wed, Jul 15, 2009 at 9:34 PM, akhil1988 wrote:
>
> Hi!
>
> Can anyone please help me get rid of this error:
> UnicodeEncodeError: 'ascii' codec can't encode character u'\xb7' in position
> 13: ordinal not in range(128)
>
> I am not a Python programmer (though I intend to start learning this
> wonderful language); I am just using a Python script.
>
> After some searching, I found that 0xb7 is the 'middle dot' character,
> which Python fails to interpret.
> Even after inserting text = text.replace('\u00b7', '') in the script, the
> problem still persists.
>
> Can anyone please tell me the easiest way to get rid of this?

We'll need the full error traceback. The error message at the end is
just not enough information.
As to fixing it, google for "UnicodeEncodeError". You should find
about a million mailinglist threads on it.
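
A possible reason the replace() call quoted above had no effect, assuming
the script runs under Python 2: there, \u escapes are only processed inside
Unicode literals (u'...'), so the plain literal '\u00b7' is six separate
characters (backslash, u, 0, 0, b, 7) rather than the middle dot, and it
never matches. The sketch below mimics this in Python 3, where the raw
string r'\u00b7' behaves like that Python 2 byte-string literal. The sample
text is made up.

```python
# In Python 3, a raw string keeps the backslash sequence literally, which is
# how Python 2 treats '\u00b7' in a non-Unicode string literal.
byte_style = r"\u00b7"   # six characters: \ u 0 0 b 7
real_dot = "\u00b7"      # one character: the middle dot

text = "a\u00b7b"        # made-up sample containing a middle dot

# Searching for the six-character sequence finds nothing to replace.
unchanged = text.replace(byte_style, "")

# Searching for the actual character removes it.
cleaned = text.replace(real_dot, "")
```

Under that assumption, the Python 2 spelling that matches the character
would be text.replace(u'\u00b7', u''), applied to the decoded Unicode
string.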

Cheers,
Chris
-- 
http://blog.rebertia.com


UnicodeEncodeError: 'ascii' codec can't encode character u'\xb7' in position 13: ordinal not in range(128)

2009-07-15 Thread akhil1988

Hi!

Can anyone please help me get rid of this error:
UnicodeEncodeError: 'ascii' codec can't encode character u'\xb7' in position
13: ordinal not in range(128)

I am not a Python programmer (though I intend to start learning this
wonderful language); I am just using a Python script.

After some searching, I found that 0xb7 is the 'middle dot' character,
which Python fails to interpret.
Even after inserting text = text.replace('\u00b7', '') in the script, the
problem still persists.

Can anyone please tell me the easiest way to get rid of this?

--Thanks,
Akhil
-- 
View this message in context: 
http://www.nabble.com/UnicodeEncodeError%3A-%27ascii%27-codec-can%27t-encode-character-u%27%5Cxb7%27-in-position-13%3A-ordinal-not-in-range%28128%29-tp24509879p24509879.html
Sent from the Python - python-list mailing list archive at Nabble.com.
