Re: [Tutor] Exception repeated in a loop

2005-12-06 Thread Jan Eden
Kent Johnson wrote on 06.12.2005:

>The parser processes up to the error. It never recovers from the
>error. HTMLParser has an internal buffer and buffer pointer that is
>never advanced when an error is detected; each time you call feed()
>it tries to parse the remaining data and gets the same error again.
>Take a look at HTMLParser.goahead() in Lib/HTMLParser.py if you are
>interested in the details.
>
Aha! That's what I needed to know. Thanks to all who answered.
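
Given that explanation, one way to keep a bad record from affecting later ones is to give every record its own parser, so no unparsed data is carried over. The following is only a sketch in Python 3 syntax, not the code from this thread: TagCollector, parse_records and the sample data are made up for illustration. Note also that the Python 3 html.parser module is lenient and no longer has HTMLParseError, so the broad except below merely stands in for the Python 2 handler.

```python
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Hypothetical subclass that just records the start tags it sees."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def parse_records(data):
    results = {}
    for record in data:
        parser = TagCollector()  # fresh parser, hence fresh buffer, per record
        try:
            parser.feed(record["content"])
            parser.close()
        except Exception as err:  # stands in for HTMLParseError (Python 2)
            print("!!!Parsing error in", record["page_id"], ":", err)
            continue
        results[record["page_id"]] = parser.tags
    return results


# Made-up sample records, shaped like the ones in the original loop.
data = [
    {"page_id": 8832, "content": "<p>first <b>record</b></p>"},
    {"page_id": 8833, "content": "<ul><li>second</li></ul>"},
]
print(parse_records(data))  # {8832: ['p', 'b'], 8833: ['ul', 'li']}
```

Creating a parser per record costs a little more than reusing one, but it makes each iteration independent of the previous ones.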

- Jan
-- 
I'd never join any club that would have the likes of me as a member. - Groucho 
Marx
___
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] Exception repeated in a loop

2005-12-06 Thread Jan Eden
Hi,

Nelson, Scott wrote on 06.12.2005:

>An unhandled exception immediately stops the execution of your code.
>
>A handled exception (try/except) does not stop code execution (unless
>you explicitly tell it to).
>
>This shows how a handled exception does not stop code execution:
>
>try:
>   raise Exception
>except:
>   print 'caught exception'
>print 'fell through'

This is exactly what I need Python to do: Raise the exception for a certain 
record and go on with the following records.

I just do not see why the same exception is raised over and over again - 
obviously because of the same malformed HTML tag.

Adding a break statement causes Python to skip all following records, which is 
not what I need.

Thanks,

Jan
-- 
There are two major products that come out of Berkeley: LSD and UNIX. We don't 
believe this to be a coincidence. - Jeremy S. Anderson


Re: [Tutor] Exception repeated in a loop

2005-12-06 Thread Kent Johnson
Jan Eden wrote:

>Hi,
>
>I use the following loop to parse some HTML code:
>
>for record in data:
>    try:
>        parser.feed(record['content'])
>    except HTMLParseError, (msg):
>        print "!!!Parsing error in", record['page_id'], ": ", msg
>
>Now after HTMLParser encounters a parse error in one record, it keeps 
>executing the except clause for all following records - why is that?
>
>!!!Parsing error in 8832 :  bad end tag: '', at line 56568, column 1647999
>!!!Parsing error in 8833 :  bad end tag: '', at line 56568, column 1651394
>!!!Parsing error in 8834 :  bad end tag: '', at line 56568, column 1654789
>!!!Parsing error in 8835 :  bad end tag: '', at line 56568, column 1658184
>
The parser processes up to the error. It never recovers from the error. 
HTMLParser has an internal buffer and buffer pointer that is never 
advanced when an error is detected; each time you call feed() it tries 
to parse the remaining data and gets the same error again. Take a look 
at HTMLParser.goahead() in Lib/HTMLParser.py if you are interested in 
the details.
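
The buffer behaviour described above can be imitated with a toy class. This is a simplified sketch in Python 3 syntax, not the HTMLParser source: StickyParser, its ";"-separated tokens and count_errors are all invented for the illustration. It only shows why an unadvanced buffer makes every later feed() fail, and why discarding the buffer stops that.

```python
class StickyParser:
    """Toy stand-in for a parser that keeps unparsed input in a buffer."""

    def __init__(self):
        self.reset()

    def reset(self):
        # Throw away everything that has not been consumed yet.
        self.buffer = ""

    def feed(self, data):
        self.buffer += data
        # Consume ";"-terminated tokens; stop dead at the first bad one.
        while self.buffer:
            token, _, rest = self.buffer.partition(";")
            if token == "BAD":
                # The buffer is not advanced, so "BAD" stays at the front.
                raise ValueError("bad token")
            self.buffer = rest


def count_errors(records, reset_on_error):
    parser = StickyParser()
    errors = 0
    for record in records:
        try:
            parser.feed(record)
        except ValueError:
            errors += 1
            if reset_on_error:
                parser.reset()
    return errors


records = ["a;b;", "BAD;c;", "d;", "e;"]
print(count_errors(records, reset_on_error=False))  # 3: every later feed fails
print(count_errors(records, reset_on_error=True))   # 1: only the bad record
```

The real HTMLParser also documents a reset() method that discards unprocessed data. Whether calling it in the except branch is enough for a given data set is something to check rather than assume.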

IIRC HTMLParser is not noted for handling badly formed HTML. Beautiful 
Soup, ElementTidy, or HTML Scraper might be a better choice depending on 
what you are trying to do.

Kent



Re: [Tutor] Exception repeated in a loop

2005-12-06 Thread Nelson, Scott
An unhandled exception immediately stops the execution of your code.

A handled exception (try/except) does not stop code execution (unless
you explicitly tell it to).

This shows how a handled exception does not stop code execution:

try:
    raise Exception
except:
    print 'caught exception'
print 'fell through'
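
The same point inside a loop, as a small sketch in Python 3 syntax (process and the sample records are hypothetical). It assumes each iteration is independent, that is, nothing is carried over from a failed record to the next one. Under that assumption a handled exception affects only the record that raised it.

```python
def process(record):
    # Hypothetical worker: fails only for the record marked "bad".
    if record == "bad":
        raise ValueError("cannot process %r" % record)
    return record.upper()


done = []
for record in ["a", "bad", "c"]:
    try:
        done.append(process(record))
    except ValueError as err:
        print("caught exception:", err)
print("fell through:", done)  # fell through: ['A', 'C']
```

If an object that is reused across iterations keeps state from the failed call, this assumption no longer holds and later iterations can fail too.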


Hope this helps...

-Scott



Re: [Tutor] Exception repeated in a loop

2005-12-06 Thread Jan Eden
Hi Pawel,

Pawel Kraszewski wrote on 06.12.2005:

>On Tuesday, 6 December 2005 16:29, Jan Eden wrote:
>> Hi,
>>
>> I use the following loop to parse some HTML code:
>>
>> for record in data:
>>     try:
>>         parser.feed(record['content'])
>>     except HTMLParseError, (msg):
>>         print "!!!Parsing error in", record['page_id'], ": ", msg
>>
>> Now after HTMLParser encounters a parse error in one record, it keeps
>> executing the except clause for all following records - why is that?
>
>Short answer: because you told Python to do so...
>
>Long answer:
>
>My hint for students having such problems is to execute their code with a 
>pencil on a hardcopy. They read aloud what the program currently does  - 
>usually they spot the error during the first "reading".
>
>Your code being "read aloud":
>
>1. begin loop
>2.  attempt to execute parser.feed
>3.   abort attempt if it fails, showing the error
>4. take next loop
>
>So - you take the next iteration whether or not the attempt failed. There are
>two ways out of here. I wrote them "aloud", to transcribe into Python as an
>exercise:
>
>(Notice the difference between this and your original)
>
>I)
>
>1. attempt to 
>2.  begin loop
>3.   abort attempt if it fails, showing the error
>4.  take next loop
>
>II)
>1. begin loop
>2.  attempt to execute parser.feed
>3.   abort attempt if it fails, showing the error AND breaking the loop
>4. take next loop
>
Thanks, I tested your suggestion, which works fine. But I don't understand the 
problem with my original code.

If the parser raises an exception for a certain record, it should print the 
error message and move on to the next record in the loop. Why would I need the 
break statement? What's more - if the break statement is executed, all 
following records will never be parsed.

I still don't understand why failure of a single record affects the other 
records.

Thanks,

Jan
-- 
There's no place like ~/


Re: [Tutor] Exception repeated in a loop

2005-12-06 Thread Pawel Kraszewski
On Tuesday, 6 December 2005 16:29, Jan Eden wrote:
> Hi,
>
> I use the following loop to parse some HTML code:
>
> for record in data:
>     try:
>         parser.feed(record['content'])
>     except HTMLParseError, (msg):
>         print "!!!Parsing error in", record['page_id'], ": ", msg
>
> Now after HTMLParser encounters a parse error in one record, it keeps
> executing the except clause for all following records - why is that?

Short answer: because you told Python to do so...

Long answer:

My hint for students having such problems is to execute their code with a 
pencil on a hardcopy. They read aloud what the program currently does  - 
usually they spot the error during the first "reading".

Your code being "read aloud":

1. begin loop
2.  attempt to execute parser.feed
3.   abort attempt if it fails, showing the error
4. take next loop

So - you take the next iteration whether or not the attempt failed. There are
two ways out of here. I wrote them "aloud", to transcribe into Python as an
exercise:

(Notice the difference between this and your original)

I)

1. attempt to 
2.  begin loop
3.   abort attempt if it fails, showing the error
4.  take next loop

II)
1. begin loop
2.  attempt to execute parser.feed
3.   abort attempt if it fails, showing the error AND breaking the loop
4. take next loop
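
One possible transcription of the two variants, in Python 3 syntax. This is a sketch only: feed is a hypothetical stand-in for parser.feed() that fails on one made-up record, and ValueError stands in for HTMLParseError.

```python
def feed(record):
    # Hypothetical stand-in for parser.feed(); fails on the record "bad".
    if record == "bad":
        raise ValueError("bad end tag")
    return record


def variant_one(records):
    """I) the attempt wraps the whole loop."""
    parsed = []
    try:
        for record in records:
            parsed.append(feed(record))
    except ValueError as err:
        print("error:", err)
    return parsed


def variant_two(records):
    """II) the attempt is inside the loop and breaks out on failure."""
    parsed = []
    for record in records:
        try:
            parsed.append(feed(record))
        except ValueError as err:
            print("error:", err)
            break
    return parsed


records = ["a", "bad", "c"]
print(variant_one(records))  # ['a']
print(variant_two(records))  # ['a']
```

Both variants stop at the first failure, so the records after it are never parsed.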

Hope this helps,
-- 
 Pawel Kraszewski


[Tutor] Exception repeated in a loop

2005-12-06 Thread Jan Eden
Hi,

I use the following loop to parse some HTML code:

for record in data:
    try:
        parser.feed(record['content'])
    except HTMLParseError, (msg):
        print "!!!Parsing error in", record['page_id'], ": ", msg

Now after HTMLParser encounters a parse error in one record, it keeps 
executing the except clause for all following records - why is that?

!!!Parsing error in 8832 :  bad end tag: '', at line 56568, column 1647999
!!!Parsing error in 8833 :  bad end tag: '', at line 56568, column 1651394
!!!Parsing error in 8834 :  bad end tag: '', at line 56568, column 1654789
!!!Parsing error in 8835 :  bad end tag: '', at line 56568, column 1658184

Thanks.

Jan
-- 
Hanlon's Razor: Never attribute to malice that which can be adequately 
explained by stupidity.