Re: [Tutor] Exception repeated in a loop
Kent Johnson wrote on 06.12.2005:

>The parser processes up to the error. It never recovers from the
>error. HTMLParser has an internal buffer and buffer pointer that is
>never advanced when an error is detected; each time you call feed()
>it tries to parse the remaining data and gets the same error again.
>Take a look at HTMLParser.goahead() in Lib/HTMLParser.py if you are
>interested in the details.
>

Aha! That's what I needed to know.

Thanks to all who answered.

- Jan
--
I'd never join any club that would have the likes of me as a member. - Groucho Marx

___
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor
Re: [Tutor] Exception repeated in a loop
Hi,

Nelson, Scott wrote on 06.12.2005:

>An unhandled exception immediately stops the execution of your code.
>
>A handled exception (try/except) does not stop code execution (unless
>you explicitly tell it to).
>
>This shows how a handled exception does not stop code execution:
>
>try:
>    raise Exception
>except:
>    print 'caught exception'
>print 'fell through'

This is exactly what I need Python to do: raise the exception for a certain record and go on with the following records.

I just do not see why the same exception is raised over and over again - obviously because of the same malformed HTML tag.

Adding a break statement causes Python to skip all following records, which is not what I need.

Thanks,

Jan
--
There are two major products that come out of Berkeley: LSD and UNIX. We don't believe this to be a coincidence. - Jeremy S. Anderson
Re: [Tutor] Exception repeated in a loop
Jan Eden wrote:
>Hi,
>
>I use the following loop to parse some HTML code:
>
>for record in data:
>    try:
>        parser.feed(record['content'])
>    except HTMLParseError, (msg):
>        print "!!!Parsing error in", record['page_id'], ": ", msg
>
>Now after HTMLParser encounters a parse error in one record, it keeps
>executing the except clause for all following records - why is that?
>
>!!!Parsing error in 8832 : bad end tag: '', at line 56568, column 1647999
>!!!Parsing error in 8833 : bad end tag: '', at line 56568, column 1651394
>!!!Parsing error in 8834 : bad end tag: '', at line 56568, column 1654789
>!!!Parsing error in 8835 : bad end tag: '', at line 56568, column 1658184
>

The parser processes up to the error. It never recovers from the error. HTMLParser has an internal buffer and buffer pointer that is never advanced when an error is detected; each time you call feed() it tries to parse the remaining data and gets the same error again. Take a look at HTMLParser.goahead() in Lib/HTMLParser.py if you are interested in the details.

IIRC HTMLParser is not noted for handling badly formed HTML. Beautiful Soup, ElementTidy, or HTML Scraper might be a better choice depending on what you are trying to do.

Kent
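The behaviour Kent describes can be reproduced with a small stand-in class. This is a hypothetical toy model in Python 3, not the real HTMLParser: ToyParser, ToyParseError, the "<bad>" marker and the sample records are invented for illustration. It mimics only the two properties Kent mentions: feed() appends to an internal buffer, and the buffer position is not advanced past an error. Calling reset() in the except clause throws the leftover data away, so later records start from a clean buffer.

```python
class ToyParseError(Exception):
    """Hypothetical stand-in for HTMLParseError."""


class ToyParser:
    """Hypothetical parser mimicking HTMLParser's buffering: feed() appends
    to an internal buffer and parsing resumes from a saved position."""

    def __init__(self):
        self.reset()

    def reset(self):
        # Discard all unprocessed data and start over
        self.rawdata = ""
        self.pos = 0

    def feed(self, data):
        self.rawdata += data
        # Consume one character at a time; "<bad>" stands for a malformed tag
        while self.pos < len(self.rawdata):
            if self.rawdata.startswith("<bad>", self.pos):
                # The position is NOT advanced, so the bad tag stays in the buffer
                raise ToyParseError("bad tag at offset %d" % self.pos)
            self.pos += 1


def parse_all(records, reset_on_error):
    """Feed every record to ONE shared parser, like Jan's loop does."""
    parser = ToyParser()
    failed = []
    for record in records:
        try:
            parser.feed(record["content"])
        except ToyParseError:
            failed.append(record["page_id"])
            if reset_on_error:
                parser.reset()  # throw away the unparseable leftovers
    return failed


data = [
    {"page_id": 1, "content": "<p>ok</p>"},
    {"page_id": 2, "content": "<p><bad>"},
    {"page_id": 3, "content": "<p>fine</p>"},
    {"page_id": 4, "content": "<p>also fine</p>"},
]

print(parse_all(data, reset_on_error=False))  # [2, 3, 4]
print(parse_all(data, reset_on_error=True))   # [2]
```

The real HTMLParser also has a reset() method, documented as discarding all unprocessed data, so calling parser.reset() in the except clause, or creating a fresh parser for each record, should have a similar effect on Jan's loop, assuming the parser keeps no other state that must survive between records.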
Re: [Tutor] Exception repeated in a loop
An unhandled exception immediately stops the execution of your code.

A handled exception (try/except) does not stop code execution (unless you explicitly tell it to).

This shows how a handled exception does not stop code execution:

try:
    raise Exception
except:
    print 'caught exception'
print 'fell through'

Hope this helps...
-Scott

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Jan Eden
Sent: Tuesday, December 06, 2005 10:24 AM
To: Pawel Kraszewski; tutor@python.org
Subject: Re: [Tutor] Exception repeated in a loop

Hi Pawel,

Pawel Kraszewski wrote on 06.12.2005:

>On Tuesday, 6 December 2005 16:29, Jan Eden wrote:
>> Hi,
>>
>> I use the following loop to parse some HTML code:
>>
>> for record in data:
>>     try:
>>         parser.feed(record['content'])
>>     except HTMLParseError, (msg):
>>         print "!!!Parsing error in", record['page_id'], ": ", msg
>>
>> Now after HTMLParser encounters a parse error in one record, it keeps
>> executing the except clause for all following records - why is that?
>
>Short answer: because you told Python to do so...
>
>Long answer:
>
>My hint for students having such problems is to execute their code with a
>pencil on a hardcopy. They read aloud what the program currently does -
>usually they spot the error during the first "reading".
>
>Your code being "read aloud":
>
>1. begin loop
>2. attempt to execute parser.feed
>3. abort attempt if it fails, showing the error
>4. take next loop
>
>So - you take the next loop regardless of whether it failed or not. There are
>two ways out of here. I wrote them "aloud", to transcribe into Python as an
>exercise:
>
>(Notice the difference between this and your original)
>
>I)
>
>1. attempt to execute the following:
>2. begin loop
>3. abort attempt if it fails, showing the error
>4. take next loop
>
>II)
>1. begin loop
>2. attempt to execute parser.feed
>3. abort attempt if it fails, showing the error AND breaking the loop
>4. take next loop
>

Thanks, I tested your suggestion, which works fine.

But I don't understand the problem with my original code. If the parser raises an exception for a certain record, it should print the error message and move on to the next record in the loop. Why would I need the break statement? What's more - if the break statement is executed, all following records will never be parsed.

I still don't understand why failure of a single record affects the other records.

Thanks,

Jan
--
There's no place like ~/
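Scott's snippet can be extended to a loop to show that the try/except structure itself is fine. In this Python 3 sketch, process() is a hypothetical per-record function that keeps no state between calls, so a handled exception is reported once for the failing item and the loop carries on with the rest. The repeated errors in Jan's case therefore point at state shared between iterations (the single parser object), not at the try/except inside the loop.

```python
def process(item):
    # Hypothetical per-record work: fails only for negative numbers
    if item < 0:
        raise ValueError("negative value: %d" % item)
    return item * 2


results = []
errors = []
for item in [1, -2, 3, 4]:
    try:
        results.append(process(item))
    except ValueError as exc:
        # The exception is handled, so the loop simply moves on
        errors.append(str(exc))

print(results)  # [2, 6, 8]
print(errors)   # ['negative value: -2']
```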
Re: [Tutor] Exception repeated in a loop
Hi Pawel,

Pawel Kraszewski wrote on 06.12.2005:

>On Tuesday, 6 December 2005 16:29, Jan Eden wrote:
>> Hi,
>>
>> I use the following loop to parse some HTML code:
>>
>> for record in data:
>>     try:
>>         parser.feed(record['content'])
>>     except HTMLParseError, (msg):
>>         print "!!!Parsing error in", record['page_id'], ": ", msg
>>
>> Now after HTMLParser encounters a parse error in one record, it keeps
>> executing the except clause for all following records - why is that?
>
>Short answer: because you told Python to do so...
>
>Long answer:
>
>My hint for students having such problems is to execute their code with a
>pencil on a hardcopy. They read aloud what the program currently does -
>usually they spot the error during the first "reading".
>
>Your code being "read aloud":
>
>1. begin loop
>2. attempt to execute parser.feed
>3. abort attempt if it fails, showing the error
>4. take next loop
>
>So - you take the next loop regardless of whether it failed or not. There are
>two ways out of here. I wrote them "aloud", to transcribe into Python as an
>exercise:
>
>(Notice the difference between this and your original)
>
>I)
>
>1. attempt to execute the following:
>2. begin loop
>3. abort attempt if it fails, showing the error
>4. take next loop
>
>II)
>1. begin loop
>2. attempt to execute parser.feed
>3. abort attempt if it fails, showing the error AND breaking the loop
>4. take next loop
>

Thanks, I tested your suggestion, which works fine.

But I don't understand the problem with my original code. If the parser raises an exception for a certain record, it should print the error message and move on to the next record in the loop. Why would I need the break statement? What's more - if the break statement is executed, all following records will never be parsed.

I still don't understand why failure of a single record affects the other records.

Thanks,

Jan
--
There's no place like ~/
Re: [Tutor] Exception repeated in a loop
On Tuesday, 6 December 2005 16:29, Jan Eden wrote:
> Hi,
>
> I use the following loop to parse some HTML code:
>
> for record in data:
>     try:
>         parser.feed(record['content'])
>     except HTMLParseError, (msg):
>         print "!!!Parsing error in", record['page_id'], ": ", msg
>
> Now after HTMLParser encounters a parse error in one record, it keeps
> executing the except clause for all following records - why is that?

Short answer: because you told Python to do so...

Long answer:

My hint for students having such problems is to execute their code with a pencil on a hardcopy. They read aloud what the program currently does - usually they spot the error during the first "reading".

Your code being "read aloud":

1. begin loop
2. attempt to execute parser.feed
3. abort attempt if it fails, showing the error
4. take next loop

So - you take the next loop regardless of whether it failed or not. There are two ways out of here. I wrote them "aloud", to transcribe into Python as an exercise:

(Notice the difference between this and your original)

I)

1. attempt to execute the following:
2. begin loop
3. abort attempt if it fails, showing the error
4. take next loop

II)
1. begin loop
2. attempt to execute parser.feed
3. abort attempt if it fails, showing the error AND breaking the loop
4. take next loop

Hope this helps,
--
Pawel Kraszewski
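Pawel's two alternatives can be transcribed into Python 3 roughly as follows. This is only a sketch: feed() and ParseError are hypothetical stand-ins for parser.feed() and HTMLParseError, with feed() failing on a single record. Both variants stop processing at the first failure, which is exactly what Jan objects to: the records after the bad one are never parsed.

```python
class ParseError(Exception):
    """Hypothetical stand-in for HTMLParseError."""


def feed(record):
    # Hypothetical stand-in for parser.feed(): fails on one record
    if record == "bad":
        raise ParseError("bad end tag")
    return record


records = ["a", "bad", "c"]

# Option I: the try statement wraps the whole loop
parsed_one = []
try:
    for record in records:
        parsed_one.append(feed(record))
except ParseError as msg:
    print("!!!Parsing error:", msg)

# Option II: the try statement sits inside the loop; break on failure
parsed_two = []
for record in records:
    try:
        parsed_two.append(feed(record))
    except ParseError as msg:
        print("!!!Parsing error in", record, ":", msg)
        break

print(parsed_one)  # ['a']
print(parsed_two)  # ['a']
```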
[Tutor] Exception repeated in a loop
Hi,

I use the following loop to parse some HTML code:

for record in data:
    try:
        parser.feed(record['content'])
    except HTMLParseError, (msg):
        print "!!!Parsing error in", record['page_id'], ": ", msg

Now after HTMLParser encounters a parse error in one record, it keeps executing the except clause for all following records - why is that?

!!!Parsing error in 8832 : bad end tag: '', at line 56568, column 1647999
!!!Parsing error in 8833 : bad end tag: '', at line 56568, column 1651394
!!!Parsing error in 8834 : bad end tag: '', at line 56568, column 1654789
!!!Parsing error in 8835 : bad end tag: '', at line 56568, column 1658184

Thanks.

Jan
--
Hanlon's Razor: Never attribute to malice that which can be adequately explained by stupidity.