[issue34480] _markupbase.py fails with UnboundLocalError on invalid keyword in marked section

2022-04-02 Thread Marek Suscak


Marek Suscak  added the comment:

Alright, the PR is up. This is my first contribution to Python and I had to 
sign the Contributor Agreement so it may take 1 business day before you're able 
to accept it. In the meantime, can you please review and let me know if 
anything needs to change? I wasn't sure where / how to add an entry in the 
NEWS.d folder. Can you please help me? Thanks!

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue34480] _markupbase.py fails with UnboundLocalError on invalid keyword in marked section

2022-04-02 Thread Marek Suscak


Change by Marek Suscak :


--
pull_requests: +30325
pull_request: https://github.com/python/cpython/pull/32256

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue34480] _markupbase.py fails with UnboundLocalError on invalid keyword in marked section

2022-04-02 Thread Łukasz Langa

Łukasz Langa  added the comment:

Marek, I can merge a fix for 3.9 for this. I don't think we should be removing 
_markupbase.ParserBase.error() in 3.9 as GH-8562 is doing. So we'd need a new 
patch, like GH-17643. However, that will need a test as well.

Would you be interested in creating the patch?

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue34480] _markupbase.py fails with UnboundLocalError on invalid keyword in marked section

2022-03-27 Thread Irit Katriel


Change by Irit Katriel :


--
nosy: +lukasz.langa

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue34480] _markupbase.py fails with UnboundLocalError on invalid keyword in marked section

2022-03-27 Thread Marek Suscak


Marek Suscak  added the comment:

Based on the changelog, it seems that the fix for this issue has only landed in 
Python 3.10 but it's still affecting versions 3.7, 3.8 and 3.9. There are a few 
projects such as Apache Beam that have only recently added support for Python 
3.9 with no support for 3.10. Beautiful Soup is a popular tool that's affected 
by this bug and so my question is if backporting this bugfix to at least Python 
3.9 which is still an officially supported release is on the table?

--
message_count: 11.0 -> 12.0
nosy: +mareksuscak
nosy_count: 8.0 -> 9.0
pull_requests: +30221
pull_request: https://github.com/python/cpython/pull/8562

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue34480] _markupbase.py fails with UnboundLocalError on invalid keyword in marked section

2020-06-14 Thread Dong-hee Na


Change by Dong-hee Na :


--
nosy: +corona10

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue34480] _markupbase.py fails with UnboundLocalError on invalid keyword in marked section

2020-06-12 Thread Leonard Richardson


Leonard Richardson  added the comment:

I'm the maintainer of Beautiful Soup. I learned about this issue when one of my 
users filed a bug for it against Beautiful Soup 
(https://bugs.launchpad.net/beautifulsoup/+bug/1883264).

BeautifulSoupHTMLParser (my subclass of html.parser.HTMLParser) implements 
error() only to prevent the NotImplementedError raised by ParserBase.error(). 
The docstring for my error() implementation says: "this method is called only 
on very strange markup and our best strategy is to pretend it didn't happen and 
keep going."

It shouldn't affect Beautiful Soup if, in a future version of Python, 
HTMLParser.error is never called and ParserBase is removed.

--
nosy: +leonardr

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue34480] _markupbase.py fails with UnboundLocalError on invalid keyword in marked section

2020-04-17 Thread Pikamander2


Pikamander2  added the comment:

I updated to bs4 version 4.9.0 and the same issue occurs.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue34480] _markupbase.py fails with UnboundLocalError on invalid keyword in marked section

2020-04-17 Thread Pikamander2


Pikamander2  added the comment:

For reference, my Python version is 3.8.2 and my bs4 version is 4.8.1

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue34480] _markupbase.py fails with UnboundLocalError on invalid keyword in marked section

2020-04-17 Thread Pikamander2

Pikamander2  added the comment:

Here's a simplified real world example that I found if it helps anyone:

import bs4

html = ''' '''

soup = bs4.BeautifulSoup(html, 'html.parser')

--
nosy: +Pikamander2
Added file: https://bugs.python.org/file49070/weird-bug-3.py

___
Python tracker 
<https://bugs.python.org/issue34480>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com


[issue34480] _markupbase.py fails with UnboundLocalError on invalid keyword in marked section

2020-01-06 Thread Ezio Melotti


Ezio Melotti  added the comment:

I think so.
It might be worth double-checking if BeautifulSoup (and possibly other 3rd 
party libs) use _markupbase.py and/or ParserBase.error().  If they do, giving 
them a heads up and/or going through a regular deprecation process might be 
good, otherwise PR 8562 can be reopened and merged after removing the call to 
self.error() in ParserBase.parse_marked_section().

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue34480] _markupbase.py fails with UnboundLocalError on invalid keyword in marked section

2020-01-05 Thread Berker Peksag


Berker Peksag  added the comment:

> _markupbase.py is internal, so it should be safe to remove
> ParserBase.error() and the code that calls it as suggested in #31844

Should I reopen https://github.com/python/cpython/pull/8562 then?

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue34480] _markupbase.py fails with UnboundLocalError on invalid keyword in marked section

2020-01-05 Thread Ezio Melotti

Ezio Melotti  added the comment:

HTMLParser is supposed to follow the HTML5 standard, and never raise an error.

For the example in the first comment (""), the steps should be:

* https://html.spec.whatwg.org/multipage/parsing.html#data-state:tag-open-state
* 
https://html.spec.whatwg.org/multipage/parsing.html#tag-open-state:markup-declaration-open-state
* 
https://html.spec.whatwg.org/multipage/parsing.html#markup-declaration-open-state:bogus-comment-state
* https://html.spec.whatwg.org/multipage/parsing.html#bogus-comment-state

I agree that the error should be fixed by setting `match` to None, and a test 
case that triggers the UnboundLocalError (before the fix) should be added as 
well (what provided by Karthikeyan looks good).

However, it also seems wrong that HTMLParser ends up calling self.error() 
through  Lib/_markupbase.py ParserBase after HTMLParser.error() and all the 
calls to it have been removed.  _markupbase.py is internal, so it should be 
safe to remove ParserBase.error() and the code that calls it as suggested in 
#31844 (and possibly to merge _markupbase into html.parser too).  Even if this 
is done and the call to self.error() is removed from 
ParserBase.parse_marked_section(), `match` still needs to be set to None 
(either in the `else` branch or before the `if/elif` block).

--

___
Python tracker 
<https://bugs.python.org/issue34480>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com


[issue34480] _markupbase.py fails with UnboundLocalError on invalid keyword in marked section

2020-01-04 Thread Josh Kamdjou


Josh Kamdjou  added the comment:

(Author of PR https://github.com/python/cpython/pull/17643)

Since the behavior of self.error() is determined by the subclass 
implementation, an Exception is not guaranteed. How should this be handled? It 
seems the options are:

- continue execution, in which case 'match' needs to be defined (I proposed 
initialization to None, which results in returning -1 on the next line)
- return a value
- raise an Exception

Happy to update the PR with @xtreak's test cases.

--
nosy: +jkamdjou

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue34480] _markupbase.py fails with UnboundLocalError on invalid keyword in marked section

2020-01-04 Thread Josh Kamdjou


Change by Josh Kamdjou :


--
keywords: +patch
pull_requests: +17250
stage: test needed -> patch review
pull_request: https://github.com/python/cpython/pull/17643

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue34480] _markupbase.py fails with UnboundLocalError on invalid keyword in marked section

2019-12-18 Thread Karthikeyan Singaravelan

Karthikeyan Singaravelan  added the comment:

Attaching a test case for this issue since the self.error code path was not 
covered in the current test suite. A PR is open proposing match variable 
initialisation with None https://github.com/python/cpython/pull/17643 .

diff --git Lib/test/test_htmlparser.py Lib/test/test_htmlparser.py
index a2bfb39d16..a9aff11706 100644
--- Lib/test/test_htmlparser.py
+++ Lib/test/test_htmlparser.py
@@ -766,6 +766,18 @@ class AttributesTestCase(TestCaseBase):
   [("href", "http://www.example.org/\";>;")]),
  ("data", "spam"), ("endtag", "a")])
 
+def test_invalid_keyword_error_exception(self):
+class InvalidMarkupException(Exception):
+pass
+
+class MyHTMLParser(html.parser.HTMLParser):
+
+def error(self, message):
+raise InvalidMarkupException(message)
+
+parser = MyHTMLParser()
+with self.assertRaises(InvalidMarkupException):
+parser.feed('')
 
 if __name__ == "__main__":
 unittest.main()

--
nosy: +xtreak

___
Python tracker 
<https://bugs.python.org/issue34480>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com


[issue34480] _markupbase.py fails with UnboundLocalError on invalid keyword in marked section

2018-09-13 Thread Ezio Melotti


Change by Ezio Melotti :


--
assignee:  -> ezio.melotti

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue34480] _markupbase.py fails with UnboundLocalError on invalid keyword in marked section

2018-08-23 Thread Berker Peksag


Change by Berker Peksag :


--
status:  -> open

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue34480] _markupbase.py fails with UnboundLocalError on invalid keyword in marked section

2018-08-23 Thread Berker Peksag


Change by Berker Peksag :


--
Removed message: https://bugs.python.org/msg323967

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue34480] _markupbase.py fails with UnboundLocalError on invalid keyword in marked section

2018-08-23 Thread Berker Peksag

Berker Peksag  added the comment:

Thanks for the report.

HTMLParser.error() was supposed to raise an exception, but the BeautifulSoup 
project just prints a warning here:

def error(self, msg):
warnings.warn(msg)

https://bazaar.launchpad.net/~leonardr/beautifulsoup/bs4/view/head:/bs4/builder/_htmlparser.py#L69

As a result of this, the code doesn't stop executing in the following branch:

else:
self.error('unknown status keyword %r in marked section' % 
rawdata[i+3:j])
if not match:
return -1

https://github.com/python/cpython/blob/3.7/Lib/_markupbase.py#L159

Note that HTMLParser.error() was removed in Python 3.5 
(https://github.com/python/cpython/commit/73a4359eb0eb624c588c5d52083ea4944f9787ea#diff-1a7486df8279dbac7f20abd487947845L171)
 and there is an open issue about the status of _markupbase.ParserBase.error(): 
Issue 31844.

I also think that 
https://github.com/python/cpython/commit/73a4359eb0eb624c588c5d52083ea4944f9787ea#diff-1a7486df8279dbac7f20abd487947845L171
 may have caused a minor regression when it was removed the error() method and 
its uses from the HTMLParser class. It still calls the parse_marked_section() 
method of _markupbase.ParserBase() which it then calls the error() method of 
_markupbase.ParserBase():

elif rawdata[i:i+3] == 'https://github.com/python/cpython/blob/3.7/Lib/html/parser.py#L264

--

___
Python tracker 
<https://bugs.python.org/issue34480>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com


[issue34480] _markupbase.py fails with UnboundLocalError on invalid keyword in marked section

2018-08-23 Thread Conrad

New submission from Conrad :

$ pip freeze | grep beautifulsoup4
beautifulsoup4==4.6.3

$ python
Python 3.7.0 (default, Jul 23 2018, 20:24:19)
[Clang 9.0.0 (clang-900.0.39.2)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from bs4 import BeautifulSoup
>>> text = ""
>>> BeautifulSoup(text, 'html.parser')
/Users/conradroche/solvvy/ml-pipeline/venv/lib/python3.7/site-packages/bs4/builder/_htmlparser.py:78:
 UserWarning: unknown status keyword 'hi ' in marked section
  warnings.warn(msg)
Traceback (most recent call last):
  File "", line 1, in 
  File 
"/Users/conradroche/solvvy/ml-pipeline/venv/lib/python3.7/site-packages/bs4/__init__.py",
 line 282, in __init__
self._feed()
  File 
"/Users/conradroche/solvvy/ml-pipeline/venv/lib/python3.7/site-packages/bs4/__init__.py",
 line 343, in _feed
self.builder.feed(self.markup)
  File 
"/Users/conradroche/solvvy/ml-pipeline/venv/lib/python3.7/site-packages/bs4/builder/_htmlparser.py",
 line 247, in feed
parser.feed(markup)
  File 
"/usr/local/Cellar/python/3.7.0/Frameworks/Python.framework/Versions/3.7/lib/python3.7/html/parser.py",
 line 111, in feed
self.goahead(0)
  File 
"/usr/local/Cellar/python/3.7.0/Frameworks/Python.framework/Versions/3.7/lib/python3.7/html/parser.py",
 line 179, in goahead
k = self.parse_html_declaration(i)
  File 
"/usr/local/Cellar/python/3.7.0/Frameworks/Python.framework/Versions/3.7/lib/python3.7/html/parser.py",
 line 264, in parse_html_declaration
return self.parse_marked_section(i)
  File 
"/usr/local/Cellar/python/3.7.0/Frameworks/Python.framework/Versions/3.7/lib/python3.7/_markupbase.py",
 line 160, in parse_marked_section
if not match:
UnboundLocalError: local variable 'match' referenced before assignment
>>>

--
components: Library (Lib)
messages: 323962
nosy: kodial
priority: normal
severity: normal
status: open
title: _markupbase.py fails with UnboundLocalError on invalid keyword in marked 
section
type: behavior
versions: Python 3.7

___
Python tracker 
<https://bugs.python.org/issue34480>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com