[issue41748] HTMLParser: comma in attribute values with/without space

2021-02-01 Thread Ezio Melotti


Ezio Melotti  added the comment:

Merged! Thanks for the report and the PR!

--
assignee:  -> ezio.melotti
resolution:  -> fixed
stage: patch review -> resolved
status: open -> closed

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue41748] HTMLParser: comma in attribute values with/without space

2021-02-01 Thread miss-islington


miss-islington  added the comment:


New changeset 0874491bcc392f7bd9c394ec2fdab183e3f320dd by Miss Islington (bot) 
in branch '3.9':
bpo-41748: Handles unquoted attributes with commas (GH-24072)
https://github.com/python/cpython/commit/0874491bcc392f7bd9c394ec2fdab183e3f320dd


--




[issue41748] HTMLParser: comma in attribute values with/without space

2021-02-01 Thread miss-islington


miss-islington  added the comment:


New changeset 0869a713f21f4b2fe021d802cf18f1b1af53695f by Miss Islington (bot) 
in branch '3.8':
bpo-41748: Handles unquoted attributes with commas (GH-24072)
https://github.com/python/cpython/commit/0869a713f21f4b2fe021d802cf18f1b1af53695f


--




[issue41748] HTMLParser: comma in attribute values with/without space

2021-02-01 Thread STINNER Victor


STINNER Victor  added the comment:

Oh, thank you Karl Dubost for the fix!

--




[issue41748] HTMLParser: comma in attribute values with/without space

2021-02-01 Thread miss-islington


Change by miss-islington :


--
pull_requests: +23231
pull_request: https://github.com/python/cpython/pull/24416




[issue41748] HTMLParser: comma in attribute values with/without space

2021-02-01 Thread miss-islington


Change by miss-islington :


--
nosy: +miss-islington
nosy_count: 4.0 -> 5.0
pull_requests: +23230
pull_request: https://github.com/python/cpython/pull/24415




[issue41748] HTMLParser: comma in attribute values with/without space

2021-02-01 Thread Ezio Melotti


Ezio Melotti  added the comment:


New changeset 9eb11a139fac5514d8456626806a68b3e3b7eafb by Karl Dubost in branch 
'master':
bpo-41748: Handles unquoted attributes with commas (#24072)
https://github.com/python/cpython/commit/9eb11a139fac5514d8456626806a68b3e3b7eafb


--




[issue41748] HTMLParser: comma in attribute values with/without space

2021-01-10 Thread karl


karl  added the comment:

Status: The PR should now be ready and complete:
https://github.com/python/cpython/pull/24072
Hopefully it can be merged at some point.
Thanks to ezio.melotti for the wonderful guidance.

--




[issue41748] HTMLParser: comma in attribute values with/without space

2021-01-03 Thread karl


Change by karl :


--
keywords: +patch
pull_requests: +22904
stage: test needed -> patch review
pull_request: https://github.com/python/cpython/pull/24072




[issue41748] HTMLParser: comma in attribute values with/without space

2021-01-03 Thread Ezio Melotti


Ezio Melotti  added the comment:

Writing tests that verify the expected behavior is a great first step. The 
expected output in the tests should match the behavior described by the HTML 5 
specs (which should also correspond to the browsers' behavior), and should 
initially fail. You can start creating a PR with only the tests, clarifying 
that it's a work in progress, or wait until you have the fix too.

The next step would be tweaking the regex and the code until both the new tests 
and all the other ones pass (excluding the one with the commas you are fixing). 
You can then commit the fix in the same branch and push it; GitHub will 
automatically update the PR.
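
As a rough illustration of the "tests first" step, here is a minimal sketch of 
such a test. The EventCollector class, the collect_events() helper and the 
sample markup are assumptions made up for this example, not the tests from the 
PR. The expected value assumes the HTML 5 rule that an unquoted attribute value 
runs until whitespace or '>'.

```python
import unittest
from html.parser import HTMLParser


class EventCollector(HTMLParser):
    """Hypothetical helper: record parser callbacks as comparable tuples."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("starttag", tag, attrs))

    def handle_data(self, data):
        self.events.append(("data", data))


def collect_events(source):
    """Feed the markup to a fresh parser and return the recorded events."""
    parser = EventCollector()
    parser.feed(source)
    parser.close()
    return parser.events


class CommaInAttributeTest(unittest.TestCase):
    def test_comma_in_unquoted_value(self):
        # Assumed expectation: the comma is part of the unquoted value,
        # so the whole run up to '>' is one attribute value.
        self.assertEqual(
            collect_events("<div class=bar,baz=asd>"),
            [("starttag", "div", [("class", "bar,baz=asd")])],
        )
```

Saved in a file, this can be run with "python -m unittest". On an interpreter 
without the fix the test is expected to fail, which is the starting point 
described above.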


> Do you have a suggestion to fix it?

If you are familiar enough with regexes, you could try to figure out whether it 
matches the invalid attributes or not, and if not why (I took a quick look and 
I didn't see anything immediately wrong in the regexes).

Since the output of the failing test is [('data', '')], 
it's likely that the parser doesn't know how to handle it and passes it to one 
of the handle_data() calls in the goahead() method.  You can figure out which 
one is being called and see which if-conditions are leading the interpreter 
down this path rather than the usual path where the attributes are parsed 
correctly.
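
One possible way to find out which handle_data() call is involved is to record 
the caller's frame from inside an overridden handle_data(). This is only a 
debugging sketch; the TracingParser class and its calls list are hypothetical 
names introduced for this example.

```python
import traceback
from html.parser import HTMLParser


class TracingParser(HTMLParser):
    """Hypothetical helper: remember who called handle_data() and from where."""

    def __init__(self):
        super().__init__()
        self.calls = []

    def handle_data(self, data):
        # extract_stack(limit=2) returns the two innermost frames, oldest
        # first: [caller, handle_data]. The first entry is the call site.
        caller = traceback.extract_stack(limit=2)[0]
        self.calls.append((data, caller.name, caller.lineno))


parser = TracingParser()
parser.feed("plain text")
parser.close()
for data, function, lineno in parser.calls:
    print(f"{data!r} came from {function}() at line {lineno}")
```

The reported line number refers to Lib/html/parser.py in the running 
interpreter, so it can be looked up directly to see which branch was taken. 
Feeding the failing markup instead of "plain text" shows the path it follows.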

If you have other questions let me know :)

--
type: crash -> behavior




[issue41748] HTMLParser: comma in attribute values with/without space

2021-01-03 Thread karl


karl  added the comment:

Ah!

This fixes it:

diff --git a/Lib/html/parser.py b/Lib/html/parser.py
index 6083077981..790666 100644
--- a/Lib/html/parser.py
+++ b/Lib/html/parser.py
@@ -44,7 +44,7 @@
       (?:\s*=+\s*                    # value indicator
         (?:'[^']*'                   # LITA-enclosed value
           |"[^"]*"                   # LIT-enclosed value
-          |(?!['"])[^>\s]*           # bare value
+          |(?!['"])[^>]*             # bare value
          )
          (?:\s*,)*                   # possibly followed by a comma
        )?(?:\s|/(?!>))*




Ran 48 tests in 0.175s

OK

== Tests result: SUCCESS ==
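
To see what the one-character change in the diff above does, the "bare value" 
alternative can be tried in isolation. This is only a sketch: it takes that 
single alternative out of the larger pattern, and the sample string and 
variable names are assumptions made up for the example.

```python
import re

# The "bare value" alternative before and after the change in the diff.
old_bare = re.compile(r"""(?!['"])[^>\s]*""")
new_bare = re.compile(r"""(?!['"])[^>]*""")

# Hypothetical remainder of a start tag, beginning right after "class=".
sample = "bar, baz=asd>"

old_match = old_bare.match(sample).group()
new_match = new_bare.match(sample).group()

print(repr(old_match))  # stops at the first whitespace: 'bar,'
print(repr(new_match))  # runs up to the closing '>': 'bar, baz=asd'
```

The looser character class also consumes whitespace, so in isolation the bare 
value extends to the end of the tag. Whether that matters depends on how the 
surrounding pattern is used, which this sketch does not cover.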

--




[issue41748] HTMLParser: comma in attribute values with/without space

2021-01-03 Thread karl


Change by karl :


--
title: HTMLParser: parsing error -> HTMLParser: comma in attribute values 
with/without space
