On 01/08/2022 13:17, Daniel Lee wrote:
Hello, my code with tkinter was working before, and now, it has many errors
in it. I’m not sure what has happened. The results after running are below:
"D:\Python Projects\tes\venv\Scripts\python.exe" "D:/Python
Projects/tes/main.py"
Traceback (most
Matthew Barnett added the comment:
For reference, I also implemented .regs in the regex module for compatibility,
but I've never used it myself. I had to do some investigating to find out what
it did!
It returns a tuple of the spans of the groups.
Perhaps I might have used it if it didn't
Matthew Barnett added the comment:
I don't think it's a typo, and you could argue the case for "qualifiers", but I
still agree with the proposal as it's a more meaningful term in the context.
--
___
Python tracker
<https://bu
Matthew Barnett added the comment:
I'd just like to point out that to a user it could _look_ like a bug, that an
error occurred while reporting, because the traceback isn't giving a 'clean'
report; the stuff about the KeyError is an internal detail
Matthew Barnett added the comment:
The expression is a repeated alternative where the first alternative is a
repeat. Repeated repeats can result in a lot of attempts and backtracking and
should be avoided.
Try this instead:
(0|1(01*0)*1
Matthew Barnett added the comment:
That pattern has:
(?P[^]]+)+
Is that intentional? It looks wrong to me.
--
___
Python tracker
<https://bugs.python.org/issue46
Change by Matthew Barnett :
--
stage: -> resolved
status: open -> closed
___
Python tracker
<https://bugs.python.org/issue46515>
___
___
Python-bugs-list
Matthew Barnett added the comment:
They're not supported in string literals either:
Python 3.10.1 (tags/v3.10.1:2cd268a, Dec 6 2021, 19:10:37) [MSC v.1929 64 bit
(AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more informat
Matthew Barnett added the comment:
It's not just in the 'if' clause:
>>> class Foo:
...     a = ['a', 'b']
...     b = ['b', 'c']
...     c = [b for x in a]
...
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 4, in Foo
File "
Matthew Barnett added the comment:
For comparison, the regex module says that 0x1C..0x1F aren't whitespace, and
the Unicode property White_Space ("\p{White_Space}" in a pattern, where
supported) also says that they aren't
Matthew Barnett added the comment:
It's definitely a bug.
In order for the pattern to match, the negative lookaround must match, which
means that its subexpression mustn't match, so none of the groups in that
subexpression have captured.
--
versions: +Python 3.10
Matthew Barnett added the comment:
It can be shortened to this:
buffer = b"a" * 8191 + b"\\r\\n"
with open("bug_csv.csv", "wb") as f:
    f.write(buffer)
with open("bug_csv.csv", encoding="unicode_escape", newline="") as
Matthew Barnett added the comment:
I wonder whether there should be a couple of other endianness values, namely,
"native" and "network", for those cases where you want to be explicit about it.
If you use "big" it's not clear whether that's because you want
Matthew Barnett added the comment:
I'd probably say "In the face of ambiguity, refuse the temptation to guess".
As there's disagreement about the 'correct' default, make it None and require
either "big" or "little" if length > 1 (the defaul
Matthew Barnett added the comment:
It's called "catastrophic backtracking". Think of the number of ways it could
match, say, 4 characters: 4, 3+1, 2+2, 2+1+1, 1+3, 1+2+1, 1+1+2, 1+1+1+1. Now
try 5 characters...
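
The list above is every ordered way of dividing the characters among iterations of the inner repeat. As a rough sketch (the helper name `compositions` is made up for illustration and is not part of re), they can be enumerated like this:

```python
def compositions(n):
    """Yield every ordered way of writing n as a sum of positive integers."""
    if n == 0:
        yield ()
        return
    for first in range(1, n + 1):
        for rest in compositions(n - first):
            yield (first,) + rest


# Count the possible divisions for a few lengths.
for length in (4, 5, 10):
    print(length, sum(1 for _ in compositions(length)))
```

The count doubles with each extra character, which is why a failing match on a long string can take so long.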
--
___
Python track
Matthew Barnett added the comment:
I've only just realised that the test cases don't cover all eventualities: none
of them test what happens with multiple spaces _between_ the letters, such as:
' a b c '.split(maxsplit=1) == ['a', 'b c ']
Comparing that with:
' a b c '.split
Matthew Barnett added the comment:
We have that already, although it's spelled:
' x y z'.split(maxsplit=1) == ['x', 'y z']
because the keepempty option doesn't exist yet.
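
A small sketch of the difference, using only the existing str.split arguments (keepempty is a proposal and is not used here):

```python
text = " x y z"

# No separator: runs of whitespace are the delimiter and empty strings are dropped.
no_sep = text.split(maxsplit=1)

# Explicit ' ' separator: the leading space yields an empty first item.
with_sep = text.split(" ", maxsplit=1)

print(no_sep)
print(with_sep)
```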
--
___
Python tracker
<https://bugs.python.org/issue28
Matthew Barnett added the comment:
The best way to think of it is that .split() is like .split(' '), except that
it's splitting on any whitespace character instead of just ' ', and keepempty
is defaulting to False instead of True.
Therefore:
' x y z'.split(maxsplit=1, keepempty=True
Matthew Barnett added the comment:
The case:
' a b c '.split(maxsplit=1) == ['a', 'b c ']
suggests that empty strings don't count towards maxsplit, otherwise it would
return [' a b c '] (i.e. the split would give ['', ' a b c '] and dropping
the empty strings would give [' a b c
Matthew Barnett added the comment:
Do any other regex implementations behave the way you want?
In my experience, there's no single "correct" way for a regex to behave;
different implementations might give slightly different results, so if the most
common ones behave a c
Matthew Barnett added the comment:
I'm also -1, for the same reason as Serhiy gave. However, if it was opt-in,
then I'd be OK with it.
--
nosy: +mrabarnett
___
Python tracker
<https://bugs.python.org/issue43
Matthew Barnett added the comment:
Sorry to bikeshed, but I think it would be clearer to keep the version next to
the "python" and the "setup" at the end:
python-3.10.0a5-win32-setup.exe
python-3.10.0a5-win64-setup.exe
Matthew Barnett added the comment:
Example 1:
((a)|b\2)*
 ^^^         Group 2
((a)|b\2)*
      ^^     Reference to group 2
The reference refers backwards to the group.
Example 2:
(b\2|(a))*
     ^^^     Group 2
(b\2|(a))*
  ^^         Reference to group 2
Matthew Barnett added the comment:
It's not a crash. It's complaining that you're referring to group 2 before
defining it. The re module doesn't support forward references to groups, but
only backward references to them.
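
A quick way to see the difference, as a sketch (the helper `compiles` is just for illustration):

```python
import re


def compiles(pattern):
    """Return True if re accepts the pattern, False if it raises re.error."""
    try:
        re.compile(pattern)
    except re.error:
        return False
    return True


backward = r"((a)|b\2)*"  # \2 comes after group 2 has been closed
forward = r"(b\2|(a))*"   # \2 comes before group 2 has been defined

print(compiles(backward))
print(compiles(forward))
```

The second pattern is rejected when it's compiled, before any matching is attempted.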
--
___
Python tracker
Matthew Barnett added the comment:
In a regex, putting a backslash before any character that's not an ASCII-range
letter or digit makes it a literal. re.escape doesn't special-case control
characters. Its purpose is to make a string that might contain metacharacters
into one that's
Matthew Barnett added the comment:
That behaviour has nothing to do with re.
This line:
samples = filter(lambda sample: not pttn.match(sample), data)
creates a generator that, when evaluated, will use the value of 'pttn' _at that
time_.
However, you then bind 'pttn' to something else
Matthew Barnett added the comment:
Not a bug.
Argument 4 of re.sub is the count:
sub(pattern, repl, string, count=0, flags=0)
not the flags.
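
A short sketch of the difference. The sample text is arbitrary, and the flag's integer value is passed as count by keyword to make the effect explicit:

```python
import re

text = "All is fair in love and war"

# Flags passed by keyword: every vowel is replaced, whatever its case.
with_flags = re.sub(r"[aeiou]", "#", text, flags=re.IGNORECASE)

# The flag's integer value used as the count: matching stays case-sensitive
# and stops after that many replacements.
as_count = re.sub(r"[aeiou]", "#", text, count=int(re.IGNORECASE))

print(with_flags)
print(as_count)
```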
--
nosy: +mrabarnett
resolution: -> not a bug
stage: -> resolved
status: open -> closed
_
Matthew Barnett added the comment:
Arguments are evaluated first and then the results are passed to the function.
That's true throughout the language.
In this instance, you can use \g<1> in the replacement string to refer to group
1:
re.sub(r'([a-z]+)', fr"\g<1>{REPLACEM
Matthew Barnett added the comment:
The arguments are: re.sub(pattern, repl, string, count=0, flags=0).
Therefore:
re.sub("pattern","replace", txt, re.IGNORECASE | re.DOTALL)
is passing re.IGNORECASE | re.DOTALL as the count, not the flags.
It's in the documentation
Matthew Barnett added the comment:
The 4th argument of re.sub is 'count', not 'flags'.
re.IGNORECASE has the numeric value of 2, so:
re.sub(r'[aeiou]', '#', 'all is fair in love and war', re.IGNORECASE)
is equivalent to:
re.sub(r'[aeiou]', '#', 'all is fair in love and war', count
Matthew Barnett added the comment:
I think what's happening is that in 'compiler_dict' (Python/compile.c), it's
checking whether 'elements' has reached a maximum (0x). However, it's not
doing this after incrementing; instead, it's checking before incrementing and
resetting 'elements
Matthew Barnett added the comment:
That's what searching does!
Does the pattern match here? If not, advance by one character and try again.
Repeat until a match is found or you've reached the end.
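
As a rough sketch of that loop (the function `naive_search` is illustrative only, and the real implementation is optimised rather than written like this):

```python
import re


def naive_search(pattern, string):
    """Try a match at each position in turn until one succeeds."""
    for pos in range(len(string) + 1):
        m = pattern.match(string, pos)
        if m is not None:
            return m
    return None


digits = re.compile(r"\d+")
text = "abc123def"
found = naive_search(digits, text)
print(found.span(), digits.search(text).span())
```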
--
___
Python tracker
<https://bugs.python.
Matthew Barnett added the comment:
The documentation is talking about whether it'll match at the current position
in the string. It's not a bug.
--
resolution: -> not a bug
___
Python tracker
<https://bugs.python.org/issu
Matthew Barnett added the comment:
Duplicate of Issue39687.
See https://docs.python.org/3/library/re.html#re.sub and
https://docs.python.org/3/whatsnew/3.7.html#changes-in-the-python-api.
--
resolution: -> duplicate
stage: -> resolved
status: open ->
Matthew Barnett added the comment:
A smaller change to the regex would be to replace the "(?:.*,)*" with
"(?:[^,]*,)*".
I'd also suggest using a raw string instead:
rx = re.compile(r'''(?:[^,]*,)*[ \t]*([^ \t]+)[ \t]+realm=(["']?)([^"']*)\2''',
re.I)
Change by Matthew Barnett :
--
stage: -> resolved
status: open -> closed
___
Python tracker
<https://bugs.python.org/issue39436>
___
Matthew Barnett added the comment:
Python floats have 53 bits of precision, so ints larger than 2**53 will lose
their lower bits (assumed to be 0) when converted.
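
For example, a minimal check around the 2**53 boundary:

```python
big = 2**53

# 2**53 + 1 has no exact float representation, so it converts to the same float.
collapsed = float(big) == float(big + 1)

# Converting back shows the lowest bit has gone.
round_trip = int(float(big + 1))

# 2**53 + 2 is still exactly representable.
exact = int(float(big + 2)) == big + 2

print(collapsed, round_trip - big, exact)
```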
--
nosy: +mrabarnett
resolution: -> not a bug
___
Python tracker
<
Matthew Barnett added the comment:
I've just tried it on Windows 10 with Python 3.8 64-bit and Python 3.8 32-bit
without issue.
--
nosy: +mrabarnett
___
Python tracker
<https://bugs.python.org/issue38
Matthew Barnett added the comment:
I could also add: would sorting be case-sensitive or case-insensitive? Windows
is case-insensitive, Linux is case-sensitive.
--
nosy: +mrabarnett
___
Python tracker
<https://bugs.python.org/issue38
Matthew Barnett added the comment:
It's been many years since I looked at the code, and there have been changes
since then, so some of the details might not be correct.
As to how it should behave:
re.match('(?:()|(?(1)()|z)){1,2}(?(2)a|z)', 'a')
Iteration 1.
Match the repeated part. Group
Matthew Barnett added the comment:
Suppose you had a pattern:
.*
It would advance one character on each iteration of the * until the . failed to
match. The text is finite, so it would stop matching eventually.
Now suppose you had a pattern:
(?:)*
On each iteration
Matthew Barnett added the comment:
If we did decide to remove it, but there was still a demand for octal escapes,
then I'd suggest introducing \oXXX.
--
___
Python tracker
<https://bugs.python.org/issue38
Matthew Barnett added the comment:
A numeric escape of 3 digits is an octal (base 8) escape; the octal escape
"\100" gives the same character as the hexadecimal escape "\x40".
In a replacement template, you can use "\g<100>" if you want group 100 b
Matthew Barnett added the comment:
You wrote "the u had already been removed by hand". By removing the u in the
_Python 2_ code, you changed that string from a Unicode string to a bytestring.
In a bytestring, \u is not an escape; b"\u" == b"\\u".
Matthew Barnett added the comment:
I've just had a look at _uniq, and the code surprises me.
The obvious way to detect duplicates is with a set, but that requires the items
to be hashable. Are they?
Well, the first line of the function uses 'set', so they are.
Why, then, isn't it using
Matthew Barnett added the comment:
For historical reasons, if it isn't valid as a repeat then it's a literal. This
is true in other regex implementations, and is by no means unique to the re
module.
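
A small illustration with the re module (the patterns are made up for the example):

```python
import re

# A well-formed repeat: 'a' exactly twice.
valid_repeat = re.fullmatch(r"a{2}", "aa") is not None

# Not well-formed as a repeat, so the braces are matched literally.
literal_braces = re.fullmatch(r"a{x}", "a{x}") is not None
literal_open = re.fullmatch(r"a{", "a{") is not None

print(valid_repeat, literal_braces, literal_open)
```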
--
resolution: -> not a bug
stage: -> resolved
status: open ->
Matthew Barnett added the comment:
The problem is the "(?:[^<]+|<(?!/head>))*?".
If I simplify it a little I get "(?:[^<]+)*?", which is a repeat within a
repeat.
There are many ways in which it could match, and if what follows fails to match
(it doesn't because there's no "
Matthew Barnett added the comment:
I've just come across the same problem.
For future reference, adding the following code before using a Treeview widget
will fix the problem:
def fixed_map(option):
    # Fix for setting text colour for Tkinter 8.6.9
    # From: https://core.tcl.tk/tk/info
Matthew Barnett added the comment:
That should be:
def __repr__(self):
    return repr(self.name)
Not a bug.
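
The original report isn't shown here, so the following is only a guess at the kind of code involved. The classes `Item` and `BrokenItem` are hypothetical. The general rule is that __repr__ has to return a str:

```python
class Item:
    """Hypothetical class, invented only to illustrate the point."""

    def __init__(self, name):
        self.name = name

    def __repr__(self):
        # repr() always gives a str, whatever type self.name is.
        return repr(self.name)


class BrokenItem(Item):
    def __repr__(self):
        # Returns self.name as-is, which is not a str if name is, say, an int.
        return self.name


print(repr(Item(42)))
try:
    repr(BrokenItem(42))
except TypeError as exc:
    print("TypeError:", exc)
```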
--
resolution: -> not a bug
stage: -> resolved
status: open -> closed
___
Python tracker
<https://bug
Matthew Barnett added the comment:
Consider re.findall(r'.{0,2}', 'abcde').
It finds 'ab', then continues where it left off to find 'cd', then 'e'.
It can also find ''; re.match(r'.*', '') does match, after all.
It could, in fact, find an infinite number of ''.
And what about re.match(r
Matthew Barnett added the comment:
It's now consistent with Perl, PCRE and .Net (C#), as well as re.split(),
re.sub(), re.findall() and re.finditer().
--
___
Python tracker
<https://bugs.python.org/issue32
Matthew Barnett added the comment:
The list alternates between substrings (s, between the splits) and captures (c):
['1', '1', '2', '2', '11']
-s- -c- -s- -c- -s--
You can use slicing to extract the substrings:
>>> re.split(r'(?<=(\d))(?!\1)(?=\d)', '12111')[ : : 2]
['1
Matthew Barnett added the comment:
From the docs:
"""If capturing parentheses are used in pattern, then the text of all groups in
the pattern are also returned as part of the resulting list."""
The pattern does contain a capture, so that's why the
Matthew Barnett added the comment:
You could italicise the "protocol" part using asterisks, like this:
*protocol*_request
or this:
*protocol*\ _request
depending on the implementation of the rst software.
--
nosy: +mrabarnett
___
Pyth
Matthew Barnett added the comment:
It matches, and the span is (0, 2).
The only way that it can match like that is for the capture group to match the
'a', and the final 'b' to match the 'b'.
Therefore, re.search(r'(ab|a)*b', 'ab').groups() should be ('a', ), as it is
for the pattern
Matthew Barnett added the comment:
It looks like a bug in re to me.
--
___
Python tracker
<https://bugs.python.org/issue35859>
___
Matthew Barnett added the comment:
Look at the spans of the groups:
>>> import re
>>> re.search(r'^(?:(\d*)(\D*))*$', "42AZ").span(1)
(4, 4)
>>> re.search(r'^(?:(\d*)(\D*))*$', "42AZ").span(2)
(4, 4)
They're telling you that the groups are matc
Matthew Barnett added the comment:
@Steven: The complaint is that the BEL character ('\a') doesn't result in a
beep when printed.
@Siva: These days, you shouldn't be relying on '\a' because it's not always
supported. If you want to make a beep, do so with the appropriate function
call. Ask
Matthew Barnett added the comment:
A similar issue exists with centring:
>>> format(42, '^020')
'0420'
--
nosy: +mrabarnett
___
Python tracker
<https://bugs.python.or
Matthew Barnett added the comment:
It always returns the dot.
For example:
>>> posixpath.splitext('.blah.txt')
('.blah', '.txt')
If there's no extension (no dot):
>>> posixpath.splitext('blah')
('blah', '')
Not a bug.
--
nosy: +mrabarnett
resolution:
Matthew Barnett added the comment:
@Ezio: the value of stringy_thingy is irrelevant because it never gets that
far; it fails when it tries to parse the replacement, which occurs before
attempting any matching.
I can't reproduce the difference either.
--
status: pending -> o
Change by Matthew Barnett :
--
nosy: -mrabarnett
___
Python tracker
<https://bugs.python.org/issue34694>
___
Change by Matthew Barnett :
--
Removed message: https://bugs.python.org/msg326012
___
Python tracker
<https://bugs.python.org/issue34763>
___
Change by Matthew Barnett :
--
Removed message: https://bugs.python.org/msg326014
___
Python tracker
<https://bugs.python.org/issue34763>
___
Change by Matthew Barnett :
--
Removed message: https://bugs.python.org/msg326013
___
Python tracker
<https://bugs.python.org/issue34763>
___
Change by Matthew Barnett :
--
Removed message: https://bugs.python.org/msg326015
___
Python tracker
<https://bugs.python.org/issue34763>
___
Matthew Barnett added the comment:
Unicode 11.0.0 has 卅 (U+5345) as being numeric and having the value 30.
What's the difference between that and U+4E17?
I notice that they look a lot alike. Are they different variants, perhaps
traditional vs simplified
Matthew Barnett added the comment:
I don't see a problem with this. If the zip file has 'dist/file1.py' then you
know to create a directory when unzipping. If you want to indicate that there's
an empty directory 'foo', then put 'foo/' in the zip file.
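
A sketch of both cases with the zipfile module, writing to an in-memory buffer (the file contents are placeholders):

```python
import io
import zipfile

buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    # A file inside 'dist'; the directory is implied by the path.
    zf.writestr("dist/file1.py", "print('hello')\n")
    # An explicit entry for an empty directory: the name ends with '/'.
    zf.writestr("foo/", "")

with zipfile.ZipFile(buf) as zf:
    entries = {info.filename: info.is_dir() for info in zf.infolist()}

print(entries)
```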
--
nosy: +mrabarnett
Matthew Barnett added the comment:
Not all uses of the word "master" are associated with slavery, e.g. "master
craftsman", "master copy", "master file table".
I think it's best to avoid use of master/slave where practicable, but other
uses o
Matthew Barnett added the comment:
For clarity, the first is '\U00010308\U00010316' and the second is
'\U00010306\U00010300\U0001030B'.
The BMP is the Basic Multilingual Plane, which covers the codepoints in the
range U+0000 to U+FFFF. Some software has a problem dealing with codepoints
Matthew Barnett added the comment:
It also raises a ValueError on Windows. For other invalid paths on Windows it
returns False.
--
nosy: +mrabarnett
___
Python tracker
<https://bugs.python.org/issue33
Matthew Barnett <pyt...@mrabarnett.plus.com> added the comment:
You don't give the value of 'newlines', but the problem is probably
catastrophic backtracking, not deadlock.
--
nosy: +mrabarnett
___
Python tracker <rep...@bugs.python.or
Matthew Barnett <pyt...@mrabarnett.plus.com> added the comment:
For the record, '\u200e' is '\N{LEFT-TO-RIGHT MARK}'.
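
That can be checked with unicodedata:

```python
import unicodedata

mark = "\u200e"
name = unicodedata.name(mark)
same = mark == "\N{LEFT-TO-RIGHT MARK}"

print(name, same)
```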
--
nosy: +mrabarnett
___
Python tracker <rep...@bugs.python.org>
<https://bugs.python
Matthew Barnett <pyt...@mrabarnett.plus.com> added the comment:
findall() and finditer() consist of multiple uses of search(), basically, as do
sub() and split(), so we want the same rule to apply to them all.
--
___
Python tracke
Matthew Barnett <pyt...@mrabarnett.plus.com> added the comment:
The pattern:
\b|:+
will match a word boundary (zero-width) before colons, so if there's a word
followed by colons, finditer will find the boundary and then the colons. You
_can_ get a zero-width match (ZWM)
Matthew Barnett <pyt...@mrabarnett.plus.com> added the comment:
@Narendra: The argument, if provided, is merely a default. Checking whether it
_could_ be used would not be straightforward, and raising an exception if it
would never be used would have little, if any, benefit.
It's not
Matthew Barnett <pyt...@mrabarnett.plus.com> added the comment:
Your verbose examples put the pattern into raw triple-quoted strings, which is
OK, but their first character is a backslash, which makes the next character (a
newline) an escaped literal whitespace character. Escaped whit
Matthew Barnett <pyt...@mrabarnett.plus.com> added the comment:
@Victor: True, people often ignore DeprecationWarning anyway, but that's their
problem, at least you can say "well, you were warned". They might not have read
the documentation on it recently because they have n
Matthew Barnett <pyt...@mrabarnett.plus.com> added the comment:
@Tim: the regex module includes some extra checks to reduce the chance of
excessive backtracking. In the case of the OP's example, they seem to be
working. However, it's difficult to know when adding such checks wil
Matthew Barnett <pyt...@mrabarnett.plus.com> added the comment:
You shouldn't assume that just because it takes a long time on one
implementation that it'll take a long time on all of the others, because it's
sometimes possible to include additional checks to reduce the problem. (I doub
Matthew Barnett added the comment:
The re module works with codepoints, it doesn't understand canonical
equivalence.
For example, it doesn't recognise that "\N{LATIN CAPITAL LETTER E}\N{COMBINING
ACUTE ACCENT}" is equivalent to "\N{LATIN CAPITAL LETTER E WITH ACUTE}".
Thi
Matthew Barnett added the comment:
I think the relevant standard is ISO 8601:
https://en.wikipedia.org/wiki/ISO_8601
The first day of the week is Monday.
Note particularly the examples it gives:
Monday 29 December 2008 is written "2009-W01-1"
Sunday 3 January 2010 is wri
Matthew Barnett added the comment:
The regex module is much better in this respect, but it's not foolproof. With
this particular example it completes quickly.
--
___
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/i
Matthew Barnett added the comment:
The 4th parameter is the count, not the flags:
sub(pattern, repl, string, count=0, flags=0)
>>> re.sub(r'X.', '+', '-X\n-', flags=re.DOTALL)
'-+-'
--
resolution: -> not a bug
stage: -> resolved
status:
Matthew Barnett added the comment:
Python identifiers match the regex:
[_\p{XID_Start}]\p{XID_Continue}*
The standard re module doesn't support \p{...}, but the third-party "regex"
module does.
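
If the regex module isn't available, str.isidentifier() does a similar check in the standard library. A small sketch (the candidate strings are arbitrary; note that it doesn't reject keywords):

```python
candidates = ["spam", "_private", "naïve", "2fast", "has-dash", ""]

# Map each candidate to whether it is a valid identifier.
valid = {name: name.isidentifier() for name in candidates}

for name, ok in valid.items():
    print(repr(name), ok)
```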
--
___
Python tracker <rep...@bu
Matthew Barnett added the comment:
In Unicode 9.0.0, U+1885 and U+1886 changed from being
General_Category=Other_Letter (Lo) to General_Category=Nonspacing_Mark (Mn).
U+2118 is General_Category=Math_Symbol (Sm) and U+212E is
General_Category=Other_Symbol (So).
\w doesn't include Mn, Sm or So
Matthew Barnett added the comment:
Expected result is datetime.datetime(2017, 6, 25, 0, 0).
--
nosy: +mrabarnett
___
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/i
Matthew Barnett added the comment:
See PEP 3131 -- Supporting Non-ASCII Identifiers
It says: """All identifiers are converted into the normal form NFKC while
parsing; comparison of identifiers is based on NFKC."""
>>> import unicodedata
>>> un
Matthew Barnett added the comment:
@Steven: Python 3.6 supports Unicode 9.
Python 3.6.1 (v3.6.1:69c0db5, Mar 21 2017, 18:41:36) [MSC v.1900 64 bit
(AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> i
Matthew Barnett added the comment:
IDLE uses tkinter, which wraps tcl/tk. Versions up to tcl/tk 8.6 can't handle
'astral' codepoints.
See also:
Issue #30019: IDLE freezes when opening a file with astral characters
Issue #21084: IDLE can't deal with characters above the range (U+-U+
Matthew Barnett added the comment:
There are 4 patterns. They try to determine the delimiter and quote by looking
for matches. Each pattern supposedly covers one of 4 cases:
1. Delimiter, quote, value, quote, delimiter.
2. Start of line/text, quote, value, quote, delimiter.
3. Delimiter
Matthew Barnett added the comment:
If 'ignores' is '', you get this:
(?:\b(?:extern|G_INLINE_FUNC|%s)\s*)
which can match an empty string, and it's tried repeatedly.
That's inadvisable.
There's also:
(?:\s+|\*)+
which can match whitespace in multiple ways.
That's inadvisable too
Matthew Barnett added the comment:
The function solution does have a larger overhead than a literal.
Could the template be made more accepting of backslashes without breaking
anything? (There's also issue29995 "re.escape() escapes too much", which m
Matthew Barnett added the comment:
Yes, the second argument is a replacement template, not a literal.
This issue does point out a different problem, though: re.escape will add
backslashes that will then be treated as literals in the template, for example:
>>> re.sub(r'a',
Matthew Barnett added the comment:
A slightly shorter form:
/\*(?:(?!\*/).)*\*/
Basically it's:
match start
while not match end:
    consume character
match end
If the "match end" is a single character, you can use a negated character set,
for example:
Matthew Barnett added the comment:
If we were doing it today, maybe we wouldn't cache them, but, as you say, it's
been like that for a long time. (The regex module also caches them, because the
re module does.) Unless someone can demonstrate that it's a problem, I'd say
just leave
Matthew Barnett added the comment:
The report says "== encodings: locale=UTF-8, FS=utf-8".
It says that "test_locale_caching" was skipped, but also that
"test_locale_flag" failed.
--
___
Python tracker
Matthew Barnett added the comment:
I'm just wondering whether the problem is just due to the locale's encoding
being UTF-8. The locale support in re really only works with encodings that use
1 byte/character.
--
___
Python tracker <
Matthew Barnett added the comment:
Ah, well, if it hasn't changed after this many years, it never will. Expect one
or two changes to the text. :-)
--
___
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/i
Matthew Barnett added the comment:
With the VERSION0 flag (the default behaviour), it should behave the same as
the re module, and that's not going to change.
--
___
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/i