Tkinter and astral characters (was: Decoding bytes to text strings in Python 2)

2024-06-24 Thread Peter J. Holzer via Python-list

On 2024-06-24 01:14:22 +0100, MRAB via Python-list wrote:
> Tkinter in recent versions of Python can handle astral characters, at least
> back to Python 3.8, the oldest I have on my Windows PC.

I just tried modifying
https://docs.python.org/3/library/tkinter.html#a-hello-world-program
to display "Hello World \N{ROCKET}" instead (Python 3.10.12 as included
with Ubuntu 22.04). I don't get a warning or error, but the emoji isn't
displayed either.

I suspect that the default font doesn't include emojis and Tk isn't
smart enough to fall back to a different font (unlike xfce4-terminal
which shows the emoji just fine).

hp

-- 
   _  | Peter J. Holzer| Story must make more sense than reality.
|_|_) ||
| |   | h...@hjp.at |-- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |   challenge!"

-- 
https://mail.python.org/mailman/listinfo/python-list

Re: Decoding bytes to text strings in Python 2

2024-06-23 Thread Chris Angelico via Python-list

On Mon, 24 Jun 2024 at 10:18, MRAB via Python-list
 wrote:
> Tkinter in recent versions of Python can handle astral characters, at
> least back to Python 3.8, the oldest I have on my Windows PC.

Good to know, thanks! I was hoping that would be the case, but I don't
have a Windows system to check on, so I didn't want to speak without
facts.

ChrisA

Re: Decoding bytes to text strings in Python 2

2024-06-23 Thread MRAB via Python-list

On 2024-06-24 00:30, Chris Angelico via Python-list wrote:

> On Mon, 24 Jun 2024 at 08:20, Rayner Lucas via Python-list
>  wrote:
> > In article ,
> > ros...@gmail.com says...
> > >
> > > If you switch to a Linux system, it should work correctly, and you'll
> > > be able to migrate the rest of the way onto Python 3. Once you achieve
> > > that, you'll be able to operate on Windows or Linux equivalently,
> > > since Python 3 solved this problem. At least, I *think* it will; my
> > > current system has a Python 2 installed, but doesn't have tkinter
> > > (because I never bothered to install it), and it's no longer available
> > > from the upstream Debian repos, so I only tested it in the console.
> > > But the decoding certainly worked.
> >
> > Thank you for the idea of trying it on a Linux system. I did so, and my
> > example code generated the error:
> >
> > _tkinter.TclError: character U+1f40d is above the range (U+0000-U+FFFF)
> > allowed by Tcl
> >
> > So it looks like the problem is ultimately due to a limitation of
> > Tcl/Tk.
>
> Yep, that seems to be the case. Not sure if that's still true on a
> more recent Python, but it does look like you won't get astral
> characters in tkinter on the one you're using.

[snip]
Tkinter in recent versions of Python can handle astral characters, at 
least back to Python 3.8, the oldest I have on my Windows PC.


Re: Decoding bytes to text strings in Python 2

2024-06-23 Thread Chris Angelico via Python-list

On Mon, 24 Jun 2024 at 08:20, Rayner Lucas via Python-list
 wrote:
>
> In article ,
> ros...@gmail.com says...
> >
> > If you switch to a Linux system, it should work correctly, and you'll
> > be able to migrate the rest of the way onto Python 3. Once you achieve
> > that, you'll be able to operate on Windows or Linux equivalently,
> > since Python 3 solved this problem. At least, I *think* it will; my
> > current system has a Python 2 installed, but doesn't have tkinter
> > (because I never bothered to install it), and it's no longer available
> > from the upstream Debian repos, so I only tested it in the console.
> > But the decoding certainly worked.
>
> Thank you for the idea of trying it on a Linux system. I did so, and my
> example code generated the error:
>
> _tkinter.TclError: character U+1f40d is above the range (U+0000-U+FFFF)
> allowed by Tcl
>
> So it looks like the problem is ultimately due to a limitation of
> Tcl/Tk.
Yep, that seems to be the case. Not sure if that's still true on a
more recent Python, but it does look like you won't get astral
characters in tkinter on the one you're using.

> I'm still not sure why it doesn't give an error on Windows and

Because of the aforementioned weirdness of old (that is: pre-3.3)
Python versions on Windows. They were built to use a messy, buggy
hybrid of UCS-2 and UTF-16. Sometimes this got you around problems, or
at least masked them; but it wouldn't be reliable. That's why, in
Python 3.3, all that was fixed :)

> instead either works (when UTF-8 encoding is specified) or converts the
> out-of-range characters to ones it can display (when the encoding isn't
> specified). But now I know what the root of the problem is, I can deal
> with it appropriately (and my curiosity is at least partly satisfied).

Converting out-of-range characters is fairly straightforward, at least
as long as your Python interpreter is correctly built (so, Python 3,
or a Linux build of Python 2).

"".join(c if ord(c) < 65536 else "?" for c in text)
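
The one-liner above can be wrapped into a small helper. This is only a sketch for Python 3; the name `replace_astral` and its `placeholder` parameter are made up for illustration and are not part of Tkinter or the standard library.

```python
def replace_astral(text, placeholder="?"):
    """Replace every code point above U+FFFF (outside the BMP) with a placeholder."""
    return "".join(c if ord(c) <= 0xFFFF else placeholder for c in text)


# BMP characters (e-acute, a CJK ideograph) pass through; the SNAKE emoji is replaced.
sample = "caf\u00e9 \U0001F40D \u5b57"
print(replace_astral(sample))
```

Text filtered this way should be safe to hand to a Tcl/Tk build that only accepts the range U+0000-U+FFFF, at the cost of losing the emoji.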

> This has given me a much better understanding of what I need to do in
> order to migrate to Python 3 and add proper support for non-ASCII
> characters, so I'm very grateful for your help!
>

Excellent. Hopefully all this mess is just a transitional state and
you'll get to something that REALLY works, soon!

ChrisA

Re: Decoding bytes to text strings in Python 2

2024-06-23 Thread Rayner Lucas via Python-list

In article , r...@zedat.fu-berlin.de says...
> 
>   I didn't really do a super thorough deep dive on this,
>   but I'm just giving the initial impression without 
>   actually being familiar with Tkinter under Python 2,
>   so I might be wrong!
> 
>   The Text widget typically expects text in Tcl encoding,
>   which is usually UTF-8. 
> 
>   This is independent of the result returned by
>   sys.getdefaultencoding()!
> 
>   If a UTF-8 string is inserted directly as a bytes object,
>   its code points will be displayed correctly by the Text 
>   widget as long as they are in the BMP (Basic Multilingual
>   Plane), as you already found out yourself.

Many thanks, you've helped me greatly in understanding what's happening. 
When I tried running my example code on a different system (Python 
2.7.18 on Linux, with Tcl/Tk 8.5), I got the error:

_tkinter.TclError: character U+1f40d is above the range (U+0000-U+FFFF) 
allowed by Tcl

So, as your reply suggests, the problem is ultimately a limitation of 
Tcl/Tk itself. Perhaps I should have spent more time studying the docs 
for that instead of puzzling over the details of character encodings in 
Python! I'm not sure why it doesn't give the same error on Windows, but 
at least now I know where the root of the issue is.

I am now much better informed about how to migrate the code I'm working 
on, so I am very grateful for your help.

Thanks,
Rayner

Re: Decoding bytes to text strings in Python 2

2024-06-23 Thread Rayner Lucas via Python-list

In article , 
ros...@gmail.com says...
> 
> If you switch to a Linux system, it should work correctly, and you'll
> be able to migrate the rest of the way onto Python 3. Once you achieve
> that, you'll be able to operate on Windows or Linux equivalently,
> since Python 3 solved this problem. At least, I *think* it will; my
> current system has a Python 2 installed, but doesn't have tkinter
> (because I never bothered to install it), and it's no longer available
> from the upstream Debian repos, so I only tested it in the console.
> But the decoding certainly worked.

Thank you for the idea of trying it on a Linux system. I did so, and my 
example code generated the error:

_tkinter.TclError: character U+1f40d is above the range (U+0000-U+FFFF) 
allowed by Tcl

So it looks like the problem is ultimately due to a limitation of 
Tcl/Tk. I'm still not sure why it doesn't give an error on Windows and 
instead either works (when UTF-8 encoding is specified) or converts the 
out-of-range characters to ones it can display (when the encoding isn't 
specified). But now I know what the root of the problem is, I can deal 
with it appropriately (and my curiosity is at least partly satisfied).

This has given me a much better understanding of what I need to do in 
order to migrate to Python 3 and add proper support for non-ASCII 
characters, so I'm very grateful for your help!

Thanks,
Rayner

Re: Decoding bytes to text strings in Python 2

2024-06-21 Thread Chris Angelico via Python-list

On Sat, 22 Jun 2024 at 03:28, Rayner Lucas via Python-list
 wrote:
> I'm curious about something I've encountered while updating a very old
> Tk app (originally written in Python 1, but I've ported it to Python 2
> as a first step towards getting it running on modern systems).
>
> I am using Python 2.7.18 on a Windows 10 system. If there's any other
> relevant information I should provide please let me know.

Unfortunately, you're running into one of the most annoying problems
from Python 2 and Windows: "narrow builds". You don't actually have
proper Unicode support. You have a broken implementation that works
for UCS-2 but doesn't actually support astral characters.
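
To make "astral" concrete, here is a small sketch that runs on Python 3.3 or later. It shows why a 16-bit representation needs two code units for such a character; on a narrow Python 2 build, `len(u"\U0001F40D")` reports 2 for the same reason. The variable names are purely illustrative.

```python
snake = "\U0001F40D"  # SNAKE, a code point outside the Basic Multilingual Plane

# Python 3.3+ treats the character as a single string element.
print(len(snake), hex(ord(snake)))

# In UTF-16 the same character needs a surrogate pair: two 16-bit code units.
raw = snake.encode("utf-16-be")
code_units = [int.from_bytes(raw[i:i + 2], "big") for i in range(0, len(raw), 2)]
print([hex(u) for u in code_units])
```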

If you switch to a Linux system, it should work correctly, and you'll
be able to migrate the rest of the way onto Python 3. Once you achieve
that, you'll be able to operate on Windows or Linux equivalently,
since Python 3 solved this problem. At least, I *think* it will; my
current system has a Python 2 installed, but doesn't have tkinter
(because I never bothered to install it), and it's no longer available
from the upstream Debian repos, so I only tested it in the console.
But the decoding certainly worked.

ChrisA

Decoding bytes to text strings in Python 2

2024-06-21 Thread Rayner Lucas via Python-list



I'm curious about something I've encountered while updating a very old 
Tk app (originally written in Python 1, but I've ported it to Python 2 
as a first step towards getting it running on modern systems). The app 
downloads emails from a POP server and displays them. At the moment, the 
code is completely unaware of character encodings (which is something I 
plan to fix), and I have found that I don't understand what Python is 
doing when no character encoding is specified.

To demonstrate, I have written this short example program that displays 
a variety of UTF-8 characters to check whether they are decoded 
properly:

 Example Code 
import Tkinter as tk

window = tk.Tk()

mytext = """
  \xc3\xa9 LATIN SMALL LETTER E WITH ACUTE
  \xc5\x99 LATIN SMALL LETTER R WITH CARON
  \xc4\xb1 LATIN SMALL LETTER DOTLESS I
  \xef\xac\x84 LATIN SMALL LIGATURE FFL
  \xe2\x84\x9a DOUBLE-STRUCK CAPITAL Q
  \xc2\xbd VULGAR FRACTION ONE HALF
  \xe2\x82\xac EURO SIGN
  \xc2\xa5 YEN SIGN
  \xd0\x96 CYRILLIC CAPITAL LETTER ZHE
  \xea\xb8\x80 HANGUL SYLLABLE GEUL
  \xe0\xa4\x93 DEVANAGARI LETTER O
  \xe5\xad\x97 CJK UNIFIED IDEOGRAPH-5B57
  \xe2\x99\xa9 QUARTER NOTE
  \xf0\x9f\x90\x8d SNAKE
  \xf0\x9f\x92\x96 SPARKLING HEART
"""

mytext = mytext.decode(encoding="utf-8")
greeting = tk.Label(text=mytext)
greeting.pack()

window.mainloop()
 End Example Code 

This works exactly as expected, with all the characters displaying 
correctly.

However, if I comment out the line
'mytext = mytext.decode(encoding="utf-8")',
the program still displays *almost* everything 
correctly. All of the characters appear correctly apart from the two 
four-byte emoji characters at the end, which instead display as four 
characters. For example, the "SNAKE" character actually displays as:
U+00F0 LATIN SMALL LETTER ETH
U+FF9F HALFWIDTH KATAKANA SEMI-VOICED SOUND MARK
U+FF90 HALFWIDTH KATAKANA LETTER MI
U+FF8D HALFWIDTH KATAKANA LETTER HE

What's Python 2 doing here? sys.getdefaultencoding() returns 'ascii', 
but it's clearly not attempting to display the bytes as ASCII (or 
cp1252, or ISO-8859-1). How is it deciding on some sort of almost-but-
not-quite UTF-8 decoding?

I am using Python 2.7.18 on a Windows 10 system. If there's any other 
relevant information I should provide please let me know.

Many thanks,
Rayner

Re: Lprint = ( Lisp-style printing ( of lists and strings (etc.) ) in Python )

2024-06-01 Thread Peter J. Holzer via Python-list

On 2024-05-30 21:47:14 -0700, HenHanna via Python-list wrote:
> [('the', 36225), ('and', 17551), ('of', 16759), ('i', 16696), ('a', 15816),
> ('to', 15722), ('that', 11252), ('in', 10743), ('it', 10687)]
> 
> ((the 36225) (and 17551) (of 16759) (i 16696) (a 15816) (to 15722) (that
> 11252) (in 10743) (it 10687))
> 
> 
> i think the latter is easier-to-read, so i use this code
>(by Peter Norvig)

This doesn't work well if your strings contain spaces:

Lprint(
    [
        ["Just", "three", "words"],
        ["Just", "three words"],
        ["Just three", "words"],
        ["Just three words"],
    ]
)

prints:

((Just three words) (Just three words) (Just three words) (Just three words))

Output is often a compromise between readability and precision.


> def lispstr(exp):
>     # "Convert a Python object back into a Lisp-readable string."
>     if isinstance(exp, list):

This won't work for your example, since you have a list of tuples, not a
list of lists and a tuple is not an instance of a list.

>         return '(' + ' '.join(map(lispstr, exp)) + ')'
>     else:
>         return str(exp)
> 
> def Lprint(x): print(lispstr(x))

I like to use pprint, but it's lacking support for user-defined types. I
should be able to add a method (maybe __pprint__?) to my classes which
handle proper formatting (with line breaks and indentation).
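
Returning to the two objections raised earlier in this message (tuples are not lists, and strings containing spaces become ambiguous), one possible variant is sketched below. It is not Norvig's code; the choice to double-quote strings that contain whitespace is an assumption made here for illustration.

```python
def lispstr(exp):
    """Render nested lists and tuples Lisp-style; quote strings that contain whitespace."""
    if isinstance(exp, (list, tuple)):
        return "(" + " ".join(map(lispstr, exp)) + ")"
    if isinstance(exp, str) and (not exp or any(ch.isspace() for ch in exp)):
        # Escape backslashes and double quotes so the quoted form stays unambiguous.
        return '"' + exp.replace("\\", "\\\\").replace('"', '\\"') + '"'
    return str(exp)


def Lprint(x):
    print(lispstr(x))


Lprint([("the", 36225), ("and", 17551)])
Lprint([["Just", "three", "words"], ["Just", "three words"]])
```

Single-word strings still print bare, so the word-count example keeps its compact form while multi-word strings remain distinguishable.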

hp
-- 
   _  | Peter J. Holzer| Story must make more sense than reality.
|_|_) ||
| |   | h...@hjp.at |-- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |   challenge!"



Lprint = ( Lisp-style printing ( of lists and strings (etc.) ) in Python )

2024-05-31 Thread HenHanna via Python-list




 ;;;  Pls tell me about little tricks you use in Python or Lisp.


[('the', 36225), ('and', 17551), ('of', 16759), ('i', 16696), ('a', 
15816), ('to', 15722), ('that', 11252), ('in', 10743), ('it', 10687)]


((the 36225) (and 17551) (of 16759) (i 16696) (a 15816) (to 15722) (that 
11252) (in 10743) (it 10687))



i think the latter is easier-to-read, so i use this code
   (by Peter Norvig)

def lispstr(exp):
    # "Convert a Python object back into a Lisp-readable string."
    if isinstance(exp, list):
        return '(' + ' '.join(map(lispstr, exp)) + ')'
    else:
        return str(exp)

def Lprint(x): print(lispstr(x))

Re: Match statement with literal strings

2023-06-07 Thread Jason Friedman via Python-list

>
> The bytecode compiler doesn't know that you intend RANGE
> to be a constant -- it thinks it's a variable to bind a
> value to.
>
> To make this work you need to find a way to refer to the
> value that isn't just a bare name. One way would be to
> define your constants using an enum:
>
> class Options(Enum):
>     RANGE = "RANGE"
>     MANDATORY = "MANDATORY"
>
> match stuff:
>     case Options.RANGE:
>         ...
>     case Options.MANDATORY:
>         ...
>

Got it, thank you.

On Wed, Jun 7, 2023 at 6:01 PM Greg Ewing via Python-list <
python-list@python.org> wrote:

> On 8/06/23 10:18 am, Jason Friedman wrote:
> > SyntaxError: name capture 'RANGE' makes remaining patterns unreachable
>
> The bytecode compiler doesn't know that you intend RANGE
> to be a constant -- it thinks it's a variable to bind a
> value to.
>
> To make this work you need to find a way to refer to the
> value that isn't just a bare name. One way would be to
> define your constants using an enum:
>
> class Options(Enum):
>     RANGE = "RANGE"
>     MANDATORY = "MANDATORY"
>
> match stuff:
>     case Options.RANGE:
>         ...
>     case Options.MANDATORY:
>         ...
>
> --
> Greg
>

Re: Match statement with literal strings

2023-06-07 Thread Chris Angelico via Python-list

On Thu, 8 Jun 2023 at 08:19, Jason Friedman via Python-list
 wrote:
>
> This gives the expected results:
>
> with open(data_file, newline="") as reader:
>     csvreader = csv.DictReader(reader)
>     for row in csvreader:
>         #print(row)
>         match row[RULE_TYPE]:
>             case "RANGE":
>                 print("range")
>             case "MANDATORY":
>                 print("mandatory")
>             case _:
>                 print("nothing to do")
>
> This:
>
> RANGE = "RANGE"
> MANDATORY = "MANDATORY"
> with open(data_file, newline="") as reader:
>     csvreader = csv.DictReader(reader)
>     for row in csvreader:
>         #print(row)
>         match row[RULE_TYPE]:
>             case RANGE:
>                 print("range")
>             case MANDATORY:
>                 print("mandatory")
>             case _:
>                 print("nothing to do")
>
> Gives (and I don't understand why):
>
> SyntaxError: name capture 'RANGE' makes remaining patterns unreachable

It's being as clear as it can. When you say "case RANGE:", that is not
a literal, that is a name capture. Check the docs and examples for
case statements for more details.

ChrisA

Re: Match statement with literal strings

2023-06-07 Thread Greg Ewing via Python-list


On 8/06/23 10:18 am, Jason Friedman wrote:

> SyntaxError: name capture 'RANGE' makes remaining patterns unreachable


The bytecode compiler doesn't know that you intend RANGE
to be a constant -- it thinks it's a variable to bind a
value to.

To make this work you need to find a way to refer to the
value that isn't just a bare name. One way would be to
define your constants using an enum:

from enum import Enum

class Options(Enum):
    RANGE = "RANGE"
    MANDATORY = "MANDATORY"

match stuff:
    case Options.RANGE:
        ...
    case Options.MANDATORY:
        ...

--
Greg



Match statement with literal strings

2023-06-07 Thread Jason Friedman via Python-list

This gives the expected results:

with open(data_file, newline="") as reader:
    csvreader = csv.DictReader(reader)
    for row in csvreader:
        #print(row)
        match row[RULE_TYPE]:
            case "RANGE":
                print("range")
            case "MANDATORY":
                print("mandatory")
            case _:
                print("nothing to do")

This:

RANGE = "RANGE"
MANDATORY = "MANDATORY"
with open(data_file, newline="") as reader:
    csvreader = csv.DictReader(reader)
    for row in csvreader:
        #print(row)
        match row[RULE_TYPE]:
            case RANGE:
                print("range")
            case MANDATORY:
                print("mandatory")
            case _:
                print("nothing to do")

Gives (and I don't understand why):

SyntaxError: name capture 'RANGE' makes remaining patterns unreachable

Re: How to escape strings for re.finditer?

2023-03-02 Thread Grant Edwards

On 2023-03-02, Peter J. Holzer  wrote:


> [1] Personally I'd say you shouldn't use Outlook if you are reading
> mails where line breaks (or other formatting) is important, but ...

I'd shorten that to

   "You shouldn't use Outlook if mail is important."


RE: How to escape strings for re.finditer?

2023-03-02 Thread avi.e.gross

Thanks, Peter. Excellent advice, even if only for any of us using Microsoft
Outlook as our mailer. I made the changes and we will see but they should
mainly impact what I see. I did tweak another parameter.

The problem for me was finding where they hid the options menu I needed.
Then, I started translating the menus back into German until I realized I
was being silly! Good practice though. LOL!

The truth is I generally can handle receiving mangled code as most of the
time I can re-edit it into shape, or am just reading it and not
copying/pasting.

What concerns me is to be able to send out the pure text content many seem
to need in a way that does not introduce the anomalies people see. Something
like a least-common denominator.

Or. I could switch mailers. But my guess is reading/responding from the
native gmail editor may also need options changes and yet still impact some
readers.

-Original Message-
From: Python-list  On
Behalf Of Peter J. Holzer
Sent: Thursday, March 2, 2023 3:09 PM
To: python-list@python.org
Subject: Re: How to escape strings for re.finditer?

On 2023-03-01 01:01:42 +0100, Peter J. Holzer wrote:
> On 2023-02-28 15:25:05 -0500, avi.e.gr...@gmail.com wrote:
> > I had no doubt the code you ran was indented properly or it would not
work.
> > 
> > I am merely letting you know that somewhere in the process of 
> > copying the code or the transition between mailers, my version is messed
up.
> 
> The problem seems to be at your end. Jen's code looks ok here.
[...]
> I have no idea why it would join only some lines but not others.

Actually I do have an idea now, since I noticed something similar at work
today: Outlook has an option "remove additional line breaks from text-only
messages" (translated from German) in the "Email / Message Format"
section. You want to make sure this is off if you are reading mails where
line breaks might be important[1].

hp

[1] Personally I'd say you shouldn't use Outlook if you are reading mails
where line breaks (or other formatting) is important, but ...

-- 
   _  | Peter J. Holzer| Story must make more sense than reality.
|_|_) ||
| |   | h...@hjp.at |-- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |   challenge!"


Re: How to escape strings for re.finditer?

2023-03-02 Thread Peter J. Holzer

On 2023-03-01 01:01:42 +0100, Peter J. Holzer wrote:
> On 2023-02-28 15:25:05 -0500, avi.e.gr...@gmail.com wrote:
> > I had no doubt the code you ran was indented properly or it would not work.
> > 
> > I am merely letting you know that somewhere in the process of copying
> > the code or the transition between mailers, my version is messed up.
> 
> The problem seems to be at your end. Jen's code looks ok here.
[...]
> I have no idea why it would join only some lines but not others.

Actually I do have an idea now, since I noticed something similar at
work today: Outlook has an option "remove additional line breaks from
text-only messages" (translated from German) in the "Email / Message
Format" section. You want to make sure this is off if you are reading
mails where line breaks might be important[1].

hp

[1] Personally I'd say you shouldn't use Outlook if you are reading
mails where line breaks (or other formatting) is important, but ...

-- 
   _  | Peter J. Holzer| Story must make more sense than reality.
|_|_) ||
| |   | h...@hjp.at |-- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |   challenge!"


Re: How to escape strings for re.finditer?

2023-03-01 Thread Thomas Passin


On 3/1/2023 12:04 PM, Grant Edwards wrote:

> On 2023-02-28, Cameron Simpson  wrote:
>
> > Regexps are:
> > - cryptic and error prone (you can make them more readable, but the
> >   notation is deliberately both terse and powerful, which means that
> >   small changes can have large effects in behaviour); the "error prone"
> >   part does not mean that a regexp is unreliable, but that writing one
> >   which is _correct_ for your task can be difficult,
>
> The nasty thing is that writing one that _appears_ to be correct for
> your task is often fairly easy. It will work as you expect for the
> test cases you throw at it, but then fail in confusing ways when
> released into the "real world". If you're lucky, it fails frequently
> and obviously enough that you notice it right away. If you're not
> lucky, it will fail infrequently and subtly for many years to come.
>
> My rule: never use an RE if you can use the normal string methods
> (even if it takes a few lines of code using them to replace a single
> RE).


A corollary is that once you get a working regex, don't mess with it if 
you do not absolutely have to.



Re: How to escape strings for re.finditer?

2023-03-01 Thread Grant Edwards

On 2023-02-28, Cameron Simpson  wrote:

> Regexps are:
> - cryptic and error prone (you can make them more readable, but the 
>notation is deliberately both terse and powerful, which means that 
>small changes can have large effects in behaviour); the "error prone" 
>part does not mean that a regexp is unreliable, but that writing one 
>which is _correct_ for your task can be difficult,

The nasty thing is that writing one that _appears_ to be correct for
your task is often fairly easy. It will work as you expect for the
test cases you throw at it, but then fail in confusing ways when
released into the "real world". If you're lucky, it fails frequently
and obviously enough that you notice it right away. If you're not
lucky, it will fail infrequently and subtly for many years to come.

My rule: never use an RE if you can use the normal string methods
(even if it takes a few lines of code using them to replace a single
RE).

--
Grant

RE: How to escape strings for re.finditer?

2023-02-28 Thread avi.e.gross

Peter,

Nobody here would appreciate it if I tested it by sending out multiple
copies of each email to see if the same message wraps differently.

I am using a fairly standard mailer in Outlook that interfaces with gmail
and I could try mailing directly from gmail but apparently there are
systemic problems and I experience other complaints when sending directly
from AOL mail too. 

So, if some people don't read me, I can live with that. I mean the right
people, LOL!

Or did I get that wrong?

I do appreciate the feedback. Ironically, when I politely shared how someone
else's email was displaying on my screen, it seems I am equally causing
similar issues for others.

An interesting question is whether any of us reading the archived copies see
different things including with various browsers:

https://mail.python.org/pipermail/python-list/

I am not sure which letters from me had the anomalies you mention but
spot-checking a few of them showed a normal display when I use Chrome.

But none of this is really a Python issue, except insofar as you never know
which functionality in the network was written in Python.

-Original Message-
From: Python-list  On
Behalf Of Peter J. Holzer
Sent: Tuesday, February 28, 2023 7:26 PM
To: python-list@python.org
Subject: Re: How to escape strings for re.finditer?

On 2023-03-01 01:01:42 +0100, Peter J. Holzer wrote:
> On 2023-02-28 15:25:05 -0500, avi.e.gr...@gmail.com wrote:
> > It happens to be easy for me to fix but I sometimes see garbled code 
> > I then simply ignore.
> 
> Truth to be told, that's one reason why I rarely read your mails to 
> the end. The long lines and the triple-spaced paragraphs make it just 
> too uncomfortable.

Hmm, since I was now paying a bit more attention to formatting problems I
saw that only about half of your messages have those long lines although all
seem to be sent with the same mailer. Don't know what's going on there.

hp

-- 
   _  | Peter J. Holzer| Story must make more sense than reality.
|_|_) ||
| |   | h...@hjp.at |-- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |   challenge!"


Re: How to escape strings for re.finditer?

2023-02-28 Thread Weatherby,Gerard

Regex is fine if it works for you. The critiques – “difficult to read” – are 
subjective. Unless the code is in a section that has been profiled to be a 
bottleneck, I don’t sweat performance at this level.

For me, using code that has already been written and vetted is preferable 
to writing new code that I have to test and maintain. I use an online regex 
tester, https://pythex.org, to get the syntax right before copying and pasting 
it into my code.

From: Python-list  on 
behalf of Jen Kris via Python-list 
Date: Tuesday, February 28, 2023 at 1:11 PM
To: Thomas Passin 
Cc: python-list@python.org 
Subject: Re: How to escape strings for re.finditer?
*** Attention: This is an external email. Use caution responding, opening 
attachments or clicking on links. ***

Using str.startswith is a cool idea in this case.  But is it better than regex 
for performance or reliability?  Regex syntax is not a model of simplicity, but 
in my simple case it's not too difficult.



Re: How to escape strings for re.finditer?

2023-02-28 Thread Peter J. Holzer

On 2023-03-01 01:01:42 +0100, Peter J. Holzer wrote:
> On 2023-02-28 15:25:05 -0500, avi.e.gr...@gmail.com wrote:
> > It happens to be easy for me to fix but I sometimes see garbled code I
> > then simply ignore.
> 
> Truth to be told, that's one reason why I rarely read your mails to the
> end. The long lines and the triple-spaced paragraphs make it just too
> uncomfortable.

Hmm, since I was now paying a bit more attention to formatting problems
I saw that only about half of your messages have those long lines
although all seem to be sent with the same mailer. Don't know what's
going on there.

hp


-- 
   _  | Peter J. Holzer| Story must make more sense than reality.
|_|_) ||
| |   | h...@hjp.at |-- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |   challenge!"



Re: How to escape strings for re.finditer?

2023-02-28 Thread Peter J. Holzer

On 2023-02-28 15:25:05 -0500, avi.e.gr...@gmail.com wrote:
> Jen,
> 
>  
> 
> I had no doubt the code you ran was indented properly or it would not work.
> 
>  
> 
> I am merely letting you know that somewhere in the process of copying
> the code or the transition between mailers, my version is messed up.

The problem seems to be at your end. Jen's code looks ok here.

The content type is text/plain, no format=flowed or anything which would
affect the interpretation of line endings. However, after
base64-decoding it only contains unix-style LF line endings, not CRLF
line endings. That might throw your mailer off, but I have no idea why
it would join only some lines but not others.

> It happens to be easy for me to fix but I sometimes see garbled code I
> then simply ignore.

Truth to be told, that's one reason why I rarely read your mails to the
end. The long lines and the triple-spaced paragraphs make it just too
uncomfortable.

hp

-- 
   _  | Peter J. Holzer| Story must make more sense than reality.
|_|_) ||
| |   | h...@hjp.at |-- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |   challenge!"


Re: How to escape strings for re.finditer?

2023-02-28 Thread Cameron Simpson


On 28Feb2023 18:57, Jen Kris  wrote:
> One question:  several people have made suggestions other than regex 
> (not your terser example with regex you showed below).  Is there a 
> reason why regex is not preferred to, for example, a list comp?  

These are different things; I'm not sure a comparison is meaningful.

> Performance?  Reliability? 


Regexps are:
- cryptic and error prone (you can make them more readable, but the 
  notation is deliberately both terse and powerful, which means that 
  small changes can have large effects in behaviour); the "error prone" 
  part does not mean that a regexp is unreliable, but that writing one 
  which is _correct_ for your task can be difficult, and also difficult 
  to debug

- have a compile step, which slows things down
- can be slower to execute as well, as a regexp does a bunch of 
  housekeeping for you


The more complex the tool the more... indirection between your solution 
using that tool and the smallest thing which needs to be done, and often 
the slower the solution. This isn't absolute;  there are times for the 
complex tool.


Common opinion here is often that if you're doing simple fixed-string 
things such as your task, which was finding instances of a fixed string, 
just use the existing str methods. You'll end up writing what you need 
directly and overtly.


I've a personal maxim that one should use the "smallest" tool which 
succinctly solves the problem. I usually use it to choose a programming 
language (eg sed vs awk vs shell vs python in loose order of problem 
difficulty), but it applies also to choosing tools within a language.


Cheers,
Cameron Simpson 
--
https://mail.python.org/mailman/listinfo/python-list

Re: How to escape strings for re.finditer?

2023-02-28 Thread Thomas Passin


On 2/28/2023 2:40 PM, David Raymond wrote:

With a slight tweak to the simple loop code using .find() it becomes a third 
faster than the RE version though.


def using_simple_loop2(key, text):
    matches = []
    keyLen = len(key)
    start = 0
    while (foundSpot := text.find(key, start)) > -1:
        start = foundSpot + keyLen
        matches.append((foundSpot, start))
    return matches


using_simple_loop: [0.1732664997689426, 0.1601669997908175, 
0.15792609984055161, 0.157397349591, 0.15759290009737015]
using_re_finditer: [0.003412699792534113, 0.0032823001965880394, 
0.0033694999292492867, 0.003354900050908327, 0.006998894810677]
using_simple_loop2: [0.00256159994751215, 0.0025471001863479614, 
0.0025424999184906483, 0.0025831996463239193, 0.002999018251896]


On my system the difference is way bigger than that:

KEY = '''it doesn't matter, but in other cases it will.'''

using_simple_loop2: [0.000495502449548, 0.0004844000213779509, 
0.0004862999776378274, 0.0004800999886356294, 0.0004792999825440347]


using_re_finditer: [0.002840900036972016, 0.002833350251794, 
0.002701299963518977, 0.0028105000383220613, 0.0029977999511174858]


Shorter keys show the least differential:

KEY = 'in'

using_simple_loop2: [0.001983499969355762, 0.0019614999764598906, 
0.0019617999787442386, 0.002027600014116615, 0.0020669000223279]


using_re_finditer: [0.002787900040857494, 0.0027620999608188868, 
0.0027723999810405076, 0.002776700013782829, 0.002946800028439611]


Brilliant!

Python 3.10.9
Windows 10 AMD64 (build 10.0.19044) SP0

--
https://mail.python.org/mailman/listinfo/python-list

RE: How to escape strings for re.finditer?

2023-02-28 Thread avi.e.gross

David,

Your results suggest we need to be reminded that lots depends on other
factors. There are multiple versions/implementations of python out there
including some written in C but also other underpinnings. Each can often
have sections of pure python code replaced carefully with libraries of
compiled code, or not. So your results will vary.

Just as an example, assume you derive a type of your own as a subclass of
str and you over-ride the find method by writing it in pure python using
loops and maybe add a few bells and whistles. If you used your improved
algorithm using this variant of str, might it not be quite a bit slower?
Imagine how much slower if your improvement also implemented caching and
logging and the option of ignoring case which are not really needed here.

This type of thing can happen in many other scenarios and some module may be
shared that is slow and a while later is updated but not everyone installs
the update so performance stats can vary wildly. 

Some people advocate using some functional programming tactics, in various
languages, partially because the more general loops are SLOW. But that is
largely because some of the functional stuff is a compiled function that
hides the loops inside a faster environment than the interpreter.
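
A small sketch of that idea, applied to the startswith loop from this thread. The first version builds a new slice of the text on every iteration; the second passes the offset to str.startswith() so no slice is built. Both still loop in the interpreter, so this only reduces the per-step work. The function names are made up for illustration, and no timings are claimed here.

```python
def spans_with_slices(key, text):
    # text[i:] copies the tail of the string on every iteration.
    return [(i, i + len(key))
            for i in range(len(text))
            if text[i:].startswith(key)]

def spans_with_offset(key, text):
    # str.startswith accepts a start offset, so no slice is created.
    return [(i, i + len(key))
            for i in range(len(text))
            if text.startswith(key, i)]

example = 'X - abc_degree + 1 + qq + abc_degree + 1'
KEY = 'abc_degree + 1'
print(spans_with_slices(KEY, example))
print(spans_with_offset(KEY, example))
```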

-Original Message-
From: Python-list  On
Behalf Of David Raymond
Sent: Tuesday, February 28, 2023 2:40 PM
To: python-list@python.org
Subject: RE: How to escape strings for re.finditer?

> I wrote my previous message before reading this.  Thank you for the test
you ran -- it answers the question of performance.  You show that
re.finditer is 30x faster, so that certainly recommends that over a simple
loop, which introduces looping overhead.  

>>     def using_simple_loop(key, text):
>>         matches = []
>>         for i in range(len(text)):
>>             if text[i:].startswith(key):
>>                 matches.append((i, i + len(key)))
>>         return matches
>>
>>      using_simple_loop: [0.1395295020792, 0.1306313000456,
0.1280345001249, 0.1318618002423, 0.1308461032626]
>>      using_re_finditer: [0.00386140005233, 0.00406190124297,
0.00347899970256, 0.00341310216218, 0.003732001273]


With a slight tweak to the simple loop code using .find() it becomes a third
faster than the RE version though.


def using_simple_loop2(key, text):
    matches = []
    keyLen = len(key)
    start = 0
    while (foundSpot := text.find(key, start)) > -1:
        start = foundSpot + keyLen
        matches.append((foundSpot, start))
    return matches


using_simple_loop: [0.1732664997689426, 0.1601669997908175,
0.15792609984055161, 0.157397349591, 0.15759290009737015]
using_re_finditer: [0.003412699792534113, 0.0032823001965880394,
0.0033694999292492867, 0.003354900050908327, 0.006998894810677]
using_simple_loop2: [0.00256159994751215, 0.0025471001863479614,
0.0025424999184906483, 0.0025831996463239193, 0.002999018251896]
-- 
https://mail.python.org/mailman/listinfo/python-list


RE: How to escape strings for re.finditer?

2023-02-28 Thread avi.e.gross

This message is more for Thomas than Jen,

You made me think of what happens in fairly large cases. What happens if I ask 
you to search a thousand pages looking for your name? 

One solution might be to break the problem into parts that can be run in 
independent threads or processes and perhaps across different CPU's or on many 
machines at once. Think of it as a variant on a merge sort where each chunk 
returns where it found one or more items and then those are gathered together 
and merged upstream.

The problem is you cannot just randomly divide the text.  Any matches across a 
divide are lost. So if you know you are searching for "Thomas Passin" you need 
the chunks to overlap by enough to hold a match of that size. It would not be 
divided like a pure binary tree, and if the patterns searched for could match 
text of varying sizes, you would get duplicates. So the merging part would 
obviously have to eventually remove those.
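
A minimal single-process sketch of the chunk-and-merge idea, assuming a fixed search key. Each chunk is extended by len(key) - 1 characters so a match straddling a boundary is still seen, and the results are merged through a set to drop any duplicates. The function name find_in_chunks and the chunk sizes are made up for illustration. In a real setup each chunk would be handed to its own thread, process or machine.

```python
def find_in_chunks(key, text, chunk_size):
    """Find every occurrence of key by searching overlapping chunks."""
    if not key or chunk_size <= 0:
        return []
    overlap = len(key) - 1          # enough to hold a match that straddles a boundary
    found = set()                   # the set removes duplicates when merging
    for base in range(0, len(text), chunk_size):
        chunk = text[base:base + chunk_size + overlap]
        pos = chunk.find(key)
        while pos != -1:
            # Convert the chunk-relative offset back to an offset in the full text.
            found.add((base + pos, base + pos + len(key)))
            pos = chunk.find(key, pos + 1)
    return sorted(found)

example = 'X - abc_degree + 1 + qq + abc_degree + 1'
print(find_in_chunks('abc_degree + 1', example, chunk_size=5))
```

With an overlap of exactly len(key) - 1 and a fixed key, duplicates should not arise; the set is there for the variable-length case mentioned above.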

I have often wondered how Google and other such services are able to find 
millions of things in hardly any time and arguably never show most of them as 
who looks past a few pages/screens?

I think much of that may involve other techniques including quite a bit of 
pre-indexing. But they also seem to enlist lots of processors that each do the 
search on a subset of the problem space and combine and prioritize.

-Original Message-
From: Python-list  On 
Behalf Of Thomas Passin
Sent: Tuesday, February 28, 2023 1:31 PM
To: python-list@python.org
Subject: Re: How to escape strings for re.finditer?

On 2/28/2023 1:07 PM, Jen Kris wrote:
> 
> Using str.startswith is a cool idea in this case.  But is it better 
> than regex for performance or reliability?  Regex syntax is not a 
> model of simplicity, but in my simple case it's not too difficult.

The trouble is that we don't know what your case really is.  If you are talking 
about a short pattern like your example and a small text to search, and you 
don't need to do it too often, then my little code example is probably ideal. 
Reliability wouldn't be an issue, and performance would not be relevant.  If 
your case is going to be much larger, called many times in a loop, or be much 
more complicated in some other way, then a regex or some other approach is 
likely to be much faster.

> Feb 27, 2023, 18:52 by li...@tompassin.net:
> 
> On 2/27/2023 9:16 PM, avi.e.gr...@gmail.com wrote:
> 
> And, just for fun, since there is nothing wrong with your code,
> this minor change is terser:
> 
> example = 'X - abc_degree + 1 + qq + abc_degree + 1'
> for match in re.finditer(re.escape('abc_degree + 1')
> , example):
> 
> ... print(match.start(), match.end())
> ...
> ...
> 4 18
> 26 40
> 
> 
> Just for more fun :) -
> 
> Without knowing how general your expressions will be, I think the
> following version is very readable, certainly more readable than
> regexes:
> 
> example = 'X - abc_degree + 1 + qq + abc_degree + 1'
> KEY = 'abc_degree + 1'
> 
> for i in range(len(example)):
> if example[i:].startswith(KEY):
> print(i, i + len(KEY))
> # prints:
> 4 18
> 26 40
> 
> If you may have variable numbers of spaces around the symbols, OTOH,
> the whole situation changes and then regexes would almost certainly
> be the best approach. But the regular expression strings would
> become harder to read.
> -- 
> https://mail.python.org/mailman/listinfo/python-list
> 
> 

--
https://mail.python.org/mailman/listinfo/python-list


RE: How to escape strings for re.finditer?

2023-02-28 Thread avi.e.gross

Jen,

I had no doubt the code you ran was indented properly or it would not work.

I am merely letting you know that somewhere in the process of copying the code 
or the transition between mailers, my version is messed up. It happens to be 
easy for me to fix but I sometimes see garbled code I then simply ignore.

At times what may help is to leave blank lines that python ignores but also 
keeps the line rearrangements minimal.

On to your real question.

In my OPINION, there are many interesting questions that can get in the way of 
just getting a working solution. Some may be better in some abstract way but 
except for big projects it often hardly matters.

So regex is one thing or more a cluster of things and a list comp is something 
completely different. They are both tools you can use and abuse or lose.

The distinction I believe we started with was how to find a fixed string inside 
another fixed string in as many places as needed and perhaps return offset 
info. So this can be solved in too many ways using a side of python focused on 
pure text. As discussed, solutions can include explicit loops such as “for” and 
“while” and their syntactic sugar cousin of a list comp. Not mentioned yet are 
other techniques like a recursive function that finds the first and passes on 
the rest of the string to itself to find the rest, or various functional 
programming techniques that may do sort of hidden loops. YOU DO NOT NEED ALL OF 
THEM but it can be interesting to learn.

Regex is a completely different universe that is a bit more of MORE. If I ask 
you for a ride to the grocery store, I might expect you to show up with a car 
and not a James Bond vehicle that also is a boat, submarine, airplane, and 
maybe spaceship. Well, Regex is the latter. And in your case, it is this 
complexity that meant you had to convert your text so it will not see what it 
considers commands or hints.

In normal use, put a bit too simply, it wants a carefully crafted pattern to be 
spelled out and it weaves an often complex algorithm it then sort of compiles 
that represents the understanding of what you asked for. The simplest pattern 
is to match EXACTLY THIS. That is your case.

A more complex pattern may say to match Boston OR Chicago followed by any 
amount of whitespace then a number of digits between 3 and 5 and then should 
not be followed by something specific. Oh, and by the way, save selected parts 
in parentheses to be accessed as \1 or \2 so I can ask you to do things like 
match a word followed by itself. It goes on and on. 

Be warned RE is implemented now all over the place including outside the usual 
UNIX roots and there are somewhat different versions. For your need, it does 
not matter.

The compiled monstrosity though can be fairly fast and might be a tad hard for 
you to write by yourself as a bunch of if statements nested that are weirdly 
matching various patterns with some look ahead or look behind. 

What you are being told is that despite this being way more than you asked for, 
it not only works but is fairly fast when doing the simple thing you asked for. 
That may be why a text version you are looking for is hard to find.

I am not clear what exactly the rest of your project is about but my guess is 
your first priority is completing it decently and not to try umpteen methods 
and compare them. Not today. Of course if the working version is slow and you 
profile it and find this part seems to be holding it back, it may be worth 
examining.

From: Jen Kris
Sent: Tuesday, February 28, 2023 12:58 PM
To: avi.e.gr...@gmail.com
Cc: 'Python List' 
Subject: RE: How to escape strings for re.finditer?

The code I sent is correct, and it runs here.  Maybe you received it with a 
carriage return removed, but on my copy after posting, it is correct:

example = 'X - abc_degree + 1 + qq + abc_degree + 1'
find_string = re.escape('abc_degree + 1')
for match in re.finditer(find_string, example):
    print(match.start(), match.end())

One question:  several people have made suggestions other than regex (not your 
terser example with regex you showed below).  Is there a reason why regex is not 
preferred to, for example, a list comp?  Performance?  Reliability?

Feb 27, 2023, 18:16 by avi.e.gr...@gmail.com:

Jen,

Can you see what SOME OF US see as ASCII text? We can help you better if we get 
code that can be copied and run as-is.

What you sent is not terse. It is wrong. It will not run on any python 
interpreter because you somehow lost a carriage return and indent.

This is what you sent:

example = 'X - abc_degree + 1 + qq + abc_degree + 1'
find_string = re.escape('abc_degree + 1') for match in re.finditer(find_string, 
example):
print(match.start(), match.end())

This is code indented properly:

example = 'X - abc_degree + 1 + qq + abc_degree + 1'

find_string = re.

RE: How to escape strings for re.finditer?

2023-02-28 Thread David Raymond

> I wrote my previous message before reading this.  Thank you for the test you 
> ran -- it answers the question of performance.  You show that re.finditer is 
> 30x faster, so that certainly recommends that over a simple loop, which 
> introduces looping overhead.  

>>      def using_simple_loop(key, text):
>>      matches = []
>>      for i in range(len(text)):
>>      if text[i:].startswith(key):
>>      matches.append((i, i + len(key)))
>>      return matches
>>
>>      using_simple_loop: [0.1395295020792, 0.1306313000456, 
>> 0.1280345001249, 0.1318618002423, 0.1308461032626]
>>      using_re_finditer: [0.00386140005233, 0.00406190124297, 
>> 0.00347899970256, 0.00341310216218, 0.003732001273]


With a slight tweak to the simple loop code using .find() it becomes a third 
faster than the RE version though.


def using_simple_loop2(key, text):
    matches = []
    keyLen = len(key)
    start = 0
    while (foundSpot := text.find(key, start)) > -1:
        start = foundSpot + keyLen
        matches.append((foundSpot, start))
    return matches


using_simple_loop: [0.1732664997689426, 0.1601669997908175, 
0.15792609984055161, 0.157397349591, 0.15759290009737015]
using_re_finditer: [0.003412699792534113, 0.0032823001965880394, 
0.0033694999292492867, 0.003354900050908327, 0.006998894810677]
using_simple_loop2: [0.00256159994751215, 0.0025471001863479614, 
0.0025424999184906483, 0.0025831996463239193, 0.002999018251896]
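
One caveat when comparing these three: they do not return the same thing for keys that can overlap themselves. The startswith loop tests every offset and so reports overlapping matches, while the find() loop and re.finditer() both continue after the end of each match. A short sketch follows, with the functions repeated so it runs on its own. The key 'aa' and text 'aaaa' are chosen only to show the difference, and the walrus operator needs Python 3.8 or later.

```python
import re

def using_simple_loop(key, text):
    matches = []
    for i in range(len(text)):          # tests every offset, so overlaps are reported
        if text[i:].startswith(key):
            matches.append((i, i + len(key)))
    return matches

def using_simple_loop2(key, text):
    matches = []
    keyLen = len(key)
    start = 0
    while (foundSpot := text.find(key, start)) > -1:
        start = foundSpot + keyLen      # resumes after the match, so no overlaps
        matches.append((foundSpot, start))
    return matches

def using_re_finditer(key, text):
    # re.finditer yields non-overlapping matches, scanning left to right.
    return [(m.start(), m.end()) for m in re.finditer(re.escape(key), text)]

print(using_simple_loop('aa', 'aaaa'))   # [(0, 2), (1, 3), (2, 4)]
print(using_simple_loop2('aa', 'aaaa'))  # [(0, 2), (2, 4)]
print(using_re_finditer('aa', 'aaaa'))   # [(0, 2), (2, 4)]
```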
-- 
https://mail.python.org/mailman/listinfo/python-list

Re: How to escape strings for re.finditer?

2023-02-28 Thread Thomas Passin


On 2/28/2023 11:48 AM, Jon Ribbens via Python-list wrote:

On 2023-02-28, Thomas Passin  wrote:

...


It is interesting, though, how pre-processing the search pattern can
improve search times if you can afford the pre-processing.  Here's a
paper on rapidly finding matches when there may be up to one misspelled
character.  It's easy enough to implement, though in Python you can't
take the additional step of tuning it to stay in cache.

https://Robert.Muth.Org/Papers/1996-Approx-Multi.Pdf


You've somehow title-cased that URL. The correct URL is:

https://robert.muth.org/Papers/1996-approx-multi.pdf


Thanks, not sure how that happened ...

--
https://mail.python.org/mailman/listinfo/python-list

Re: How to escape strings for re.finditer?

2023-02-28 Thread Thomas Passin


On 2/28/2023 12:57 PM, Jen Kris via Python-list wrote:

The code I sent is correct, and it runs here.  Maybe you received it with a 
carriage return removed, but on my copy after posting, it is correct:

example = 'X - abc_degree + 1 + qq + abc_degree + 1'
find_string = re.escape('abc_degree + 1')
for match in re.finditer(find_string, example):
    print(match.start(), match.end())

One question:  several people have made suggestions other than regex (not your 
terser example with regex you showed below).  Is there a reason why regex is not 
preferred to, for example, a list comp?  Performance?  Reliability?


"Some people, when confronted with a problem, think 'I know, I'll use 
regular expressions.' Now they have two problems."


- 
https://blog.codinghorror.com/regular-expressions-now-you-have-two-problems/


Of course, if you actually read the blog post in the link, there's more 
to it than that...




Feb 27, 2023, 18:16 by avi.e.gr...@gmail.com:


Jen,

Can you see what SOME OF US see as ASCII text? We can help you better if we get 
code that can be copied and run as-is.

  What you sent is not terse. It is wrong. It will not run on any python 
interpreter because you somehow lost a carriage return and indent.

This is what you sent:

example = 'X - abc_degree + 1 + qq + abc_degree + 1'
find_string = re.escape('abc_degree + 1') for match in re.finditer(find_string, 
example):
  print(match.start(), match.end())

This is code indented properly:

example = 'X - abc_degree + 1 + qq + abc_degree + 1'
find_string = re.escape('abc_degree + 1')
for match in re.finditer(find_string, example):
    print(match.start(), match.end())

Of course I am sure you wrote and ran code more like the latter version but 
somewhere in your copy/paste process, 

And, just for fun, since there is nothing wrong with your code, this minor 
change is terser:


>>> example = 'X - abc_degree + 1 + qq + abc_degree + 1'
>>> for match in re.finditer(re.escape('abc_degree + 1'), example):
...     print(match.start(), match.end())
...
4 18
26 40

But note once you use regular expressions, and not in your case, you might 
match multiple things that are far from the same, such as matching two repeated 
words of any kind in any case including "and and" and "so so", or finding words 
that have multiple doubled letters as in the stereotypical bookkeeper. In those 
cases, you may want even more than offsets but also show the exact text that 
matched or even show some characters before and/or after for context.
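
A minimal sketch of the repeated-word case, assuming "word" means a run of \w characters separated by whitespace. The backreference \1 must repeat whatever the first group captured, and re.IGNORECASE lets "and And" count. The function name repeated_words and the default context width are made up for this illustration.

```python
import re

# A word, some whitespace, then the same word again (case-insensitive).
REPEATED_WORD = re.compile(r'\b(\w+)\s+\1\b', re.IGNORECASE)

def repeated_words(text, context=10):
    """Return (start, end, matched_text, surrounding_text) for each repeated word."""
    results = []
    for m in REPEATED_WORD.finditer(text):
        lo = max(0, m.start() - context)
        hi = min(len(text), m.end() + context)
        results.append((m.start(), m.end(), m.group(0), text[lo:hi]))
    return results

sample = 'It was so so long ago and And then it ended.'
for start, end, matched, around in repeated_words(sample):
    print(start, end, repr(matched), repr(around))
```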


-Original Message-
From: Python-list  On 
Behalf Of Jen Kris via Python-list
Sent: Monday, February 27, 2023 8:36 PM
To: Cameron Simpson 
Cc: Python List 
Subject: Re: How to escape strings for re.finditer?


I haven't tested it either but it looks like it would work.  But for this case 
I prefer the relative simplicity of:

example = 'X - abc_degree + 1 + qq + abc_degree + 1'
find_string = re.escape('abc_degree + 1') for match in re.finditer(find_string, 
example):
  print(match.start(), match.end())

4 18
26 40

I don't insist on terseness for its own sake, but it's cleaner this way.

Jen


Feb 27, 2023, 16:55 by c...@cskk.id.au:


On 28Feb2023 01:13, Jen Kris  wrote:


I went to the re module because the specified string may appear more than once 
in the string (in the code I'm writing).



Sure, but writing a `finditer` for plain `str` is pretty easy (untested):

  pos = 0
  while True:
      found = s.find(substring, pos)
      if found < 0:
          break
      start = found
      end = found + len(substring)
      ... do whatever with start and end ...
      pos = end

Many people go straight to the `re` module whenever they're looking for 
strings. It is often cryptic, error-prone overkill. Just something to keep in 
mind.

Cheers,
Cameron Simpson 
--
https://mail.python.org/mailman/listinfo/python-list




RE: How to escape strings for re.finditer?

2023-02-28 Thread avi.e.gross

Roel,

You make some good points. One to consider is that when you ask a regular 
expression matcher to search using something that uses NO regular expression 
features, much of the complexity disappears and what it creates is probably 
similar enough to what you get with a string search except that loops and all 
are written as something using fast functions probably written in C. 
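
To make the "no regular expression features" point concrete, here is a small sketch with the example string from this thread. Used unescaped, the key is read as a pattern in which " +" means "one or more spaces", so it does not match the literal text. After re.escape() it matches exactly what str.find() looks for.

```python
import re

example = 'X - abc_degree + 1 + qq + abc_degree + 1'
key = 'abc_degree + 1'

# Unescaped: '+' is a quantifier applied to the preceding space.
unescaped = re.search(key, example)

# Escaped: every character in the key is taken literally.
escaped = [(m.start(), m.end()) for m in re.finditer(re.escape(key), example)]

# Plain string search for comparison.
plain = example.find(key)

print(unescaped)  # None
print(escaped)    # [(4, 18), (26, 40)]
print(plain)      # 4
```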

That is one reason the roll your own versions have a disadvantage unless you 
roll your own in a similar way by writing a similar C function.

Nobody has shown us what really should be out there of a simple but fast text 
search algorithm that does a similar job and it may still be out there, but as 
you point out, perhaps it is not needed as long as people just use the re 
version.

Avi

-Original Message-
From: Python-list  On 
Behalf Of Roel Schroeven
Sent: Tuesday, February 28, 2023 4:33 AM
To: python-list@python.org
Subject: Re: How to escape strings for re.finditer?

Op 28/02/2023 om 3:44 schreef Thomas Passin:
> On 2/27/2023 9:16 PM, avi.e.gr...@gmail.com wrote:
>> And, just for fun, since there is nothing wrong with your code, this 
>> minor change is terser:
>>
>>>>> example = 'X - abc_degree + 1 + qq + abc_degree + 1'
>>>>> for match in re.finditer(re.escape('abc_degree + 1') , example):
>> ... print(match.start(), match.end()) ...
>> ...
>> 4 18
>> 26 40
>
> Just for more fun :) -
>
> Without knowing how general your expressions will be, I think the 
> following version is very readable, certainly more readable than regexes:
>
> example = 'X - abc_degree + 1 + qq + abc_degree + 1'
> KEY = 'abc_degree + 1'
>
> for i in range(len(example)):
> if example[i:].startswith(KEY):
> print(i, i + len(KEY))
> # prints:
> 4 18
> 26 40
I think it's often a good idea to use a standard library function instead of 
rolling your own. The issue becomes less clear-cut when the standard library 
doesn't do exactly what you need (as here, where
re.finditer() uses regular expressions while the use case only uses simple 
search strings). Ideally there would be a str.finditer() method we could use, 
but in the absence of that I think we still need to consider using the 
almost-but-not-quite fitting re.finditer().

Two reasons:

(1) I think it's clearer: the name tells us what it does (though of course we 
could solve this in a hand-written version by wrapping it in a suitably named 
function).

(2) Searching for a string in another string, in a performant way, is not as 
simple as it first appears. Your version works correctly, but slowly. In some 
situations it doesn't matter, but in other cases it will. For better 
performance, string searching algorithms jump ahead either when they found a 
match or when they know for sure there isn't a match for some time (see e.g. 
the Boyer–Moore string-search algorithm). 
You could write such a more efficient algorithm, but then it becomes more 
complex and more error-prone. Using a well-tested existing function becomes 
quite attractive.
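
As a rough sketch of the "jump ahead" idea, here is a simplified Horspool variant of Boyer–Moore for non-overlapping matches. It is only an illustration of the technique and makes no claim about how str.find() or the re module work internally. The function name horspool_find_all is made up for this example.

```python
def horspool_find_all(key, text):
    """Non-overlapping (start, end) offsets of key in text, Horspool-style."""
    m, n = len(key), len(text)
    if m == 0 or m > n:
        return []
    # For each character of the key except the last, how far the window may
    # slide when that character is aligned with the end of the window.
    shift = {ch: m - 1 - i for i, ch in enumerate(key[:-1])}
    matches = []
    pos = 0
    while pos <= n - m:
        if text[pos:pos + m] == key:
            matches.append((pos, pos + m))
            pos += m                      # continue after the match
        else:
            # Characters that do not occur in the key allow a full-length jump.
            pos += shift.get(text[pos + m - 1], m)
    return matches

print(horspool_find_all('abra', 'abracadabra'))  # [(0, 4), (7, 11)]
```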

To illustrate the difference in performance, I did a simple test (using the 
paragraph above as test text):

    import re
    import timeit

    def using_re_finditer(key, text):
        matches = []
        for match in re.finditer(re.escape(key), text):
            matches.append((match.start(), match.end()))
        return matches

    def using_simple_loop(key, text):
        matches = []
        for i in range(len(text)):
            if text[i:].startswith(key):
                matches.append((i, i + len(key)))
        return matches

    CORPUS = """Searching for a string in another string, in a performant way, is
    not as simple as it first appears. Your version works correctly, but slowly.
    In some situations it doesn't matter, but in other cases it will. For better
    performance, string searching algorithms jump ahead either when they found a
    match or when they know for sure there isn't a match for some time (see e.g.
    the Boyer–Moore string-search algorithm). You could write such a more
    efficient algorithm, but then it becomes more complex and more error-prone.
    Using a well-tested existing function becomes quite attractive."""
    KEY = 'in'
    print('using_simple_loop:',
          timeit.repeat(stmt='using_simple_loop(KEY, CORPUS)', globals=globals(),
                        number=1000))
    print('using_re_finditer:',
          timeit.repeat(stmt='using_re_finditer(KEY, CORPUS)', globals=globals(),
                        number=1000))

This does 5 runs of 1000 repetitions each, and reports the time in seconds for 
each of those runs.
Result on my machine:

 using_simple_loop: [0.1395295020792, 0.1306313000456, 
0.1280345001249, 0.1318618002423, 0.1308461032626]
 using_re_finditer: [0.00386140005233, 0.00406190124297, 
0.00347899970256, 0.003413102

Re: How to escape strings for re.finditer?

2023-02-28 Thread Thomas Passin


On 2/28/2023 1:07 PM, Jen Kris wrote:


Using str.startswith is a cool idea in this case.  But is it better than 
regex for performance or reliability?  Regex syntax is not a model of 
simplicity, but in my simple case it's not too difficult.


The trouble is that we don't know what your case really is.  If you are 
talking about a short pattern like your example and a small text to 
search, and you don't need to do it too often, then my little code 
example is probably ideal. Reliability wouldn't be an issue, and 
performance would not be relevant.  If your case is going to be much 
larger, called many times in a loop, or be much more complicated in some 
other way, then a regex or some other approach is likely to be much faster.




Feb 27, 2023, 18:52 by li...@tompassin.net:

On 2/27/2023 9:16 PM, avi.e.gr...@gmail.com wrote:

And, just for fun, since there is nothing wrong with your code,
this minor change is terser:

>>> example = 'X - abc_degree + 1 + qq + abc_degree + 1'
>>> for match in re.finditer(re.escape('abc_degree + 1'), example):
...     print(match.start(), match.end())
...
4 18
26 40


Just for more fun :) -

Without knowing how general your expressions will be, I think the
following version is very readable, certainly more readable than
regexes:

example = 'X - abc_degree + 1 + qq + abc_degree + 1'
KEY = 'abc_degree + 1'

for i in range(len(example)):
    if example[i:].startswith(KEY):
        print(i, i + len(KEY))
# prints:
4 18
26 40

If you may have variable numbers of spaces around the symbols, OTOH,
the whole situation changes and then regexes would almost certainly
be the best approach. But the regular expression strings would
become harder to read.
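
A sketch of what that might look like, assuming the key can be split on whitespace into tokens and that any amount of whitespace (including none) is allowed between them. Each token is escaped separately and the pieces are joined with \s*. The helper name flexible_pattern is made up for this illustration.

```python
import re

def flexible_pattern(key):
    """Compile a pattern matching key with any whitespace between its tokens."""
    tokens = key.split()
    # Escape each token so '+' and friends stay literal; allow optional
    # whitespace wherever the original key had a space.
    return re.compile(r'\s*'.join(re.escape(tok) for tok in tokens))

text = 'X - abc_degree+1 + qq + abc_degree  +   1'
pattern = flexible_pattern('abc_degree + 1')
found = [m.group(0) for m in pattern.finditer(text)]
print(found)  # ['abc_degree+1', 'abc_degree  +   1']
```

The generated pattern is already less readable than the plain key, which is the trade-off described above.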
-- 
https://mail.python.org/mailman/listinfo/python-list





--
https://mail.python.org/mailman/listinfo/python-list

Re: How to escape strings for re.finditer?

2023-02-28 Thread Jon Ribbens via Python-list

On 2023-02-28, Thomas Passin  wrote:
> On 2/28/2023 10:05 AM, Roel Schroeven wrote:
>> Op 28/02/2023 om 14:35 schreef Thomas Passin:
>>> On 2/28/2023 4:33 AM, Roel Schroeven wrote:
>>>> [...]
>>>> (2) Searching for a string in another string, in a performant way, is 
>>>> not as simple as it first appears. Your version works correctly, but 
>>>> slowly. In some situations it doesn't matter, but in other cases it 
>>>> will. For better performance, string searching algorithms jump ahead 
>>>> either when they found a match or when they know for sure there isn't 
>>>> a match for some time (see e.g. the Boyer–Moore string-search 
>>>> algorithm). You could write such a more efficient algorithm, but then 
>>>> it becomes more complex and more error-prone. Using a well-tested 
>>>> existing function becomes quite attractive.
>>>
>>> Sure, it all depends on what the real task will be.  That's why I 
>>> wrote "Without knowing how general your expressions will be". For the 
>>> example string, it's unlikely that speed will be a factor, but who 
>>> knows what target strings and keys will turn up in the future?
>> On hindsight I think it was overthinking things a bit. "It all depends 
>> on what the real task will be" you say, and indeed I think that should 
>> be the main conclusion here.
>
> It is interesting, though, how pre-processing the search pattern can 
> improve search times if you can afford the pre-processing.  Here's a 
> paper on rapidly finding matches when there may be up to one misspelled 
> character.  It's easy enough to implement, though in Python you can't 
> take the additional step of tuning it to stay in cache.
>
> https://Robert.Muth.Org/Papers/1996-Approx-Multi.Pdf

You've somehow title-cased that URL. The correct URL is:

https://robert.muth.org/Papers/1996-approx-multi.pdf
-- 
https://mail.python.org/mailman/listinfo/python-list

Re: How to escape strings for re.finditer?

2023-02-28 Thread Jen Kris via Python-list


I wrote my previous message before reading this.  Thank you for the test you 
ran -- it answers the question of performance.  You show that re.finditer is 
30x faster, so that certainly recommends that over a simple loop, which 
introduces looping overhead.  


Feb 28, 2023, 05:44 by li...@tompassin.net:

> On 2/28/2023 4:33 AM, Roel Schroeven wrote:
>
>> Op 28/02/2023 om 3:44 schreef Thomas Passin:
>>
>>> On 2/27/2023 9:16 PM, avi.e.gr...@gmail.com wrote:
>>>
>>>> And, just for fun, since there is nothing wrong with your code, this minor 
>>>> change is terser:
>>>>
>>>>>>> example = 'X - abc_degree + 1 + qq + abc_degree + 1'
>>>>>>> for match in re.finditer(re.escape('abc_degree + 1') , example):
>>>>>>>
>>>> ... print(match.start(), match.end())
>>>> ...
>>>> ...
>>>> 4 18
>>>> 26 40
>>>>
>>>
>>> Just for more fun :) -
>>>
>>> Without knowing how general your expressions will be, I think the following 
>>> version is very readable, certainly more readable than regexes:
>>>
>>> example = 'X - abc_degree + 1 + qq + abc_degree + 1'
>>> KEY = 'abc_degree + 1'
>>>
>>> for i in range(len(example)):
>>>     if example[i:].startswith(KEY):
>>>     print(i, i + len(KEY))
>>> # prints:
>>> 4 18
>>> 26 40
>>>
>> I think it's often a good idea to use a standard library function instead of 
>> rolling your own. The issue becomes less clear-cut when the standard library 
>> doesn't do exactly what you need (as here, where re.finditer() uses regular 
>> expressions while the use case only uses simple search strings). Ideally 
>> there would be a str.finditer() method we could use, but in the absence of 
>> that I think we still need to consider using the almost-but-not-quite 
>> fitting re.finditer().
>>
>> Two reasons:
>>
>> (1) I think it's clearer: the name tells us what it does (though of course 
>> we could solve this in a hand-written version by wrapping it in a suitably 
>> named function).
>>
>> (2) Searching for a string in another string, in a performant way, is not as 
>> simple as it first appears. Your version works correctly, but slowly. In 
>> some situations it doesn't matter, but in other cases it will. For better 
>> performance, string searching algorithms jump ahead either when they found a 
>> match or when they know for sure there isn't a match for some time (see e.g. 
>> the Boyer–Moore string-search algorithm). You could write such a more 
>> efficient algorithm, but then it becomes more complex and more error-prone. 
>> Using a well-tested existing function becomes quite attractive.
>>
>
> Sure, it all depends on what the real task will be.  That's why I wrote 
> "Without knowing how general your expressions will be". For the example 
> string, it's unlikely that speed will be a factor, but who knows what target 
> strings and keys will turn up in the future?
>
>> To illustrate the difference performance, I did a simple test (using the 
>> paragraph above is test text):
>>
>>      import re
>>      import timeit
>>
>>      def using_re_finditer(key, text):
>>          matches = []
>>          for match in re.finditer(re.escape(key), text):
>>              matches.append((match.start(), match.end()))
>>          return matches
>>
>>
>>      def using_simple_loop(key, text):
>>          matches = []
>>          for i in range(len(text)):
>>              if text[i:].startswith(key):
>>                  matches.append((i, i + len(key)))
>>          return matches
>>
>>
>>      CORPUS = """Searching for a string in another string, in a performant way, is
>>      not as simple as it first appears. Your version works correctly, but slowly.
>>      In some situations it doesn't matter, but in other cases it will. For better
>>      performance, string searching algorithms jump ahead either when they found a
>>      match or when they know for sure there isn't a match for some time (see e.g.
>>      the Boyer–Moore string-search algorithm). You could write such a more
>>      efficient algorithm, but then it becomes more complex and more error-prone.
>>      Using a well-tested existing function becomes quite attractive."""
>>      KEY = 'in'
>>      print('using_simple_loop:', timeit.repeat(stmt='using_simple_lo

Re: How to escape strings for re.finditer?

2023-02-28 Thread Jen Kris via Python-list


Using str.startswith is a cool idea in this case.  But is it better than regex 
for performance or reliability?  Regex syntax is not a model of simplicity, but 
in my simple case it's not too difficult.  


Feb 27, 2023, 18:52 by li...@tompassin.net:

> On 2/27/2023 9:16 PM, avi.e.gr...@gmail.com wrote:
>
>> And, just for fun, since there is nothing wrong with your code, this minor 
>> change is terser:
>>
>> >>> example = 'X - abc_degree + 1 + qq + abc_degree + 1'
>> >>> for match in re.finditer(re.escape('abc_degree + 1'), example):
>> ...     print(match.start(), match.end())
>> ...
>> 4 18
>> 26 40
>>
>
> Just for more fun :) -
>
> Without knowing how general your expressions will be, I think the following 
> version is very readable, certainly more readable than regexes:
>
> example = 'X - abc_degree + 1 + qq + abc_degree + 1'
> KEY = 'abc_degree + 1'
>
> for i in range(len(example)):
>     if example[i:].startswith(KEY):
>         print(i, i + len(KEY))
> # prints:
> 4 18
> 26 40
>
> If you may have variable numbers of spaces around the symbols, OTOH, the 
> whole situation changes and then regexes would almost certainly be the best 
> approach.  But the regular expression strings would become harder to read.
> -- 
> https://mail.python.org/mailman/listinfo/python-list
>

-- 
https://mail.python.org/mailman/listinfo/python-list

RE: How to escape strings for re.finditer?

2023-02-28 Thread Jen Kris via Python-list

The code I sent is correct, and it runs here.  Maybe you received it with a 
carriage return removed, but on my copy after posting, it is correct:

example = 'X - abc_degree + 1 + qq + abc_degree + 1'
find_string = re.escape('abc_degree + 1')
for match in re.finditer(find_string, example):
    print(match.start(), match.end())

One question: several people have suggested alternatives to regex (other than 
the terser regex example you show below).  Is there a reason why regex is not 
preferred over, for example, a list comprehension?  Performance?  Reliability?



  


Feb 27, 2023, 18:16 by avi.e.gr...@gmail.com:

> Jen,
>
> Can you see what SOME OF US see as ASCII text? We can help you better if we 
> get code that can be copied and run as-is.
>
>  What you sent is not terse. It is wrong. It will not run on any python 
> interpreter because you somehow lost a carriage return and indent.
>
> This is what you sent:
>
> example = 'X - abc_degree + 1 + qq + abc_degree + 1'
> find_string = re.escape('abc_degree + 1') for match in 
> re.finditer(find_string, example):
>  print(match.start(), match.end())
>
> This is code indented properly:
>
> example = 'X - abc_degree + 1 + qq + abc_degree + 1'
> find_string = re.escape('abc_degree + 1')
> for match in re.finditer(find_string, example):
>     print(match.start(), match.end())
>
> Of course I am sure you wrote and ran code more like the latter version but 
> somewhere in your copy/paste process, 
>
> And, just for fun, since there is nothing wrong with your code, this minor 
> change is terser:
>
> >>> example = 'X - abc_degree + 1 + qq + abc_degree + 1'
> >>> for match in re.finditer(re.escape('abc_degree + 1'), example):
> ...     print(match.start(), match.end())
> ...
> 4 18
> 26 40
>
> But note that once you use regular expressions (though not in your case),
> you might match multiple things that are far from identical, such as two
> repeated words of any kind in any case, including "and and" and "so so", or
> words that have multiple doubled letters, as in the stereotypical
> "bookkeeper". In those cases, you may want more than offsets: you may also
> want to show the exact text that matched, or even some characters before
> and/or after for context.
>
>
> -Original Message-
> From: Python-list  On 
> Behalf Of Jen Kris via Python-list
> Sent: Monday, February 27, 2023 8:36 PM
> To: Cameron Simpson 
> Cc: Python List 
> Subject: Re: How to escape strings for re.finditer?
>
>
> I haven't tested it either but it looks like it would work.  But for this 
> case I prefer the relative simplicity of:
>
> example = 'X - abc_degree + 1 + qq + abc_degree + 1'
> find_string = re.escape('abc_degree + 1') for match in 
> re.finditer(find_string, example):
>  print(match.start(), match.end())
>
> 4 18
> 26 40
>
> I don't insist on terseness for its own sake, but it's cleaner this way. 
>
> Jen
>
>
> Feb 27, 2023, 16:55 by c...@cskk.id.au:
>
>> On 28Feb2023 01:13, Jen Kris  wrote:
>>
>>> I went to the re module because the specified string may appear more than 
>>> once in the string (in the code I'm writing).
>>>
>>
>> Sure, but writing a `finditer` for plain `str` is pretty easy (untested):
>>
>>  pos = 0
>>  while True:
>>      found = s.find(substring, pos)
>>      if found < 0:
>>          break
>>      start = found
>>      end = found + len(substring)
>>      ... do whatever with start and end ...
>>      pos = end
>>
>> Many people go straight to the `re` module whenever they're looking for 
>> strings. It is often cryptic error prone overkill. Just something to keep in 
>> mind.
>>
>> Cheers,
>> Cameron Simpson 
>> --
>> https://mail.python.org/mailman/listinfo/python-list
>>
>
> -- 
> https://mail.python.org/mailman/listinfo/python-list
>

-- 
https://mail.python.org/mailman/listinfo/python-list

Re: How to escape strings for re.finditer?

2023-02-28 Thread Thomas Passin


On 2/28/2023 10:05 AM, Roel Schroeven wrote:

Op 28/02/2023 om 14:35 schreef Thomas Passin:

On 2/28/2023 4:33 AM, Roel Schroeven wrote:

[...]
(2) Searching for a string in another string, in a performant way, is 
not as simple as it first appears. Your version works correctly, but 
slowly. In some situations it doesn't matter, but in other cases it 
will. For better performance, string searching algorithms jump ahead 
either when they found a match or when they know for sure there isn't 
a match for some time (see e.g. the Boyer–Moore string-search 
algorithm). You could write such a more efficient algorithm, but then 
it becomes more complex and more error-prone. Using a well-tested 
existing function becomes quite attractive.


Sure, it all depends on what the real task will be.  That's why I 
wrote "Without knowing how general your expressions will be". For the 
example string, it's unlikely that speed will be a factor, but who 
knows what target strings and keys will turn up in the future?

In hindsight I think it was overthinking things a bit. "It all depends 
on what the real task will be" you say, and indeed I think that should 
be the main conclusion here.



It is interesting, though, how pre-processing the search pattern can 
improve search times if you can afford the pre-processing.  Here's a 
paper on rapidly finding matches when there may be up to one misspelled 
character.  It's easy enough to implement, though in Python you can't 
take the additional step of tuning it to stay in cache.


https://Robert.Muth.Org/Papers/1996-Approx-Multi.Pdf

--
https://mail.python.org/mailman/listinfo/python-list

Re: How to escape strings for re.finditer?

2023-02-28 Thread Roel Schroeven


Op 28/02/2023 om 14:35 schreef Thomas Passin:

On 2/28/2023 4:33 AM, Roel Schroeven wrote:

[...]
(2) Searching for a string in another string, in a performant way, is 
not as simple as it first appears. Your version works correctly, but 
slowly. In some situations it doesn't matter, but in other cases it 
will. For better performance, string searching algorithms jump ahead 
either when they found a match or when they know for sure there isn't 
a match for some time (see e.g. the Boyer–Moore string-search 
algorithm). You could write such a more efficient algorithm, but then 
it becomes more complex and more error-prone. Using a well-tested 
existing function becomes quite attractive.


Sure, it all depends on what the real task will be.  That's why I 
wrote "Without knowing how general your expressions will be". For the 
example string, it's unlikely that speed will be a factor, but who 
knows what target strings and keys will turn up in the future?

In hindsight I think it was overthinking things a bit. "It all depends 
on what the real task will be" you say, and indeed I think that should 
be the main conclusion here.


--
"Man had always assumed that he was more intelligent than dolphins because
he had achieved so much — the wheel, New York, wars and so on — whilst all
the dolphins had ever done was muck about in the water having a good time.
But conversely, the dolphins had always believed that they were far more
intelligent than man — for precisely the same reasons."
-- Douglas Adams

--
https://mail.python.org/mailman/listinfo/python-list

Re: How to escape strings for re.finditer?

2023-02-28 Thread Thomas Passin


On 2/28/2023 4:33 AM, Roel Schroeven wrote:

Op 28/02/2023 om 3:44 schreef Thomas Passin:

On 2/27/2023 9:16 PM, avi.e.gr...@gmail.com wrote:
And, just for fun, since there is nothing wrong with your code, this 
minor change is terser:



>>> example = 'X - abc_degree + 1 + qq + abc_degree + 1'
>>> for match in re.finditer(re.escape('abc_degree + 1'), example):
...     print(match.start(), match.end())
...
4 18
26 40


Just for more fun :) -

Without knowing how general your expressions will be, I think the 
following version is very readable, certainly more readable than regexes:


example = 'X - abc_degree + 1 + qq + abc_degree + 1'
KEY = 'abc_degree + 1'

for i in range(len(example)):
    if example[i:].startswith(KEY):
        print(i, i + len(KEY))
# prints:
4 18
26 40

I think it's often a good idea to use a standard library function 
instead of rolling your own. The issue becomes less clear-cut when the 
standard library doesn't do exactly what you need (as here, where 
re.finditer() uses regular expressions while the use case only uses 
simple search strings). Ideally there would be a str.finditer() method 
we could use, but in the absence of that I think we still need to 
consider using the almost-but-not-quite fitting re.finditer().


Two reasons:

(1) I think it's clearer: the name tells us what it does (though of 
course we could solve this in a hand-written version by wrapping it in a 
suitably named function).


(2) Searching for a string in another string, in a performant way, is 
not as simple as it first appears. Your version works correctly, but 
slowly. In some situations it doesn't matter, but in other cases it 
will. For better performance, string searching algorithms jump ahead 
either when they found a match or when they know for sure there isn't a 
match for some time (see e.g. the Boyer–Moore string-search algorithm). 
You could write such a more efficient algorithm, but then it becomes 
more complex and more error-prone. Using a well-tested existing function 
becomes quite attractive.


Sure, it all depends on what the real task will be.  That's why I wrote 
"Without knowing how general your expressions will be". For the example 
string, it's unlikely that speed will be a factor, but who knows what 
target strings and keys will turn up in the future?


To illustrate the difference in performance, I did a simple test (using the 
paragraph above as test text):


     import re
     import timeit

     def using_re_finditer(key, text):
         matches = []
         for match in re.finditer(re.escape(key), text):
             matches.append((match.start(), match.end()))
         return matches


     def using_simple_loop(key, text):
         matches = []
         for i in range(len(text)):
             if text[i:].startswith(key):
                 matches.append((i, i + len(key)))
         return matches


     CORPUS = """Searching for a string in another string, in a performant way, is
     not as simple as it first appears. Your version works correctly, but slowly.
     In some situations it doesn't matter, but in other cases it will. For better
     performance, string searching algorithms jump ahead either when they found a
     match or when they know for sure there isn't a match for some time (see e.g.
     the Boyer–Moore string-search algorithm). You could write such a more
     efficient algorithm, but then it becomes more complex and more error-prone.
     Using a well-tested existing function becomes quite attractive."""
     KEY = 'in'
     print('using_simple_loop:',
           timeit.repeat(stmt='using_simple_loop(KEY, CORPUS)',
                         globals=globals(), number=1000))
     print('using_re_finditer:',
           timeit.repeat(stmt='using_re_finditer(KEY, CORPUS)',
                         globals=globals(), number=1000))


This does 5 runs of 1000 repetitions each, and reports the time in 
seconds for each of those runs.

Result on my machine:

     using_simple_loop: [0.1395295020792, 0.1306313000456, 
0.1280345001249, 0.1318618002423, 0.1308461032626]
     using_re_finditer: [0.00386140005233, 0.00406190124297, 
0.00347899970256, 0.00341310216218, 0.003732001273]


We find that in this test re.finditer() is more than 30 times faster 
(despite the overhead of regular expressions).


While speed isn't everything in programming, with such a large 
difference in performance and (to me) no real disadvantages of using 
re.finditer(), I would prefer re.finditer() over writing my own.




--
https://mail.python.org/mailman/listinfo/python-list

Re: How to escape strings for re.finditer?

2023-02-28 Thread Roel Schroeven


Op 28/02/2023 om 3:44 schreef Thomas Passin:

On 2/27/2023 9:16 PM, avi.e.gr...@gmail.com wrote:
And, just for fun, since there is nothing wrong with your code, this 
minor change is terser:



>>> example = 'X - abc_degree + 1 + qq + abc_degree + 1'
>>> for match in re.finditer(re.escape('abc_degree + 1'), example):
...     print(match.start(), match.end())
...
4 18
26 40


Just for more fun :) -

Without knowing how general your expressions will be, I think the 
following version is very readable, certainly more readable than regexes:


example = 'X - abc_degree + 1 + qq + abc_degree + 1'
KEY = 'abc_degree + 1'

for i in range(len(example)):
    if example[i:].startswith(KEY):
        print(i, i + len(KEY))
# prints:
4 18
26 40

I think it's often a good idea to use a standard library function 
instead of rolling your own. The issue becomes less clear-cut when the 
standard library doesn't do exactly what you need (as here, where 
re.finditer() uses regular expressions while the use case only uses 
simple search strings). Ideally there would be a str.finditer() method 
we could use, but in the absence of that I think we still need to 
consider using the almost-but-not-quite fitting re.finditer().


Two reasons:

(1) I think it's clearer: the name tells us what it does (though of 
course we could solve this in a hand-written version by wrapping it in a 
suitably named function).


(2) Searching for a string in another string, in a performant way, is 
not as simple as it first appears. Your version works correctly, but 
slowly. In some situations it doesn't matter, but in other cases it 
will. For better performance, string searching algorithms jump ahead 
either when they found a match or when they know for sure there isn't a 
match for some time (see e.g. the Boyer–Moore string-search algorithm). 
You could write such a more efficient algorithm, but then it becomes 
more complex and more error-prone. Using a well-tested existing function 
becomes quite attractive.


To illustrate the difference in performance, I did a simple test (using the 
paragraph above as test text):


    import re
    import timeit

    def using_re_finditer(key, text):
        matches = []
        for match in re.finditer(re.escape(key), text):
            matches.append((match.start(), match.end()))
        return matches


    def using_simple_loop(key, text):
        matches = []
        for i in range(len(text)):
            if text[i:].startswith(key):
                matches.append((i, i + len(key)))
        return matches


    CORPUS = """Searching for a string in another string, in a performant way, is
    not as simple as it first appears. Your version works correctly, but slowly.
    In some situations it doesn't matter, but in other cases it will. For better
    performance, string searching algorithms jump ahead either when they found a
    match or when they know for sure there isn't a match for some time (see e.g.
    the Boyer–Moore string-search algorithm). You could write such a more
    efficient algorithm, but then it becomes more complex and more error-prone.
    Using a well-tested existing function becomes quite attractive."""
    KEY = 'in'
    print('using_simple_loop:',
          timeit.repeat(stmt='using_simple_loop(KEY, CORPUS)',
                        globals=globals(), number=1000))
    print('using_re_finditer:',
          timeit.repeat(stmt='using_re_finditer(KEY, CORPUS)',
                        globals=globals(), number=1000))


This does 5 runs of 1000 repetitions each, and reports the time in 
seconds for each of those runs.

Result on my machine:

    using_simple_loop: [0.1395295020792, 0.1306313000456, 
0.1280345001249, 0.1318618002423, 0.1308461032626]
    using_re_finditer: [0.00386140005233, 0.00406190124297, 
0.00347899970256, 0.00341310216218, 0.003732001273]


We find that in this test re.finditer() is more than 30 times faster 
(despite the overhead of regular expressions).


While speed isn't everything in programming, with such a large 
difference in performance and (to me) no real disadvantages of using 
re.finditer(), I would prefer re.finditer() over writing my own.


--
"The saddest aspect of life right now is that science gathers knowledge
faster than society gathers wisdom."
-- Isaac Asimov

--
https://mail.python.org/mailman/listinfo/python-list

RE: How to escape strings for re.finditer?

2023-02-27 Thread avi.e.gross

I think by now we have given the OP all that is needed, but Dave's answer
strikes me as something that could be a tad faster as a while loop if you
are searching a larger corpus, such as an entire ebook or all the books you
can search on books.google.com.

I think I mentioned earlier that some assumptions need to apply. The text
needs to be something like an ASCII encoding or seen as code points rather
than bytes. We assume a match should move forward by the length of the
match. And, clearly, there cannot be a match too close to the end.

So a while loop would begin with a variable set to zero to mark the current
location of the search. The condition for repeating the loop is that this
variable is less than or equal to len(searched_text) - len(key).

In the loop, each comparison is done the same way as David uses, or anything
similar enough but the twist is a failure increments the variable by 1 while
success increments by len(key).

Will this make much difference? It might as the simpler algorithm counts
overlapping matches and wastes some time hunting where perhaps it shouldn't.

And, of course, if you made something like this into a search function, you
can easily add features such as asking that you only return the first N
matches or the next N, simply by making it a generator.
So tying this into an earlier discussion, do you want the LAST match info
visible when the While loop has completed? If it was available, it opens up
possibilities for running the loop again but starting from where you left
off.
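
The while loop described above might look roughly like the following. This is
only a sketch: the name iter_matches and the ValueError on an empty key are
assumptions, not anything posted in the thread. A failed comparison advances
by 1, a successful one by len(key), and because it is a generator the caller
can stop after the first N matches.

```python
from itertools import islice


def iter_matches(key, text):
    """Yield (start, end) for non-overlapping occurrences of key in text.

    Hypothetical helper: the name and the empty-key check are assumptions.
    """
    if not key:
        raise ValueError("key must not be empty")
    pos = 0
    last_start = len(text) - len(key)   # no match can start after this index
    while pos <= last_start:
        if text.startswith(key, pos):
            yield pos, pos + len(key)
            pos += len(key)             # success: jump past the whole match
        else:
            pos += 1                    # failure: try the next position


example = 'X - abc_degree + 1 + qq + abc_degree + 1'
print(list(iter_matches('abc_degree + 1', example)))
# [(4, 18), (26, 40)]

# Only the first match, without scanning the rest of the text:
print(list(islice(iter_matches('abc_degree + 1', example), 1)))
# [(4, 18)]
```

Since the generator keeps its own position, calling next() on it again later
resumes the search where it left off.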

-Original Message-
From: Python-list  On
Behalf Of Thomas Passin
Sent: Monday, February 27, 2023 9:44 PM
To: python-list@python.org
Subject: Re: How to escape strings for re.finditer?

On 2/27/2023 9:16 PM, avi.e.gr...@gmail.com wrote:
> And, just for fun, since there is nothing wrong with your code, this minor
> change is terser:
>
> >>> example = 'X - abc_degree + 1 + qq + abc_degree + 1'
> >>> for match in re.finditer(re.escape('abc_degree + 1'), example):
> ...     print(match.start(), match.end())
> ...
> 4 18
> 26 40

Just for more fun :) -

Without knowing how general your expressions will be, I think the following
version is very readable, certainly more readable than regexes:

example = 'X - abc_degree + 1 + qq + abc_degree + 1'
KEY = 'abc_degree + 1'

for i in range(len(example)):
    if example[i:].startswith(KEY):
        print(i, i + len(KEY))
# prints:
4 18
26 40

If you may have variable numbers of spaces around the symbols, OTOH, the
whole situation changes and then regexes would almost certainly be the best
approach.  But the regular expression strings would become harder to read.
--
https://mail.python.org/mailman/listinfo/python-list


Re: How to escape strings for re.finditer?

2023-02-27 Thread Thomas Passin


On 2/27/2023 9:16 PM, avi.e.gr...@gmail.com wrote:

And, just for fun, since there is nothing wrong with your code, this minor 
change is terser:


>>> example = 'X - abc_degree + 1 + qq + abc_degree + 1'
>>> for match in re.finditer(re.escape('abc_degree + 1'), example):
...     print(match.start(), match.end())
...
4 18
26 40


Just for more fun :) -

Without knowing how general your expressions will be, I think the 
following version is very readable, certainly more readable than regexes:


example = 'X - abc_degree + 1 + qq + abc_degree + 1'
KEY = 'abc_degree + 1'

for i in range(len(example)):
    if example[i:].startswith(KEY):
        print(i, i + len(KEY))
# prints:
4 18
26 40

If you may have variable numbers of spaces around the symbols, OTOH, the 
whole situation changes and then regexes would almost certainly be the 
best approach.  But the regular expression strings would become harder 
to read.
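
To illustrate the variable-spacing case, here is a minimal sketch. It assumes
the key can be split into symbols by hand (the tokens list is hypothetical)
and that any run of whitespace, including none, is acceptable between them.
Each symbol is escaped separately and the pieces are joined with \s*:

```python
import re

# Hypothetical split of the key 'abc_degree + 1' into its symbols.
tokens = ['abc_degree', '+', '1']

# Escape each symbol, then allow optional whitespace between symbols.
pattern = re.compile(r'\s*'.join(re.escape(t) for t in tokens))

example = 'X - abc_degree+1 + qq + abc_degree  +  1'
for match in pattern.finditer(example):
    print(match.start(), match.end(), repr(match.group()))
# prints:
# 4 16 'abc_degree+1'
# 24 40 'abc_degree  +  1'
```

The readability cost is visible in pattern.pattern, which is already harder
to scan than the plain key.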

--
https://mail.python.org/mailman/listinfo/python-list

RE: How to escape strings for re.finditer?

2023-02-27 Thread avi.e.gross

Jen,

Can you see what SOME OF US see as ASCII text? We can help you better if we get 
code that can be copied and run as-is.

 What you sent is not terse. It is wrong. It will not run on any python 
interpreter because you somehow lost a carriage return and indent.

This is what you sent:

example = 'X - abc_degree + 1 + qq + abc_degree + 1'
find_string = re.escape('abc_degree + 1') for match in re.finditer(find_string, 
example):
print(match.start(), match.end())

This is code indented properly:

example = 'X - abc_degree + 1 + qq + abc_degree + 1'
find_string = re.escape('abc_degree + 1')
for match in re.finditer(find_string, example):
    print(match.start(), match.end())

Of course I am sure you wrote and ran code more like the latter version but 
somewhere in your copy/paste process, 

And, just for fun, since there is nothing wrong with your code, this minor 
change is terser:

>>> example = 'X - abc_degree + 1 + qq + abc_degree + 1'
>>> for match in re.finditer(re.escape('abc_degree + 1'), example):
...     print(match.start(), match.end())
...
4 18
26 40

But note that once you use regular expressions (though not in your case), you
might match multiple things that are far from identical, such as two repeated
words of any kind in any case, including "and and" and "so so", or words that
have multiple doubled letters, as in the stereotypical "bookkeeper". In those
cases, you may want more than offsets: you may also want to show the exact
text that matched, or even some characters before and/or after for context.
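
As a rough illustration of the repeated-word case, the sketch below uses a
backreference with re.IGNORECASE. The sample sentence, the variable names and
the five-character context window are all made up for the example:

```python
import re

# Made-up sample text containing two repeated words in differing case.
text = "It was so so good and And then it was over"

# \1 must repeat whatever group 1 matched; IGNORECASE lets "and And" count.
repeated_word = re.compile(r'\b(\w+)\s+\1\b', re.IGNORECASE)

for match in repeated_word.finditer(text):
    start, end = match.span()
    # Show the matched text plus up to five characters on either side.
    context = text[max(0, start - 5):end + 5]
    print(start, end, repr(match.group()), repr(context))
```

Here the offsets alone would not say which word was repeated, so printing
match.group() and a little context is what makes the result usable.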

-Original Message-
From: Python-list  On 
Behalf Of Jen Kris via Python-list
Sent: Monday, February 27, 2023 8:36 PM
To: Cameron Simpson 
Cc: Python List 
Subject: Re: How to escape strings for re.finditer?

I haven't tested it either but it looks like it would work.  But for this case 
I prefer the relative simplicity of:

example = 'X - abc_degree + 1 + qq + abc_degree + 1'
find_string = re.escape('abc_degree + 1') for match in re.finditer(find_string, 
example):
print(match.start(), match.end())

4 18
26 40

I don't insist on terseness for its own sake, but it's cleaner this way.  

Jen

Feb 27, 2023, 16:55 by c...@cskk.id.au:

> On 28Feb2023 01:13, Jen Kris  wrote:
>
>> I went to the re module because the specified string may appear more than 
>> once in the string (in the code I'm writing).
>>
>
> Sure, but writing a `finditer` for plain `str` is pretty easy (untested):
>
>  pos = 0
>  while True:
>      found = s.find(substring, pos)
>      if found < 0:
>          break
>      start = found
>      end = found + len(substring)
>      ... do whatever with start and end ...
>      pos = end
>
> Many people go straight to the `re` module whenever they're looking for 
> strings. It is often cryptic error prone overkill. Just something to keep in 
> mind.
>
> Cheers,
> Cameron Simpson 
> --
> https://mail.python.org/mailman/listinfo/python-list
>

-- 
https://mail.python.org/mailman/listinfo/python-list


RE: How to escape strings for re.finditer?

2023-02-27 Thread avi.e.gross

Jen,

What you just described is why that tool is not the right tool for the job, 
albeit it may help you confirm if whatever method you choose does work 
correctly and finds the same number of matches.

Sometimes you simply do some searching and roll your own.

Consider this code using a sort of list comprehension feature:

>>> short = "hello world"
>>> longer = "hello world is how many programs start for novices but some use hello world! to show how happy they are to say hello world"

>>> short in longer
True
>>> howLong = len(short)

>>> res = [(offset, offset + howLong) for offset in range(len(longer)) if longer.startswith(short, offset)]
>>> res
[(0, 11), (64, 75), (111, 122)]
>>> len(res)
3

I could do a bit more but it seems to work. Did I get the offsets right? 
Checking:

>>> print( [ longer[res[index][0]:res[index][1]] for index in range(len(res))])
['hello world', 'hello world', 'hello world']

Seems to work but thrown together quickly so can likely be done much nicer.

But as noted, the above has flaws such as matching overlaps like:

>>> short = "good good"
>>> longer = "A good good good but not douple plus good good good goody"
>>> howLong = len(short)
>>> res = [(offset, offset + howLong) for offset in range(len(longer)) if longer.startswith(short, offset)]
>>> res
[(2, 11), (7, 16), (37, 46), (42, 51), (47, 56)]

It matched five times because in places there were three or four "good"s in a 
row. Some other method might match only three.

What some might do can get long and you clearly want one answer and not 
tutorials. For example, people can make a loop that finds a match and either 
sabotages the area by replacing or deleting it, or keeps track and searches 
again on a substring offset from the beginning. 

When you do not find a tool, consider making one. You can take (better) code 
than I show above and make it into a function and now you have a tool. Even 
better, you can make it return whatever you want.
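
To make the overlap point concrete with the same "good good" strings, here is
a sketch comparing two regex-based approaches. re.finditer() on the escaped
key returns non-overlapping matches, while wrapping the key in a zero-width
lookahead (an alternative technique, not something used earlier in this
thread) reports every starting position, overlaps included:

```python
import re

short = "good good"
longer = "A good good good but not douple plus good good good goody"

# Plain finditer: each match consumes its text, so matches cannot overlap.
non_overlapping = [m.span() for m in re.finditer(re.escape(short), longer)]

# Lookahead: matches are zero-width, so every start position is reported.
overlapping = [(m.start(), m.start() + len(short))
               for m in re.finditer('(?=' + re.escape(short) + ')', longer)]

print(non_overlapping)
# [(2, 11), (37, 46), (47, 56)]
print(overlapping)
# [(2, 11), (7, 16), (37, 46), (42, 51), (47, 56)]
```

Which of the two answers is "right" depends on what the offsets will be used
for, so it is worth deciding that before wrapping either one in a function.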

-Original Message-
From: Python-list  On 
Behalf Of Jen Kris via Python-list
Sent: Monday, February 27, 2023 7:40 PM
To: Bob van der Poel 
Cc: Python List 
Subject: Re: How to escape strings for re.finditer?


string.count() only tells me there are N instances of the string; it does not 
say where they begin and end, as does re.finditer.  

Feb 27, 2023, 16:20 by bobmellow...@gmail.com:

> Would string.count() work for you then?
>
> On Mon, Feb 27, 2023 at 5:16 PM Jen Kris via Python-list
> <python-list@python.org> wrote:
>
>>
>> I went to the re module because the specified string may appear more 
>> than once in the string (in the code I'm writing).  For example:
>>  
>>  a = "X - abc_degree + 1 + qq + abc_degree + 1"
>>   b = "abc_degree + 1"
>>   q = a.find(b)
>>  
>>  print(q)
>>  4
>>  
>>  So it correctly finds the start of the first instance, but not the 
>> second one.  The re code finds both instances.  If I knew that the substring 
>> occurred only once then the str.find would be best.
>>  
>>  I changed my re code after MRAB's comment, it now works.
>>  
>>  Thanks much.
>>  
>>  Jen
>>  
>>  
>>  Feb 27, 2023, 15:56 by c...@cskk.id.au:
>>  
>>  > On 28Feb2023 00:11, Jen Kris <jenk...@tutanota.com> wrote:
>>  >
>>  >> When matching a string against a longer string, where both
>>  >> strings have spaces in them, we need to escape the spaces.
>>  >>
>>  >> This works (no spaces):
>>  >>
>>  >> import re
>>  >> example = 'abcdefabcdefabcdefg'
>>  >> find_string = "abc"
>>  >> for match in re.finditer(find_string, example):
>>  >>     print(match.start(), match.end())
>>  >>
>>  >> That gives me the start and end character positions, which is what
>>  >> I want.
>>  >>
>>  >> However, this does not work:
>>  >>
>>  >> import re
>>  >> example = re.escape('X - cty_degrees + 1 + qq')
>>  >> find_string = re.escape('cty_degrees + 1')
>>  >> for match in re.finditer(find_string, example):
>>  >>     print(match.start(), match.end())
>>  >>
>>  >> I’ve tried several other attempts based on my research, but still
>>  >> no match.
>>  >>
>>  >
>>  > You need to print those strings out. You're escaping the _example_ 
>>  > string, which would make it:
>>  >
>>  >  X - cty_degrees \+ 1 \+ qq
>>  >
>>  > because `+` is a special character in

Re: How to escape strings for re.finditer?

2023-02-27 Thread Jen Kris via Python-list


I haven't tested it either but it looks like it would work.  But for this case 
I prefer the relative simplicity of:

example = 'X - abc_degree + 1 + qq + abc_degree + 1'
find_string = re.escape('abc_degree + 1')
for match in re.finditer(find_string, example):
    print(match.start(), match.end())

4 18
26 40

I don't insist on terseness for its own sake, but it's cleaner this way.  

Jen


Feb 27, 2023, 16:55 by c...@cskk.id.au:

> On 28Feb2023 01:13, Jen Kris  wrote:
>
>> I went to the re module because the specified string may appear more than 
>> once in the string (in the code I'm writing).
>>
>
> Sure, but writing a `finditer` for plain `str` is pretty easy (untested):
>
>  pos = 0
>  while True:
>      found = s.find(substring, pos)
>      if found < 0:
>          break
>      start = found
>      end = found + len(substring)
>      ... do whatever with start and end ...
>      pos = end
>
> Many people go straight to the `re` module whenever they're looking for 
> strings. It is often cryptic error prone overkill. Just something to keep in 
> mind.
>
> Cheers,
> Cameron Simpson 
> -- 
> https://mail.python.org/mailman/listinfo/python-list
>

-- 
https://mail.python.org/mailman/listinfo/python-list

Re: How to escape strings for re.finditer?

2023-02-27 Thread Cameron Simpson


On 28Feb2023 00:57, Jen Kris  wrote:
Yes, that's it.  I don't know how long it would have taken to find that 
detail with research through the voluminous re documentation.  Thanks 
very much. 


You find things like this by printing out the strings you're actually 
working with. Not the original strings, but the strings when you're 
invoking `finditer` i.e. in your case, escaped strings.


Then you might have seen that what you were searching no longer 
contained what you were searching for.


Don't underestimate the value of the debugging print call. It lets you 
see what your programme is actually working with, instead of what you 
thought it was working with.
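
A small sketch of that debugging step, using the strings from the original
question (the variable names escaped_example, wrong and right are made up
here). Printing with repr() shows the backslashes that re.escape() added to
the text being searched, which is why the escaped key no longer occurs in it:

```python
import re

example = 'X - cty_degrees + 1 + qq'
find_string = re.escape('cty_degrees + 1')

# The mistake from the original post: escaping the text being searched too.
escaped_example = re.escape(example)

# The debugging prints: look at what finditer() is really given.
print(repr(find_string))
print(repr(escaped_example))

wrong = [m.span() for m in re.finditer(find_string, escaped_example)]
right = [m.span() for m in re.finditer(find_string, example)]
print(wrong)   # []
print(right)   # [(4, 19)]
```

Only the pattern should be escaped; the text to search is left untouched.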


Cheers,
Cameron Simpson 
--
https://mail.python.org/mailman/listinfo/python-list

Re: How to escape strings for re.finditer?

2023-02-27 Thread Cameron Simpson


On 28Feb2023 01:13, Jen Kris  wrote:
I went to the re module because the specified string may appear more 
than once in the string (in the code I'm writing).


Sure, but writing a `finditer` for plain `str` is pretty easy 
(untested):


pos = 0
while True:
    found = s.find(substring, pos)
    if found < 0:
        break
    start = found
    end = found + len(substring)
    ... do whatever with start and end ...
    pos = end

Many people go straight to the `re` module whenever they're looking for 
strings. It is often cryptic, error-prone overkill. Just something to 
keep in mind.
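
For reference, the loop above can be wrapped in a generator so it reads much like `re.finditer`. This is only a sketch: the name `find_all` and the guard against an empty substring are additions for illustration, not part of the original message.

```python
def find_all(s, substring):
    """Yield (start, end) for each non-overlapping occurrence of substring in s."""
    if not substring:
        return  # an empty substring would never advance pos, so stop early
    pos = 0
    while True:
        found = s.find(substring, pos)
        if found < 0:
            break
        end = found + len(substring)
        yield found, end
        pos = end  # resume after this match, so matches do not overlap


example = 'X - abc_degree + 1 + qq + abc_degree + 1'
spans = list(find_all(example, 'abc_degree + 1'))
for start, end in spans:
    print(start, end)
```

With the example string from this thread, the spans should agree with the positions Jen reported from `re.finditer`.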


Cheers,
Cameron Simpson 
--
https://mail.python.org/mailman/listinfo/python-list

Re: How to escape strings for re.finditer?

2023-02-27 Thread Jen Kris via Python-list


string.count() only tells me there are N instances of the string; it does not 
say where they begin and end, as does re.finditer.  

Feb 27, 2023, 16:20 by bobmellow...@gmail.com:

> Would string.count() work for you then?
>
> On Mon, Feb 27, 2023 at 5:16 PM Jen Kris via Python-list <python-list@python.org> wrote:
>
>>
>> I went to the re module because the specified string may appear more than 
>> once in the string (in the code I'm writing).  For example: 
>>  
>>  a = "X - abc_degree + 1 + qq + abc_degree + 1"
>>   b = "abc_degree + 1"
>>   q = a.find(b)
>>  
>>  print(q)
>>  4
>>  
>>  So it correctly finds the start of the first instance, but not the second 
>> one.  The re code finds both instances.  If I knew that the substring 
>> occurred only once then the str.find would be best.  
>>  
>>  I changed my re code after MRAB's comment, it now works.  
>>  
>>  Thanks much.  
>>  
>>  Jen
>>  
>>  
>>  Feb 27, 2023, 15:56 by c...@cskk.id.au:
>>  
>>  > On 28Feb2023 00:11, Jen Kris <jenk...@tutanota.com> wrote:
>>  >
>>  >> When matching a string against a longer string, where both strings have 
>> spaces in them, we need to escape the spaces. 
>>  >>
>>  >> This works (no spaces):
>>  >>
>>  >> import re
>>  >> example = 'abcdefabcdefabcdefg'
>>  >> find_string = "abc"
>>  >> for match in re.finditer(find_string, example):
>>  >>     print(match.start(), match.end())
>>  >>
>>  >> That gives me the start and end character positions, which is what I 
>> want. 
>>  >>
>>  >> However, this does not work:
>>  >>
>>  >> import re
>>  >> example = re.escape('X - cty_degrees + 1 + qq')
>>  >> find_string = re.escape('cty_degrees + 1')
>>  >> for match in re.finditer(find_string, example):
>>  >>     print(match.start(), match.end())
>>  >>
>>  >> I’ve tried several other attempts based on my research, but still no 
>> match. 
>>  >>
>>  >
>>  > You need to print those strings out. You're escaping the _example_ 
>> string, which would make it:
>>  >
>>  >  X - cty_degrees \+ 1 \+ qq
>>  >
>>  > because `+` is a special character in regexps and so `re.escape` escapes 
>> it. But you don't want to mangle the string you're searching! After all, the 
>> text above does not contain the string `cty_degrees + 1`.
>>  >
>>  > My secondary question is: if you're escaping the thing you're searching 
>> _for_, then you're effectively searching for a _fixed_ string, not a 
>> pattern/regexp. So why on earth are you using regexps to do your searching?
>>  >
>>  > The `str` type has a `find(substring)` function. Just use that! It'll be 
>> faster and the code simpler!
>>  >
>>  > Cheers,
>>  > Cameron Simpson <c...@cskk.id.au>
>>  > -- 
>>  > https://mail.python.org/mailman/listinfo/python-list
>>  >
>>  
>>  -- 
>>  https://mail.python.org/mailman/listinfo/python-list
>>
>
>
> -- 
>  Listen to my CD at http://www.mellowood.ca/music/cedars
> Bob van der Poel ** Wynndel, British Columbia, CANADA **
> EMAIL: b...@mellowood.ca
> WWW:   http://www.mellowood.ca
>

-- 
https://mail.python.org/mailman/listinfo/python-list

RE: How to escape strings for re.finditer?

2023-02-27 Thread avi.e.gross

Just FYI, Jen, there are times a sledgehammer works but perhaps is not the only 
way. These days people worry less about efficiency and more about programmer 
time and education and that can be fine.

But if you looked at methods available in strings or in some other modules, 
your situation is quite common. Some may use another RE front end called 
finditer().

I am NOT suggesting you do what I say next, but imagine writing a loop that 
takes a substring of what you are searching for of the same length as your 
search string. Near the end, it stops as there is too little left.

You can now simply test your searched for string against that substring for 
equality and it tends to return rapidly when they are not equal early on.

Your loop would return whatever data structure or results you want such as that 
it matched it three times at offsets a, b and c.

But do you allow overlaps? If not, your loop needs to skip len(search_str) 
after a match.
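
The slicing loop described above, including the overlap question, might look like the following. It is a rough sketch only; the function name `find_offsets` and the `allow_overlaps` flag are invented for illustration.

```python
def find_offsets(text, target, allow_overlaps=False):
    """Return the start offsets at which target occurs in text."""
    offsets = []
    n = len(target)
    if n == 0:
        return offsets
    i = 0
    while i + n <= len(text):  # stop when too little text is left
        if text[i:i + n] == target:
            offsets.append(i)
            # Skip past the whole match unless overlaps are wanted.
            i += 1 if allow_overlaps else n
        else:
            i += 1
    return offsets


print(find_offsets('aaaa', 'aa'))
print(find_offsets('aaaa', 'aa', allow_overlaps=True))
```

The two calls differ only in how far the loop advances after a hit, which is the whole difference between overlapping and non-overlapping results.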

What you may want to consider is another form of pre-processing. Do you care if 
"abc_degree + 1" has missing or added spaces at the start or end or anywhere in 
the middle as in " abc_degree +1"?

Do you care if stuff is a different case like "Abc_Degree + 1"?

Some such searches can be done if both the pattern and searched string are 
first converted to a canonical format that maps to the same output. But that 
complicates things a bit and you may need to display what you match differently.
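
One possible canonical format, purely as an illustration: drop all whitespace and fold case before comparing. The helper names `canonical` and `loosely_contains` are made up here, and any offsets found this way would refer to the canonical text rather than the original.

```python
def canonical(s):
    """Remove all whitespace and fold case so loose variants compare equal."""
    return "".join(s.split()).casefold()


def loosely_contains(text, target):
    """True if target occurs in text, ignoring whitespace and case."""
    return canonical(target) in canonical(text)


print(loosely_contains("X - Abc_Degree +1 + qq", "abc_degree + 1"))
```

Because the mapping discards characters, it answers "is it there?" more easily than "where exactly is it?", which is the complication mentioned above.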

And are you also willing to match this: "myabc_degree + 1"?

When using a crafted RE there is a way to ask for a word boundary, so abc will 
only be matched if it is preceded by a non-word character or the start of the 
string, and not by "my".
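
A small sketch of the word-boundary idea, assuming the search text is still a fixed string: escape it with `re.escape` and prepend `\b`. The sample text is invented for the demonstration.

```python
import re

target = "abc_degree + 1"
# \b requires a word boundary before the 'a'; the rest is matched literally.
pattern = re.compile(r"\b" + re.escape(target))

text = "myabc_degree + 1 and abc_degree + 1"
starts = [m.start() for m in pattern.finditer(text)]
print(starts)
```

Only the second occurrence is reported, because in "myabc_degree" the "y" and the "a" are both word characters and so there is no boundary between them. A trailing `\b` could be added in the same way if the end of the match matters.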

So this may be a case where you can solve an easy version with the chance it 
can be fooled or overengineer it. If you are allowing the user to type in what 
to search for, as many programs, including editors, do, you will often find such 
false positives unless the user knows RE syntax and applies it and you do not 
escape it. I have experienced havoc when doing a careless global replace that 
matched more than I expected, including making changes in comments or constant 
strings rather than just the name of a function. Adding a paren is helpful as 
is not replacing them all but one at a time and skipping any that are not 
wanted.

Good luck.

-Original Message-
From: Python-list  On 
Behalf Of Jen Kris via Python-list
Sent: Monday, February 27, 2023 7:14 PM
To: Cameron Simpson 
Cc: Python List 
Subject: Re: How to escape strings for re.finditer?

I went to the re module because the specified string may appear more than once 
in the string (in the code I'm writing).  For example:  

a = "X - abc_degree + 1 + qq + abc_degree + 1"
 b = "abc_degree + 1"
 q = a.find(b)

print(q)
4

So it correctly finds the start of the first instance, but not the second one.  
The re code finds both instances.  If I knew that the substring occurred only 
once then the str.find would be best.  

I changed my re code after MRAB's comment, it now works.  

Thanks much.  

Jen

Feb 27, 2023, 15:56 by c...@cskk.id.au:

> On 28Feb2023 00:11, Jen Kris  wrote:
>
>> When matching a string against a longer string, where both strings 
>> have spaces in them, we need to escape the spaces.
>>
>> This works (no spaces):
>>
>> import re
>> example = 'abcdefabcdefabcdefg'
>> find_string = "abc"
>> for match in re.finditer(find_string, example):
>>     print(match.start(), match.end())
>>
>> That gives me the start and end character positions, which is what I 
>> want.
>>
>> However, this does not work:
>>
>> import re
>> example = re.escape('X - cty_degrees + 1 + qq')
>> find_string = re.escape('cty_degrees + 1')
>> for match in re.finditer(find_string, example):
>>     print(match.start(), match.end())
>>
>> I’ve tried several other attempts based on my research, but still no 
>> match.
>>
>
> You need to print those strings out. You're escaping the _example_ string, 
> which would make it:
>
>  X - cty_degrees \+ 1 \+ qq
>
> because `+` is a special character in regexps and so `re.escape` escapes it. 
> But you don't want to mangle the string you're searching! After all, the text 
> above does not contain the string `cty_degrees + 1`.
>
> My secondary question is: if you're escaping the thing you're searching 
> _for_, then you're effectively searching for a _fixed_ string, not a 
> pattern/regexp. So why on earth are you using regexps to do your searching?
>
> The `str` type has a `find(substring)` function. Just use that! It'll be 
> faster and the code simpler!
>
> Cheers,
> Cameron Simpson 
> --
> https://mail.python.org/mailman/listinfo/python-list
>

-- 
https://mail.python.org/mailman/listinfo/python-list

-- 
https://mail.python.org/mailman/listinfo/python-list

Re: How to escape strings for re.finditer?

2023-02-27 Thread Jen Kris via Python-list


I went to the re module because the specified string may appear more than once 
in the string (in the code I'm writing).  For example:  

a = "X - abc_degree + 1 + qq + abc_degree + 1"
 b = "abc_degree + 1"
 q = a.find(b)

print(q)
4

So it correctly finds the start of the first instance, but not the second one.  
The re code finds both instances.  If I knew that the substring occurred only 
once then the str.find would be best.  

I changed my re code after MRAB's comment, it now works.  

Thanks much.  

Jen


Feb 27, 2023, 15:56 by c...@cskk.id.au:

> On 28Feb2023 00:11, Jen Kris  wrote:
>
>> When matching a string against a longer string, where both strings have 
>> spaces in them, we need to escape the spaces. 
>>
>> This works (no spaces):
>>
>> import re
>> example = 'abcdefabcdefabcdefg'
>> find_string = "abc"
>> for match in re.finditer(find_string, example):
>>     print(match.start(), match.end())
>>
>> That gives me the start and end character positions, which is what I want. 
>>
>> However, this does not work:
>>
>> import re
>> example = re.escape('X - cty_degrees + 1 + qq')
>> find_string = re.escape('cty_degrees + 1')
>> for match in re.finditer(find_string, example):
>>     print(match.start(), match.end())
>>
>> I’ve tried several other attempts based on my research, but still no match. 
>>
>
> You need to print those strings out. You're escaping the _example_ string, 
> which would make it:
>
>  X - cty_degrees \+ 1 \+ qq
>
> because `+` is a special character in regexps and so `re.escape` escapes it. 
> But you don't want to mangle the string you're searching! After all, the text 
> above does not contain the string `cty_degrees + 1`.
>
> My secondary question is: if you're escaping the thing you're searching 
> _for_, then you're effectively searching for a _fixed_ string, not a 
> pattern/regexp. So why on earth are you using regexps to do your searching?
>
> The `str` type has a `find(substring)` function. Just use that! It'll be 
> faster and the code simpler!
>
> Cheers,
> Cameron Simpson 
> -- 
> https://mail.python.org/mailman/listinfo/python-list
>

-- 
https://mail.python.org/mailman/listinfo/python-list

RE: How to escape strings for re.finditer?

2023-02-27 Thread avi.e.gross

MRAB makes a valid point. The regular expression is compiled only from the 
pattern you are looking for, and if it contains anything that might be a 
command, such as a ^ at the start or [12] in the middle, you want that converted 
so NONE OF THAT is treated as one. It will be compiled to something that looks 
for a literal ^, including later in the string, and looks for a real [ then a 
real 1 and a real 2 and a real ], not for one of the choices of 1 or 2. 

Your example was 'cty_degrees + 1', which can have a subtle bug introduced. The 
special character is "+", which means match greedily as many copies of the 
previous entity as possible. In this case, the previous entity was a single 
space. So the regular expression will match 'cty_degrees', then match the single 
space it sees because the pattern has a space followed by a plus; then, since it 
is not looking for a literal plus, it hits the plus in the text and fails. If 
your example is rewritten in whatever way re.escape uses, it might be 
'cty_degrees \+ 1' and then it should work fine.
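
The effect described above can be checked directly. This sketch uses the strings from the thread; the two-space string in the last call is an invented input that the unescaped pattern happens to accept.

```python
import re

text = 'X - cty_degrees + 1 + qq'
raw = 'cty_degrees + 1'

# Unescaped: ' +' means "one or more spaces", so the literal '+' never matches.
unescaped = re.search(raw, text)

# Escaped: every character of raw is matched literally.
escaped = re.search(re.escape(raw), text)

# The unescaped pattern does match a string with two spaces and no plus sign.
two_spaces = re.search(raw, 'cty_degrees  1')

print(unescaped, escaped.span(), two_spaces is not None)
```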

But converting the text you are searching in just breaks that, as the result 
will have a '\+', which is viewed as two unrelated symbols, and the backslash 
stops the match from going further.

-Original Message-
From: Python-list  On 
Behalf Of MRAB
Sent: Monday, February 27, 2023 6:46 PM
To: python-list@python.org
Subject: Re: How to escape strings for re.finditer?

On 2023-02-27 23:11, Jen Kris via Python-list wrote:
> When matching a string against a longer string, where both strings have 
> spaces in them, we need to escape the spaces.
> 
> This works (no spaces):
> 
> import re
> example = 'abcdefabcdefabcdefg'
> find_string = "abc"
> for match in re.finditer(find_string, example):
>  print(match.start(), match.end())
> 
> That gives me the start and end character positions, which is what I want.
> 
> However, this does not work:
> 
> import re
> example = re.escape('X - cty_degrees + 1 + qq')
> find_string = re.escape('cty_degrees + 1')
> for match in re.finditer(find_string, example):
>     print(match.start(), match.end())
> 
> I’ve tried several other attempts based on my research, but still no match.
> 
> I don’t have much experience with regex, so I hoped a reg-expert might help.
> 
You need to escape only the pattern, not the string you're searching.
--
https://mail.python.org/mailman/listinfo/python-list

-- 
https://mail.python.org/mailman/listinfo/python-list

Re: How to escape strings for re.finditer?

2023-02-27 Thread Jen Kris via Python-list

Yes, that's it.  I don't know how long it would have taken to find that detail 
with research through the voluminous re documentation.  Thanks very much.  

Feb 27, 2023, 15:47 by pyt...@mrabarnett.plus.com:

> On 2023-02-27 23:11, Jen Kris via Python-list wrote:
>
>> When matching a string against a longer string, where both strings have 
>> spaces in them, we need to escape the spaces.
>>
>> This works (no spaces):
>>
>> import re
>> example = 'abcdefabcdefabcdefg'
>> find_string = "abc"
>> for match in re.finditer(find_string, example):
>>      print(match.start(), match.end())
>>
>> That gives me the start and end character positions, which is what I want.
>>
>> However, this does not work:
>>
>> import re
>> example = re.escape('X - cty_degrees + 1 + qq')
>> find_string = re.escape('cty_degrees + 1')
>> for match in re.finditer(find_string, example):
>>      print(match.start(), match.end())
>>
>> I’ve tried several other attempts based on my research, but still no match.
>>
>> I don’t have much experience with regex, so I hoped a reg-expert might help.
>>
> You need to escape only the pattern, not the string you're searching.
> -- 
> https://mail.python.org/mailman/listinfo/python-list
>

-- 
https://mail.python.org/mailman/listinfo/python-list

Re: How to escape strings for re.finditer?

2023-02-27 Thread Cameron Simpson


On 28Feb2023 00:11, Jen Kris  wrote:

When matching a string against a longer string, where both strings have spaces 
in them, we need to escape the spaces. 

This works (no spaces):

import re
example = 'abcdefabcdefabcdefg'
find_string = "abc"
for match in re.finditer(find_string, example):
    print(match.start(), match.end())

That gives me the start and end character positions, which is what I want. 

However, this does not work:

import re
example = re.escape('X - cty_degrees + 1 + qq')
find_string = re.escape('cty_degrees + 1')
for match in re.finditer(find_string, example):
    print(match.start(), match.end())

I’ve tried several other attempts based on my research, but still no 
match. 


You need to print those strings out. You're escaping the _example_ 
string, which would make it:


X - cty_degrees \+ 1 \+ qq

because `+` is a special character in regexps and so `re.escape` escapes 
it. But you don't want to mangle the string you're searching! After all, 
the text above does not contain the string `cty_degrees + 1`.
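
As a concrete version of the advice to print the strings, the sketch below shows both escaped values and compares searching the escaped example with searching the untouched one. Note that recent Python versions also escape spaces and hyphens, so the printed text will carry more backslashes than the simplified form shown above; the variable names `broken` and `fixed` are just labels for this illustration.

```python
import re

example = 'X - cty_degrees + 1 + qq'
find_string = 'cty_degrees + 1'

escaped_example = re.escape(example)
escaped_pattern = re.escape(find_string)

# Look at what is actually being handed to finditer.
print(repr(escaped_example))
print(repr(escaped_pattern))

# Searching the escaped example: the literal text is no longer in there.
broken = [(m.start(), m.end())
          for m in re.finditer(escaped_pattern, escaped_example)]

# Searching the original example with only the pattern escaped.
fixed = [(m.start(), m.end())
         for m in re.finditer(escaped_pattern, example)]

print(broken, fixed)
```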


My secondary question is: if you're escaping the thing you're searching 
_for_, then you're effectively searching for a _fixed_ string, not a 
pattern/regexp. So why on earth are you using regexps to do your 
searching?


The `str` type has a `find(substring)` function. Just use that! It'll be 
faster and the code simpler!


Cheers,
Cameron Simpson 
--
https://mail.python.org/mailman/listinfo/python-list

Re: How to escape strings for re.finditer?

2023-02-27 Thread MRAB


On 2023-02-27 23:11, Jen Kris via Python-list wrote:

When matching a string against a longer string, where both strings have spaces 
in them, we need to escape the spaces.

This works (no spaces):

import re
example = 'abcdefabcdefabcdefg'
find_string = "abc"
for match in re.finditer(find_string, example):
     print(match.start(), match.end())

That gives me the start and end character positions, which is what I want.

However, this does not work:

import re
example = re.escape('X - cty_degrees + 1 + qq')
find_string = re.escape('cty_degrees + 1')
for match in re.finditer(find_string, example):
     print(match.start(), match.end())

I’ve tried several other attempts based on my research, but still no match.

I don’t have much experience with regex, so I hoped a reg-expert might help.


You need to escape only the pattern, not the string you're searching.
--
https://mail.python.org/mailman/listinfo/python-list

How to escape strings for re.finditer?

2023-02-27 Thread Jen Kris via Python-list

When matching a string against a longer string, where both strings have spaces 
in them, we need to escape the spaces.  

This works (no spaces):

import re
example = 'abcdefabcdefabcdefg'
find_string = "abc"
for match in re.finditer(find_string, example):
    print(match.start(), match.end())

That gives me the start and end character positions, which is what I want. 

However, this does not work:

import re
example = re.escape('X - cty_degrees + 1 + qq')
find_string = re.escape('cty_degrees + 1')
for match in re.finditer(find_string, example):
    print(match.start(), match.end())

I’ve tried several other attempts based on my research, but still no match. 

I don’t have much experience with regex, so I hoped a reg-expert might help. 

Thanks,

Jen

-- 
https://mail.python.org/mailman/listinfo/python-list

Re: Ref-strings in logging messages (was: Performance issue with CPython 3.10 + Cython)

2022-10-24 Thread Barry Scott



> On 8 Oct 2022, at 11:50, Weatherby,Gerard  wrote:
> 
> Logging does support passing a callable, if indirectly. It only calls __str__ 
> on the object passed if debugging is enabled.
>  
> class Defer:
> 
>     def __init__(self, fn):
>         self.fn = fn
> 
>     def __str__(self):
>         return self.fn()
> 
> def some_expensive_function():
>     return "hello"
> 
> logging.basicConfig()
> logging.debug(Defer(some_expensive_function))

Oh what a clever hack. Took a few minutes of code reading to see why this works.
You are exploiting the str(self.msg) call in LogRecord's getMessage().

```
def getMessage(self):
    """
    Return the message for this LogRecord.

    Return the message for this LogRecord after merging any user-supplied
    arguments with the message.
    """
    msg = str(self.msg)
    if self.args:
        msg = msg % self.args
    return msg
```
```
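
To see the deferral in action, here is a small self-contained sketch. The `Defer` class follows the one quoted above; the `ListHandler` class, the logger name and the `calls` list are assumptions added only so the effect can be observed.

```python
import logging


class Defer:
    """Wrap a zero-argument callable; it runs only when str() is applied."""

    def __init__(self, fn):
        self.fn = fn

    def __str__(self):
        return self.fn()


class ListHandler(logging.Handler):
    """Hypothetical handler that stores formatted messages in a list."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())


calls = []


def some_expensive_function():
    calls.append(1)  # record that the expensive work really happened
    return "hello"


logger = logging.getLogger("defer_demo")
logger.propagate = False
handler = ListHandler()
logger.addHandler(handler)

logger.setLevel(logging.WARNING)  # DEBUG disabled: no record is created
logger.debug(Defer(some_expensive_function))
calls_while_disabled = len(calls)

logger.setLevel(logging.DEBUG)  # DEBUG enabled: getMessage() calls str()
logger.debug(Defer(some_expensive_function))
calls_while_enabled = len(calls)

print(calls_while_disabled, calls_while_enabled, handler.messages)
```

A lambda works equally well as the wrapped callable, for example `Defer(lambda: f"{value!r}")` with some local `value`.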

Barry


>  
>  
> From: Python-list on behalf of Barry <ba...@barrys-emacs.org>
> Date: Friday, October 7, 2022 at 1:30 PM
> To: MRAB <pyt...@mrabarnett.plus.com>
> Cc: python-list@python.org
> Subject: Re: Ref-strings in logging messages (was: Performance issue with 
> CPython 3.10 + Cython)
> 
> > On 7 Oct 2022, at 18:16, MRAB  wrote:
> >
> > On 2022-10-07 16:45, Skip Montanaro wrote:
> >>> On Fri, Oct 7, 2022 at 9:42 AM Andreas Ames 
> >>> 
> >>> wrote:
> >>> 1. The culprit was me. As lazy as I am, I have used f-strings all over the
> >>> place in calls to `logging.logger.debug()` and friends, evaluating all
> >>> arguments regardless of whether the logger was enabled or not.
> >>>
> >> I thought there was some discussion about whether and how to efficiently
> >> admit f-strings to the logging package. I'm guessing that's not gone
> >> anywhere (yet).
> > Letting you pass in a callable to call might help because that you could 
> > use lambda.
> 
> Yep, that’s the obvious way to avoid expensive log data generation.
> Would need logging module to support that use case.
> 
> Barry
> 
> > --
> > https://mail.python.org/mailman/listinfo/python-list
> >
> 
> --
> https://mail.python.org/mailman/listinfo/python-list
-- 
https://mail.python.org/mailman/listinfo/python-list

Re: Ref-strings in logging messages (was: Performance issue with CPython 3.10 + Cython)

2022-10-08 Thread Weatherby,Gerard

Logging does support passing a callable, if indirectly. It only calls __str__ 
on the object passed if debugging is enabled.

import logging

class Defer:

    def __init__(self, fn):
        self.fn = fn

    def __str__(self):
        return self.fn()

def some_expensive_function():
    return "hello"

logging.basicConfig()
logging.debug(Defer(some_expensive_function))


From: Python-list  on 
behalf of Barry 
Date: Friday, October 7, 2022 at 1:30 PM
To: MRAB 
Cc: python-list@python.org 
Subject: Re: Ref-strings in logging messages (was: Performance issue with 
CPython 3.10 + Cython)

> On 7 Oct 2022, at 18:16, MRAB  wrote:
>
> On 2022-10-07 16:45, Skip Montanaro wrote:
>>> On Fri, Oct 7, 2022 at 9:42 AM Andreas Ames 
>>> wrote:
>>> 1. The culprit was me. As lazy as I am, I have used f-strings all over the
>>> place in calls to `logging.logger.debug()` and friends, evaluating all
>>> arguments regardless of whether the logger was enabled or not.
>>>
>> I thought there was some discussion about whether and how to efficiently
>> admit f-strings to the logging package. I'm guessing that's not gone
>> anywhere (yet).
> Letting you pass in a callable to call might help because that you could use 
> lambda.

Yep, that’s the obvious way to avoid expensive log data generation.
Would need logging module to support that use case.

Barry

> --
> https://mail.python.org/mailman/listinfo/python-list
>

--
https://mail.python.org/mailman/listinfo/python-list
-- 
https://mail.python.org/mailman/listinfo/python-list

Re: Ref-strings in logging messages (was: Performance issue with CPython 3.10 + Cython)

2022-10-07 Thread Julian Smith

On Fri, 7 Oct 2022 18:28:06 +0100
Barry  wrote:

> > On 7 Oct 2022, at 18:16, MRAB  wrote:
> > 
> > On 2022-10-07 16:45, Skip Montanaro wrote:  
> >>> On Fri, Oct 7, 2022 at 9:42 AM Andreas Ames 
> >>> 
> >>> wrote:
> >>> 1. The culprit was me. As lazy as I am, I have used f-strings all over the
> >>> place in calls to `logging.logger.debug()` and friends, evaluating all
> >>> arguments regardless of whether the logger was enabled or not.
> >>>   
> >> I thought there was some discussion about whether and how to efficiently
> >> admit f-strings to the logging package. I'm guessing that's not gone
> >> anywhere (yet).  
> > Letting you pass in a callable to call might help because that you could 
> > use lambda.  
> 
> Yep, that’s the obvious way to avoid expensive log data generation.
> Would need logging module to support that use case.

I have some logging code that uses eval() to evaluate expressions using
locals and globals in a parent stack frame, together with a parser to
find `{...}` items in a string.

I guess this constitutes a (basic) runtime implementation of f-strings.
As such it can avoid expensive evaluation/parsing when disabled, though
it's probably slow when enabled compared to native f-strings. It seems
to work quite well in practice, and also allows one to add some extra
formatting features.

For details see:

https://git.ghostscript.com/?p=mupdf.git;a=blob;f=scripts/jlib.py;h=e85e9f2c4;hb=HEAD#l41
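
The general idea can be sketched in a few lines. To be clear, this is not the jlib code linked above; it is a minimal illustration with invented names (`expand`, `log`, `demo`), it returns the text instead of writing it anywhere, and it handles only simple `{expr}` items with no format specs or nested braces. Since it relies on `eval()`, it should only ever see templates written by the programmer.

```python
import inspect
import re

_FIELD = re.compile(r"\{([^{}]+)\}")


def expand(template, frame):
    """Replace each {expr} with str(eval(expr)) using the given frame's names."""
    def replace(match):
        return str(eval(match.group(1), frame.f_globals, frame.f_locals))
    return _FIELD.sub(replace, template)


def log(template, enabled):
    """Return the expanded text, or None without evaluating anything."""
    if not enabled:
        return None  # the template is neither parsed nor evaluated
    caller = inspect.currentframe().f_back
    return expand(template, caller)


def demo():
    x = 6
    y = 7
    return log("x*y = {x*y}", enabled=True)


print(demo())
print(log("{1/0}", enabled=False))  # no ZeroDivisionError: never evaluated
```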

- Jules

-- 
http://op59.net
-- 
https://mail.python.org/mailman/listinfo/python-list

Re: Ref-strings in logging messages (was: Performance issue with CPython 3.10 + Cython)

2022-10-07 Thread Barry


> On 7 Oct 2022, at 19:09, Weatherby,Gerard  wrote:
> The obvious way to avoid log generation is:
> 
> if logger.isEnabledFor(logging.DEBUG):
>     logger.debug( expensive processing )
> 
> 
> Of course, having logging alter program flow could lead to hard-to-debug bugs.

Altered flow is less of an issue than the verbosity of the above.
We discussed ways to improve this pattern a few years ago.
That led to no changes.

What I have used is a class that defines __bool__ to report if logging is 
enabled and __call__ to log. Then you can do this:

log_debug = logger_from(DEBUG)

log_debug and log_debug('expensive %s' % (complex(),))
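
A guess at what such a class could look like follows. The original implementation was not posted, so `LevelLogger`, the `name` parameter and the body of `logger_from` are all assumptions made for this sketch.

```python
import logging


class LevelLogger:
    """Truthy when its level is enabled; calling it logs at that level."""

    def __init__(self, logger, level):
        self._logger = logger
        self._level = level

    def __bool__(self):
        return self._logger.isEnabledFor(self._level)

    def __call__(self, msg, *args):
        self._logger.log(self._level, msg, *args)


def logger_from(level, name="demo"):
    # Hypothetical factory: the real logger_from was not shown in the thread.
    return LevelLogger(logging.getLogger(name), level)


evaluated = []


def expensive():
    evaluated.append(1)
    return "result"


logging.getLogger("demo").setLevel(logging.WARNING)
log_debug = logger_from(logging.DEBUG)

# Short-circuit: expensive() is not called while DEBUG is disabled.
log_debug and log_debug('expensive %s' % (expensive(),))
print(bool(log_debug), len(evaluated))
```

The `and` short-circuits on `__bool__`, so the right-hand side, including the string formatting, is skipped whenever the level is off.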

Barry

> 
> From: Python-list  on 
> behalf of Barry 
> Date: Friday, October 7, 2022 at 1:30 PM
> To: MRAB 
> Cc: python-list@python.org 
> Subject: Re: Ref-strings in logging messages (was: Performance issue with 
> CPython 3.10 + Cython)
> 
>> On 7 Oct 2022, at 18:16, MRAB  wrote:
>> 
>> On 2022-10-07 16:45, Skip Montanaro wrote:
>>>> On Fri, Oct 7, 2022 at 9:42 AM Andreas Ames 
>>>> wrote:
>>>> 1. The culprit was me. As lazy as I am, I have used f-strings all over the
>>>> place in calls to `logging.logger.debug()` and friends, evaluating all
>>>> arguments regardless of whether the logger was enabled or not.
>>> I thought there was some discussion about whether and how to efficiently
>>> admit f-strings to the logging package. I'm guessing that's not gone
>>> anywhere (yet).
>> Letting you pass in a callable to call might help because that you could use 
>> lambda.
> 
> Yep, that’s the obvious way to avoid expensive log data generation.
> Would need logging module to support that use case.
> 
> Barry
> 
>> --
>> https://mail.python.org/mailman/listinfo/python-list
> 
> --
> https://mail.python.org/mailman/listinfo/python-list
> -- 
> https://mail.python.org/mailman/listinfo/python-list

-- 
https://mail.python.org/mailman/listinfo/python-list

Re: Ref-strings in logging messages (was: Performance issue with CPython 3.10 + Cython)

2022-10-07 Thread Weatherby,Gerard

The obvious way to avoid log generation is:

if logger.isEnabledFor(logging.DEBUG):
    logger.debug( expensive processing )


Of course, having logging alter program flow could lead to hard-to-debug bugs.

From: Python-list  on 
behalf of Barry 
Date: Friday, October 7, 2022 at 1:30 PM
To: MRAB 
Cc: python-list@python.org 
Subject: Re: Ref-strings in logging messages (was: Performance issue with 
CPython 3.10 + Cython)

> On 7 Oct 2022, at 18:16, MRAB  wrote:
>
> On 2022-10-07 16:45, Skip Montanaro wrote:
>>> On Fri, Oct 7, 2022 at 9:42 AM Andreas Ames 
>>> wrote:
>>> 1. The culprit was me. As lazy as I am, I have used f-strings all over the
>>> place in calls to `logging.logger.debug()` and friends, evaluating all
>>> arguments regardless of whether the logger was enabled or not.
>>>
>> I thought there was some discussion about whether and how to efficiently
>> admit f-strings to the logging package. I'm guessing that's not gone
>> anywhere (yet).
> Letting you pass in a callable to call might help because that you could use 
> lambda.

Yep, that’s the obvious way to avoid expensive log data generation.
Would need logging module to support that use case.

Barry

> --
> https://mail.python.org/mailman/listinfo/python-list
>

--
https://mail.python.org/mailman/listinfo/python-list
-- 
https://mail.python.org/mailman/listinfo/python-list

Re: Ref-strings in logging messages (was: Performance issue with CPython 3.10 + Cython)

2022-10-07 Thread Barry



> On 7 Oct 2022, at 18:16, MRAB  wrote:
> 
> On 2022-10-07 16:45, Skip Montanaro wrote:
>>> On Fri, Oct 7, 2022 at 9:42 AM Andreas Ames 
>>> wrote:
>>> 1. The culprit was me. As lazy as I am, I have used f-strings all over the
>>> place in calls to `logging.logger.debug()` and friends, evaluating all
>>> arguments regardless of whether the logger was enabled or not.
>>> 
>> I thought there was some discussion about whether and how to efficiently
>> admit f-strings to the logging package. I'm guessing that's not gone
>> anywhere (yet).
> Letting you pass in a callable to call might help because that you could use 
> lambda.

Yep, that’s the obvious way to avoid expensive log data generation.
Would need logging module to support that use case.

Barry

> -- 
> https://mail.python.org/mailman/listinfo/python-list
> 

-- 
https://mail.python.org/mailman/listinfo/python-list

Re: Ref-strings in logging messages (was: Performance issue with CPython 3.10 + Cython)

2022-10-07 Thread MRAB


On 2022-10-07 16:45, Skip Montanaro wrote:

On Fri, Oct 7, 2022 at 9:42 AM Andreas Ames 
wrote:


1. The culprit was me. As lazy as I am, I have used f-strings all over the
place in calls to `logging.logger.debug()` and friends, evaluating all
arguments regardless of whether the logger was enabled or not.



I thought there was some discussion about whether and how to efficiently
admit f-strings to the logging package. I'm guessing that's not gone
anywhere (yet).

Letting you pass in a callable to call might help, because then you could 
use a lambda.


Re: Ref-strings in logging messages (was: Performance issue with CPython 3.10 + Cython)

2022-10-07 Thread Barry



> On 7 Oct 2022, at 16:48, Skip Montanaro  wrote:
> 
> On Fri, Oct 7, 2022 at 9:42 AM Andreas Ames 
> wrote:
> 
>> 1. The culprit was me. As lazy as I am, I have used f-strings all over the
>> place in calls to `logging.logger.debug()` and friends, evaluating all
>> arguments regardless of whether the logger was enabled or not.
>> 
> 
> I thought there was some discussion about whether and how to efficiently
> admit f-strings to the logging package. I'm guessing that's not gone
> anywhere (yet).

That cannot be done as the f-string is computed before the log call.

Maybe you are thinking of the lazy expression idea for this. That idea
seems to have gone nowhere, as it's not clear how to implement it without
performance issues.
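To illustrate the point about evaluation order, here is a small sketch. The `Expensive` class and the logger name are hypothetical; the class just counts how often it is rendered to text. It compares an f-string, %-style arguments, and an explicit `isEnabledFor()` guard while DEBUG is disabled:

```python
import logging


class Expensive:
    """Hypothetical stand-in for a value that is costly to render."""

    def __init__(self):
        self.renders = 0

    def __str__(self):
        self.renders += 1
        return "expensive-value"


logger = logging.getLogger("fstring_demo")  # hypothetical logger name
logger.setLevel(logging.INFO)               # DEBUG records are disabled

eager = Expensive()
logger.debug(f"value: {eager}")      # the f-string is built before debug() runs

deferred = Expensive()
logger.debug("value: %s", deferred)  # formatting is skipped for a disabled level

guarded = Expensive()
if logger.isEnabledFor(logging.DEBUG):
    logger.debug(f"value: {guarded}")  # never reached while DEBUG is disabled

print(eager.renders, deferred.renders, guarded.renders)  # prints: 1 0 0
```

Only the f-string version pays the rendering cost for a message that is then thrown away.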

Barry


Re: Ref-strings in logging messages (was: Performance issue with CPython 3.10 + Cython)

2022-10-07 Thread Skip Montanaro

Dang autocorrect. Subject first word was supposed to be "f-strings" not
"ref-strings." Sorry about that.

S

On Fri, Oct 7, 2022, 10:45 AM Skip Montanaro 
wrote:

>
>
> On Fri, Oct 7, 2022 at 9:42 AM Andreas Ames 
> wrote:
>
>> 1. The culprit was me. As lazy as I am, I have used f-strings all over the
>> place in calls to `logging.logger.debug()` and friends, evaluating all
>> arguments regardless of whether the logger was enabled or not.
>>
>
> I thought there was some discussion about whether and how to efficiently
> admit f-strings to the logging package. I'm guessing that's not gone
> anywhere (yet).
>
> Skip
>

Ref-strings in logging messages (was: Performance issue with CPython 3.10 + Cython)

2022-10-07 Thread Skip Montanaro

On Fri, Oct 7, 2022 at 9:42 AM Andreas Ames 
wrote:

> 1. The culprit was me. As lazy as I am, I have used f-strings all over the
> place in calls to `logging.logger.debug()` and friends, evaluating all
> arguments regardless of whether the logger was enabled or not.
>

I thought there was some discussion about whether and how to efficiently
admit f-strings to the logging package. I'm guessing that's not gone
anywhere (yet).

Skip

Re: on str.format and f-strings

2022-09-06 Thread Meredith Montgomery

Chris Angelico  writes:

> On Wed, 7 Sept 2022 at 03:52, Meredith Montgomery  
> wrote:
>>
>> It seems to me that str.format is not completely made obsolete by the
>> f-strings that appeared in Python 3.6.  But I'm not thinking that this
>> was the objective of the introduction of f-strings: the PEP at
>>
>>   https://peps.python.org/pep-0498/#id11
>>
>> says so explicitly.
>
> Precisely. It was never meant to obsolete str.format, and it does not.
>
>> My question is whether f-strings can do the
>> following nice thing with dictionaries that str.format can do:
>>
>> --8<---cut here---start->8---
>> def f():
>>   d = { "name": "Meredith", "email": "mmontgom...@levado.to" }
>>   return "The name is {name} and the email is {email}".format(**d)
>> --8<---cut here---end--->8---
>>
>> Is there a way to do this with f-strings?
>
> No. That's not their job. That's str.format's job.

Chris!  So good to see you around here again.  Thank you so much for
your input on this. 

Re: on str.format and f-strings

2022-09-06 Thread Chris Angelico

On Wed, 7 Sept 2022 at 03:52, Meredith Montgomery  wrote:
>
> It seems to me that str.format is not completely made obsolete by the
> f-strings that appeared in Python 3.6.  But I'm not thinking that this
> was the objective of the introduction of f-strings: the PEP at
>
>   https://peps.python.org/pep-0498/#id11
>
> says so explicitly.

Precisely. It was never meant to obsolete str.format, and it does not.

> My question is whether f-strings can do the
> following nice thing with dictionaries that str.format can do:
>
> --8<---cut here---start->8---
> def f():
>   d = { "name": "Meredith", "email": "mmontgom...@levado.to" }
>   return "The name is {name} and the email is {email}".format(**d)
> --8<---cut here---end--->8---
>
> Is there a way to do this with f-strings?

No. That's not their job. That's str.format's job.
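A short sketch of the division of labour, using a placeholder email address (the one in the original post is elided) and made-up variable names. `str.format_map()` is the standard method that takes the mapping directly instead of unpacking it:

```python
# Placeholder data; the address is an assumption, not the one from the post.
d = {"name": "Meredith", "email": "meredith@example.invalid"}

template = "The name is {name} and the email is {email}"

via_format = template.format(**d)        # unpacks d into keyword arguments
via_format_map = template.format_map(d)  # looks names up in the mapping directly

# An f-string resolves names in the enclosing scope, so with a dict it needs
# explicit subscripts (or the values bound to local names first).
via_fstring = f"The name is {d['name']} and the email is {d['email']}"

print(via_format == via_format_map == via_fstring)  # prints: True
```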

ChrisA

Re: on str.format and f-strings

2022-09-06 Thread Meredith Montgomery

Julio Di Egidio  writes:

> On Tuesday, 6 September 2022 at 01:03:02 UTC+2, Meredith Montgomery wrote:
>> Julio Di Egidio  writes: 
>> > On Monday, 5 September 2022 at 22:18:58 UTC+2, Meredith Montgomery wrote: 
>> >> r...@zedat.fu-berlin.de (Stefan Ram) writes: 
>> > 
>> >> > , but with the spaces removed, it's even one character 
>> >> > shorter than the format expression: 
>> >> > 
>> >> > eval('f"The name is {name} and the email is {email}"',d) 
>> >> > "The name is {name} and the email is {email}".format(**d) 
>> >> 
>> >> Lol. That's brilliant! Thanks very much! 
>> > 
>> > Calling eval for that is like shooting a fly with a cannon.
>> 
>> Indeed! But we're not looking for production-quality code. Just an 
>> extreme way to satisfy a silly requirement.
>
> Indeed, as far as programming goes, even the premise is
> totally nonsensical.  Maybe you too better go to the pub?

It surely isn't precise, but Stefan Ram caught my meaning.  It's hard to
be precise.  I wanted to avoid having to write things like d['key'].
Stefan Ram provided a solution.  I did not care whether it was something
sensible to do in Python from a serious-programming perspective.  Thank
you for your thoughts anyhow.  I appreciate it.

Re: on str.format and f-strings

2022-09-06 Thread Meredith Montgomery

Julio Di Egidio  writes:

> On Monday, 5 September 2022 at 22:18:58 UTC+2, Meredith Montgomery wrote:
>> r...@zedat.fu-berlin.de (Stefan Ram) writes: 
>
>> > , but with the spaces removed, it's even one character 
>> > shorter than the format expression: 
>> > 
>> > eval('f"The name is {name} and the email is {email}"',d)
>> > "The name is {name} and the email is {email}".format(**d) 
>> 
>> Lol. That's brilliant! Thanks very much!
>
> Calling eval for that is like shooting a fly with a cannon.

Indeed!  But we're not looking for production-quality code.  Just an
extreme way to satisfy a silly requirement.

> Besides, this one is even shorter:
>
> f"The name is {d['name']} and the email is {d['email']}"

This doesn't quite satisfy the requirements.  We're trying to specify
only the keys, not the dictionary.  (But maybe the requirements did not
say that explicitly.  I'd have to look it up again --- it's been
snipped.  It's not important.  Thanks much for your thoughts!)

Re: on str.format and f-strings

2022-09-06 Thread Meredith Montgomery

r...@zedat.fu-berlin.de (Stefan Ram) writes:

> Meredith Montgomery  writes:
> ...
>>  d = { "name": "Meredith", "email": "mmontgom...@levado.to" }
>>  return "The name is {name} and the email is {email}".format(**d)
>>--8<---cut here---end--->8---
>>Is there a way to do this with f-strings?
>
>   I cannot think of anything shorter now than:
>
> eval( 'f"The name is {name} and the email is {email}"', d )
>
>   , but with the spaces removed, it's even one character
>   shorter than the format expression:
>
> eval('f"The name is {name} and the email is {email}"',d)
> "The name is {name} and the email is {email}".format(**d)
>
>   . 

Lol.  That's brilliant!  Thanks very much!
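One detail worth knowing before borrowing the eval() trick: passing `d` as the globals dictionary has a side effect on `d` itself. A small sketch, with a placeholder email address and made-up variable names:

```python
# Placeholder data; the address is an assumption, not the one from the post.
d = {"name": "Meredith", "email": "meredith@example.invalid"}
keys_before = set(d)

# d is used as the globals dictionary, so {name} and {email} resolve to its keys.
result = eval('f"The name is {name} and the email is {email}"', d)

# eval() inserts a '__builtins__' entry into a globals dict that lacks one.
added = set(d) - keys_before

print(result)
print(added)  # prints: {'__builtins__'}
```

The template is also executed as code, so this is only reasonable when the template string is fully trusted.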

on str.format and f-strings

2022-09-06 Thread Meredith Montgomery

It seems to me that str.format is not completely made obsolete by the
f-strings that appeared in Python 3.6.  But I'm not thinking that this
was the objective of the introduction of f-strings: the PEP at 

  https://peps.python.org/pep-0498/#id11

says so explicitly.  My question is whether f-strings can do the
following nice thing with dictionaries that str.format can do:

--8<---cut here---start->8---
def f():
  d = { "name": "Meredith", "email": "mmontgom...@levado.to" }
  return "The name is {name} and the email is {email}".format(**d)
--8<---cut here---end--->8---

Is there a way to do this with f-strings?

Re: Printing Unicode strings in a list

2022-04-30 Thread Chris Angelico

On Sun, 1 May 2022 at 00:03, Vlastimil Brom  wrote:
> (Even the redundant u prefix from your python2 sample is apparently
> accepted, maybe for compatibility reasons.)

Yes, for compatibility reasons. It wasn't accepted in Python 3.0, but
3.3 re-added it to make porting easier. It doesn't do anything.
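A quick check of that on Python 3.3 or later (variable names are just for illustration):

```python
with_prefix = u"\u2551"
without_prefix = "\u2551"

# Both literals produce an ordinary str with the same contents.
same_value = with_prefix == without_prefix
same_type = type(with_prefix) is str and type(without_prefix) is str

print(same_value, same_type)  # prints: True True
```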

ChrisA

Re: Printing Unicode strings in a list

2022-04-30 Thread Vlastimil Brom

čt 28. 4. 2022 v 13:33 odesílatel Stephen Tucker
 napsal:
>
> Hi PythonList Members,
>
> Consider the following log from a run of IDLE:
>
> ==
>
> Python 2.7.10 (default, May 23 2015, 09:40:32) [MSC v.1500 32 bit (Intel)]
> on win32
> Type "copyright", "credits" or "license()" for more information.
> >>> print (u"\u2551")
> ║
> >>> print ([u"\u2551"])
> [u'\u2551']
> >>>
>
> ==
>
> Yes, I am still using Python 2.x - I have good reasons for doing so and
> will be moving to Python 3.x in due course.
>
> I have the following questions arising from the log:
>
> 1. Why does the second print statement not produce [ ║]  or ["║"] ?
>
> 2. Should the second print statement produce [ ║]  or ["║"] ?
>
> 3. Given that I want to print a list of Unicode strings so that their
> characters are displayed (instead of their Unicode codepoint definitions),
> is there a more Pythonic way of doing it than concatenating them into a
> single string and printing that?
>
> 4. Does Python 3.x exhibit the same behaviour as Python 2.x in this respect?
>
> Thanks in anticipation.
>
> Stephen Tucker.

Hi,
I'm not sure whether I am misunderstanding the 4th question or
the answers to it (it is not clear to me whether the focus is on
character printing or the quotation marks...);
in either case, in Python 3 the character glyphs are printed in these
cases instead of the codepoint escape notation, cf.:
==
Python 3.8.10 (tags/v3.8.10:3d8993a, May  3 2021, 11:48:03) [MSC v.1928 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> print ([u"\u2551"])
['║']
>>>
>>> print([u"\u2551"])
['║']
>>> print("\u2551")
║
>>> print("║")
║
>>> print(repr("\u2551"))
'║'
>>> print(ascii("\u2551"))
'\u2551'
>>>
==

(Even the redundant u prefix from your python2 sample is apparently
accepted, maybe for compatibility reasons.)

hth,
   vbr

Re: Printing Unicode strings in a list

2022-04-28 Thread Rob Cliffe via Python-list




On 28/04/2022 14:27, Stephen Tucker wrote:

To Cameron Simpson,

Thanks for your in-depth and helpful reply. I have noted it and will be
giving it close attention when I can.

The main reason why I am still using Python 2.x is that my colleagues are
still using a GIS system that has a Python programmer's interface - and
that interface uses Python 2.x.

The team are moving to an updated version of the system whose Python
interface is Python 3.x.

However, I am expecting to retire over the next 8 months or so, so I do not
need to be concerned with Python 3.x - my successor will be doing that.


Still, if you're feeling noble, you could start the work of making your 
code Python 3 compatible.

Best wishes
Rob Cliffe

Re: Printing Unicode strings in a list

2022-04-28 Thread Jon Ribbens via Python-list

On 2022-04-28, Stephen Tucker  wrote:
> Hi PythonList Members,
>
> Consider the following log from a run of IDLE:
>
>==
>
> Python 2.7.10 (default, May 23 2015, 09:40:32) [MSC v.1500 32 bit (Intel)]
> on win32
> Type "copyright", "credits" or "license()" for more information.
> >>> print (u"\u2551")
> ║
> >>> print ([u"\u2551"])
> [u'\u2551']
> >>>
>
>==
>
> Yes, I am still using Python 2.x - I have good reasons for doing so and
> will be moving to Python 3.x in due course.
>
> I have the following questions arising from the log:
>
> 1. Why does the second print statement not produce [ ║]  or ["║"] ?

print(x) implicitly calls str(x) to convert 'x' to a string for output.
lists don't have their own str converter, so fall back to repr instead,
which outputs '[', followed by the repr of each list item separated by
', ', followed by ']'.
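That description can be checked directly. A minimal sketch for Python 3, with made-up variable names, rebuilding the list's text by hand from the repr of each item:

```python
items = ["\u2551", "a b"]

as_text = str(items)  # what print(items) would write

# Rebuild the same text from '[', the repr of each item joined by ', ', and ']'.
manual = "[" + ", ".join(repr(item) for item in items) + "]"

print(as_text == manual)       # prints: True
print(as_text == repr(items))  # prints: True
```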

> 2. Should the second print statement produce [ ║]  or ["║"] ?

There's certainly no obvious reason why it *should*, and pretty decent
reasons why it shouldn't (it would be a hybrid mess of Python-syntax
repr output and raw string output).

> 3. Given that I want to print a list of Unicode strings so that their
> characters are displayed (instead of their Unicode codepoint definitions),
> is there a more Pythonic way of doing it than concatenating them into a
> single string and printing that?

print(' '.join(list_of_strings)) is probably most common. I suppose you
could do print(*list_of_strings) if you like, but I'm not sure I'd call
it "pythonic" as I've never seen anyone do that (that doesn't mean of
course that other people haven't seen it done!) Personally I only tend
to use print() for debugging output.
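Both spellings give the same text when the separator is a single space. A small sketch for Python 3 (on Python 2 the second form would need `from __future__ import print_function`); the sample strings are made up, and the output is captured in a buffer only so it can be compared:

```python
import io

list_of_strings = ["\u2551", "\u2550", "\u2551"]

# Option 1: build one string and print it.
joined = " ".join(list_of_strings)

# Option 2: unpack the list into separate print() arguments.
buffer = io.StringIO()
print(*list_of_strings, file=buffer)  # the default sep is a single space
unpacked = buffer.getvalue().rstrip("\n")

print(joined)              # prints: ║ ═ ║
print(joined == unpacked)  # prints: True
```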

> 4. Does Python 3.x exhibit the same behaviour as Python 2.x in this respect?

Yes.

Re: Printing Unicode strings in a list

2022-04-28 Thread Stephen Tucker

To Cameron Simpson,

Thanks for your in-depth and helpful reply. I have noted it and will be
giving it close attention when I can.

The main reason why I am still using Python 2.x is that my colleagues are
still using a GIS system that has a Python programmer's interface - and
that interface uses Python 2.x.

The team are moving to an updated version of the system whose Python
interface is Python 3.x.

However, I am expecting to retire over the next 8 months or so, so I do not
need to be concerned with Python 3.x - my successor will be doing that.

Stephen.


On Thu, Apr 28, 2022 at 2:07 PM Cameron Simpson  wrote:

> On 28Apr2022 12:32, Stephen Tucker  wrote:
> >Consider the following log from a run of IDLE:
> >==
> >
> >Python 2.7.10 (default, May 23 2015, 09:40:32) [MSC v.1500 32 bit (Intel)]
> >on win32
> >Type "copyright", "credits" or "license()" for more information.
> >>>> print (u"\u2551")
> >║
> >>>> print ([u"\u2551"])
> >[u'\u2551']
> >>>>
> >==
> >
> >Yes, I am still using Python 2.x - I have good reasons for doing so and
> >will be moving to Python 3.x in due course.
>
> Love to hear those reasons. Not suggesting that they are invalid.
>
> >I have the following questions arising from the log:
> >1. Why does the second print statement not produce [ ║]  or ["║"] ?
>
> Because print() prints the str() of each of its arguments, and str() of
> a list is the same as its repr(), which is a list of the repr()s of
> every item in the list. Repr of a Unicode string looks like what you
> have in Python 2.
>
> >2. Should the second print statement produce [ ║]  or ["║"] ?
>
> Well, to me its behaviour is correct. Do you _want_ to get your Unicode
> glyph? in quotes? That is up to you. But consider: what would be sane
> output if the list contained the string "], [3," ?
>
> >3. Given that I want to print a list of Unicode strings so that their
> >characters are displayed (instead of their Unicode codepoint definitions),
> >is there a more Pythonic way of doing it than concatenating them into a
> >single string and printing that?
>
> You could print them with empty separators:
>
> print(s1, s2, .., sep='')
>
> To do that in Python 2 you need to:
>
> from __future__ import print_function
>
> at the top of your Python file. Then you'll have a Python 3 style print
> function. In Python 2, print is normally a statement and you don't need
> the brackets:
>
> print u"\u2551"
>
> but print() is genuinely better as a function anyway.
>
> >4. Does Python 3.x exhibit the same behaviour as Python 2.x in this
> respect?
>
> Broadly yes, except that all strings are Unicode strings and we don't
> bother with the leading "u" prefix.
>
> Cheers,
> Cameron Simpson 

Re: Printing Unicode strings in a list

2022-04-28 Thread Cameron Simpson

On 28Apr2022 12:32, Stephen Tucker  wrote:
>Consider the following log from a run of IDLE:
>==
>
>Python 2.7.10 (default, May 23 2015, 09:40:32) [MSC v.1500 32 bit (Intel)]
>on win32
>Type "copyright", "credits" or "license()" for more information.
>>>> print (u"\u2551")
>║
>>>> print ([u"\u2551"])
>[u'\u2551']
>>>>
>==
>
>Yes, I am still using Python 2.x - I have good reasons for doing so and
>will be moving to Python 3.x in due course.

Love to hear those reasons. Not suggesting that they are invalid.

>I have the following questions arising from the log:
>1. Why does the second print statement not produce [ ║]  or ["║"] ?

Because print() prints the str() of each of its arguments, and str() of
a list is the same as its repr(), which is a list of the repr()s of
every item in the list. Repr of a Unicode string looks like what you 
have in Python 2.

>2. Should the second print statement produce [ ║]  or ["║"] ?

Well, to me its behaviour is correct. Do you _want_ to get your Unicode 
glyph? in quotes? That is up to you. But consider: what would be sane 
output if the list contained the string "], [3," ?
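To make that thought experiment concrete, here is a small sketch comparing the real repr-based output with a hypothetical "raw" rendering that drops the quotes (the sample list and variable names are made up):

```python
tricky = ["a", "], [3,", "b"]

# Actual behaviour: each item is shown via repr(), so the quotes keep it unambiguous.
with_repr = str(tricky)

# Hypothetical alternative: items written raw, with no quoting at all.
without_quotes = "[" + ", ".join(tricky) + "]"

print(with_repr)       # prints: ['a', '], [3,', 'b']
print(without_quotes)  # prints: [a, ], [3,, b]
```

The second line can no longer be read back as a three-item list, which is the problem the repr-based output avoids.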

>3. Given that I want to print a list of Unicode strings so that their
>characters are displayed (instead of their Unicode codepoint definitions),
>is there a more Pythonic way of doing it than concatenating them into a
>single string and printing that?

You could print them with empty separators:

print(s1, s2, .., sep='')

To do that in Python 2 you need to:

from __future__ import print_function

at the top of your Python file. Then you'll have a Python 3 style print
function. In Python 2, print is normally a statement and you don't need
the brackets:

print u"\u2551"

but print() is genuinely better as a function anyway.

>4. Does Python 3.x exhibit the same behaviour as Python 2.x in this respect?

Broadly yes, except that all strings are Unicode strings and we don't
bother with the leading "u" prefix.

Cheers,
Cameron Simpson 

Printing Unicode strings in a list

2022-04-28 Thread Stephen Tucker

Hi PythonList Members,

Consider the following log from a run of IDLE:

==

Python 2.7.10 (default, May 23 2015, 09:40:32) [MSC v.1500 32 bit (Intel)]
on win32
Type "copyright", "credits" or "license()" for more information.
>>> print (u"\u2551")
║
>>> print ([u"\u2551"])
[u'\u2551']
>>>

==

Yes, I am still using Python 2.x - I have good reasons for doing so and
will be moving to Python 3.x in due course.

I have the following questions arising from the log:

1. Why does the second print statement not produce [ ║]  or ["║"] ?

2. Should the second print statement produce [ ║]  or ["║"] ?

3. Given that I want to print a list of Unicode strings so that their
characters are displayed (instead of their Unicode codepoint definitions),
is there a more Pythonic way of doing it than concatenating them into a
single string and printing that?

4. Does Python 3.x exhibit the same behaviour as Python 2.x in this respect?

Thanks in anticipation.

Stephen Tucker.

[issue221207] % operator on strings not documented

2022-04-10 Thread admin



Change by admin :


--
github: None -> 33443

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue456420] no methods for lists, strings etc.

2022-04-10 Thread admin



Change by admin :


--
github: None -> 35072


[issue426740] pydoc: check bases for doc strings?

2022-04-10 Thread admin



Change by admin :


--
github: None -> 34536


[issue431848] mathmodule.c: doc strings & conversion

2022-04-10 Thread admin



Change by admin :


--
github: None -> 34601


[issue217004] Tools/compiler does not create doc strings

2022-04-10 Thread admin



Change by admin :



[issue210661] cgi parsing of query strings (PR#356)

2022-04-10 Thread admin



Change by admin :



[issue401309] Specialization of dictionaries for strings.

2022-04-10 Thread admin



Change by admin :



[issue401299] add missing test for strings in file_writelines

2022-04-10 Thread admin



Change by admin :



[issue505375] Make doc strings optional

2022-04-10 Thread admin



Change by admin :


--
github: None -> 35941


[issue502503] pickle interns strings

2022-04-10 Thread admin



Change by admin :


--
github: None -> 35908


[issue495601] Documentation strings should end with a

2022-04-10 Thread admin



Change by admin :


--
github: None -> 35802


[issue400912] modified \x behaviour for unicode strings

2022-04-10 Thread admin



Change by admin :



[issue462849] Printing arbitrary Unicode strings

2022-04-10 Thread admin



Change by admin :


--
github: None -> 35204


[issue463093] File methods need doc strings

2022-04-10 Thread admin



Change by admin :


--
github: None -> 35207


[issue421214] splitlist() for raw and unicode strings

2022-04-10 Thread admin



Change by admin :


--
github: None -> 34455


[issue424335] richcompare for strings

2022-04-10 Thread admin



Change by admin :


--
github: None -> 34507


[issue410465] Allow pre-encoded strings as filenames

2022-04-10 Thread admin



Change by admin :


--
github: None -> 34213


[issue410336] pydoc patch for strings & lists

2022-04-10 Thread admin



Change by admin :


--
github: None -> 34210


[issue407538] pickling fails on Unicode strings with newlines

2022-04-10 Thread admin



Change by admin :


--
github: None -> 34130


[issue403349] Allow pickle.py to be using with Jython unicode strings

2022-04-10 Thread admin



Change by admin :


--
github: None -> 33771


[issue403636] Fix for null byte strings in SSL support of socketmodule.c

2022-04-10 Thread admin



Change by admin :


--
github: None -> 33857

