Re: [Python-Dev] (no subject)
On Tue, Dec 26, 2017 at 2:01 AM, Yogev Hendelwrote: > > I don't know if this is the right place to put this, > but I've found the following lines of code results in an incredibly long > processing time. > Perhaps it might be of use to someone. > > import re > pat = re.compile('^/?(?:\\w+)/(?:[%\\w-]+/?)+/?$') > pat.match('/t/a-aa-aa-a-aa-aa-aa-aa-aa-aa./') (I think the correct place is python-list. python-dev is primarily for the developers of Python itself. python-ideas is for proposing new features and changes to the language. python-list is for general discussion. Bug reports and feature requests belong in https://bugs.python.org/ (where your post could also have gone).) The textbook regular expression algorithm (which I believe grep uses) runs in linear time with respect to the text length. The algorithm used by Perl, Java, Python, JavaScript, Ruby, and many other languages instead use a backtracking algorithm, which can run up to exponential time with respect to text length. This worst-case is in fact necessary (assuming P != NP): Perl allows (introduced?) backreferences, which are NP-hard[1]. Perl also added some other features which complicate things, but backreferences are enough. The user-level solution is to understand how regexes are executed, and to work around it. Here are library-level solutions for your example: 1. Perl now has a regex optimizer, which will eliminate some redundancies. Something similar can be added to Python, at first as a third-party library. 2. In theory, we can use the textbook algorithm when possible, and the backtracking algorithm when necessary. However, the textbook version won't necessarily be faster, and may take more time to create, so there's a tradeoff here. 3. To go even further, I believe it's possible to use the textbook algorithm for subexpressions, while the overall expression uses backtracking, internally iterating through the matches of the textbook algorithm. There's a series of articles by Russ Cox that try to get us back to the textbook (see [2]). He and others implemented the ideas in the C++ library RE2[3], which has Python bindings[4]. RE2 was made for and used on Google Code Search[5] (described in his articles), a (now discontinued) search engine for open-source repos which allowed regular expressions in the queries. You can get a whiff of the limitations of the textbook algorithm by checking out RE2's syntax[6] and seeing what features aren't supported, though some features may be unsupported for different reasons (such as being redundant syntax). - Backreferences and lookaround assertions don't have a known solution.[7] - Bounded repetition is only supported up to a limit (1000), because each possible repetition needs its own set of states. - Possessive quantifiers aren't supported. Greedy and reluctant quantifiers are. - Groups and named groups _are_ supported. See the second and third Russ Cox articles, with the term "submatch".[2] (Apologies: I am making up reference syntax on-the-fly.) [1] "Perl Regular Expression Matching is NP-Hard" https://perl.plover.com/NPC/ [2] "Regular Expression Matching Can Be Simple And Fast" https://swtch.com/~rsc/regexp/regexp1.html "Regular Expression Matching: the Virtual Machine Approach" https://swtch.com/~rsc/regexp/regexp2.html "Regular Expression Matching in the Wild" https://swtch.com/~rsc/regexp/regexp3.html "Regular Expression Matching with a Trigram Index" https://swtch.com/~rsc/regexp/regexp4.html [3] RE2: https://github.com/google/re2 [4] pyre2: https://github.com/facebook/pyre2/ Also see re2 and re3 on PyPI, which intend to be a drop-in replacement. re3 is a Py3-compatible fork of re2, which last updated in 2015. [5] https://en.wikipedia.org/wiki/Google_Code_Search [6] https://github.com/google/re2/wiki/Syntax [7] Quote: "As a matter of principle, RE2 does not support constructs for which only backtracking solutions are known to exist. Thus, backreferences and look-around assertions are not supported." https://github.com/google/re2/wiki/WhyRE2 ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Is static typing still optional?
On 12/26/17 1:49 PM, Chris Barker wrote: On Sat, Dec 23, 2017 at 5:54 PM, Nick Coghlan> wrote: I still wonder about the "fields *must* be annotated" constraint though. I can understand a constraint that the style be *consistent* (i.e. all fields as annotations, or all fields as field instances), since that's needed to determine the field order, but I don't see the problem with the "no annotations" style otherwise. IIUC, without annotations, there is no way to set a field with no default. And supporting both approaches violates "only one way to do it" in, I think, a confusing manner -- particularly if you can't mix and match them. Also, could does using class attributes without annotations make a mess when subclassing? -- no I haven't thought that out yet. I have not been following the design of dataclasses, and maybe I'm misunderstanding the state of the work. My impression is that attrs was a thing, and lots of people loved it, so we wanted something like it in the stdlib. Data Classes is that thing, but it is a new thing being designed from scratch. There are still big questions about how it should work, but it is already a part of 3.7. Often when people propose new things, we say, "Put it on PyPI first, and let's see how people like it." Why isn't that the path for Data Classes? Why are they already part of 3.7 when we have no practical experience with them? Wouldn't it be better to let the design mature with real experience? Especially since some of the questions being asked are about how it interrelates with another large new feature with little practical use yet (typing)? --Ned. ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Is static typing still optional?
On 12/21/2017 6:36 AM, Ivan Levkivskyi wrote: On 21 December 2017 at 11:22, Terry Reedy> wrote: On 12/21/2017 4:22 AM, Eric V. Smith wrote: On 12/21/2017 1:46 AM, Chris Barker wrote: I suggest that it be clear in the docs, and ideally in the PEP, that the dataclass decorator is using the *annotation" syntax, and that the the only relevant part it uses is that an annotation exists, but the value of the annotation is essentially (completely?) ignored. I think the PEP is very clear about this: "The dataclass decorator examines the class to find fields. A field is defined as any variable identified in __annotations__. That is, a variable that has a type annotation. With two exceptions described below, none of the Data Class machinery examines the type specified in the annotation." This seems clear enough. It could come after describing what a dataclass *is*. I agree the docs should also be clear about this. So we should have examples like: @dataclass class C: a: ... # field with no default b: ... = 0 # filed with a default value Then maybe: @dataclass class C: a: "the a parameter" # field with no default b: "another, different parameter" = 0.0 # field with a default Then the docs can go to say that if the user wants to specify a type for use with a static type checking pre-processor, they can do it like so: @dataclass class C: a: int # integer field with no default b: float = 0.0 # float field with a default And the types will be recognized by type checkers such as mypy. And I think the non-typed examples should go first in the docs. Module some bike-shedding, the above seems pretty good to me. For me, the three options for "don't care" have a bit different meaning: * typing.Any: this class is supposed to be used with static type checkers, but this field is too dynamic * ... (ellipsis): this class may or may not be used with static type checkers, use the inferred type in the latter case * "field docstring": this class should not be used with static type checkers Assuming this, the second option would be the "real" "don't care". If this makes sense, then we can go the way proposed in https://github.com/python/typing/issues/276 and make ellipsis semantics "official" in PEP 484. (pending Guido's approval) In https://github.com/ericvsmith/dataclasses/issues/2#issuecomment-353918024, Guido has suggested using `object`, which has the benefit of not needing an import. And to me, it communicates the "don't care" aspect well enough. I do understand the difference if you're using a type checker (see for example https://stackoverflow.com/questions/39817081/typing-any-vs-object), but if you care about that, use typing.Any. Eric. ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Is static typing still optional?
On 22 December 2017 at 20:55, Brett Cannonwrote: > > > On Fri, Dec 22, 2017, 11:38 Chris Barker, wrote: > >> On Fri, Dec 22, 2017 at 8:49 AM, Brett Cannon wrote: >> >>> I think it's worth reminding people that if they don't like the fact dataclasses (ab)use type hints for their succinct syntax that you can always use attrs instead to avoid type hints. >>> >> sure -- but this doesn't really address the issue, the whole reason this >> is even a discussion is because dataclasses is going into the standard >> library. Third party packages can do whatever they want, of course. >> >> And the concern is that people (in particular newbies) will get confused >> / the wrong impression / other-negative-response by the (semi) use of >> typing in a standard library module. >> > > I'm still not worried. Type hints are part of the syntax and so are no > worse off than async/await and asyncio IMO. > > >> >>> As for those who feel dataclasses will force them to teach type hints >>> and they simply don't want to, maybe we could help land protocols >>> >> >> Could you please clarify what this is about ??? >> > > There's a PEP by Ivan (on my phone else I would look up the number). > > If anyone is curious this is PEP 544. It is actually already fully supported by mypy, so that one can play with it (you will need to also install typing_extensions, where Protocol class lives until the PEP is approved). -- Ivan ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Supporting functools.singledispatch with classes.
On 26 December 2017 at 01:41, Nick Coghlanwrote: > On 25 December 2017 at 12:32, Ethan Smith wrote: > > So at the moment, I don't think it is possible to implement > singledispatch > > on classmethod or staticmethod decorated functions. > > I've posted this to the PR, but adding it here as well: I think this > is a situation very similar to the case with functools.partialmethod, > where you're going to need to write a separate > functools.singledispatchmethod class that's aware of the descriptor > protocol, rather than trying to add the functionality directly to > functools.singledispatch. > I agree with Nick here. Adding a separate decorator looks like the right approach, especially taking into account the precedent of @partialmethod. -- Ivan ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] (no subject)
On 2017-12-26 07:01, Yogev Hendel wrote: I don't know if this is the right place to put this, but I've found the following lines of code results in an incredibly long processing time. Perhaps it might be of use to someone. /import re/ /pat = re.compile('^/?(?:\\w+)/(?:[%\\w-]+/?)+/?$')/ /pat.match('/t/a-aa-aa-a-aa-aa-aa-aa-aa-aa./')/ The pattern has a repeated repeat, which results in catastrophic backtracking. As an example, think about how the pattern (?:a+)+b would try to match the string 'aaac'. Match 'aaa', but not 'c'. Match 'aa' and 'a', but not 'c'. Match 'a' and 'aa', but not 'c'. Match 'a' and 'a' and 'a', but not 'c'. That's 4 failed attempts. Now try match the string 'c'. Match '', but not 'c'. Match 'aaa' and 'a', but not 'c'. Match 'aa' and 'aa', but not 'c'. Match 'aa' and 'a a', but not 'c'. Match 'a' and 'aaa', but not 'c'. Match 'a' and 'aa' and 'a', but not 'c'. Match 'a' and 'a aa', but not 'c'. Match 'a' and 'a a' and 'a', but not 'c'. That's 8 failed attempts. Each additional 'a' in the string to match will double the number of attempts. Your pattern has (?:[%\w-]+/?)+, and the '/' is optional. The string has a '.', which the pattern can't match, but it'll keep trying until it finally fails. If you add just 1 more 'a' or '-' to the string, it'll take twice as long as it does now. You need to think more carefully about how the pattern matches and what it'll do when it doesn't match. ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Is static typing still optional?
On Sat, Dec 23, 2017 at 5:54 PM, Nick Coghlanwrote: > > I still wonder about the "fields *must* be annotated" constraint though. I > can understand a constraint that the style be *consistent* (i.e. all fields > as annotations, or all fields as field instances), since that's needed to > determine the field order, but I don't see the problem with the "no > annotations" style otherwise. > IIUC, without annotations, there is no way to set a field with no default. And supporting both approaches violates "only one way to do it" in, I think, a confusing manner -- particularly if you can't mix and match them. Also, could does using class attributes without annotations make a mess when subclassing? -- no I haven't thought that out yet. -CHB > > Cheers, > Nick. > > > > > -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR(206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception chris.bar...@noaa.gov ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Heap allocate type structs in native extension modules?
I imagine Cython already takes care of this? On Tue, Dec 26, 2017, at 02:16, Hugh Fisher wrote: > I have a Python program which generates the boilerplate code for > native extension modules from a Python source definition. > (http://bitbucket.org/hugh_fisher/fullofeels if interested.) > > The examples in the Python doco and the "Python Essential Reference" > book all use a statically declared PyTypeObject struct and > PyType_Ready in the module init func, so I'm doing the same. Then > Python 3.5 added a check for statically allocated types inheriting > from heap types, which broke a couple of my classes. And now I'm > trying to add a __dict__ to native classes so end users can add their> own > attributes, and this is turning out to be painful with static > PyTypeObject structs > > Would it be better to use dynamically allocated type structs in native > modules?> > -- > > cheers, > Hugh Fisher > ___ > Python-Dev mailing list > Python-Dev@python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/benjamin%40python.org ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
[Python-Dev] Heap allocate type structs in native extension modules?
I have a Python program which generates the boilerplate code for native extension modules from a Python source definition. (http://bitbucket.org/hugh_fisher/fullofeels if interested.) The examples in the Python doco and the "Python Essential Reference" book all use a statically declared PyTypeObject struct and PyType_Ready in the module init func, so I'm doing the same. Then Python 3.5 added a check for statically allocated types inheriting from heap types, which broke a couple of my classes. And now I'm trying to add a __dict__ to native classes so end users can add their own attributes, and this is turning out to be painful with static PyTypeObject structs Would it be better to use dynamically allocated type structs in native modules? -- cheers, Hugh Fisher ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com