Re: [Python-Dev] (no subject)

2017-12-26 Thread Franklin? Lee
On Tue, Dec 26, 2017 at 2:01 AM, Yogev Hendel  wrote:
>
> I don't know if this is the right place to put this,
> but I've found the following lines of code results in an incredibly long
> processing time.
> Perhaps it might be of use to someone.
>
> import re
> pat = re.compile('^/?(?:\\w+)/(?:[%\\w-]+/?)+/?$')
> pat.match('/t/a-aa-aa-a-aa-aa-aa-aa-aa-aa./')

(I think the correct place is python-list. python-dev is primarily for
the developers of Python itself. python-ideas is for proposing new
features and changes to the language. python-list is for general
discussion. Bug reports and feature requests belong in
https://bugs.python.org/ (where your post could also have gone).)

The textbook regular expression algorithm (which I believe grep uses)
runs in linear time with respect to the text length. The algorithm
used by Perl, Java, Python, JavaScript, Ruby, and many other languages
instead use a backtracking algorithm, which can run up to exponential
time with respect to text length. This worst-case is in fact necessary
(assuming P != NP): Perl allows (introduced?) backreferences, which
are NP-hard[1]. Perl also added some other features which complicate
things, but backreferences are enough.

The user-level solution is to understand how regexes are executed, and
to work around it.

Here are library-level solutions for your example:
1. Perl now has a regex optimizer, which will eliminate some
redundancies. Something similar can be added to Python, at first as a
third-party library.
2. In theory, we can use the textbook algorithm when possible, and the
backtracking algorithm when necessary. However, the textbook version
won't necessarily be faster, and may take more time to create, so
there's a tradeoff here.
3. To go even further, I believe it's possible to use the textbook
algorithm for subexpressions, while the overall expression uses
backtracking, internally iterating through the matches of the textbook
algorithm.

There's a series of articles by Russ Cox that try to get us back to
the textbook (see [2]). He and others implemented the ideas in the C++
library RE2[3], which has Python bindings[4]. RE2 was made for and
used on Google Code Search[5] (described in his articles), a (now
discontinued) search engine for open-source repos which allowed
regular expressions in the queries.

You can get a whiff of the limitations of the textbook algorithm by
checking out RE2's syntax[6] and seeing what features aren't
supported, though some features may be unsupported for different
reasons (such as being redundant syntax).
- Backreferences and lookaround assertions don't have a known solution.[7]
- Bounded repetition is only supported up to a limit (1000), because
each possible repetition needs its own set of states.
- Possessive quantifiers aren't supported. Greedy and reluctant quantifiers are.
- Groups and named groups _are_ supported. See the second and third
Russ Cox articles, with the term "submatch".[2]

(Apologies: I am making up reference syntax on-the-fly.)
[1] "Perl Regular Expression Matching is NP-Hard"
https://perl.plover.com/NPC/
[2] "Regular Expression Matching Can Be Simple And Fast"
https://swtch.com/~rsc/regexp/regexp1.html
"Regular Expression Matching: the Virtual Machine Approach"
https://swtch.com/~rsc/regexp/regexp2.html
"Regular Expression Matching in the Wild"
https://swtch.com/~rsc/regexp/regexp3.html
"Regular Expression Matching with a Trigram Index"
https://swtch.com/~rsc/regexp/regexp4.html
[3] RE2: https://github.com/google/re2
[4] pyre2: https://github.com/facebook/pyre2/
Also see re2 and re3 on PyPI, which intend to be a drop-in
replacement. re3 is a Py3-compatible fork of re2, which last updated
in 2015.
[5] https://en.wikipedia.org/wiki/Google_Code_Search
[6] https://github.com/google/re2/wiki/Syntax
[7] Quote: "As a matter of principle, RE2 does not support constructs
for which only backtracking solutions are known to exist. Thus,
backreferences and look-around assertions are not supported."
https://github.com/google/re2/wiki/WhyRE2
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Is static typing still optional?

2017-12-26 Thread Ned Batchelder

On 12/26/17 1:49 PM, Chris Barker wrote:
On Sat, Dec 23, 2017 at 5:54 PM, Nick Coghlan > wrote:



I still wonder about the "fields *must* be annotated" constraint
though. I can understand a constraint that the style be
*consistent* (i.e. all fields as annotations, or all fields as
field instances), since that's needed to determine the field
order, but I don't see the problem with the "no annotations" style
otherwise.


IIUC, without annotations, there is no way to set a field with no default.

And supporting both approaches violates "only one way to do it" in, I 
think, a confusing manner -- particularly if you can't mix and match them.


Also, could does using class attributes without annotations make a 
mess when subclassing? -- no I haven't thought that out yet.





I have not been following the design of dataclasses, and maybe I'm 
misunderstanding the state of the work.  My impression is that attrs was 
a thing, and lots of people loved it, so we wanted something like it in 
the stdlib.  Data Classes is that thing, but it is a new thing being 
designed from scratch.  There are still big questions about how it 
should work, but it is already a part of 3.7.


Often when people propose new things, we say, "Put it on PyPI first, and 
let's see how people like it."  Why isn't that the path for Data 
Classes?  Why are they already part of 3.7 when we have no practical 
experience with them?  Wouldn't it be better to let the design mature 
with real experience?  Especially since some of the questions being 
asked are about how it interrelates with another large new feature with 
little practical use yet (typing)?


--Ned.
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Is static typing still optional?

2017-12-26 Thread Eric V. Smith

On 12/21/2017 6:36 AM, Ivan Levkivskyi wrote:
On 21 December 2017 at 11:22, Terry Reedy > wrote:


On 12/21/2017 4:22 AM, Eric V. Smith wrote:

On 12/21/2017 1:46 AM, Chris Barker wrote:


I suggest that it be clear in the docs, and ideally in the
PEP, that the dataclass decorator is using the *annotation"
syntax, and that the the only relevant part it uses is that
an annotation exists, but the value of the annotation is
essentially (completely?) ignored.


I think the PEP is very clear about this: "The dataclass
decorator examines the class to find fields. A field is defined
as any variable identified in __annotations__. That is, a
variable that has a type annotation. With two exceptions
described below, none of the Data Class machinery examines the
type specified in the annotation."


This seems clear enough.  It could come after describing what a
dataclass *is*.

I agree the docs should also be clear about this.



So we should have examples like:

@dataclass
class C:
  a: ...  # field with no default
  b: ... = 0 # filed with a default value

Then maybe:

@dataclass
class C:
  a: "the a parameter" # field with no default
  b: "another, different parameter" = 0.0 # field with a
default

Then the docs can go to say that if the user wants to
specify a type for use with a static type checking
pre-processor, they can do it like so:

@dataclass
class C:
  a: int # integer field with no default
  b: float = 0.0 # float field with a default

And the types will be recognized by type checkers such as mypy.

And I think the non-typed examples should go first in the docs.


Module some bike-shedding, the above seems pretty good to me.


For me, the three options for "don't care" have a bit different meaning:

* typing.Any: this class is supposed to be used with static type 
checkers, but this field is too dynamic
* ... (ellipsis): this class may or may not be used with static type 
checkers, use the inferred type in the latter case

* "field docstring": this class should not be used with static type checkers

Assuming this, the second option would be the "real" "don't care". If 
this makes sense,
then we can go the way proposed in 
https://github.com/python/typing/issues/276 and make ellipsis semantics 
"official" in PEP 484.

(pending Guido's approval)


In 
https://github.com/ericvsmith/dataclasses/issues/2#issuecomment-353918024, 
Guido has suggested using `object`, which has the benefit of not needing 
an import. And to me, it communicates the "don't care" aspect well 
enough. I do understand the difference if you're using a type checker 
(see for example 
https://stackoverflow.com/questions/39817081/typing-any-vs-object), but 
if you care about that, use typing.Any.


Eric.
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Is static typing still optional?

2017-12-26 Thread Ivan Levkivskyi
On 22 December 2017 at 20:55, Brett Cannon  wrote:

>
>
> On Fri, Dec 22, 2017, 11:38 Chris Barker,  wrote:
>
>> On Fri, Dec 22, 2017 at 8:49 AM, Brett Cannon  wrote:
>>
>>> I  think it's worth reminding people that if they don't like the fact
 dataclasses (ab)use type hints for their succinct syntax that you can
 always use attrs instead to avoid type hints.

>>>
>> sure -- but this doesn't really address the issue, the whole reason this
>> is even a discussion is because dataclasses is going into the standard
>> library. Third party packages can do whatever they want, of course.
>>
>> And the concern is that people (in particular newbies) will get confused
>> / the wrong impression / other-negative-response by the (semi) use of
>> typing in a standard library module.
>>
>
> I'm still not worried. Type hints are part of the syntax and so are no
> worse off than async/await and asyncio IMO.
>
>
>>
>>> As for those who feel dataclasses will force them to teach type hints
>>> and they simply don't want to, maybe we could help land protocols
>>>
>>
>> Could you please clarify what this is about ???
>>
>
> There's a PEP by Ivan (on my phone else I would look up the number).
>
>
If anyone is curious this is PEP 544. It is actually already fully
supported by mypy, so that one can play with it
(you will need to also install typing_extensions, where Protocol class
lives until the PEP is approved).

--
Ivan
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Supporting functools.singledispatch with classes.

2017-12-26 Thread Ivan Levkivskyi
On 26 December 2017 at 01:41, Nick Coghlan  wrote:

> On 25 December 2017 at 12:32, Ethan Smith  wrote:
> > So at the moment, I don't think it is possible to implement
> singledispatch
> > on classmethod or staticmethod decorated functions.
>
> I've posted this to the PR, but adding it here as well: I think this
> is a situation very similar to the case with functools.partialmethod,
> where you're going to need to write a separate
> functools.singledispatchmethod class that's aware of the descriptor
> protocol, rather than trying to add the functionality directly to
> functools.singledispatch.
>

I agree with Nick here. Adding a separate decorator looks like the right
approach,
especially taking into account the precedent of @partialmethod.

--
Ivan
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] (no subject)

2017-12-26 Thread MRAB

On 2017-12-26 07:01, Yogev Hendel wrote:


I don't know if this is the right place to put this,
but I've found the following lines of code results in an incredibly long 
processing time.

Perhaps it might be of use to someone.

/import re/
/pat = re.compile('^/?(?:\\w+)/(?:[%\\w-]+/?)+/?$')/
/pat.match('/t/a-aa-aa-a-aa-aa-aa-aa-aa-aa./')/

The pattern has a repeated repeat, which results in catastrophic 
backtracking.


As an example, think about how the pattern (?:a+)+b would try to match 
the string 'aaac'.


Match 'aaa', but not 'c'.

Match 'aa' and 'a', but not 'c'.

Match 'a' and 'aa', but not 'c'.

Match 'a' and 'a' and 'a', but not 'c'.

That's 4 failed attempts.

Now try match the string 'c'.

Match '', but not 'c'.

Match 'aaa' and 'a', but not 'c'.

Match 'aa' and 'aa', but not 'c'.

Match 'aa' and 'a a', but not 'c'.

Match 'a' and 'aaa', but not 'c'.

Match 'a' and 'aa' and 'a', but not 'c'.

Match 'a' and 'a aa', but not 'c'.

Match 'a' and 'a a' and 'a', but not 'c'.

That's 8 failed attempts.

Each additional 'a' in the string to match will double the number of 
attempts.


Your pattern has (?:[%\w-]+/?)+, and the '/' is optional. The string has 
a '.', which the pattern can't match, but it'll keep trying until it 
finally fails.


If you add just 1 more 'a' or '-' to the string, it'll take twice as 
long as it does now.


You need to think more carefully about how the pattern matches and what 
it'll do when it doesn't match.

___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Is static typing still optional?

2017-12-26 Thread Chris Barker
On Sat, Dec 23, 2017 at 5:54 PM, Nick Coghlan  wrote:

>
> I still wonder about the "fields *must* be annotated" constraint though. I
> can understand a constraint that the style be *consistent* (i.e. all fields
> as annotations, or all fields as field instances), since that's needed to
> determine the field order, but I don't see the problem with the "no
> annotations" style otherwise.
>

IIUC, without annotations, there is no way to set a field with no default.

And supporting both approaches violates "only one way to do it" in, I
think, a confusing manner -- particularly if you can't mix and match them.

Also, could does using class attributes without annotations make a mess
when subclassing? -- no I haven't thought that out yet.

-CHB








>
> Cheers,
> Nick.
>
>
>
>
>


-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR(206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

chris.bar...@noaa.gov
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Heap allocate type structs in native extension modules?

2017-12-26 Thread Benjamin Peterson
I imagine Cython already takes care of this?


On Tue, Dec 26, 2017, at 02:16, Hugh Fisher wrote:
> I have a Python program which generates the boilerplate code for
> native extension modules from a Python source definition.
> (http://bitbucket.org/hugh_fisher/fullofeels if interested.)
>
> The examples in the Python doco and the "Python Essential Reference"
> book all use a statically declared PyTypeObject struct and
> PyType_Ready in the module init func, so I'm doing the same. Then
> Python 3.5 added a check for statically allocated types inheriting
> from heap types, which broke a couple of my classes. And now I'm
> trying to add a __dict__ to native classes so end users can add their> own 
> attributes, and this is turning out to be painful with static
> PyTypeObject structs
>
> Would it be better to use dynamically allocated type structs in native
> modules?>
> --
>
> cheers,
> Hugh Fisher
> ___
> Python-Dev mailing list
> Python-Dev@python.org
> https://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe:
> https://mail.python.org/mailman/options/python-dev/benjamin%40python.org
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


[Python-Dev] Heap allocate type structs in native extension modules?

2017-12-26 Thread Hugh Fisher
I have a Python program which generates the boilerplate code for
native extension modules from a Python source definition.
(http://bitbucket.org/hugh_fisher/fullofeels if interested.)

The examples in the Python doco and the "Python Essential Reference"
book all use a statically declared PyTypeObject struct and
PyType_Ready in the module init func, so I'm doing the same. Then
Python 3.5 added a check for statically allocated types inheriting
from heap types, which broke a couple of my classes. And now I'm
trying to add a __dict__ to native classes so end users can add their
own attributes, and this is turning out to be painful with static
PyTypeObject structs

Would it be better to use dynamically allocated type structs in native modules?

-- 

cheers,
Hugh Fisher
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com