Re: [Python-Dev] re performance

2017-01-26 Thread Vlastimil Brom
2017-01-26 22:13 GMT+01:00 Sven R. Kunze :
> Hi folks,
>
> I recently refreshed regular expressions theoretical basics *indulging in
> reminiscences* So, I read https://swtch.com/~rsc/regexp/regexp1.html
>
> However, reaching the chart in the lower third of the article, I saw Python
> 2.4 measured against a naive Thompson matching implementation. And I was
> surprised about how bad it performed compared to an unoptimized version of
> an older than dirt algorithm.
>
> So, I decided to give it a try with Python2.7 and Python3.5. Eh, voilà, 100%
> cpu and no results so far. Nothing has changed at all since 2007.
>
> >>> import re
> >>> re.match(r'a?'*30 + r'a'*30, 'a'*30)
>  (still waiting)
>
> Quoting from the article: " Some might argue that this test is unfair to the
> backtracking implementations, since it focuses on an uncommon corner case.
> This argument misses the point: given a choice between an implementation
> with a predictable, consistent, fast running time on all inputs or one that
> usually runs quickly but can take years of CPU time (or more) on some
> inputs, the decision should be easy."
>
> Victor, as the head of Python performance department, and Matthew, as the
> maintainer of the new regex module, what is your stance on this particular
> issue?
>
> From my perspective, I can say that regular expressions might be worth
> optimizing especially for web applications (url matching usually uses
> regexes) but also for other applications where I've seen many tight loops
> using regexes as well. So, I am probing interest on this topic here.
>
> Best,
> Sven
>

Hi,
I can't speak about the details of mrab's implementation, but using
regex, I get the resulting match instantly:

Python 3.6.0 (v3.6.0:41df79263a11, Dec 23 2016, 07:18:10) [MSC v.1900
32 bit (Intel)] on win32
Type "copyright", "credits" or "license()" for more information.
>>> import regex
>>> regex.match(r'a?'*30 + r'a'*30, 'a'*30)

>>>

(I personally prefer regex for advantages other than this
artificial case, but it certainly doesn't hurt to have better
performance here too. :)
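
For anyone who wants to see the effect without waiting forever, here is a
minimal sketch that runs the same pattern with the standard re module at
much smaller sizes. The helper name time_pathological_match and the range of
n are my own choices for illustration; the timings depend on the machine and
the Python version, so none are quoted here.

```python
import re
import time

def time_pathological_match(n):
    """Time re.match on the pattern a? * n + a * n against n 'a' characters."""
    pattern = r'a?' * n + r'a' * n
    text = 'a' * n
    start = time.perf_counter()
    match = re.match(pattern, text)
    elapsed = time.perf_counter() - start
    return match, elapsed

# Keep n small: with a backtracking engine the running time is expected
# to grow roughly exponentially in n, so n=30 may not finish in practice.
for n in range(10, 19, 2):
    match, elapsed = time_pathological_match(n)
    print(f"n={n:2d}  span={match.span()}  {elapsed:.4f} s")
```

The match always succeeds and spans the whole string; only the time needed
to find it changes with n.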

regards,
vbr
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] collections.Counter __add__ implementation quirk

2015-11-23 Thread Vlastimil Brom
2015-11-23 7:21 GMT+01:00 Alexander Walters :
> collections.Counter.__add__ has a bit of a quirk.
>
> Counters allow for negative numbers.  You can subtract from a counter into
> the negative no problem.  However, if you have a counter with a negative
> value and add it to another counter, and if that value, after addition,
> would still be negative... that value is not included in the resulting
> Counter object.  This is kind of weird, to the point of thinking I had a bug
> in other code for several hours until I went and checked how Counters are
> implemented.
>
> Is there any particular reason counters drop negative values when you add
> them together?  I definitely expected them to act like ints do when you add
> negatives, and had to subclass it to get what I think is the obvious
> behavior.
>
Hi,
this is probably more appropriate for the general python list rather
than this developers' mailing list; however, as I asked a similar question
some time ago, I got some detailed explanations of the current
design decisions from the original developer; cf.:
https://mail.python.org/pipermail/python-list/2010-March/570618.html

(I didn't check possible changes in Counter since that version (3.1 at
that time).)
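
For reference, a small sketch of the behaviour in question, assuming a
reasonably current collections.Counter; the variable names and counts are
made up for illustration. Binary + keeps only positive totals, while the
in-place update() method keeps zero and negative ones.

```python
from collections import Counter

debts = Counter(apples=-3, pears=2)
income = Counter(apples=1, pears=1)

# Binary + keeps only positive totals, so 'apples' (-3 + 1 = -2) is dropped.
added = debts + income

# update() adds counts in place and keeps zero and negative totals.
kept = Counter(debts)
kept.update(income)

print(added)
print(kept)
```

So update() (or subtract() for the opposite direction) may be an alternative
to subclassing when negative totals have to survive.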

hth,
  vbr


Re: [Python-Dev] Python and the Unicode Character Database

2010-12-07 Thread Vlastimil Brom
2010/12/7 Alexander Belopolsky :
> On Tue, Dec 7, 2010 at 8:02 AM, Vlastimil Brom wrote:
> ..
>> Do you know of any re engine fully complying with tr18, even at the
>> first level: "Basic Unicode Support"?
>>
> """
> ICU Regular Expressions conform to Unicode Technical Standard #18,
> Unicode Regular Expressions, level 1, and in addition include Default
> Word boundaries and Name Properties from level 2.
> """ http://userguide.icu-project.org/strings/regexp
>

Thanks for the pointer, I wasn't aware of that project.
Anyway, I am quite happy with the mentioned regex library and can live
with trading this full compliance for some non-Unicode goodies (like
unbounded lookbehinds etc.), but I see that this is beside the point here.
Not that my opinion matters, but I can't think of, say, "union,
intersection and set-difference of Unicode sets" as a basic feature or
consider it a part of "a minimal level for useful Unicode support."
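
To illustrate what these set operations mean in practice, here is a rough
sketch that emulates them in the standard re module with lookaheads. This is
not the tr18 syntax and not how any particular engine implements it; the
ASCII sets and names are chosen only to keep the example short.

```python
import re

vowels = '[aeiou]'
lower = '[a-z]'

# Difference: a lowercase letter that is not a vowel.
consonant = re.compile(rf'(?!{vowels}){lower}')
# Intersection: a vowel that also lies in the range a-m.
early_vowel = re.compile(rf'(?=[a-m]){vowels}')
# Union: simply list both sets inside one character class.
vowel_or_digit = re.compile(r'[aeiou0-9]')

text = 'unicode 18'
print(consonant.findall(text))
print(early_vowel.findall(text))
print(vowel_or_digit.findall(text))
```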

vbr


Re: [Python-Dev] Python and the Unicode Character Database

2010-12-07 Thread Vlastimil Brom
2010/12/7 Alexander Belopolsky :
> On Sat, Dec 4, 2010 at 5:58 PM, "Martin v. Löwis"  wrote:
>>> I actually wonder if Python's re module can claim to provide even
>>> Basic Unicode Support.
>>
>> Do you really wonder? Most definitely it does not.
>>
>
> Were you more optimistic four years ago?
>
> http://bugs.python.org/issue1528154#msg54864
>
> I was hoping that regex syntax would be useful in
> explaining/documenting Python text processing routines (including
> string to number conversions) without a heavy dose of Unicode
> terminology.
>

The new regex version
http://bugs.python.org/issue2636
supports many more features, including Unicode properties and the
mentioned POSIX classes, but definitely not all of the
requirements of that rather "generous" list:
http://www.unicode.org/reports/tr18/
It seems that other engines, e.g. Perl's, have some omissions too:
http://perldoc.perl.org/perlunicode.html#Unicode-Regular-Expression-Support-Level

Do you know of any re engine fully complying with tr18, even at the
first level: "Basic Unicode Support"?
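
As a point of comparison, where an engine offers no Unicode property
classes, a selection by general category can be approximated outside the
pattern with the standard unicodedata module. The following is only a
sketch; the helper name chars_with_category and the sample text are
hypothetical.

```python
import unicodedata

def chars_with_category(text, prefix):
    """Return the characters of text whose general category starts with prefix."""
    return [ch for ch in text if unicodedata.category(ch).startswith(prefix)]

sample = 'Žluťoučký Kůň 42'
print(chars_with_category(sample, 'Lu'))  # uppercase letters
print(chars_with_category(sample, 'N'))   # any kind of number
```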

vbr


Re: [Python-Dev] New regex module for 3.2?

2010-07-16 Thread Vlastimil Brom
2010/7/9 Georg Brandl :
> Am 09.07.2010 02:35, schrieb MRAB:
>
>>
>> 1. Some of the inline flags are scoped; for example, putting "(?i)" at
>> the end of a regex will now have no effect because it's no longer a
>> global, all-or-nothing, flag.
>
> That is problematic.  I've often seen people put these flags at the end
> of a regex, probably for readability purposes.  IMHO it would be better
> to limit flag scoping to the explicit (?flags-flags: ) groups.
>

I just noticed how the reference site regular-expressions.info
describes this kind of flag:
"(?i)   Turn on case insensitivity for the remainder of the regular
expression. (Older regex flavors may turn it on for the entire
regex.)" and likewise for other flags.

http://www.regular-expressions.info/refadv.html

I am not sure how "authoritative" this page by Jan Goyvaerts is for
the various implementations, but it looks like a very comprehensive
reference.
I think that a new regex implementation need not copy all of these
"historical" semantics, unless there are major real use cases which
would be affected by the change.
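
For illustration, the difference between a scoped and a global inline flag
can be sketched with the explicit group form (?i:...), assuming a version of
the standard re module that accepts it (Python 3.6 or later, as far as I
know). This only shows the explicit forms and is not meant to describe the
behaviour of the new regex module.

```python
import re

# Scoped group: only "abc" is matched case-insensitively.
scoped = re.compile(r'(?i:abc)def')
# Leading global flag: the whole pattern is case-insensitive.
global_flag = re.compile(r'(?i)abcdef')

print(bool(scoped.fullmatch('ABCdef')))
print(bool(scoped.fullmatch('ABCDEF')))
print(bool(global_flag.fullmatch('ABCDEF')))
```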
Just a thought;
Vlastimil Brom


Re: [Python-Dev] New regex module for 3.2?

2010-07-13 Thread Vlastimil Brom
2010/7/8 MRAB :
> Hi all,
>
> I re-implemented the re module, adding new features and speed
> improvements. It's available at:
>
>    http://pypi.python.org/pypi/regex
>
> under the name "regex" so that it can be tried alongside "re".
>
> I'd be interested in any comments or feedback. How does it compare with
> "re" in terms of speed on real-world data? The benchmarks suggest it
> should be faster, or at worst comparable.
>
> How much interest would there be in putting it in Python 3.2?
>

Hi,
please let me apologize for posting here without being a Python developer;
I'd like to support the inclusion of the new regex library in the standard lib.
I have been using it since the early development versions in my internal app
for searching, modifying, ordering and extracting data from text, mainly
with manually created regex patterns. I realize that this is only one
specific use case, and the app isn't time or space critical (input
texts of up to a few MB, mostly smaller; the processing time is often
rather negligible compared to the GUI presentation, styling etc.).
However, I see it as a great enhancement both in terms of regex
features (listed on http://pypi.python.org/pypi/regex ) and in the
behaviour in some corner cases which aren't effectively usable in the
current re (e.g. nested subexpressions with quantifiers, although many
of these are solved more effectively with the added possessive
quantifiers).
I think this is a far more feature-complete engine, which doesn't
introduce any significant drawbacks (IMO) compared to the current re and
is backward compatible.
(The mentioned exception in the scoped flags might be fixable by
allowing only explicit scoping (?flags)...(?-flags) or using the
current paren, if possible (?flag:...), and treating the bare flag
setting parens as global;
however, I would consider it quite misleading in the current re if
these flags are set at some place other than the beginning of the
pattern. I don't see readability enhanced in any way by setting these
flags at the end while they apply from the beginning of
the pattern.)
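
To make the corner case with nested quantifiers more concrete, here is a
minimal sketch using the standard re module; the pattern and the input
length are my own illustration. The possessive variant is guarded by a
version check because, to my knowledge, the stdlib re accepts that syntax
only from Python 3.11 on (the regex module discussed here is not required).

```python
import re
import sys

text = 'a' * 16  # no trailing 'b', so every pattern below must fail

# Nested greedy quantifiers: a backtracking engine may try a number of
# ways to split the run of 'a's that grows exponentially with its length.
nested = re.compile(r'(?:a+)+b')
print(nested.match(text))

# Possessive inner quantifier: a++ never gives characters back, so the
# failure is expected to be found without that search.
if sys.version_info >= (3, 11):
    possessive = re.compile(r'(?:a++)+b')
    print(possessive.match(text))
```

Both patterns reject the input; the difference is expected to show up only
in how much work is needed to do so on longer inputs.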

Having seen MRAB's commitment during the development phase to both
bugfixes and feature additions, including some rather complex ones
(in my opinion) like Unicode properties, I am also confident that he
could be a competent maintainer of this package in the standard lib as
well.

just my biased opinion...
Regards,

   Vlastimil Brom