Re: [Python-Dev] re performance
2017-01-26 22:13 GMT+01:00 Sven R. Kunze:
> Hi folks,
>
> I recently refreshed regular expressions theoretical basics *indulging in
> reminiscences* So, I read https://swtch.com/~rsc/regexp/regexp1.html
>
> However, reaching the chart in the lower third of the article, I saw Python
> 2.4 measured against a naive Thompson matching implementation. And I was
> surprised about how bad it performed compared to an unoptimized version of
> an older than dirt algorithm.
>
> So, I decided to give it a try with Python2.7 and Python3.5. Eh, voilà, 100%
> cpu and no results so far. Nothing has changed at all since 2007.
>
> import re
> re.match(r'a?'*30 + r'a'*30, 'a'*30)
> (still waiting)
>
> Quoting from the article: " Some might argue that this test is unfair to the
> backtracking implementations, since it focuses on an uncommon corner case.
> This argument misses the point: given a choice between an implementation
> with a predictable, consistent, fast running time on all inputs or one that
> usually runs quickly but can take years of CPU time (or more) on some
> inputs, the decision should be easy."
>
> Victor, as the head of Python performance department, and Matthew, as the
> maintainer of the new regex module, what is your stance on this particular
> issue?
>
> From my perspective, I can say that regular expressions might be worth
> optimizing especially for web applications (url matching usually uses
> regexes) but also for other applications where I've seen many tight loops
> using regexes as well. So, I am probing interest on this topic here.
>
> Best,
> Sven
>

Hi,
I can't speak about the details of mrab's implementation, but using
regex, I get the resulting match instantly:

Python 3.6.0 (v3.6.0:41df79263a11, Dec 23 2016, 07:18:10) [MSC v.1900 32 bit (Intel)] on win32
Type "copyright", "credits" or "license()" for more information.
>>> import regex
>>> regex.match(r'a?'*30 + r'a'*30, 'a'*30)
>>>

(I personally prefer regex for advantages other than this artificial
case, but it certainly doesn't hurt to have better performance here
too :)

regards,
vbr
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
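For readers who want to see the growth without waiting on n = 30, here is
a minimal sketch that times the same a?^n a^n pattern from the article at
smaller sizes using only the standard re module. The helper names
(pathological, time_match) and the chosen sizes are illustrative and not
from the thread, and absolute timings will vary by machine.

```python
import re
import time


def pathological(n):
    """Build the (pattern, text) pair for the a?^n a^n case (hypothetical helper)."""
    return r'a?' * n + r'a' * n, 'a' * n


def time_match(n):
    """Match the pathological pattern of size n; return (match, seconds)."""
    pattern, text = pathological(n)
    start = time.perf_counter()
    match = re.match(pattern, text)
    return match, time.perf_counter() - start


# Small sizes only: each extra 'a?' roughly doubles the work in a
# backtracking engine, which is why n = 30 appears to hang.
for n in (10, 14, 18):
    match, elapsed = time_match(n)
    print(n, match is not None, f"{elapsed:.4f}s")
```

The match always succeeds; only the time to find it grows with n.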
Re: [Python-Dev] collections.Counter __add__ implementation quirk
2015-11-23 7:21 GMT+01:00 Alexander Walters:
> collections.Counter.__add__ has a bit of a quirk.
>
> Counters allow for negative numbers. You can subtract from a counter into
> the negative no problem. However, if you have a counter with a negative
> value and add it to another counter, and if that value, after addition,
> would still be negative... that value is not included in the resulting
> Counter object. This is kind of weird, to the point of thinking I had a bug
> in other code for several hours until I went and checked how Counters are
> implemented.
>
> Is there any particular reason counters drop negative values when you add
> them together? I definitely expected them to act like ints do when you add
> negatives, and had to subclass it to get what I think is the obvious
> behavior.

Hi,
this is probably more appropriate for the general python list rather
than this developers' mailing list; however, as I asked a similar
question some time ago, I got some detailed explanations of the current
design decisions from the original developer; cf.:
https://mail.python.org/pipermail/python-list/2010-March/570618.html

(I didn't check possible changes in Counter since that version (3.1 at
that time).)

hth,
vbr
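The behaviour in question can be reproduced with a short sketch. The keys
and counts below are made up for illustration; the contrast is between
the binary + operator, which keeps only positive totals, and the in-place
update() method, which adds counts like plain integers.

```python
from collections import Counter

# Illustrative data: 'apples' starts out negative.
a = Counter(apples=-3, pears=2)
b = Counter(apples=1, pears=1)

# Binary + builds a new Counter and keeps only positive totals, so the
# still-negative 'apples' entry is dropped from the result.
added = a + b

# update() adds the counts in place and keeps zero and negative totals.
merged = Counter(a)
merged.update(b)

print(added)
print(merged)
```

So code that needs int-like addition can use update() (or subtract())
without subclassing.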
Re: [Python-Dev] Python and the Unicode Character Database
2010/12/7 Alexander Belopolsky:
> On Tue, Dec 7, 2010 at 8:02 AM, Vlastimil Brom wrote:
> ..
>> Do you know of any re engine fully complying to tr18, even at the
>> first level: "Basic Unicode Support"?
>>
> """
> ICU Regular Expressions conform to Unicode Technical Standard #18 ,
> Unicode Regular Expressions, level 1, and in addition include Default
> Word boundaries and Name Properties from level 2.
> """ http://userguide.icu-project.org/strings/regexp
>

Thanks for the pointer, I wasn't aware of that project.
Anyway, I am quite happy with the mentioned regex library and can live
with trading this full compliance for some non-unicode goodies (like
unbounded lookbehinds etc.), but I see that this is beside the point
here.

Not that my opinion matters, but I can't think of, say, "union,
intersection and set-difference of Unicode sets" as a basic feature or
consider it a part of "a minimal level for useful Unicode support."

vbr
Re: [Python-Dev] Python and the Unicode Character Database
2010/12/7 Alexander Belopolsky:
> On Sat, Dec 4, 2010 at 5:58 PM, "Martin v. Löwis" wrote:
>>> I actually wonder if Python's re module can claim to provide even
>>> Basic Unicode Support.
>>
>> Do you really wonder? Most definitely it does not.
>>
>
> Were you more optimistic four years ago?
>
> http://bugs.python.org/issue1528154#msg54864
>
> I was hoping that regex syntax would be useful in
> explaining/documenting Python text processing routines (including
> string to number conversions) without a heavy dose of Unicode
> terminology.
>

The new regex version http://bugs.python.org/issue2636
supports many more features, including Unicode properties and the
mentioned POSIX classes etc., but definitely not all of the
requirements of that rather "generous" list:
http://www.unicode.org/reports/tr18/

It seems that, e.g. in Perl, there are some omissions too:
http://perldoc.perl.org/perlunicode.html#Unicode-Regular-Expression-Support-Level

Do you know of any re engine fully complying to tr18, even at the
first level: "Basic Unicode Support"?

vbr
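As a rough illustration of what "Unicode properties" buys you: the
standard re module has no \p{...} property syntax, so selecting
characters by general category has to go through unicodedata instead.
The sketch below is only an approximation of that one feature, with a
hypothetical helper name and made-up sample text; it does not attempt
anything else from the tr18 list.

```python
import re
import unicodedata


def chars_in_category(text, prefix):
    """Return characters whose Unicode general category starts with prefix.

    Hypothetical helper: roughly what \\p{L} or \\p{N} would select in an
    engine that supports Unicode properties.
    """
    return [ch for ch in text if unicodedata.category(ch).startswith(prefix)]


sample = "L\u00f6wis 2010 \u0436"            # Latin, digits, one Cyrillic letter

letters = chars_in_category(sample, "L")    # general categories Lu, Ll, ...
digits = chars_in_category(sample, "N")     # general categories Nd, Nl, No

# For str patterns, \w in re is already Unicode-aware, but it cannot be
# narrowed to a single script or category.
words = re.findall(r"\w+", sample)

print(letters, digits, words)
```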
Re: [Python-Dev] New regex module for 3.2?
2010/7/9 Georg Brandl:
> On 09.07.2010 02:35, MRAB wrote:
>
>>
>> 1. Some of the inline flags are scoped; for example, putting "(?i)" at
>> the end of a regex will now have no effect because it's no longer a
>> global, all-or-nothing, flag.
>
> That is problematic. I've often seen people put these flags at the end
> of a regex, probably for readability purposes. IMHO it would be better
> to limit flag scoping to the explicit (?flags-flags: ) groups.
>

I just noticed the wording on the reference page
regular-expressions.info for this kind of flag:
"(?i) Turn on case insensitivity for the remainder of the regular
expression. (Older regex flavors may turn it on for the entire
regex.)"
and likewise for other flags.
http://www.regular-expressions.info/refadv.html

I am not sure how "authoritative" this page by Jan Goyvaerts is for
various implementations, but it looks like a very comprehensive
reference.

I think that a new regex implementation need not copy all of these
"historical" semantics, unless there are major real use cases which
would be affected by this.

Just a thought;
Vlastimil Brom
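To make the distinction between a global flag and an explicitly scoped
one concrete, here is a minimal sketch using the standard re module. It
assumes Python 3.6 or later, where re accepts (?flags:...) groups; the
pattern and test strings are made up for illustration. The
trailing-flag case discussed above is left out on purpose, because its
handling differs between versions and implementations.

```python
import re

# Scoped: case-insensitivity applies only inside the (?i:...) group.
scoped = re.compile(r'(?i:python)-dev')

# Global: a leading (?i) applies to the whole pattern.
global_flag = re.compile(r'(?i)python-dev')

candidates = ('PYTHON-dev', 'python-DEV')

scoped_hits = [bool(scoped.fullmatch(s)) for s in candidates]
global_hits = [bool(global_flag.fullmatch(s)) for s in candidates]

print(scoped_hits)   # the '-dev' part stays case-sensitive
print(global_hits)   # everything is case-insensitive
```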
Re: [Python-Dev] New regex module for 3.2?
2010/7/8 MRAB:
> Hi all,
>
> I re-implemented the re module, adding new features and speed
> improvements. It's available at:
>
> http://pypi.python.org/pypi/regex
>
> under the name "regex" so that it can be tried alongside "re".
>
> I'd be interested in any comments or feedback. How does it compare with
> "re" in terms of speed on real-world data? The benchmarks suggest it
> should be faster, or at worst comparable.
>
> How much interest would there be in putting it in Python 3.2?
>

Hi,
please let me apologize for posting here without being a python
developer; I'd like to support the inclusion of the new regex library
in the standard lib.

I have used it since the early development versions in my internal app
for searching, modifying, ordering and extracting data from text,
mainly with manually created regex patterns. I see that this is only
one specific use case, and the app isn't time or space critical (input
texts up to a few MB, mostly smaller; the processing time is often
rather negligible compared to the gui presentation, styling etc.)

However, I see it as a great enhancement both in terms of regex
features (listed on http://pypi.python.org/pypi/regex ) and in its
behaviour in some corner cases which aren't effectively usable in the
current re (e.g. nested subexpressions with quantifiers, although many
of these are solved more effectively with the added possessive
quantifiers).

I think this is a far more feature-complete engine, which doesn't
introduce any significant drawbacks (IMO) compared to the current re
and is backward compatible.
(The mentioned exception in the scoped flags might be fixable by
allowing only explicit scoping (?flags)...(?-flags) or using the
current paren, if possible (?flag:...) and treating the bare flag
setting parens as global; however, I would consider it quite
misleading in the current re if these flags are set at some other
place than at the beginning of the pattern. I don't see the
readability enhanced in any way with these flags set at the end, while
they should apply from the beginning of the pattern.)

Having seen MRAB's commitment in the development phase, in both
bugfixes and feature additions, including some rather complex ones (in
my opinion) like unicode properties, I am also confident that he could
be a competent maintainer of this package in the standard lib as well.

just my biased opinion...
Regards,
Vlastimil Brom