Feature Requests item #1701452, was opened at 2007-04-16 07:28 Message generated for change (Comment added) made by rhettinger You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=355470&aid=1701452&group_id=5470
Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Python Library Group: None >Status: Closed >Resolution: Rejected Priority: 5 Private: No Submitted By: Thomas Dybdahl Ahle (thomasda) Assigned to: Gustavo Niemeyer (niemeyer) Summary: Feature request: Comparing regexps Initial Comment: It would be very nice with a function in the re module to compare two regular expressions and see how they overlap. Return value would perhaps be an a constant like DISJOINT, SUBSET etc. ---------------------------------------------------------------------- >Comment By: Raymond Hettinger (rhettinger) Date: 2007-04-19 01:44 Message: Logged In: YES user_id=80475 Originator: NO Am closing the feature request because it has very little chance of making it into the core distribution without being a third-party module first. The existing implementation is already somewhat difficult to maintain and we don't want to make that worse. Also, the new functionality is of interest to only a small subset of the re module users. If you do write it, respect the GPL license on the SML code and take some care to make sure you can license your new code in any way you see fit. ---------------------------------------------------------------------- Comment By: Thomas Dybdahl Ahle (thomasda) Date: 2007-04-18 09:33 Message: Logged In: YES user_id=1304417 Originator: YES I talked with the guy who wrote the ZZ comparator. """I can give you the source code under the GPL if you like. However, I think it would be difficult to port to python. It's 1100 lines of very SML-style mostly-uncommented code. Do you know SML? It's available at svn://mlton.org/mltonlib/trunk/ca/terpstra/regexp. I would expect credit for the algorithm. :-) Let me know if you succeed in porting it.""" I don't know any SML though. If anybody does or can write a python equaliant of the algorithm: """1. Parse the regular expression (in GNU regular expression syntax) 2. Convert that parse tree into an NFA 3. Convert the NFA into a DFA by an algorithm I invented (it's why I wrote this program; I thought of the algorithm and found it amusing to see it in action) For comparing regular expressions I take the following additional steps 4. Combine the two DFA's (with the cross product) 4a. For finding common words, I intersect them 4b. For finding words in A, but not B, I intersect A with the negated DFA of B 4c. ... 5. Find the minimal DFA corresponding to this intersection 6. Run a weighted depth-first search (similar to Dijkstra's) to find the least-weighted path from the initial state to an accepting state (the weighting makes it prefer human readable characters in the examples)""" It could be cool. Otherwise I can also try to learn sml and port it. ---------------------------------------------------------------------- Comment By: Thomas Dybdahl Ahle (thomasda) Date: 2007-04-17 02:51 Message: Logged In: YES user_id=1304417 Originator: YES I found this page with the function to do so: http://terpstra.ca/compare.html I also found this thread: http://bumppo.net/lists/fun-with-perl/1999/09/msg00000.html which discusses how to do this with some canonical formed expressions. A function like this would really be able to speed some applications up, which matches a lot of strings with a number of expressions. If you know which ones are disjoint, you can break the test when one test matches. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2007-04-16 15:43 Message: Logged In: YES user_id=80475 Originator: NO Can this be done in the existing implementation by comparing the racetrack diagrams (character transitions)? Thomas, do you know of anywhere this have been done (third-party modules or in other languages)? ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=355470&aid=1701452&group_id=5470 _______________________________________________ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com