SSL Server Socket Support in Python?
I'm trying to create an SSL-enabled server in Python. The documentation for the socket module says:

    ssl(sock[, keyfile, certfile])
        Initiate a SSL connection over the socket sock. keyfile is the
        name of a PEM formatted file that contains your private key.
        certfile is a PEM formatted certificate chain file. On success,
        a new SSLObject is returned.

So I tried:

    listen_socket = socket.socket()
    listen_socket.bind((addr, port))
    listen_socket.listen(10)
    s, addr = listen_socket.accept()
    ssl_s = socket.ssl(s, "key.pem", "cert.pem")

which fails with:

    socket.sslerror: (1, 'error:140770FC:SSL routines:SSL23_GET_SERVER_HELLO:unknown protocol')

Am I missing something?

-- 
http://mail.python.org/mailman/listinfo/python-list
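A note for readers on a current Python: as far as I can tell, the old socket.ssl() call only performed the client side of the handshake, which would fit the SSL23_GET_SERVER_HELLO error (the accepted socket ends up waiting for a ServerHello that the connecting client never sends). The sketch below uses the standard ssl module that later replaced socket.ssl(). The function names make_server_context and serve_once are made up for illustration, and the certificate and key file names are whatever the caller supplies.

```python
import socket
import ssl


def make_server_context(certfile=None, keyfile=None):
    """Build a server-side TLS context (hypothetical helper, not from the post)."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    if certfile is not None:
        # keyfile may stay None when the private key is stored in certfile.
        ctx.load_cert_chain(certfile=certfile, keyfile=keyfile)
    return ctx


def serve_once(ctx, addr, port):
    """Accept one connection, handshake as the server, send a greeting, close."""
    listen_socket = socket.socket()
    try:
        listen_socket.bind((addr, port))
        listen_socket.listen(10)
        conn, peer = listen_socket.accept()
        # server_side=True is the part the old socket.ssl() call had no way to express.
        tls_conn = ctx.wrap_socket(conn, server_side=True)
        try:
            tls_conn.sendall(b"hello over TLS\n")
        finally:
            tls_conn.close()
    finally:
        listen_socket.close()
```

Something like serve_once(make_server_context("cert.pem", "key.pem"), "", 4433) would then play the role of the snippet in the question, with the port number being an arbitrary choice.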
Re: RE Engine error with sub()
Instead of using regular expressions, you could perhaps use a multiple keyword matcher and then, for each match, replace it with the correct string.

http://hkn.eecs.berkeley.edu/~dyoo/python/ahocorasick/ contains the Aho-Corasick algorithm written in C with a Python extension.

Maurice LING wrote:
> Hi,
>
> I have the following code:
>
> from __future__ import nested_scopes
>
> import re
> from UserDict import UserDict
>
> class Replacer(UserDict):
>     """
>     An all-in-one multiple string substitution class. This class was
>     contributed by Xavier Defrang to the ASPN Python Cookbook
>     (http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/81330)
>     and [EMAIL PROTECTED]
>
>     Copyright: The methods _make_regex(), __call__() and substitute()
>     were the work of Xavier Defrang, __init__() was the work of
>     [EMAIL PROTECTED], all others were the work of Maurice Ling
>     """
>
>     def __init__(self, dict = None, file = None):
>         """
>         Constructor. It calls for the compilation of regular
>         expressions from either a dictionary object or a replacement
>         rule file.
>
>         @param dict: dictionary object containing replacement rules
>             with the string to be replaced as keys.
>         @param file: file name of replacement rule file
>         """
>         self.re = None
>         self.regex = None
>         if file == None:
>             UserDict.__init__(self, dict)
>             self._make_regex()
>         else:
>             UserDict.__init__(self, self.readDictionaryFile(file))
>             self._make_regex()
>
>     def cleanDictionaryFile(self, file):
>         """
>         Method to clean up the replacement rule dictionary file and
>         write the cleaned file as the same name as the original file.
>         """
>         import os
>         dict = self.readDictionaryFile(file)
>         f = open(file, 'w')
>         for key in dict.keys():
>             f.write(str(key) + '=' + str(dict[key]) + os.linesep)
>         f.close()
>
>     def readDictionaryFile(self, file):
>         """
>         Method to parse a replacement rule file (file) into a
>         dictionary for regular expression processing. Each rule in the
>         rule file is in the form:
>         string to be replaced=string to replace with
>         """
>         import string
>         import os
>         f = open(file, 'r')
>         data = f.readlines()
>         f.close()
>         dict = {}
>         for rule in data:
>             rule = rule.split('=')
>             if rule[1][-1] == os.linesep:
>                 rule[1] = rule[1][:-1]
>             dict[str(rule[0])] = str(rule[1])
>         print '%s replacement rule(s) read from %s' % (str(len(dict.keys())), str(file))
>         return dict
>
>     def _make_regex(self):
>         """Build a regular expression object based on the keys of the
>         current dictionary"""
>         self.re = "(%s)" % "|".join(map(re.escape, self.keys()))
>         self.regex = re.compile(self.re)
>
>     def __call__(self, mo):
>         """This handler will be invoked for each regex match"""
>         # Count substitutions
>         self.count += 1
>         # Look-up string
>         return self[mo.string[mo.start():mo.end()]]
>
>     def substitute(self, text):
>         """Translate text, returns the modified text."""
>         # Reset substitution counter
>         self.count = 0
>         # Process text
>         #return self._make_regex().sub(self, text)
>         return self.regex.sub(self, text)
>
>     def rmBracketDuplicate(self, text):
>         """Removes the bracketed text in occurrences of
>         'text-x (text-x)'"""
>         regex = re.compile(r'(\w+)\s*(\(\1\))')
>         return regex.sub(r'\1', text)
>
>     def substituteMultiple(self, text):
>         """
>         Similar to substitute() method except that this method loops
>         round the same text multiple times until no more substitutions
>         can be made or when it had looped 10 times. This is to pre-empt
>         for cases of recursive abbreviations.
>         """
>         count = 1 # to get into the loop
>         run = 0 # counter for number of runs thru the text
>         while count > 0 and run < 10:
>             count = 0
>             text = self.rmBracketDuplicate(self.substitute(text))
>             count = count + self.count
>             run = run + 1
>             print "Pass %d: Changed %d things(s)" % (run, count)
>         return text
>
> Normally I will use the following to instantiate my module:
>
>     replace = Replacer('', 'rule.mdf')
>
> rule.mdf is in the format of "string to be replaced=string to replace with\n"
>
> Then using replace.substituteMultiple('my text') to carry out multiple replacements.
>
> It all works well for rule count up to 800+ but when my replacement rules swell up to 1800+, it gives me a runtime error that says "Internal error in regular expression engine"... traceable to "return self.regex.sub(self, text)" in substitute() method.
>
> Any ideas or workarounds?
>
> Thanks in advance.
>
> Cheers,
> Maurice
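To make the "multiple keyword matcher" suggestion concrete, here is a minimal pure-Python sketch that avoids building one huge alternation. It is not the C extension linked above and not a full Aho-Corasick automaton: it walks a plain trie from each position and replaces the longest keyword found there. The names build_trie and replace_keywords are invented for this example.

```python
def build_trie(rules):
    """Turn {old: new} into a nested-dict trie; the key None marks a complete keyword."""
    root = {}
    for old, new in rules.items():
        if not old:
            continue  # an empty keyword would match at every position
        node = root
        for ch in old:
            node = node.setdefault(ch, {})
        node[None] = new
    return root


def replace_keywords(text, trie):
    """Replace the longest keyword starting at each position, scanning left to right.

    Returns the new text and the number of substitutions made.
    """
    out = []
    count = 0
    i = 0
    n = len(text)
    while i < n:
        node = trie
        j = i
        best_end = None
        best_new = None
        # Follow the trie as far as the text allows, remembering the last full keyword.
        while j < n and text[j] in node:
            node = node[text[j]]
            j += 1
            if None in node:
                best_end, best_new = j, node[None]
        if best_end is None:
            out.append(text[i])
            i += 1
        else:
            out.append(best_new)
            count += 1
            i = best_end
    return "".join(out), count
```

For instance, replace_keywords("abcab b", build_trie({"abc": "X", "ab": "Y", "b": "Z"})) gives ("XY Z", 3). One difference from the regex version is worth keeping in mind: the alternation picks whichever alternative comes first in the pattern, while this sketch always prefers the longest keyword. In the worst case it also rescans up to one keyword length per position, which a real Aho-Corasick matcher avoids.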
Re: RE Engine error with sub()
The "Internal error in regular expression engine" also occurs in Python 2.4.0 when creating a regular expression containing a large number of or's (|).

Dennis Benzinger wrote:
> Maurice LING wrote:
>> Hi,
>>
>> I have the following code:
>>
>> from __future__ import nested_scopes
>> [...]
>
> Are you still using Python 2.1? In every later version you don't need the "from __future__ import nested_scopes" line. So, if you are using Python 2.1, I strongly recommend upgrading to Python 2.4.1.
>
>> [...]
>> It all works well for rule count up to 800+ but when my replacement rules swell up to 1800+, it gives me a runtime error that says "Internal error in regular expression engine"... traceable to "return self.regex.sub(self, text)" in substitute() method.
>> [...]
>
> I didn't read your code, but this sounds like you have a problem with the regular expression engine being recursive in Python versions < 2.4. Try again using Python 2.4 or later (i.e. Python 2.4.1). The new regular expression engine is not recursive anymore.
>
> Bye,
> Dennis
Overlapping matches in Regular Expressions
With the re/sre module included with Python 2.4:

    >>> pattern = "(?P<id1>avi)|(?P<id2>avi|mp3)"
    >>> string2match = "some string with avi in it"
    >>> matches = re.finditer(pattern, string2match)
    ...
    >>> matches[0].groupdict()
    {'id2': None, 'id1': 'avi'}

This was expected, since overlapping matches are ignored. But I would also like to know whether the other groups had a match. What modifications to the re/sre module are needed to allow overlapping matches?
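One way to get this information without touching the re/sre module is to keep the alternatives separate and, for every match of the combined pattern, test each alternative again at the same starting position. This is only a sketch of that idea: the function name overlapping_groups and the (name, pattern) list format are assumptions made for the example.

```python
import re


def overlapping_groups(alternatives, text):
    """For each match of the combined pattern, report what every alternative matches there.

    alternatives is a list of (group_name, pattern_string) pairs.
    """
    combined = re.compile(
        "|".join("(?P<%s>%s)" % (name, pat) for name, pat in alternatives)
    )
    compiled = [(name, re.compile(pat)) for name, pat in alternatives]
    results = []
    for mo in combined.finditer(text):
        found = {}
        for name, rx in compiled:
            # Anchor each alternative at the position where the combined match began.
            sub = rx.match(text, mo.start())
            found[name] = sub.group() if sub else None
        results.append(found)
    return results
```

With the pattern from the question, overlapping_groups([("id1", "avi"), ("id2", "avi|mp3")], "some string with avi in it") returns [{'id1': 'avi', 'id2': 'avi'}], so both groups are reported. The cost is one extra match attempt per alternative at each hit.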
Re: monitoring folder in python
Raghul wrote:
> Is it possible to monitor a folder in Python? My question is: if I put any file in that particular folder, my script should monitor the folder and read the file name. If so, what function can I use?
>
> Thanks in advance

If you do not want to poll (check for changes yourself regularly), here are some pointers:

In Windows:
With the win32 extensions, it's quite simple:
http://tgolden.sc.sabren.com/python/win32_how_do_i/watch_directory_for_changes.html

In Linux:
http://www.edoceo.com/creo/inotify/
http://www.student.lu.se/~nbi98oli/dnotify.html
http://www.lambda-computing.com/projects/dnotify/

I don't know of any Python extensions for the above, so you might have to write your own if you want to use them from Python. Also, you probably need to reconfigure and compile a new kernel.

I have tested inotify myself and it seems to work; you also get the filename with the event. The only problem is that it doesn't support monitoring a directory recursively (subdirectories), as ReadDirectoryChangesW does on Windows.
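For comparison, the polling approach that the reply sets aside needs nothing beyond the standard library. The following is a rough sketch, not a recommendation over the notification APIs linked above; the names new_entries and watch and the one-second default interval are arbitrary choices for the example.

```python
import os
import time


def new_entries(folder, seen):
    """Return (sorted names that appeared since `seen`, the current listing as a set)."""
    current = set(os.listdir(folder))
    return sorted(current - seen), current


def watch(folder, interval=1.0, max_polls=None):
    """Print each new file name; poll forever unless max_polls is given."""
    seen = set(os.listdir(folder))
    polls = 0
    while max_polls is None or polls < max_polls:
        time.sleep(interval)
        added, seen = new_entries(folder, seen)
        for name in added:
            print(name)
        polls += 1
```

Like the inotify setup described above, this only looks at the top level of the folder. It also reports a file as soon as its name appears, which may be before the writer has finished with it.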
Re: Regular Expressions: large amount of or's
Bill Mill wrote:
> On Tue, 01 Mar 2005 22:04:15 +0100, André Søreng [EMAIL PROTECTED] wrote:
>> Kent Johnson wrote:
>>> André Søreng wrote:
>>>> Hi!
>>>>
>>>> Given a string, I want to find all occurrences of certain predefined words in that string. Problem is, the list of words that should be detected can be in the order of thousands.
>>>>
>>>> With the re module, this can be solved something like this:
>>>>
>>>>     import re
>>>>
>>>>     r = re.compile("word1|word2|word3|...|wordN")
>>>>     r.findall(some_string)
>>>>
>>>> Unfortunately, when having more than about 10 000 words in the regexp, I get a regular expression runtime error when trying to execute the findall function (compile works fine, but slow).
>>>>
>>>> I don't know if using the re module is the right solution here, any suggestions on alternative solutions or data structures which could be used to solve the problem?
>>>
>>> If you can split some_string into individual words, you could look them up in a set of known words:
>>>
>>>     known_words = set("word1 word2 word3 ... wordN".split())
>>>     found_words = [ word for word in some_string.split() if word in known_words ]
>>>
>>> Kent
>>
>> That is not exactly what I want. It should discover if some of the predefined words appear as substrings, not only as equal words. For instance, after matching "word2sgjoisejfisaword1yguyg", word2 and word1 should be detected.
>
> Show some initiative, man!
>
>     >>> known_words = set(["word1", "word2"])
>     >>> found_words = [word for word in known_words if word in "word2sgjoisejfisaword1yguyg"]
>     >>> found_words
>     ['word1', 'word2']
>
> Peace
> Bill Mill
> bill.mill at gmail.com

Yes, but I was looking for a solution which would scale. Searching through the same string 1+++ times does not seem like a suitable solution.

André
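The two suggestions in this exchange answer different questions, which a small side-by-side sketch may make clearer. The function names are invented for the example; the bodies follow Kent's and Bill's snippets.

```python
def found_as_tokens(text, known_words):
    """Kent's approach: split the text once and look each token up in the set."""
    return [word for word in text.split() if word in known_words]


def found_as_substrings(text, known_words):
    """Bill's approach: one substring search over the whole text per known word."""
    return sorted(word for word in known_words if word in text)
```

On the sample string "word2sgjoisejfisaword1yguyg" with the known words word1 and word2, the token version finds nothing because the string is a single token, while the substring version returns ['word1', 'word2']. The second one is what André asked for, but it scans the text once per known word, which is the scaling concern he raises.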
Re: Regular Expressions: large amount of or's
Daniel Yoo wrote:
> Kent Johnson [EMAIL PROTECTED] wrote:
>
> : Given a string, I want to find all occurrences of
> : certain predefined words in that string. Problem is, the list of
> : words that should be detected can be in the order of thousands.
> :
> : With the re module, this can be solved something like this:
> :
> : import re
> :
> : r = re.compile("word1|word2|word3|...|wordN")
> : r.findall(some_string)
>
> The internal data structure that encodes that set of keywords is probably humongous. An alternative approach to this problem is to tokenize your string into words, and then check to see if each word is in a defined list of keywords. This works if your keywords are single words:
>
> ###
> keywords = set(["word1", "word2", ...])
> matchingWords = set(re.findall(r'\w+', some_string)).intersection(keywords)
> ###
>
> Would this approach work for you?
>
> Otherwise, you may want to look at a specialized data structure for doing multiple keyword matching; I had an older module that wrapped around a suffix tree:
>
> http://hkn.eecs.berkeley.edu/~dyoo/python/suffix_trees/
>
> It looks like other folks, thankfully, have written other implementations of suffix trees:
>
> http://cs.haifa.ac.il/~shlomo/suffix_tree/
>
> Another approach is something called the Aho-Corasick algorithm:
>
> http://portal.acm.org/citation.cfm?doid=360825.360855
>
> though I haven't been able to find a nice Python module for this yet.
>
> Best of wishes to you!

Thanks, it seems like the Aho-Corasick algorithm is along the lines of what I was looking for, but I have not read the article completely yet.

Also:

http://alexandria.tue.nl/extra1/wskrap/publichtml/200407.pdf

provided several alternative algorithms.

André
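Since no ready-made Python module is named in this message, here is a compact pure-Python sketch of the Aho-Corasick idea: build a trie of the keywords, add failure links with a breadth-first pass, then scan the text once. It is an untuned illustration, and the names build_automaton and find_keywords are invented for it.

```python
from collections import deque


def build_automaton(keywords):
    """Build the goto, fail and output tables for the given keywords."""
    goto = [{}]    # goto[state][char] -> next state
    fail = [0]     # fail[state] -> state to fall back to on a mismatch
    output = [[]]  # output[state] -> keywords that end at this state
    for word in keywords:
        if not word:
            continue  # skip empty keywords
        state = 0
        for ch in word:
            nxt = goto[state].get(ch)
            if nxt is None:
                nxt = len(goto)
                goto.append({})
                fail.append(0)
                output.append([])
                goto[state][ch] = nxt
            state = nxt
        output[state].append(word)
    # Breadth-first pass: depth-1 states keep fail == 0, deeper states inherit
    # their failure link from the failure chain of their parent.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            output[nxt] = output[nxt] + output[fail[nxt]]
    return goto, fail, output


def find_keywords(text, automaton):
    """Scan text once; return (start_index, keyword) for every occurrence."""
    goto, fail, output = automaton
    state = 0
    hits = []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for word in output[state]:
            hits.append((i - len(word) + 1, word))
    return hits
```

On the example from this thread, find_keywords("word2sgjoisejfisaword1yguyg", build_automaton(["word1", "word2"])) returns [(0, 'word2'), (17, 'word1')]. The text is read once regardless of how many keywords there are, and overlapping keywords are all reported.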
Re: Regular Expressions: large amount of or's
Ola Natvig wrote:
> André Søreng wrote:
>> Yes, but I was looking for a solution which would scale. Searching through the same string 1+++ times does not seem like a suitable solution.
>>
>> André
>
> Just out of curiosity, what would a regexp do? Perhaps there's a clue in the way regexps are executed as to how you could do this.
>
> ola

I think this article provides me with what I was looking for:

http://alexandria.tue.nl/extra1/wskrap/publichtml/200407.pdf

Enough info there to keep me going for some while.
Regular Expressions: large amount of or's
Hi!

Given a string, I want to find all occurrences of certain predefined words in that string. Problem is, the list of words that should be detected can be in the order of thousands.

With the re module, this can be solved something like this:

    import re

    r = re.compile("word1|word2|word3|...|wordN")
    r.findall(some_string)

Unfortunately, when having more than about 10 000 words in the regexp, I get a regular expression runtime error when trying to execute the findall function (compile works fine, but slow).

I don't know if using the re module is the right solution here, any suggestions on alternative solutions or data structures which could be used to solve the problem?

André
Re: Regular Expressions: large amount of or's
Kent Johnson wrote:
> André Søreng wrote:
>> Hi!
>>
>> Given a string, I want to find all occurrences of certain predefined words in that string. Problem is, the list of words that should be detected can be in the order of thousands.
>>
>> With the re module, this can be solved something like this:
>>
>>     import re
>>
>>     r = re.compile("word1|word2|word3|...|wordN")
>>     r.findall(some_string)
>>
>> Unfortunately, when having more than about 10 000 words in the regexp, I get a regular expression runtime error when trying to execute the findall function (compile works fine, but slow).
>>
>> I don't know if using the re module is the right solution here, any suggestions on alternative solutions or data structures which could be used to solve the problem?
>
> If you can split some_string into individual words, you could look them up in a set of known words:
>
>     known_words = set("word1 word2 word3 ... wordN".split())
>     found_words = [ word for word in some_string.split() if word in known_words ]
>
> Kent

That is not exactly what I want. It should discover if some of the predefined words appear as substrings, not only as equal words. For instance, after matching "word2sgjoisejfisaword1yguyg", word2 and word1 should be detected.

André