SSL Server Socket Support in Python?

2005-04-22 Thread André Søreng
I'm trying to create an SSL-enabled server in Python. The doc for
the socket module says:
ssl(sock[, keyfile, certfile])
    Initiate a SSL connection over the socket sock. keyfile is the name of a
    PEM formatted file that contains your private key. certfile is a PEM
    formatted certificate chain file. On success, a new SSLObject is returned.

So:
listen_socket = socket.socket()
listen_socket.bind((addr, port))
listen_socket.listen(10)
s, addr = listen_socket.accept()
ssl_s = socket.ssl(s, 'key.pem', 'cert.pem')
which fails with:
socket.sslerror: (1, 'error:140770FC:SSL routines:SSL23_GET_SERVER_HELLO:unknown protocol')
Am I missing something?
--
http://mail.python.org/mailman/listinfo/python-list


Re: RE Engine error with sub()

2005-04-15 Thread André Søreng
Instead of using regular expressions, you could perhaps
use a multiple keyword matcher, and then for each match,
replace it with the correct string.
http://hkn.eecs.berkeley.edu/~dyoo/python/ahocorasick/
contains the Aho-Corasick algorithm written in C with
a Python extension.
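Very roughly, the idea behind such a matcher looks something like the sketch
below (plain Python, for illustration only -- it is not the API of the C
extension linked above, and the helper names here are made up):

from collections import deque

def build_automaton(keywords):
    # Build a keyword trie plus failure links (Aho-Corasick style).
    goto = [{}]            # goto[node][char] -> next node
    out = [set()]          # out[node] = keywords that end at this node
    fail = [0]             # fail[node] = fallback node on mismatch
    for word in keywords:
        node = 0
        for ch in word:
            if ch not in goto[node]:
                goto.append({})
                out.append(set())
                fail.append(0)
                goto[node][ch] = len(goto) - 1
            node = goto[node][ch]
        out[node].add(word)
    # Breadth-first pass to fill in the failure links.
    queue = deque(goto[0].values())
    while queue:
        node = queue.popleft()
        for ch, nxt in goto[node].items():
            queue.append(nxt)
            f = fail[node]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            out[nxt] |= out[fail[nxt]]
    return goto, fail, out

def find_keywords(text, goto, fail, out):
    # Single left-to-right scan; reports (start, keyword) for every hit.
    node = 0
    hits = []
    for i, ch in enumerate(text):
        while node and ch not in goto[node]:
            node = fail[node]
        node = goto[node].get(ch, 0)
        for word in out[node]:
            hits.append((i - len(word) + 1, word))
    return hits

goto, fail, out = build_automaton(["word1", "word2", "word3"])
print(find_keywords("word2sgjoisejfisaword1yguyg", goto, fail, out))
# [(0, 'word2'), (17, 'word1')]

From there, replacement is just a matter of walking the hits and splicing in
the corresponding replacement strings.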
Maurice LING wrote:
Hi,
I have the following code:
from __future__ import nested_scopes
import re
from UserDict import UserDict
class Replacer(UserDict):
    """
    An all-in-one multiple string substitution class. This class was
    contributed by Xavier Defrang to the ASPN Python Cookbook
    (http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/81330)
    and [EMAIL PROTECTED]

    Copyright: The methods _make_regex(), __call__() and substitute()
    were the work of Xavier Defrang, __init__() was the work of
    [EMAIL PROTECTED], all others were the work of Maurice Ling
    """

    def __init__(self, dict = None, file = None):
        """Constructor. It calls for the compilation of regular expressions
        from either a dictionary object or a replacement rule file.

        @param dict: dictionary object containing replacement rules with the
        string to be replaced as keys.
        @param file: file name of replacement rule file
        """
        self.re = None
        self.regex = None
        if file == None:
            UserDict.__init__(self, dict)
            self._make_regex()
        else:
            UserDict.__init__(self, self.readDictionaryFile(file))
            self._make_regex()

    def cleanDictionaryFile(self, file):
        """Method to clean up the replacement rule dictionary file and write
        the cleaned file under the same name as the original file."""
        import os
        dict = self.readDictionaryFile(file)
        f = open(file, 'w')
        for key in dict.keys():
            f.write(str(key) + '=' + str(dict[key]) + os.linesep)
        f.close()

    def readDictionaryFile(self, file):
        """Method to parse a replacement rule file (file) into a dictionary
        for regular expression processing. Each rule in the rule file is in
        the form:
            string to be replaced=string to replace with
        """
        import string
        import os
        f = open(file, 'r')
        data = f.readlines()
        f.close()
        dict = {}
        for rule in data:
            rule = rule.split('=')
            if rule[1][-1] == os.linesep:
                rule[1] = rule[1][:-1]
            dict[str(rule[0])] = str(rule[1])
        print '%s replacement rule(s) read from %s' % (str(len(dict.keys())), str(file))
        return dict

    def _make_regex(self):
        """Build a regular expression object based on the keys of the
        current dictionary."""
        self.re = "(%s)" % "|".join(map(re.escape, self.keys()))
        self.regex = re.compile(self.re)

    def __call__(self, mo):
        """This handler will be invoked for each regex match."""
        # Count substitutions
        self.count += 1
        # Look-up string
        return self[mo.string[mo.start():mo.end()]]

    def substitute(self, text):
        """Translate text, returns the modified text."""
        # Reset substitution counter
        self.count = 0
        # Process text
        #return self._make_regex().sub(self, text)
        return self.regex.sub(self, text)

    def rmBracketDuplicate(self, text):
        """Removes the bracketed text in occurrences of 'text-x (text-x)'."""
        regex = re.compile(r'(\w+)\s*(\(\1\))')
        return regex.sub(r'\1', text)

    def substituteMultiple(self, text):
        """Similar to the substitute() method except that this method loops
        round the same text multiple times until no more substitutions can be
        made or until it has looped 10 times. This is to pre-empt cases of
        recursive abbreviations.
        """
        count = 1   # to get into the loop
        run = 0     # counter for the number of runs through the text
        while count > 0 and run < 10:
            count = 0
            text = self.rmBracketDuplicate(self.substitute(text))
            count = count + self.count
            run = run + 1
            print "Pass %d: Changed %d thing(s)" % (run, count)
        return text


Normally I will use the following to instantiate my module:
replace = Replacer('', 'rule.mdf')
rule.mdf is in the format of 'string to be replaced=string to replace with\n'

Then I use replace.substituteMultiple('my text') to carry out multiple
replacements.

It all works well for rule counts up to 800+, but when my replacement
rules swell up to 1800+, it gives me a runtime error that says
"internal error in regular expression engine", traceable to return
self.regex.sub(self, text) in the substitute() method.

Any ideas or workarounds?
Thanks in advance.
Cheers,
Maurice
--
http://mail.python.org/mailman/listinfo/python-list


Re: RE Engine error with sub()

2005-04-15 Thread André Søreng
The "internal error in regular expression engine" also occurs
in Python 2.4.0 when creating a regular expression containing
more than a certain number of or's (|).
Dennis Benzinger wrote:
Maurice LING schrieb:
Hi,
I have the following code:
from __future__ import nested_scopes
  [...]
Are you still using Python 2.1?
In every later version you don't need the
from __future__ import nested_scopes line.
So, if you are using Python 2.1 I strongly recommend
upgrading to Python 2.4.1.
[...]
It all works well for rule counts up to 800+, but when my replacement
rules swell up to 1800+, it gives me a runtime error that says
"internal error in regular expression engine", traceable to return
self.regex.sub(self, text) in the substitute() method.
[...]

I didn't read your code, but this sounds like you have a problem with 
the regular expression engine being recursive in Python versions < 2.4.
Try again using Python 2.4 or later (i.e. Python 2.4.1). The new regular 
expression engine is not recursive anymore.

Bye,
Dennis
--
http://mail.python.org/mailman/listinfo/python-list


Overlapping matches in Regular Expressions

2005-04-12 Thread André Søreng
With the re/sre module included with Python 2.4:
pattern = "(?P<id1>avi)|(?P<id2>avi|mp3)"
string2match = "some string with avi in it"
matches = list(re.finditer(pattern, string2match))
...
matches[0].groupdict()
{'id2': None, 'id1': 'avi'}
Which was expected since overlapping matches are ignored.
But I would also like to know if other groups had a match.
What modifications to the re/sre module are needed to allow
overlapping matches?
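One workaround sketched here for comparison (it does not modify re/sre at
all; it simply scans each named sub-pattern on its own, so every group
reports its own matches even where they overlap -- a rough sketch only):

import re

patterns = {"id1": "avi", "id2": "avi|mp3"}   # the two sub-patterns above
string2match = "some string with avi in it"

hits = {}
for name, pat in patterns.items():
    # Each pattern is scanned independently, so matches may overlap freely.
    hits[name] = [m.span() for m in re.finditer(pat, string2match)]
print(hits)
# e.g. {'id1': [(17, 20)], 'id2': [(17, 20)]}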
--
http://mail.python.org/mailman/listinfo/python-list


Re: monitoring folder in python

2005-04-05 Thread André Søreng
Raghul wrote:
Is it possible to monitor a folder in Python? My question is: if I
put any file into that particular folder, my script should monitor the
folder and read the file name. If so, what function can I use?
Thanx in advance
If you do not want to poll (check for changes yourself regularly),
here are some pointers:
In Windows:
With the win32 extensions, it's quite simple:
http://tgolden.sc.sabren.com/python/win32_how_do_i/watch_directory_for_changes.html
In Linux:
http://www.edoceo.com/creo/inotify/
http://www.student.lu.se/~nbi98oli/dnotify.html
http://www.lambda-computing.com/projects/dnotify/
I don't know of any Python extensions for the above, so you might
have to write your own if you want to use them from Python. Also,
you probably need to reconfigure and compile a new kernel. I have
tested inotify myself and it seems to work; you also get the filename
with the event. The only problem is that it doesn't support monitoring a
directory recursively (subdirectories) the way ReadDirectoryChangesW does on Windows.
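If plain polling turns out to be acceptable after all, something along these
lines is usually enough (a rough sketch only; the folder path, interval and
error handling here are placeholders):

import os
import time

def watch(folder, interval=1.0):
    # Yield the names of files that appear in 'folder', found by polling.
    seen = set(os.listdir(folder))
    while True:
        time.sleep(interval)
        current = set(os.listdir(folder))
        for name in sorted(current - seen):
            yield name
        seen = current

# Example (hypothetical path):
# for name in watch('/tmp/incoming'):
#     print(name)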
--
http://mail.python.org/mailman/listinfo/python-list


Re: Regular Expressions: large amount of or's

2005-03-02 Thread André Søreng
Bill Mill wrote:
On Tue, 01 Mar 2005 22:04:15 +0100, André Søreng [EMAIL PROTECTED] wrote:
Kent Johnson wrote:
André Søreng wrote:

Hi!
Given a string, I want to find all occurrences of
certain predefined words in that string. Problem is, the list of
words that should be detected can be in the order of thousands.
With the re module, this can be solved something like this:
import re
r = re.compile("word1|word2|word3|...|wordN")
r.findall(some_string)
Unfortunately, when there are more than about 10 000 words in
the regexp, I get a regular expression runtime error when
trying to execute the findall function (compile works fine, but is slow).
I don't know if using the re module is the right solution here; any
suggestions on alternative solutions or data structures which could
be used to solve the problem?

If you can split some_string into individual words, you could look them
up in a set of known words:
known_words = set("word1 word2 word3 ... wordN".split())
found_words = [ word for word in some_string.split() if word in
known_words ]
Kent

André
That is not exactly what I want. It should discover if some of
the predefined words appear as substrings, not only as equal
words. For instance, after matching 'word2sgjoisejfisaword1yguyg', 'word2'
and 'word1' should be detected.

Show some initiative, man!

>>> known_words = set(['word1', 'word2'])
>>> found_words = [word for word in known_words if word in 'word2sgjoisejfisaword1yguyg']
>>> found_words
['word1', 'word2']
Peace
Bill Mill
bill.mill at gmail.com
Yes, but I was looking for a solution which would scale. Searching
through the same string once per keyword (thousands of times) does not
seem like a suitable solution.

André
--
http://mail.python.org/mailman/listinfo/python-list


Re: Regular Expressions: large amount of or's

2005-03-02 Thread André Søreng
Daniel Yoo wrote:
Kent Johnson [EMAIL PROTECTED] wrote:
: Given a string, I want to find all occurrences of
: certain predefined words in that string. Problem is, the list of
: words that should be detected can be in the order of thousands.
: 
: With the re module, this can be solved something like this:
: 
: import re
: 
: r = re.compile("word1|word2|word3|...|wordN")
: r.findall(some_string)

The internal data structure that encodes that set of keywords is
probably humongous.  An alternative approach to this problem is to
tokenize your string into words, and then check to see if each word is
in a defined list of keywords.  This works if your keywords are
single words:
###
keywords = set(["word1", "word2", ...])
matchingWords = set(re.findall(r'\w+', some_string)).intersection(keywords)
###
Would this approach work for you?

Otherwise, you may want to look at a specialized data structure for
doing multiple keyword matching; I had an older module that wrapped
around a suffix tree:
http://hkn.eecs.berkeley.edu/~dyoo/python/suffix_trees/
It looks like other folks, thankfully, have written other
implementations of suffix trees:
http://cs.haifa.ac.il/~shlomo/suffix_tree/
Another approach is something called the Aho-Corasick algorithm:
http://portal.acm.org/citation.cfm?doid=360825.360855
though I haven't been able to find a nice Python module for this yet.
Best of wishes to you!
Thanks, it seems like the Aho-Corasick algorithm is along the lines of
what I was looking for, but I have not read the article completely yet.
Also:
http://alexandria.tue.nl/extra1/wskrap/publichtml/200407.pdf
provided several alternative algorithms.
André
--
http://mail.python.org/mailman/listinfo/python-list


Re: Regular Expressions: large amount of or's

2005-03-02 Thread André Søreng
Ola Natvig wrote:
André Søreng wrote:

Yes, but I was looking for a solution which would scale. Searching 
through the same string once per keyword (thousands of times) does not
seem like a suitable solution.

André

Just out of curiosity, what would a regexp do? Perhaps the way regexps
are executed holds a clue as to how you could do this.

ola
I think this article provides me with what I was looking for:
http://alexandria.tue.nl/extra1/wskrap/publichtml/200407.pdf
Enough info there to keep me going for a while.
--
http://mail.python.org/mailman/listinfo/python-list


Regular Expressions: large amount of or's

2005-03-01 Thread André Søreng
Hi!
Given a string, I want to find all occurrences of
certain predefined words in that string. Problem is, the list of
words that should be detected can be in the order of thousands.
With the re module, this can be solved something like this:
import re
r = re.compile("word1|word2|word3|...|wordN")
r.findall(some_string)
Unfortunately, when there are more than about 10 000 words in
the regexp, I get a regular expression runtime error when
trying to execute the findall function (compile works fine, but is slow).
I don't know if using the re module is the right solution here; any
suggestions on alternative solutions or data structures which could
be used to solve the problem?
André
--
http://mail.python.org/mailman/listinfo/python-list


Re: Regular Expressions: large amount of or's

2005-03-01 Thread André Søreng
Kent Johnson wrote:
André Søreng wrote:
Hi!
Given a string, I want to find all occurrences of
certain predefined words in that string. Problem is, the list of
words that should be detected can be in the order of thousands.
With the re module, this can be solved something like this:
import re
r = re.compile("word1|word2|word3|...|wordN")
r.findall(some_string)
Unfortunately, when there are more than about 10 000 words in
the regexp, I get a regular expression runtime error when
trying to execute the findall function (compile works fine, but is slow).
I don't know if using the re module is the right solution here; any
suggestions on alternative solutions or data structures which could
be used to solve the problem?

If you can split some_string into individual words, you could look them 
up in a set of known words:

known_words = set("word1 word2 word3 ... wordN".split())
found_words = [ word for word in some_string.split() if word in 
known_words ]

Kent
André
That is not exactly what I want. It should discover if some of
the predefined words appear as substrings, not only as equal
words. For instance, after matching 'word2sgjoisejfisaword1yguyg', 'word2'
and 'word1' should be detected.
--
http://mail.python.org/mailman/listinfo/python-list