Re: a little about regex
On Wednesday 18 October 2006 23:05, Ant wrote:

|    allow = re.compile(r'.*(?<!\.com)\.my(|$)')  # negative lookbehind
|    if allow.search(adr):
|        return True
|    return False

I'd point out that:

    allow = re.search(r'.*(?<!\.com)\.my(|$)', adr)

does the same as yours, since the module-level re.search() call compiles the pattern itself, whereas here it is compiled separately.

| Though having the explicit allow and deny expressions may make what's
| going on clearer than the fairly esoteric negative lookbehind.

This makes me think that your point is truly correct. My case is meant as "deny all except those that are specified", and it may also go vice versa. Therefore I should refine the way the filtering acts. In fact the (temporarily) ignored score is the basis of the method to be applied.

Obviously we are mainly talking about email addresses here, so my intention follows the mailfilter concept: the program may block an entire domain while some addresses are allowed, and everything from .my is allowed except what comes from .com.my (mostly annoying emails :P ).

All in all, I've aimed at flexible programming, since I'm thinking it may be published some time for the benefit of multiplatform users, as Python itself is multiplatform. In that perspective I'm a bit curious to know whether there are sites on the web where small programs are welcome and people like me can express all of their ignorance about how to use Python. For such ignorance I could compete for the Nobel Prize :)

Also, the newsgroup doesn't contemplate splitting into beginners and high-level programmers (HLP). Of course the HLP are welcome to discuss on such a NG :).

F
--
http://mail.python.org/mailman/listinfo/python-list
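As a small sketch of the point about compilation: a module-level call such as re.search() compiles the pattern internally (the re module keeps a cache of recently compiled patterns), so both spellings give the same result. The pattern is a simplified lookbehind form and the address is made up for illustration.

```python
import re

PATTERN = r'(?<!\.com)\.my$'      # allow .my unless it is preceded by .com
address = 'someone@example.my'    # hypothetical address

# Explicit compilation: handy when the same pattern is reused many times.
compiled = re.compile(PATTERN)
explicit = compiled.search(address)

# Module-level call: compiles (or fetches from the cache) behind the scenes.
implicit = re.search(PATTERN, address)

print(explicit.group(), implicit.group())
```

Precompiling mostly pays off for readability and for patterns used in a loop; the matching behaviour is the same either way.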
Re: [OT] a little about regex
On Friday 20 October 2006 02:40, Ron Adam wrote:

| I see, is this a cleanup script to remove the least wanted items?

Yes. It will probably remain in this mode for a while. I'm not prepared to bring out a new algorithm.

| Or is it a bit of both? Why the score?

As explained in another post, there should be a way to define a deny/allow with some particular exception (i.e. deny all .com but not [EMAIL PROTECTED]).

| I would think the allow(keep?) filters would always have priority over
| deny filters.

It's a matter of how finely the filters can discriminate. The previous post brought this point up. I'm thinking of allowing all .uk (let us say) but not info.uk (all references are meant purely as examples). A deny regex written for .info.uk surely doesn't match a plain .uk address.

| I think keeping the allow filter separate from the deny filter is good.

Agreed. I was simply supposing that a regex can do negative matching.

| You might be able to merge the header lines and run the filters across
| the whole header at once instead of each line.

I like this idea; I still need a bit of thinking to code it. I need to remember what the right separator between fields will be, otherwise it may cause problems with different charsets.

| | Actually I've problem on issuing the command to imap server to flag
| | Deleted the message which count as spam. I only know the message
|
| I can't help you here. Sorry.

Found it :), by trial and error.

| | BTW who's Fred?
|
| news://news.cox.net:119/[EMAIL PROTECTED]

I can't reach newsgroups other than those my ISP gives me. I'm curious and I'll give it a try.

F
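One way to express "allow a domain, with particular exceptions" is an ordered rule list in which the first matching rule decides. This is only a sketch under that assumption, not the program discussed in the thread; RULES, decide() and the addresses are hypothetical names for illustration.

```python
import re

# Hypothetical rule list: the first pattern that matches decides the outcome,
# so a narrow exception must be listed before the broader rule it refines.
RULES = [
    ('deny',  re.compile(r'[@.]info\.uk$')),   # exception, listed first
    ('allow', re.compile(r'\.uk$')),           # broader rule
]

def decide(address, default='deny'):
    """Return the action of the first matching rule, else the default."""
    for action, pattern in RULES:
        if pattern.search(address):
            return action
    return default

print(decide('someone@mail.info.uk'))   # deny
print(decide('someone@example.uk'))     # allow
print(decide('someone@example.org'))    # deny (default)
```

With this layout neither "allow" nor "deny" has a fixed priority; the order of the list carries that decision.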
Re: [OT] a little about regex
On Wednesday 18 October 2006 15:32, Ron Adam wrote:

|Instead of using two separate if's, Use an if - elif and be sure to test

Thank you, Ron, for the input :) I'll also examine it in this mode. Meanwhile I faced the total disaster :) of deleting all my emails from all servers ;( (I had saved them locally, luckily :) )

|It's not exactly clear on what output you are seeking. If you want 0 for
| not filtered and 1 for filtered, then look to Freds Hint.

Actually the return code is used like this:

    if _filter(hdrs, allow, deny):
        # allow and deny are objects prepared by re.compile(pattern)
        _del(Num_of_Email)

In short, a true result means unwanted, to be deleted. And now the function is:

    def _filter(msg, al, dn):
        """Try to classify a list of lines against a set of compiled patterns."""
        a = 0
        for hdrline in msg:
            # deny has the first priority and stops any further searching.
            # Score: 10 times the number of lines.
            if dn.search(hdrline):
                return len(msg) * 10
            if al.search(hdrline):
                return 0
            a += 1
        return a  # returns a score of rejected matches, or zero if none

The patterns are taken from a configuration file. Those named Axx = 'pattern' are the allow patterns; the others, Dxx, block under different criteria. Here they are:

    [Filters]
    A01 = ^From:.*\.it\b
    A02 = ^(To|Cc):.*frioio@
    A03 = ^(To|Cc):.*the_sting@
    A04 = ^(To|Cc):.*calm_me_or_die@
    A05 = ^(To|Cc):.*further@
    A06 = ^From:.*\.za\b
    D01 = ^From:.*\.co\.au\b
    D02 = ^Subject:.*\*\*\*SPAM\*\*\*

*A bit of fake in order to get some privacy* :)

I'm using configparser to fetch their values, and they are joined by:

    allow = re.compile('|'.join([k[1] for k in ifil if k[0].startswith('a')]))
    deny = re.compile('|'.join([k[1] for k in ifil if k[0].startswith('d')]))

ifil is the input filter's section. At this point I suppose I have done the right thing; I'm just a bit curious to know whether there's a better way that needs a single regex compilation for all of the options.

Basically the program will work, in terms of filtering as per config and synchronizing with the local $HOME/Mail/trash (configurable path). This last option will remove emails on the server for those that are in the local trash.

Todo:
 - back up local and remote emails for those filtered as good
 - multithreading to connect to all servers in parallel
 - SSL for POP3 and IMAP4 as well

Actually I have a problem issuing the command to the IMAP server to flag as Deleted the messages which count as spam. I only know the message details, but what the correct command is remains a bit obscure to me.

BTW who's Fred?

F
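For the IMAP question, a minimal sketch with the standard-library imaplib: a message is marked by adding the \Deleted flag with STORE, and EXPUNGE then removes the flagged messages from the selected mailbox. The helper name flag_deleted, the host and the credentials are placeholders, and error handling is left out.

```python
import imaplib

def flag_deleted(conn, msg_num, expunge=False):
    """Mark one message in the currently selected mailbox as \\Deleted.

    conn    -- a logged-in imaplib.IMAP4 / IMAP4_SSL with a mailbox selected
    msg_num -- message sequence number as a string, e.g. '3'
    """
    status, _ = conn.store(msg_num, '+FLAGS', '\\Deleted')
    if status == 'OK' and expunge:
        conn.expunge()      # permanently remove the flagged messages
    return status

# Usage sketch (placeholder host and credentials, not executed here):
# conn = imaplib.IMAP4_SSL('imap.example.org')
# conn.login('user', 'password')
# conn.select('INBOX')
# flag_deleted(conn, '3', expunge=True)
# conn.logout()
```

Keeping the expunge step optional allows a dry run in which messages are only flagged, which may be a safer default after an accidental mass deletion.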
Re: a little about regex
On Wednesday 18 October 2006 16:43, Rob Wolfe wrote:

|def filter(adr):    # note that filter is a builtin function also
|    import re

I didn't know it, but my function _is_ starting with an underscore (a bit of localization :) )

|    allow = re.compile(r'.*(?<!\.com)\.my(|$)')  # negative lookbehind
|    deny = re.compile(r'.*\.com\.my(|$)')

Great, it works perfectly. I found my errors: I didn't use r ahead of the patterns. I was close to the 'allow' pattern, but it didn't give a positive result and KregexEditor reported it as wrong, especially because of the '' inside the stream. I think that is not normal regex input and is only valid in Python. Am I right?

More details are in the previous thread.

F
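On the r prefix: a short sketch of why it matters for patterns that use \b, as the filter patterns in this thread do. Whether this was the actual cause of the earlier failures is not stated in the thread; the sample text is made up.

```python
import re

# In a normal string literal '\b' is the backspace character; in a raw
# string it stays backslash + 'b', which re reads as a word boundary.
plain = '\bmy\b'
raw = r'\bmy\b'

print(len(plain), len(raw))              # 4 6
print(re.search(plain, 'in my list'))    # None
print(re.search(raw, 'in my list'))      # a match object for 'my'
```

Sequences such as '\.' happen to survive in a normal string because Python does not treat them as escapes, but writing every pattern as a raw string avoids having to remember which ones do.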
Re: a little about regex
Fulvio wrote:

> Great, it works perfectly. I found my errors: I didn't use r ahead of
> the patterns. I was close to the 'allow' pattern, but it didn't give a
> positive result and KregexEditor reported it as wrong, especially
> because of the '' inside the stream. I think that is not normal regex
> input and is only valid in Python. Am I right?

The (?...) sequence is an extension notation. It is not specific to Python: it comes from Perl-style regular expressions, so a tool that implements only a more basic regex syntax may reject it.

Regards,
Rob
Re: [OT] a little about regex
Fulvio wrote:

> On Wednesday 18 October 2006 15:32, Ron Adam wrote:
>
> |Instead of using two separate if's, Use an if - elif and be sure to test
>
> Thank you, Ron, for the input :) I'll also examine it in this mode.
> Meanwhile I faced the total disaster :) of deleting all my emails from
> all servers ;( (I had saved them locally, luckily :) )
>
> |It's not exactly clear on what output you are seeking. If you want 0 for
> | not filtered and 1 for filtered, then look to Freds Hint.
>
> Actually the return code is used like this:
>
>     if _filter(hdrs, allow, deny):
>         # allow and deny are objects prepared by re.compile(pattern)
>         _del(Num_of_Email)
>
> In short, a true result means unwanted, to be deleted. And now the
> function is:
>
>     def _filter(msg, al, dn):
>         """Try to classify a list of lines against a set of compiled patterns."""
>         a = 0
>         for hdrline in msg:
>             # deny has the first priority and stops any further searching.
>             # Score: 10 times the number of lines.
>             if dn.search(hdrline):
>                 return len(msg) * 10
>             if al.search(hdrline):
>                 return 0
>             a += 1
>         return a  # returns a score of rejected matches, or zero if none

I see, is this a cleanup script to remove the least wanted items?

The allow/deny caused me to think it was more along the lines of a white/black list, whereas keep/discard would be terms more suitable to cleaning out items already allowed.

Or is it a bit of both? Why the score?

Just curious, I don't think I have any suggestions that will help in any specific ways.

I would think the allow(keep?) filters would always have priority over deny filters.

> The patterns are taken from a configuration file. Those named
> Axx = 'pattern' are the allow patterns; the others, Dxx, block under
> different criteria. Here they are:
>
>     [Filters]
>     A01 = ^From:.*\.it\b
>     A02 = ^(To|Cc):.*frioio@
>     A03 = ^(To|Cc):.*the_sting@
>     A04 = ^(To|Cc):.*calm_me_or_die@
>     A05 = ^(To|Cc):.*further@
>     A06 = ^From:.*\.za\b
>     D01 = ^From:.*\.co\.au\b
>     D02 = ^Subject:.*\*\*\*SPAM\*\*\*
>
> *A bit of fake in order to get some privacy* :)
>
> I'm using configparser to fetch their values, and they are joined by:
>
>     allow = re.compile('|'.join([k[1] for k in ifil if k[0].startswith('a')]))
>     deny = re.compile('|'.join([k[1] for k in ifil if k[0].startswith('d')]))
>
> ifil is the input filter's section. At this point I suppose I have done
> the right thing; I'm just a bit curious to know whether there's a better
> way that needs a single regex compilation for all of the options.

I think keeping the allow filter separate from the deny filter is good.

You might be able to merge the header lines and run the filters across the whole header at once instead of each line.

> Basically the program will work, in terms of filtering as per config and
> synchronizing with the local $HOME/Mail/trash (configurable path). This
> last option will remove emails on the server for those that are in the
> local trash.
>
> Todo:
>  - back up local and remote emails for those filtered as good
>  - multithreading to connect to all servers in parallel
>  - SSL for POP3 and IMAP4 as well
>
> Actually I have a problem issuing the command to the IMAP server to flag
> as Deleted the messages which count as spam. I only know the message
> details, but what the correct command is remains a bit obscure to me.

I can't help you here. Sorry.

> BTW who's Fred?
>
> F

Fredrik, see...

news://news.cox.net:119/[EMAIL PROTECTED]
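The suggestion to merge the header lines can be sketched as follows. The assumptions are labelled in the code: the config is read from a string rather than a file, only four of the patterns are used, the header lines are joined with '\n', and the header block is invented. The helper name build() is hypothetical.

```python
import configparser
import re

# Assumption: a shortened copy of the [Filters] section, read from a string.
CONFIG = r"""
[Filters]
A01 = ^From:.*\.it\b
A02 = ^(To|Cc):.*frioio@
D01 = ^From:.*\.co\.au\b
D02 = ^Subject:.*\*\*\*SPAM\*\*\*
"""

parser = configparser.ConfigParser(interpolation=None)
parser.read_string(CONFIG)
items = parser.items('Filters')     # option names come back lower-cased

def build(prefix):
    """Join every pattern whose option name starts with prefix into one regex."""
    parts = ['(?:%s)' % value for name, value in items if name.startswith(prefix)]
    # MULTILINE lets ^ match at the start of every header line.
    return re.compile('|'.join(parts), re.MULTILINE)

allow, deny = build('a'), build('d')

# Hypothetical header block, merged into one string with '\n' as separator.
headers = '\n'.join([
    'From: someone@example.co.au',
    'To: frioio@example.org',
    'Subject: hello',
])

print(bool(deny.search(headers)), bool(allow.search(headers)))
```

Because '.' does not match a newline by default, each '.*' stays within its own header line even though the whole block is searched at once.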
[OT] a little about regex
Hello,

I'm trying to get an assertion working which filters addresses from some domain, but not if it's prefixed by '.com'. Even when putting the result in a negated test I can't get the wanted result.

The thought in program terms:

    >>> def filter(adr):
    ...     import re
    ...     allow = re.compile('.*\.my(|$)')
    ...     deny = re.compile('.*\.com\.my(|$)')
    ...     cnt = 0
    ...     if deny.search(adr): cnt += 1
    ...     if allow.search(adr): cnt += 1
    ...     return cnt
    ...
    >>> filter('[EMAIL PROTECTED]')
    2
    >>> filter('[EMAIL PROTECTED]')
    1

It seems that I'm missing some better regex implementation to prevent both filters from taking action. I'm thinking of the lookbehind (negative or positive) option, but I haven't managed to make it work yet. I think either the allow pattern should require no '.com' before '.my', or the deny pattern should require _only_ '.com' before '.my'. Sorry, I can't get the correct syntax to do it.

Suggestions are welcome.

F
Re: [OT] a little about regex
Fulvio wrote:

> ...     if deny.search(adr): cnt += 1
> ...     if allow.search(adr): cnt += 1

hint: under what circumstances is cnt decremented in the above snippet?

/F
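To make the hint concrete, this sketch reruns the counting logic on stand-in addresses (the real ones are redacted in the archive, so these are hypothetical). The counter only ever goes up, so an address that matches both patterns scores 2.

```python
import re

allow = re.compile(r'.*\.my(|$)')
deny = re.compile(r'.*\.com\.my(|$)')

def count_hits(adr):
    """Same counting logic as the posted filter(): cnt is never decremented."""
    cnt = 0
    if deny.search(adr):
        cnt += 1
    if allow.search(adr):
        cnt += 1
    return cnt

# Hypothetical addresses standing in for the redacted ones.
print(count_hits('user@example.com.my'))   # 2: both patterns match
print(count_hits('user@example.my'))       # 1: only allow matches
print(count_hits('user@example.org'))      # 0: neither matches
```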
Re: [OT] a little about regex
Fulvio wrote:

> Hello,
>
> I'm trying to get an assertion working which filters addresses from some
> domain, but not if it's prefixed by '.com'. Even when putting the result
> in a negated test I can't get the wanted result.
>
> The thought in program terms:
>
>     >>> def filter(adr):
>     ...     import re
>     ...     allow = re.compile('.*\.my(|$)')
>     ...     deny = re.compile('.*\.com\.my(|$)')
>     ...     cnt = 0
>     ...     if deny.search(adr): cnt += 1
>     ...     if allow.search(adr): cnt += 1
>     ...     return cnt
>     ...
>     >>> filter('[EMAIL PROTECTED]')
>     2
>     >>> filter('[EMAIL PROTECTED]')
>     1
>
> It seems that I'm missing some better regex implementation to prevent
> both filters from taking action. I'm thinking of the lookbehind (negative
> or positive) option, but I haven't managed to make it work yet. I think
> either the allow pattern should require no '.com' before '.my', or the
> deny pattern should require _only_ '.com' before '.my'. Sorry, I can't
> get the correct syntax to do it.
>
> Suggestions are welcome.
>
> F

Instead of using two separate if's, use an if - elif and be sure to test the narrower filter first. (You have them in the correct order.) That way it will skip the more general filter and not increment cnt twice.

It's not exactly clear what output you are seeking. If you want 0 for not filtered and 1 for filtered, then look to Fred's hint.

Or are you writing a test at the moment, where a 1 means it only passed one filter, so you know your filters are working as designed?

Another approach would be to assign values for filtered, accepted, and undefined and set those accordingly instead of incrementing and decrementing a counter.

Cheers,
Ron
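The two suggestions above (if/elif with the narrower test first, and named outcomes instead of a counter) could be combined roughly like this. The constant names, the function name classify() and the addresses are illustrative assumptions, not part of the original program.

```python
import re

# Named outcomes instead of a counter (names chosen for this sketch).
DENIED, ALLOWED, UNDEFINED = 'denied', 'allowed', 'undefined'

deny = re.compile(r'\.com\.my$')    # narrower pattern
allow = re.compile(r'\.my$')        # more general pattern

def classify(adr):
    """Test the narrower filter first; the general one runs only if it fails."""
    if deny.search(adr):
        return DENIED
    elif allow.search(adr):
        return ALLOWED
    return UNDEFINED

print(classify('user@example.com.my'))   # denied
print(classify('user@example.my'))       # allowed
print(classify('user@example.org'))      # undefined
```

Because the branches are exclusive, no lookaround is needed: the general pattern never gets a chance to fire on an address the narrower one already claimed.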
Re: a little about regex
Fulvio wrote:

> I'm trying to get an assertion working which filters addresses from some
> domain, but not if it's prefixed by '.com'. Even when putting the result
> in a negated test I can't get the wanted result.

[...]

> It seems that I'm missing some better regex implementation to prevent
> both filters from taking action. I'm thinking of the lookbehind (negative
> or positive) option, but I haven't managed to make it work yet. I think
> either the allow pattern should require no '.com' before '.my', or the
> deny pattern should require _only_ '.com' before '.my'. Sorry, I can't
> get the correct syntax to do it.
>
> Suggestions are welcome.

Try this:

    def filter(adr):    # note that filter is a builtin function also
        import re

        allow = re.compile(r'.*(?<!\.com)\.my(|$)')  # negative lookbehind
        deny = re.compile(r'.*\.com\.my(|$)')
        cnt = 0
        if deny.search(adr): cnt += 1
        if allow.search(adr): cnt += 1
        return cnt

HTH,
Rob
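Since the two look-around forms are easy to confuse, here is a quick check of how they differ for this case. (?!...) is a negative lookahead and inspects the text that follows the current position; (?<!...) is a negative lookbehind and inspects the text that precedes it. The addresses are made up.

```python
import re

# Lookahead placed just before '\.my': the text that follows is always
# '.my', never '.com', so the assertion can never reject anything here.
lookahead = re.compile(r'.*(?!\.com)\.my(|$)')

# Lookbehind: rejects '.my' when the four characters before it are '.com'.
lookbehind = re.compile(r'.*(?<!\.com)\.my(|$)')

for adr in ('user@example.my', 'user@example.com.my'):   # hypothetical
    print(adr, bool(lookahead.search(adr)), bool(lookbehind.search(adr)))
```

Python's re module requires a lookbehind to have a fixed width, which '\.com' satisfies.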
Re: a little about regex
Rob Wolfe wrote:

> ...
> def filter(adr):    # note that filter is a builtin function also
>     import re
>
>     allow = re.compile(r'.*(?<!\.com)\.my(|$)')  # negative lookbehind
>     deny = re.compile(r'.*\.com\.my(|$)')
>     cnt = 0
>     if deny.search(adr): cnt += 1
>     if allow.search(adr): cnt += 1
>     return cnt

Which makes the 'deny' code here redundant, so in this case the function could be reduced to:

    import re

    def allow(adr):
        allow = re.compile(r'.*(?<!\.com)\.my(|$)')  # negative lookbehind
        if allow.search(adr):
            return True
        return False

Though having the explicit allow and deny expressions may make what's going on clearer than the fairly esoteric negative lookbehind.
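In the same spirit of keeping the intent explicit, the check can also be written without any regex. This is an alternative sketch rather than anything proposed in the thread; the function name allowed() is hypothetical, and lower-casing the address is an added assumption (domain names are case-insensitive).

```python
def allowed(adr):
    """True for addresses under .my, except those under .com.my."""
    adr = adr.lower()
    return adr.endswith('.my') and not adr.endswith('.com.my')

print(allowed('user@example.my'))        # True
print(allowed('user@example.com.my'))    # False
print(allowed('user@example.org'))       # False
```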