subject:"a little about regex"

Re: a little about regex

2006-10-20 Thread Fulvio

On Wednesday 18 October 2006 23:05, Ant wrote:
     allow = re.compile(r'.*(?!\.com)\.my(|$)')  # negative lookbehind
     if allow.search(adr):
         return True
     return False

I'd point out that:
 allow = re.search(r'.*(?!\.com)\.my(|$)', adr)

will do the same as yours, since re.search() compiles the pattern itself (and
caches it), just as your code does separately with re.compile().
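Fulvio's point can be checked with a small sketch. re.search() compiles its pattern and keeps it in the re module's internal cache, so the one-liner and a pattern compiled once up front give the same answers. Compiling up front mainly skips the cache lookup and gives the pattern a name. The `\.my\b` pattern and the addresses are illustrative only; they are not the lookaround pattern from the thread.

```python
import re

# Compile once (e.g. at module level) and reuse the pattern object.
MY_DOMAIN = re.compile(r'\.my\b')

def allowed_compiled(adr):
    return MY_DOMAIN.search(adr) is not None

def allowed_oneliner(adr):
    # re.search() compiles the pattern (and caches the result) on each call.
    return re.search(r'\.my\b', adr) is not None

for adr in ['user@example.my', 'user@example.com', '<user@example.my>']:
    print(adr, allowed_compiled(adr), allowed_oneliner(adr))
```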

 Though having the explicit allow and deny expressions may make what's
 going on clearer than the fairly esoteric negative lookbehind.

This makes me think your point is correct.
In my case the option is meant as "deny all except those specified", though
it could also work the other way round. Therefore I should refine the way the
filtering acts. In fact the score (ignored for now) is the basis of the method
to be applied.
Here we are mainly talking about email addresses, so my intention follows the
mailfilter concept: the program may block an entire domain but allow some of
its addresses, and allow everything from .my except .com.my (mostly annoying
emails :P )
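The mailfilter-style policy described above can be sketched as an ordered rule list. The first matching rule decides, and anything unmatched falls through to a default of deny. The rule list, the addresses and the `classify` name are hypothetical. The patterns assume bare addresses (hence the `$` anchors) rather than whole header lines.

```python
import re

# Hypothetical rules, checked in order; the first match decides,
# so each exception must come before the broader rule it overrides.
RULES = [
    ('allow', re.compile(r'^friend@example\.com$')),  # exception inside a blocked domain
    ('deny',  re.compile(r'@example\.com$')),         # block the whole domain
    ('deny',  re.compile(r'\.com\.my$')),             # annoying .com.my senders
    ('allow', re.compile(r'\.my$')),                  # everything else under .my
]
DEFAULT = 'deny'  # "deny all except those specified"

def classify(adr):
    for action, pattern in RULES:
        if pattern.search(adr):
            return action
    return DEFAULT

for adr in ['friend@example.com', 'spam@example.com',
            'x@shop.com.my', 'x@uni.edu.my', 'x@example.org']:
    print(adr, classify(adr))
```

Reversing the rule order (for example putting the `.my` allow before the `.com.my` deny) would let the broad rule win, which is the priority question discussed later in the thread.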

All in all, I've aimed for flexible code, since I think it may be published
some day for the benefit of multiplatform users, as Python itself is
multiplatform.
With that in mind, I'm curious whether there are sites on the web where small
programs are welcome and people like me can show all their ignorance about
how to use Python. For such ignorance I might compete for the Nobel Prize :)

Also, the newsgroup isn't split into beginners and high-level programmers
(HLP). Of course, HLP are welcome in such a newsgroup as well :).

F
-- 
http://mail.python.org/mailman/listinfo/python-list

Re: [OT] a little about regex

2006-10-20 Thread Fulvio

On Friday 20 October 2006 02:40, Ron Adam wrote:
 I see, is this a cleanup script to remove the least wanted items?

Yes. It will probably stay that way for a while; I'm not prepared to come up
with a new algorithm.

 Or is it a bit of both?  Why the score?

As explained in another post, there should be a way to define a deny/allow
rule with some particular exceptions (i.e. deny all .com but not
[EMAIL PROTECTED]).

 I would think the allow(keep?) filters would always have priority over deny
 filters.

It depends on how discriminating the rules have to be; the previous post
raised this point. Say I want to allow all .uk (for example) but not info.uk
(all references are purely examples). A deny regex for .info.uk certainly
won't match addresses that are only .uk, so in that case the narrower deny
must win over the broader allow.


 I think keeping the allow filter separate from the deny filter is good.
Agreed. I was simply assuming the regex could do negative matching.

 You might be able to merge the header lines and run the filters across the
 whole header at once instead of each line.

I like this idea; I still need to think a bit about how to code it. I need to
work out the right separator between fields, otherwise it may cause problems
with different charsets.
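Ron's suggestion of filtering the whole header at once can be sketched like this. Join the lines with '\n' and compile with re.MULTILINE, so that ^ still matches at the start of each header line. Since . does not match a newline by default, a .* cannot run from one field into the next. The header lines are made up and are assumed to be already decoded to str. That sidesteps the charset question rather than answering it.

```python
import re

# Made-up header lines; in the real script they would come from the message.
hdrs = [
    'From: Someone <someone@example.co.au>',
    'To: me@example.it',
    'Subject: hello',
]

# With re.MULTILINE, ^ matches after every '\n', so patterns written for
# single lines (like the [Filters] ones) keep working on the joined block.
deny = re.compile(r'^From:.*\.co\.au\b|^Subject:.*\*\*\*SPAM\*\*\*', re.MULTILINE)

header_block = '\n'.join(hdrs)           # '\n' as the field separator
print(bool(deny.search(header_block)))   # True: the From: line matches
```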

  Actually I have a problem issuing the command to the IMAP server to flag as
  Deleted the messages that count as spam. I only know the message

 I can't help you here.  Sorry.

Found it :), by trial and error.

  BTW, who's Fred?
    
 news://news.cox.net:119/[EMAIL PROTECTED]

I can't reach newsgroups other than the ones my ISP provides. I'm curious,
though, and I'll give it a try.

F


Re: [OT] a little about regex

2006-10-19 Thread Fulvio

On Wednesday 18 October 2006 15:32, Ron Adam wrote:

 |Instead of using two separate if's, Use an if - elif and be sure to test

Thank you, Ron, for the input :)
I'll look at doing it that way too. Meanwhile I faced the total disaster :) of
deleting all my emails from all servers ;(
(I had saved them locally, luckily :) )

 |It's not exactly clear on what output you are seeking.  If you want 0 for
 | not filtered and 1 for filtered, then look to Fred's hint.

Actually, the return value is used like this:

if _filter(hdrs, allow, deny):
    # allow and deny are objects prepared by re.compile(pattern)
    _del(Num_of_Email)

In short, a non-zero result means the message is unwanted and gets deleted.
And now the function is:

def _filter(msg, al, dn):
    """Try to classify a list of header lines against a set of compiled patterns."""
    a = 0
    for hdrline in msg:
        # deny is checked first on each line and stops any further searching;
        # a deny match scores 10 times the number of lines
        if dn.search(hdrline): return len(msg) * 10
        if al.search(hdrline): return 0
        a += 1
    return a  # no line matched either filter: score = number of lines checked


The patterns are taken from a configuration file. Those named Axx = 'pattern'
allow messages through; the Dxx ones block them under different criteria.
Here they are:

[Filters]
A01 = ^From:.*\.it\b
A02 = ^(To|Cc):.*frioio@
A03 = ^(To|Cc):.*the_sting@
A04 = ^(To|Cc):.*calm_me_or_die@
A05 = ^(To|Cc):.*further@
A06 = ^From:.*\.za\b
D01 = ^From:.*\.co\.au\b
D02 = ^Subject:.*\*\*\*SPAM\*\*\*

*Partly faked, for privacy* :)
I'm using configparser to fetch their values, and they are joined by:

allow = re.compile('|'.join([k[1] for k in ifil if k[0].startswith('a')]))
deny = re.compile('|'.join([k[1] for k in ifil if k[0].startswith('d')]))

ifil is the input filter's section.
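Here is a sketch of building the two patterns from the [Filters] section. It assumes `ifil` is the list returned by ConfigParser.items('Filters'). ConfigParser lower-cases option names by default, so the keys come back as 'a01', 'd02' and so on, and `k[0]` is the whole key. Also, `is` tests object identity rather than string equality. Comparing keys with `startswith()` avoids both problems. Interpolation is switched off only as a precaution against `%` in patterns. The inline config is a trimmed copy of the one above.

```python
import configparser
import re

# Trimmed copy of the [Filters] section, inlined for the example.
CONFIG = r"""
[Filters]
A01 = ^From:.*\.it\b
A06 = ^From:.*\.za\b
D01 = ^From:.*\.co\.au\b
D02 = ^Subject:.*\*\*\*SPAM\*\*\*
"""

cfg = configparser.ConfigParser(interpolation=None)
cfg.read_string(CONFIG)
ifil = cfg.items('Filters')  # list of (name, value); names are lower-cased

# Compare by content (startswith), never by identity (is).
allow = re.compile('|'.join(v for k, v in ifil if k.startswith('a')))
deny = re.compile('|'.join(v for k, v in ifil if k.startswith('d')))

print(bool(allow.search('From: someone@example.it')))    # True
print(bool(deny.search('Subject: ***SPAM*** buy now')))  # True
```

Each alternative keeps its own `^`, because `|` has the lowest precedence in a regex.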

At this point I suppose I've done the right thing; I'm just a bit curious
whether there's a better way to get a single regex compilation for all of the
options.
Basically the program will work, in terms of filtering as per the config and
synchronizing with the local $HOME/Mail/trash (configurable path). This last
option removes from the server the emails that are in the local trash.
Todo: back up local and remote emails for those filtered as good;
multithreading to connect to all servers in parallel; SSL for POP3 and IMAP4
as well.
Actually I have a problem issuing the command to the IMAP server to flag as
Deleted the messages that count as spam. I know the message details, but the
correct command is a bit obscure to me.
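For the IMAP question (Fulvio later found the answer by trial and error), the usual imaplib pattern is sketched below: STORE the \Deleted flag, then EXPUNGE. The helper name `flag_deleted`, the host and the credentials are hypothetical. It assumes `conn` is a logged-in imaplib.IMAP4 (or IMAP4_SSL) object with a mailbox selected read-write.

```python
import imaplib

def flag_deleted(conn: imaplib.IMAP4, msg_nums):
    """Flag messages as \\Deleted, then expunge them (hypothetical helper).

    Assumes conn is logged in and a mailbox was selected read-write, e.g. with
    conn.select('INBOX'); msg_nums are message sequence numbers as strings.
    """
    if not msg_nums:
        return
    # STORE <set> +FLAGS \Deleted adds the flag and leaves other flags alone.
    conn.store(','.join(msg_nums), '+FLAGS', '\\Deleted')
    # EXPUNGE permanently removes every message flagged \Deleted in the mailbox.
    conn.expunge()

# Usage sketch (host and credentials are placeholders):
# conn = imaplib.IMAP4_SSL('imap.example.com')
# conn.login('user', 'password')
# conn.select('INBOX')
# flag_deleted(conn, ['3', '7'])
# conn.logout()
```

All messages are flagged before a single expunge, because sequence numbers shift once messages are removed.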
BTW, who's Fred?

F



Re: a little about regex

2006-10-19 Thread Fulvio


On Wednesday 18 October 2006 16:43, Rob Wolfe wrote:

 |def filter(adr):    # note that filter is a builtin function also
 |    import re

I didn't know that, but my function _does_ start with an underscore (a bit of
localization :) )

 |    allow = re.compile(r'.*(?!\.com)\.my(|$)')  # negative lookbehind
 |    deny = re.compile(r'.*\.com\.my(|$)')

Great, it works perfectly. I found my errors.
I didn't put r in front of the patterns, and I was close to the 'allow'
pattern, but it didn't give a positive result and KregexEditor reported it the
wrong way, especially because of '' inside the pattern. I think that is not
normal regex input; it's only valid in Python. Am I right?
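On the missing r prefix: in a plain string literal, Python handles backslash escapes before re ever sees the pattern. Unknown escapes such as \. happen to survive (newer Pythons warn about them). Escapes that Python itself knows, like \b, are turned into other characters. A small sketch with a made-up address:

```python
import re

raw = r'\.my\b'          # re receives backslash-b: a word boundary
plain = '\\.my\b'        # here \b is a Python escape: a backspace, \x08
doubled = '\\.my\\b'     # every backslash doubled by hand: same as raw

print(repr('\b'))                        # '\x08'
print(bool(re.search(raw, 'a@b.my')))    # True
print(bool(re.search(plain, 'a@b.my')))  # False: looks for a literal backspace
print(doubled == raw)                    # True
```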

More details are in the previous thread.

F



Re: a little about regex

2006-10-19 Thread Rob Wolfe


Fulvio wrote:

 Great, it works perfectly. I found my errors.
 I didn't put r in front of the patterns, and I was close to the 'allow'
 pattern, but it didn't give a positive result and KregexEditor reported it the
 wrong way, especially because of '' inside the pattern. I think that is not
 normal regex input; it's only valid in Python. Am I right?

The sequence inside (?...) is an extension notation specific to
python.
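A hedged note on the reply above. The (?...) extension syntax comes from Perl, and lookarounds such as (?!...) and (?<!...) are accepted by many regex engines, so they are not Python-only. Python's own Regular Expression HOWTO describes the forms starting with (?P, such as named groups, as the Python-specific ones. Whether a tool like KregexEditor accepts a given construct depends on its own engine. The strings below are made up:

```python
import re

# Lookarounds: Perl-style extensions, widely supported.
lookahead = re.compile(r'foo(?!bar)')         # 'foo' not followed by 'bar'
lookbehind = re.compile(r'(?<!\.com)\.my\b')  # '.my' not preceded by '.com'

# (?P<name>...): the Python-specific named-group syntax.
addr = re.compile(r'@(?P<domain>[\w.-]+)')

print(bool(lookahead.search('foobar')), bool(lookahead.search('foobaz')))          # False True
print(bool(lookbehind.search('x@a.com.my')), bool(lookbehind.search('x@a.my')))   # False True
print(addr.search('x@mail.example.my').group('domain'))                            # mail.example.my
```

Python's re requires a lookbehind to have a fixed width, which `\.com` satisfies.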

Regards,
Rob


Re: [OT] a little about regex

2006-10-19 Thread Ron Adam

Fulvio wrote:
 
 On Wednesday 18 October 2006 15:32, Ron Adam wrote:
 
 |Instead of using two separate if's, Use an if - elif and be sure to test
 
 Thank you, Ron, for the input :)
 I'll look at doing it that way too. Meanwhile I faced the total disaster :)
 of deleting all my emails from all servers ;(
 (I had saved them locally, luckily :) )
 
 |It's not exactly clear on what output you are seeking.  If you want 0 for
 | not filtered and 1 for filtered, then look to Fred's hint.
 
 Actually, the return value is used like this:
 
 if _filter(hdrs, allow, deny):
     # allow and deny are objects prepared by re.compile(pattern)
     _del(Num_of_Email)
 
 In short, a non-zero result means the message is unwanted and gets deleted.
 And now the function is:
 
 def _filter(msg, al, dn):
     """Try to classify a list of header lines against a set of compiled patterns."""
     a = 0
     for hdrline in msg:
         # deny is checked first on each line and stops any further searching;
         # a deny match scores 10 times the number of lines
         if dn.search(hdrline): return len(msg) * 10
         if al.search(hdrline): return 0
         a += 1
     return a  # no line matched either filter: score = number of lines checked

I see, is this a cleanup script to remove the least wanted items?

The allow/deny caused me to think it was more along the lines of a white/black
list, whereas keep/discard would be terms more suitable to cleaning out items
already allowed.

Or is it a bit of both?  Why the score?

Just curious, I don't think I have any suggestions that will help in any 
specific ways.

I would think the allow(keep?) filters would always have priority over deny 
filters.


 The patterns are taken from a configuration file. Those named Axx = 'pattern'
 allow messages through; the Dxx ones block them under different criteria.
 Here they are:
 
 [Filters]
 A01 = ^From:.*\.it\b
 A02 = ^(To|Cc):.*frioio@
 A03 = ^(To|Cc):.*the_sting@
 A04 = ^(To|Cc):.*calm_me_or_die@
 A05 = ^(To|Cc):.*further@
 A06 = ^From:.*\.za\b
 D01 = ^From:.*\.co\.au\b
 D02 = ^Subject:.*\*\*\*SPAM\*\*\*
 
 *Partly faked, for privacy* :)
 I'm using configparser to fetch their values, and they are joined by:
 
 allow = re.compile('|'.join([k[1] for k in ifil if k[0].startswith('a')]))
 deny = re.compile('|'.join([k[1] for k in ifil if k[0].startswith('d')]))
 
 ifil is the input filter's section.
 
 At this point I suppose I've done the right thing; I'm just a bit curious
 whether there's a better way to get a single regex compilation for all of the
 options.

I think keeping the allow filter separate from the deny filter is good.

You might be able to merge the header lines and run the filters across the
whole header at once instead of each line.

 Basically the program will work, in terms of filtering as per the config and
 synchronizing with the local $HOME/Mail/trash (configurable path). This last
 option removes from the server the emails that are in the local trash.
 Todo: back up local and remote emails for those filtered as good;
 multithreading to connect to all servers in parallel; SSL for POP3 and IMAP4
 as well.
 Actually I have a problem issuing the command to the IMAP server to flag as
 Deleted the messages that count as spam. I know the message details, but the
 correct command is a bit obscure to me.

I can't help you here.  Sorry.

 BTW, who's Fred?
 
 F

Fredrik, see...

news://news.cox.net:119/[EMAIL PROTECTED]



[OT] a little about regex

2006-10-18 Thread Fulvio


Hello,

I'm trying to get an assertion working that filters addresses from some
domain, but not when the domain is prefixed by '.com'.
Even when I negate the test, I can't get the result I want.

The thought, in program terms:

>>> def filter(adr):
...     import re
...     allow = re.compile('.*\.my(|$)')
...     deny = re.compile('.*\.com\.my(|$)')
...     cnt = 0
...     if deny.search(adr): cnt += 1
...     if allow.search(adr): cnt += 1
...     return cnt
...
>>> filter('[EMAIL PROTECTED]')
2
>>> filter('[EMAIL PROTECTED]')
1


It seems I'm missing a better regex that keeps both filters from taking
action. I'm thinking of a lookbehind (negative or positive), but I haven't
managed to write one yet.
I think either the allow pattern should require no '.com' before '.my', or the
deny pattern should require _only_ '.com' before '.my'. Sorry, I can't get the
syntax right.

Suggestions are welcome.

F



Re: [OT] a little about regex

2006-10-18 Thread Fredrik Lundh

Fulvio wrote:

 ...     if deny.search(adr): cnt += 1
 ...     if allow.search(adr): cnt += 1

hint: under what circumstances is cnt decremented in the above snippet?

/F


Re: [OT] a little about regex

2006-10-18 Thread Ron Adam

Fulvio wrote:
 
 Hello,
 
 I'm trying to get an assertion working that filters addresses from some
 domain, but not when the domain is prefixed by '.com'.
 Even when I negate the test, I can't get the result I want.
 
 The thought, in program terms:
 
 >>> def filter(adr):
 ...     import re
 ...     allow = re.compile('.*\.my(|$)')
 ...     deny = re.compile('.*\.com\.my(|$)')
 ...     cnt = 0
 ...     if deny.search(adr): cnt += 1
 ...     if allow.search(adr): cnt += 1
 ...     return cnt
 ...
 >>> filter('[EMAIL PROTECTED]')
 2
 >>> filter('[EMAIL PROTECTED]')
 1
 
 It seems I'm missing a better regex that keeps both filters from taking
 action. I'm thinking of a lookbehind (negative or positive), but I haven't
 managed to write one yet.
 I think either the allow pattern should require no '.com' before '.my', or the
 deny pattern should require _only_ '.com' before '.my'. Sorry, I can't get the
 syntax right.
 
 Suggestions are welcome.
 
 F

Instead of using two separate ifs, use an if/elif and be sure to test the
narrower filter first (you have them in the correct order). That way it will
skip the more general filter and not increment cnt twice.

It's not exactly clear what output you are seeking. If you want 0 for not
filtered and 1 for filtered, then look to Fred's hint.

Or are you writing a test at the moment, where a 1 means it passed only one
filter, so you know your filters are working as designed?

Another approach would be to assign values for filtered, accepted, and
undefined, and set those accordingly instead of incrementing and decrementing
a counter.
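Ron's two suggestions might look like this: an if/elif that tests the narrower filter first, and named outcome values instead of a counter. The constant names and the sample addresses are hypothetical. The patterns drop the leading .* (search() already scans the whole string) and the (|$) group, and use \b as an assumed end-of-domain test.

```python
import re

# Hypothetical outcome values, used instead of incrementing a counter.
DENIED, ACCEPTED, UNDEFINED = 'denied', 'accepted', 'undefined'

allow = re.compile(r'\.my\b')      # the broad filter
deny = re.compile(r'\.com\.my\b')  # the narrower filter

def classify(adr):
    # Narrower filter first; with elif at most one branch runs,
    # so no address is counted by both filters.
    if deny.search(adr):
        return DENIED
    elif allow.search(adr):
        return ACCEPTED
    return UNDEFINED

for adr in ['user@shop.com.my', 'user@uni.edu.my', 'user@example.org']:
    print(adr, classify(adr))
```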

Cheers,
   Ron




Re: a little about regex

2006-10-18 Thread Rob Wolfe


Fulvio wrote:

 I'm trying to get an assertion working that filters addresses from some
 domain, but not when the domain is prefixed by '.com'.
 Even when I negate the test, I can't get the result I want.

[...]

 It seems I'm missing a better regex that keeps both filters from taking
 action. I'm thinking of a lookbehind (negative or positive), but I haven't
 managed to write one yet.
 I think either the allow pattern should require no '.com' before '.my', or the
 deny pattern should require _only_ '.com' before '.my'. Sorry, I can't get the
 syntax right.

 Suggestions are welcome.

Try this:

def filter(adr):    # note that filter is a builtin function also
    import re

    allow = re.compile(r'.*(?!\.com)\.my(|$)')  # negative lookbehind
    deny = re.compile(r'.*\.com\.my(|$)')
    cnt = 0
    if deny.search(adr): cnt += 1
    if allow.search(adr): cnt += 1
    return cnt
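A caveat on the pattern above, easy to check in Python: (?!...) is a negative lookahead, not a lookbehind. Because .* backtracks to the position just before '.my', the lookahead there sees '.my' rather than '.com'. So the allow pattern still matches .com.my addresses, and the function keeps returning 2 for them. A negative lookbehind, (?<!...), tests the text to the left instead. The comparison below keeps the posted allow pattern as-is. In the lookbehind version and in the deny pattern, \b replaces (|$) as an assumed end-of-domain test, because an empty alternative matches everywhere and adds nothing.

```python
import re

as_posted = re.compile(r'.*(?!\.com)\.my(|$)')  # actually a negative lookahead
lookbehind = re.compile(r'(?<!\.com)\.my\b')    # a real negative lookbehind
deny = re.compile(r'\.com\.my\b')

def count_matches(allow, adr):
    # Same counting logic as the posted filter(), with the allow pattern swappable.
    cnt = 0
    if deny.search(adr): cnt += 1
    if allow.search(adr): cnt += 1
    return cnt

for adr in ['user@shop.com.my', 'user@uni.edu.my']:
    print(adr, count_matches(as_posted, adr), count_matches(lookbehind, adr))
# user@shop.com.my 2 1
# user@uni.edu.my 1 1
```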


HTH,
Rob


Re: a little about regex

2006-10-18 Thread Ant

Rob Wolfe wrote:
...
 def filter(adr):    # note that filter is a builtin function also
     import re

     allow = re.compile(r'.*(?!\.com)\.my(|$)')  # negative lookbehind
     deny = re.compile(r'.*\.com\.my(|$)')
     cnt = 0
     if deny.search(adr): cnt += 1
     if allow.search(adr): cnt += 1
     return cnt

Which makes the 'deny' code here redundant so in this case the function
could be reduced to:

import re

def allow(adr):    # note that filter is a builtin function also
    allow = re.compile(r'.*(?!\.com)\.my(|$)')  # negative lookbehind
    if allow.search(adr):
        return True
    return False

Though having the explicit allow and deny expressions may make what's
going on clearer than the fairly esoteric negative lookbehind.
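Ant's trade-off can be sketched with a genuine negative lookbehind, (?<!...), used as the single allow pattern alongside an explicit allow/deny pair. The patterns and addresses are illustrative assumptions. Returning `... is not None` replaces the if/return True/return False idiom. No leading .* is needed, since search() scans the whole string. For ordinary addresses the two versions agree. They can differ when '.my' also appears elsewhere in the string, as in the last sample, which is one more point in favour of the explicit pair.

```python
import re

# One pattern: '.my' not preceded by '.com' (negative lookbehind).
ALLOW_ONLY = re.compile(r'(?<!\.com)\.my\b')

# Explicit pair: easier to read, and easier to extend from a config file.
ALLOW = re.compile(r'\.my\b')
DENY = re.compile(r'\.com\.my\b')

def allowed_lookbehind(adr):
    return ALLOW_ONLY.search(adr) is not None

def allowed_pair(adr):
    return ALLOW.search(adr) is not None and DENY.search(adr) is None

for adr in ['user@uni.edu.my', 'user@shop.com.my', 'user@example.org',
            'a.my@shop.com.my']:
    print(adr, allowed_lookbehind(adr), allowed_pair(adr))
```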
