You could try restricting the number of characters for the actual domain. I
would suggest something like this:
http\:\/\/www.+\.com\..{4,15}\.com
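David's bounded pattern can be tried outside Declude with a quick Python harness. Python's `re` module treats `.`, `+` and `{m,n}` the same way PCRE does for this pattern. Only the phishing URL comes from the original question. The other strings, including the `someone@nat.com` and `bob@nat.com` addresses, are hypothetical. The last sample shows that the length limit narrows the problem but does not close it, because `www.+` can still span a body.

```python
import re

# David's suggestion: only 4-15 characters may sit between ".com." and the
# final ".com". The \: and \/ escapes are harmless but not required.
BOUNDED = re.compile(r"http\:\/\/www.+\.com\..{4,15}\.com")

# Illustrative inputs; the bodies are hypothetical, not real messages.
SAMPLES = {
    # Phishing-style host from the original question: should match.
    "phish": "http://www.usps.com.scam.com",
    # Legitimate link: ".com" is followed by "/", never by ".".
    "facebook_url": "http://www.facebook.com/n/?permalink.php&id=1209018066",
    # "nat.com." is followed by far more than 15 characters before the next
    # ".com", so the bound rejects this body.
    "long_gap": (
        "http://www.facebook.com/n/?id=1 This message was sent to "
        "someone@nat.com. If you don't want to receive these emails, "
        "follow http://www.facebook.com"
    ),
    # "www.+" can still span the body when the gap happens to be short.
    "short_gap": "http://www.example.com/x Contact bob@nat.com. See shop.com",
}

for name, text in SAMPLES.items():
    m = BOUNDED.search(text)
    print(f"{name}: {m.group(0) if m else 'no match'}")
```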
Also, in many cases the www will not be present and the real domain will not be
a .com, so you would need to use something like this:
http\:\/\/.+\.com\..{4,15}\.(net|com|info|biz|co|cn)
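The second pattern drops the `www` requirement and accepts several TLDs. The check below again uses Python as a stand-in for PCRE, and every hostname in it is made up. The third case shows one side effect. Nothing follows the alternation, so `co` also matches the start of a longer TLD such as `.cool`.

```python
import re

# David's second suggestion: "www" is optional and the real domain may use
# one of several TLDs. Nothing follows the group, so "co" also matches the
# start of a longer TLD such as ".cool".
TLD_VARIANT = re.compile(r"http\:\/\/.+\.com\..{4,15}\.(net|com|info|biz|co|cn)")

# Hypothetical URLs for illustration.
URLS = [
    "http://usps.com.scam-site.net",       # no www, .net domain
    "http://paypal.com.secure-login.biz",  # .biz domain
    "http://bank.com.login-now.cool",      # matched via "co"
    "http://www.usps.com/track",           # legitimate: no ".com." at all
]

for url in URLS:
    m = TLD_VARIANT.search(url)
    print(url, "->", m.group(1) if m else "no match")
```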
There are also many TLDs you will want to check, and I would think in most cases
the link would point to some page on the site, so add the trailing /:
http\:\/\/.+\.com\..{4,15}\..{2,4}/
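The trailing `/` in the third pattern gives up some coverage in exchange for precision. In the sketch below, which uses hypothetical URLs, a link with a path matches. The bare host from the original question no longer matches. Also, `.{2,4}` accepts any characters, not just letters, so it only loosely approximates a TLD.

```python
import re

# David's third suggestion: any 2-4 character "TLD", then a "/".
SLASH_VARIANT = re.compile(r"http\:\/\/.+\.com\..{4,15}\..{2,4}/")

with_path = "http://usps.com.scam-site.ru/login"  # hypothetical phishing link
bare_host = "http://www.usps.com.scam.com"        # same trick, no trailing "/"

print(SLASH_VARIANT.search(with_path).group(0))  # http://usps.com.scam-site.ru/
print(SLASH_VARIANT.search(bare_host))           # None
```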
Run this as a test, and let's see if we get any false positives; then we can
take another look at it and tweak it.
David
-----Original Message-----
From: Rick Davidson [mailto:[email protected]]
Sent: Thursday, November 03, 2011 10:38 PM
To: [email protected]
Subject: RE: [Declude.JunkMail] Regex Greed Issue
Well, based on your response I guessed you couldn't reproduce it with the
example I sent. I confirmed that, and I am unable to trick that regex; however,
it does catch messages it shouldn't.
Here is the log entry for the example message:
11/03/2011 15:14:07.489 008080891 Triggered body PCRE filter TEST :
http://www.facebook.com/n/?permalink.php&id=3D1209018066&story_fbid=3D2337=
84096686420&mid=3D51cf32eG5af347a420ebGae7c0bG52&bcode=3Dln1Ayh0a&n_m=3Dsc=
ollins%40nat.com You can now tag your friends in your status or post. Type @
and then type = the friend's name. For example: "Had lunch with @John Smith."
Thanks, The Facebook Team
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D This message was sent to
[email protected]. If you don't want to receive = these emails from Facebook in
the future, please follow the link below to = unsubscribe.
http://www.facebook.com [weight -> 0]
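In the log above, the lazy pattern matched everything from the first Facebook link to the last one. One plausible explanation assumes something the obfuscated archive hides: that the recipient address ends in `nat.com` and is followed by a sentence-ending period. That period would supply the `.com.` the pattern needs, and `.*?` would then stretch to the next `.com`. The sketch below reproduces this on a shortened, hypothetical body. The body is decoded from quoted-printable and joined onto one line because `.` does not match a newline by default. `someone@nat.com` is an invented stand-in address.

```python
import re

# Rick's original pattern, with lazy quantifiers.
LAZY = re.compile(r"http\:\/\/www.*?\.com\..*?\.com")

# Shortened, hypothetical body; "someone@nat.com" stands in for the
# recipient address that the archive has obfuscated.
BODY = (
    "http://www.facebook.com/n/?permalink.php&id=1209018066 "
    "You can now tag your friends in your status or post. "
    "This message was sent to someone@nat.com. If you don't want to "
    "receive these emails from Facebook in the future, please follow "
    "the link below to unsubscribe. http://www.facebook.com"
)

m = LAZY.search(BODY)
# The match starts at the first link and ends at the last one: the only
# ".com." in the body is "nat.com." at the end of a sentence.
print(m.group(0) == BODY)  # True
```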
I will try to get a few more examples with the original message.
--
Rick
-----Original Message-----
From: David Barker [mailto:[email protected]]
Sent: Thursday, November 03, 2011 9:00 PM
To: [email protected]
Subject: RE: [Declude.JunkMail] Regex Greed Issue
Hi Rick,
Are you sure your regex catches the long URL? How did you test it?
David
-----Original Message-----
From: Rick Davidson [mailto:[email protected]]
Sent: Thursday, November 03, 2011 6:38 PM
To: [email protected]
Subject: [Declude.JunkMail] Regex Greed Issue
I am trying to use the following regex to catch phishing URLs like
http://www.usps.com.scam.com
http\:\/\/www.*?\.com\..*?\.com
The issue is that the question marks do not stop the greediness of the *.
It will catch:
http://www.facebook.com/n/?permalink.php&id=1209018066&story_fbid=233784096686420&mid=f347a420ebGae7c0bG52&bcode=ln1Ayh0a&n_m=xxxxxx%40nat.com
It seems that this is not supported in PCRE. Is there a workaround?
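PCRE does support lazy quantifiers, and so does Python's `re`. Laziness only decides which match is preferred from a given starting position. It does not decide whether a match exists. The engine still reports the leftmost match, and `.` crosses slashes and spaces, so `.*?` stretches as far as it must. A common workaround is to replace `.` with a character class that cannot leave the host name, such as `[^\s/]`. That is a general regex technique, not a Declude feature. The strings below are hypothetical.

```python
import re

# Rick's structure, but each gap may only contain characters that can
# appear in a host name: no whitespace and no "/", so a match can never
# run past the end of the first URL's host.
HOST_ONLY = re.compile(r"http://www[^\s/]*?\.com\.[^\s/]*?\.com")

phish = "see http://www.usps.com.scam.com/track now"
legit = (
    "http://www.facebook.com/n/?permalink.php&id=1209018066 "
    "This message was sent to someone@nat.com. If you don't want "
    "these emails, follow http://www.facebook.com"
)

print(HOST_ONLY.search(phish).group(0))  # http://www.usps.com.scam.com
print(HOST_ONLY.search(legit))           # None
```

`[^\s/]` and `*?` have the same meaning in PCRE, so the pattern itself should carry over. Check whether Declude's filter syntax expects the `\:` and `\/` escapes used elsewhere in this thread.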
--
Rick
CONFIDENTIALITY NOTICE
This e-mail message and any attachments contain confidential and/or privileged
information for the sole use of the intended recipient. If you are not the
intended recipient, you may not read, disseminate, distribute or copy this
e-mail message or any attachments. Please notify the sender immediately by
reply e-mail if you received this e-mail message by mistake and delete this
e-mail message and any attachments from your system. E-mail transmission
cannot be guaranteed to be secure or error-free as information could be
intercepted, corrupted, lost, destroyed, delayed, incomplete, or contain
viruses. The sender, therefore, does not accept liability for any errors or
omissions in the contents of this e-mail message or any attachments, which
arise as a result of e-mail transmission. If verification is required, please
request a hard-copy version.
-. .- -
You have received this e-mail due to a past or current transaction or as a
result of our efforts to keep you in touch with current developments affecting
your industry. If you wish to unsubscribe from any future general information
mailings, please click the 'Reply' button and add the word 'UNSUBSCRIBE' to the
subject of your response.
---
This E-mail came from the Declude.JunkMail mailing list. To unsubscribe, just
send an E-mail to [email protected], and type "unsubscribe
Declude.JunkMail". The archives can be found at http://www.mail-archive.com.
---