Re: FPs on URI_HEX NUMERIC_HTTP_ADDR

2014-11-12 Thread Joe Quinn

On 11/9/2014 11:07 AM, David B Funk wrote:
> On Sun, 9 Nov 2014, David B Funk wrote:
>
>> For NUMERIC_HTTP_ADDR the rule is: /^https?\:\/\/\d{7}/is
>> If that pattern were terminated like:
>>  /^https?\:\/\/\d{7}(?::\d+)?(?:\/|$)/is
>> it should prevent the FPs (hopefully without destroying its
>> effectiveness)
>
> Oops, for that new formulation it would actually need to be:
>   /^https?\:\/\/\d{7,10}(?::\d+)?(?:\/|$)/is



This rule is currently scored very low:
score NUMERIC_HTTP_ADDR 0.000 0.001 0.001 1.242

Can you give an example of a real legitimate domain that is being 
falsely marked by this? Otherwise I wouldn't be too worried because this 
rule would have to be in conjunction with a good bit more in order to 
get flagged as an FP. Your proposed change is very minimal in terms of 
the strings I would expect it to hit on.


Re: FPs on URI_HEX NUMERIC_HTTP_ADDR

2014-11-12 Thread David B Funk

On Wed, 12 Nov 2014, Joe Quinn wrote:


> This rule is currently scored very low:
> score NUMERIC_HTTP_ADDR 0.000 0.001 0.001 1.242
>
> Can you give an example of a real legitimate domain that is being falsely
> marked by this? Otherwise I wouldn't be too worried because this rule would
> have to be in conjunction with a good bit more in order to get flagged as an
> FP. Your proposed change is very minimal in terms of the strings I would
> expect it to hit on.


In my first message I included an example URL from an Amtrak ticket notification
that it fired on.

My proposed change makes the difference between the rule FPing on any URL
whose host part merely starts with a long number and firing only on URLs with
a decimal-encoded IP address (which I assume was the intention of the rule's
original author).

It's misfiring often enough that its S/O ratio on my mail flow is down to
0.3244 (i.e. it's firing on roughly twice as much ham as spam).
That same message also fired on URI_HEX, URIBL_RHS_DOB, STYLE_GIBBERISH, and
MPART_ALT_DIFF, which together were enough to overcome BAYES_00 and cause an FP.
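As a rough check on the "twice as much ham" figure: assuming S/O is the spam share of the rule's hits, s / (s + h), the ham-to-spam ratio follows directly as (1 - S/O) / (S/O). If the figure comes from per-corpus percentages rather than raw counts, the same arithmetic gives a ratio of hit rates instead.

```python
# Assumption: S/O = spam hits / (spam hits + ham hits) for this rule.
s_over_o = 0.3244  # value reported above for NUMERIC_HTTP_ADDR

# h / s = (1 - S/O) / (S/O)
ham_per_spam = (1 - s_over_o) / s_over_o
print(f"about {ham_per_spam:.2f} ham hits per spam hit")  # about 2.08
```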

I can put the message up on pastebin, but I'll need to anonymize it first
as it contains personal ticket info.


--
Dave Funk  University of Iowa
dbfunk (at) engineering.uiowa.edu    College of Engineering
319/335-5751   FAX: 319/384-0549   1256 Seamans Center
Sys_admin/Postmaster/cell_admin    Iowa City, IA 52242-1527
#include std_disclaimer.h
Better is not better, 'standard' is better. B{


Re: FPs on URI_HEX NUMERIC_HTTP_ADDR

2014-11-12 Thread John Hardin

On Wed, 12 Nov 2014, David B Funk wrote:

> In my first message I included an example URL from an Amtrak ticket
> notification that it fired on.
>
> ...
>
> That same message also fired on URI_HEX, URIBL_RHS_DOB, STYLE_GIBBERISH, and
> MPART_ALT_DIFF which were enough to overcome the BAYES_00 and cause an FP.


Another potential change is to have URI_HEX exclude purely-numeric 
strings:


before:

uri URI_HEX m%^https?://[^/?]*\b(?![0-9a-f]{0,12}[a-f]{3})[0-9a-f]{6,}\b%i

after (perhaps):

uri URI_HEX m%^https?://[^/?]*\b(?![0-9a-f]{0,12}[a-f]{3}|[0-9]{6,}\b)[0-9a-f]{6,}\b%i
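To see what the extra `[0-9]{6,}\b` alternative in the negative lookahead does, here is a sketch with both URI_HEX patterns transcribed into Python's `re` (assuming `m%...%i` maps to a plain pattern with `re.I`). The URIs are hypothetical examples, not taken from the thread.

```python
import re

# URI_HEX "before" and "after (perhaps)" patterns, transcribed from Perl m%...%i.
URI_HEX_BEFORE = re.compile(
    r"^https?://[^/?]*\b(?![0-9a-f]{0,12}[a-f]{3})[0-9a-f]{6,}\b", re.I)
URI_HEX_AFTER = re.compile(
    r"^https?://[^/?]*\b(?![0-9a-f]{0,12}[a-f]{3}|[0-9]{6,}\b)[0-9a-f]{6,}\b", re.I)

# Hypothetical URIs: a purely numeric host label vs. one mixing hex letters and digits.
samples = ["http://1234567.example.com/", "http://a1b2c3d4e5.example.com/"]

for uri in samples:
    print(uri, bool(URI_HEX_BEFORE.search(uri)), bool(URI_HEX_AFTER.search(uri)))
```

The purely numeric label matches only the "before" pattern; the mixed label still matches both, because the new alternative rejects only a word made up entirely of six or more digits.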


--
 John Hardin KA7OHZ    http://www.impsec.org/~jhardin/
 jhar...@impsec.org    FALaholic #11174    pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
  The [assault weapons] ban is the moral equivalent of banning red
  cars because they look too fast.  -- Steve Chapman, Chicago Tribune
---
 895 days since the first successful private support mission to ISS (SpaceX)


Re: FPs on URI_HEX NUMERIC_HTTP_ADDR

2014-11-09 Thread David B Funk

On Sun, 9 Nov 2014, David B Funk wrote:


> For NUMERIC_HTTP_ADDR the rule is: /^https?\:\/\/\d{7}/is
> If that pattern were terminated like:
>  /^https?\:\/\/\d{7}(?::\d+)?(?:\/|$)/is
> it should prevent the FPs (hopefully without destroying its effectiveness)


Oops, for that new formulation it would actually need to be:
  /^https?\:\/\/\d{7,10}(?::\d+)?(?:\/|$)/is
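A quick sketch of why the first formulation needed the "Oops": once the pattern must end at a port, a "/" or the end of the string, a fixed `\d{7}` can no longer match decimal-encoded addresses longer than seven digits (they run up to ten, since 255.255.255.255 is 4294967295). Both patterns are transcribed into Python's `re`, assuming `/is` maps to `re.I | re.S`; the sample URLs are hypothetical.

```python
import re

FLAGS = re.I | re.S  # Perl /is
FIRST_TRY = re.compile(r"^https?://\d{7}(?::\d+)?(?:/|$)", FLAGS)
CORRECTED = re.compile(r"^https?://\d{7,10}(?::\d+)?(?:/|$)", FLAGS)

# Hypothetical decimal-encoded IPv4 URLs of different lengths.
samples = [
    "http://1000000/",           # 7 digits: 0.15.66.64
    "http://3232235777/",        # 10 digits: 192.168.1.1
    "http://3232235777:8080/x",  # explicit port before the path
    "https://3232235777",        # no path at all
]

for url in samples:
    print(url, bool(FIRST_TRY.search(url)), bool(CORRECTED.search(url)))
```

Only the seven-digit URL matches the first try; the corrected pattern matches all four, and it still rejects an 11-digit host such as `http://12345678901/`.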

