Re: FPs on URI_HEX NUMERIC_HTTP_ADDR
On 11/9/2014 11:07 AM, David B Funk wrote:
> On Sun, 9 Nov 2014, David B Funk wrote:
>> For NUMERIC_HTTP_ADDR the rule is:
>>   /^https?\:\/\/\d{7}/is
>> If that pattern were terminated like:
>>   /^https?\:\/\/\d{7}(?::\d+)?(?:\/|$)/is
>> it should prevent the FPs (hopefully without destroying its effectiveness)
>
> Oops, for that new formulation it would actually need to be:
>   /^https?\:\/\/\d{7,10}(?::\d+)?(?:\/|$)/is

This rule is currently scored very low:

  score NUMERIC_HTTP_ADDR 0.000 0.001 0.001 1.242

Can you give an example of a real legitimate domain that is being falsely marked by this? Otherwise I wouldn't be too worried, because this rule would have to fire in conjunction with a good bit more before a message gets flagged as an FP.

Your proposed change is very minimal in terms of the strings I would expect it to hit on.
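A note on reading the four-value score line quoted above: in SpamAssassin's configuration format the four numbers correspond to score sets 0 through 3 (Bayes off/network off, Bayes off/network on, Bayes on/network off, Bayes on/network on), so the 1.242 only applies when both Bayes and network tests are enabled. The sketch below just splits the line; `parse_score_line` and `SCORE_SETS` are hypothetical names for illustration, not part of SpamAssassin.

```python
# Hypothetical helper (not a SpamAssassin API): label the four values
# of a "score" line with the score set each one belongs to.
SCORE_SETS = (
    "no Bayes, no network tests",   # score set 0
    "no Bayes, network tests",      # score set 1
    "Bayes, no network tests",      # score set 2
    "Bayes and network tests",      # score set 3
)


def parse_score_line(line):
    """Return (rule_name, {score_set_label: score}) for a four-value score line."""
    keyword, rule, *values = line.split()
    if keyword != "score" or len(values) != 4:
        raise ValueError("expected: score RULE_NAME n n n n")
    return rule, dict(zip(SCORE_SETS, map(float, values)))


rule, scores = parse_score_line("score NUMERIC_HTTP_ADDR 0.000 0.001 0.001 1.242")
for label, value in scores.items():
    print(f"{rule}  {label:28}  {value:.3f}")
```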
Re: FPs on URI_HEX NUMERIC_HTTP_ADDR
On Wed, 12 Nov 2014, Joe Quinn wrote:
> On 11/9/2014 11:07 AM, David B Funk wrote:
>> On Sun, 9 Nov 2014, David B Funk wrote:
>>> For NUMERIC_HTTP_ADDR the rule is:
>>>   /^https?\:\/\/\d{7}/is
>>> If that pattern were terminated like:
>>>   /^https?\:\/\/\d{7}(?::\d+)?(?:\/|$)/is
>>> it should prevent the FPs (hopefully without destroying its effectiveness)
>>
>> Oops, for that new formulation it would actually need to be:
>>   /^https?\:\/\/\d{7,10}(?::\d+)?(?:\/|$)/is
>
> This rule is currently scored very low:
>
>   score NUMERIC_HTTP_ADDR 0.000 0.001 0.001 1.242
>
> Can you give an example of a real legitimate domain that is being falsely marked by this? Otherwise I wouldn't be too worried because this rule would have to be in conjunction with a good bit more in order to get flagged as an FP.
>
> Your proposed change is very minimal in terms of the strings I would expect it to hit on.

In my first message I included an example URL from an Amtrak ticket notification that it fired on.

My proposed change makes the difference between the rule FPing on any URL whose host part starts with a long number and the rule firing only on URLs with a decimal-encoded IP address (which I assume was the intention of the rule's original author).

It's misfiring enough that its S/O ratio is down to 0.3244 based upon my mail flows (i.e. it's firing on twice as much ham as spam).

That same message also fired on URI_HEX, URIBL_RHS_DOB, STYLE_GIBBERISH, and MPART_ALT_DIFF, which were enough to overcome the BAYES_00 and cause an FP.

I can put the message up on pastebin, but I'll need to anonymize it first as it does have personal ticket info in it.

--
Dave Funk                              University of Iowa
dbfunk (at) engineering.uiowa.edu      College of Engineering
319/335-5751   FAX: 319/384-0549       1256 Seamans Center
Sys_admin/Postmaster/cell_admin        Iowa City, IA 52242-1527
#include std_disclaimer.h
Better is not better, 'standard' is better. B{
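The "twice as much ham as spam" reading of the 0.3244 figure can be checked with a little arithmetic. This is only a sketch and assumes S/O means the fraction of a rule's hits that land on spam, i.e. spam / (spam + ham); `ham_per_spam` is a hypothetical helper name.

```python
def ham_per_spam(s_o):
    """Ham hits per spam hit, assuming S/O = spam / (spam + ham)."""
    if not 0 < s_o <= 1:
        raise ValueError("S/O must be in (0, 1]")
    return (1 - s_o) / s_o


ratio = ham_per_spam(0.3244)
print(f"{ratio:.2f} ham hits for every spam hit")  # 2.08 ham hits for every spam hit
```

By the same formula, an S/O of exactly 0.5 gives a ratio of 1.0, which is the break-even point mentioned elsewhere in the thread.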
Re: FPs on URI_HEX NUMERIC_HTTP_ADDR
On Wed, 12 Nov 2014, David B Funk wrote:
> In my first message I included an example URL from an Amtrak ticket notification that it fired on.
> ...
> That same message also fired on URI_HEX, URIBL_RHS_DOB, STYLE_GIBBERISH, and MPART_ALT_DIFF which were enough to overcome the BAYES_00 and cause an FP.

Another potential change is to have URI_HEX exclude purely-numeric strings:

before:
  uri URI_HEX m%^https?://[^/?]*\b(?![0-9a-f]{0,12}[a-f]{3})[0-9a-f]{6,}\b%i

after (perhaps):
  uri URI_HEX m%^https?://[^/?]*\b(?![0-9a-f]{0,12}[a-f]{3}|[0-9]{6,}\b)[0-9a-f]{6,}\b%i

--
John Hardin KA7OHZ    http://www.impsec.org/~jhardin/
jhar...@impsec.org    FALaholic #11174    pgpk -a jhar...@impsec.org
key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79
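To see what the extra `[0-9]{6,}\b` alternative in the lookahead changes, the two URI_HEX patterns can be tried outside SpamAssassin. The sketch below translates them to Python's `re` (the `%i` modifier becomes `re.I`) and uses `re.search` as a rough stand-in for how a `uri` rule is applied to each URI. Only the doubleclick URL comes from the thread; the other two sample hosts are made up for illustration.

```python
import re

# Python translations of the "before" and "after (perhaps)" URI_HEX patterns.
BEFORE = re.compile(
    r"^https?://[^/?]*\b(?![0-9a-f]{0,12}[a-f]{3})[0-9a-f]{6,}\b", re.I)
AFTER = re.compile(
    r"^https?://[^/?]*\b(?![0-9a-f]{0,12}[a-f]{3}|[0-9]{6,}\b)[0-9a-f]{6,}\b", re.I)

SAMPLES = {
    # URL reported in the thread: purely numeric leading host label.
    "numeric": "https://4490379.fls.doubleclick.net/activityi",
    # Made-up host label mixing digits and hex letters.
    "mixed_hex": "http://1a2b3c4d.example.com/",
    # Made-up host label that is an ordinary word made of hex letters.
    "hex_word": "http://facade.example.com/",
}


def fires(pattern, uri):
    """True if the pattern matches anywhere it is allowed to in the URI."""
    return pattern.search(uri) is not None


for name, uri in SAMPLES.items():
    print(f"{name:10} before={fires(BEFORE, uri)!s:5} after={fires(AFTER, uri)}")
```

In this small sample only the purely numeric host changes behavior: it matches the "before" pattern but not the "after" one, while the mixed hex label still matches both and the hex-letter word matches neither.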
FPs on URI_HEX NUMERIC_HTTP_ADDR
Recently I've seen a bunch of FPs on URI_HEX and NUMERIC_HTTP_ADDR thanks to some URLs that look like:

  https : // 4490379 . fls . doubleclick . net / activityi

(extra spaces my addition, remove to see actual URL)

These were embedded in some Amtrak ticket confirmation messages. Looking at my logs, I see that the recent S/O ratios for those two rules have dropped below 0.5 (i.e. they now hit more ham than spam).

For NUMERIC_HTTP_ADDR the rule is:
  /^https?\:\/\/\d{7}/is

If that pattern were terminated like:
  /^https?\:\/\/\d{7}(?::\d+)?(?:\/|$)/is

it should prevent the FPs (hopefully without destroying its effectiveness).

--
Dave Funk
Re: FPs on URI_HEX NUMERIC_HTTP_ADDR
On Sun, 9 Nov 2014, David B Funk wrote:
> For NUMERIC_HTTP_ADDR the rule is:
>   /^https?\:\/\/\d{7}/is
> If that pattern were terminated like:
>   /^https?\:\/\/\d{7}(?::\d+)?(?:\/|$)/is
> it should prevent the FPs (hopefully without destroying its effectiveness)

Oops, for that new formulation it would actually need to be:
  /^https?\:\/\/\d{7,10}(?::\d+)?(?:\/|$)/is

--
Dave Funk
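Why `{7}` has to become `{7,10}` once the pattern is terminated is easiest to see by running the three variants side by side. The following is a sketch in Python rather than Perl (the `/is` modifiers become `re.I | re.S`); the decimal-IP URLs are made-up test strings built from 192.168.1.1, and only the doubleclick URL is from the thread.

```python
import ipaddress
import re

FLAGS = re.I | re.S
ORIGINAL = re.compile(r"^https?://\d{7}", FLAGS)                      # current rule
FIRST_TRY = re.compile(r"^https?://\d{7}(?::\d+)?(?:/|$)", FLAGS)     # first proposal
CORRECTED = re.compile(r"^https?://\d{7,10}(?::\d+)?(?:/|$)", FLAGS)  # after the "Oops"

# An IPv4 address written as a single decimal integer (illustrative value).
DECIMAL_IP = int(ipaddress.ip_address("192.168.1.1"))  # 3232235777, ten digits

SAMPLES = {
    "amtrak": "https://4490379.fls.doubleclick.net/activityi",
    "decimal_ip": f"http://{DECIMAL_IP}/",
    "decimal_ip_port": f"http://{DECIMAL_IP}:8080/login",
}


def fires(pattern, uri):
    """True if the anchored pattern matches the URI."""
    return pattern.search(uri) is not None


for name, uri in SAMPLES.items():
    print(f"{name:16} original={fires(ORIGINAL, uri)!s:5} "
          f"first_try={fires(FIRST_TRY, uri)!s:5} corrected={fires(CORRECTED, uri)}")
```

The first proposal stops matching the ten-digit host because exactly seven digits must be followed immediately by a port, a slash, or the end of the string. A 32-bit address written in decimal has at most ten digits, which is consistent with the `{7,10}` upper bound.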