[sniffer] New reference settings for GBUdb ranges.

2008-01-22 Thread Pete McNeil
Hello Sniffer Folks,

We have been researching and refining the default ranges for GBUdb. Here
are our latest reference settings. These are conservative for large
systems (about 500 messages per minute) and should be even more
conservative for smaller systems.

Smaller systems that experience lower message rates will tend
to have lower confidence numbers in their GBUdb due to fewer message
interactions. If you run a system that sees fewer than 500 messages
per minute then you may achieve higher capture rates before false
positives (FPs) appear by using lower confidence values in some of
your ranges.

Another way smaller systems may adjust their GBUdb sensitivity is to
lengthen the time between condensations from one day to two days (or
more), or to eliminate the time-based trigger and rely on the memory
usage trigger instead (triggering condensation events only when a
specific memory threshold has been reached). The latter method is
typically recommended for systems with fewer than 10 messages per
minute.
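To make the two condensation strategies concrete, here is a minimal Python sketch of the trigger decision. It assumes a hypothetical policy object with an optional time trigger and a memory threshold. The class, the field names, and the message-rate cut-offs in suggest_policy are illustrative assumptions based on the figures in this post. They are not SNF configuration keys.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CondensationPolicy:
    # Hours between time-based condensations; None disables the time trigger (hypothetical field).
    time_trigger_hours: Optional[float]
    # Condense whenever GBUdb memory use reaches this many bytes (hypothetical field).
    memory_limit_bytes: int

    def should_condense(self, hours_since_last: float, memory_bytes: int) -> bool:
        """Return True if either enabled trigger has fired."""
        if memory_bytes >= self.memory_limit_bytes:
            return True
        if self.time_trigger_hours is not None and hours_since_last >= self.time_trigger_hours:
            return True
        return False


def suggest_policy(messages_per_minute: float, memory_limit_bytes: int) -> CondensationPolicy:
    """Map a message rate onto the rough guidance above (assumed cut-offs)."""
    if messages_per_minute < 10:
        # Very small systems: rely on the memory trigger only.
        return CondensationPolicy(None, memory_limit_bytes)
    if messages_per_minute < 500:
        # Smaller systems: stretch the interval from one day to two.
        return CondensationPolicy(48.0, memory_limit_bytes)
    # Reference-sized systems (about 500 msg/min): daily condensation.
    return CondensationPolicy(24.0, memory_limit_bytes)
```

For example, suggest_policy(5, limit) returns a memory-only policy. Its should_condense() stays False until the memory threshold is reached, no matter how much time has passed.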

All of the above tuning recommendations are somewhat experimental
since GBUdb is relatively new and still sparsely populated (about
300 participating nodes at present). As time goes on we will all learn
more about how to optimize GBUdb. Please experiment cautiously and
scientifically (make one change at a time and understand what has
happened), and please share your results.

Here is the current reference:

<regions>

  <white on-off='on' symbol='0'>
    <edge probability='-1.0' confidence='0.4'/>
    <edge probability='-0.8' confidence='1.0'/>
    <panic on-off='on' rule-range='1000'/>
  </white>

  <caution on-off='on' symbol='40'>
    <edge probability='0.1' confidence='0.0'/>
    <edge probability='0.8' confidence='0.3'/>
  </caution>

  <black on-off='on' symbol='63'>
    <edge probability='0.8' confidence='0.2'/>
    <edge probability='0.8' confidence='1.0'/>
    <truncate on-off='on' probability='0.9' peek-one-in='5' symbol='20'/>
    <sample on-off='on' probability='0.8' grab-one-in='5' passthrough='no' passthrough-symbol='0'/>
  </black>

</regions>
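If you want to compare your own range settings against this reference, a short script can do the bookkeeping. The following Python sketch assumes you have pulled the <regions> element out of your configuration as a standalone XML string. The function names are hypothetical. The sketch only compares region symbols and edge points, not the panic, truncate, or sample settings.

```python
import xml.etree.ElementTree as ET

# Reference values copied from the settings above: region -> (symbol, edges).
REFERENCE = {
    "white": ("0", [(-1.0, 0.4), (-0.8, 1.0)]),
    "caution": ("40", [(0.1, 0.0), (0.8, 0.3)]),
    "black": ("63", [(0.8, 0.2), (0.8, 1.0)]),
}


def load_regions(xml_text: str) -> dict:
    """Parse a <regions> fragment into region -> (symbol, [(probability, confidence), ...])."""
    root = ET.fromstring(xml_text)
    regions = {}
    for region in root:
        edges = [
            (float(edge.get("probability")), float(edge.get("confidence")))
            for edge in region.findall("edge")
        ]
        regions[region.tag] = (region.get("symbol"), edges)
    return regions


def diff_regions(local: dict, reference: dict = REFERENCE) -> list:
    """List human-readable differences between a local config and the reference."""
    problems = []
    for name, (ref_symbol, ref_edges) in reference.items():
        if name not in local:
            problems.append(f"{name}: missing")
            continue
        symbol, edges = local[name]
        if symbol != ref_symbol:
            problems.append(f"{name}: symbol {symbol} (reference {ref_symbol})")
        if edges != ref_edges:
            problems.append(f"{name}: edges {edges} (reference {ref_edges})")
    return problems
```

An empty list from diff_regions(load_regions(your_fragment)) means your symbols and edges match the reference. Each entry in a non-empty list names one region that differs.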

If you are running the new SNF and you haven't checked your GBUdb
range settings in a while this might be a good time to make some
adjustments ;-) Some of the settings in previous releases were less
conservative and some were less aggressive -- all were backed by less
experience (of course).

The settings shown above are likely to become the default settings for
the production release; however, we will continue to refine these
settings through our research prior to (and following) the production
release (planned for Q1).

Best,

_M

-- 
Pete McNeil
Chief Scientist,
Arm Research Labs, LLC.


#
This message is sent to you because you are subscribed to
  the mailing list sniffer@sortmonster.com.
To unsubscribe, E-mail to: [EMAIL PROTECTED]
To switch to the DIGEST mode, E-mail to [EMAIL PROTECTED]
To switch to the INDEX mode, E-mail to [EMAIL PROTECTED]
Send administrative queries to  [EMAIL PROTECTED]



[sniffer] Re: GBUdb question

2008-01-22 Thread Pi-Web - Frank Jensen

Hi Rob,

You can add the IPs to GBUdbIgnoreList.txt if you want Sniffer to ignore them.


Pete,

I have some questions about GBUdb

FIRST QUESTION:

I have several clients who forward e-mails from ISP accounts. I have a
system whereby I can pick out the original sending server IP (this can
vary by ISP and situation, but I've programmed my system to
appropriately determine which IP is the original sending server IP). I
then add a special custom header to the message which points out that IP.


Would it be possible for MessageSniffer to grab the IP from a particular
header (perhaps this header could be added as a node in the XML config
file)? That way, if/when that header is available in the message,
Sniffer would then treat *that* IP as the sender's IP.


SECOND QUESTION:

Is it possible to tell Sniffer NOT to truncate on a message-by-message
basis, for example when a special command line switch is present? In
fact, can Sniffer be further instructed to ONLY run pattern matching
and ignore GBUdb for that particular message?


THIRD QUESTION:

Much of the spam I block doesn't run through Sniffer. Additionally, many
of the messages that Sniffer blocks are spams sent via established ISPs
whose IPs I already have in an extensive whitelist that I've built up
over the years.


A 4% sampling of this whitelist can be found here:
http://invaluement.com/fourpercentofwhitelist.txt
(multiply the size of that by 25 to get an idea of the massive size of
my IP whitelist)


Here is what I'd like to do which I believe would make my contribution 
to sniffer most effective:


(A) Have sniffer NOT automatically input data into GBUdb with each 
sniffer scan. (Is that possible?)


(B) Alternatively, whenever my spam filter marks a message as spam, it 
will issue the following command (but ONLY if that IP is NOT on my IP 
whitelist, and regardless of whether or not the message was run through 
sniffer):


SNFClient.exe -bad IP4Address

(If on my IP whitelist, it just won't do anything here.)

(C) If my spam filter marks a message as ham, then it will issue the 
following command (again, regardless of whether or not the message was 
run through sniffer)


SNFClient.exe -good IP4Address

I know that this puts more trust in me and my system, but I also
know that the quality of stats you'd receive from my system would be
vastly improved due to my abilities in this area, and this would be a
huge contribution to other Sniffer users compared with the norm. (I run
one of the best RBLs and URI blacklists in the world... I know what I'm
doing here!)


Can these things be done?

Rob McEwen








--
Best regards, Frank Jensen
[EMAIL PROTECTED]
www.pi.dk









[sniffer] Re: GBUdb question

2008-01-22 Thread Pete McNeil
Hello Rob,

Tuesday, January 22, 2008, 11:09:10 AM, you wrote:

 Pete,

 I have some questions about GBUdb

This may help:

http://kb.armresearch.com/index.php?title=Message_Sniffer.TechnicalDetails.GBUdb

 FIRST QUESTION:

 I have several clients who forward e-mails from ISP accounts. I have a
 system whereby I can pick out the original sending server IP (this can
 vary by ISP and situation, but I've programmed my system to
 appropriately determine which IP is the original sending server IP). I
 then add a special custom header to the message which points out that IP.

We are developing an auto-drill-down feature for GBUdb to assist in
automatically training GBUdb in this way. The auto drill feature will
add IPs of intermediate systems to the local ignore list based on
header directives. The theory is that GBUdb will be able to
automatically learn to ignore the intermediate nodes of mixed-source
ISPs in order to identify the original source of the message.

There is still some development work to do on this experimental
feature but we hope to include it in the upcoming release. Any
insights you can provide on reliably identifying these intermediate
servers would be very useful.

The current plan is to locate a specific telltale string in the
Received header that is likely to be the source (based on current
knowledge). If the string is found then that header is disqualified
(and its IP added to the ignore list) so that the next header becomes
the source candidate.

The telltale string is presumed to be the domain portion (or a
similar fragment) of the reverse DNS data in the Received header. So,
for example, if the top Received header contains ".troublesome.isp.com ["
then that header would be disqualified as the source of the message
(for GBUdb purposes), its IP would be added to the ignore
(infrastructure) list, and the next Received header would be
considered. Once all of the ".troublesome.isp.com [" (or similar)
headers are exhausted, the next header is likely to be the actual
source (so the theory goes).
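The drill-down theory described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the SNF implementation. The regex, the function name, and the choice to skip headers that carry no bracketed IPv4 address are all hypothetical.

```python
import re

# Assumed: the connecting host's IPv4 address appears in square brackets, e.g. "(rdns [192.0.2.10])".
BRACKET_IP = re.compile(r"\[(\d{1,3}(?:\.\d{1,3}){3})\]")


def drill_down(received_headers, telltales, ignore_list):
    """Walk Received headers top-down, skipping any that contain a telltale string.

    Each skipped header's IP is added to ignore_list (a set). The first header
    without a telltale is returned as the source candidate, or None if exhausted.
    """
    for header in received_headers:
        match = BRACKET_IP.search(header)
        if match is None:
            continue  # No bracketed IP to evaluate in this header.
        ip = match.group(1)
        if any(telltale in header for telltale in telltales):
            ignore_list.add(ip)  # Intermediate (infrastructure) hop.
            continue
        return ip
    return None
```

Suppose the headers show mx1.troublesome.isp.com [192.0.2.10] and relay.troublesome.isp.com [192.0.2.11] above a bot at [198.51.100.7]. Then drill_down returns "198.51.100.7" and adds the two relay IPs to the ignore set. Including the trailing " [" in the telltale ties the match to the reverse DNS of the connecting host. Without it, a "by" clause naming the same ISP would also match.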

 Would it be possible for MessageSniffer to grab the IP from a particular
 header (perhaps this header could be added as a node in the XML config
 file?). That way,  if/when that header is available in the message, 
 Sniffer would then treat *that* IP as the sender's IP?

I will consider adding this to the feature request list. It probably
won't be added to the first version though -- we have a request freeze
in effect to ensure we get the production version out in Q1.

This is also a highly specialized request -- there aren't a lot of
systems out there that can drill through delivery chains to identify
the original source of a message with any great accuracy -- so the
number of folks who could use this feature would be pretty small (if
not one). Your use of the command line utility (described below) seems
more appropriate since, in effect, you want to eliminate GBUdb's source
detection features.

That said, I am anxious to support your work.

Please share an example of the header you would inject.

If it is possible to implement the feature quickly and reliably then I
will see what I can do to add it to the header directives engine.

 SECOND QUESTION:

 Is it possible to tell Sniffer to NOT allow the possibility of 
 truncating on a message-by-message basis, where this would be 
 determined if a special command line switch were present. In fact, can
 Sniffer be further instructed to ONLY run pattern matching scanning 
 and ignore the GBUdb for that particular message?

It is not possible to turn off truncate on a message by message basis.

It is possible to turn off truncate for all messages but not on a
message by message basis.

You can also create a header directive to cause GBUdb training to
ignore a message with a specific header (or, more precisely, a message
where a specific string is found in a specific header).
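The header-directive idea amounts to a match on a header name plus a substring. This Python sketch is a hypothetical stand-in for that check. The (header, substring) pair format and the function name are assumptions, not SNF's directive syntax.

```python
def should_train(headers, skip_directives):
    """Return False if any (header-name, substring) directive matches the message.

    headers: list of (name, value) pairs; names are compared case-insensitively.
    skip_directives: list of (header-name, substring) pairs (hypothetical format).
    """
    for name, value in headers:
        for wanted_name, needle in skip_directives:
            if name.lower() == wanted_name.lower() and needle in value:
                return False  # Directive matched: leave this message out of GBUdb training.
    return True
```

For example, should_train(headers, [("X-Skip-GBUdb", "yes")]) returns False for a message carrying "X-Skip-GBUdb: yes". That header name is made up for illustration.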

 THIRD QUESTION:

 Much of the spam I block doesn't run through Sniffer. Additionally, many
 of the messages that Sniffer blocks are spams sent via established ISPs
 whose IPs I already have in an extensive whitelist that I've built up
 over the years.

 A 4% sampling of this whitelist can be found here:
 http://invaluement.com/fourpercentofwhitelist.txt (multiply the size
 of that by 25 to get an idea of the massive size of my IP whitelist)

 Here is what I'd like to do which I believe would make my contribution
 to sniffer most effective:

 (A) Have sniffer NOT automatically input data into GBUdb with each 
 sniffer scan. (Is that possible?)

You could create header directives to selectively disable GBUdb
training.

You can also disable GBUdb training for all messages.

<training on-off='off'/>

 (B) Alternatively, whenever my spam filter marks a message as spam, it
 will issue the following command (but ONLY if that IP is NOT on my IP 
 whitelist, and regardless of whether or not the message was run through
 sniffer):

 

[sniffer] Re: New reference settings for GBUdb ranges.

2008-01-22 Thread David Waller
Hi,

I think I must have missed something or been asleep. I've had a look at the
Sniffer site and to be honest I don't fully understand what GBUdb is. I've
read the technical details page but I don't see how it fits into the whole
scheme of things, whether it's useful to me, and if it is, how to implement it. I
understand what it's trying to achieve but I can't see beyond that.

David






[sniffer] Re: GBUdb question

2008-01-22 Thread Rob McEwen

Pete McNeil wrote:

This may help:

http://kb.armresearch.com/index.php?title=Message_Sniffer.TechnicalDetails.GBUdb

  

I did read that first. It was helpful. I'll keep referring back.

We are developing an auto-drill-down feature for GBUdb to assist in
automatically training GBUdb in this way. The auto drill feature will
add IPs of intermediate systems to the local ignore list based on
header directives. The theory is that GBUdb will be able to
automatically learn to ignore the intermediate nodes of mixed-source
ISPs in order to identify the original source of the message.

There is still some development work to do on this experimental
feature but we hope to include it in the upcoming release. Any
insights you can provide on reliably identifying these intermediate
servers would be very useful.
I'm not confident that this will handle the forwarded-message
scenarios that I described, which I have already custom-programmed for
the specific narrow range of ways that this currently happens with my server.

Please share an example of the header you would inject.
  

Currently, I'm using the following:

X-RegEx-Original-IP: 127.0.0.1

(But the name X-RegEx-Original-IP is arbitrary; it was inherited from an
antiquated anti-spam utility I used years ago. The X-RegEx-Original-IP
part can change at any time. It could even be a custom header designated
by Sniffer.)


Even better, another option would be for the IP to be passed to sniffer 
via the command line where sniffer would know to use that one and not 
bother trying to grab this from the header. Please consider that as a 
feature request.

It is not possible to turn off truncate on a message by message basis.

It is possible to turn off truncate for all messages but not on a
message by message basis.
  

That will suffice.


Here is what I'd like to do which I believe would make my contribution
to sniffer most effective:

(A) Have sniffer NOT automatically input data into GBUdb with each 
sniffer scan. (Is that possible?)



You could create header directives to selectively disable GBUdb
training.

You can also disable GBUdb training for all messages.

<training on-off='off'/>

  
That will work. But will this disable the SNFClient.exe -bad and
SNFClient.exe -good tools? And will this disable sharing of the data?
Can data accumulated via these manual reports be shared even if
training is off?

That sounds very much like what these tools were designed for. However
the effect may not be what you intend.

If the IPs you track are not detected as the source IP by GBUdb then
it is likely to ignore the data during its scans. It will evaluate
the statistics of the IP it believes to be the source. When it gets
that right it will find your data. When it gets that wrong it will
(most likely) find no data, so GBUdb will be effectively inert in those
cases.

If your intent is simply to input this data into the GBUdb system so
that it is available as a resource then that will work - somewhat.

One other thought that I have is that you could use the command line
(or the ignore list) to mark the IPs on your internal white-list as
Infrastructure (ignore flag). This might effectively train GBUdb to
skip those IPs when finding the source of the message - and in any
case would render GBUdb inert for those IPs.
  
There are too many IPs on that whitelist (it might have been possible 
were it not that many of these entries are massive blocks of IPs).


Follow-up question...

If, therefore, I cannot stop GBUdb processing for a particular message,
but I turn off truncate for all messages, couldn't I simply ignore the
GBUdb reporting for some particular messages? (It might not be as
efficient, but I'd get the result I seek!) But in a case where truncate
is turned off, if GBUdb reports a message as spam AND content rules
ALSO mark that message as spam, will the return code tell me that both
GBUdb *and* rules caught the spam? Or do I get one code instead of the
other (and if so, which one)?


Thanks!

Rob McEwen






[sniffer] Re: New reference settings for GBUdb ranges.

2008-01-22 Thread Pete McNeil
Hello David,

Tuesday, January 22, 2008, 12:43:09 PM, you wrote:

 Hi,

 I think I must have missed something or been asleep. I've had a look at the
 Sniffer site and to be honest I don't fully understand what GBUdb is. I've
 read the technical details page but I don't see how it fits into the whole
 scheme of things, whether it's useful to me, and if it is, how to implement it. I
 understand what it's trying to achieve but I can't see beyond that.

Think of GBUdb as an enhancement to the SNF scanning engine.

GBUdb keeps track of where messages come from and whether those
messages are spam or not. If they fail an SNF pattern rule then they
are considered to be spam. If they do not fail an SNF pattern rule
then they are not considered to be spam.

When a new message comes from a source that GBUdb knows about, GBUdb
helps SNF work better and faster.


Reducing Leakage:

If GBUdb knows that messages from a particular source are almost
always spam then SNF will detect the message as spam even if there is
no pattern rule yet. This helps reduce leakage.

That is, new spam from old bots will generally get killed by GBUdb.


Reducing False Positives:

On the other side of things, if an SNF pattern rule tags a message
that comes from a trusted source then GBUdb will make sure that the
message gets through. This reduces false positives.
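A toy Python model may help show how source history and pattern results combine. It is an illustration only, not GBUdb's data structure or formula. The -1.0 to +1.0 probability scale is borrowed from the range settings earlier in this thread. The tally formula and the 0.8 threshold are assumptions. Real GBUdb also weighs confidence, which this sketch ignores.

```python
class ToyReputation:
    """Toy per-IP good/bad tally; an illustration, not GBUdb's data structure."""

    def __init__(self):
        self.good = {}
        self.bad = {}

    def learn(self, ip, pattern_hit):
        # Following the post: a pattern-rule hit counts as spam, no hit counts as not spam.
        counts = self.bad if pattern_hit else self.good
        counts[ip] = counts.get(ip, 0) + 1

    def probability(self, ip):
        # Assumed toy formula on a -1.0 (always good) .. +1.0 (always bad) scale; 0.0 if unknown.
        good, bad = self.good.get(ip, 0), self.bad.get(ip, 0)
        total = good + bad
        return 0.0 if total == 0 else (bad - good) / total

    def verdict(self, ip, pattern_hit, threshold=0.8):
        p = self.probability(ip)
        if not pattern_hit and p >= threshold:
            return "spam"  # Reducing leakage: known-bad source, no pattern rule yet.
        if pattern_hit and p <= -threshold:
            return "ham"   # Reducing false positives: trusted source overrides the rule.
        return "spam" if pattern_hit else "ham"
```

After nine spam hits and one clean message from the same IP, probability() is 0.8. A new message from that IP is then called spam even without a pattern match.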

GBUdb has Friends:

One other thing that is important about GBUdb is that it doesn't work
alone -- it has friends. All of the GBUdb systems on the 'net share
what they know about message sources. This way, when a spam bot starts
to send messages to a new system that's never seen it before, the other
GBUdb systems can tell the new system that the message source (IP) is
bad, so it doesn't have to learn that information all on its own.

Faster and More Efficient:

In addition to reducing leakage and false positives, GBUdb also makes
message scanning go faster and take fewer resources. If GBUdb knows
that a message source is very, very bad then it will cause SNF to stop
scanning the message as soon as it sees the IP address that sent it.
This is the truncate feature. The result is that between 15% and 50%
of messages going through the SNF scanner will be handled almost
instantaneously - without bothering to look at most of the message.

Hope this helps,

_M

-- 
Pete McNeil
Chief Scientist,
Arm Research Labs, LLC.





[sniffer] Re: New reference settings for GBUdb ranges.

2008-01-22 Thread Pete McNeil
Hello David,

Oops, I missed a question...

Tuesday, January 22, 2008, 12:43:09 PM, you wrote:

<snip/>

 ..., how to implement it.

GBUdb is built into the new version of Message Sniffer. It is turned
on by default and the default settings work for just about everybody.

If you have any email gateways or an email address where you
legitimately receive spam (such as an abuse reporting address) then
you will want to tell GBUdb about those so that it doesn't get the
wrong idea about them.

If you have more questions then please let us know.

Hope this helps,

_M


-- 
Pete McNeil
Chief Scientist,
Arm Research Labs, LLC.





[sniffer] Re: GBUdb question

2008-01-22 Thread Pete McNeil
Hello Rob,

Tuesday, January 22, 2008, 1:11:00 PM, you wrote:

<snip... about auto-drill-down/>

 I'm not confident that this will handle the forwarded-message
 scenarios that I described, which I have already custom-programmed for
 the specific narrow range of ways that this currently happens with
 my server.

We're hopeful it will work for many cases. If you can identify cases
where it won't work please let us know.

 Please share an example of the header you would inject.
   
 Currently, I'm using the following:

 X-RegEx-Original-IP: 127.0.0.1

 (But the name X-RegEx-Original-IP is arbitrary; it was inherited from an
 antiquated anti-spam utility I used years ago. The X-RegEx-Original-IP
 part can change at any time. It could even be a custom header designated
 by Sniffer.)

That seems straightforward enough. Thanks.

 Even better, another option would be for the IP to be passed to sniffer
 via the command line where sniffer would know to use that one and not 
 bother trying to grab this from the header. Please consider that as a 
 feature request.

I will add that to the list.

<snip about GBUdb training options (disabled training)/>

 That will work. But will this disable the SNFClient.exe -bad and
 SNFClient.exe -good tools? And will this disable sharing of the
 data? Can data accumulated via these manual reports be shared
 even if training is off?

The command line tools always work. When you report a good or bad
hit it has the same effect as GBUdb learning from a message scan.

The information will be stored and shared in exactly the same way.

When you turn off training you are only disabling the system's ability
to learn automatically from scanned messages. Inputs from the command
line utility are still retained.
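Rob's plan can be wired up with a small wrapper that decides which report to send. The -bad and -good switches come from this thread. The function names, holding the whitelist as a list of networks, and calling SNFClient.exe through subprocess are assumptions for illustration.

```python
import ipaddress
import subprocess


def build_report(ip, is_spam, whitelist_networks):
    """Return the SNFClient command to run, or None to skip reporting.

    Follows the plan in this thread: spam from whitelisted IPs is not reported
    as bad; messages the filter calls ham are always reported as good.
    """
    addr = ipaddress.ip_address(ip)
    if is_spam:
        if any(addr in net for net in whitelist_networks):
            return None  # Whitelisted source: do nothing.
        return ["SNFClient.exe", "-bad", ip]
    return ["SNFClient.exe", "-good", ip]


def report(ip, is_spam, whitelist_networks):
    """Run the chosen report, if any (assumes SNFClient.exe is on the PATH)."""
    cmd = build_report(ip, is_spam, whitelist_networks)
    if cmd is not None:
        subprocess.run(cmd, check=True)
```

A linear scan over a very large whitelist of IP blocks will be slow. For a list of that size, a sorted interval lookup would be a better choice.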

<snip/>

 One other thought that I have is that you could use the command
 line (or the ignore list) to mark the IPs on your internal
 white-list as Infrastructure (ignore flag). This might effectively
 train GBUdb to skip those IPs when finding the source of the
 message - and in any case would render GBUdb inert for those IPs.
 There are too many IPs on that whitelist (it might have been possible 
 were it not that many of these entries are massive blocks of IPs).

Perhaps - that's up to you. However, the GBUdb system is designed to
handle large numbers of IPs without slowing down. It is not uncommon
to have significantly more than half a million IPs in GBUdb on systems
that handle 500 msg/min or more.

The ignore list file is intended to handle local infrastructure so
that if you lose your GBUdb data you can be assured that your local
resources are not tagged as bad sources accidentally.

Other IP records (ignore, good, bad, or ugly) can be entered via the
command line utility with the only real limit being the amount of RAM
you want to commit to the GBUdb.

To give you an idea of scalability, one of our spamtrap processors is
currently handling about 3000 msg/minute (typical) and has the following
GBUdb statistics:

<gbudb>
  <size bytes='109051904'/>
  <records count='479671'/>
  <utilization percent='96.7379'/>
</gbudb>
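The statistics above also give a rough rule of thumb for RAM planning. This Python sketch (hypothetical helper names) divides the allocated size by the record count. For this one system that works out to about 227 bytes per record; 109051904 bytes is 104 MiB. Treat it as a single-system ratio, not a specification. It ignores the utilization figure and any fixed overhead.

```python
import xml.etree.ElementTree as ET


def gbudb_bytes_per_record(stats_xml: str) -> float:
    """Allocated GBUdb bytes divided by record count, from a <gbudb> status fragment."""
    root = ET.fromstring(stats_xml)
    size = int(root.find("size").get("bytes"))
    records = int(root.find("records").get("count"))
    return size / records


def estimate_bytes(record_count: int, bytes_per_record: float) -> int:
    """Rough RAM estimate, assuming memory grows linearly with record count."""
    return round(record_count * bytes_per_record)
```

At that ratio, a million records would need roughly 227 MB.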


 Follow-up question...

 If, therefore, I cannot stop GBUdb-processing for a particular message,
 but I turn off truncate for all messages, the way I see it, couldn't I
 simply ignore the GBUdb reporting for some particular messages? (might
 not be as efficient, but I'd get the same result I seek!) But in a case
 where truncate is turned off, if GBUdb reports a message as spam, AND 
 content rules ALSO mark that message as spam, will the return code tell
 me that both GBUdb *and* rules caught the spam? Or do I get one code 
 instead of the other (if so, which one?)

If you turn off truncate then you will see the following results by
default in a conventional command-line implementation:

* For messages that match pattern rules you will see the pattern rule
result.

* If a message fails to match a pattern rule but would have been
truncated, it will be treated as black and you will get result
code 63.

* If a message fails to match a pattern rule but the IP falls in the
black range, you will get the black result code 63. This is the
same result code you get from SNF when an IP pattern rule has matched.
IP pattern rules are deprecated and will be phased out over time;
GBUdb replaces them.

* If the message fails to match a pattern rule and the IP falls in the
caution range, you will get the caution result code 40.
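The rules above fit in a small decision function. This Python sketch uses the caution (40) and black (63) symbols from the reference settings at the top of this thread. The function name and range labels are hypothetical, and the white range is treated like any unflagged range. It also answers the follow-up question: when a pattern rule matches, you get the pattern rule's code rather than a GBUdb code.

```python
from typing import Optional

CLEAN = 0     # white symbol in the reference settings
CAUTION = 40  # caution symbol in the reference settings
BLACK = 63    # black symbol in the reference settings


def result_code(pattern_code: Optional[int], ip_range: str, would_truncate: bool) -> int:
    """Result code with truncate turned off, following the rules in this reply.

    pattern_code: symbol of the matching pattern rule, or None if no rule matched.
    ip_range: 'white', 'normal', 'caution', or 'black' (hypothetical labels).
    """
    if pattern_code is not None:
        return pattern_code  # Pattern rule results take precedence.
    if would_truncate or ip_range == "black":
        return BLACK         # Would-be truncations are reported as black.
    if ip_range == "caution":
        return CAUTION
    return CLEAN
```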

If you call SNF directly via XCI, or use the command line utility with
the -xhdr switch and capture the output, then you also have the ability to
configure SNF to provide detailed information about the scan, including
the GBUdb data and all available pattern matches. You could also mine
this data from the log files if you wish.

Note that you can set the x-header option to api and it will be
available to the XCI and command line interfaces without being
injected into the message.

--- One other thing ---

You can