Matt,

Here are two analyses.  The first, 11-15 to 11-30, covers the period from
when I implemented your filters until I began using SKIPIFWEIGHT and
MAXWEIGHT, which obviously have some effect on the stats.  The second,
11-15 to 12-21, extends the first set to include the additional filters.

There's also the weighting effect to consider.  While I run the OBFUSCATION
and Y!DIRECTED at hold weight (15), I use the GIBBERISH like the COMMENTS
test and accumulate weight per hit.  Since my SKIPIFWEIGHT is set to my
DELETE weight (60), the filters will run until that's reached.
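
To make the weighting effect concrete, here is a rough sketch of how I think
about it.  It is a toy model only, not Declude's actual logic: the function
name, the tuple layout, the per-hit weight of 5 and the hit counts are all
made up for illustration.  Only the hold weight (15) and the delete weight
(60) come from my setup.

```python
# Toy model of the weighting described above; NOT Declude's real logic.
HOLD_WEIGHT = 15     # fixed weight for OBFUSCATION and Y!DIRECTED (from my setup)
DELETE_WEIGHT = 60   # my SKIPIFWEIGHT threshold (from my setup)


def run_filters(filters, skip_if_weight=DELETE_WEIGHT):
    """Run filters in order; skip any filter whose turn comes after the
    accumulated weight has already reached skip_if_weight.

    filters: list of (name, weight, hits, per_hit) tuples (hypothetical shape).
      per_hit=False adds `weight` once if there is at least one hit.
      per_hit=True  adds `weight` for every hit.
    Returns (total_weight, names_run, names_skipped).
    """
    total = 0
    ran, skipped = [], []
    for name, weight, hits, per_hit in filters:
        if total >= skip_if_weight:
            skipped.append(name)   # never runs, so its hits never reach the stats
            continue
        ran.append(name)
        if hits > 0:
            total += weight * hits if per_hit else weight
    return total, ran, skipped


# Illustrative message: the hit counts and the per-hit weight of 5 are invented.
example = [
    ("OBFUSCATION", HOLD_WEIGHT, 3, False),
    ("Y!DIRECTED", HOLD_WEIGHT, 1, False),
    ("GIBBERISH", 5, 6, True),
    ("COMMENTS", 5, 2, True),
]
print(run_filters(example))
# -> (60, ['OBFUSCATION', 'Y!DIRECTED', 'GIBBERISH'], ['COMMENTS'])
```

In this made-up case COMMENTS is skipped once the total reaches 60, so its
two hits would never show up in my numbers.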

These stats aren't a big deal to produce since it's all in a SQL database.
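
For a flavour of the kind of query involved, here is a minimal sketch.  It
assumes a hypothetical table `hits` with one row per logged hit, and it uses
SQLite from Python only as a stand-in; the table, the column names and the
sample rows are invented for illustration and are not my actual schema or
data.

```python
import sqlite3


def hit_summary(rows):
    """Summarise hits per filter.

    rows: iterable of (msg_id, filter_name, filter_line) tuples, one per
    logged hit (hypothetical layout).
    Returns a list of (filter_name, messages_hit, total_hits) tuples,
    ordered by total_hits descending.
    """
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE hits (msg_id TEXT, filter_name TEXT, filter_line TEXT)"
    )
    con.executemany("INSERT INTO hits VALUES (?, ?, ?)", rows)
    result = con.execute(
        """
        SELECT filter_name,
               COUNT(DISTINCT msg_id) AS messages_hit,
               COUNT(*)               AS total_hits
        FROM hits
        GROUP BY filter_name
        ORDER BY total_hits DESC, filter_name
        """
    ).fetchall()
    con.close()
    return result


# Invented sample rows, for illustration only.
sample = [
    ("m1", "GIBBERISH", "line A"),
    ("m1", "GIBBERISH", "line B"),
    ("m2", "GIBBERISH", "line A"),
    ("m2", "OBFUSCATION", "line C"),
]
print(hit_summary(sample))
# -> [('GIBBERISH', 2, 3), ('OBFUSCATION', 1, 1)]
```

Grouping by `filter_line` instead of `filter_name` gives the per-line counts
that matter for reordering entries inside a filter.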

I'll be implementing your new filter versions this coming weekend (with new
names to avoid commingling stats).  I do strip out comments since they
become meaningless as the filter contents are resequenced by my system.

George

> -----Original Message-----
> From: [EMAIL PROTECTED] 
> [mailto:[EMAIL PROTECTED] On Behalf Of 
> Matthew Bramble
> Sent: Monday, December 22, 2003 10:32 PM
> To: [EMAIL PROTECTED]
> Subject: Re: [Declude.JunkMail] GIBBERISH 2.0.1, single file 
> filter with END functionality.
> 
> 
> George,
> 
> I think that logic can get you 95% of the way there with something as 
> convoluted as this, that is run only about 1/3 of the time, and 
> considering that you are only battling for about 2% of the processing 
> power required by this filter alone, which shouldn't be too terribly 
> much.  Removing the comment blocks would probably have a bigger effect 
> :)  Changing to the new version of the filter should definitely help, 
> though this is far from my most weighty filter.
> 
> Here's something that I'm very curious about though...the Y!DIRECTED 
> filter contains a bunch of BODY searches for obfuscated strings, 
> something that is almost totally redundant with the OBFUSCATION filter.  
> I would be very curious to see how often those lines are hit because 
> they could be dumped for a measurable performance increase.  Any chance 
> you want to take a crack at that?  I wouldn't be surprised to see them 
> never hit.
> 
> Matt
> 
> 
> 
> George Kulman wrote:
> 
> >Matt,
> >
> >I use LOGLEVEL HIGH for my data collection and analysis stuff and, as Bill
> >pointed out, all hits are reflected.
> >
> >I've started to use SKIPIFWEIGHT.  The result of course is that filters are
> >bypassed and the statistics are skewed.
> >
> >For example on Friday 12/19, 15291 emails were processed by Declude on my
> >system.  Only 4604 were processed by the GIBBERISH filter.  Of these 1328
> >had a total of 3854 hits.
> >
> >My quandary now is to decide whether to use the new control functions of
> >SKIPIFWEIGHT, MAXWEIGHT and END to reduce processing overhead or to collect
> >a full set of evaluation data by letting everything run.  It's truly a
> >catch-22 situation.  If I collect all of the data, then I gain no benefit,
> >since all of the processing takes place.  If I take advantage of the
> >analysis data, I reduce my processing workload but effectively destroy the
> >validity of the statistical data which is now skewed by my filtering
> >control.
> >
> >George
> >
> >  
> >
> >>-----Original Message-----
> >>From: [EMAIL PROTECTED] 
> >>[mailto:[EMAIL PROTECTED] On Behalf Of 
> >>Matthew Bramble
> >>Sent: Monday, December 22, 2003 3:17 PM
> >>To: [EMAIL PROTECTED]
> >>Subject: Re: [Declude.JunkMail] GIBBERISH 2.0.1, single file 
> >>filter with END functionality.
> >>
> >>
> >>George,
> >>
> >>That's good data to have.  I would have to assume that something tagged 
> >>as gibberish in the main test would be random, and that's fairly well 
> >>indicated by the somewhat tight range of the two character strings.  
> >>Unless you are using a logging feature that I'm not aware of, you are 
> >>only showing the last hit that the filter produces, and that explains 
> >>why the Z strings are mostly bunched at the top.  I've got these ordered 
> >>alphabetically and will probably leave them there for management purposes.
> >>
> >>The counterbalances, though, are definitely something that I will 
> >>reorder using your information.  I believe I made an attempt to order 
> >>these in the 2.0 filter version according to what I thought would be 
> >>more common as well as what would be a faster search (BODY searches are 
> >>slower than other things and will go lower in general, though a BODY 
> >>search for base64 goes at the top because it is fairly common).  Because 
> >>of this, along with the above-mentioned issue, the hit stats aren't a 
> >>perfect indication of what would save the most processing power, but 
> >>they definitely help if you just make some assumptions.  I hadn't 
> >>gathered any stats myself on the Auto-generated Codes that I added 
> >>about a month or so ago, and it's nice to see that they're getting hit 
> >>since I was really just brainstorming about what types of things might 
> >>be seen.  I might remove some entries though if they aren't being hit, 
> >>since they are BODY searches and expensive.  I'll probably still leave 
> >>that list of Auto-generated Codes in alphabetical order though for 
> >>management purposes.  This shouldn't make a big difference considering 
> >>that the most common one only gets hit about 1-3% of the time (I don't 
> >>know how commonly the filter fails a later line which ends up getting 
> >>logged instead).
> >>
> >>If Declude did log every line that hits in a filter, you would see 
> >>things like GIBBERISH hitting some attachments thousands of times per 
> >>message, and I don't think that's worth the trouble.  Data like this 
> >>will make a much bigger impact on performance if you run it against 
> >>filters where hits can only occur once in a file due to unique data or 
> >>exact matching.  Kami has a bunch of those.
> >>
> >>Thanks,
> >>
> >>Matt
> >>
> >>
> >>
> >>George Kulman wrote:
> >>
> >>    
> >>
> >>>Matt,
> >>>
> >>>I thought you might be interested in the attached data which analyzes the
> >>>GIBBERISH and ANTI-GIBBERISH filters by number of hits on my system from
> >>>11/15 through yesterday.
> >>>
> >>>If you're looking for "effectiveness" you should set the entries in
> >>>descending order of probability.  I use a variation which looks at date of
> >>>most recent hit as well as hit count, although that's more important with
> >>>filters that are being modified on a continual basis than with fairly
> >>>static filters such as these two.
> >>>
> >>>George
> >>>
> >>> 
> >>>
> >>>      
> >>>
> >>>>-----Original Message-----
> >>>>From: [EMAIL PROTECTED] 
> >>>>[mailto:[EMAIL PROTECTED] On Behalf Of 
> >>>>Matthew Bramble
> >>>>Sent: Monday, December 22, 2003 9:52 AM
> >>>>To: [EMAIL PROTECTED]
> >>>>Subject: [Declude.JunkMail] GIBBERISH 2.0.1, single file 
> >>>>filter with END functionality.
> >>>>
> >>>>
> >>>>I've made some huge leaps forward recently in terms of the processing 
> >>>>power required to run Declude with the custom filters that I have 
> >>>>installed.  This was done by way of the SKIPIFWEIGHT functionality 
> >>>>introduced in the latest beta, but also by way of re-ordering my filters 
> >>>>in the Global.cfg file so that the easiest to process custom filters are 
> >>>>run first in the hopes of avoiding the need to run more costly ones.
> >>>>
> >>>>This new version of GIBBERISH makes use of functionality introduced in 
> >>>>the 1.77 beta, however the most recent interim release, 1.77i7, should 
> >>>>be used in order to guarantee proper operation (initial versions would 
> >>>>always end processing, and effectively disabled the filters).  The END 
> >>>>functionality removes the need to have ANTI filters since the filter can 
> >>>>be stopped before it gets to the main filter matches, and it also 
> >>>>presents another opportunity to save on the processing power required to 
> >>>>run such things.  This also makes use of the MAXWEIGHT functionality to 
> >>>>limit the max score as well as end processing once a single hit has been 
> >>>>scored.  Note that the filter will only log (at the LOW setting) and 
> >>>>show WARN actions when the filter is tripped and an END was not 
> >>>>hit...which is great!  No more looking at non-scoring custom filters due 
> >>>>to counterbalances :D
> >>>>
> >>>>Please read through the file and follow these instructions if you 
> >>>>already have GIBBERISH installed:
> >>>>
> >>>>   1) Comment out the ANTI-GIBBERISH custom filter in your Global.cfg.
> >>>>   2) Change the score of the GIBBERISH filter to 0 in your Global.cfg.
> >>>>   3) Change the scoring of the filter to match your system (it is 
> >>>>      scored by default for base 10 systems).  This can be done by 
> >>>>      changing the MAXWEIGHT and Main Filter lines to reflect the 
> >>>>      multiple of 10 that your system is based on.
> >>>>   4) Change the SKIPIFWEIGHT score to reflect your delete weight, or 
> >>>>      whatever weight you would like for the filter to be skipped if 
> >>>>      the system has already reached it before processing the filter.
> >>>>
> >>>>The file can be downloaded from the following location:
> >>>>
> >>>>   http://www.mailpure.com/software/decludefilters/gibberish/Gibberish_v2-0-1.zip
> >>>>
> >>
> >>>>Please report any issues with the new filter format.  As soon as bugs 
> >>>>stop being reported, I will move to convert the other dual file filters 
> >>>>into single file alternatives which make use of the END functionality.  
> >>>>Until the functionality goes into a full release, I'm going to continue 
> >>>>to primarily provide the old style filters on my site.
> >>>>
> >>>>Matt
> 
> 
> ---
> [This E-mail was scanned for viruses by Declude Virus (http://www.declude.com)]

---
This E-mail came from the Declude.JunkMail mailing list.  To
unsubscribe, just send an E-mail to [EMAIL PROTECTED], and
type "unsubscribe Declude.JunkMail".  The archives can be found
at http://www.mail-archive.com.

Attachment: Analysis queries.zip
Description: Zip compressed data
