Kelly Jones wrote:
> If I understand correctly, razor runs several "engines" on my
> email. Each engine "normalizes" my messages (takes out spaces and
> stuff?), hashes the normalized message, and then asks a razor server
> if the resulting hash is spam. Is this correct? If yes,
> 
> How can I tell how many engines my version of razor has?

Generally 2, e4 and e8.

> 
> How can I see my email after it's been normalized, before it's been hashed?
> 
> What kinds of hashes does razor use? MD5? SHA1?

e4 is a SHA1 of a subset of the message text of each MIME section. The server
tells the client which subset of the text to choose using the "ep4" parameter.
In theory this prevents spammers from adapting their messages so that they
alter only the sections of the message that razor is looking at.
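
To make the idea concrete, here is a rough Python sketch of "hash a selected
slice of each MIME section". It is not razor's code: the start_pct and
length_pct arguments are made-up stand-ins for a server-supplied selection
parameter and do not reflect what ep4 actually encodes, and no razor-style
normalization is applied.

```python
import hashlib
from email import message_from_string


def section_hashes(raw_message, start_pct=0, length_pct=100):
    """Return one SHA-1 hex digest per non-multipart MIME section.

    start_pct / length_pct are hypothetical selection parameters,
    NOT the real semantics of razor's ep4 value.
    """
    msg = message_from_string(raw_message)
    digests = []
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers carry no text of their own
        payload = part.get_payload(decode=True) or b""
        # Pick a slice of the section by percentage of its length.
        start = len(payload) * start_pct // 100
        end = start + len(payload) * length_pct // 100
        digests.append(hashlib.sha1(payload[start:end]).hexdigest())
    return digests
```

The point of the slice is only that a sender who cannot predict which part of
the text gets hashed cannot easily confine their random changes to the rest.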

e8 is some kind of custom hash of URLs found in the text. IIRC Vipul once
explained it depends on both the domain and on the path part of the URL, but is
much more "fuzzy" about path parts.


> I've done "razor_check -d -H", but didn't 100% understand the output.
> 
> What does this mean (from -H):
> 
> 1.0 e4: GuDG3rTj4vwLIGcyaJbtLbnrIUAA, ep4: 7542-10
> 1.1 e4: 2UCVUX8jE9jrHCJxn1xYSRLB1vEA, ep4: 7542-10
> 1.2 e4: GcePVVOdDWym2jn1EHMLVmZtVcwA, ep4: 7542-10

The first MIME section (1.0), using the text-selection parameters "7542-10"
(whatever that means), generated a SHA1 hash of GuDG3rTj4vwLIGcyaJbtLbnrIUAA.
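
If you want to pull those fields apart in a script, something like the
following Python sketch should do. The pattern is inferred only from the three
sample lines quoted above, so treat it as an assumption about the format
rather than a documented one; parse_signature_line is just a name I made up.

```python
import re

# Layout guessed from the sample "-H" lines: "<part> e<N>: <sig>, ep4: <value>"
SIG_LINE = re.compile(
    r"^(?P<part>\d+\.\d+)\s+e(?P<engine>\d+):\s+(?P<sig>\S+?),"
    r"\s+ep4:\s+(?P<ep4>\S+)$"
)


def parse_signature_line(line):
    """Split one signature line into its fields, or return None."""
    # Tolerate a leading "> " in case the line was copied from a quoted mail.
    m = SIG_LINE.match(line.lstrip("> ").strip())
    if m is None:
        return None
    return {
        "part": m["part"],            # MIME section, e.g. "1.0"
        "engine": int(m["engine"]),   # 4 for e4
        "signature": m["sig"],
        "ep4": m["ep4"],
    }
```

Lines that do not match the guessed layout come back as None instead of
raising, so other output can be fed through it safely.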

..
> 
> Does e4 mean engine 4?
Yes
> Why did it generate 3 hashes for a single mail?
There are 3 mime sections.
> Does 1.0, 1.1, 1.2 mean the three MIME pieces of the single email?
Yes
> Does razor split an email on MIME boundaries?
Yes.
> 
> What does this mean (from -d):
> 
> check[6284]: [ 6] preproc: mail 1.0 went from 1866 bytes to 1716
> check[6284]: [ 6] preproc: mail 1.1 went from 3075 bytes to 2797
> check[6284]: [ 6] preproc: mail 1.2 went from 19018 bytes to 13956
> 
> Is this the normalization process shrinking the pieces of my email?
Yes
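
To see how much each section shrank, a small Python sketch like this can
total it up from the debug output. The regular expression is based only on the
three "preproc" lines you quoted, and preproc_savings is a hypothetical helper
name, not part of razor.

```python
import re

# Matches lines such as:
#   check[6284]: [ 6] preproc: mail 1.0 went from 1866 bytes to 1716
PREPROC = re.compile(
    r"preproc: mail (?P<part>\d+\.\d+) went from "
    r"(?P<before>\d+) bytes to (?P<after>\d+)"
)


def preproc_savings(log_text):
    """Return (part, before, after, bytes_removed) for each preproc line."""
    rows = []
    for m in PREPROC.finditer(log_text):
        before, after = int(m["before"]), int(m["after"])
        rows.append((m["part"], before, after, before - after))
    return rows
```

For the lines above this gives 150, 278 and 5062 bytes removed from sections
1.0, 1.1 and 1.2 respectively.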

> 
> And how about this (from -d):
> 
> check[6284]: [ 6] Engine (8) didn't produce a signature for mail 1.0
> 
> Why couldn't engine (8) produce a hash for a piece of my mail?

There were no URLs in it.

> 
> Also, what do the values in ~/.razor/server.c101.cloudmark.com.conf (for
> example) mean?
> 
> Finally, if razor uses hashes to define spam, why use a whitelist? The
> odds of ham having the same hash as spam are really low, right?

Errors in reporting happen on occasion. And of course, your idea of spam might
not be the same as mine. This problem mostly impacts large-volume subscriber 
mail.

Usually the TeS system deals with this pretty quickly. In other cases... well,
the Intel Developer Forum conference newsletter used to be listed in e4 with
surprising regularity a couple of years ago.


> Or
> does the normalization process sometimes reduce ham and spam to the
> same string?

Generally the normalization will not make two messages reduce to the same string
unless they're substantially the same (i.e., the body text itself is the same
and they differ only in HTML tags or message footer text).
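
A toy Python illustration of that point, assuming a normalization that does
nothing more than drop HTML tags, collapse whitespace and lowercase the text.
This is my own simplification and not razor's preprocessing; it ignores
footers entirely, and toy_normalize / toy_signature are invented names.

```python
import hashlib
import re


def toy_normalize(text):
    """Hypothetical normalization: strip tags, squeeze whitespace, lowercase."""
    no_tags = re.sub(r"<[^>]*>", " ", text)
    return re.sub(r"\s+", " ", no_tags).strip().lower()


def toy_signature(text):
    """SHA-1 hex digest of the toy-normalized text."""
    return hashlib.sha1(toy_normalize(text).encode("utf-8")).hexdigest()
```

With this, "<p>Buy   now</p>" and "<b>Buy</b> now" produce the same signature
because both normalize to "buy now", while "Buy later" produces a different
one because the body text itself differs.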

But don't assume that nobody else in the world will ever receive the same
message you did, one that you consider "ham" but they consider "spam".

> 
> Appreciate everyone's help. I think razor's great, just want to make
> sure I understand it.
> 


_______________________________________________
Razor-users mailing list
Razor-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/razor-users
