There are two possibilities.

(1) Rewrite the rules from BODY to RAWBODY, as Matsuda-san suggests.
(2) Invent NBODY (or something similar) separate from BODY.  NBODY would
     contain a normalized and tokenized version of the body.  I once thought
     of this idea but did not propose it, because BODY has the problems I
     mentioned above and running nbody_test would add overhead.

There is also a third method:

rawbody  SJIS_BODY  eval:check_charset("Shift_JIS")
describe SJIS_BODY  Mail text is encoded with Shift JIS
score    SJIS_BODY 1.4

rawbody  JIS_BODY   eval:check_charset("ISO-2022-JP")
describe JIS_BODY   Mail text is encoded with JIS
score    JIS_BODY   -0.5

check_charset would be an eval function that detects the charset of the raw body using Encode::Detect::Detector::detect. I have not written this function yet, though.
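As a rough illustration of the check check_charset would perform, here is a minimal Python sketch. It does not use Encode::Detect. Instead it assumes a simple heuristic: look for the ISO-2022-JP escape sequences first, then try strict decoding as UTF-8, EUC-JP and Shift_JIS in that order. The names detect_japanese_charset and check_charset below are hypothetical. A real SpamAssassin eval rule would be written as a Perl plugin.

```python
from typing import Optional


def detect_japanese_charset(raw: bytes) -> Optional[str]:
    """Hypothetical heuristic detector, not Encode::Detect."""
    # ISO-2022-JP is 7-bit and switches to JIS X 0208 with ESC $ B or ESC $ @.
    if b"\x1b$B" in raw or b"\x1b$@" in raw:
        return "ISO-2022-JP"
    # Pure ASCII carries no Japanese charset information.
    if all(b < 0x80 for b in raw):
        return None
    # Try strict decodes. EUC-JP goes before Shift_JIS because the byte
    # ranges overlap and EUC-JP text can sometimes decode as Shift_JIS.
    for name, codec in (("UTF-8", "utf-8"),
                        ("EUC-JP", "euc_jp"),
                        ("Shift_JIS", "shift_jis")):
        try:
            raw.decode(codec)
            return name
        except UnicodeDecodeError:
            continue
    return None


def check_charset(raw: bytes, charset: str) -> bool:
    """True if the detected charset matches the rule argument (case-insensitive)."""
    detected = detect_japanese_charset(raw)
    return detected is not None and detected.lower() == charset.lower()


if __name__ == "__main__":
    sample = "こんにちは"
    print(check_charset(sample.encode("shift_jis"), "Shift_JIS"))
    print(check_charset(sample.encode("iso2022_jp"), "ISO-2022-JP"))
```

A strict decode can only rule a charset out, never prove it, so a production rule would still want a statistical detector such as the one Encode::Detect wraps.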

--
Motoharu Kubo
[EMAIL PROTECTED]
