There are two possibilities.
(1) Rewrite the rules from BODY to RAWBODY, as Matsuda-san suggests.
(2) Introduce NBODY (or some other name) separate from BODY. NBODY would
contain a normalized and tokenized version of the body. I once considered
this idea but did not propose it, because BODY has the problems I
mentioned above and because running nbody_test would add overhead.
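
To make "normalized and tokenized" concrete, here is a rough sketch in
Python (only for illustration; a real NBODY would be built inside
SpamAssassin in Perl). It assumes NFKC normalization and a naive split
into runs of kanji, hiragana, katakana, and ASCII alphanumerics. The
function name normalize_and_tokenize is hypothetical, and a real
implementation would more likely use a proper Japanese tokenizer.

```python
import re
import unicodedata

# Naive tokenizer: runs of kanji, hiragana, katakana, or ASCII alphanumerics.
# This is an assumption for illustration, not a real morphological analyzer.
_TOKEN = re.compile(
    r"[\u4e00-\u9fff]+|[\u3040-\u309f]+|[\u30a0-\u30ff]+|[a-z0-9]+"
)


def normalize_and_tokenize(text):
    """Return a list of tokens from NFKC-normalized, lowercased text."""
    # NFKC folds full-width Latin letters and half-width katakana
    # into their ordinary forms, so rules only need to match one variant.
    normalized = unicodedata.normalize("NFKC", text).lower()
    return _TOKEN.findall(normalized)
```

With this, full-width "ＳＰＡＭ" and half-width katakana are matched by the
same rule as their ordinary forms, which is the point of having a
normalized body.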
There is also a third method: detect the charset in an eval rule.
rawbody SJIS_BODY eval:check_charset("Shift_JIS")
describe SJIS_BODY Mail text is encoded with Shift JIS
score SJIS_BODY 1.4
rawbody JIS_BODY eval:check_charset("ISO-2022-JP")
describe JIS_BODY Mail text is encoded with JIS
score JIS_BODY -0.5
check_charset is a function that detects the charset of the raw body using
Encode::Detect::Detector::detect. I have not written this function yet, though.
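
A minimal sketch of the logic check_charset might follow, written in
Python only to illustrate the idea (the real function would be a Perl
eval rule in a SpamAssassin plugin). It does not use Encode::Detect:
the escape-sequence check and strict trial decoding below are a
simplified stand-in for a statistical detector, and the helper
detect_charset and the two-argument signature are assumptions.

```python
import re

# Escape sequences that switch character sets in ISO-2022-JP (RFC 1468):
# ESC $ @ and ESC $ B (JIS X 0208), ESC ( B and ESC ( J (ASCII / JIS-Roman).
_ISO2022JP_ESCAPES = re.compile(rb"\x1b(?:\$[@B]|\([BJ])")


def _decodes_as(raw, codec):
    """Return True if the bytes decode without error under the codec."""
    try:
        raw.decode(codec)
        return True
    except UnicodeDecodeError:
        return False


def detect_charset(raw):
    """Guess the charset of raw body bytes; return None if undecided."""
    # ISO-2022-JP is 7-bit and recognizable by its escape sequences.
    if _ISO2022JP_ESCAPES.search(raw) and _decodes_as(raw, "iso2022_jp"):
        return "ISO-2022-JP"
    # Pure ASCII carries no information about the charset.
    if all(byte < 0x80 for byte in raw):
        return None
    # Try UTF-8 first: valid UTF-8 is rarely valid by accident.
    if _decodes_as(raw, "utf-8"):
        return "UTF-8"
    if _decodes_as(raw, "shift_jis"):
        return "Shift_JIS"
    return None


def check_charset(raw, expected):
    """Return True if the detected charset matches the expected name."""
    detected = detect_charset(raw)
    return detected is not None and detected.lower() == expected.lower()
```

Trial decoding cannot reliably tell Shift_JIS from EUC-JP, because some
byte sequences are valid in both. That ambiguity is one reason to rely
on a real detector instead of this sketch.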
--
Motoharu Kubo
[EMAIL PROTECTED]