Re: I18n and l10n

Justin Mason Tue, 17 Jan 2006 12:35:41 -0800

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1


MATSUDA Yoh-ichi writes:
> > - Writing rules in hex notation is troublesome, boring, and decreases
> >    productivity.  If we could normalize the charset, we could write rules
> >    directly with a UTF-8 aware editor.
> 
> Yes.
> Writing regex rules directly with UTF-8 characters is very convenient.
> But I think doing character normalization and tokenization before body
> testing is troublesome.
> Because character normalization and tokenization modify the
> message text, a regex rule writer can't tell what the modified
> text will look like.
> 
> Many rules are written for pure plain message text.
> If character normalization and tokenization are inserted before body
> testing, many body rules will stop working.
> 
> So,
> 
> > > But if character normalization is inserted before body testing,
> > > my rules will stop working.
> > > 
> > > Do I have to re-write the above 2 rules from [body] to [rawbody]?
> > 
> > There are two possibilities.
> > 
> > (1) rewrite from BODY to RAWBODY as Matsuda-san says.
> > (2) invent NBODY (or something else) apart from BODY.  NBODY contains
> >      a normalized and tokenized version of the body.  I once thought of
> >      this idea but did not propose it, because BODY has the problems I
> >      mentioned above and the overhead of executing nbody_test increases.
> 
> I want (2), for the sake of rule compatibility.

+1, agreed.

- --j.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.1 (GNU/Linux)
Comment: Exmh CVS

iD8DBQFDzVTuMJF5cimLx9ARAqxOAKCILBFwluZj3/yicF3aPBSTpy8vigCgkZ7C
kn0sKCBOmjDJRpSRh5LYVsw=
=eJbr
-----END PGP SIGNATURE-----
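
The trade-off discussed in the thread can be sketched in a few lines. This is only an illustration in Python, not SpamAssassin's implementation: the sample phrase, the `normalize` and `run_rules` helpers, and the choice of NFKC as the normalization step are all assumptions made for the example. It shows a rule written in hex notation against the raw bytes next to the same rule typed directly as UTF-8 characters and matched against a separately kept normalized body, which is roughly the idea behind option (2).

```python
import re
import unicodedata

# Hypothetical sample phrase ("free of charge"); not taken from the thread.
PHRASE = "無料"

# Rule in hex notation: the UTF-8 bytes of PHRASE, matched against raw bytes.
HEX_RULE = re.compile(rb"\xe7\x84\xa1\xe6\x96\x99")

# The same rule typed directly in a UTF-8 aware editor, matched against
# decoded, normalized text.
TEXT_RULE = re.compile("無料")


def normalize(raw: bytes, charset: str) -> str:
    """Decode the body from its declared charset and apply NFKC normalization.

    Assumption: NFKC stands in for whatever normalization would really be used.
    """
    return unicodedata.normalize("NFKC", raw.decode(charset, errors="replace"))


def run_rules(raw: bytes, charset: str) -> dict:
    """Run the raw-byte rule on the untouched body and the text rule on a
    separate normalized copy, so existing raw rules keep seeing the same input."""
    nbody = normalize(raw, charset)
    return {
        "hex_rule_on_raw": bool(HEX_RULE.search(raw)),
        "text_rule_on_normalized": bool(TEXT_RULE.search(nbody)),
    }


# The same message text in two different charsets.
utf8_msg = f"Now {PHRASE}!".encode("utf-8")
eucjp_msg = f"Now {PHRASE}!".encode("euc_jp")

utf8_result = run_rules(utf8_msg, "utf-8")
eucjp_result = run_rules(eucjp_msg, "euc_jp")
```

In this sketch the hex rule only fires when the message happens to be UTF-8 encoded, while the character rule fires for both charsets because it runs on the normalized copy. Keeping the raw body untouched alongside the normalized one is what lets existing byte-oriented rules keep working.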
