Re: How to use non-ascii charsets with sieve?

2002-12-10 Thread Mark Keasling

Hi,


On Mon, 09 Dec 2002 22:17:06 -0500, Lawrence Greenfield [EMAIL PROTECTED] wrote...
 --On Tuesday, December 10, 2002 11:52 AM +0900 Mark Keasling 
 [EMAIL PROTECTED] wrote:
 
  Hi Larry,
 
 [ ... decode in fill_cache() ... ]
  This hasn't been tested yet since I stuck it in yesterday before
  going home and have just returned to the office.  It should decode
  subjects into UTF-8.  But it may have interesting unintended
  side-effects.  So far we are only interested in decoded subjects.  But
  decoding the comment part of addresses also has a high probability of
  being desired.  Depends on the feedback we get from users.
 
  Will charset_decode1522( ) strip the whitespace?
 
 Yes, the output of charset_decode1522() is intended to be fed into the 
 Cyrus IMAP SEARCH algorithm, which ignores whitespace. It also does case 
 folding, preventing i;octet searches from working.
 
 charset_decode1522() would work if it was using a different transcoding 
 table than what is generated in the lib/ directory.

I'm in the process of trying to figure out how this stuff works...
Is it possible to separate the charset-to-UTF-8 conversion from the
text-to-search-data transformation?

Regards,
Mark Keasling [EMAIL PROTECTED]

Re: How to use non-ascii charsets with sieve?

2002-12-10 Thread Ken Murchison

I dug up the patch I have for creating a separate Sieve charset table. 
I have no idea if it will still apply cleanly due to its age, but it
should point you at the places to look in the code.  If you can find a
way to make one unified table as Larry suggests, that would be great.



-- 
Kenneth Murchison Oceana Matrix Ltd.
Software Engineer 21 Princeton Place
716-662-8973 x26  Orchard Park, NY 14127
--PGP Public Key--http://www.oceana.com/~ken/ksm.pgp


sieve-mime.patch.gz
Description: GNU Zip compressed data

Re: How to use non-ascii charsets with sieve?

2002-12-10 Thread Lawrence Greenfield

   Date: Tue, 10 Dec 2002 19:07:55 +0900 (JST)
   From: Mark Keasling [EMAIL PROTECTED]
[...]
   I'm in the process of trying to figure out how this stuff works...
   Is it possible to separate the charset to utf-8 conversion from the text to
   search data transformation?

It would be technically possible. It's probably not the easiest thing
to do in the Cyrus code base.

Currently mkchartable.c does casemapping, character decomposition, and
whitespace elimination. It also applies some mappings
(charset/unifix.txt) that help with a language-independent match but
may not be appropriate for collation or all UTF-8 comparators.

To make the chartable stuff work for Sieve and our current SEARCH, we
probably should build tables that just output decomposed (or fully
composed) UTF-8 characters.

We can then write a UTF-8 comparator library that, during comparison,
does the canonicalization.

The easier path to make Sieve work would be to just build two
completely separate tables. I'd prefer to see the more general
solution.

While none of this is rocket science, it is heavily detail-oriented
and requires concentration.

Larry
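
A minimal sketch of comparison-time canonicalization, assuming the charset
tables have already produced plain UTF-8. The name search_style_equal() is
hypothetical and not part of Cyrus. It folds only ASCII case and skips only
ASCII whitespace, whereas the real tables also decompose characters and apply
the unifix.txt mappings.

```c
#include <assert.h>
#include <stddef.h>

/* Skip ASCII whitespace; returns a pointer to the next significant byte. */
static const unsigned char *skip_ws(const unsigned char *s)
{
    while (*s == ' ' || *s == '\t' || *s == '\r' || *s == '\n')
        s++;
    return s;
}

/* Fold ASCII upper case to lower case; all other bytes pass through,
 * so multi-byte UTF-8 sequences are compared byte for byte. */
static unsigned char fold(unsigned char c)
{
    return (c >= 'A' && c <= 'Z') ? (unsigned char)(c + ('a' - 'A')) : c;
}

/* SEARCH-style equality (hypothetical helper): ignores whitespace and
 * ASCII case while comparing, without modifying either input. */
int search_style_equal(const char *a, const char *b)
{
    const unsigned char *p = (const unsigned char *)a;
    const unsigned char *q = (const unsigned char *)b;

    assert(a != NULL && b != NULL);
    for (;;) {
        p = skip_ws(p);
        q = skip_ws(q);
        if (*p == '\0' || *q == '\0')
            return *p == *q;        /* equal only if both are exhausted */
        if (fold(*p) != fold(*q))
            return 0;
        p++;
        q++;
    }
}
```

Because the inputs are left untouched, the same transcoded strings could also
be handed to a strict Sieve comparator.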

How to use non-ascii charsets with sieve?

2002-12-09 Thread Mark Keasling

Hi,

Can someone give me a "how to" pointer...

I need to know how to use non-ASCII text in sieve scripts.
For example: using Japanese in message headers or mailbox names.

For example a message has a subject as follows:

題名: アクセシビリティセミナ [...]

Re: How to use non-ascii charsets with sieve?

2002-12-09 Thread Ken Murchison



Lawrence Greenfield wrote:
 
 You bring up good questions.
 
 First, our Sieve implementation currently doesn't deal with RFC 2047
 encoded headers---or rather, it just compares the undecoded headers
 against the UTF-8 string. This is obviously a bug which sadly isn't in
 bugzilla.
 
 Ken and I talked (a long time ago) about this. The main issue is that
 Cyrus's character comparison routines remove whitespace and always
 perform casemapping, and this is probably inappropriate for Sieve's
 use. Fixing this is probably not difficult, but I'd prefer not to have
 multiple different canonicalization tables.

I _think_ I still have the code around which implements the Sieve
charset tables and does the rfc2047 decoding.  I don't recall why we had
to have the separate tables however.

-- 
Kenneth Murchison Oceana Matrix Ltd.
Software Engineer 21 Princeton Place
716-662-8973 x26  Orchard Park, NY 14127
--PGP Public Key--http://www.oceana.com/~ken/ksm.pgp
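
A minimal sketch of the first step any RFC 2047 decoder needs: recognizing an
encoded-word of the form =?charset?encoding?encoded-text?= and splitting it
into its parts. The struct and function names are hypothetical; in Cyrus the
real decoding is done by charset_decode1522(). This sketch does not decode the
payload or convert the charset.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Pieces of one RFC 2047 encoded-word (pointers refer into the input). */
struct encoded_word {
    const char *charset;
    size_t charset_len;
    char encoding;          /* 'B' or 'Q', upper-cased */
    const char *text;       /* still-encoded payload */
    size_t text_len;
    size_t total_len;       /* bytes consumed, including "=?" and "?=" */
};

/* Returns 1 and fills *ew if s begins with a well-formed encoded-word,
 * otherwise returns 0 and leaves *ew untouched. */
int parse_encoded_word(const char *s, struct encoded_word *ew)
{
    const char *q1, *text, *end;
    char enc;

    assert(s != NULL && ew != NULL);
    if (strncmp(s, "=?", 2) != 0)
        return 0;
    q1 = strchr(s + 2, '?');            /* '?' that ends the charset */
    if (q1 == NULL || q1 == s + 2)
        return 0;
    enc = q1[1];
    if (enc == 'b' || enc == 'q')
        enc = (char)(enc - 'a' + 'A');
    if ((enc != 'B' && enc != 'Q') || q1[2] != '?')
        return 0;
    text = q1 + 3;
    end = strchr(text, '?');            /* payload may not contain '?' */
    if (end == NULL || end[1] != '=')
        return 0;

    ew->charset = s + 2;
    ew->charset_len = (size_t)(q1 - (s + 2));
    ew->encoding = enc;
    ew->text = text;
    ew->text_len = (size_t)(end - text);
    ew->total_len = (size_t)(end + 2 - s);
    return 1;
}
```

A header value that fails this check can be compared as is. One that passes
would still need base64 or quoted-printable decoding and transcoding to UTF-8
before a Sieve comparator sees it.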

Re: How to use non-ascii charsets with sieve?

2002-12-09 Thread Tim Showalter

  First, our Sieve implementation currently doesn't deal with RFC 2047
  encoded headers---or rather, it just compares the undecoded headers
  against the UTF-8 string. This is obviously a bug which sadly isn't in
  bugzilla.

  Ken and I talked (a long time ago) about this. The main issue is that
  Cyrus's character comparison routines remove whitespace and always
  perform casemapping, and this is probably inappropriate for Sieve's
  use. Fixing this is probably not difficult, but I'd prefer not to have
  multiple different canonicalization tables.

 I _think_ I still have the code around which implements the Sieve
 charset tables and does the rfc2047 decoding.  I don't recall why we had
 to have the separate tables however.


Different comparators would require different tables, I think.  The 
table Cyrus usually uses isn't suitable for i;ascii-casemap since space 
isn't significant, but transcoding to UTF-8 and doing a dumb comparison 
is all that's required, a big improvement on what Cyrus is doing now, 
and not hard to implement.

Tim
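
A minimal sketch of the "dumb comparison" for the two comparators RFC 3028
requires, i;octet and i;ascii-casemap, assuming both operands are already
UTF-8. The function names are hypothetical and not taken from the Cyrus
sources. Unlike the SEARCH tables, both treat whitespace as significant.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* i;octet: the strings match only if they are byte-for-byte identical. */
int octet_equal(const char *a, size_t alen, const char *b, size_t blen)
{
    assert(a != NULL && b != NULL);
    return alen == blen && memcmp(a, b, alen) == 0;
}

/* Map ASCII a-z to A-Z; every other byte, including all bytes of
 * multi-byte UTF-8 sequences, is returned unchanged. */
static unsigned char ascii_upper(unsigned char c)
{
    return (c >= 'a' && c <= 'z') ? (unsigned char)(c - ('a' - 'A')) : c;
}

/* i;ascii-casemap: like i;octet, except that ASCII letters compare
 * without regard to case. */
int ascii_casemap_equal(const char *a, size_t alen,
                        const char *b, size_t blen)
{
    size_t i;

    assert(a != NULL && b != NULL);
    if (alen != blen)
        return 0;
    for (i = 0; i < alen; i++) {
        if (ascii_upper((unsigned char)a[i]) !=
            ascii_upper((unsigned char)b[i]))
            return 0;
    }
    return 1;
}
```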

Re: How to use non-ascii charsets with sieve?

2002-12-09 Thread Mark Keasling

Hi Larry,

We are considering a modification like this to fill_cache(message_data_t *)
in cyrus-imapd-2.1.11/sieve/test.c

%%SNIP%%
void fill_cache(message_data_t *m)
{
    rewind(m->data);

    /* let's fill that header cache */
    for (;;) {
        char *name, *body;
        int cl, clinit;

        if (parseheader(m->data, &name, &body) < 0) {
            break;
        }

#ifdef DECODE_SUBJECT
        /* decode mime encoded subjects */
        if( name && * name && ! strcmp( name, "subject" )
            && body && * body && strstr( body, "=?" ))
        {
            char * decoded = charset_decode1522( body, NULL, 0 ) ;
            if( decoded && * decoded )
            {
                free( body ) ;
                body = decoded ;
            }
        }
#endif /* DECODE_SUBJECT */

%%SNIP%%

This hasn't been tested yet since I stuck it in yesterday before
going home and have just returned to the office.  It should decode subjects
into UTF-8.  But it may have "interesting" unintended side-effects.  So far
we are only interested in decoded subjects.  But decoding the comment part
of addresses also has a high probability of being desired.  Depends on the
feedback we get from users.

Will charset_decode1522( ) strip the whitespace?
Someone else found the function and I have only given it the most cursory
glance over.

On Mon, 9 Dec 2002 15:59:38 -0500, Lawrence Greenfield [EMAIL PROTECTED] wrote...
 You bring up good questions.

 First, our Sieve implementation currently doesn't deal with RFC 2047
 encoded headers---or rather, it just compares the undecoded headers
 against the UTF-8 string. This is obviously a bug which sadly isn't in
 bugzilla.

 Ken and I talked (a long time ago) about this. The main issue is that
 Cyrus's character comparison routines remove whitespace and always
 perform casemapping, and this is probably inappropriate for Sieve's
 use. Fixing this is probably not difficult, but I'd prefer not to have
 multiple different canonicalization tables.

 The "fileinto" problem is more straightforward and should be fixed in
 lmtpd.c:sieve_fileinto().

 I would add a function to mboxname.[ch] of mboxname_utf8tomutf7() and
 then make sieve_fileinto() call it.

 Larry

Thank You!  This is very NICE as I hadn't gotten far enough along to look
at this yet.  The obvious workaround (ugly hack) for fileinto is to have
the client do the mUTF-7 conversion before submitting the script.  We're
working on the client so such a hack isn't out of the question; but it
probably won't work well if some other client were to access the server.

 Date: Mon, 9 Dec 2002 19:53:37 +0900 (JST)
 From: Mark Keasling [EMAIL PROTECTED]
 [...]
 script language="sieve" version="RFC-3028"
   # pretend this is encoded in UTF-8

   require ["reject","fileinto"];

   if header :contains "Subject" "セミナ [...]
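
A minimal sketch of the UTF-8 to modified UTF-7 conversion that a helper such
as the proposed mboxname_utf8tomutf7() would have to perform, following the
mailbox naming convention in the IMAP specification (RFC 3501, section 5.1.3).
The name utf8_to_mutf7() and its signature are hypothetical, and the sketch
assumes the input is valid UTF-8.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Modified base64 alphabet: ',' replaces '/' and there is no '=' padding. */
static const char mb64[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+,";

/* Decode one UTF-8 sequence at *p and advance *p (assumes valid UTF-8). */
static uint32_t next_cp(const unsigned char **p)
{
    const unsigned char *s = *p;
    uint32_t cp;
    int extra;

    if (s[0] < 0x80)      { cp = s[0];        extra = 0; }
    else if (s[0] < 0xE0) { cp = s[0] & 0x1F; extra = 1; }
    else if (s[0] < 0xF0) { cp = s[0] & 0x0F; extra = 2; }
    else                  { cp = s[0] & 0x07; extra = 3; }
    s++;
    while (extra-- > 0 && (*s & 0xC0) == 0x80)
        cp = (cp << 6) | (uint32_t)(*s++ & 0x3F);
    *p = s;
    return cp;
}

/* Append one byte, always leaving room for the trailing NUL. */
#define PUT(c) do { if (o + 1 >= dstlen) return -1; \
                    dst[o++] = (char)(c); } while (0)

/* Returns the number of bytes written (excluding the NUL),
 * or -1 if dst is too small. */
int utf8_to_mutf7(const char *src, char *dst, size_t dstlen)
{
    const unsigned char *p, *end;
    size_t o = 0;
    uint32_t bits = 0;      /* low nbits bits are pending output */
    int nbits = 0;
    int shifted = 0;        /* inside an "&...-" base64 run? */

    assert(src != NULL && dst != NULL);
    if (dstlen == 0)
        return -1;
    p = (const unsigned char *)src;
    end = p + strlen(src);

    while (p < end) {
        uint32_t cp = next_cp(&p);

        if (cp >= 0x20 && cp <= 0x7E) {         /* printable ASCII */
            if (shifted) {
                if (nbits > 0)                  /* flush, zero-padded */
                    PUT(mb64[(bits << (6 - nbits)) & 0x3F]);
                PUT('-');
                shifted = 0; bits = 0; nbits = 0;
            }
            PUT(cp);
            if (cp == '&')
                PUT('-');                       /* "&" is written "&-" */
        } else {
            uint16_t units[2];                  /* UTF-16 code units */
            int n = 0, i;

            if (cp >= 0x10000) {                /* surrogate pair */
                cp -= 0x10000;
                units[n++] = (uint16_t)(0xD800 | (cp >> 10));
                units[n++] = (uint16_t)(0xDC00 | (cp & 0x3FF));
            } else {
                units[n++] = (uint16_t)cp;
            }
            if (!shifted) { PUT('&'); shifted = 1; }
            for (i = 0; i < n; i++) {
                bits = (bits << 16) | units[i];
                nbits += 16;
                while (nbits >= 6) {
                    nbits -= 6;
                    PUT(mb64[(bits >> nbits) & 0x3F]);
                }
                bits &= (1u << nbits) - 1;      /* keep only pending bits */
            }
        }
    }
    if (shifted) {
        if (nbits > 0)
            PUT(mb64[(bits << (6 - nbits)) & 0x3F]);
        PUT('-');
    }
    dst[o] = '\0';
    return (int)o;
}
#undef PUT
```

Doing this conversion on the server side, rather than in each client, is what
keeps fileinto targets consistent no matter which client uploaded the script.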

Re: How to use non-ascii charsets with sieve?

2002-12-09 Thread Lawrence Greenfield

--On Monday, December 09, 2002 6:01 PM -0800 Tim Showalter [EMAIL PROTECTED]
wrote:

 Different comparators would require different tables, I think.  The table
 Cyrus usually uses isn't suitable for i;ascii-casemap since space isn't
 significant, but transcoding to UTF-8 and doing a dumb comparison is all
 that's required, a big improvement on what Cyrus is doing now, and not
 hard to implement.


Yes, unlike the IMAP SEARCH command the Sieve comparators have strict 
semantics. I believe it's possible to implement the Sieve comparators and 
Cyrus's current SEARCH comparator with a single table. The table would 
transcode into UTF-8 (fully decomposed) and the SEARCH comparator could do 
the modifications needed.

Larry
