Bug#355678: charset conversion added to rfc2047.c and (lbdb-)?fetchaddr
Peter Colberg wrote:
> As you could probably guess, this is my first time delving into the
> autotools stuff, so I did not realise that lbdb is using only autoconf,
> but not automake. This led me to believe that files like Makefile.am
> were missing, compared to other autotoolized packages.

To avoid further confusion, configure.in should carry a comment saying that lbdb uses some automake configure macros, but not automake itself. It should also state that additions to aclocal.m4 belong in acinclude.m4, and that aclocal generates aclocal.m4.

So automake is needed for some development tasks, but as long as aclocal.m4 is provided, autoconf will be sufficient in most cases. That is a development dependency well beyond the scope of the Debian package, so it need not appear there.

Tobias.
Bug#355678: charset conversion added to rfc2047.c and (lbdb-)?fetchaddr
On Sat, Jun 16, 2007 at 08:41:20PM +0200, Roland Rosenfeld wrote:
> [...]
>
> For the record: I didn't forget to include this patch; the problem
> is that lbdb should run not only on Debian but on all Unix systems,
> including systems where iconv is not available or not installed. I
> don't like to add a build dependency on iconv on all systems, so I'm
> looking for some autoconf stuff to make the iconv patch configurable
> using --without-iconv (default, if iconv is not available),
> --with-iconv (default, if iconv is available), or
> --with-iconv=/some/dir.
>
> Any help with this autoconf stuff is greatly appreciated...

Forget about my previous question about macro-including sources...

As you could probably guess, this is my first time delving into the autotools stuff, so I did not realise that lbdb is using only autoconf, but not automake. This led me to believe that files like Makefile.am were missing, compared to other autotoolized packages.

Anyway, adding iconv autoconf support proved to be rather easy. Appending the AM_ICONV macro to configure.in and adding a @LIBICONV@ definition to Makefile.in, which supplies the linking flags needed on some platforms (e.g. FreeBSD), was all it took.

The steps to then prepare the configure script are:

  # Requires packages gettext, autoconf, automake and autotools-dev
  cp /usr/share/misc/config.{guess,sub} .
  aclocal
  autoconf
  rm -rf autom4te.cache

That's all. (The above-mentioned changes are included in the attached patch.)
Regards,
Peter

--- lbdb-0.35.1.orig/fetchaddr.c
+++ lbdb-0.35.1/fetchaddr.c
@@ -119,6 +119,9 @@
   char *headerlist = NULL;
   char *fieldname, *next;
   char create_real_name = 0;
+#ifdef HAVE_ICONV
+  const char **charsetptr = &Charset;
+#endif
 
   /* process command line arguments: */
   if (argc > 1) {
@@ -128,6 +131,10 @@
         datefmt = argv[++i];
       } else if (!strcmp (argv[i], "-x") && i+1 < argc) {
         headerlist = argv[++i];
+#ifdef HAVE_ICONV
+      } else if (!strcmp (argv[i], "-c") && i+1 < argc) {
+        *charsetptr = argv[++i];
+#endif
       } else if (!strcmp (argv[i], "-a")) {
         create_real_name = 1;
       } else {
--- lbdb-0.35.1.orig/lbdb-fetchaddr.man.in
+++ lbdb-0.35.1/lbdb-fetchaddr.man.in
@@ -24,6 +24,8 @@
 .IR dateformat ]
 .RB [ -x
 .IR headerfieldlist ]
+.RB [ -c
+.IR charset ]
 .RB [ -a ]
 .br
 .B lbdb-fetchaddr
@@ -88,6 +90,12 @@
 mail addresses. If this option isn't given, we fall back to
 .RB ` from:to:cc:resent-from:resent-to '.
 .TP
+.BI -c " charset"
+The charset which will be used to write the database. This should be
+the charset which the application expects (normally the one from your
+current locale). If this option isn't given, we fall back to
+.RB ` iso-8859-15 '.
+.TP
 .B -a
 Also grab addresses without a real name. Use the local part of the
 mail address as real name.
--- lbdb-0.35.1.orig/lbdb-fetchaddr.sh.in
+++ lbdb-0.35.1/lbdb-fetchaddr.sh.in
@@ -41,6 +41,7 @@
 echo " -h this short help"
 echo " -d 'dateformat'select date format using strftime(3)"
 echo " -x 'from:to:cc'colon separated list of header fields"
+echo " -c 'charset' charset for the database storage"
 echo " -a also grep addresses without realname"
 }
@@ -69,6 +70,13 @@
       hdrlist="-x $1"
     fi
     ;;
+-c)
+    if [ $# -gt 1 ]
+    then
+      shift
+      charset="-c $1"
+    fi
+    ;;
 -a)
     additional_param="$additional_param $1"
     ;;
@@ -112,7 +120,7 @@
   exit 1
 fi
 
-if $fetchaddr $additional_param -d "$datefmt" $hdrlist >> $db ; then
+if $fetchaddr $additional_param -d "$datefmt" $hdrlist $charset >> $db ; then
   touch $db.dirty
 fi
--- lbdb-0.35.1.orig/qpto8bit.c
+++ lbdb-0.35.1/qpto8bit.c
@@ -27,9 +27,17 @@
 #include "rfc822.h"
 #include "rfc2047.h"
 
-int main ()
+int main (int argc, char * argv[])
 {
   char buff[2048];
+#ifdef HAVE_ICONV
+  const char **charsetptr = &Charset;
+#endif
+
+#ifdef HAVE_ICONV
+  if (argc > 1)
+    *charsetptr = argv[1];
+#endif
 
   while (fgets (buff, sizeof (buff), stdin)) {
     rfc2047_decode (buff, buff, sizeof (buff));
--- lbdb-0.35.1.orig/rfc2047.c
+++ lbdb-0.35.1/rfc2047.c
@@ -20,6 +20,11 @@
 #include
 #include
+#ifdef HAVE_ICONV
+#include
+#include
+#include
+#endif
 #include "rfc822.h"
 #include "rfc2047.h"
@@ -36,7 +41,7 @@
 };
 
 const char MimeSpecials[] = "@.,;<>[]\\\"()?/=";
-const char Charset[] = "iso-8859-1"; /* XXX - hack */
+const char *Charset = "iso-8859-15"; /* XXX - hack */
 
 int Index_hex[128] = {
@@ -68,12 +73,18 @@
 #define hexval(c) Index_hex[(unsigned int)(c)]
 #define base64val(c) Index_64[(unsigned int)(c)]
 
-static int rfc2047_decode_word (char *d, const char *s, size_t len)
+static int rfc2047_decode_word (char *d, const char *s, size_t dlen)
 {
   char *p = safe_strdup (s);
   char *pp = p;
   char *pd = d;
+  size_t len = dlen;
   int enc = 0, filter = 0, count = 0, c1, c2, c3, c4;
+#ifdef HAVE_ICONV
+  char *fromcharset;
+  iconv_t cd;
+  size_t
Bug#355678: charset conversion added to rfc2047.c and (lbdb-)?fetchaddr
Hi,

On Sat, Jun 16, 2007 at 08:41:20PM +0200, Roland Rosenfeld wrote:
> Tobias Schlemmer wrote on Tuesday, 7 March 2006:
>
> > Package: lbdb
> > Version: 0.31.1-0ts1
> > Severity: wishlist
> > Tags: l10n patch
>
> > I have added charset conversion to rfc2047.c and (lbdb-)?fetchaddr using
> > iconv. I don't know how portable it is, so try it out.
> > Now my .procmailrc has an entry of
> >
> > :0hc
> > | lbdb-fetchaddr -d '%d.%m.%Y %H.%M' -c utf-8
> >
> > It seems to work fine for me, even with evolution and muttalias ;-). And
> > I think it's a very small step towards real internationalization.
>

I tried out Tobias' patch a few days ago, to cope with the mess of having both ISO-8859-1 and UTF-8 encoded personal names in lbdb's m_inmail database.

While conversion of characters from ISO-8859-1 to the desired UTF-8 charset works as expected, long names are mysteriously cut off at the end. Here's an example:

  $ echo 'From: =?ISO-8859-1?Q?El_nombre_m=E1s_largo_del_mundo?= <[EMAIL PROTECTED]>' | /usr/lib/lbdb/fetchaddr -c utf-8
  [EMAIL PROTECTED] El nombre más larg 2007-10-27 17:03

Instead, the correct result should be:

  [EMAIL PROTECTED] El nombre más largo del mundo 2007-10-27 17:06

Looking at the code, the first thing I spotted was that the destination buffer length passed to iconv is erroneously set to the length of the source buffer, which contains the decoded string. This is what cuts the strings off: a single byte in one charset may correspond to multiple bytes in another, e.g. when converting non-ASCII characters from ISO-8859-1 to UTF-8.

Below, I included a revised version of Tobias' charset conversion patch. In summary, the following changes were made:

* Set the destination buffer length (variable `len') to the maximum
  available length (variable `dlen' passed to `rfc2047_decode_word').
  This fixes the above problem, but I am not quite sure whether the
  converted string will always be shorter (in bytes) than the RFC2047
  encoded string, which determines the maximum buffer length.
* In the iconv loop, the source buffer length (`in') has to be
  decremented, too, if I understand the iconv(3) error behaviour
  correctly.
* Set the `in' length to include the terminating '\0' character.
* Ignore all changes concerning only white space.
* Wrap all iconv-related code blocks with an `#ifdef HAVE_ICONV'.
  This allows for compilation on platforms without iconv support.
* Cosmetic change to the lbdb-fetchaddr(1) man page paragraph.
* Fix a segfault in qpto8bit if no arguments are supplied.

> For the record: I didn't forget to include this patch; the problem
> is that lbdb should run not only on Debian but on all Unix systems,
> including systems where iconv is not available or not installed. I
> don't like to add a build dependency on iconv on all systems, so I'm
> looking for some autoconf stuff to make the iconv patch configurable
> using --without-iconv (default, if iconv is not available),
> --with-iconv (default, if iconv is available), or
> --with-iconv=/some/dir.
>
> Any help with this autoconf stuff is greatly appreciated...

Roland, are there any sources available for lbdb (for example, a CVS or SVN repository) with the original automake/autoconf macro files?

Judging by the installation section of libiconv[1], it does not seem too hard to make the iconv support configurable. So if you could give me a hint about the macro files, I would be glad to look into this again.
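Editorial illustration of the truncation described above: a minimal sketch, not the code from the patch. The helper name `convert_charset` and its signature are hypothetical; only `iconv_open`, `iconv` and `iconv_close` are real library calls. The point it shows is that the output space handed to iconv is bounded by the size of the destination buffer rather than by the length of the input, so a Latin-1 byte that expands to two UTF-8 bytes still fits.

```c
#include <assert.h>
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of lbdb): convert the NUL-terminated string
 * `src` from charset `from` to charset `to`, writing at most `dlen` bytes,
 * including the terminating NUL, into `dst`.
 * Returns the number of bytes written (excluding the NUL),
 * or (size_t)-1 on any failure, including a destination that is too small. */
size_t convert_charset(const char *from, const char *to,
                       const char *src, char *dst, size_t dlen)
{
    assert(src != NULL && dst != NULL && dlen > 0);

    iconv_t cd = iconv_open(to, from);
    if (cd == (iconv_t)-1)
        return (size_t)-1;

    char *in = (char *)src;        /* iconv() wants char **, input is not modified */
    size_t inleft = strlen(src);   /* bytes still to be read */
    char *out = dst;
    size_t outleft = dlen - 1;     /* bounded by the destination, NOT by strlen(src) */

    size_t rc = iconv(cd, &in, &inleft, &out, &outleft);
    if (rc != (size_t)-1)
        rc = iconv(cd, NULL, NULL, &out, &outleft);  /* flush any shift state */
    iconv_close(cd);

    if (rc == (size_t)-1)
        return (size_t)-1;         /* E2BIG, EILSEQ or EINVAL */

    *out = '\0';
    return (size_t)(out - dst);
}
```

Called with the name from the example above, the 29-byte ISO-8859-1 input becomes 30 bytes of UTF-8, which is exactly the one extra byte that a destination length copied from the source length has no room for.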
Regards,
Peter

[1] http://www.gnu.org/software/libiconv/

--- lbdb-0.35.1.orig/fetchaddr.c
+++ lbdb-0.35.1/fetchaddr.c
@@ -119,6 +119,9 @@
   char *headerlist = NULL;
   char *fieldname, *next;
   char create_real_name = 0;
+#ifdef HAVE_ICONV
+  const char **charsetptr = &Charset;
+#endif
 
   /* process command line arguments: */
   if (argc > 1) {
@@ -128,6 +131,10 @@
         datefmt = argv[++i];
       } else if (!strcmp (argv[i], "-x") && i+1 < argc) {
         headerlist = argv[++i];
+#ifdef HAVE_ICONV
+      } else if (!strcmp (argv[i], "-c") && i+1 < argc) {
+        *charsetptr = argv[++i];
+#endif
       } else if (!strcmp (argv[i], "-a")) {
         create_real_name = 1;
       } else {
--- lbdb-0.35.1.orig/lbdb-fetchaddr.man.in
+++ lbdb-0.35.1/lbdb-fetchaddr.man.in
@@ -24,6 +24,8 @@
 .IR dateformat ]
 .RB [ -x
 .IR headerfieldlist ]
+.RB [ -c
+.IR charset ]
 .RB [ -a ]
 .br
 .B lbdb-fetchaddr
@@ -88,6 +90,12 @@
 mail addresses. If this option isn't given, we fall back to
 .RB ` from:to:cc:resent-from:resent-to '.
 .TP
+.BI -c " charset"
+The charset which will be used to write the database. This should be
+the charset which the application expects (normally the one from your
+current locale). If this option isn't given, we fall back to
+.RB ` iso-8859-15 '.
+.TP
 .B -a
 Also grab addresses without a real name. Use the local part of the
 mail address as real name.
--- lbdb-0.35.1.orig/lbdb-fetchaddr.sh.in
+++ lbdb-0.35.1/lbdb-fetchaddr.sh.in
@@ -41,6 +41,7 @@
 echo "
Bug#355678: charset conversion added to rfc2047.c and (lbdb-)?fetchaddr
On Sat, 16 Jun 2007 08:41:20 +0200, Roland Rosenfeld wrote:
> Tobias Schlemmer wrote on Tuesday, 7 March 2006:
>
> Any help with this autoconf stuff is greatly appreciated...

The keyword "real internationalization" leads to another hint: the gettext package includes a file named "iconv.m4", which is meant to deal with the charset problem. I'm not sure, but gettextizing may help you.

Tobias.

-- 
Tobias Schlemmer, Dipl.-Math.
http://www.schlemmer.de.tt
GnuPG/PGP Public Keys: 4A77CEF5 (RSA) and DF2A703C (DSA)

Aristotle insisted that women have fewer teeth than men. Although he was
married twice, it never occurred to him to verify his claim by examining
his wives' mouths. (Bertrand Russell)
Bug#355678: charset conversion added to rfc2047.c and (lbdb-)?fetchaddr
Tobias Schlemmer wrote on Tuesday, 7 March 2006:

> Package: lbdb
> Version: 0.31.1-0ts1
> Severity: wishlist
> Tags: l10n patch

> I have added charset conversion to rfc2047.c and (lbdb-)?fetchaddr using
> iconv. I don't know how portable it is, so try it out.
> Now my .procmailrc has an entry of
>
> :0hc
> | lbdb-fetchaddr -d '%d.%m.%Y %H.%M' -c utf-8
>
> It seems to work fine for me, even with evolution and muttalias ;-). And
> I think it's a very small step towards real internationalization.

For the record: I didn't forget to include this patch; the problem is that lbdb should run not only on Debian but on all Unix systems, including systems where iconv is not available or not installed. I don't like to add a build dependency on iconv on all systems, so I'm looking for some autoconf stuff to make the iconv patch configurable using --without-iconv (default, if iconv is not available), --with-iconv (default, if iconv is available), or --with-iconv=/some/dir.

Any help with this autoconf stuff is greatly appreciated...

Tscho
Roland
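Editorial illustration of the configurability requested above: a minimal sketch of what the C side of such a switch could look like, assuming the configure script defines `HAVE_ICONV` when iconv is found and leaves it undefined otherwise (that is the macro name used in the patches in this thread). The function `decode_to_charset` is hypothetical and is not taken from lbdb. Without `HAVE_ICONV` the bytes are copied unchanged, so the program still builds and runs on systems that lack iconv.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#ifdef HAVE_ICONV
#include <iconv.h>
#endif

/* Hypothetical helper (not from lbdb): store `src` in `dst` (capacity `dlen`
 * bytes including the NUL), converted from charset `from` to charset `to`
 * when the build has iconv, or copied unchanged when it has not.
 * Returns 0 on success and -1 on failure. */
int decode_to_charset(const char *from, const char *to,
                      const char *src, char *dst, size_t dlen)
{
    assert(src != NULL && dst != NULL && dlen > 0);
#ifdef HAVE_ICONV
    iconv_t cd = iconv_open(to, from);
    if (cd == (iconv_t)-1)
        return -1;

    char *in = (char *)src;       /* iconv() wants char **, input is not modified */
    size_t inleft = strlen(src);
    char *out = dst;
    size_t outleft = dlen - 1;    /* keep one byte for the NUL */

    size_t rc = iconv(cd, &in, &inleft, &out, &outleft);
    iconv_close(cd);
    if (rc == (size_t)-1)
        return -1;
    *out = '\0';
    return 0;
#else
    /* Fallback for builds without iconv: pass the bytes through as they are. */
    (void)from;
    (void)to;
    size_t n = strlen(src);
    if (n >= dlen)
        return -1;
    memcpy(dst, src, n + 1);      /* n + 1 copies the terminating NUL too */
    return 0;
#endif
}
```

Keeping the `#ifdef` inside one function means callers do not need their own conditionals; whether that fits lbdb's code layout is a judgment for the maintainer.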
Bug#355678: charset conversion added to rfc2047.c and (lbdb-)?fetchaddr
Package: lbdb
Version: 0.31.1-0ts1
Severity: wishlist
Tags: l10n patch

I have added charset conversion to rfc2047.c and (lbdb-)?fetchaddr using iconv. I don't know how portable it is, so try it out. Now my .procmailrc has an entry of

:0hc
| lbdb-fetchaddr -d '%d.%m.%Y %H.%M' -c utf-8

It seems to work fine for me, even with evolution and muttalias ;-). And I think it's a very small step towards real internationalization.

Tobias

-- System Information:
Debian Release: testing/unstable
  APT prefers testing
  APT policy: (990, 'testing'), (500, 'unstable')
Architecture: i386 (i686)
Shell: /bin/sh linked to /bin/bash
Kernel: Linux 2.6.13.1-tobias1tobias
Locale: LANG=de_DE.UTF-8, LC_CTYPE=de_DE.UTF-8 (charmap=UTF-8)

Versions of packages lbdb depends on:
ii  libc6        2.3.5-13  GNU C Library: Shared libraries an
ii  libvformat1  1.13-3    Library to read and write vcard fi
ii  perl         5.8.8-2   Larry Wall's Practical Extraction

-- no debconf information

Attachment: lbdb_0.31.1-0ts1.diff.gz (binary data)