Bug#355678: charset conversion added to rfc2047.c and (lbdb-)?fetchaddr

2007-10-29 Thread Tobias Schlemmer

Peter Colberg wrote:

> As you could probably guess, this is my first time delving into the
> autotools stuff, so I did not realise that lbdb is using only
> autoconf, but not automake. This led me to believe that files like
> Makefile.am were missing, compared to other autotoolized packages.
  
To avoid further confusion, configure.in should contain a comment
stating that lbdb uses some automake configure macros, but not
automake itself. It should also state that additions to aclocal.m4
belong in acinclude.m4, since aclocal generates aclocal.m4.


So automake is ultimately needed for some development tasks, but as long
as aclocal.m4 is provided, autoconf will be sufficient in most cases.
That is a kind of development dependency which is far beyond the scope
of Debian packages, so it need not appear there.


Tobias.






Bug#355678: charset conversion added to rfc2047.c and (lbdb-)?fetchaddr

2007-10-28 Thread Peter Colberg
On Sat, Jun 16, 2007 at 08:41:20PM +0200, Roland Rosenfeld wrote:
> [...]
> 
> For the records: I didn't forget to include this patch, the problem
> is, that lbdb should not only run on Debian but on all Unix systems,
> including systems, where iconv is not available or not installed.  I
> don't like to add a build dependency on iconv on all systems, so I'm
> looking for some autoconf stuff to make the iconv patch configurable
> using --without-iconv (default, if iconv is not available),
> --with-iconv (default, if iconv is available), or
> --with-iconv=/some/dir.
> 
> Any help with this autoconf stuff is greatly appreciated...

Forget about my previous question about macro-including sources...

As you could probably guess, this is my first time delving into the
autotools stuff, so I did not realise that lbdb is using only
autoconf, but not automake. This led me to believe that files like
Makefile.am were missing, compared to other autotoolized packages.


Anyway, adding iconv autoconf support proved to be rather easy.
Appending the AM_ICONV macro to configure.in and adding a @LIBICONV@
substitution to Makefile.in (needed for the linker flags on some
platforms, e.g. FreeBSD) was all it took.
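
For illustration, here is a minimal sketch of how C code can depend on the
HAVE_ICONV symbol that AM_ICONV defines when iconv is found. It is not taken
from the attached patch: the function name to_charset is hypothetical, and
the fallback of copying the bytes unchanged is only an assumption about what
a build without iconv might do. ICONV_CONST is normally defined in config.h
by AM_ICONV; the sketch supplies an empty default in case it is missing.

```c
#include <stddef.h>
#include <string.h>

#ifdef HAVE_ICONV
#include <iconv.h>
#ifndef ICONV_CONST
#define ICONV_CONST   /* normally defined in config.h by AM_ICONV */
#endif
#endif

/* Hypothetical helper: copy src into dst, converting from charset `from`
 * to charset `to` when iconv support was compiled in.
 * Returns 0 on success, -1 on failure (including "does not fit"). */
int to_charset(char *dst, size_t dstlen, const char *src,
               const char *from, const char *to)
{
    size_t srclen = strlen(src) + 1;   /* include the terminating '\0' */

#ifdef HAVE_ICONV
    iconv_t cd = iconv_open(to, from);
    if (cd == (iconv_t)-1)
        return -1;

    ICONV_CONST char *in = (ICONV_CONST char *)src;
    char *out = dst;
    size_t outleft = dstlen;
    size_t rc = iconv(cd, &in, &srclen, &out, &outleft);
    iconv_close(cd);
    return rc == (size_t)-1 ? -1 : 0;
#else
    (void)from;                        /* unused without iconv */
    (void)to;
    if (srclen > dstlen)
        return -1;
    memcpy(dst, src, srclen);          /* no conversion available */
    return 0;
#endif
}
```

Built without -DHAVE_ICONV, the function compiles on systems that have no
iconv at all, which is the portability property asked for above.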

The steps to then prepare the configure script are:

  # Requires packages gettext, autoconf, automake and autotools-dev
  cp /usr/share/misc/config.{guess,sub} .
  aclocal
  autoconf
  rm -rf autom4te.cache


That's all.

(The above-mentioned changes are included in the attached patch.)

Regards,
Peter
--- lbdb-0.35.1.orig/fetchaddr.c
+++ lbdb-0.35.1/fetchaddr.c
@@ -119,6 +119,9 @@
   char *headerlist = NULL;
   char *fieldname, *next;
   char create_real_name = 0;
+#ifdef HAVE_ICONV
+  const char **charsetptr = &Charset;
+#endif
 
   /* process command line arguments: */
   if (argc > 1) {
@@ -128,6 +131,10 @@
 	datefmt = argv[++i];
   } else if (!strcmp (argv[i], "-x") && i+1 < argc) {
 	headerlist = argv[++i];
+#ifdef HAVE_ICONV
+  } else if (!strcmp (argv[i], "-c") && i+1 < argc) {
+	*charsetptr = argv[++i];
+#endif
   } else if (!strcmp (argv[i], "-a")) {
 	create_real_name = 1;
   } else {
--- lbdb-0.35.1.orig/lbdb-fetchaddr.man.in
+++ lbdb-0.35.1/lbdb-fetchaddr.man.in
@@ -24,6 +24,8 @@
 .IR dateformat ]
 .RB [ -x
 .IR headerfieldlist ]
+.RB [ -c
+.IR charset ]
 .RB [ -a ]
 .br
 .B lbdb-fetchaddr
@@ -88,6 +90,12 @@
 mail addresses.  If this option isn't given, we fall back to
 .RB ` from:to:cc:resent-from:resent-to '.
 .TP
+.BI -c " charset"
+The charset which will be used to write the database. This should be
+the charset which the application expects (normally the one from your
+current locale).  If this option isn't given, we fall back to
+.RB ` iso-8859-15 '.
+.TP
 .B -a
 Also grab addresses without a real name.  Use the local part of the
 mail address as real name.
--- lbdb-0.35.1.orig/lbdb-fetchaddr.sh.in
+++ lbdb-0.35.1/lbdb-fetchaddr.sh.in
@@ -41,6 +41,7 @@
 echo "   -h this short help"
 echo "   -d 'dateformat'select date format using strftime(3)"
 echo "   -x 'from:to:cc'colon separated list of header fields"
+echo "   -c 'charset'   charset for the database storage"
 echo "   -a also grep addresses without realname"
 }
 
@@ -69,6 +70,13 @@
 	hdrlist="-x $1"
 	fi
 	;;
+-c)
+	if [ $# -gt 1 ]
+	then
+	shift
+	charset="-c $1"
+	fi
+	;;
 -a)
 	additional_param="$additional_param $1"
 	;;
@@ -112,7 +120,7 @@
   exit 1
 fi
 
-if $fetchaddr $additional_param -d "$datefmt" $hdrlist >> $db ; then
+if $fetchaddr $additional_param -d "$datefmt" $hdrlist $charset >> $db ; then
   touch $db.dirty
 fi
 
--- lbdb-0.35.1.orig/qpto8bit.c
+++ lbdb-0.35.1/qpto8bit.c
@@ -27,9 +27,17 @@
 #include "rfc822.h"
 #include "rfc2047.h"
 
-int main ()
+int main (int argc, char * argv[])
 {
   char buff[2048];
+#ifdef HAVE_ICONV
+  const char **charsetptr = &Charset;
+#endif
+
+#ifdef HAVE_ICONV
+  if (argc > 1)
+    *charsetptr = argv[1];
+#endif
 
   while (fgets (buff, sizeof (buff), stdin)) {
     rfc2047_decode (buff, buff, sizeof (buff));
--- lbdb-0.35.1.orig/rfc2047.c
+++ lbdb-0.35.1/rfc2047.c
@@ -20,6 +20,11 @@
 
 #include 
 #include 
+#ifdef HAVE_ICONV
+#include 
+#include 
+#include 
+#endif
 
 #include "rfc822.h"
 #include "rfc2047.h"
@@ -36,7 +41,7 @@
 };
 
 const char MimeSpecials[] = "@.,;<>[]\\\"()?/=";
-const char Charset[] = "iso-8859-1"; /* XXX - hack */
+const char *Charset = "iso-8859-15"; /* XXX - hack */
 
 
 int Index_hex[128] = {
@@ -68,12 +73,18 @@
 #define hexval(c) Index_hex[(unsigned int)(c)]
 #define base64val(c) Index_64[(unsigned int)(c)]
 
-static int rfc2047_decode_word (char *d, const char *s, size_t len)
+static int rfc2047_decode_word (char *d, const char *s, size_t dlen)
 {
   char *p = safe_strdup (s);
   char *pp = p;
   char *pd = d;
+  size_t len = dlen;
   int enc = 0, filter = 0, count = 0, c1, c2, c3, c4;
+#ifdef HAVE_ICONV
+  char *fromcharset;
+  iconv_t cd;
+  size_t 

Bug#355678: charset conversion added to rfc2047.c and (lbdb-)?fetchaddr

2007-10-27 Thread Peter Colberg
Hi,

On Sat, Jun 16, 2007 at 08:41:20PM +0200, Roland Rosenfeld wrote:
> Tobias Schlemmer wrote on Tuesday, 7 March 2006:
> 
> > Package: lbdb
> > Version: 0.31.1-0ts1
> > Severity: wishlist
> > Tags: l10n patch
> 
> > I have added charset conversion to rfc2047.c and (lbdb-)?fetchaddr using
> > iconv. I don't know how portable it is, so try it out.
> > Now my .procmailrc has an entry of
> > 
> > :0hc
> > | lbdb-fetchaddr -d '%d.%m.%Y %H.%M' -c utf-8
> > 
> > It seems to work fine for me, even with evolution and muttalias ;-). And
> > I think it's a very small step towards real internationalization. 
> 

I tried out Tobias' patch a few days ago, to cope with the mess of
having both ISO-8859-1 and UTF-8 encoded personal names in lbdb's
m_inmail database.

While conversion of characters from ISO-8859-1 to the desired UTF-8
charset works as expected, long names are mysteriously cut off at the
end. Here's an example:

  $ echo 'From: =?ISO-8859-1?Q?El_nombre_m=E1s_largo_del_mundo?= <[EMAIL PROTECTED]>' | /usr/lib/lbdb/fetchaddr -c utf-8
  [EMAIL PROTECTED] El nombre más larg  2007-10-27 17:03

Instead, the correct result should be:

  [EMAIL PROTECTED] El nombre más largo del mundo   2007-10-27 17:06


Looking at the code, the first thing I spotted was that the
destination buffer length passed to iconv is erroneously set to the
length of the source buffer, which contains the decoded string.
This is the cause of the truncated strings, as a single byte in
one charset may correspond to multiple bytes in another charset,
e.g. when converting non-ASCII characters from ISO-8859-1 to UTF-8.
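
The effect can be reproduced outside lbdb with a small sketch. It is not the
code from the patch: the function name latin1_to_utf8 is hypothetical, and
the charset names are assumed to be accepted by the local iconv
implementation. The point it illustrates is that the last argument to
iconv() must describe the space left in the destination buffer. For
ISO-8859-1 to UTF-8 the output can need up to two bytes per input byte.

```c
#include <errno.h>
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: convert a NUL-terminated ISO-8859-1 string to UTF-8.
 * Returns 0 on success, -1 on failure (errno as set by iconv). */
int latin1_to_utf8(const char *src, char *dst, size_t dstlen)
{
    iconv_t cd = iconv_open("UTF-8", "ISO-8859-1");
    if (cd == (iconv_t)-1)
        return -1;

    char *in = (char *)src;          /* iconv() takes a non-const pointer */
    size_t inleft = strlen(src) + 1; /* include the terminating '\0' */
    char *out = dst;
    size_t outleft = dstlen;         /* size of the DESTINATION buffer */

    size_t rc = iconv(cd, &in, &inleft, &out, &outleft);
    int saved = errno;               /* iconv_close() may change errno */
    iconv_close(cd);
    errno = saved;
    return rc == (size_t)-1 ? -1 : 0;
}
```

Passing a destination length equal to the source length makes the call fail
with E2BIG as soon as the input contains a non-ASCII character, which matches
the truncation seen above.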


Below, I included a revised version of Tobias' charset conversion patch.

In summary, the following changes were made:

* Set the destination buffer length (variable `len') to the maximum
  available length (variable `dlen' passed to `rfc2047_decode_word').

  This fixes the above problem, but I am not quite sure whether the
  converted string will always be shorter (in bytes) than the RFC2047
  encoded string, which determines the maximum buffer length.

* In the iconv loop, the source buffer length (`in') has to be
  decremented, too, if I understand the iconv(3) error behaviour
  correctly.

* Set the `in' length to include the terminating '\0' character.

* Ignore all changes concerning only white space.

* Wrap all iconv-related code blocks with an `#ifdef HAVE_ICONV'.
  This allows for compilation on platforms without iconv support.

* Cosmetic change to the lbdb-fetchaddr(1) man page paragraph.

* Fix a segfault in qpto8bit if no arguments are supplied.
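
The point about decrementing the source length in the iconv loop can be
sketched as follows. This is an illustration under assumptions, not the loop
from the patch: the function name convert_lossy and the choice of '?' as a
replacement character are hypothetical. When iconv() stops with EILSEQ or
EINVAL, the input pointer is left at the offending byte, so skipping that
byte means advancing the pointer and shrinking the remaining length together.

```c
#include <errno.h>
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: convert srclen bytes of src from charset `from`
 * to charset `to`, writing into dst. Bytes that iconv() rejects are
 * replaced by '?'. Returns the number of bytes written, or (size_t)-1
 * on failure (unknown charset or destination too small). */
size_t convert_lossy(const char *to, const char *from,
                     const char *src, size_t srclen,
                     char *dst, size_t dstlen)
{
    iconv_t cd = iconv_open(to, from);
    if (cd == (iconv_t)-1)
        return (size_t)-1;

    char *in = (char *)src;
    size_t inleft = srclen;
    char *out = dst;
    size_t outleft = dstlen;
    int failed = 0;

    while (inleft > 0) {
        if (iconv(cd, &in, &inleft, &out, &outleft) != (size_t)-1)
            break;                      /* all input consumed */
        if ((errno != EILSEQ && errno != EINVAL) || outleft == 0) {
            failed = 1;                 /* E2BIG or no room for '?' */
            break;
        }
        /* Skip the offending byte: advance the pointer AND the count. */
        in++;
        inleft--;
        *out++ = '?';
        outleft--;
    }

    iconv_close(cd);
    return failed ? (size_t)-1 : (size_t)(out - dst);
}
```

Forgetting the inleft-- would leave the length one byte too large after each
skip, so a later call could read past the end of the source buffer. Note that
a valid multi-byte character with no equivalent in the target charset is
skipped byte by byte here and may therefore produce several '?' characters.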


> For the records: I didn't forget to include this patch, the problem
> is, that lbdb should not only run on Debian but on all Unix systems,
> including systems, where iconv is not available or not installed.  I
> don't like to add a build dependency on iconv on all systems, so I'm
> looking for some autoconf stuff to make the iconv patch configurable
> using --without-iconv (default, if iconv is not available),
> --with-iconv (default, if iconv is available), or
> --with-iconv=/some/dir.
> 
> Any help with this autoconf stuff is greatly appreciated...

Roland, are there any sources available for lbdb (for example, a CVS
or SVN repository) with the original automake/autoconf macro files?

It does not seem all too hard to make the iconv support configurable,
looking at the installation section of libiconv[1]. So if you could
give me a hint about the macro files, I would be glad to look into
this again.

Regards,
Peter

[1] http://www.gnu.org/software/libiconv/
--- lbdb-0.35.1.orig/fetchaddr.c
+++ lbdb-0.35.1/fetchaddr.c
@@ -119,6 +119,9 @@
   char *headerlist = NULL;
   char *fieldname, *next;
   char create_real_name = 0;
+#ifdef HAVE_ICONV
+  const char **charsetptr = &Charset;
+#endif
 
   /* process command line arguments: */
   if (argc > 1) {
@@ -128,6 +131,10 @@
 	datefmt = argv[++i];
   } else if (!strcmp (argv[i], "-x") && i+1 < argc) {
 	headerlist = argv[++i];
+#ifdef HAVE_ICONV
+  } else if (!strcmp (argv[i], "-c") && i+1 < argc) {
+	*charsetptr = argv[++i];
+#endif
   } else if (!strcmp (argv[i], "-a")) {
 	create_real_name = 1;
   } else {
--- lbdb-0.35.1.orig/lbdb-fetchaddr.man.in
+++ lbdb-0.35.1/lbdb-fetchaddr.man.in
@@ -24,6 +24,8 @@
 .IR dateformat ]
 .RB [ -x
 .IR headerfieldlist ]
+.RB [ -c
+.IR charset ]
 .RB [ -a ]
 .br
 .B lbdb-fetchaddr
@@ -88,6 +90,12 @@
 mail addresses.  If this option isn't given, we fall back to
 .RB ` from:to:cc:resent-from:resent-to '.
 .TP
+.BI -c " charset"
+The charset which will be used to write the database. This should be
+the charset which the application expects (normally the one from your
+current locale).  If this option isn't given, we fall back to
+.RB ` iso-8859-15 '.
+.TP
 .B -a
 Also grab addresses without a real name.  Use the local part of the
 mail address as real name.
--- lbdb-0.35.1.orig/lbdb-fetchaddr.sh.in
+++ lbdb-0.35.1/lbdb-fetchaddr.sh.in
@@ -41,6 +41,7 @@
 echo " 

Bug#355678: charset conversion added to rfc2047.c and (lbdb-)?fetchaddr

2007-06-16 Thread Tobias Schlemmer
On Sat, 16 Jun 2007 08:41:20 +0200, Roland Rosenfeld wrote:
> Tobias Schlemmer wrote on Tuesday, 7 March 2006:
> 
> Any help with this autoconf stuff is greatly appreciated...

The keyword "real internationalization" leads to another hint:
the gettext package includes a file named "iconv.m4" which is meant to
deal with the charset problem. I'm not sure, but gettextizing may help
you.

Tobias.

Bug#355678: charset conversion added to rfc2047.c and (lbdb-)?fetchaddr

2007-06-16 Thread Roland Rosenfeld
Tobias Schlemmer wrote on Tuesday, 7 March 2006:

> Package: lbdb
> Version: 0.31.1-0ts1
> Severity: wishlist
> Tags: l10n patch

> I have added charset conversion to rfc2047.c and (lbdb-)?fetchaddr using
> iconv. I don't know how portable it is, so try it out.
> Now my .procmailrc has an entry of
> 
> :0hc
> | lbdb-fetchaddr -d '%d.%m.%Y %H.%M' -c utf-8
> 
> It seems to work fine for me, even with evolution and muttalias ;-). And
> I think it's a very small step towards real internationalization. 

For the records: I didn't forget to include this patch, the problem
is, that lbdb should not only run on Debian but on all Unix systems,
including systems, where iconv is not available or not installed.  I
don't like to add a build dependency on iconv on all systems, so I'm
looking for some autoconf stuff to make the iconv patch configurable
using --without-iconv (default, if iconv is not available),
--with-iconv (default, if iconv is available), or
--with-iconv=/some/dir.

Any help with this autoconf stuff is greatly appreciated...

Bye,

Roland



Bug#355678: charset conversion added to rfc2047.c and (lbdb-)?fetchaddr

2006-03-07 Thread Tobias Schlemmer
Package: lbdb
Version: 0.31.1-0ts1
Severity: wishlist
Tags: l10n patch


I have added charset conversion to rfc2047.c and (lbdb-)?fetchaddr using
iconv. I don't know how portable it is, so try it out.
Now my .procmailrc has an entry of

:0hc
| lbdb-fetchaddr -d '%d.%m.%Y %H.%M' -c utf-8

It seems to work fine for me, even with evolution and muttalias ;-). And
I think it's a very small step towards real internationalization. 

Tobias

-- System Information:
Debian Release: testing/unstable
  APT prefers testing
  APT policy: (990, 'testing'), (500, 'unstable')
Architecture: i386 (i686)
Shell:  /bin/sh linked to /bin/bash
Kernel: Linux 2.6.13.1-tobias1tobias
Locale: LANG=de_DE.UTF-8, LC_CTYPE=de_DE.UTF-8 (charmap=UTF-8)

Versions of packages lbdb depends on:
ii  libc6        2.3.5-13  GNU C Library: Shared libraries an
ii  libvformat1  1.13-3    Library to read and write vcard fi
ii  perl         5.8.8-2   Larry Wall's Practical Extraction

-- no debconf information



lbdb_0.31.1-0ts1.diff.gz
Description: Binary data