Re: argp --help formatting bug with non-ASCII characters

2024-04-13 Thread Lasse Collin
On 2024-04-12 Bruno Haible wrote:
> How to reproduce:
> 1. Build GNU tar (1.33 or 1.35, for example).
> 2. Install it.
> 3. You can see misindentation:
> $ LC_ALL=de_DE.UTF-8 src/tar --help
> 
> ...
>  abzubilden
>   --mode=ÄNDERUNGEN den (symbolischen) Modus ÄNDERUNGEN für
>  hinzugefügte Dateien erzwingen
[...]
> In other words, it should use the Gnulib module 'mbswidth'.

This is one of the issues. The other one is with the wrapping of the
description text. It counts 1 byte == 1 column but in languages which
mostly use non-ASCII chars it can be mostly 2 bytes == 1 column. Then
the wrapping occurs far too early, resulting in narrow output.

-- 
Lasse Collin



Re: argp --help formatting bug with non-ASCII characters

2024-04-12 Thread Collin Funk
Hi Bruno,

On 4/12/24 10:19 AM, Bruno Haible wrote:
> So, my feeling would be that the first thing to do is to create a new
> unit test for it:
>   1. so that we don't need to build and try GNU tar in order to see
>  whether there is a regression in the future,
>   2. so as to cover the tricky cases as well (such as, near the end of
>  a line, a word that ends in 'Ä' or in some width 2 character).

Yes, that makes sense. Since GNU tar only has the case

 --opt=ARGsingle-line doc string

where ARG can have 'Ä' or some other character. I'm not very familiar
with localization or multi-byte encodings, but I can try to come up
with a unit test with some inspiration from some other modules.

Collin



Re: argp --help formatting bug with non-ASCII characters

2024-04-12 Thread Bruno Haible
Hi Collin,

> This patch fixes the two examples that you gave. I think it may be
> missing the case where the buffer doesn't end in a partial line
> though.

There are probably some other cases that are not handled either.
The code is ca. 200 lines of considerations of columns and widths
here and there.

> I have a feeling it is necessary, but I don't have a test case to make
> sure I don't break things.

So, my feeling would be that the first thing to do is to create a new
unit test for it:
  1. so that we don't need to build and try GNU tar in order to see
 whether there is a regression in the future,
  2. so as to cover the tricky cases as well (such as, near the end of
 a line, a word that ends in 'Ä' or in some width 2 character).

Bruno






Re: argp --help formatting bug with non-ASCII characters

2024-04-12 Thread Collin Funk
Hi Bruno,

On 4/11/24 4:23 PM, Bruno Haible wrote:
> The bug is apparently that in argp-fmtstream.c, a character such as Ä or É
> augments fs->point_col by 2, when it should only augment it by 1.
> 
> In other words, it should use the Gnulib module 'mbswidth'.
> 
> Any volunteers?

I am very unfamiliar with these modules, so I would appreciate this
being double checked.

This patch fixes the two examples that you gave. I think it may be
missing the case where the buffer doesn't end in a partial line
though.

Here is the variable I add to adjust the columns:

   len = fs->p - buf;
+  visible_len = mbsnwidth (buf, len, 0);
   nl = memchr (buf, '\n', len);

After checking that nl != NULL, would I need to do:

  visible_len = mbsnwidth (buf, nl - buf, 0);

I have a feeling it is necessary, but I don't have a test case to make
sure I don't break things.

CollinFrom 0f18bb3f38e4189a4f164d716fe411f7a3a868cf Mon Sep 17 00:00:00 2001
From: Collin Funk 
Date: Fri, 12 Apr 2024 08:55:48 -0700
Subject: [PATCH] argp: Fix alignment of option doc strings.

Reported by Lasse Collin in

* lib/argp-fmtstream.c (__argp_fmtstream_update): Adjust the column of
output based on character width instead of number of bytes.
* modules/argp: Add mbswidth to module dependencies.
---
 ChangeLog| 9 +
 lib/argp-fmtstream.c | 7 +--
 modules/argp | 1 +
 3 files changed, 15 insertions(+), 2 deletions(-)

diff --git a/ChangeLog b/ChangeLog
index b20f69b06b..c307284fcd 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,12 @@
+2024-04-12  Collin Funk  
+
+	argp: Fix alignment of option doc strings.
+	Reported by Lasse Collin in
+	
+	* lib/argp-fmtstream.c (__argp_fmtstream_update): Adjust the column of
+	output based on character width instead of number of bytes.
+	* modules/argp: Add mbswidth to module dependencies.
+
 2024-04-12  Simon Josefsson  
 
 	gitlog-to-changelog: Make output reproducible.
diff --git a/lib/argp-fmtstream.c b/lib/argp-fmtstream.c
index 7f75879d69..b1042529b4 100644
--- a/lib/argp-fmtstream.c
+++ b/lib/argp-fmtstream.c
@@ -31,6 +31,7 @@
 
 #include "argp-fmtstream.h"
 #include "argp-namefrob.h"
+#include "mbswidth.h"
 
 #ifndef ARGP_FMTSTREAM_USE_LINEWRAP
 
@@ -122,6 +123,7 @@ __argp_fmtstream_update (argp_fmtstream_t fs)
 {
   char *buf, *nl;
   size_t len;
+  int visible_len;
 
   /* Scan the buffer for newlines.  */
   buf = fs->buf + fs->point_offs;
@@ -160,6 +162,7 @@ __argp_fmtstream_update (argp_fmtstream_t fs)
 }
 
   len = fs->p - buf;
+  visible_len = mbsnwidth (buf, len, 0);
   nl = memchr (buf, '\n', len);
 
   if (fs->point_col < 0)
@@ -169,12 +172,12 @@ __argp_fmtstream_update (argp_fmtstream_t fs)
 {
   /* The buffer ends in a partial line.  */
 
-  if (fs->point_col + len < fs->rmargin)
+  if (fs->point_col + visible_len < fs->rmargin)
 {
   /* The remaining buffer text is a partial line and fits
  within the maximum line width.  Advance point for the
  characters to be written and stop scanning.  */
-  fs->point_col += len;
+  fs->point_col += visible_len;
   break;
 }
   else
diff --git a/modules/argp b/modules/argp
index 88cc78c3cb..2605bccb89 100644
--- a/modules/argp
+++ b/modules/argp
@@ -37,6 +37,7 @@ stdio
 strerror
 memchr
 memmove
+mbswidth
 
 configure.ac:
 gl_ARGP
-- 
2.44.0