Re: [PATCH v11 08/10] convert: advise canonical UTF encoding names

Lars Schneider Thu, 15 Mar 2018 15:43:44 -0700

> On 09 Mar 2018, at 20:11, Junio C Hamano <[email protected]> wrote:
> 
> [email protected] writes:
> 
>> From: Lars Schneider <[email protected]>
>> 
>> The canonical name of an UTF encoding has the format UTF, dash, number,
>> and an optionally byte order in upper case (e.g. UTF-8 or UTF-16BE).
>> Some iconv versions support alternative names without a dash or with
>> lower case characters.
>> 
>> To avoid problems between different iconv version always suggest the
>> canonical UTF names in advise messages.
>> 
>> Signed-off-by: Lars Schneider <[email protected]>
>> ---
> 
> I think it is probably better to squash this to earlier step,
> i.e. jumping straight to the endgame solution.


ok!


>> diff --git a/convert.c b/convert.c
>> index b80d666a6b..9a3ae7cce1 100644
>> --- a/convert.c
>> +++ b/convert.c
>> @@ -279,12 +279,20 @@ static int validate_encoding(const char *path, const 
>> char *enc,
>>                              "BOM is prohibited in '%s' if encoded as %s");
>>                      /*
>>                       * This advice is shown for UTF-??BE and UTF-??LE 
>> encodings.
>> +                     * We cut off the last two characters of the encoding 
>> name
>> +                     # to generate the encoding name suitable for BOMs.
>>                       */
> 
> I somehow thought that I saw "s/#/*/" in somebody's response during
> the previous round?

Oops. Will fix!


>>                      const char *advise_msg = _(
>>                              "The file '%s' contains a byte order "
>> -                            "mark (BOM). Please use %.6s as "
>> +                            "mark (BOM). Please use UTF-%s as "
>>                              "working-tree-encoding.");
>> -                    advise(advise_msg, path, enc);
>> +                    const char *stripped = "";
>> +                    char *upper = xstrdup_toupper(enc);
>> +                    upper[strlen(upper)-2] = '\0';
>> +                    if (!skip_prefix(upper, "UTF-", &stripped))
>> +                            skip_prefix(stripped, "UTF", &stripped);
>> +                    advise(advise_msg, path, stripped);
>> +                    free(upper);
> 
> If this codepath is ever entered with "enc" that does not begin with
> "UTF" (e.g. "Shift_JIS", which is impossible in the current code,
> but I'll talk about future-proofing here), then neither of these
> skip_prefix will trigger, and then you'd end up suggesting to use
> "UTF-" that is nonsense.  Perhaps initialize stripped to NULL and
> force advise to segv to catch such a programmer error?

Agreed!


Thanks,
Lars

Re: [PATCH v11 08/10] convert: advise canonical UTF encoding names

Reply via email to