Re: [PATCH] Gracefully handle incompatible locale data

2015-10-28 Thread Carlos O'Donell
On 10/27/2015 11:30 AM, Samuel Thibault wrote:
> Hello,
> 
> Ludovic Courtès, le Tue 22 Sep 2015 17:27:55 +0200, a écrit :
>>   loadlocale.c:130: _nl_intern_locale_data: Assertion `cnt < (sizeof 
>> (_nl_value_type_LC_COLLATE) / sizeof (_nl_value_type_LC_COLLATE[0]))' failed.
>>
>> This patch changes such conditions to return EINVAL instead of aborting.
> 
> Just like it does for the __glibc_unlikely (idx > (size_t)
> newdata->filesize) test above, so it doesn't actually introduce any new
> error condition.
> 
> I thus commited the change, thanks!
> 
> Samuel
> 

Thanks Samuel!

c.



Re: [PATCH] Gracefully handle incompatible locale data

2015-10-27 Thread Carlos O'Donell
On 10/13/2015 10:45 AM, Ludovic Courtès wrote:
>> - What does localedef --list-archive return?
>>
>>   - The new LC_COLLATE format will make it's way into the binary locale 
>> archive
>> and that means glibc can't read the locale-archive? Does it fail? exit 
>> code?
> 
> The patch does not change how locale archives are handled.
> 
> I think we’re confusing locale archive and locale data; or am I simply
> missing something?  :-)

Your patch is OK.

Notes:

(1) Do we return NULL and EINVAL? Yes.

Loading locale data from the locale archive uses _nl_load_locale_from_archive.
The function _nl_load_locale_from_archive calls _nl_intern_locale_data
which can trigger the assert on invalid type sizes.

~~~ locale/loadarchive.c ~~~
134 _nl_load_locale_from_archive (int category, const char **namep)
...
478 lia->data[cnt] = _nl_intern_locale_data (cnt,
479  results[cnt].addr,
480  results[cnt].len);
~~~

Which seems like it can trigger the assertion when loading the larger
LC_COLLATE data from the archive. Now we return NULL, ignore the failed load,
and potentially return NULL again since `lia->data[category]` is now NULL.

This means `_nl_find_locale` returns NULL, and then functions like `setlocale`
appear to return NULL to indicate no data was loaded with errno set to EINVAL.

(2) Does localedef --list-archive work?

Yes. It is unaffected by the LC_COLLATE changes since the locale archive records
have explicit length and can be listed even when they can't be loaded. This is
wrong IMO, and we should have done something to detect the invalid LC_COLLATE
and print a warning, but that's another QoI issue unrelated to the patch you're
trying to apply.

c.



Re: [PATCH] Gracefully handle incompatible locale data

2015-10-13 Thread Carlos O'Donell
On 10/12/2015 08:49 PM, Allan McRae wrote:
> On 29/09/15 06:54, Carlos O'Donell wrote:
>> On 09/26/2015 06:24 AM, Ludovic Courtès wrote:
>>> Furthermore, the function in question returns EINVAL in other similar
>>> cases–e.g., when libc 2.22 loads LC_COLLATE data from libc 2.21.
>>
>> If you change this particular case to EINVAL, what does the user see
>> as a result of this change? Do they get a non-zero exit code from
>> `localedef --list-archive` along with an error written out to stderr?
>>
>> This is the kind of change I'm expecting. If we are removing an assertion,
>> we should be replacing it with something meaningful and verifying that
>> meaningful change.
>>
>> You need not change any of the other cases you've found that return EINVAL,
>> we can update those incrementally, but for this one change you're making
>> we should fix it as best we can.
>>
> 
> If I am reading this correctly, the change to from an abort to EINVAL
> would be fine if it is accompanied by a change to localedef
> --list-archive.  Is that correct?
> 
> A solution to this would be great given we now run into this assert with
> locale archives built with different glibc builds along the 2.22 release
> branch.

Yes.

I'll make some general comments in the thread you started about the patch,
rather than here.

Cheers,
Carlos.




Re: [PATCH] Gracefully handle incompatible locale data

2015-10-13 Thread Carlos O'Donell
On 09/29/2015 04:08 AM, Ludovic Courtès wrote:
> "Carlos O'Donell" <car...@redhat.com> skribis:
> 
>> On 09/26/2015 06:24 AM, Ludovic Courtès wrote:
>>> Furthermore, the function in question returns EINVAL in other similar
>>> cases–e.g., when libc 2.22 loads LC_COLLATE data from libc 2.21.
>>
>> If you change this particular case to EINVAL, what does the user see
>> as a result of this change?
> 
> The user-visible change is that, if incompatible or broken locale data
> is found, a call like:
> 
>   setlocale (LC_ALL, "");
> 
> returns EINVAL instead of aborting.

Perfect.
 
>> Do they get a non-zero exit code from `localedef --list-archive` along
>> with an error written out to stderr?
> 
> ‘localedef’ starts with:
> 
>   setlocale (LC_MESSAGES, "");
>   setlocale (LC_CTYPE, "");
> 
> so it will no longer abort when invalid locale data is found (although
> in the 2.21 → 2.22 transition, only the LC_COLLATE data format differs
> anyway.)

OK.

> Apart from that, ‘localedef --list-archive’ simply opens the locale
> archive (typically /usr/lib/locale/locale-archive, regardless of the
> ‘LOCPATH’ environment variable value), so its behavior is unchanged.
> 
> Am I overlooking something?

If the locale-archive is upgraded to the new format with LC_COLLATE changed
what happens when you run localedef --list-archive? Does it list zero locales
and exit with an exit code of zero?

I would hope that it prints something about the broken locale, because after
the removal of the assertion you won't know anything is wrong with the archive?

>> This is the kind of change I'm expecting. If we are removing an assertion,
>> we should be replacing it with something meaningful and verifying that
>> meaningful change.
> 
> Yes, agreed.
> 
> The function that is changed, ‘_nl_intern_locale_data’, has only two
> callers in libc, and both check whether it returns NULL.  So it seems to
> me that the code is not introducing anything new in the API contract.
> WDYT?

It isn't introducing anything new, but you are removing an assert and we need
to make sure that the intent of the design remains: Warn the user something
is wrong with the locale data.

I see two cases:

- What does setlocale() return?

  - Verified. You say it returns EINVAL for the affected locale and that's 
perfect.

- What does localedef --list-archive return?

  - The new LC_COLLATE format will make it's way into the binary locale archive
and that means glibc can't read the locale-archive? Does it fail? exit code?

If we cover those two cases with some kind of error message then I think we're 
done.

Cheers,
Carlos.




Re: [PATCH] Gracefully handle incompatible locale data

2015-09-25 Thread Carlos O'Donell
On 09/24/2015 12:12 PM, Ludovic Courtès wrote:
> Ondřej, I think we have been miscommunicating.
> 
> I noticed that a program linked against 2.21 or earlier would abort with
> an assertion failure when it stumbles upon 2.22 locale data.
> 
> All the patch tries to do is change the abort to EINVAL (and skip locale
> data) when that happens.
> 
> I’m not claiming this is perfect, and I agree with you on that point.
> I’m just saying that ignoring the faulty locale data and returning
> EINVAL (which the application can choose to take into account or not) is
> preferable to aborting.
> 
> Does that make sense?

Despite Roland saying "LGTM", I think this is not a good change.

Firstly, it's not the community consensus as Ondrej is pointing out.

https://sourceware.org/glibc/wiki/Style_and_Conventions#Error_Handling

It is a fundamental system misconfiguration issue not to have upgraded
the binary locale data from one release to another.

The community consensus was that user errors like this should fail
immediately, but in ways which the user understands the failure, and
fixes the system.

Returning an error code simply leads to the user ignoring the serious
configuration issue. Worse is that in a locale archive file we now
skip such bad binary archives (_nl_load_locale_from_archive), and
this hides the problem. Worse we might also skip locale files in
directories like /usr/lib/locale/C.utf8, which we might want to always
be loaded as a default UTF-8 locale. I'd rather see an error message
in Fedora than allow that to continue by skipping that locale with
no error given.

We should abort, but the abort error message should be much clearer
about what's going wrong. Therefore I would accept a patch that gives
a clearer error message in this case, but not one that removes the
assert.

Cheers,
Carlos.