On Apr 5, 2011, at 3:49 PM, Per Øyvind Karlsen wrote:

> 2011/4/5 Jeff Johnson <n3...@mac.com>:
>> No way Jose!
>> 
>> rpmbuild (and *.rpm metadata) can NOT have any encoding
>> assumed.
>> 
>> Encoding is for DISPLAY, not for octets.
>> 
>> Put unicode into package metadata at your own peril.
>> 
>> Meanwhile -- without a means to specify encoding in metadata --
>> rpm in C has *ONLY* 8-bit clean octets and the usual conventions
>> for NUL terminated strings.
>> 
>> Until there's a well defined means of specifying encoding for all
>> tag strings -- and that's a fundamental design change to *.rpm packaging that
>> likely will NEVER happen -- the problem simply CANNOT be fixed to meet naive
>> luser expectations, and all attempts to "fix" anything
>> are just doomed.
>> 
>> C has octets, not utf8, and rpmdb strings are _NOT_ based on LC_ALL
>> and other i18n/l10n conventions.
>> 
>> You can of course put whatever garbage you wish into strings that
>> will be stored as keys in an rpmdb, subject to all the usual
>> GIGO conventions distros wish to inflict upon their customers.
> Okay, my mistake anyways, I was looking into an issue with unicode strings,
> then I specified wrong locale when testing. I notice now that with properly
> specified locale, it accepts unicode characters.
> 

The test is way way feeble, but once the expectation starts, well
there's nothing to do but solve the problem "correctly".

What is broken -- by design -- is that *.spec recipes have multiple
encodings, not a per-file encoding. And all hell starts to break
loose in *.rpm packages when retrievals using keys pick up a per-key encoding.

You tell me how to look up all possible encodings from a database without
specifically tying an encoding to every possible tag.
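
To make the lookup problem concrete, here is a minimal sketch of what an
octet-only key comparison sees. It assumes keys are compared byte for byte;
keys_match() and the two sample names are hypothetical illustrations, not
rpm or rpmdb API.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper (NOT an rpm API): compare two NUL-terminated keys
 * the way an octet-only store would, byte for byte, with no knowledge of
 * the encoding either key was written in. */
static int keys_match(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);
    return strcmp(a, b) == 0;
}

/* The same name, "Øyvind", stored under two different encodings. */
static const char name_utf8[]   = "\xC3\x98yvind"; /* UTF-8: Ø is C3 98   */
static const char name_latin1[] = "\xD8yvind";     /* ISO-8859-1: Ø is D8 */
```

A reader sees one name; keys_match(name_utf8, name_latin1) sees two
different octet strings and returns 0. Nothing stored alongside the key
says which encoding it was written in, so the lookup cannot reconcile them.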

> Still though, using '%description -l', descriptions disappear.. :|
> 

C permits octets, not encodings. All possible encodings fit into octets
with NUL terminated strings. That is the only thing that saves %description
(which doesn't belong in *.rpm packages; that is another design issue that
I don't feel like arguing about, because you simply will NOT like the answer
of specifying possibly hundreds of properly encoded %description's in
a single *.spec using the full-blown form of the 4-tuple used for
encoding on a per-tag basis). Package metadata will simply explode
for no known purpose.
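
A small sketch of "octets, not encodings": C stores and measures bytes, and
any notion of "characters" exists only once a display layer assumes an
encoding. Both function names below are hypothetical, and utf8_codepoints()
is only meaningful under the stated assumption that the input is valid
UTF-8.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* What C actually stores: the number of octets before the terminating NUL. */
static size_t octet_count(const char *s)
{
    assert(s != NULL);
    return strlen(s);
}

/* What a display layer might compute IF it assumes UTF-8: count every byte
 * that is not a continuation byte (bit pattern 10xxxxxx). The result is
 * meaningless if the octets were written in some other encoding. */
static size_t utf8_codepoints(const char *s)
{
    size_t n = 0;
    assert(s != NULL);
    for (const unsigned char *p = (const unsigned char *)s; *p != '\0'; p++)
        if ((*p & 0xC0) != 0x80)
            n++;
    return n;
}
```

For the UTF-8 octets "\xC3\x98yvind", octet_count() gives 7 while
utf8_codepoints() gives 6. The stored string is the same either way; only
the interpretation differs, and the string itself carries no hint about
which interpretation is right.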

And RPM_I18NSTRING_TYPE has been on death row all of this century, and
is carried along solely because PLD and a few other distros *still*
insist on inserting translations into *.spec recipes directly.

A data type that is sometimes an array, and sometimes a scalar dependent
on the context of interpretation just isn't a useful data type.
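
As a rough illustration of the array-versus-scalar problem, here is a
hypothetical model of such a type. The struct and function names are
invented for this sketch and are not rpm's header API; the fallback to the
first entry is likewise an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model (NOT rpm's header API): the stored form is an array,
 * one translation per locale, with a parallel array of locale names. */
struct i18n_string {
    const char *const *locales; /* e.g. { "C", "de" }                   */
    const char *const *values;  /* one translation per locale, same order */
    size_t count;               /* number of entries in both arrays     */
};

/* Scalar view: the same data collapses to a single string, selected by
 * whatever locale the caller happens to pass. Falls back to the first
 * entry when the locale is not present (an assumption of this sketch). */
static const char *i18n_scalar(const struct i18n_string *s, const char *locale)
{
    assert(s != NULL && s->count > 0 && locale != NULL);
    for (size_t i = 0; i < s->count; i++)
        if (strcmp(s->locales[i], locale) == 0)
            return s->values[i];
    return s->values[0];
}
```

A caller that walks values[] sees an array; a caller that goes through
i18n_scalar() sees one string whose content depends on the environment.
The same tag therefore has two shapes depending on who is asking.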

Nor is there any known/modern reason why all possible encodings MUST be
carried in each and every package header in the year 2011. There's specspo
and other means of %description et al distribution that are far far superior
to RPM_I18NSTRING_TYPE pulled in from *.spec recipes. This was _NOT_ true
back in 1998 when RPM_I18NSTRING_TYPE was devised.

73 de Jeff
______________________________________________________________________
RPM Package Manager                                    http://rpm5.org
Developer Communication List                        rpm-devel@rpm5.org