On Apr 5, 2011, at 3:49 PM, Per Øyvind Karlsen wrote:

> 2011/4/5 Jeff Johnson <n3...@mac.com>:
>> No way Jose!
>>
>> rpmbuild (and *.rpm metadata) can NOT have any encoding
>> assumed.
>>
>> Encoding is for DISPLAY, not for octets.
>>
>> Put unicode into package metadata at your own peril.
>>
>> Meanwhile -- without a means to specify encoding in metadata --
>> rpm in C has *ONLY* 8 bit clean octets and the usual conventions
>> for NUL terminated strings.
>>
>> Until there's a well defined means of specifying encoding for all
>> tag strings -- and that's a fundamental design change to *.rpm packaging that
>> likely will NEVER happen -- the problem simply CANNOT be fixed to meet naive
>> luser expectations, and all attempts to "fix" anything
>> are just doomed.
>>
>> C has octets, not utf8, and rpmdb strings are _NOT_ based on LC_ALL
>> and other i18n/l10n conventions.
>>
>> You can of course put whatever garbage you wish into strings that
>> will be stored as keys in an rpmdb, subject to all the usual
>> GIGO conventions distros wish to inflict upon their customers.
>
> Okay, my mistake anyways, I was looking into an issue with unicode strings,
> then I specified wrong locale when testing. I notice now that with properly
> specified locale, it accepts unicode characters.
>
The test is way way feeble, but once the expectation starts, well,
there's nothing to do but solve the problem "correctly".

What is broken -- by design -- is that *.spec recipes have multiple
encodings, not a per-file encoding. And all hell starts to break loose
in *.rpm packages when retrievals using keys pick up a per-key encoding.

You tell me how to look up all possible encodings from a database without
specifically tying an encoding to every possible tag.

> Still though, using '%description -l', descriptions disappear.. :|
>

C permits octets, not encodings. All possible encodings fit into
octets with NUL terminated strings.

The only thing that saves %description is RPM_I18NSTRING_TYPE.
(%description doesn't belong in *.rpm packages, which is another design
issue that I don't feel like arguing about, because you simply will NOT
like the answer: specifying possibly hundreds of properly encoded
%descriptions in a single *.spec, using the full-blown form of the
4-tuple used for encoding on a per-tag basis. Package metadata will
simply explode for no known purpose.)

And RPM_I18NSTRING_TYPE has been on death row all of this century. It is
carried along solely because PLD and a few other distros *still* insist on
inserting translations into *.spec recipes directly.

A data type that is sometimes an array and sometimes a scalar, depending
on the context of interpretation, just isn't a useful data type.

Nor is there any known/modern reason why all possible encodings
MUST be carried in each and every package header in the year 2011.
There's specspo and other means of distributing %description et al.
that are far far superior to RPM_I18NSTRING_TYPE pulled in from *.spec
recipes. This was _NOT_ true back in 1998 when RPM_I18NSTRING_TYPE was
devised.

73 de Jeff
______________________________________________________________________
RPM Package Manager                                    http://rpm5.org
Developer Communication List                        rpm-devel@rpm5.org
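
As a footnote to the octets-versus-encodings point above, here is a minimal Python sketch (not rpm code). It shows that the same stored octets display differently depending on the encoding assumed, and that a byte-wise key lookup misses when the caller guesses a different encoding. Everything named here (RAW, octet_length, display, lookup, INDEX) is hypothetical, and the dict only stands in for a database index keyed on raw octets; it makes no claim about how rpmdb is actually implemented.

```python
# Illustration only: these names are hypothetical, not rpm or rpmdb API.

# "\u00d8yvind" is the name with a leading O-slash, stored as UTF-8 octets.
RAW = "\u00d8yvind".encode("utf-8")


def octet_length(raw: bytes) -> int:
    """Length as C's strlen() counts it: octets before the first NUL."""
    return len(raw.split(b"\x00", 1)[0])


def display(raw: bytes, encoding: str) -> str:
    """Decode for display only; the stored octets are untouched."""
    return raw.decode(encoding, errors="replace")


def lookup(index: dict, name: str, encoding: str):
    """Byte-wise key lookup, using the caller's guess at the encoding."""
    return index.get(name.encode(encoding))


# A plain dict standing in for a database index keyed on raw octets.
INDEX = {RAW: "some-package"}

if __name__ == "__main__":
    print(octet_length(RAW))                        # 7 octets, 6 characters
    print(display(RAW, "utf-8") == "\u00d8yvind")   # True
    print(len(display(RAW, "latin-1")))             # 7: one character per octet
    print(lookup(INDEX, "\u00d8yvind", "utf-8"))    # some-package
    print(lookup(INDEX, "\u00d8yvind", "latin-1"))  # None: different octets
```

The lookup only succeeds when the caller happens to encode the name the same way the key was stored, which is the per-key encoding problem in miniature.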