On Wed, Nov 25, 2015 at 6:45 PM, William A Rowe Jr <wr...@rowe-clan.net> wrote:
> On Nov 25, 2015 4:19 PM, "Mikhail T." <mi+t...@aldan.algebra.com> wrote:
> >
> > Thus, I contend, using C-library will not cause invalid results, and the
> > only reason to have Apache's own implementation is performance, but not
> > correctness.
>
> Well almost but wrong...
>
> The pure char-based ß processing produced no case change in my reviews of
> tolower/toupper in de_DE codeset. If you were to examine string comparison
> the collation order changes substantially.
>
> And more to the point, if tolower()/toupper() could handle not only mbcs
> but multicharacter transliteration, your results would have varied. 1:1
> character translations have their intrinsic limits.
>
> That said, I'm working up a comprehensive audit and other codeset/language
> combinations absolutely do. Code and results forthcoming shortly.

As promised, here's a quick review based on the sbcs and utf8 code pages,
in the very limited single-byte scope, on my machine. I did not touch the
mbcs locales listed below as untested, because they require 'shift-state'
to toggle into and out of specific characters, and that implies a lot of
calculated fuzzing that I didn't have time for this week. (Since mod_ftp
explicit TLS is still broken, I had no time for any of this, either ;-)

I also haven't yet evaluated the wide chars that fall into the traditional
POSIX/C ASCII range, which I still mean to do, and I haven't yet repeated
this exercise on win32 or OS X, only on a somewhat multinational
configuration of Fedora 22.

The source code is pretty rudimentary. I used iconv to shove all of the
resulting text into UTF-8 for the console/file output; it really plays no
part in the locale equation. With a bit of clever coding that I never got
to, it could be adapted for similar testing on an EBCDIC box.
Untested:

ja_JP.eucjp
ja_JP.ujis
japanese.euc
ko_KR.euckr
korean.euc
zh_CN.gb18030
zh_CN.gb2312
zh_CN.gbk
zh_HK.big5hkscs
zh_SG.gb2312
zh_SG.gbk
zh_TW.big5
zh_TW.euctw

Tested and exceptional results noted (source code attached);

LANG="aa_DJ.iso88591"; no surprises
LANG="af_ZA.iso88591"; no surprises
LANG="an_ES.iso885915"; no surprises
LANG="ar_AE.iso88596"; no surprises
LANG="ar_BH.iso88596"; no surprises
LANG="ar_DZ.iso88596"; no surprises
LANG="ar_EG.iso88596"; no surprises
LANG="ar_IQ.iso88596"; no surprises
LANG="ar_JO.iso88596"; no surprises
LANG="ar_KW.iso88596"; no surprises
LANG="ar_LB.iso88596"; no surprises
LANG="ar_LY.iso88596"; no surprises
LANG="ar_MA.iso88596"; no surprises
LANG="ar_OM.iso88596"; no surprises
LANG="ar_QA.iso88596"; no surprises
LANG="ar_SA.iso88596"; no surprises
LANG="ar_SD.iso88596"; no surprises
LANG="ar_SY.iso88596"; no surprises
LANG="ar_TN.iso88596"; no surprises
LANG="ar_YE.iso88596"; no surprises
LANG="ast_ES.iso885915"; no surprises
LANG="be_BY.cp1251"; no surprises
LANG="bg_BG.cp1251"; no surprises
LANG="br_FR.iso88591"; no surprises
LANG="br_FR.iso885915@euro"; no surprises
LANG="bs_BA.iso88592"; no surprises
LANG="ca_AD.iso885915"; no surprises
LANG="ca_ES.iso88591"; no surprises
LANG="ca_ES.iso885915@euro"; no surprises
LANG="ca_FR.iso885915"; no surprises
LANG="ca_IT.iso885915"; no surprises
LANG="cs_CZ.iso88592"; no surprises
LANG="cy_GB.iso885914"; no surprises
LANG="da_DK.iso88591"; no surprises
LANG="da_DK.iso885915"; no surprises
LANG="de_AT.iso88591"; no surprises
LANG="de_AT.iso885915@euro"; no surprises
LANG="de_BE.iso88591"; no surprises
LANG="de_BE.iso885915@euro"; no surprises
LANG="de_CH.iso88591"; no surprises
LANG="de_DE.iso88591"; no surprises
LANG="de_DE.iso885915@euro"; no surprises
LANG="de_LU.iso88591"; no surprises
LANG="de_LU.iso885915@euro"; no surprises
LANG="el_CY.iso88597"; no surprises
LANG="el_GR.iso88597"; no surprises
LANG="en_AU.iso88591"; no surprises
LANG="en_BW.iso88591"; no surprises
LANG="en_CA.iso88591"; no surprises
LANG="en_DK.iso88591"; no surprises
LANG="en_GB.iso88591"; no surprises
LANG="en_GB.iso885915"; no surprises
LANG="en_HK.iso88591"; no surprises
LANG="en_IE.iso88591"; no surprises
LANG="en_IE.iso885915@euro"; no surprises
LANG="en_NZ.iso88591"; no surprises
LANG="en_PH.iso88591"; no surprises
LANG="en_SG.iso88591"; no surprises
LANG="en_US.iso88591"; no surprises
LANG="en_US.iso885915"; no surprises
LANG="en_ZA.iso88591"; no surprises
LANG="en_ZW.iso88591"; no surprises
LANG="es_AR.iso88591"; no surprises
LANG="es_BO.iso88591"; no surprises
LANG="es_CL.iso88591"; no surprises
LANG="es_CO.iso88591"; no surprises
LANG="es_CR.iso88591"; no surprises
LANG="es_DO.iso88591"; no surprises
LANG="es_EC.iso88591"; no surprises
LANG="es_ES.iso88591"; no surprises
LANG="es_ES.iso885915@euro"; no surprises
LANG="es_GT.iso88591"; no surprises
LANG="es_HN.iso88591"; no surprises
LANG="es_MX.iso88591"; no surprises
LANG="es_NI.iso88591"; no surprises
LANG="es_PA.iso88591"; no surprises
LANG="es_PE.iso88591"; no surprises
LANG="es_PR.iso88591"; no surprises
LANG="es_PY.iso88591"; no surprises
LANG="es_SV.iso88591"; no surprises
LANG="es_US.iso88591"; no surprises
LANG="es_UY.iso88591"; no surprises
LANG="es_VE.iso88591"; no surprises
LANG="et_EE.iso88591"; no surprises
LANG="et_EE.iso885915"; no surprises
LANG="eu_ES.iso88591"; no surprises
LANG="eu_ES.iso885915@euro"; no surprises
LANG="fi_FI.iso88591"; no surprises
LANG="fi_FI.iso885915@euro"; no surprises
LANG="fo_FO.iso88591"; no surprises
LANG="fr_BE.iso88591"; no surprises
LANG="fr_BE.iso885915@euro"; no surprises
LANG="fr_CA.iso88591"; no surprises
LANG="fr_CH.iso88591"; no surprises
LANG="fr_FR.iso88591"; no surprises
LANG="fr_FR.iso885915@euro"; no surprises
LANG="fr_LU.iso88591"; no surprises
LANG="fr_LU.iso885915@euro"; no surprises
LANG="ga_IE.iso88591"; no surprises
LANG="ga_IE.iso885915@euro"; no surprises
LANG="gd_GB.iso885915"; no surprises
LANG="gl_ES.iso88591"; no surprises
LANG="gl_ES.iso885915@euro"; no surprises
LANG="gv_GB.iso88591"; no surprises
LANG="he_IL.iso88598"; no surprises
LANG="hr_HR.iso88592"; no surprises
LANG="hsb_DE.iso88592"; no surprises
LANG="hu_HU.iso88592"; no surprises
LANG="hy_AM.armscii8"; no surprises
LANG="id_ID.iso88591"; no surprises
LANG="is_IS.iso88591"; no surprises
LANG="it_CH.iso88591"; no surprises
LANG="it_IT.iso88591"; no surprises
LANG="it_IT.iso885915@euro"; no surprises
LANG="iw_IL.iso88598"; no surprises
LANG="ka_GE.georgianps"; no surprises
LANG="kk_KZ.pt154"; no surprises
LANG="kl_GL.iso88591"; no surprises
LANG="ku_TR.iso88599";
   64 = @ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~
      ^ @ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`ABCDEFGHİJKLMNOPQRSTUVWXYZ{|}~
      v @abcdefghıjklmnopqrstuvwxyz[\]^_`abcdefghijklmnopqrstuvwxyz{|}~
      ?  ........*.................      ''''''''*'''''''''''''''''
  192 = ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏĞÑÒÓÔÕÖ×ØÙÚÛÜİŞßàáâãäåæçèéêëìíîïğñòóôõö÷øùúûüışÿ
      ^ ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏĞÑÒÓÔÕÖ×ØÙÚÛÜİŞßÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏĞÑÒÓÔÕÖ÷ØÙÚÛÜIŞÿ
      v àáâãäåæçèéêëìíîïğñòóôõö×øùúûüişßàáâãäåæçèéêëìíîïğñòóôõö÷øùúûüışÿ
      ? ....................... .....*. ''''''''''''''''''''''' '''''*'
LANG="kw_GB.iso88591"; no surprises
LANG="lg_UG.iso885910"; no surprises
LANG="lt_LT.iso885913"; no surprises
LANG="lv_LV.iso885913"; no surprises
LANG="mg_MG.iso885915"; no surprises
LANG="mi_NZ.iso885913"; no surprises
LANG="mk_MK.iso88595"; no surprises
LANG="ms_MY.iso88591"; no surprises
LANG="mt_MT.iso88593";
  128 =                                  Ħ˘£¤ Ĥ§¨İŞĞĴ  Ż°ħ²³´µĥ·¸ışğĵ½ ż
      ^                                  Ħ˘£¤ Ĥ§¨İŞĞĴ  Ż°Ħ²³´µĤ·¸IŞĞĴ½ Ż
      v                                  ħ˘£¤ ĥ§¨işğĵ  ż°ħ²³´µĥ·¸ışğĵ½ ż
      ?                                  .    .  *...  . '    '  *'''  '
LANG="nb_NO.iso88591"; no surprises
LANG="nl_BE.iso88591"; no surprises
LANG="nl_BE.iso885915@euro"; no surprises
LANG="nl_NL.iso88591"; no surprises
LANG="nl_NL.iso885915@euro"; no surprises
LANG="nn_NO.iso88591"; no surprises
LANG="(null)"; no surprises
LANG="oc_FR.iso88591"; no surprises
LANG="om_KE.iso88591"; no surprises
LANG="pl_PL.iso88592"; no surprises
LANG="pt_BR.iso88591"; no surprises
LANG="pt_PT.iso88591"; no surprises
LANG="pt_PT.iso885915@euro"; no surprises
LANG="ro_RO.iso88592"; no surprises
LANG="ru_RU.iso88595"; no surprises
LANG="ru_RU.koi8r"; no surprises
LANG="ru_UA.koi8u"; no surprises
LANG="sk_SK.iso88592"; no surprises
LANG="sl_SI.iso88592"; no surprises
LANG="so_DJ.iso88591"; no surprises
LANG="so_KE.iso88591"; no surprises
LANG="so_SO.iso88591"; no surprises
LANG="sq_AL.iso88591"; no surprises
LANG="st_ZA.iso88591"; no surprises
LANG="sv_FI.iso88591"; no surprises
LANG="sv_FI.iso885915@euro"; no surprises
LANG="sv_SE.iso88591"; no surprises
LANG="sv_SE.iso885915"; no surprises
LANG="tg_TJ.koi8t"; no surprises
LANG="th_TH.tis620"; no surprises
LANG="tl_PH.iso88591"; no surprises
LANG="tr_CY.iso88599";
   64 = @ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~
      ^ @ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`ABCDEFGHİJKLMNOPQRSTUVWXYZ{|}~
      v @abcdefghıjklmnopqrstuvwxyz[\]^_`abcdefghijklmnopqrstuvwxyz{|}~
      ?  ........*.................      ''''''''*'''''''''''''''''
  192 = ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏĞÑÒÓÔÕÖ×ØÙÚÛÜİŞßàáâãäåæçèéêëìíîïğñòóôõö÷øùúûüışÿ
      ^ ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏĞÑÒÓÔÕÖ×ØÙÚÛÜİŞßÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏĞÑÒÓÔÕÖ÷ØÙÚÛÜIŞÿ
      v àáâãäåæçèéêëìíîïğñòóôõö×øùúûüişßàáâãäåæçèéêëìíîïğñòóôõö÷øùúûüışÿ
      ? ....................... .....*. ''''''''''''''''''''''' '''''*'
LANG="tr_TR.iso88599";
   64 = @ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~
      ^ @ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`ABCDEFGHİJKLMNOPQRSTUVWXYZ{|}~
      v @abcdefghıjklmnopqrstuvwxyz[\]^_`abcdefghijklmnopqrstuvwxyz{|}~
      ?  ........*.................      ''''''''*'''''''''''''''''
  192 = ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏĞÑÒÓÔÕÖ×ØÙÚÛÜİŞßàáâãäåæçèéêëìíîïğñòóôõö÷øùúûüışÿ
      ^ ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏĞÑÒÓÔÕÖ×ØÙÚÛÜİŞßÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏĞÑÒÓÔÕÖ÷ØÙÚÛÜIŞÿ
      v àáâãäåæçèéêëìíîïğñòóôõö×øùúûüişßàáâãäåæçèéêëìíîïğñòóôõö÷øùúûüışÿ
      ? ....................... .....*. ''''''''''''''''''''''' '''''*'
LANG="uk_UA.koi8u"; no surprises
LANG="uz_UZ.iso88591"; no surprises
LANG="wa_BE.iso88591"; no surprises
LANG="wa_BE.iso885915@euro"; no surprises
LANG="xh_ZA.iso88591"; no surprises
LANG="yi_US.cp1255"; no surprises
LANG="zu_ZA.iso88591"; no surprises
LANG="aa_DJ.utf8"; no surprises
LANG="aa_ER.utf8"; no surprises
LANG="aa_ER.utf8@saaho"; no surprises
LANG="aa_ET.utf8"; no surprises
LANG="af_ZA.utf8"; no surprises
LANG="ak_GH.utf8"; no surprises
LANG="am_ET.utf8"; no surprises
LANG="an_ES.utf8"; no surprises
LANG="anp_IN.utf8"; no surprises
LANG="ar_AE.utf8"; no surprises
LANG="ar_BH.utf8"; no surprises
LANG="ar_DZ.utf8"; no surprises
LANG="ar_EG.utf8"; no surprises
LANG="ar_IN.utf8"; no surprises
LANG="ar_IQ.utf8"; no surprises
LANG="ar_JO.utf8"; no surprises
LANG="ar_KW.utf8"; no surprises
LANG="ar_LB.utf8"; no surprises
LANG="ar_LY.utf8"; no surprises
LANG="ar_MA.utf8"; no surprises
LANG="ar_OM.utf8"; no surprises
LANG="ar_QA.utf8"; no surprises
LANG="ar_SA.utf8"; no surprises
LANG="ar_SD.utf8"; no surprises
LANG="ar_SS.utf8"; no surprises
LANG="ar_SY.utf8"; no surprises
LANG="ar_TN.utf8"; no surprises
LANG="ar_YE.utf8"; no surprises
LANG="as_IN.utf8"; no surprises
LANG="ast_ES.utf8"; no surprises
LANG="ayc_PE.utf8"; no surprises
LANG="az_AZ.utf8";
   64 = @ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~
      ^ @ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`ABCDEFGHiJKLMNOPQRSTUVWXYZ{|}~
      v @abcdefghIjklmnopqrstuvwxyz[\]^_`abcdefghijklmnopqrstuvwxyz{|}~
      ?  ........*.................      ''''''''*'''''''''''''''''
LANG="be_BY.utf8"; no surprises
LANG="be_BY.utf8@latin"; no surprises
LANG="bem_ZM.utf8"; no surprises
LANG="ber_DZ.utf8"; no surprises
LANG="ber_MA.utf8"; no surprises
LANG="bg_BG.utf8"; no surprises
LANG="bh_IN.utf8"; no surprises
LANG="bho_IN.utf8"; no surprises
LANG="bn_BD.utf8"; no surprises
LANG="bn_IN.utf8"; no surprises
LANG="bo_CN.utf8"; no surprises
LANG="bo_IN.utf8"; no surprises
LANG="br_FR.utf8"; no surprises
LANG="brx_IN.utf8"; no surprises
LANG="bs_BA.utf8"; no surprises
LANG="byn_ER.utf8"; no surprises
LANG="ca_AD.utf8"; no surprises
LANG="ca_ES.utf8"; no surprises
LANG="ca_FR.utf8"; no surprises
LANG="ca_IT.utf8"; no surprises
LANG="ce_RU.utf8"; no surprises
LANG="cmn_TW.utf8"; no surprises
LANG="crh_UA.utf8";
   64 = @ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~
      ^ @ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`ABCDEFGHiJKLMNOPQRSTUVWXYZ{|}~
      v @abcdefghIjklmnopqrstuvwxyz[\]^_`abcdefghijklmnopqrstuvwxyz{|}~
      ?  ........*.................      ''''''''*'''''''''''''''''
LANG="csb_PL.utf8"; no surprises
LANG="cs_CZ.utf8"; no surprises
LANG="cv_RU.utf8"; no surprises
LANG="cy_GB.utf8"; no surprises
LANG="da_DK.utf8"; no surprises
LANG="de_AT.utf8"; no surprises
LANG="de_BE.utf8"; no surprises
LANG="de_CH.utf8"; no surprises
LANG="de_DE.utf8"; no surprises
LANG="de_LU.utf8"; no surprises
LANG="doi_IN.utf8"; no surprises
LANG="dv_MV.utf8"; no surprises
LANG="dz_BT.utf8"; no surprises
LANG="el_CY.utf8"; no surprises
LANG="el_GR.utf8"; no surprises
LANG="en_AG.utf8"; no surprises
LANG="en_AU.utf8"; no surprises
LANG="en_BW.utf8"; no surprises
LANG="en_CA.utf8"; no surprises
LANG="en_DK.utf8"; no surprises
LANG="en_GB.utf8"; no surprises
LANG="en_HK.utf8"; no surprises
LANG="en_IE.utf8"; no surprises
LANG="en_IN.utf8"; no surprises
LANG="en_NG.utf8"; no surprises
LANG="en_NZ.utf8"; no surprises
LANG="en_PH.utf8"; no surprises
LANG="en_SG.utf8"; no surprises
LANG="en_US.utf8"; no surprises
LANG="en_ZA.utf8"; no surprises
LANG="en_ZM.utf8"; no surprises
LANG="en_ZW.utf8"; no surprises
LANG="es_AR.utf8"; no surprises
LANG="es_BO.utf8"; no surprises
LANG="es_CL.utf8"; no surprises
LANG="es_CO.utf8"; no surprises
LANG="es_CR.utf8"; no surprises
LANG="es_CU.utf8"; no surprises
LANG="es_DO.utf8"; no surprises
LANG="es_EC.utf8"; no surprises
LANG="es_ES.utf8"; no surprises
LANG="es_GT.utf8"; no surprises
LANG="es_HN.utf8"; no surprises
LANG="es_MX.utf8"; no surprises
LANG="es_NI.utf8"; no surprises
LANG="es_PA.utf8"; no surprises
LANG="es_PE.utf8"; no surprises
LANG="es_PR.utf8"; no surprises
LANG="es_PY.utf8"; no surprises
LANG="es_SV.utf8"; no surprises
LANG="es_US.utf8"; no surprises
LANG="es_UY.utf8"; no surprises
LANG="es_VE.utf8"; no surprises
LANG="et_EE.utf8"; no surprises
LANG="eu_ES.utf8"; no surprises
LANG="fa_IR.utf8"; no surprises
LANG="ff_SN.utf8"; no surprises
LANG="fi_FI.utf8"; no surprises
LANG="fil_PH.utf8"; no surprises
LANG="fo_FO.utf8"; no surprises
LANG="fr_BE.utf8"; no surprises
LANG="fr_CA.utf8"; no surprises
LANG="fr_CH.utf8"; no surprises
LANG="fr_FR.utf8"; no surprises
LANG="fr_LU.utf8"; no surprises
LANG="fur_IT.utf8"; no surprises
LANG="fy_DE.utf8"; no surprises
LANG="fy_NL.utf8"; no surprises
LANG="ga_IE.utf8"; no surprises
LANG="gd_GB.utf8"; no surprises
LANG="gez_ER.utf8"; no surprises
LANG="gez_ER.utf8@abegede"; no surprises
LANG="gez_ET.utf8"; no surprises
LANG="gez_ET.utf8@abegede"; no surprises
LANG="gl_ES.utf8"; no surprises
LANG="gu_IN.utf8"; no surprises
LANG="gv_GB.utf8"; no surprises
LANG="hak_TW.utf8"; no surprises
LANG="ha_NG.utf8"; no surprises
LANG="he_IL.utf8"; no surprises
LANG="hi_IN.utf8"; no surprises
LANG="hne_IN.utf8"; no surprises
LANG="hr_HR.utf8"; no surprises
LANG="hsb_DE.utf8"; no surprises
LANG="ht_HT.utf8"; no surprises
LANG="hu_HU.utf8"; no surprises
LANG="hy_AM.utf8"; no surprises
LANG="ia_FR.utf8"; no surprises
LANG="id_ID.utf8"; no surprises
LANG="ig_NG.utf8"; no surprises
LANG="ik_CA.utf8"; no surprises
LANG="is_IS.utf8"; no surprises
LANG="it_CH.utf8"; no surprises
LANG="it_IT.utf8"; no surprises
LANG="iu_CA.utf8"; no surprises
LANG="iw_IL.utf8"; no surprises
LANG="ja_JP.utf8"; no surprises
LANG="ka_GE.utf8"; no surprises
LANG="kk_KZ.utf8"; no surprises
LANG="kl_GL.utf8"; no surprises
LANG="km_KH.utf8"; no surprises
LANG="kn_IN.utf8"; no surprises
LANG="kok_IN.utf8"; no surprises
LANG="ko_KR.utf8"; no surprises
LANG="ks_IN.utf8"; no surprises
LANG="ks_IN.utf8@devanagari"; no surprises
LANG="ku_TR.utf8";
   64 = @ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~
      ^ @ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`ABCDEFGHiJKLMNOPQRSTUVWXYZ{|}~
      v @abcdefghIjklmnopqrstuvwxyz[\]^_`abcdefghijklmnopqrstuvwxyz{|}~
      ?  ........*.................      ''''''''*'''''''''''''''''
LANG="kw_GB.utf8"; no surprises
LANG="ky_KG.utf8"; no surprises
LANG="lb_LU.utf8"; no surprises
LANG="lg_UG.utf8"; no surprises
LANG="li_BE.utf8"; no surprises
LANG="lij_IT.utf8"; no surprises
LANG="li_NL.utf8"; no surprises
LANG="lo_LA.utf8"; no surprises
LANG="lt_LT.utf8"; no surprises
LANG="lv_LV.utf8"; no surprises
LANG="lzh_TW.utf8"; no surprises
LANG="mag_IN.utf8"; no surprises
LANG="mai_IN.utf8"; no surprises
LANG="mg_MG.utf8"; no surprises
LANG="mhr_RU.utf8"; no surprises
LANG="mi_NZ.utf8"; no surprises
LANG="mk_MK.utf8"; no surprises
LANG="ml_IN.utf8"; no surprises
LANG="mni_IN.utf8"; no surprises
LANG="mn_MN.utf8"; no surprises
LANG="mr_IN.utf8"; no surprises
LANG="ms_MY.utf8"; no surprises
LANG="mt_MT.utf8"; no surprises
LANG="my_MM.utf8"; no surprises
LANG="nan_TW.utf8"; no surprises
LANG="nan_TW.utf8@latin"; no surprises
LANG="nb_NO.utf8"; no surprises
LANG="nds_DE.utf8"; no surprises
LANG="nds_NL.utf8"; no surprises
LANG="ne_NP.utf8"; no surprises
LANG="nhn_MX.utf8"; no surprises
LANG="niu_NU.utf8"; no surprises
LANG="niu_NZ.utf8"; no surprises
LANG="nl_AW.utf8"; no surprises
LANG="nl_BE.utf8"; no surprises
LANG="nl_NL.utf8"; no surprises
LANG="nn_NO.utf8"; no surprises
LANG="nr_ZA.utf8"; no surprises
LANG="nso_ZA.utf8"; no surprises
LANG="oc_FR.utf8"; no surprises
LANG="om_ET.utf8"; no surprises
LANG="om_KE.utf8"; no surprises
LANG="or_IN.utf8"; no surprises
LANG="os_RU.utf8"; no surprises
LANG="pa_IN.utf8"; no surprises
LANG="pap_AN.utf8"; no surprises
LANG="pap_AW.utf8"; no surprises
LANG="pap_CW.utf8"; no surprises
LANG="pa_PK.utf8"; no surprises
LANG="pl_PL.utf8"; no surprises
LANG="ps_AF.utf8"; no surprises
LANG="pt_BR.utf8"; no surprises
LANG="pt_PT.utf8"; no surprises
LANG="quz_PE.utf8"; no surprises
LANG="raj_IN.utf8"; no surprises
LANG="ro_RO.utf8"; no surprises
LANG="ru_RU.utf8"; no surprises
LANG="ru_UA.utf8"; no surprises
LANG="rw_RW.utf8"; no surprises
LANG="sa_IN.utf8"; no surprises
LANG="sat_IN.utf8"; no surprises
LANG="sc_IT.utf8"; no surprises
LANG="sd_IN.utf8"; no surprises
LANG="sd_IN.utf8@devanagari"; no surprises
LANG="se_NO.utf8"; no surprises
LANG="shs_CA.utf8"; no surprises
LANG="sid_ET.utf8"; no surprises
LANG="si_LK.utf8"; no surprises
LANG="sk_SK.utf8"; no surprises
LANG="sl_SI.utf8"; no surprises
LANG="so_DJ.utf8"; no surprises
LANG="so_ET.utf8"; no surprises
LANG="so_KE.utf8"; no surprises
LANG="so_SO.utf8"; no surprises
LANG="sq_AL.utf8"; no surprises
LANG="sq_MK.utf8"; no surprises
LANG="sr_ME.utf8"; no surprises
LANG="sr_RS.utf8"; no surprises
LANG="sr_RS.utf8@latin"; no surprises
LANG="ss_ZA.utf8"; no surprises
LANG="st_ZA.utf8"; no surprises
LANG="sv_FI.utf8"; no surprises
LANG="sv_SE.utf8"; no surprises
LANG="sw_KE.utf8"; no surprises
LANG="sw_TZ.utf8"; no surprises
LANG="szl_PL.utf8"; no surprises
LANG="ta_IN.utf8"; no surprises
LANG="ta_LK.utf8"; no surprises
LANG="te_IN.utf8"; no surprises
LANG="tg_TJ.utf8"; no surprises
LANG="the_NP.utf8"; no surprises
LANG="th_TH.utf8"; no surprises
LANG="ti_ER.utf8"; no surprises
LANG="ti_ET.utf8"; no surprises
LANG="tig_ER.utf8"; no surprises
LANG="tk_TM.utf8"; no surprises
LANG="tl_PH.utf8"; no surprises
LANG="tn_ZA.utf8"; no surprises
LANG="tr_CY.utf8";
   64 = @ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~
      ^ @ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`ABCDEFGHiJKLMNOPQRSTUVWXYZ{|}~
      v @abcdefghIjklmnopqrstuvwxyz[\]^_`abcdefghijklmnopqrstuvwxyz{|}~
      ?  ........*.................      ''''''''*'''''''''''''''''
LANG="tr_TR.utf8";
   64 = @ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~
      ^ @ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`ABCDEFGHiJKLMNOPQRSTUVWXYZ{|}~
      v @abcdefghIjklmnopqrstuvwxyz[\]^_`abcdefghijklmnopqrstuvwxyz{|}~
      ?  ........*.................      ''''''''*'''''''''''''''''
LANG="ts_ZA.utf8"; no surprises
LANG="tt_RU.utf8"; no surprises
LANG="tt_RU.utf8@iqtelif";
   64 = @ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~
      ^ @ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`ABCDEFGHiJKLMNOPQRSTUVWXYZ{|}~
      v @abcdefghIjklmnopqrstuvwxyz[\]^_`abcdefghijklmnopqrstuvwxyz{|}~
      ?  ........*.................      ''''''''*'''''''''''''''''
LANG="tu_IN.utf8"; no surprises
LANG="ug_CN.utf8"; no surprises
LANG="uk_UA.utf8"; no surprises
LANG="unm_US.utf8"; no surprises
LANG="ur_IN.utf8"; no surprises
LANG="ur_PK.utf8"; no surprises
LANG="uz_UZ.utf8"; no surprises
LANG="uz_UZ.utf8@cyrillic"; no surprises
LANG="ve_ZA.utf8"; no surprises
LANG="vi_VN.utf8"; no surprises
LANG="wa_BE.utf8"; no surprises
LANG="wae_CH.utf8"; no surprises
LANG="wal_ET.utf8"; no surprises
LANG="wo_SN.utf8"; no surprises
LANG="xh_ZA.utf8"; no surprises
LANG="yi_US.utf8"; no surprises
LANG="yo_NG.utf8"; no surprises
LANG="yue_HK.utf8"; no surprises
LANG="zh_CN.utf8"; no surprises
LANG="zh_HK.utf8"; no surprises
LANG="zh_SG.utf8"; no surprises
LANG="zh_TW.utf8"; no surprises
LANG="zu_ZA.utf8"; no surprises

#include <stdio.h>
#include <stdlib.h>
#include <locale.h>
#include <string.h>
#include <ctype.h>
#include <iconv.h>
#include <langinfo.h>

/* Characters shown per output row; 256 must be a multiple of this. */
enum { COLWIDTH = 64 };

/* Express shock if a character maps from the ASCII/ISO646 range into
 * the high-bit range, or vice versa, and then explicitly verify that
 * all ASCII values map as expected if it is in the lowest 128 char plane.
 * This does not work for testing on an EBCDIC architecture: the ranges
 * for alpha and 'well known posix characters' differ, and u/l case are
 * offset by 64.
 */
int surprise(int ch, int uc, int lc)
{
    int lcsurprise = (ch > 127 && lc < 128) || (ch < 128 && lc > 127);
    int ucsurprise = (ch > 127 && uc < 128) || (ch < 128 && uc > 127);

    if (ch < 128) {
        if (ch >= 'A' && ch <= 'Z') {
            if (lc != ch + 32) lcsurprise = 1;
            if (uc != ch) ucsurprise = 1;
        }
        else if (ch >= 'a' && ch <= 'z') {
            if (lc != ch) lcsurprise = 1;
            if (uc != ch - 32) ucsurprise = 1;
        }
        else {
            if (lc != ch) lcsurprise = 1;
            if (uc != ch) ucsurprise = 1;
        }
    }
    return lcsurprise | ucsurprise;
}

/* Convert len bytes of buf from the locale's codeset to UTF-8 and print
 * whatever was successfully converted, followed by a newline.
 */
static void print_utf8(iconv_t convctx, char *buf, size_t len)
{
    char pbuf[COLWIDTH * 4 + 16];
    char *pbufptr = pbuf;
    size_t pbufsz = sizeof(pbuf);

    iconv(convctx, &buf, &len, &pbufptr, &pbufsz);
    /* "%.*s" takes an int precision, so cast the pointer difference */
    printf("%.*s\n", (int)(pbufptr - pbuf), pbuf);
}

int main(int argc, char *argv[])
{
    int verbose = 0;
    int row, col, view, prelen;
    char buf[COLWIDTH + 16], *bufch;
    iconv_t convctx;
    char *locale;

    ++argv, --argc;
    if (*argv && (strcmp(*argv, "-v") == 0)) {
        verbose = 1;
        ++argv, --argc;
    }

    do {
        int viewed = 0;

        if (!*argv)
            locale = setlocale(LC_ALL, "");
        else
            locale = setlocale(LC_ALL, *argv++);
        /* setlocale() returns NULL for an unavailable locale; the
         * previously selected locale then stays in effect below. */
        printf("LANG=\"%s\";\n", locale ? locale : "(null)");

        convctx = iconv_open("UTF-8", nl_langinfo(CODESET));
        /* iconv_open() reports failure as (iconv_t)-1, not NULL */
        if (convctx == (iconv_t)-1) {
            printf("Failed to initialize \"%s\" to \"UTF-8\" iconv context\n",
                   nl_langinfo(CODESET));
            continue;
        }

        for (row = 0; row < (256 / COLWIDTH); ++row) {
            /* Decide whether this row holds anything worth showing */
            for (col = 0, view = 0; col < COLWIDTH; ++col) {
                unsigned char ch = row * COLWIDTH + col;
                unsigned char lc = tolower(ch), uc = toupper(ch);
                if (verbose && (lc != ch || uc != ch)) {
                    view = 1;
                    break;
                }
                if (surprise(ch, uc, lc)) {
                    view = 1;
                    break;
                }
            }
            if (!view)
                continue;

            /* The characters themselves, prefixed by the first code value */
            bufch = buf + sprintf(buf, "%5d = ", row * COLWIDTH);
            prelen = (int)(bufch - buf) - 3;
            for (col = 0; col < COLWIDTH; ++col) {
                unsigned char ch = row * COLWIDTH + col;
                *(bufch++) = isprint(ch) ? ch : ' ';
            }
            print_utf8(convctx, buf, bufch - buf);

            /* Upper-case mappings; "%*s" pads the prefix to prelen blanks */
            bufch = buf + sprintf(buf, "%*s ^ ", prelen, "");
            for (col = 0; col < COLWIDTH; ++col) {
                unsigned char ch = row * COLWIDTH + col;
                unsigned char uc = toupper(ch);
                *(bufch++) = isprint(uc) ? uc : ' ';
            }
            print_utf8(convctx, buf, bufch - buf);

            /* Lower-case mappings */
            bufch = buf + sprintf(buf, "%*s v ", prelen, "");
            for (col = 0; col < COLWIDTH; ++col) {
                unsigned char ch = row * COLWIDTH + col;
                unsigned char lc = tolower(ch);
                *(bufch++) = isprint(lc) ? lc : ' ';
            }
            print_utf8(convctx, buf, bufch - buf);

            /* Markers: '*' surprise, '\'' has an upper-case mapping,
             * '.' has a lower-case mapping, ' ' unchanged */
            bufch = buf + sprintf(buf, "%*s ? ", prelen, "");
            for (col = 0; col < COLWIDTH; ++col) {
                unsigned char ch = row * COLWIDTH + col;
                unsigned char lc = tolower(ch);
                unsigned char uc = toupper(ch);
                *(bufch++) = surprise(ch, uc, lc) ? '*'
                           : (uc != ch) ? '\''
                           : (lc != ch) ? '.' : ' ';
            }
            print_utf8(convctx, buf, bufch - buf);

            viewed = 1;
        }
        if (!viewed)
            printf(" no surprises\n");
        iconv_close(convctx);
    } while (*argv);

    return 0;
}
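
To make concrete what the '*' markers in the Turkic results (tr_*, az_AZ, crh_UA, ku_TR, tt_RU@iqtelif) imply for protocol tokens, here is a minimal sketch of case folding that never consults the locale. The names ascii_tolower, ascii_toupper and ascii_strcasecmp are hypothetical, invented for this illustration; they are not Apache's or APR's actual functions. The arithmetic assumes an ASCII execution character set, so like the test program above it would not hold on an EBCDIC box.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: lower-case one character by ASCII rules only.
 * Anything outside 'A'..'Z' is returned unchanged, whatever the locale.
 * Assumes an ASCII execution character set. */
int ascii_tolower(int ch)
{
    return (ch >= 'A' && ch <= 'Z') ? ch + ('a' - 'A') : ch;
}

/* Hypothetical helper: upper-case one character by ASCII rules only. */
int ascii_toupper(int ch)
{
    return (ch >= 'a' && ch <= 'z') ? ch - ('a' - 'A') : ch;
}

/* Hypothetical helper: compare two NUL-terminated strings, ignoring
 * ASCII case only. Returns <0, 0 or >0, in the manner of strcmp(). */
int ascii_strcasecmp(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);
    for (;; ++a, ++b) {
        int ca = ascii_tolower((unsigned char)*a);
        int cb = ascii_tolower((unsigned char)*b);
        if (ca != cb || ca == '\0')
            return ca - cb;
    }
}
```

With helpers of this shape, ascii_tolower('I') is 'i' and ascii_strcasecmp("Content-Length", "CONTENT-LENGTH") is 0 regardless of what setlocale() selected, whereas the listing shows the C library's tolower('I') yielding ı under tr_TR.iso88599. High-bit bytes pass through untouched, which is the intended trade-off: this is only suitable for tokens defined to be ASCII, not for natural-language text.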