Bug#1017435: debian-installer: georgian text mode fails to render all characters
Hi,

Roland Clobus wrote (Mon, 9 Jan 2023 17:46:37 +0100):
> Note the 'nearly'... :-)
> In the translations the following code points are used, which are not in
> your list yet:
>
> 10f2 (Georgian Letter Hie), used in 'retriever/media/loadnow' and
> 'partman-auto-raid/notenoughparts'

I noticed that Hie (10f2) was missing in my file compared to yours.
However, when I checked whether it is used in the Georgian translation, my
result was that it is not. I cannot explain what went wrong with that
search, but checking again today shows that it is indeed used, as you say.
So I will add it to ka.utf now.

Thanks for the heads up!

Holger

> 201c (Left Double Quotation Mark) 16x used
> 201d (Right Double Quotation Mark) 38x used
> 201e (Double Low-9 Quotation Mark) 56x used
> 21b5 (Downwards Arrow With Corner Leftwards), used in
> 'partman-md/deleteverify' and 'mdcfg/deleteverify'
>
> Note that 10f1 (Georgian Letter He) and 10f3-10fe are not used in any
> translated text.
>
> The last four code points will be included by other languages, so it's
> probably not important to have them missing in ka.utf.
>
> With kind regards,
> Roland Clobus
>
> PS: The strings with the missing letters were found by enabling lines 36
> and 37 of the script.

--
Holger Wansing
PGP-Fingerprint: 496A C6E8 1442 4B34 8508 3529 59F1 87CA 156E B076
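Adding Hie to ka.utf means merging one more character into the file's single line of characters. A minimal Python sketch, assuming ka.utf is one UTF-8 line of sorted, de-duplicated characters (the format chars_v2.py writes); the helper name `add_code_points` is hypothetical:

```python
from typing import Iterable


def add_code_points(existing: str, code_points: Iterable[int]) -> str:
    """Merge code points into a one-line needed-characters file.

    Assumes the file holds a single line of characters, sorted and without
    duplicates. Only the trailing newline is removed, so non-ASCII spaces
    such as U+00A0 would be kept.
    """
    chars = set(existing.rstrip("\n"))
    chars.update(chr(cp) for cp in code_points)
    return "".join(sorted(chars))


# Three Georgian letters standing in for the real ka.utf contents
print(add_code_points("\u10d0\u10d3\u10f0\n", [0x10F2]))  # adds U+10F2 (Hie)
```

Sorting by character keeps the line ordered by code point, so adding a character that is already present leaves the file unchanged.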
Bug#1017435: debian-installer: georgian text mode fails to render all characters
On 09/01/2023 18:01, Roland Clobus wrote:
> On 07/01/2023 11:59, Samuel Thibault wrote:
>> Roland Clobus wrote on Sat, 07 Jan 2023 11:31:29 +0100:
>>> Or... are additional udebs downloaded on demand?
>>
>> It seems from the list.gz files that udebs are on the first iso, and from
>> the debian-cd exclude files that the only udebs which are not there are
>> the ones which are already included in the d-i initrd.
>>
>> It doesn't seem that your current script looks at the udebs included in
>> the initrd?
>
> I have a local build of a live image that downloads all udebs but did not
> remove any udeb (due to a bug, soon to be fixed), so I have looked at a
> total of 386 udeb files (which includes the udebs in the initrd).
>
> I'll take a closer look, because
> http://deb.debian.org/debian/dists/sid/main/debian-installer/binary-amd64/Packages.gz
> appears to mention 393 udeb files, so I'm missing 7.

And found it: I was comparing bookworm with sid...
The ones that are extra in sid are:
+depthcharge-tools-installer
+libdns-export1110
+libirs-export161
+libisccc-export161
+libisccfg-export163
+libisc-export1105
+squid-deb-proxy-client-udeb

With kind regards,
Roland
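The 386-versus-393 discrepancy came down to comparing two suites. A minimal Python sketch of such a comparison, assuming locally downloaded copies of the two `Packages.gz` indexes; the helper names are hypothetical. Each stanza in a Packages index has a `Package:` field, so a set difference of those names lists what one suite has and the other lacks:

```python
import gzip
from typing import Set


def package_names(packages_text: str) -> Set[str]:
    """Return the value of every 'Package:' field in a Packages index.

    Checking for the colon avoids matching fields such as 'Package-Type:'.
    """
    names = set()
    for line in packages_text.splitlines():
        if line.startswith("Package:"):
            names.add(line[len("Package:"):].strip())
    return names


def read_packages_gz(path: str) -> str:
    """Read a locally downloaded Packages.gz file as text."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        return f.read()


# Tiny inline stand-ins for the bookworm and sid indexes
bookworm = "Package: a-udeb\nVersion: 1\n\nPackage: b-udeb\nVersion: 1\n"
sid = bookworm + "\nPackage: c-udeb\nVersion: 1\n"

print(len(package_names(bookworm)), len(package_names(sid)))
for name in sorted(package_names(sid) - package_names(bookworm)):
    print("+" + name)
```

With real data, `package_names(read_packages_gz("sid-Packages.gz"))` would replace the inline strings; the file names are placeholders.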
Bug#1017435: debian-installer: georgian text mode fails to render all characters
On 07/01/2023 11:59, Samuel Thibault wrote:
> Roland Clobus wrote on Sat, 07 Jan 2023 11:31:29 +0100:
>> Or... are additional udebs downloaded on demand?
>
> It seems from the list.gz files that udebs are on the first iso, and from
> the debian-cd exclude files that the only udebs which are not there are
> the ones which are already included in the d-i initrd.
>
> It doesn't seem that your current script looks at the udebs included in
> the initrd?

I have a local build of a live image that downloads all udebs but did not
remove any udeb (due to a bug, soon to be fixed), so I have looked at a
total of 386 udeb files (which includes the udebs in the initrd).

I'll take a closer look, because
http://deb.debian.org/debian/dists/sid/main/debian-installer/binary-amd64/Packages.gz
appears to mention 393 udeb files, so I'm missing 7.

With kind regards,
Roland
Bug#1017435: debian-installer: georgian text mode fails to render all characters
On 08/01/2023 18:49, Holger Wansing wrote:
> Roland Clobus wrote (Sat, 7 Jan 2023 11:31:29 +0100):
>> I agree. But the work for the translators can be reduced by
>> automatically parsing the work they have already done (i.e. the
>> translations).
>> That would leave only some characters that have not been used in any
>> translated text, but might be present e.g. on the keyboard.
>> @Translators: Is that something you might want to use?
>
> For a translator, it's just a matter of 1 minute to write down the
> alphabet for his language, I guess.
> But anyway, we can still use your script if we cannot get such info from
> a translator (if there is no active translator, for example).
>
>> Attached:
>> 1) chars_v2.py: The updated script
>> 2) sample.summary: Its output to the console
>> 3) ka.utf: Automatically generated Georgian list of used characters
>
> The ka.utf you attached is conspicuously (nearly) identical to what I
> grabbed from Wikipedia, so: good work!!!

Note the 'nearly'... :-)
In the translations the following code points are used, which are not in
your list yet:

10f2 (Georgian Letter Hie), used in 'retriever/media/loadnow' and
'partman-auto-raid/notenoughparts'
201c (Left Double Quotation Mark) 16x used
201d (Right Double Quotation Mark) 38x used
201e (Double Low-9 Quotation Mark) 56x used
21b5 (Downwards Arrow With Corner Leftwards), used in
'partman-md/deleteverify' and 'mdcfg/deleteverify'

Note that 10f1 (Georgian Letter He) and 10f3-10fe are not used in any
translated text.

The last four code points will be included by other languages, so it's
probably not important to have them missing in ka.utf.

With kind regards,
Roland Clobus

PS: The strings with the missing letters were found by enabling lines 36
and 37 of the script.
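A report like the list above (code point plus Unicode name for each character that the translations use but ka.utf lacks) can be produced with Python's `unicodedata` module. A minimal sketch; the function name `missing_characters`, the 33-letter ka.utf and the sample strings are all hypothetical stand-ins, not the real files:

```python
import unicodedata
from typing import Iterable, List, Tuple


def missing_characters(translations: Iterable[str], needed: str) -> List[Tuple[str, str]]:
    """List non-ASCII characters used in translations but absent from `needed`.

    Returns (lowercase hex code point, Unicode name) pairs sorted by code point.
    """
    used = {ch for text in translations for ch in text if ord(ch) >= 128}
    missing = sorted(used - set(needed))
    return [(f"{ord(ch):04x}", unicodedata.name(ch, "<unnamed>")) for ch in missing]


# Hypothetical ka.utf: only the 33 modern letters U+10D0..U+10F0
ka_utf = "".join(chr(cp) for cp in range(0x10D0, 0x10F1))
# Made-up translated strings using U+10F2, typographic quotes and U+21B5
texts = ["\u201e\u10d0\u10f2\u201c", "\u21b5"]

for code, name in missing_characters(texts, ka_utf):
    print(code, name)
```

Counting occurrences (the "16x used" figures) would only need a `collections.Counter` over the same characters.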
Bug#1017435: debian-installer: georgian text mode fails to render all characters
Hi,

Roland Clobus wrote (Sat, 7 Jan 2023 11:31:29 +0100):
> I agree. But the work for the translators can be reduced by
> automatically parsing the work they have already done (i.e. the
> translations).
> That would leave only some characters that have not been used in any
> translated text, but might be present e.g. on the keyboard.
> @Translators: Is that something you might want to use?

For a translator, it's just a matter of 1 minute to write down the
alphabet for his language, I guess.
But anyway, we can still use your script if we cannot get such info from
a translator (if there is no active translator, for example).

> Attached:
> 1) chars_v2.py: The updated script
> 2) sample.summary: Its output to the console
> 3) ka.utf: Automatically generated Georgian list of used characters

The ka.utf you attached is conspicuously (nearly) identical to what I
grabbed from Wikipedia, so: good work!!!

Holger

--
Holger Wansing
PGP-Fingerprint: 496A C6E8 1442 4B34 8508 3529 59F1 87CA 156E B076
Bug#1017435: debian-installer: georgian text mode fails to render all characters
Roland Clobus wrote on Sat, 07 Jan 2023 11:31:29 +0100:
> On 06/01/2023 16:20, Samuel Thibault wrote:
>> Roland Clobus wrote on Fri, 06 Jan 2023 13:38:34 +0100:
>>> With the attached script you can generate a list of all characters that
>>> are used in the translations. That can be used to determine the minimal
>>> set of required characters.
>>
>> We already do that in the debian installer, but that is not enough to be
>> reasonably sure that this covers all questions that might happen during
>> installation, since questions could be asked by any udeb. That's why we
>> rather request an explicit file from actual translators.
>
> I agree. But the work for the translators can be reduced by automatically
> parsing the work they have already done (i.e. the translations).

Ah, OK, I misread what you meant. Yes, that can be a good base, which can
then be proofread by translators.

> In order not to miss any translated text, I've updated the script to parse
> all .udeb files that are present on the installation medium and extract the
> templates from them. This ensures that all questions that might happen
> during installation will be covered.
> Or... are additional udebs downloaded on demand?

It seems from the list.gz files that udebs are on the first iso, and from
the debian-cd exclude files that the only udebs which are not there are
the ones which are already included in the d-i initrd.

It doesn't seem that your current script looks at the udebs included in
the initrd?

Samuel
Bug#1017435: debian-installer: georgian text mode fails to render all characters
Control: tags -1 + pending

Hi,

Samuel Thibault wrote (Fri, 6 Jan 2023 16:20:38 +0100):
> Roland Clobus wrote on Fri, 06 Jan 2023 13:38:34 +0100:
>> On 01/01/2023 20:49, Holger Wansing wrote:
>>> Samuel Thibault wrote (Sun, 1 Jan 2023 20:14:36 +0100):
>>>> Hello,
>>>>
>>>> Holger Wansing wrote on Tue, 16 Aug 2022 22:59:34 +0200:
>>>>> Philip Hands wrote (Tue, 16 Aug 2022 11:22:30 +0200):
>>>>>> openQA just noticed that the rendering of certain characters just
>>>>>> changed, highlighting the fact that the rendering was already broken.
>> ...
>>>> The solution is simply to add the required characters in
>>>> debian-installer/build/needed-characters/ka.utf:
>>>
>>> So, we need a Georgian translator, providing us a file with all
>>> non-ASCII characters needed for the Georgian language.
>>>
>>> Can anyone help us, please?
>>
>> With the attached script you can generate a list of all characters that
>> are used in the translations. That can be used to determine the minimal
>> set of required characters.

Roland: thanks for this tool. It may be useful at some point; however, in
this case I chose a different approach.

> We already do that in the debian installer, but that is not enough to be
> reasonably sure that this covers all questions that might happen during
> installation, since questions could be asked by any udeb. That's why we
> rather request an explicit file from actual translators.

I have now generated such a file from
https://en.wikipedia.org/wiki/Georgian_language
Tests have shown that it fixes the problem.

Holger

--
Holger Wansing
PGP-Fingerprint: 496A C6E8 1442 4B34 8508 3529 59F1 87CA 156E B076
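Generating ka.utf from the alphabet, rather than from the translations, can be sketched in a few lines of Python. This is not necessarily the character set extracted from Wikipedia: it assumes the file only needs the 33 letters of the modern Mkhedruli alphabet (U+10D0 to U+10F0) plus any extras found later, such as U+10F2. The names `MKHEDRULI` and `ka_utf_line` are hypothetical:

```python
# Modern Georgian (Mkhedruli) letters: U+10D0 .. U+10F0, 33 letters
MKHEDRULI = "".join(chr(cp) for cp in range(0x10D0, 0x10F1))


def ka_utf_line(extra: str = "") -> str:
    """Return one line of sorted, de-duplicated characters for ka.utf.

    `extra` takes characters found later, such as U+10F2 (Georgian Letter Hie).
    """
    return "".join(sorted(set(MKHEDRULI + extra)))


line = ka_utf_line("\u10f2")
print(len(line))
print(line)
```

Writing the result could then be `pathlib.Path("build/needed-characters/ka.utf").write_text(line, encoding="utf-8")`, run from a debian-installer checkout.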
Processed: Re: Bug#1017435: debian-installer: georgian text mode fails to render all characters
Processing control commands: > tags -1 + pending Bug #1017435 [debian-installer] debian-installer: georgian text mode fails to render all characters Added tag(s) pending. -- 1017435: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1017435 Debian Bug Tracking System Contact ow...@bugs.debian.org with problems
Bug#1017435: debian-installer: georgian text mode fails to render all characters
On 06/01/2023 16:20, Samuel Thibault wrote:
> Roland Clobus wrote on Fri, 06 Jan 2023 13:38:34 +0100:
>> With the attached script you can generate a list of all characters that
>> are used in the translations. That can be used to determine the minimal
>> set of required characters.
>
> We already do that in the debian installer, but that is not enough to be
> reasonably sure that this covers all questions that might happen during
> installation, since questions could be asked by any udeb. That's why we
> rather request an explicit file from actual translators.

I agree. But the work for the translators can be reduced by automatically
parsing the work they have already done (i.e. the translations).
That would leave only some characters that have not been used in any
translated text, but might be present e.g. on the keyboard.
@Translators: Is that something you might want to use?

If those additional characters were listed in a second line in the file,
it would even allow for frictionless merges. (I.e. the first line would be
autogenerated from the provided translations and the second line would be
a list of additional characters.)

In order not to miss any translated text, I've updated the script to parse
all .udeb files that are present on the installation medium and extract the
templates from them. This ensures that all questions that might happen
during installation will be covered.
Or... are additional udebs downloaded on demand?
# How to run this script:
# 1) cd path_of_git_workdirectory_of_debian-installer
# 2) find mount_point_of_installer_image -name "*.udeb" | awk '{ print "dpkg-deb --control ", $1; print "if [ -e DEBIAN/templates ]; then cat DEBIAN/templates >> collect; fi"; print "rm -fr DEBIAN" }' | sh
# 3) python3 chars_v2.py
# Carefully evaluate the proposed modifications in build/needed-characters

With kind regards,
Roland Clobus

Attached:
1) chars_v2.py: The updated script
2) sample.summary: Its output to the console
3) ka.utf: Automatically generated Georgian list of used characters

import re

# Generate a list of all characters that are used in translations in udeb files
#
# How to run this script:
# 1) cd path_of_git_workdirectory_of_debian-installer
# 2) find mount_point_of_installer_image -name "*.udeb" | awk '{ print "dpkg-deb --control ", $1; print "if [ -e DEBIAN/templates ]; then cat DEBIAN/templates >> collect; fi"; print "rm -fr DEBIAN" }' | sh
# 3) python3 chars_v2.py
# Carefully evaluate the proposed modifications in build/needed-characters

write_to_file = True
dump_to_console = True

with open("collect", "r", encoding="utf-8") as file:
    lines = file.read().split("\n")

language = "C"
chars = dict()
for line in lines:
    # Sample:
    # Description-am.UTF-8: የሚጫኑ የተካይ አካሎች፦
    match = re.match(r"\w+-([a-zA-Z@_]+)\.UTF-8: (.*)", line)
    if match:
        # A translated text
        language = match.group(1)
        translation = match.group(2)
    elif line.startswith(" "):
        # Extended description: continues the previous (translated) field
        translation = line[1:]
    else:
        # Not for translation -> reset
        language = "C"
        translation = ""
    for char in translation:
        # Debug part to find which translated text contains a specific character
        #if language == "nl" and char == 'ı':
        #    print(line)
        if language not in chars:
            chars[language] = set()
        if ord(char) >= 128:
            # Add only non-ASCII characters
            chars[language].add(char)

if write_to_file:
    for language in sorted(chars):
        with open("build/needed-characters/" + language + ".utf", "w", encoding="utf-8") as file:
            file.write(''.join(sorted(chars[language])))

if dump_to_console:
    for language in sorted(chars):
        print(f"Language: {language}")
        print(f"Characters: {''.join(sorted(chars[language]))}")

Language: C
Characters:
Language: am
Characters: Åáãçéíôüሀሁሂሃሄህሆለሉሊላሌልሎሏሐሑሒሓሔሕመሙሚማሜምሞሟሠሡሢሣሤሥረሩሪራሬርሮሯሰሱሲሳሴስሶሷሸሹሺሻሼሽሾሿቀቁቂቃቄቅቆቋበቡቢባቤብቦቧቨቪቫቬቮተቱቲታቴትቶቷቸቹቺቻቼችቾኁኂኄኅኋነኑኒናኔንኖኗኘኙኚኛኝኞአኡኢኣኤእኦከኩኪካኬክኮኳኸኻኽወዉዊዋዌውዎዐዑዓዕዘዙዚዛዜዝዞዟዡዢዥየዩያይዮደዱዲዳዴድዶጀጁጂጃጄጅጆገጉጊጋጌግጎጐጒጓጔጕጘጠጡጢጣጤጥጦጧጨጪጫጭጮጵጸጹጻጽፁፃፄፅፈፉፊፋፌፍፎፐፑፒፓፔፕፖፘፙ፠፡።፣፤፥፦
Language: ar
Characters: ·ü،؛؟ءآأؤإئابةتثجحخدذرزسشصضطظعغـفقكلمنهوىيًٌٍَُِّْ٣٥…ﻷ
Language: ast
Characters: ¡«º»¿ÁÅÉÍÚáãçéíñóôúüḤḥ…
Language: be
Characters: «»ЁІЎАБВГДЕЖЗКЛМНОПРСТУФХЦЧШЫЬЭЮЯабвгдежзийклмнопрстуфхцчшыьэюяёіў—
Language: bg
Characters: ü̆АБВГДЕЖЗИЙКЛМНОПРСТУФХЦЧШЩЪЮЯабвгдежзийклмнопрстуфхцчшщъьюяѝ–“„…№
Language: bn
Characters: ü।ঁংঃঅআইউএঐওকখগঘঙচছজঝঞটঠডঢণতথদধনপফবভমযরলশষসহ়ািীুূৃেৈোৌ্ৎড়ঢ়য়০১২৩৪৫৬৮৯…
Language: bo
Characters: Åçéôü་།༢༣༤ཀཁགངཅཆཇཉཏཐདནཔཕབམཙཚཛཝཞཟའཡརལཤསཧཨིེོུྐྒྔྗྙྟྡྣྤྥྦྨྩྫྭྱྲླྷ“”、。盘键,:
Language: bs
Characters: ÅçéíôüĆćČč𩹮ž
Language: ca
Characters: «·»ÀÉÍÒÚàãçèéíïòóúü—
Language: cs
Characters: ÁÅÉÍáãçéíóôúüýČčďĚěňŘřŠšťŮůŽž“„
Language: cy
Characters: ÂÅÔáâãçéêëíîïôüŵ
Language: da
Characters: «»ÅÆØáãåæçéíôøü
Language: de
Characters: «»ÄÅÖÜßáãäåçéíôöü…
Language: dz
Characters: çü་།༠༡༢༣༤༥༦༨༩༼༽ཀཁགངཅཆཇཉཊཌཏཐདནཔཕབམཙཚཛཝཞཟའཡརལཤསཧཨཪ྄ཱིེོུྐྒྔྕྗྙྟྡྣྤྦྩྫྭྱྲླྷ
Language: el
Characters: «·»áãíôǘΆΈΉΊΌΏΑΒΓΔΕΖΗΘΙΚΛΜΝΞΟΠΡΣΤΥΦΧΩάέήίαβγδεζηθικλμνξοπρςστυφχψωϊϋόύώ
Language: en
Characters:
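The two-line layout proposed in this message (line 1 regenerated from the translations, line 2 holding characters added by hand) can be illustrated with a small merge helper. A minimal Python sketch; the layout is only a proposal in this thread, not something the d-i build is known to support, and both function names are hypothetical:

```python
def merge_two_line_file(old_content: str, generated: str) -> str:
    """Replace line 1 with freshly generated characters, keep line 2.

    Hypothetical layout: line 1 = characters found in the translations,
    line 2 = characters added manually by translators.
    split("\n") is used instead of splitlines() because splitlines() also
    breaks on non-ASCII separators such as U+0085.
    """
    old_lines = old_content.split("\n")
    manual = old_lines[1] if len(old_lines) > 1 else ""
    return generated + "\n" + (manual + "\n" if manual else "")


def all_characters(content: str) -> set:
    """Union of both lines: every character the file asks to be covered."""
    return set(content.replace("\n", ""))


old = "\u10d0\u10d1\n\u10f2\n"  # line 2: a manually added letter
new = merge_two_line_file(old, "\u10d0\u10d1\u10d2")
print(repr(new))
```

Regenerating line 1 never touches the manual line, which is what would make such merges free of conflicts.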
Bug#1017435: debian-installer: georgian text mode fails to render all characters
Roland Clobus wrote on Fri, 06 Jan 2023 13:38:34 +0100:
> On 01/01/2023 20:49, Holger Wansing wrote:
>> Samuel Thibault wrote (Sun, 1 Jan 2023 20:14:36 +0100):
>>> Hello,
>>>
>>> Holger Wansing wrote on Tue, 16 Aug 2022 22:59:34 +0200:
>>>> Philip Hands wrote (Tue, 16 Aug 2022 11:22:30 +0200):
>>>>> openQA just noticed that the rendering of certain characters just
>>>>> changed, highlighting the fact that the rendering was already broken.
> ...
>>> The solution is simply to add the required characters in
>>> debian-installer/build/needed-characters/ka.utf:
>>
>> So, we need a Georgian translator, providing us a file with all
>> non-ASCII characters needed for the Georgian language.
>>
>> Can anyone help us, please?
>
> With the attached script you can generate a list of all characters that
> are used in the translations. That can be used to determine the minimal
> set of required characters.

We already do that in the debian installer, but that is not enough to be
reasonably sure that this covers all questions that might happen during
installation, since questions could be asked by any udeb. That's why we
rather request an explicit file from actual translators.

Samuel
Bug#1017435: debian-installer: georgian text mode fails to render all characters
On 01/01/2023 20:49, Holger Wansing wrote:
> Hi,
>
> Samuel Thibault wrote (Sun, 1 Jan 2023 20:14:36 +0100):
>> Hello,
>>
>> Holger Wansing wrote on Tue, 16 Aug 2022 22:59:34 +0200:
>>> Philip Hands wrote (Tue, 16 Aug 2022 11:22:30 +0200):
>>>> openQA just noticed that the rendering of certain characters just
>>>> changed, highlighting the fact that the rendering was already broken.
> ...
>> The solution is simply to add the required characters in
>> debian-installer/build/needed-characters/ka.utf:
>
> So, we need a Georgian translator, providing us a file with all
> non-ASCII characters needed for the Georgian language.
>
> Can anyone help us, please?

With the attached script you can generate a list of all characters that
are used in the translations. That can be used to determine the minimal
set of required characters.

With kind regards,
Roland Clobus

import re

# Generate a list of all characters that are used in translations
#
# Generate the file templates.dat as follows:
# 1) Boot an image with the debian-installer
# 2) Proceed until the end, but do not 'Finish' yet
# 3) Open a console
# 4) cp /var/lib/cdebconf/templates.dat /target
# 5) chroot /target
# 6) scp templates.dat some_u...@example.com:/path_of_git_workdirectory_of_debian-installer
#
# Run this script:
# 1) cd path_of_git_workdirectory_of_debian-installer
# 2) python3 chars.py
#
# Carefully evaluate the proposed modifications in build/needed-characters

write_to_file = True
dump_to_console = True

with open("templates.dat", "r", encoding="utf-8") as file:
    lines = file.read().split("\n")

chars = dict()
for line in lines:
    # Sample:
    # Description-am.UTF-8: የሚጫኑ የተካይ አካሎች፦
    match = re.match(r"\w+-(\w+)\.UTF-8: (.*)", line)
    if match:
        # A translated text
        language = match.group(1)
        translation = match.group(2)
        for char in translation:
            if language not in chars:
                chars[language] = set()
            if ord(char) >= 128:
                # Add only non-ASCII characters
                chars[language].add(char)

if write_to_file:
    for language in sorted(chars):
        with open("build/needed-characters/" + language + ".utf", "w", encoding="utf-8") as file:
            file.write(''.join(sorted(chars[language])))

if dump_to_console:
    for language in sorted(chars):
        print(f"Language: {language}")
        print(f"Characters: {''.join(sorted(chars[language]))}")
Bug#1017435: debian-installer: georgian text mode fails to render all characters
Hi,

Samuel Thibault wrote (Sun, 1 Jan 2023 20:14:36 +0100):
> Hello,
>
> Holger Wansing wrote on Tue, 16 Aug 2022 22:59:34 +0200:
>> Philip Hands wrote (Tue, 16 Aug 2022 11:22:30 +0200):
>>> openQA just noticed that the rendering of certain characters just changed,
>>> highlighting the fact that the rendering was already broken.
>>
>> To be more precise here:
>> - this only affects the text installer; rendering in the graphical
>>   installer seems fine;
>> - Georgian became available for the text installer in debian 10, and
>>   some research showed that this issue has been present from the very
>>   beginning (debian 10).
>>
>> Since no one noticed this for years, maybe we can go with a workaround to
>> disable Georgian for the text installer (if no one finds the real solution
>> for this).
>
> The solution is simply to add the required characters in
> debian-installer/build/needed-characters/ka.utf:
>
> https://d-i.debian.org/doc/i18n-guide/ch03s06.html
>
> I'm surprised that this was missed. Was
>
> https://d-i.debian.org/doc/i18n-guide/ch03.html
>
> not followed?

Apparently not :-(

So, we need a Georgian translator, providing us a file with all non-ASCII
characters needed for the Georgian language.

Can anyone help us, please?

Thanks
Holger

--
Holger Wansing
PGP-Fingerprint: 496A C6E8 1442 4B34 8508 3529 59F1 87CA 156E B076
Bug#1017435: debian-installer: georgian text mode fails to render all characters
Hello,

Holger Wansing wrote on Tue, 16 Aug 2022 22:59:34 +0200:
> Philip Hands wrote (Tue, 16 Aug 2022 11:22:30 +0200):
>> openQA just noticed that the rendering of certain characters just changed,
>> highlighting the fact that the rendering was already broken.
>
> To be more precise here:
> - this only affects the text installer; rendering in the graphical
>   installer seems fine;
> - Georgian became available for the text installer in debian 10, and
>   some research showed that this issue has been present from the very
>   beginning (debian 10).
>
> Since no one noticed this for years, maybe we can go with a workaround to
> disable Georgian for the text installer (if no one finds the real solution
> for this).

The solution is simply to add the required characters in
debian-installer/build/needed-characters/ka.utf:

https://d-i.debian.org/doc/i18n-guide/ch03s06.html

I'm surprised that this was missed. Was

https://d-i.debian.org/doc/i18n-guide/ch03.html

not followed?

Samuel
Bug#1017435: debian-installer: georgian text mode fails to render all characters
Hi,

Holger Wansing wrote (Tue, 16 Aug 2022 22:59:34 +0200):
> Philip Hands wrote (Tue, 16 Aug 2022 11:22:30 +0200):
>> openQA just noticed that the rendering of certain characters just changed,
>> highlighting the fact that the rendering was already broken.
>
> To be more precise here:
> - this only affects the text installer; rendering in the graphical
>   installer seems fine;
> - Georgian became available for the text installer in debian 10, and
>   some research showed that this issue has been present from the very
>   beginning (debian 10).
>
> Since no one noticed this for years, maybe we can go with a workaround to
> disable Georgian for the text installer (if no one finds the real solution
> for this).

As a workaround, I have disabled Georgian in the text installer for the
time being:
https://salsa.debian.org/installer-team/localechooser/-/commit/6391b845540667d649a5ee118410867f11b0ce81

Holger

--
Holger Wansing
PGP-Fingerprint: 496A C6E8 1442 4B34 8508 3529 59F1 87CA 156E B076
Bug#1017435: debian-installer: georgian text mode fails to render all characters
Package: debian-installer
Severity: normal

openQA just noticed that the rendering of certain characters just changed,
highlighting the fact that the rendering was already broken.

For example the prompt for the hostname:

msgid "Please enter the hostname for this system."
msgstr "გთხოვთ შეიყვანოთ კომპიუტერის სახელი."

gets displayed as something like:

"გთ_ოვთ შეი_ვანოთ კომპიუტერის სა_ელი."

so it looks like it's failing to deal with at least these characters:

https://www.compart.com/en/unicode/U+10EE
and
https://www.compart.com/en/unicode/U+10E7

In the past the _'s would have appeared as blanks, whereas now they are
inverted and show as blue squares (which is an improvement, since this
makes it obvious that something's wrong, and also caused openQA to notice
the change).

The old look can be seen here:
https://openqa.debian.net/tests/69846#step/hostname/1

The new look here:
https://openqa.debian.net/tests/69954#step/hostname/2

and I'll attach screenshots of the same (for when the links rot).

Cheers,
Phil.
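The effect in the screenshots can be reproduced in a few lines: every non-ASCII character that is not in the available character list gets replaced, which points straight at the missing code points. A minimal Python sketch; the `_` substitution only stands in for how the text installer actually draws missing glyphs, and the "available" list is a made-up ka.utf that lacks exactly U+10EE and U+10E7:

```python
def render(text: str, available: str) -> str:
    """Replace characters without a glyph by '_' (ASCII is always available)."""
    return "".join(ch if ord(ch) < 128 or ch in available else "_" for ch in text)


def missing_code_points(text: str, available: str) -> list:
    """Return the U+XXXX labels of non-ASCII characters missing from `available`."""
    return sorted({f"U+{ord(ch):04X}" for ch in text if ord(ch) >= 128 and ch not in available})


# Hypothetical ka.utf: modern letters U+10D0..U+10F0 without U+10EE and U+10E7
available = "".join(chr(cp) for cp in range(0x10D0, 0x10F1) if cp not in (0x10EE, 0x10E7))
msgstr = "გთხოვთ შეიყვანოთ კომპიუტერის სახელი."

print(render(msgstr, available))
print(missing_code_points(msgstr, available))
```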