[issue35628] Allow lazy loading of translations in gettext.
New submission from s-ball : When working on i18n, I realized that msgfmt.py did not generate any hash table. One step further, I realized that gettext.py would not have used it anyway, because it unconditionally loads the whole translation file and contains the following TODO message:

    TODO:
    - Lazy loading of .mo files. Currently the entire catalog is loaded into memory, but that's probably bad for large translated programs. Instead, the lexical sort of original strings in GNU .mo files should be exploited to do binary searches and lazy initializations. Or you might want to use the undocumented double-hash algorithm for .mo files with hash tables, but you'll need to study the GNU gettext code to do this.

I have studied the code and found that it should not be too complex to implement in pure Python. I have posted a message on python-ideas about it, and here are my conclusions.

Features:
The gettext module should be able to load catalogs lazily from mo files. This lazy loading should be optional, and should use the hash table of the mo file when one is present, falling back to a binary search otherwise. The translated strings should be cached for better performance.

API changes:
Three functions of the gettext module would get two new optional parameters, caching and keepopen:
- gettext.bindtextdomain(domain, localedir=None) would become gettext.bindtextdomain(domain, localedir=None, caching=None, keepopen=False)
- gettext.translation(domain, localedir=None, languages=None, class_=None, fallback=False, codeset=None) would become gettext.translation(domain, localedir=None, languages=None, class_=None, fallback=False, codeset=None, caching=None, keepopen=False)
- gettext.install(domain, localedir=None, codeset=None, names=None) would become gettext.install(domain, localedir=None, codeset=None, names=None, caching=None, keepopen=False)

The new caching parameter could receive the following values:
- caching=None: revert to the previous eager loading of the full catalog. This will be the default, so that existing applications see no change.
- caching=1: lazy loading with an unlimited cache.
- caching=n, where n is a non-negative (>=0) integer: lazy loading with an LRU cache limited to n strings.

The keepopen parameter would be a boolean:
- keepopen=False (default): the mo file is opened only to load a translation string and closed immediately afterwards; it is also opened once when the GNUTranslations object is initialized, to load the file description.
- keepopen=True: the mo file is kept open for the lifetime of the GNUTranslations object.
This parameter is ignored if caching is None.

Implementation:
The current GNUTranslations class loads the content of the mo file into a dictionary where the original strings are the keys and the translated strings the values. Plural forms get special processing: the key is a 2-tuple (singular original string, order), and the value is the corresponding translated string; order=0 is normally the singular translated string.

The proposed implementation would simply replace this dictionary with a special mapping subclass when caching is not None. That subclass would use the same keys as the original dictionary and would:
- first search in its cache;
- if the string is not in the cache and the hash table size is not zero, search for the original string by hash;
- if the string is not in the cache and the hash table size is zero, search for the original string with a binary search;
- if a string is found, add it to the LRU cache, evicting the oldest entry (or entries) if needed.

That should allow implementing the new feature with minimal refactoring of the gettext module. But I also propose to change msgfmt.py to build the hash table. IMHO, that function should live in the standard library, probably as a submodule of gettext, so that various Python projects (pybabel, django) can use it directly instead of developing their own.
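The lookup order proposed above (cache first, then a search in the mo file) can be sketched as a `collections.abc.Mapping` subclass. This is a minimal illustration under stated assumptions, not the patch the issue refers to: `LazyMoCatalog` and its `maxsize` parameter are hypothetical names (`maxsize=None` stands in for the unlimited cache, `maxsize=n` for an LRU cache of n strings), only the binary-search branch is implemented (the hash-table branch is left out), catalogs are assumed to be UTF-8, and the file object is assumed to stay open, as with keepopen=True.

```python
import struct
from collections import OrderedDict
from collections.abc import Mapping


class LazyMoCatalog(Mapping):
    """Hypothetical lazy catalog that reads a GNU .mo file on demand.

    Sketch only: binary search over the sorted original strings, no
    hash-table lookup, UTF-8 catalogs assumed.
    """

    def __init__(self, fp, maxsize=None, encoding="utf-8"):
        self._fp = fp              # seekable binary file, kept open (keepopen=True)
        self._encoding = encoding
        self._maxsize = maxsize    # None: unbounded cache; n: LRU limited to n entries
        self._cache = OrderedDict()
        magic, = struct.unpack("<I", self._read(0, 4))
        if magic == 0x950412DE:
            self._endian = "<"
        elif magic == 0xDE120495:
            self._endian = ">"
        else:
            raise OSError("not a GNU .mo file")
        # Header words after the magic: revision, number of strings, and the
        # offsets of the original-string and translated-string tables.
        _, self._n, self._orig_tab, self._trans_tab = struct.unpack(
            self._endian + "4I", self._read(4, 16))

    def _read(self, offset, size):
        self._fp.seek(offset)
        return self._fp.read(size)

    def _entry(self, table, index):
        # Each table slot is a (length, offset) pair of 32-bit integers.
        length, offset = struct.unpack(
            self._endian + "2I", self._read(table + 8 * index, 8))
        return self._read(offset, length)

    def _original(self, index):
        # Plural originals are stored as b"singular\0plural"; compare the
        # singular part only (C's strcmp() also stops at the first NUL).
        return self._entry(self._orig_tab, index).split(b"\0", 1)[0]

    def _find(self, msgid):
        # Binary search: the original strings are sorted in a .mo file.
        lo, hi = 0, self._n
        while lo < hi:
            mid = (lo + hi) // 2
            orig = self._original(mid)
            if orig < msgid:
                lo = mid + 1
            elif orig > msgid:
                hi = mid
            else:
                return self._entry(self._trans_tab, mid)
        return None

    def __getitem__(self, key):
        if key in self._cache:
            self._cache.move_to_end(key)  # mark as most recently used
            return self._cache[key]
        # Key shapes: a msgid, or a (msgid, plural index) tuple.
        msgid, index = key if isinstance(key, tuple) else (key, 0)
        raw = self._find(msgid.encode(self._encoding))
        if raw is None:
            raise KeyError(key)
        forms = raw.split(b"\0")  # plural translations are NUL-separated
        if not 0 <= index < len(forms):
            raise KeyError(key)
        value = forms[index].decode(self._encoding)
        self._cache[key] = value
        if self._maxsize is not None and len(self._cache) > self._maxsize:
            self._cache.popitem(last=False)  # evict the least recently used entry
        return value

    def __len__(self):
        return self._n

    def __iter__(self):
        for i in range(self._n):
            yield self._original(i).decode(self._encoding)
```

Since GNUTranslations reads its catalog through `get()` and `[]`, a mapping accepting the same key shapes could in principle stand in for the eager dictionary. The cache is an `OrderedDict` so that the size limit is a plain per-instance attribute.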
I will probably submit a PR in a while, but it will require some time to propose a full implementation with correct test coverage. -- components: Library (Lib) messages: 332815 nosy: s-ball priority: normal severity: normal status: open title: Allow lazy loading of translations in gettext. type: enhancement versions: Python 3.8 ___ Python tracker <https://bugs.python.org/issue35628> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue35335] msgfmt should be able to merge more than one po file
s-ball added the comment: Currently my main use case is to be able to compile one or more po file(s) from a Python script, so I just need to be able to call make repeatedly from that script, which was broken per issue 9741. To be honest, I must acknowledge that I initially thought that compiling more than one po file was a common use case, and only later realized that it was not. But as it was already (partially) allowed by msgfmt.py, I have just fixed the problems and added tests for it. BTW, I am also the author of the last commit, but I wrote it on a box where I had forgotten to configure git correctly. -- ___ Python tracker <https://bugs.python.org/issue35335> ___
[issue9741] msgfmt.py generates invalid mo because msgfmt.make() does not clear dictionary
Change by s-ball : -- pull_requests: +10113 stage: test needed -> patch review ___ Python tracker <https://bugs.python.org/issue9741> ___
[issue35335] msgfmt should be able to merge more than one po file
Change by s-ball : -- keywords: +patch pull_requests: +10112 stage: -> patch review ___ Python tracker <https://bugs.python.org/issue35335> ___
[issue35335] msgfmt should be able to merge more than one po file
s-ball added the comment: After some more thinking about it, my opinion is that the proposed patch for issue 9741 does not address my requirements at all. So I will try to propose a pull request addressing both issues here. -- ___ Python tracker <https://bugs.python.org/issue35335> ___
[issue35335] msgfmt should be able to merge more than one po file
s-ball added the comment: I have followed SilentGhost's advice and begun by thoroughly testing the current behaviour. I then realized that I was wrong, and that (probably by chance) msgfmt.py partially met my requirements, but neither pybabel nor GNU gettext msgfmt really did. I wrote 2 tiny po files (attached) and played with them, trying to combine them with pybabel, msgfmt.py and GNU gettext msgfmt.

Current behaviour (Windows shell syntax):

> pybabel compile -o .\file12-fr.mo -l fr -i file1-fr.po -i file2-fr.po
only uses the second file (file2-fr.po).

> msgfmt -o file12-fr.mo --no-hash file1-fr.po file2-fr.po
chokes on a repeated key in file2 (the header has "" as its key...). It works fine, though, after commenting out the header in either file.

> python "path\to\Tools\i18n\msgfmt.py" -o file12py-fr.mo file1-fr.po file2-fr.po
surprisingly produces the expected result and successfully combines both po files into one single mo file.

BUT:

> python "path\to\Tools\i18n\msgfmt.py" file1-fr.po file2-fr.po
produces file1-fr.mo, which is the compiled version of file1-fr.po, and file2-fr.mo, which combines both input files. Definitely not an expected result! This is caused by the problem identified in issue 9741 (https://bugs.python.org/issue9741).

My initial goal was to be able to use the make function from msgfmt.py in an external script. I then realized that combining multiple po files is not a good idea, because the resulting mo file can only contain one single header, and that GNU gettext msgfmt's behaviour is the best one. I now wonder whether this issue should be closed because the requirement is not relevant; it would probably be better to propose a fix (including tests and code improvements) for issue 9741. -- Added file: https://bugs.python.org/file47966/files.zip ___ Python tracker <https://bugs.python.org/issue35335> ___
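The last result is consistent with the cause named in the title of issue 9741: parsed entries accumulate in a module-level dictionary that `make()` never clears, so each later output also contains the entries of earlier inputs. The following is a deliberately simplified, hypothetical model of that pattern rather than the actual msgfmt.py code; `parse_entries`, `make_buggy` and `make_fixed` are illustrative names, and the one-line `msgid=msgstr` format stands in for real po parsing.

```python
# Simplified, hypothetical model of the issue 9741 pattern (not msgfmt.py itself).
MESSAGES = {}  # module-level catalog shared by every call to make_*()


def parse_entries(text):
    """Toy parser: every 'msgid=msgstr' line yields one entry."""
    for line in text.splitlines():
        if "=" in line:
            msgid, msgstr = line.split("=", 1)
            yield msgid, msgstr


def make_buggy(text):
    for msgid, msgstr in parse_entries(text):
        MESSAGES[msgid] = msgstr
    return dict(MESSAGES)  # stands in for writing the mo file


def make_fixed(text):
    MESSAGES.clear()  # start every compilation from an empty catalog
    for msgid, msgstr in parse_entries(text):
        MESSAGES[msgid] = msgstr
    return dict(MESSAGES)


if __name__ == "__main__":
    print(make_buggy("hello=bonjour"))  # {'hello': 'bonjour'}
    print(make_buggy("bye=au revoir"))  # {'hello': 'bonjour', 'bye': 'au revoir'}
    print(make_fixed("bye=au revoir"))  # {'bye': 'au revoir'}
```

Clearing the shared dictionary at the start of each call (or building a fresh local dictionary per call) is what makes repeated calls from a script independent of each other.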
[issue35335] msgfmt should be able to merge more than one po file
s-ball added the comment: Ok, I have created a fork, and started coding on a local branch. But it will take some time, because I assume that I am supposed to write tests for the msgfmt module... -- ___ Python tracker <https://bugs.python.org/issue35335> ___
[issue35335] msgfmt should be able to merge more than one po file
New submission from s-ball : GNU gettext msgfmt can merge several po files into one single mo file. The current version of msgfmt.py can only compile one po file into one mo file. After looking at the code, the enhancement should be simple to implement.

Command line: if one output file is given (option -o) along with several input files, then all the input files should be combined.

Implementation:
- main should pass all the parameters to make (*args)
- make should accept either one single string, for compatibility, or an iterable of strings. In the latter case, the current processing should be repeated on all input files.

I could propose a patch (but I am afraid it would end up being rather large) or a pull request. As a new user here, I do not know which is the best way... -- components: Demos and Tools messages: 330575 nosy: s-ball priority: normal severity: normal status: open title: msgfmt should be able to merge more than one po file type: enhancement versions: Python 2.7, Python 3.4, Python 3.5, Python 3.6, Python 3.7, Python 3.8 ___ Python tracker <https://bugs.python.org/issue35335> ___
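The two implementation points in this submission (accept one filename or several, then repeat the processing for each input) could be sketched as below. `normalize_inputs` and `merge_catalogs` are hypothetical helpers, not functions of msgfmt.py, and each parsed po file is modelled as a plain dict of msgid to msgstr. Rejecting a duplicate msgid, including the header's empty msgid "", is only one possible policy; it is chosen here because it mirrors the behaviour reported for GNU gettext msgfmt in the later comment on this issue.

```python
import os


def normalize_inputs(filenames):
    """Return a list of input files from one name or an iterable of names."""
    # A single str (or path object) keeps the existing one-file call working.
    if isinstance(filenames, (str, bytes, os.PathLike)):
        return [filenames]
    return list(filenames)


def merge_catalogs(catalogs):
    """Combine several {msgid: msgstr} dicts, refusing duplicate msgids."""
    merged = {}
    for catalog in catalogs:
        for msgid, msgstr in catalog.items():
            if msgid in merged:
                raise ValueError(f"duplicate message definition: {msgid!r}")
            merged[msgid] = msgstr
    return merged


if __name__ == "__main__":
    print(normalize_inputs("file1-fr.po"))  # ['file1-fr.po']
    print(normalize_inputs(("file1-fr.po", "file2-fr.po")))  # ['file1-fr.po', 'file2-fr.po']
    print(merge_catalogs([{"hello": "bonjour"}, {"bye": "au revoir"}]))
    # {'hello': 'bonjour', 'bye': 'au revoir'}
```

Checking for `str` before falling back to `list()` matters because a string is itself iterable: without the check, "file1-fr.po" would be split into single characters.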