[issue35628] Allow lazy loading of translations in gettext.

2018-12-31 Thread s-ball


New submission from s-ball :

While working on i18n, I noticed that msgfmt.py does not generate a hash 
table. I then realized that gettext.py would not use it anyway, because it 
unconditionally loads the whole translation file and contains the following 
TODO message: 

TODO:
- Lazy loading of .mo files.  Currently the entire catalog is loaded into
memory, but that's probably bad for large translated programs.  Instead,
the lexical sort of original strings in GNU .mo files should be exploited
to do binary searches and lazy initializations.  Or you might want to use
the undocumented double-hash algorithm for .mo files with hash tables, but
you'll need to study the GNU gettext code to do this.

I have studied the code and found that it should not be too complex to 
implement in pure Python. I have posted a message on python-ideas about it, 
and here are my conclusions:

Features:

The gettext module should be able to load catalogs lazily from mo 
files. This lazy loading should be optional and should use the hash table 
of the mo file when one is present, or fall back to a binary search. The 
translated strings should be cached for better performance.
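
Whether a hash table is present can be read from the header of the mo file, which starts with seven 32-bit words (magic number, revision, number of strings, offsets of the two string tables, then size and offset of the hash table). The following is only a minimal sketch under that assumption: read_mo_header is a hypothetical helper, not part of gettext, and it handles only little-endian files.

```python
import struct

LE_MAGIC = 0x950412DE  # magic number of a little-endian mo file


def read_mo_header(buf):
    """Hypothetical helper: decode the seven header words of a mo file."""
    fields = struct.unpack_from("<7I", buf, 0)
    magic, revision, nstrings, orig_off, trans_off, hash_size, hash_off = fields
    if magic != LE_MAGIC:
        raise ValueError("not a little-endian mo file")
    return {
        "revision": revision,
        "nstrings": nstrings,
        "orig_off": orig_off,
        "trans_off": trans_off,
        "hash_size": hash_size,
        "hash_off": hash_off,
    }


# Example header: two strings, no hash table (hash size 0).
# The header is 7 * 4 = 28 bytes; each table entry is 8 bytes per string.
header = struct.pack("<7I", LE_MAGIC, 0, 2, 28, 28 + 2 * 8, 0, 0)
info = read_mo_header(header)

# A lazy loader would pick its lookup strategy from this field.
use_hash = info["hash_size"] > 0
```

With a zero hash size, as in this example, the loader would have to fall back to the binary search.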

API changes:

Three functions of the gettext module would gain two new optional 
parameters, named caching and keepopen:

gettext.bindtextdomain(domain, localedir=None) would become
gettext.bindtextdomain(domain, localedir=None, caching=None, keepopen=False)

gettext.translation(domain, localedir=None, languages=None, class_=None, 
fallback=False, codeset=None) would become
gettext.translation(domain, localedir=None, languages=None, class_=None, 
fallback=False, codeset=None, caching=None, keepopen=False)

gettext.install(domain, localedir=None, codeset=None, names=None) would 
become
gettext.install(domain, localedir=None, codeset=None, names=None, 
caching=None, keepopen=False)

The new caching parameter could receive the following values:
caching=None: revert to the previous eager loading of the full catalog. 
This would be the default, so that existing applications see no change
caching=1: lazy loading with unlimited cache
caching=n where n is a positive (>=0) integer value: lazy loading with an 
LRU cache limited to n strings

The keepopen parameter would be a boolean:
keepopen=False (default): the mo file is opened only to load a 
translation string and closed immediately afterwards. It is also opened once 
when the GNUTranslations object is initialized, to load the file description
keepopen=True: the mo file is kept open during the lifetime of the 
GNUTranslations object.
This parameter is ignored if caching is None

Implementation:

The current GNUTranslations class loads the content of the mo file to 
build a dictionary where the original strings are the keys and the 
translated strings the values. Plural forms are handled specially: the 
key is a 2-tuple (singular original string, order), and the value is the 
corresponding translated string. order=0 is normally the singular 
translated string.

The proposed implementation would simply replace this dictionary with a 
special mapping subclass when caching is not None. That subclass would 
use the same keys as the original dictionary and would:
- first search in its cache
- if not found in the cache and the hash table has a nonzero size, search 
for the original string by hash
- if not found in the cache and the hash table has a zero size, search for 
the original string with a binary search algorithm
- if a string is found, add it to the LRU cache, possibly 
evicting the oldest entry (or entries)
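
The lookup order described in the list above could be sketched roughly as follows. This is only an illustration under several assumptions: LazyCatalog and its maxsize parameter are hypothetical names, an in-memory sorted list stands in for the string tables of the mo file, keys are plain strings, and the hash lookup step is left out so that only the cache and the binary search are shown.

```python
import bisect
from collections import OrderedDict
from collections.abc import Mapping


class LazyCatalog(Mapping):
    """Hypothetical mapping: an LRU cache in front of a binary search."""

    def __init__(self, sorted_pairs, maxsize=None):
        # sorted_pairs stands in for the sorted tables of a mo file.
        self._keys = [key for key, _ in sorted_pairs]
        self._values = [value for _, value in sorted_pairs]
        self._cache = OrderedDict()
        self._maxsize = maxsize  # None means unlimited (an assumption)
        self.lookups = 0         # number of binary searches performed

    def _search(self, key):
        # Binary search over the lexically sorted original strings.
        self.lookups += 1
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            return self._values[i]
        raise KeyError(key)

    def __getitem__(self, key):
        if key in self._cache:
            self._cache.move_to_end(key)  # mark as most recently used
            return self._cache[key]
        value = self._search(key)
        self._cache[key] = value
        if self._maxsize is not None and len(self._cache) > self._maxsize:
            self._cache.popitem(last=False)  # evict least recently used
        return value

    def __iter__(self):
        return iter(self._keys)

    def __len__(self):
        return len(self._keys)


catalog = LazyCatalog(
    [("bye", "au revoir"), ("hello", "bonjour"), ("yes", "oui")],
    maxsize=2,
)
first = catalog["hello"]   # found by binary search, then cached
second = catalog["hello"]  # served from the cache
```

Because the class derives from Mapping, get() and the in operator work without extra code, which is what would let it stand in for the plain dictionary.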

That should make it possible to implement the new feature with minimal 
refactoring of the gettext module.

But I also propose to change msgfmt.py to build the hash table. IMHO, that 
functionality should live in the standard library, probably as a submodule of 
gettext, so that various Python projects (pybabel, django) can use it directly 
instead of developing their own.

I will probably submit a PR in a while, but it will require some time to 
propose a full implementation with correct test coverage.

--
components: Library (Lib)
messages: 332815
nosy: s-ball
priority: normal
severity: normal
status: open
title: Allow lazy loading of translations in gettext.
type: enhancement
versions: Python 3.8

___
Python tracker 
<https://bugs.python.org/issue35628>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue35335] msgfmt should be able to merge more than one po file

2018-12-03 Thread s-ball


s-ball  added the comment:

Currently my main use case is to compile one or more po file(s) from 
a Python script, so I just need to be able to call make repeatedly from that 
script, which was broken as described in issue 9741.

To be honest, I must acknowledge that I initially thought that compiling more 
than one po file was a common use case, and I only later realized that it was 
not. But as it was already (partially) allowed by msgfmt.py, I have just fixed 
the problems and added tests for it.

BTW, I am also the author of the last commit, but I wrote it on a box where 
I had forgotten to initialize git correctly.

--

___
Python tracker 
<https://bugs.python.org/issue35335>
___



[issue9741] msgfmt.py generates invalid mo because msgfmt.make() does not clear dictionary

2018-12-03 Thread s-ball


Change by s-ball :


--
pull_requests: +10113
stage: test needed -> patch review

___
Python tracker 
<https://bugs.python.org/issue9741>
___



[issue35335] msgfmt should be able to merge more than one po file

2018-12-03 Thread s-ball


Change by s-ball :


--
keywords: +patch
pull_requests: +10112
stage:  -> patch review

___
Python tracker 
<https://bugs.python.org/issue35335>
___



[issue35335] msgfmt should be able to merge more than one po file

2018-12-02 Thread s-ball


s-ball  added the comment:

After some more thought, my opinion is that the patch proposed for 
issue 9741 does not address my requirements at all. So I will try to propose a 
pull request addressing both issues here.

--

___
Python tracker 
<https://bugs.python.org/issue35335>
___



[issue35335] msgfmt should be able to merge more than one po file

2018-12-01 Thread s-ball


s-ball  added the comment:

I have followed SilentGhost's advice and begun by thoroughly testing the 
current behaviour. I then realized that I was wrong: (probably by 
chance) msgfmt.py partially met my requirements, whereas pybabel did not and 
GNU gettext msgfmt did not really either. I wrote 2 tiny po files (attached) 
and played with them, meaning I tried to combine them with pybabel, msgfmt.py 
and GNU gettext msgfmt. Current behaviour (Windows shell syntax):

> pybabel compile -o .\file12-fr.mo -l fr -i file1-fr.po -i file2-fr.po

only uses second file (file2-fr.po)

> msgfmt -o file12-fr.mo --no-hash file1-fr.po file2-fr.po

chokes on a repeated key in file2 (the header has "" as its key...). It works 
fine after commenting out the header in either of the files

> python "path\to\Tools\i18n\msgfmt.py" -o file12py-fr.mo file1-fr.po file2-fr.po

unexpectedly produces the expected result and successfully combines both po 
files into one single mo file

BUT:

> python "path\to\Tools\i18n\msgfmt.py" file1-fr.po file2-fr.po

Produces file1-fr.mo which is the compiled version of file1-fr.po and 
file2-fr.mo which combines both input files. Definitely not an expected result!

This is caused by the problem identified in issue 9741 
(https://bugs.python.org/issue9741)
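
The effect can be reproduced in miniature. The sketch below is only an illustration of the pattern named in the title of issue 9741 (a dictionary that is not cleared between calls); make_buggy and make_fixed are hypothetical stand-ins, not the real msgfmt.make(), and plain dicts stand in for parsed po files.

```python
MESSAGES = {}  # module-level state shared by every call


def make_buggy(entries):
    """Hypothetical compile step that never resets MESSAGES."""
    MESSAGES.update(entries)
    return dict(MESSAGES)  # what would be written to the mo file


def make_fixed(entries):
    """Same step, but starting from a fresh dictionary on each call."""
    messages = {}
    messages.update(entries)
    return messages


file1 = {"hello": "bonjour"}
file2 = {"bye": "au revoir"}

out1 = make_buggy(file1)  # only the entries of file1
out2 = make_buggy(file2)  # entries of file1 leak into this output
```

Here out2 holds both entries, which mirrors file2-fr.mo combining both input files, while make_fixed(file2) returns only the entries of file2.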

My initial goal was to be able to use the make function from msgfmt.py in an 
external script. I then realized that combining multiple po files is not a good 
idea, because the resulting mo file can only contain one single header, and the 
best behaviour is that of GNU gettext msgfmt.

I now wonder whether this issue should be closed because the requirement is 
not relevant; it would probably be better to propose a fix (including tests 
and code improvements) for issue 9741.

--
Added file: https://bugs.python.org/file47966/files.zip

___
Python tracker 
<https://bugs.python.org/issue35335>
___



[issue35335] msgfmt should be able to merge more than one po file

2018-11-29 Thread s-ball


s-ball  added the comment:

Ok, I have created a fork, and started coding on a local branch. But it will 
take some time, because I assume that I am supposed to write tests for the 
msgfmt module...

--

___
Python tracker 
<https://bugs.python.org/issue35335>
___



[issue35335] msgfmt should be able to merge more than one po file

2018-11-27 Thread s-ball


New submission from s-ball :

GNU gettext msgfmt can merge several po files into one single mo file.
The current version of msgfmt.py can only compile one po file to one mo file.

After looking at the code, the enhancement should be simple to implement.

Command line: if one output file is given (option -o) and several input files, 
then all the input files should be combined.

Implementation:
- main should pass all the parameters to make (*args)
- make should accept one single string for compatibility or an iterable of 
strings. In the latter case, the current processing should be repeated on all 
input files.
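
The argument handling proposed for make could look roughly like this. It is only a sketch: normalize_inputs and merge_catalogs are hypothetical names, and the parse callback stands in for whatever reads a po file into a dictionary.

```python
def normalize_inputs(filenames):
    """Return a list of paths from one string or an iterable of strings."""
    if isinstance(filenames, str):
        return [filenames]  # single path, kept for compatibility
    return list(filenames)


def merge_catalogs(filenames, parse):
    """Repeat the per-file processing on every input and merge the results.

    parse is a hypothetical callable mapping a path to a dict of messages.
    """
    merged = {}
    for name in normalize_inputs(filenames):
        merged.update(parse(name))
    return merged


# Fake parser used only for the demonstration below.
fake_files = {
    "file1-fr.po": {"hello": "bonjour"},
    "file2-fr.po": {"bye": "au revoir"},
}
single = merge_catalogs("file1-fr.po", fake_files.__getitem__)
both = merge_catalogs(["file1-fr.po", "file2-fr.po"], fake_files.__getitem__)
```

The explicit str check matters because a string is itself iterable, so without it a single path would be processed character by character.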

I could propose a patch (though I am afraid it would end up being rather 
large) or a pull request. As a new user here, I do not know which is the best 
way...

--
components: Demos and Tools
messages: 330575
nosy: s-ball
priority: normal
severity: normal
status: open
title: msgfmt should be able to merge more than one po file
type: enhancement
versions: Python 2.7, Python 3.4, Python 3.5, Python 3.6, Python 3.7, Python 3.8

___
Python tracker 
<https://bugs.python.org/issue35335>
___