The recent removal of the extension prereg mechanism revealed a problem with how we select which dictionaries (which come in the form of bundled extensions) are included in a given installation.

At least with the "official" (<>) Linux and Mac OS X installation sets, the base installation set contains en-US localization and only contains dictionaries "related" to that locale (dict-en, dict-es, dict-fr; see below for details of what "related" means). The additional per-language langpacks contain dictionaries "related" to the given langpack (e.g., langpack_de contains dict-de).

However, on Windows, the base installation set contains all available localizations and all available dictionaries. During msi installation, some code apparently determines a default selection of only a subset of the "Additional user interface languages" entries (presumably based on the current system locale settings), but all of the available "Optional Components - Dictionaries" entries are selected by default. This now causes per-user generation of data about all those bundled dictionary extensions at per-user first-start of LO, leading to noticeable time and space requirements (see <> "Large UserInstallation's user/extensions/bundled/ tree").

Hence, one suggestion to address that problem would be to reduce the number of "Optional Components - Dictionaries" entries selected by default during Windows msi installation, similar to how a given combination of base installation set plus langpack(s) on the other platforms installs only a subset of all the available dictionaries. (That is, the code that apparently now determines a default selection of "Additional user interface languages" entries would need to be extended to also determine a default selection of "related" "Optional Components - Dictionaries" entries.)

Initial reactions on IRC (see below) were that (a) the status quo on Windows exists to avoid "political issues" (though that is inconsistent with the status quo on the other platforms), and (b) we should rethink having dictionaries as bundled extensions (though I would prefer to keep things simple, solving the problem by harmonizing behavior across platforms now and leaving anything more ambitious for the future).

Any further thoughts?


PS1: The way dictionaries "related" to a given locale are determined appears to be the list at setup_native/source/packinfo/spellchecker_selection.txt. That's why the en-US base installation set for Linux and Mac OS X contains dict-en, dict-es, and dict-fr, for example. However, an apparent inconsistency is that langpack_de only contains dict-de, and not also dict-fr and dict-it, as that list would suggest.
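
To make the suggestion concrete, here is a minimal Python sketch of how a default dictionary selection could be derived from the selected UI languages. It is only an illustration of the idea, not the actual msi code: the RELATED_DICTS table and the default_dictionaries function are hypothetical names, the table holds just the two entries mentioned above (en-US and de) rather than the real contents or format of spellchecker_selection.txt, and the choice to select nothing for an unlisted language is an assumption.

```python
# Hypothetical excerpt of a locale -> "related" dictionaries mapping.
# Only these two entries are taken from the text above; the real list and
# its on-disk format are not reproduced here.
RELATED_DICTS = {
    "en-US": ["en", "es", "fr"],
    "de": ["de", "fr", "it"],
}


def default_dictionaries(ui_languages, related=RELATED_DICTS):
    """Return the sorted dict-* names to pre-select for the given UI languages."""
    selected = set()
    for lang in ui_languages:
        # Assumption: a language missing from the mapping adds no dictionaries.
        for name in related.get(lang, []):
            selected.add("dict-" + name)
    return sorted(selected)


print(default_dictionaries(["en-US", "de"]))
```

Taking the union over all default-selected UI languages keeps the result independent of the order in which the languages are listed, and the user could still tick additional dictionaries by hand; this is only about the defaults.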

PS2: At least the Mac OS X LO 3.6.1 en-US base installation set contains share/extension/dict-* directories for all available dictionaries, not just dict-en, dict-es, dict-fr, but the additional ones are effectively empty and their existence is a bug.

PS3: For the record, the relevant log of yesterday's #libreoffice-dev:

Aug 29 12:50:57 <sberg> timar, do you know anything about our msi by default installing all 
"Optional Components - Dictionaries" entries, but only selected (at installation time, I 
presume?) "Additional user interface languages"?
Aug 29 12:51:59 <timar> sberg: yes, we always install all dictionaries on Windows in 
order to avoid "political issues"
Aug 29 12:52:26 <tml_> is this the old "omg, I waste SEVERAL MEGABYTES on 
dictionaries for languages I don't even like" discussion?
Aug 29 12:53:41 <sberg> timar, but that causes one part of the problems of 
fdo#53009, so I had hoped we could fix that
Aug 29 12:53:44 <IZBot> LibreOffice-Libreoffice normal/medium ASSIGNED Large 
UserInstallation's user/extensions/bundled/ tree
Aug 29 12:54:41 <tml_> wouldn't the best solution then be to stop treating these as 
Aug 29 12:55:12 <tml_> don't we have too much optionality in the installer 
Aug 29 12:55:40 <tml_> hmm, those are orthogonal issues, sorry
Aug 29 12:58:36 <timar> sberg: what is your suggestion?
Aug 29 13:02:55 <sberg> timar, assuming that there is code in our msi to default-enable some subset X of 
"Additional user interface languages" entries: extend that code to also default-enable only a 
"matching" subset of "Optional Components - Dictionaries" entries
Aug 29 13:03:44 <tml_> that assumes people would prefer to use software 
(including the OS) in the same language as they write/edit documents it. not true
Aug 29 13:03:46 <sberg> ...for some suitable definition of "matching"
Aug 29 13:05:01 <timar> sberg: tml_ there is
 that we still use for creating Linux langpacks IMHO (not sure)
Aug 29 13:05:11 <sberg> tml_, no, but it might be a better approximation to typical 
users' needs than the current "install everything" approach (after all, users /can/ 
install additional dics -- its only about the defaults)
Aug 29 13:06:45 <sberg> timar, yes, that list I had on my mind
Aug 29 13:06:56 <tml_> sberg: one person's good approximation is another 
person's grave insult to the XXX people ;)
Aug 29 13:07:26 <sberg> tml_, we already use that approximation on other 
Aug 29 13:07:45 <tml_> so that is broken, then? ;)
Aug 29 13:09:16 <sberg> tml_, do you have a better suggestion?
Aug 29 13:10:01 <tml_> sberg: is that there are lots of *extensions* that is 
causing problems, or lots of *dictionaries* ?
Aug 29 13:11:03 <tml_> or, wait, am I smoking crack with this talk about 
Aug 29 13:11:25 <tml_> (I somehow had the impression that many dictionaires are 
technically packaged as "extensions", are they?)
Aug 29 13:11:51 <timar> tml_: dictionaries are extensions
Aug 29 13:12:15 <sberg> tml_, dictionaries come as bundled extensions, and 
every bundled extension increases the per-user space reqs and per-user--first-start 
time reqs (though some do more than others)
Aug 29 13:12:20 <tml_> ok, so then the question above to sberg still holds
Aug 29 13:12:52 <tml_> sberg: ok, so wouldn't the solution then be to stop 
packaging dictionaries as extensions? or do they *have* to be such for some obscure 
technical reason?
Aug 29 13:13:05 <tml_> I mean, they could still be optional in the installer 
even if they weren't extensions
Aug 29 13:13:29 <tml_> just like lots of other things are optional but aren't 
Aug 29 13:16:28 <sberg> tml_, I think the origin of having dicts as exts is so 
that (a) people can install additional ones (OOo traditionally did not come with such 
a large number of bundled dicts as LO does at least on Windows, IIUC), and (b) people 
can update dicts independently from updating the app itself (as the dicts were 
traditionally provided by 3rd parties, IIUC)
Aug 29 13:17:38 <tml_> but having the bundled ones not be extensions wouldn't 
stop (a), and (b) is made unnecessary by our time-based frequent releases
Aug 29 13:22:54 <sberg> tml_, I'm not arguing that having dicts as exts is 
necessarily good; what I'm not sure about is whether turning a given dict from ext to 
non-ext could cause technical problems, if a user installed an ext variant of that 
dict into a LO that contains that dict as non-ext
Aug 29 13:24:24 <tml_> that is something to check (and fix) then, if the 
bundled dictionaries would not be extensions any more
Aug 29 13:24:31 <sberg> maybe makes sense to put this on the ESC agenda
Aug 29 13:27:11 <caolan> some of the code for the old pre-extension mechanism 
for dictionaries still exists in lingucomponent/source/lingutil/lingutil.cxx now used 
for the system dictionary case
Aug 29 13:27:30 <caolan> its *supposed* to prefer extensions IIRC over system 
Aug 29 13:27:41 <caolan> *shrug*
Aug 29 13:28:43 <caolan> the removed pre-extension code had a dictionary.lst in 
some dir or other that listed the dicts and languages they were for
Aug 29 13:29:47 <caolan> but that was back in pre language tool days, not sure 
if that makes some of our bundled dicts no longer just simple hunspell/hyphen/mythes 
Aug 29 13:30:10 <tml_> sberg: but anyway, I am not opposed to making the 
installer by default select only a (somewhat arbitrary) subset of dictionaries to 
install, if that fixes a problem for most people
Aug 29 13:30:37 <tml_> and even if I was opposed, that could be ignored;)
Aug 29 13:32:23 <caolan> throw the net wide enough, dict for langpack + top X 
languages always installed + langs also in use in territory + Y neighbouring langs :-)
Aug 29 13:36:46 <tml_> caolan: but isn't it so that exactly selecting "neighbouring 
langs" (but not langs from some country a few borders away) can cause immense irritation. "why 
would we proud Freedonians want to write in the language of those dogs of Elbonia. what we need is the 
language of our beloved friends from Bulvania"
Aug 29 13:37:36 <tml_> but whatever
Aug 29 13:40:20 <caolan> including Russian in a shortlist of dicts for the 
Latvian langpack is a potential contender for that problem
Aug 29 13:41:58 <tml_> which is why when including *all* one can always say "we 
don't make any judgements"
Aug 29 13:42:29 <caolan> Bosnian/Serbian/Croatian, *shudder*
Aug 29 13:45:12 <tml_> caolan: Serbian/Albanian/Russian was the real-world example I had 
in mind. even if Albanian seems to be a "recognized minority language" in Serbia, so 
at least officially they couldn't oppose it that heavily
Aug 29 13:46:33 <tml_> caolan: and what do I know, maybe I am too pessimistic, 
and only a very small minority of people would take stuff like this so seriously
Aug 29 13:46:43 <tml_> caolan: after all, it isn't *maps* ;)
Aug 29 13:47:34 <caolan> tml_: RH has a utility to search for possible maps in 
software packages :-)