Re: Subject: LyX 2.0beta3: Spell Checking + Multilingualism

Stephan Witt Thu, 27 Jan 2011 22:50:56 -0800

On 27.01.2011 at 14:05, Walter wrote:

>>> Whilst using LyX 2.0beta1 [since verified on LyX 2.0beta3] I recently ran
>>> a spell check for the first time.
>>> 
>>> The interface is good and no doubt an improvement on previous eras, however
>>> the following struck me as possible to improve.
>>> 
>>> Those items marked with "[*]" I consider a bug in LyX. Those items marked
>>> with "[X]" I consider a bug elsewhere.
>>> 
>>> 
>>> 1. Preferences|Language Settings|Spellchecker [*]
>>>   ----------------------------------------------
>>>   Fields lack a description.  Faced with having used non-US spelling
>>>   in my document ("for shame!"), I do not want to manually set hundreds
>>>   of individual words to be 'English (UK)', which using the inbuilt right
>>>   sidebar interface appears to be the default way forward.  (For some
>>>   reason, 'English (AU)' is not even an option on my system, though that's
>>>   probably my fault.)
>> 
>> The following refers to the field "Alternative language" I'd guess.
> 
> Correct!
> 
>>>   Thus driven to the preferences dialog, I was unsure of which mystical
>>>   value to enter in to the great LyX machine.  Assuming 'man aspell' would
>>>   clear it up, indeed some text was located that made the expected format
>>>   for the entry of a single language value probable:
>>> 
>>>    "It follows the same format of the  LANG  environmental variable on
>>>     most systems. It consists of the two letter ISO 639 language code and
>>>     an optional two letter ISO 3166 country code after a dash or
>>>     underscore."
>>> 
>>>   I tried this ("en_AU"), and it did work.  However, there are two problems:
>>>    - Even the first step would be a challenge for some users
>>>    - I would like to add multiple values to the field, since otherwise even
>>>      at this early stage of my document still hundreds of words and place
>>>      names in French, German, Greek (+romanised Greek), Chinese (+romanised
>>>      Chinese), etc. trip up the spell checker. (Use of these languages
>>>      is frequent and scattered right throughout the document.)
>> 
>> Ok, with this use case - mixed language documents - you are requested to
>> mark the text appropriately. Here LyX does no guessing and there are no
>> plans to change that.
> 
> On the contrary, dear fellow: the opposite has already come to pass!
> Demand hath begat a plan!  One man, I understand, begat that plan: a
> module fan, Michiel Kamermans!
> http://www.mail-archive.com/lyx-users@lists.lyx.org/msg83713.html


Ok, AFAIU this refers to the XeLaTeX engine and not to LyX.
Of course, if someone wants to develop a solid algorithm for language
guessing and can convince the LyX developer community of it and has the
resources to implement and test it - it may happen. Another option
would be to have a spell checker backend including this feature.
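To illustrate why this is hard, here is a minimal Python sketch of guessing by
Unicode range. It is not LyX code; the helper names `script_of` and
`split_by_script` are made up for illustration. It only separates scripts
(Latin, Greek, CJK), so it could tell Greek from English but could never tell
French from German or handle romanised Greek or Chinese. A real language
guesser would need dictionaries or statistics on top of this.

```python
import unicodedata
from itertools import groupby


def script_of(ch: str) -> str:
    """Crude script label: the first word of the Unicode character name."""
    if not ch.isalpha():
        return "COMMON"  # spaces, digits, punctuation belong to no script
    name = unicodedata.name(ch, "")
    return name.split(" ")[0] if name else "UNKNOWN"


def split_by_script(text: str) -> list[tuple[str, str]]:
    """Split text into consecutive runs of characters sharing a script label."""
    return [(script, "".join(run)) for script, run in groupby(text, key=script_of)]


if __name__ == "__main__":
    for script, run in split_by_script("Athens Αθήνα 雅典"):
        print(script, repr(run))
```

Each run could then be handed to a dictionary for that script, but the choice
between languages sharing a script would still need manual markup.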

> <http://www.ctan.org/tex-archive/macros/xetex/latex/fontwrap/>
> "So that fonts are *autoselected* based on UNICODE range unless
> otherwise overridden."
> 
> But alas, the user is still utterly laboured with tedious repetition
> of language specification (also text style selection, with the hack I
> use), and will remain so until LyX UI changes.

Do you have an example for such a document?

> 
>> But it can be tricky to get it right. It depends heavily on the spell
>> checker - aspell, e.g., accepts completely different "alternative" language
>> settings than hunspell or Apple's spell checker do. And it depends on the
>> runtime environment - which dictionaries are available to the user on the
>> current machine. And we have the feature to switch between the spell checker
>> back ends at runtime.
> 
> This sounds ugly.  Is there any similarity between spell checking APIs?  Is
> there a cross platform, spell checking library unification / abstraction layer
> available? Would it be worth developing one? How difficult is it to detect
> known dictionaries and spell checkers on a cross-platform basis?

I'll cite my own investigation about similarity between spell checking APIs.
The focus was the management of personal word lists.

> We have support for different spell checker backends.
> All of them are able to check words, of course.
> But the capabilities with personal word lists differ horribly.
> The following table presents the results of my investigation.
> 
> Feature     | aspell | native (mac) | enchant | hunspell
> ========================================================
> check       | +      | +            | +       | +
> suggest     | +      | +            | +       | +
> accept      | +      | +            | +       | +
> insert      | +      | +            | o (2)   | o (3)
> ispersonal? | o (1)  | +            | -       | -
> remove      | -      | +            | + (4)   | -
> 
> Legend:
> + feature is supported
> - feature is not supported
> o there are limitations:
> 1) aspell has the interface to enumerate the personal word list.
>   So it's possible to implement, I have a patch for LyX at hand.
> 2) The versions below 1.6.0 are truncating the personal word list
>   on open - effectively no personal word list available after restart.
> 3) There is no persistent state for personal word lists.

(4) Enchant manages its own personal word lists.
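
To make the table concrete, here is a small Python sketch that encodes it as
data and answers "which back ends support feature X?". This is not how LyX
stores the information; the names `Support`, `CAPABILITIES` and
`backends_supporting` are assumptions for illustration. The marks are copied
from the table above.

```python
from enum import Enum


class Support(Enum):
    YES = "+"      # feature is supported
    NO = "-"       # feature is not supported
    LIMITED = "o"  # supported with the limitations listed in the legend


FEATURES = ("check", "suggest", "accept", "insert", "ispersonal", "remove")

# One mark per feature, in FEATURES order, transcribed from the table.
_ROWS = {
    "aspell": "++++o-",
    "native": "++++++",
    "enchant": "+++o-+",
    "hunspell": "+++o--",
}

CAPABILITIES = {
    backend: dict(zip(FEATURES, (Support(mark) for mark in marks)))
    for backend, marks in _ROWS.items()
}


def backends_supporting(feature: str, allow_limited: bool = False) -> list[str]:
    """Return the sorted names of back ends that offer the given feature."""
    wanted = {Support.YES, Support.LIMITED} if allow_limited else {Support.YES}
    return sorted(b for b, caps in CAPABILITIES.items() if caps[feature] in wanted)


if __name__ == "__main__":
    print(backends_supporting("remove"))
    print(backends_supporting("insert", allow_limited=True))
```

A user interface could use such a lookup to disable, say, "remove from
personal dictionary" when the active back end cannot do it.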

> There is already some talk on the net about consolidating spell checking
> for the whole desktop.
> https://wiki.ubuntu.com/ConsolidateSpellingLibs
> I don't know how long it would take to get some result.

>>>   Thus, as a relatively easy half-way fix, could we please have some
>>>   increased on-screen documentation?  Something like "eg: 'en_GB' for
>>>   aspell." may suffice for 95% of users.
>> 
>> Until the field gets replaced or removed a tooltip may help.
> 
> Great!
> 
>>> 2. Right click to set spellchecker language on a highlighted word fails [*]
>>>   ------------------------------------------------------------------------
>>>   It appears that when 'Tools|Preferences|Language Settings|Spellchecker|
>>>   Spellcheck continuously' is set, and red-wavy (Note: LyX 2.0.0beta1 was
>>>   wavy, LyX 2.0.0beta3 is straight and thicker) underlined words are right
>>>   clicked, there is an option to set their language for spellchecking
>>>   purposes.  However, this does not appear to actually do anything!
>>>   This makes it necessary for the user to select the word then use 'Edit|
>>>   Language|Whatever language' to actually perform the change - pointless
>>>   tedium.
>> 
>> You propose to auto extend the selection to word boundaries when setting
>> the language at a given position and no selection exists. That sounds
>> sensible...
> 
> Hurrah!
> 
>>> 3. Wider problem of spellchecking and multilingual support
>>>   -------------------------------------------------------
>>>   Regarding points 1 and 2, really there is a wider problem of multilingual
>>>   support being a little 'all over the place', with a bunch of different
>>>   "solutions" in use.  In terms of LyX, none of these are really "solutions"
>>>   as even with LyX 2.0beta1 it appears to be demonstrably impossible to link
>>>   the manual language markup made in conjunction with a font-linked solution
>>>   to the manual language markup required for spellchecking purposes.
>> 
>> Sorry, I cannot follow you. The language you assigned to a word, phrase or
>> paragraph is used for spell checking the words in that area. Do you refer
>> to the fact that it's possible to mark two parts of a word with two
>> languages?
> 
> Sorry for my lack of clarity.  Let me try again.
> 
> Right now there are three concepts, none of them really linked by LyX
> without customisation:
> - text style
> - language
> - font
> 
> Only partial solutions exist for relating these together.
> 
> None of them seem to provide for particularly good user experience.
> 
> IMHO the number of hacks developed in this area show clear community
> frustration with the status quo.
> 
> In short, whilst LyX is a great tool, it is in areas like this that it
> occurs to me that perhaps LyX can go so much further by tackling some of
> "the hard problems" such as complex multilingual use cases.

In recent years I have read, every now and then, mails proposing to eliminate
manual text markup - or at least to hide it better... but it didn't happen.

People want to have the text styles to be able to "finger paint" the text.

>>>   As per previous posts whereby I suggested revising the user interface to
>>>   make proper use of available databases and let the user assign fonts
>>>   to unicode blocks and/or languages and/or custom defined text-types for
>>>   font selection purposes, a forward-looking, integrated solution should
>>>   also take in to account spellchecker requirements.
>>> 
>>>   Otherwise, we poor users are laboured with having to make 1000 manual
>>>   markups just to include a short bit of text!  

This sounds a bit exaggerated...

>>>   This is exemplified if,
>>>   for instance, one wishes to quote a place name with translations and their
>>>   romanised equivalents in situ at many points throughout a document
>>>   (my unfortunate situation, and before anyone asks: no I cannot switch to
>>>   compiling a reference table, for reasons of readership and readability)

You may copy the place names to many points in your document.

>>>   In summary, a short list of user-side 'wants' for such a future upgrade
>>>   to multilingual support would be:
>>>    - works with unicode TeX systems (XeTeX)
>>>    - works with TTF
>>>    - provides dialog based font selection (see previous post)
>>>    - provides dialog based language selection (see previous post)
>>>    - does not require duplicate language markup for the font subsystem
>>>      and the spellchecker subsystem
>>>    - upgrades the spellchecker subsystem to be more multilingual aware
>>> 
>>>   Please do reference the previous message which included a UI mockup for
>>>   further details on the proposed genre of solution:
>>>    http://www.mail-archive.com/lyx-users@lists.lyx.org/msg83635.html
>>>    http://pratyeka.org/unicode-font-mockup.png (hosted copy of mockup)
>> 
>> I didn't follow this in detail - sorry...
> 
> Can I clarify?

I meant that this is not an area I have the time and energy to spend on.

>> But your wish list above seems a little bit too general.
>> E. g. "upgrades the spellchecker subsystem to be more multilingual aware"...
>> What do you have in mind exactly?
> 
> Again this falls back to the establishment of a functional semantic link
> between the three beasts:
> - text style
> - language
> - font
> 
> So, right now you might spellcheck your document and wind up with a
> single language being applied throughout. A more useful goal would be
> to spellcheck all languages used throughout your document, even
> mixing spellcheck engines with disparate languages as per dictionary
> availability, against appropriate language portions of the document.
> 
>>> 4. Weird behaviour with common prefixes and specialist compounds [X]
>>>   -----------------------------------------------------------------
>>>   Common prefixes such as micro and proto seem to confuse aspell.  Not sure
>>>   if this is somehow related to how it is linked from LyX, but I assume the
>>>   issue is with them.  For example, 'proto-<known word>' does not seem to
>>>   be accepted, forcing 'proto' to be added manually as a valid word.
>>>   Unfortunately, the LyX interface does not offer a proper workaround.
>>>   (Please see point 5.)
>>>   (Note: Upon further investigation, actually a lot of words appear to be
>>>    missing from the default dictionary, including "hewn", "proven",
>>>    "romanised". A scrabble player would be dismayed: for many points!)
>>>   (PS: Did anyone ever wonder about the etymology of 'hardscrabble'? I think
>>>    aspell's default English dictionary could be involved in at least one
>>>    definition...)
>> 
>> This one I have to investigate, cannot comment on this now.
>> 
>> But, AFAIK there is no default aspell dictionary. It depends on the
>> software packager what gets distributed. You may have an installation
>> with a German dictionary only. And there are different English dictionaries
>> available...
>> 
>> This is what my aspell installation has to offer for English:
>> * en, en-w_accents, en-wo_accents
>> * en-variant_0, en-variant_1, en-variant_2
>> * en_CA, en_CA-w_accents, en_CA-wo_accents
>> * en_GB, en_GB-w_accents, en_GB-wo_accents
>> * en_GB-ise, en_GB-ise-w_accents, en_GB-ise-wo_accents
>> * en_GB-ize, en_GB-ize-w_accents, en_GB-ize-wo_accents
>> * en_US, en_US-w_accents, en_US-wo_accents
>> 
>> Some of them are combined dictionaries.
> 
> Perhaps a summary could be made available of dictionary contents,
> either through built-in descriptions and/or the proposed pan-spellchecker-
> engine abstraction/unification library, hmmm?

Perhaps. Currently I don't know of a usable pan-spellchecker-engine.

> 
>> Another option is to switch to hunspell and use the openoffice dictionaries.
>> It is said that these dictionaries are superior.
> 
> Thanks, I will try it. An excellent tip!
> (Of course it would be better if the UI suggested this or even
> detected availability...)

If you cannot see it in your UI, it's not available and you cannot use it.
BTW, which version of LyX are you using?

> 
>>> 5. Right sidebar spellchecker interface: word addition [*]
>>>   -------------------------------------------------------
>>>   At various points throughout my document I use accepted phrases within
>>>   the sphere of my writing such as "Proto-Austro-Tai" and "Tai-Kadai".
>>> 
>>>   Whilst "Tai" and "Kadai" are also used as individual words, "Proto" and
>>>   "Austro" are not.  With the present spellchecker interface, when such
>>>   'word portions' occur, I am only given two options:
>>> 
>>>    1. Adding these 'word portions' as words in their own right
>>>    2. Ignoring them as words in their own right
>>> 
>>>   Both options are less than ideal because they will subsequently allow
>>>   the individual words to occur alone, ie: such that human input could
>>>   conceivably render "Come hither, pronto!" as "Come hither, proto!" and
>>>   the spellchecker would consider this to be correct, despite the fact
>>>   that proto should possibly not occur as a word in its own right.
>>>   (OK well that's probably arguable, but you still see the point!)
>>> 
>>>   The best option for resolving this would be to modify the LyX spellchecker
>>>   sidebar interface to allow adding arbitrary words or entire words rather
>>>   than simply word portions thereof that have been identified by aspell as
>>>   unknown.  (ie: When presented with "Proto-Austro-Tai", and "Proto" is
>>>   highlighted, then the user should be able to add "Proto-Austro-Tai" as
>>>   a word in its own right rather than only the 'word portion' "Proto"
>>>   itself.)
>>>   (If I recall, 'other' word processing solutions include this feature.)
>> 
>> Here LyX relies on the spell checker interface. Most checkers are able to do
>> the checks at word level only. Consequently you cannot add compound words to
>> your personal dictionary, AFAIK. Here I want to wait for an improvement of
>> the spell checker libraries. It's possible to check complete sentences -
>> the Apple spell checker has this capability. It's even able to auto-detect
>> the language...
> 
> (Well "they" do say that Apple is very good at usability, and that open source
> generally isn't.  Perhaps "they" are correct in this assertion, sometimes...)

Apple is using the hunspell spell checker engine internally for the spell
service of the OS (since Snow Leopard). But obviously they added some code to
make the grammar checking and multi-language detection possible. The spell
checker on Mac does what you want, AFAIK. It uses heuristics to choose the
language if you don't provide the information. But then - I have seen it in
Apple's Mail tool - a Spanish word in a French sentence will be marked as
misspelled. Perhaps you would be disappointed again at being forced to mark
that Spanish word manually.

I've heard they returned some contribution to the community (OpenSpell)...
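
As for the word-level limitation and the "Proto-Austro-Tai" wish from point 5,
here is a rough Python sketch of what a front end could do on its own, without
waiting for the spell checker libraries: keep a separate list of accepted
compounds and consult it before splitting a token into parts. Nothing like
this exists in LyX as far as this thread shows; `check_token` and both word
sets are hypothetical.

```python
def check_token(token: str, known_words: set[str], known_compounds: set[str]) -> list[str]:
    """Return the parts of a token that would be flagged as misspelled.

    A token accepted as a whole compound is never split, so its parts
    (e.g. "Proto") do not have to be valid words on their own.
    """
    if token in known_compounds:
        return []
    return [part for part in token.split("-") if part.lower() not in known_words]


if __name__ == "__main__":
    words = {"tai", "kadai", "come", "hither", "pronto"}
    compounds = {"Proto-Austro-Tai"}
    print(check_token("Proto-Austro-Tai", words, compounds))  # accepted as a whole
    print(check_token("Tai-Kadai", words, compounds))         # both parts are known
    print(check_token("proto", words, compounds))             # still flagged alone
```

With this split, "Come hither, proto!" would still be flagged, which is the
behaviour asked for.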

> I will make a note to research aspell dictionaries and capabilities further
> with the intention of issuing a goodly whinge on our collective behalf to the
> spellcheck library people, if indeed this functionality is unavailable.
> (No ETA...)
> 
>>> 6. Dictionary Re-Use Support [*]
>>>   -----------------------------
>>>   Another point is that of re-use.  Which is to say that, when someone uses
>>>   for example 'BibTeX' to compile a bibliographic database, that database
>>>   may easily be used with other projects and is considered portable.  So
>>>   for all physics papers I can use one bibliography, and I may have another
>>>   for history papers.  Whilst this is presently handled adequately by LyX,
>>>   the equivalent functionality is not present for dictionary databases.
>>>   It should be.  This means both adding a 'manage multiple dictionaries in
>>>   this project' feature-set, and adding a 'which dictionary do you want to
>>>   add the word to' drop-down in the right hand spellchecker sidebar.
>> 
>> This is a good idea (already mentioned on developers list, AFAICR).
>> The idea is to incorporate a personal dictionary into the document.
>> But it definitely will not happen tomorrow.
> 
> Great, as long as the personal dictionary "in" the document is saved "outside"
> the document and as a file that can be:
> a) shared between multiple documents
> b) used with zero or more additional personal dictionaries within a single
>    file
> c) identified as the target dictionary (vs. other active personal
> dictionaries) when spell-checking the document and adding words to
> personal dictionaries

The externally saved personal dictionary is shared between multiple documents
per se.

My idea was to provide the dictionary "inside" the document as an alternative.
If you are sure a word is correctly spelled, you want to transfer this
knowledge with the document.

Having multiple private dictionaries is a third option which - like the second
one - places a much higher demand on the usability of the spell checker
interface.
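
A minimal Python sketch of that third option, only to show where the extra
interface burden comes from: every "add word" action needs a target. The class
`PersonalDictionaries` and the dictionary names are invented for illustration
and do not correspond to any LyX or spell checker API.

```python
class PersonalDictionaries:
    """Several named personal word lists that are consulted together."""

    def __init__(self) -> None:
        self._dicts: dict[str, set[str]] = {}

    def add_dictionary(self, name: str, words=()) -> None:
        """Register a named word list, e.g. one loaded from a shared file."""
        self._dicts[name] = set(words)

    def add_word(self, word: str, target: str) -> None:
        """Add a word to one specific list; the caller has to pick the target."""
        self._dicts[target].add(word)

    def knows(self, word: str) -> bool:
        """A word is accepted if any active list contains it."""
        return any(word in words for words in self._dicts.values())


if __name__ == "__main__":
    dicts = PersonalDictionaries()
    dicts.add_dictionary("linguistics", ["Tai-Kadai"])
    dicts.add_dictionary("document")  # the list travelling "inside" the document
    dicts.add_word("romanised", target="document")
    print(dicts.knows("Tai-Kadai"), dicts.knows("romanised"), dicts.knows("proto"))
```

The `target` argument is exactly the 'which dictionary do you want to add the
word to' drop-down requested in point 6.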

Stephan
