Re: [Mediawiki-i18n] Please view and comment CAPTCHA images in 154 languages

Federico Leva (Nemo) Tue, 01 Apr 2014 15:31:25 -0700

Today I made a couple patches that should address most of the problems reported as well as handle RTL languages and multilingual blacklist. I'm mostly using some Unicode magic which is quite well hidden in some obscure libraries, we'll see if it works. :)

In case it's not clear, for now I'm focusing on the *MediaWiki* side of the matter; the Wikimedia side, i.e. where to use what and how, is something we'll worry about when we actually have this option (or others) available in the codebase.


A couple questions below.

P. Blissenbach, 31/03/2014 17:13:
> captchas having two lines
> of identical text [...] and accept either input.

This would need to be filed as a separate enhancement request.

Shimmin, 31/03/2014 20:02:

> If you actually want the captchas to make any sense in terms of word
> combination and construction, that would be a whole different issue.
> There's inflection, rules on what happens when words are run together
> (spelling changes for one), and so on.

I suppose you're only talking of the morphological side here, right? The current patch contains a couple lines to handle hyphenation for Finnish, because it was originally provided by Nikerabbit, but we're definitely not going to build a universal grammar of univerbation in a MediaWiki script. Unless someone comes up with a general solution I think we'll drop that part.

If this turns out to be confusing, I'd rather just show the two (or N) words as separate words, what do you think? This can be done in a separate patch; once we introduce some other security improvements, I think the challenge of identifying where one word ends and the next starts may be redundant.


> Quite a few of the l look like i in this font, which seems problematic.

This is indeed a problem with sans serif fonts but the broad majority thinks they are better. We can try to pick clearer fonts but most help will come from words being familiar to humans. I may upload more tests with this font, though: https://commons.wikimedia.org/wiki/File:AndBasR.pdf

> Should this be "leigh"?


Yes. If incorrect, please edit: https://en.wiktionary.org/?oldid=23059687


> Looks like "neuscanshoil" with a random -y added, a hangover from
> English behaviour?

Same problem as with Malayalam and others; the latest version will avoid combining single letters with other words.
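
To make the "no single letters glued to other words" rule concrete, here is a minimal sketch in Python. It is not the code from the patch: the function names are made up for illustration, and counting "letters" by skipping Unicode combining marks is only my rough assumption about how one might treat scripts where a single letter spans several code points.

```python
import random
import unicodedata


def base_length(word):
    """Count code points that are not combining marks (a rough letter count)."""
    return sum(1 for ch in word if not unicodedata.category(ch).startswith("M"))


def pick_captcha_words(words, count=2, rng=random):
    """Pick `count` words to run together, never choosing a single letter."""
    # A one-letter entry glued to a neighbour reads like a stray suffix,
    # so such entries are filtered out before sampling.
    candidates = [w for w in words if base_length(w) > 1]
    if len(candidates) < count:
        raise ValueError("not enough multi-letter words in the dictionary")
    return rng.sample(candidates, count)


def make_captcha_text(words, count=2, rng=random):
    """Join the picked words without a separator, as in the test images."""
    return "".join(pick_captcha_words(words, count, rng))
```

With a word list such as `["y", "neuscanshoil", "shee"]`, the "y" entry can never be picked, so it cannot end up attached to another word.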


> [...]
> (though Aaue is a proper name) [...]
>
> Perick is also a proper name [...]

Do others think proper names are a problem? If yes they might be easy enough to remove, usually they're tagged as such on Wiktionary. Otherwise, this adds some cheap variety in our dictionaries.


> The form "vaayl" is a rare grammar-induced form of an unusual word

In this case it's again a proper noun, no idea how correct or how current: <https://en.wiktionary.org/?oldid=21902154>


> Hard to read, could be "hiu shee" or "niu shee"

It was "hiu": no "niu" in our dictionary. If the latter is a valid word, you should add it to Wiktionary and then we can try to figure out something to exclude confusable words.

Once again, the proposed approach is to rely on a mix of Unicode magic and a self-healing (wiki) dictionary. Neither is enough alone.
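
For what "exclude confusable words" could look like, here is a minimal Python sketch. It is an assumption on my part, not what the patch does: the lookalike table is hand-picked from the pairs mentioned in this thread (l/i and h/n) and would really depend on the font and the script, and the function names are hypothetical.

```python
import unicodedata
from collections import defaultdict

# Hypothetical, hand-picked lookalike pairs from this thread; each letter
# is mapped to one representative so that confusable words collide.
LOOKALIKES = str.maketrans({"l": "i", "n": "h"})


def skeleton(word):
    """Reduce a word to a form shared by all words that look like it."""
    normalized = unicodedata.normalize("NFKC", word).casefold()
    return normalized.translate(LOOKALIKES)


def drop_confusable(words):
    """Drop every word whose skeleton is shared with a different word."""
    groups = defaultdict(set)
    for word in words:
        groups[skeleton(word)].add(word)
    return [word for word in words if len(groups[skeleton(word)]) == 1]
```

Note that this only catches "hiu" versus "niu" once both are in the dictionary, which is why the wiki side matters: `drop_confusable(["hiu", "shee"])` keeps both words, while adding "niu" to the list removes "hiu" and "niu" together.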


> This one means "arctic castration" (spoiy = castration).  Not obscene,
> but maybe not for everyone?

Well, it could fall under "obscene" for some definition of the word. I'm now blacklisting also "pejorative" and "offensive" words, those who care can try and see if their label edits survive on the wiki.

https://en.wiktionary.org/wiki/Wiktionary:Context_labels
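
A minimal Python sketch of label-based blacklisting, to show how little is needed once the labels are right on the wiki. The data shape (pairs of word and label list) and the names are assumptions for illustration only, and I am assuming "obscene" sits in the blacklist next to the two labels mentioned above.

```python
# Assumed blacklist: "pejorative" and "offensive" are named in this mail;
# "obscene" is assumed to have been there already.
BLACKLISTED_LABELS = {"obscene", "pejorative", "offensive"}


def allowed_words(entries, blacklist=BLACKLISTED_LABELS):
    """Keep the words whose context labels share nothing with the blacklist.

    `entries` is an iterable of (word, labels) pairs, a hypothetical shape
    for whatever gets extracted from the Wiktionary entries.
    """
    return [
        word
        for word, labels in entries
        if not blacklist.intersection(label.casefold() for label in labels)
    ]
```

A word with no labels passes through untouched, so an entry only drops out of the captcha dictionary after someone tags it on Wiktionary.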

Nemo

_______________________________________________
Mediawiki-i18n mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/mediawiki-i18n
