> On Jan 26, 2016, at 1:20 PM, Antoine Pitrou <solip...@pitrou.net> wrote:
>
> On Tue, 26 Jan 2016 12:16:16 -0500
> Donald Stufft <don...@stufft.io> wrote:
>>
>> As many of you are aware there has been an effort to replace the current
>> PyPI with a new, improved PyPI. This project has been codenamed Warehouse
>> and has been progressing nicely. However we’ve run into a bit of an issue
>> when deciding what to support that we’re not feeling super qualified to make
>> an informed decision on.
>>
>> The new PyPI is going to support translated content (for the UI elements,
>> not for what people upload to there), although we will not launch with any
>> translations actually added besides English. Currently the translation
>> engine we’re using (l20n.js) does not support anything but “Evergreen”
>> browsers (browsers that constantly and automatically update) which means we
>> don’t have support for older versions of IE. My question to anyone who is,
>> or is familiar with places where English isn’t the native language, how big
>> of a deal is this if we only support newer browsers for translations?
>>
>> If you can weigh in on the issue for this
>> (https://github.com/pypa/warehouse/issues/881) that would be great! If you
>> know someone who might have a good insight, please pass this along to them
>> as well.
>
> Not answering your question, but needing Javascript on the
> client to support L10n sounds like a weird decision (although Mozilla
> seems to be pushing this... how surprising). Every bit of client-side
> Javascript tends to make Web pages slower and it tends to accumulate
> into the current bloated mess that is the modern Web. For static text
> this really doesn't sound warranted.
>
> (not to mention that mutating the body text after the HTML has loaded
> may also produce a poor user experience, depending on various
> conditions. And the native English speakers who develop the software
> on top-grade machines will probably not notice it, thinking everything
> is fine.)
>
> As for your question, though, I would expect some of the less proficient
> English speakers to also have outdated hardware or software installs,
> especially in poor countries or very humble social environments.
>

So the reason for wanting to use L20n (and forgive me, English is the only language I speak so a lot of this is based off of my, possibly wrong, understanding of how other languages work) is because it is a lot more powerful than the traditional gettext based solutions. One such limitation, I believe, is the lack of variants in the older tools like gettext. I think at best you can get singular/plural, but I think that other languages have a whole host of different things that they need to vary their grammar based on. An example from the L20n website is:

    <brandShortName {
      *nominative: "Aurora",
      genitive: "Aurore",
      dative: "Aurori",
      accusative: "Auroro",
      locative: "Aurori",
      instrumental: "Auroro"
    }>
    <aboutOld "O brskalniku {{ brandShortName }}">
    <about "O {{ brandShortName.locative }}">

That allows you to have the brand name (in this example) translated based on the items in that first list. This is powerful enough to support choosing the variant based on something you pass into the translation engine from the application being translated as well.

Another example of this is when you need to adjust the translation based on the gender of the subject (though I don't think we'll use this on PyPI since we're unlikely to ever collect that information). L20n makes this possible if you pass the gender into the translation engine like:

    # Thing that gets passed in
    {
        "user": {
            "name": "Jane",
            "followers": 1337,
            "gender": "feminine"
        }
    }

    # Translation Snippet
    <shared[$user.gender] {
      masculine: "{{ $user.name }} shared your post to his {{ $user.followers }} follower(s).",
      feminine: "{{ $user.name }} shared your post to her {{ $user.followers }} follower(s).",
      *default: "{{ $user.name }} shared your post to their {{ $user.followers }} follower(s)."
    }>

In addition to that, L20n also natively understands HTML which makes it a bit easier to work with.
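
To make the variant mechanism in the gender example above concrete, here is a minimal Python sketch. It assumes a plain dictionary stands in for the L20n entity and a key named `default` stands in for the `*default` marker. It is not how L20n.js is implemented, and `SHARED`, `select_variant` and `render_shared` are hypothetical names used only for illustration.

```python
from typing import Mapping, Optional

# Hypothetical stand-in for the <shared[$user.gender] {...}> entity.
SHARED = {
    "masculine": "{name} shared your post to his {followers} follower(s).",
    "feminine": "{name} shared your post to her {followers} follower(s).",
    "default": "{name} shared your post to their {followers} follower(s).",
}


def select_variant(variants: Mapping[str, str], key: Optional[str]) -> str:
    """Return the variant for `key`, falling back to the default variant."""
    if key is not None and key in variants:
        return variants[key]
    return variants["default"]


def render_shared(user: Mapping[str, object]) -> str:
    """Pick a variant from the data passed in, then fill in the placeholders."""
    template = select_variant(SHARED, user.get("gender"))
    return template.format(name=user["name"], followers=user["followers"])


if __name__ == "__main__":
    print(render_shared({"name": "Jane", "followers": 1337, "gender": "feminine"}))
```

The point of the sketch is only that the application passes data in and the translation decides which wording to use; a missing or unknown value falls through to the default variant.
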
In a traditional gettext based system, if you wanted to translate a string of text that contains a link to something, you'd need to do something like this:

    'This is a sentence that has an embedded <a href="%(url)s">link</a>'

Then you need to expect your translators to correctly generate that HTML, including any classes or style information that is in it (and if you alter that, all translations need to be updated to fix it). However, in L20n you can simply do something like this:

    <p data-l10n-id="mySentence">This is a sentence that has an embedded <a href="https://../">link</a></p>

Then when your translators go to translate it, they only need to do:

    <mySentence "This is a translated sentence with a <a>link</a>">

They never need to worry about matching the exact HTML, they just need to worry about marking the structure correctly. The one downside to this is that there is not currently any way for them to *reorder* the HTML elements (like if you have two links), and I'm not sure how big of a deal that is.

Finally, on the L20n vs gettext side, L20n forces you to define IDs for your translations instead of reusing the source (generally English) text as your ID. This means that people are free to tweak the English text in ways that do not alter the semantics of the statement without affecting the other translations: as long as the ID stays the same, all existing translations will continue to be used.

Now, all of the above could be written serverside in Python and not require anything of the end user's browser. We're currently using L20n.js instead of something serverside for a few reasons. The biggest and most obvious reason is that L20n.js is currently the only implementation of L20n that exists, so moving to L20n on the server side would require us to devote time to writing that instead of working on Warehouse itself.

Another reason is that L20n.js also allows you to do what they call "responsive translations".
Essentially it allows you to have variants of your translation based on properties of the end user's machine such as operating system, window size, time of day, etc. This makes it easy to have a translation switch between, say, a longer form when running full screen in a large browser and a shorter form when running on a small smart phone screen, avoiding a fairly common (I think?) problem where the source text and the translated text are vastly different in length.

The final reason is that by moving translation into the client side we can increase the chance that, for any particular page a user visits, they will be served directly out of a Fastly POP located close to them instead of needing to round trip from that POP, to the Fastly POP located in Ashburn, Virginia, USA, to the PyPI servers located in another DC in Virginia. Instead of needing to cache and serve a different variant of the page for every single language, we can have a single variant of the page for all languages, and a single language file for each language that is used for all pages (much like CSS), which will increase the cache hit ratio. This will also make it more likely that if the PyPI origin servers go down, the user won't get an error response (since serving out of cache never hits the backend servers, and Fastly is configured to serve stale responses from the cache in the case of an error). Even users of lesser used languages, whose language file might not be cached, will still get the page served from cache if the PyPI servers are down; it just won't be translated, falling back to the English version in that case.

Anyways, I don't know if all of these reasons are good enough reasons to impose the browser requirement on our users.
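
The caching argument above comes down to simple counting, which the following Python sketch illustrates. The page and language counts are made-up numbers chosen for the example, not PyPI measurements, and both function names are hypothetical.

```python
def server_side_cache_objects(pages: int, languages: int) -> int:
    """Server-side translation: one cached variant per (page, language) pair."""
    return pages * languages


def client_side_cache_objects(pages: int, languages: int) -> int:
    """Client-side translation: one cached copy per page, plus one
    language file per language that is shared by every page."""
    return pages + languages


if __name__ == "__main__":
    pages, languages = 1000, 20  # assumed values, for illustration only
    print(server_side_cache_objects(pages, languages))
    print(client_side_cache_objects(pages, languages))
```

With fewer distinct objects competing for the same cache, each one is more likely to already be present at the POP nearest the user.
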
On paper it certainly appears to me like they are, since giving people better, higher quality translations seems to be a good thing to me. However, I don't know how bad the limitations of gettext are in practice. I will say that under 1% of our views (not users, views) on PyPI currently come from browsers that are both set to something other than English AND not supported by l20n.js, but it's possible that this is a chicken/egg situation where we're not getting a lot of traffic from those users because PyPI isn't translated in a way that they can use. I am also unsure how the fact that the majority of PyPI's content will still be in English (since this is just for the UI, not the content) affects all of this, but there are projects out there where the content is definitely not English (though I am unsure what languages they are, the ones I've seen use some sort of Asian looking lettering).

Hopefully this was useful information!

-----------------
Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
_______________________________________________
Distutils-SIG maillist - Distutils-SIG@python.org
https://mail.python.org/mailman/listinfo/distutils-sig