Thanks again for the thorough explanation and practical suggestion.
I would like to know whether there are any traffic limits or response-time
expectations for the Parsing wikitext API,[1] which, per the use case
suggested,[2][3] can handle the conversion. To be more specific, I am
expecting 900 rps at peak and 15 rps on average. Please let me know if this
traffic would cause any issues, or if another API would suit the purpose better.
Thanks again for your time and support.
-- Ben Yeh
On Monday, February 6, 2017, 4:33:23 PM [Taipei], <> wrote:
Thank you so much for your kind help and clear explanation.
The use case is indeed a Chinese-language project, and the examples provided
were a nice illustration of how the three possible outcomes appear in
different versions. Here I would like to add another scenario, in the hope of
better understanding Wikimedia's language conversion. If you search for
"川普" ( Donald Trump in Traditional Chinese ) at OpenSearch API layer with 
redirects=resolve, [1] the first description should be "唐納·約翰·川普(英語:Donald John 
 which is Traditional Chinese; however, if the profile parameter is set to 
restrict,[2] the first description should become "唐纳德·约翰·特朗普(英语:Donald John 
 which is Simplified Chinese. 
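
The two requests compared above can be sketched as URL builders. This is only an illustration under assumptions: the endpoint host is a guess (the original links are not preserved in this thread), `opensearch_url` is a hypothetical helper, and the profile value "strict" is my reading of the "restrict" setting mentioned above.

```python
from urllib.parse import urlencode

# Assumed endpoint for the Chinese-language project; the thread's links are missing.
API_ENDPOINT = "https://zh.wikipedia.org/w/api.php"


def opensearch_url(term, profile=None, endpoint=API_ENDPOINT):
    """Build an OpenSearch request URL with redirects=resolve.

    `profile` is left out of the query unless a value is given.
    """
    params = {
        "action": "opensearch",
        "search": term,
        "redirects": "resolve",
        "format": "json",
    }
    if profile is not None:
        params["profile"] = profile
    return endpoint + "?" + urlencode(params)


# The two variants of the request described in the message.
default_request = opensearch_url("川普")
profile_request = opensearch_url("川普", profile="strict")  # "strict" is an assumption
print(default_request)
print(profile_request)
```

Fetching both URLs and comparing the first description in each response would reproduce the Traditional versus Simplified difference described above, if the behavior is still the same.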

This scenario indicates that language conversion happens not just at display
time, but also at the API layer. To add another interesting point, the API use
cases mentioned above [1][2] can even have various outcomes in different
The ultimate question now is how such language conversions happen at the API
layer, and how they can be controlled.
-- Ben Yeh
On Thursday, January 26, 2017, 4:24:05 AM [Taipei], Trey Jones <> wrote:
Let's see if I can help, either directly, or indirectly via Cunningham's Law.[1]
I'm reading this as you are searching a Chinese-language project (like, and getting results that are mixed Traditional and 
Simplified Chinese. If that's not the case, please elaborate!

My understanding, which is admittedly incomplete, is that the text for 
Chinese-language projects is stored however it was entered (Traditional or 
Simplified), and is converted at display time. If you look at the main page of[2] today without being logged in (or in a private browsing 
window), the featured article link has this text: "2007年欧洲冠军联赛決賽", which uses 
both 赛 and 賽, with 赛 being the Simplified version of Traditional 賽.[3] If you 
request the zh-cn version of the page,[4] the text is "2007年欧洲冠军联赛决赛", and both 
are Simplified "赛". If you request the zh-tw version of the page[5], the text 
is "2007年歐洲冠軍聯賽決賽", and both are Traditional "賽". So, I believe that explains 
why you are seeing mixed Traditional and Simplified results.
What to do about it? I can't get the Opensearch API to do the conversion in 
place, but there is a separate API that does the conversion: Parsing 
wikitext.[6] Unfortunately, I can only get the API to do the conversion (which 
is based on the uselang parameter) when I submit the text as wikitext,[7][8] 
which adds some additional tags and a long comment to the results. \u-formatted 
input doesn't work, and I can't get the conversion to work for json input 
(i.e., the result of the Opensearch call). That doesn't mean it isn't possible, 
just that I haven't figured it out.
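
For anyone who wants to try the same thing, here is a rough sketch of how such a Parsing wikitext request could be built. The endpoint host and the helper name `parse_variant_url` are assumptions of this sketch; `uselang` is the parameter mentioned above, and `disablelimitreport` is included only as a guess at reducing the long comment in the output.

```python
from urllib.parse import urlencode

# Assumed endpoint; the original links in this thread are not preserved.
API_ENDPOINT = "https://zh.wikipedia.org/w/api.php"


def parse_variant_url(wikitext, uselang, endpoint=API_ENDPOINT):
    """Build an action=parse request that submits `wikitext` for conversion.

    `uselang` picks the target variant, for example "zh-cn" or "zh-tw".
    """
    params = {
        "action": "parse",
        "text": wikitext,
        "contentmodel": "wikitext",
        "prop": "text",
        "uselang": uselang,
        "disablelimitreport": "1",  # assumption: may drop the long report comment
        "format": "json",
    }
    return endpoint + "?" + urlencode(params)


# The mixed-script title from the main page example, requested as zh-tw.
example_url = parse_variant_url("2007年欧洲冠军联赛決賽", "zh-tw")
print(example_url)
```

The response would still wrap the converted text in HTML tags, so a caller would need to strip those before using the result.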

I hope that points you in the right direction, and maybe inspires someone who 
knows this stuff better than me to help out.

Trey Jones
Software Engineer, Discovery
Wikimedia Foundation

On Wed, Jan 25, 2017 at 11:22 AM, Adam Baso <> wrote:

+discovery list
On Wed, Jan 25, 2017 at 10:15 AM, Brad Jorsch (Anomie) <> wrote:

On Wed, Jan 25, 2017 at 2:09 AM, <> wrote:

While I was developing some services based on API:Opensearch, I found that the
response to the same URL request can be either Simplified Chinese or
Traditional Chinese. To be more specific, I would love to know how I can
determine the language form of the response at the API layer (or what other
factors may have an impact), since the documentation of API:Opensearch doesn't
seem to take language into consideration,

The OpenSearch Suggestions extension specification[1] does not allow for 
returning additional metadata such as language with the response. You may want 
to look at the prefixsearch query module[2] instead which allows for returning 
the same results in a different format, although I don't know the details of 
how language variants are handled in the search output.

 [1]: ifications/OpenSearch/Extensio 
 [2]: /API:Prefixsearch 
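
As a rough illustration of the prefixsearch suggestion, the request could be built like this. The endpoint host and the helper name `prefixsearch_url` are assumptions of this sketch, and it says nothing about how language variants are handled in the results.

```python
from urllib.parse import urlencode

# Assumed endpoint; substitute the API URL of the wiki being queried.
API_ENDPOINT = "https://zh.wikipedia.org/w/api.php"


def prefixsearch_url(prefix, limit=10, endpoint=API_ENDPOINT):
    """Build a list=prefixsearch query for page titles starting with `prefix`."""
    params = {
        "action": "query",
        "list": "prefixsearch",
        "pssearch": prefix,
        "pslimit": str(limit),
        "format": "json",
    }
    return endpoint + "?" + urlencode(params)


prefix_request = prefixsearch_url("川普", limit=5)
print(prefix_request)
```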

Brad Jorsch (Anomie)
Senior Software Engineer
Wikimedia Foundation
_______________________________________________
Mediawiki-api mailing list
Mediawiki-api@lists.wikimedia.org ilman/listinfo/mediawiki-api

______________________________ _________________
discovery mailing list mailman/listinfo/discovery
