Re: [Toolserver-l] interwiki.py
Hello,
At Saturday 21 January 2012 15:23:42 DaB. wrote:
> I have only one issue: the Spanish projects (wikipedia, wikinews, wiki*) have not approved a global bot rule. Is there any problem with the plan if we don't have this flag status?

The new bot (or bots) run by the MMP of course has to respect the local bot rules of a project. So if a project demands local approval first, the MMP has to request that approval before running a bot there.

Sincerely,
DaB.

--
Userpage: [[:w:de:User:DaB.]] - PGP: 2B255885
___
Toolserver-l mailing list (Toolserver-l@lists.wikimedia.org)
https://lists.wikimedia.org/mailman/listinfo/toolserver-l
Posting guidelines for this list: https://wiki.toolserver.org/view/Mailing_list_etiquette
Re: [Toolserver-l] interwiki.py
Sounds great and I agree with it.

I have only one issue: the Spanish projects (wikipedia, wikinews, wiki*) have not approved a global bot rule. Is there any problem with the plan if we don't have this flag status?

Dennis Tobar

On 15/01/2012 13:39, DaB. w...@daniel.baur4.info wrote:
> Hello,
> At Sunday 15 January 2012 17:13:26 DaB. wrote:
> > Isn't this a bit too many interwiki bots?
>
> Yes, there are, although the problem is not the CPU load but the memory usage.
> The best solution would of course be for the MediaWiki devs to finally get rid of interwiki links in the article text, but I have the feeling that will not happen soon. The second-best solution would be for the interwiki.py developers to fix their code, but there too I have the feeling that it will take some time.
>
> So here is my plan to fix the problem on our (the TS) side:
> 1.) I create an MMP called interwiki-bot (or something).
> 2.) YOU (the TS users) choose (by election, by appointing, by playing Trip to Jerusalem, I don't care) 5 of you who will become members of that MMP, by 15th February. Only rule: 1 of the 5 has to be an active user of a non-Wikipedia project (like Wikisource or Wiktionary or so).
> 3.) The members of the MMP create a Wikimedia project account (like ts-interwikibot or something) and request global bot status by 1 April.
> 4.) After 2 April no one is allowed to run an interwiki bot except the MMP.
>
> Any problems with my plan?
>
> Sincerely,
> DaB.
>
> --
> Userpage: [[:w:de:User:DaB.]] - PGP: 2B255885
Re: [Toolserver-l] interwiki.py
On 01/16/2012 09:03 AM, Hydriz Wikipedia wrote:
> 4.) After 2. April no-one is allowed to run a interwiki-bot except the MMP.

Sounds great to me. If I discover that interwiki links between two languages are not updated, I also need a place to report this to the MMP group, rather than starting my own interwiki.py job.

--
Lars Aronsson (l...@aronsson.se)
Aronsson Datateknik - http://aronsson.se
Re: [Toolserver-l] interwiki.py
As an interwiki bot runner myself, I find this plan a little too constrained. Many of us run our interwiki bots with different configurations, and once this MMP is created, some of the configurations we use would no longer be available. I can't think of many examples, but judging from top -c, I can tell that the other bot runners run their bots differently from mine.

Personally, I would rather we wait for the Pywikipedia devs to fix that script, install more memory for interwiki bots, or create another custom login server just for running interwiki bots.

Your plan is generally okay; the only issue is that with only 5 people running this project, out of many, many bot operators, it's quite hard to choose. It's best if people don't run multiple interwiki bots for one project (especially enwiktionary, which has an overload of interwiki bots).

Regards,
Hydriz

From: w...@daniel.baur4.info
To: toolserver-l@lists.wikimedia.org
Date: Sun, 15 Jan 2012 17:38:40 +0100
Subject: Re: [Toolserver-l] interwiki.py

> Hello,
> At Sunday 15 January 2012 17:13:26 DaB. wrote:
> > Isn't this a bit too many interwiki bots?
>
> Yes, there are, although the problem is not the CPU load but the memory usage.
> The best solution would of course be for the MediaWiki devs to finally get rid of interwiki links in the article text, but I have the feeling that will not happen soon. The second-best solution would be for the interwiki.py developers to fix their code, but there too I have the feeling that it will take some time.
>
> So here is my plan to fix the problem on our (the TS) side:
> 1.) I create an MMP called interwiki-bot (or something).
> 2.) YOU (the TS users) choose (by election, by appointing, by playing Trip to Jerusalem, I don't care) 5 of you who will become members of that MMP, by 15th February. Only rule: 1 of the 5 has to be an active user of a non-Wikipedia project (like Wikisource or Wiktionary or so).
> 3.) The members of the MMP create a Wikimedia project account (like ts-interwikibot or something) and request global bot status by 1 April.
> 4.) After 2 April no one is allowed to run an interwiki bot except the MMP.
>
> Any problems with my plan?
>
> Sincerely,
> DaB.
>
> --
> Userpage: [[:w:de:User:DaB.]] - PGP: 2B255885
Re: [Toolserver-l] interwiki.py
Perhaps you should look at why people are running with different settings, then standardize. Interwiki bots should all be doing roughly the same job, shouldn't they? So what's with the different settings?
Re: [Toolserver-l] interwiki.py
Sometimes these bot operators might just want to run the bot on a few pages, or with some special settings that I don't know of. But still, entirely blocking people from running interwiki bots is quite ridiculous.

Regards,
Hydriz

From: p858sn...@gmail.com
Date: Mon, 16 Jan 2012 18:09:12 +1000
To: toolserver-l@lists.wikimedia.org
Subject: Re: [Toolserver-l] interwiki.py

> Perhaps you should look at why people are running with different settings, then standardize. Interwiki bots should all be doing roughly the same job, shouldn't they? So what's with the different settings?
Re: [Toolserver-l] interwiki.py
On Sun, Jan 15, 2012 at 5:38 PM, DaB. w...@daniel.baur4.info wrote:
> So here is my plan to fix the problem on our (the TS) side:

+1 :)
Re: [Toolserver-l] interwiki.py
2012/1/16 Hydriz Wikipedia ad...@wikisorg.tk:
> Personally, I would rather we wait for the Pywikipedia devs to fix that script,

This is not going to happen anytime soon. Considering the state of the code base (two hundred exceptions for three hundred wikis, long functions and no automated testing, and thus practically untestable) and the state of the InterLanguage extension ('will be installed soon'), no one is really willing to invest a lot of time in tracking memory usage and reducing it.

The only reasonable action we can take to reduce the memory consumption is to let the OS do its job in freeing memory: using one process to track pages that have to be corrected (using the database, if possible), and one process to do the actual fixing (interwiki.py). This should be reasonably easy to implement (i.e. use a pywikibot page generator to generate a list of pages, use a database layer to track interlanguage links and popen('interwiki.py page') if this is a fixable situation).

Best,
Merlijn
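The two-process idea in Merlijn's message could be sketched roughly as follows. This is only an illustration under stated assumptions: `needs_fixing` is a hypothetical stand-in for the database layer he mentions, the script path and the `-page:` argument form are assumptions about how interwiki.py would be invoked locally, and `run` is an injection point added here so the loop can be tried without pywikipedia installed. What matters is that each fix runs in a short-lived child process, so the OS reclaims that memory when the child exits.

```python
import subprocess
import sys


def build_command(title, script="interwiki.py"):
    """Build the command line for fixing a single page.

    The script path and the "-page:" argument form are assumptions;
    adjust them to the local pywikipedia checkout.
    """
    return [sys.executable, script, "-page:" + title]


def fix_pages(titles, needs_fixing, run=subprocess.call):
    """Run one short-lived child process per page that needs fixing.

    needs_fixing is a hypothetical callable standing in for the
    database lookup of interlanguage links. Each child exits after
    one page, so its memory is returned to the OS.
    Returns a list of (title, exit_code) pairs.
    """
    results = []
    for title in titles:
        if not needs_fixing(title):
            continue  # nothing to do, so no process is spawned
        exit_code = run(build_command(title))
        results.append((title, exit_code))
    return results
```

The tracking process itself stays small because it only holds page titles; all page contents live and die inside the children.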
Re: [Toolserver-l] interwiki.py
Yes, so our issues here are probably the lack of coordination among bot owners and the memory usage. Shouldn't we write some simple script that automatically frees the old memory used by the interwiki script?

DaB's idea was okay for me, except for the point that no one else can run the interwiki script anymore, which is ridiculous to me. Maybe the MMP can be used to ensure that there are no overlapping bots? All interwiki bot owners should join this project, check for an available wiki that no one has taken up, and start asking for clearance to run their own interwiki bot there.

Regards,
Hydriz

From: valhall...@arctus.nl
Date: Mon, 16 Jan 2012 09:19:19 +0100
To: toolserver-l@lists.wikimedia.org
Subject: Re: [Toolserver-l] interwiki.py

> 2012/1/16 Hydriz Wikipedia ad...@wikisorg.tk:
> > Personally, I would rather we wait for the Pywikipedia devs to fix that script,
>
> This is not going to happen anytime soon. Considering the state of the code base (two hundred exceptions for three hundred wikis, long functions and no automated testing, and thus practically untestable) and the state of the InterLanguage extension ('will be installed soon'), no one is really willing to invest a lot of time in tracking memory usage and reducing it.
>
> The only reasonable action we can take to reduce the memory consumption is to let the OS do its job in freeing memory: using one process to track pages that have to be corrected (using the database, if possible), and one process to do the actual fixing (interwiki.py). This should be reasonably easy to implement (i.e. use a pywikibot page generator to generate a list of pages, use a database layer to track interlanguage links and popen('interwiki.py page') if this is a fixable situation).
>
> Best,
> Merlijn
Re: [Toolserver-l] interwiki.py
2012/1/16 Hydriz Wikipedia ad...@wikisorg.tk:
> Yes, so our issues here are probably the lack of coordination among bot owners and the memory usage. Shouldn't we write some simple script that automatically frees the old memory used by the interwiki script?
>
> DaB's idea was okay for me, except for the point that no one else can run the interwiki script anymore, which is ridiculous to me. Maybe the MMP can be used to ensure that there are no overlapping bots? All interwiki bot owners should join this project, check for an available wiki that no one has taken up, and start asking for clearance to run their own interwiki bot there.

Even bots on different wikis will have a large overlap. Perhaps we should restrict non-'selected' interwiki bots to running with -back set (for autonomous runs on the main namespace of Wikipedia, because I think that's where most bots are active)?

--
André Engels, andreeng...@gmail.com
Re: [Toolserver-l] interwiki.py
I think that the most important difference is the home wiki for the launch.

2012/1/16 K. Peachey p858sn...@gmail.com
> Perhaps you should look at why people are running with different settings, then standardize. Interwiki bots should all be doing roughly the same job, shouldn't they? So what's with the different settings?
Re: [Toolserver-l] interwiki.py
Hello,
At Monday 16 January 2012 13:41:28 DaB. wrote:
> As an interwiki bot runner myself, I find this plan a little too constrained.

It is a little bit drastic, yes, but it will work. There were some other ideas in the past (see Chris Grant's mail), but they didn't work in the end. The new plan has the following advantages:
- YOU (the TS users) make the rules and decide who should be in the MMP (BTW, I never said that the people in the MMP should make the rules),
- The roots can contact the group quite easily instead of speaking to dozens of users,
- The Wikimedia project people (Wikipedians, Wikisourclers, etc.) have only 1 contact address too,
- Cases of bot A removing a link and bot B putting it back in 5 minutes later will be greatly reduced.

> Many of us run our interwiki bots with different configurations, and once this MMP is created, some of the configurations we use would no longer be available. I can't think of many examples, but judging from top -c, I can tell that the other bot runners run their bots differently from mine.

Like Hercule said, that should mostly be the homewiki; and it should be no problem for the MMP people to switch the homewiki now and then (e.g. if they run 5 instances of their bot and change the homewiki every hour, then every project is the homewiki every 3 days).
The MMP should also only be for interwiki bots that run permanently; if a user runs a bot because a wiki needs to change 100 interwiki links on a one-time basis, that's no problem.

> Personally, I would rather we wait for the Pywikipedia devs to fix that script, install more memory for interwiki bots, or create another custom login server just for running interwiki bots.

Throwing more hardware at a problem doesn't fix the problem at all, and as Merlijn already wrote, I doubt that the pywikipedia devs will fix the problem soon (they have known about it for years, and don't seem to care that after some time a simple Python script needs more memory than a Java program *including* the virtual machine!).

> Your plan is generally okay; the only issue is that with only 5 people running this project, out of many, many bot operators, it's quite hard to choose. It's best if people don't run multiple interwiki bots for one project (especially enwiktionary, which has an overload of interwiki bots).

In theory, 1 user would be enough to run an interwiki bot (or several instances of it) for all wikis. I increased the number to 5 to make sure that there is always somebody to control the bots.

> Regards,
> Hydriz

Sincerely,
DaB.

--
Userpage: [[:w:de:User:DaB.]] - PGP: 2B255885
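The homewiki rotation estimate in this message (5 instances, hourly switches, every project reached within about 3 days) can be checked with a little arithmetic. The sketch below assumes roughly 300 wikis, the round figure Merlijn uses elsewhere in this thread; the function name is made up for illustration and Python 3 division is assumed.

```python
import math


def days_for_full_rotation(num_wikis, instances, switches_per_day=24):
    """Days until every wiki has been the homewiki once.

    Assumes each instance switches homewiki at a fixed rate and that
    no wiki is visited twice before all have been visited.
    """
    slots_per_day = instances * switches_per_day
    return math.ceil(num_wikis / slots_per_day)


# 5 instances, hourly switches, about 300 wikis (assumed figure)
print(days_for_full_rotation(300, 5))
```

With these assumed numbers there are 120 homewiki slots per day, so 300 wikis need 2.5 days, which rounds up to the 3 days quoted.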
Re: [Toolserver-l] interwiki.py
Merlijn van Deen valhall...@arctus.nl wrote:
> > Personally, I would rather we wait for the Pywikipedia devs to fix that script,
>
> This is not going to happen anytime soon. Considering the state of the code base (two hundred exceptions for three hundred wikis, long functions and no automated testing, and thus practically untestable) and the state of the InterLanguage extension ('will be installed soon'), no one is really willing to invest a lot of time in tracking memory usage and reducing it.
>
> The only reasonable action we can take to reduce the memory consumption is to let the OS do its job in freeing memory: using one process to track pages that have to be corrected (using the database, if possible), and one process to do the actual fixing (interwiki.py). This should be reasonably easy to implement (i.e. use a pywikibot page generator to generate a list of pages, use a database layer to track interlanguage links and popen('interwiki.py page') if this is a fixable situation).

We could also move the pressure: Labs' bot running infrastructure doesn't seem to be /that/ far from opening. If interwiki bots were running there, it would allow the foundation to judge whether pushing for the deployment of InterLanguage isn't worth it in the end.

Meanwhile I think DaB.'s proposal is very adequate.

Tim
Re: [Toolserver-l] interwiki.py
On Tue, Jan 17, 2012 at 3:02 AM, Tim Landscheidt t...@tim-landscheidt.de wrote:
> We could also move the pressure: Labs' bot running infrastructure doesn't seem to be /that/ far from opening. If interwiki bots were running there, it would allow the foundation to judge whether pushing for the deployment of InterLanguage isn't worth it in the end.

Labs isn't a fix-all solution for situations like these. Since the issue is that interwiki.py has memory management problems (amongst others, apparently), I would guess Ryan would be hesitant to have it running on the Labs platform unless those issues were resolved, even though Labs is designed more around virtual containers than as a shared system like the Toolserver.
Re: [Toolserver-l] interwiki.py
On 01/16/2012 01:39 PM, K. Peachey wrote:
> On Tue, Jan 17, 2012 at 3:02 AM, Tim Landscheidt t...@tim-landscheidt.de wrote:
> > We could also move the pressure: Labs' bot running infrastructure doesn't seem to be /that/ far from opening. If interwiki bots were running there, it would allow the foundation to judge whether pushing for the deployment of InterLanguage isn't worth it in the end.
>
> Labs isn't a fix-all solution for situations like these. Since the issue is that interwiki.py has memory management problems (amongst others, apparently), I would guess Ryan would be hesitant to have it running on the Labs platform unless those issues were resolved, even though Labs is designed more around virtual containers than as a shared system like the Toolserver.

Since "is this something Labs could do?" has come up, please feel free to add features and functionality you'd like in Labs at
https://www.mediawiki.org/wiki/Wikimedia_Labs/Toolserver_features_wanted

--
Sumana Harihareswara
Volunteer Development Coordinator
Wikimedia Foundation
Re: [Toolserver-l] interwiki.py
Hello,
At Sunday 15 January 2012 17:13:26 DaB. wrote:
> Isn't this a bit too many interwiki bots?

Yes, there are, although the problem is not the CPU load but the memory usage.
The best solution would of course be for the MediaWiki devs to finally get rid of interwiki links in the article text, but I have the feeling that will not happen soon. The second-best solution would be for the interwiki.py developers to fix their code, but there too I have the feeling that it will take some time.

So here is my plan to fix the problem on our (the TS) side:
1.) I create an MMP called interwiki-bot (or something).
2.) YOU (the TS users) choose (by election, by appointing, by playing Trip to Jerusalem, I don't care) 5 of you who will become members of that MMP, by 15th February. Only rule: 1 of the 5 has to be an active user of a non-Wikipedia project (like Wikisource or Wiktionary or so).
3.) The members of the MMP create a Wikimedia project account (like ts-interwikibot or something) and request global bot status by 1 April.
4.) After 2 April no one is allowed to run an interwiki bot except the MMP.

Any problems with my plan?

Sincerely,
DaB.

--
Userpage: [[:w:de:User:DaB.]] - PGP: 2B255885
[Toolserver-l] interwiki.py
Hi everyone,

Nightshade was a bit slow so I typed top -c. I was amazed to see that almost all the top processes seem to be interwiki related (interwiki.py). Same seems to be the case at willow. Normally I wouldn't really care, we have the servers so we should use them, but now the login servers seem to be overloaded. Isn't this a bit too many interwiki bots?

Maarten
Re: [Toolserver-l] interwiki.py
In my opinion, the problem is not how many bots are running, but that interwiki.py seems to overuse memory. For example, my interwiki.py had reached 982 megabytes when it was killed. You can find a verbose log of its work at http://toolserver.org/~nickanc/interwiki.log .

Nickanc

2012/1/14 Maarten Dammers maar...@mdammers.nl:
> Hi everyone,
>
> Nightshade was a bit slow so I typed top -c. I was amazed to see that almost all the top processes seem to be interwiki related (interwiki.py). Same seems to be the case at willow. Normally I wouldn't really care, we have the servers so we should use them, but now the login servers seem to be overloaded. Isn't this a bit too many interwiki bots?
>
> Maarten
Re: [Toolserver-l] interwiki.py
On 15 January 2012 00:32, Nickanc Wikipedia nickanc.w...@gmail.com wrote:
> In my opinion, the problem is not how many bots are running, but that interwiki.py seems to overuse memory. For example, my interwiki.py had reached 982 megabytes when it was killed. You can find a verbose log of its work at http://toolserver.org/~nickanc/interwiki.log .

No, that is not the problem. Multichill was referring to CPU usage, not memory usage. And although interwiki.py in general uses a large amount of memory, your specific case has a different origin (the use of the ReferringPageGenerator, which results in a memory *leak*).

You may be able to partially mitigate your problem by using interwiki_contents_on_disk = True, but this will not solve the actual memory leak: it will only release memory used by the page contents.

If you'd like to discuss details of your problem, please mail the pywikipedia mailing list pywikipedi...@lists.wikimedia.org or visit on IRC.

Merlijn
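The setting Merlijn mentions is a plain Python assignment. A minimal fragment, assuming it goes into the bot's user-config.py next to the existing settings (the file placement is an assumption; the option name is taken from the message above):

```python
# user-config.py (fragment)
# Keep fetched page contents on disk instead of in RAM. Per the message
# above, this only releases memory used by page contents; it does not
# fix the ReferringPageGenerator leak.
interwiki_contents_on_disk = True
```

As the message notes, this is a mitigation rather than a fix, so memory use may still grow over a long run.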
Re: [Toolserver-l] interwiki.py
There has been some similar discussion in the past, on how to deal with interwiki bots.

http://lists.wikimedia.org/pipermail/toolserver-l/2010-November/003660.html
http://lists.wikimedia.org/pipermail/toolserver-l/2010-December/003698.html
http://lists.wikimedia.org/pipermail/toolserver-l/2011-January/003847.html

- Chris