Re: [Foundation-l] Wikimedia and Environment
On Saturday 12 December 2009 17:41:44, jamesmikedup...@googlemail.com wrote:
> On Sat, Dec 12, 2009 at 5:32 PM, Teofilo teofilow...@gmail.com wrote:
> > Do we have an idea of the energy consumption related to online access
> > to a Wikipedia article? Some people say that a few minutes' search on
> > a search engine costs as much energy as boiling water for a cup of tea:
> > is that story true in the case of Wikipedia (4)?
>
> My 2 cents: this PHP is cooking more cups of tea than an optimized
> program written in C.

But think of all the coffee developers would have to cook while coding and optimizing in C!
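For scale, a back-of-the-envelope check of the "cup of tea" comparison (the per-request server figures below are assumptions for illustration, not measurements of Wikimedia's servers):

  # Rough sanity check of the "cup of tea" comparison; numbers are illustrative.
  cup_litres = 0.25                   # one cup of water
  delta_t_kelvin = 100 - 20           # heat tap water to boiling
  specific_heat = 4186                # J/(kg*K) for water
  tea_joules = cup_litres * specific_heat * delta_t_kelvin   # ~84 kJ per cup

  # Assumed, not measured: a server drawing ~200 W spending ~2 s of CPU per request.
  request_joules = 200 * 2
  print(tea_joules / request_joules)  # ~200 such requests per "cup of tea"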
Re: [Foundation-l] Wikimedia and Environment
On Sat, Dec 12, 2009 at 5:32 PM, Teofilo teofilow...@gmail.com wrote:
> How about moving the servers (5) from Florida to a cold country (Alaska,
> Canada, Finland, Russia) so that they can be used to heat offices or homes?
> It might not be unrealistic, as one may read such things as "the solution
> was to provide nearby homes with our waste heat" (6).

I don't think that's a practical solution. It's not because they need to be cooled that computers use so much energy - rather the opposite: they use much energy, and because energy cannot be created or destroyed, that energy has to go out some way - and that way is heat.

--
André Engels, andreeng...@gmail.com
Re: [Foundation-l] Wikimedia and Environment
On Sun, Dec 13, 2009 at 10:30 AM, Nikola Smolenski smole...@eunet.rs wrote:
> On Saturday 12 December 2009 17:41:44, jamesmikedup...@googlemail.com wrote:
> > On Sat, Dec 12, 2009 at 5:32 PM, Teofilo teofilow...@gmail.com wrote:
> > > Do we have an idea of the energy consumption related to online access
> > > to a Wikipedia article? Some people say that a few minutes' search on
> > > a search engine costs as much energy as boiling water for a cup of tea:
> > > is that story true in the case of Wikipedia (4)?
> >
> > My 2 cents: this PHP is cooking more cups of tea than an optimized
> > program written in C.
>
> But think of all the coffee developers would have to cook while coding
> and optimizing in C!

But that is a one-off expense. That is why we programmers can earn a living: we can work on many projects. Also, we drink coffee while playing UrbanTerror as well.

1. PHP is very hard to optimize.

2. MediaWiki has a pretty nonstandard syntax. The best parser I have seen is the Python implementation of the wikibook parser. But given that each plugin can change the syntax as it will, it will only get more complex.

3. Even Python is easier to optimize than PHP.

4. The other question is whether it makes sense to have such a centralized client-server architecture at all. We have been talking about using a distributed VCS for MediaWiki.

5. Even if MediaWiki were fully distributed, it would still cost CPU, but that cost would be distributed. Each edit that has to be copied will cause work to be done - in a distributed system, even more work in total.

6. I have also been wondering who benefits from all these millions spent on bandwidth - where does that money go, anyway? What about building a Wikipedia network and having the people who want to access it pay, instead of having us pay to give it away? With those millions you could buy a lot of routers and cables.

7. Now, back to optimization. Let's say you were able to optimize the program: you would identify the major CPU burners and optimize them out. That still does not solve the problem, because the PHP program is only a small part of the entire issue. The fact that the data flows in a wasteful way is the cause of the waste, not the program itself. Even if the program were much more efficient at moving around data that is not needed, that data is still not needed. In an optimal world this would eventually lead to updates not being distributed centrally at all. Not all changes have to be centralized: say there is one editor who pulls the changes from others and makes a public version - only they would need to have all the data for that one topic.

I think you could optimize Wikipedia along the lines of data travelling only to the people who need it (editors versus viewers). You would first optimize a way to route edits into special interest groups and create smaller virtual subnetworks of the editors' CPUs working together in a direct peer-to-peer network. So if you have 10 people collaborating on a topic, only the result of that work is checked into the central server; the decentralized communication happens between fewer parties and reduces the resources used (see the sketch below).

see also: http://strategy.wikimedia.org/wiki/Proposal:A_MediaWiki_Parser_in_C

mike
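A minimal sketch of the routing idea in point 7 above - a plain-Python toy, not a description of anything MediaWiki does today: intermediate edits circulate only inside a small topic group, and only the released revision reaches the central store.

  # Toy model of "edits stay in a small group, only the result goes central".
  central = {}   # article title -> released text; the only copy readers fetch

  class TopicGroup:
      """A handful of collaborators working on one article, peer to peer."""
      def __init__(self, title):
          self.title = title
          self.drafts = []                       # intermediate edits stay here

      def edit(self, author, text):
          self.drafts.append((author, text))     # never sent to the central server

      def release(self):
          # whatever conflict resolution the group uses happens locally;
          # only the agreed result is checked into the central store
          central[self.title] = self.drafts[-1][1]

  group = TopicGroup("Copenhagen Summit")
  group.edit("alice", "first draft")
  group.edit("bob", "second draft, merged with alice's")
  group.release()
  print(central)   # {'Copenhagen Summit': "second draft, merged with alice's"}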
Re: [Foundation-l] Wikimedia and Environment
Hi!!!

> 1. PHP is very hard to optimize.

No, PHP is much easier to optimize (read: performance-oriented refactoring).

> 3. Even Python is easier to optimize than PHP.

Python's main design idea is readability. What is readable is easier to refactor too, right? :)

> 4. The other question is whether it makes sense to have such a centralized
> client-server architecture at all. We have been talking about using a
> distributed VCS for MediaWiki.

Lunatics without any idea of the stuff being done inside the engine talk about distribution. Let them!

> 5. Even if MediaWiki were fully distributed, it would still cost CPU, but
> that cost would be distributed. Each edit that has to be copied will cause
> work to be done - in a distributed system, even more work in total.

Indeed, distribution raises costs.

> 6. I have also been wondering who benefits from all these millions spent on
> bandwidth - where does that money go, anyway? What about building a
> Wikipedia network and having the people who want to access it pay, instead
> of having us pay to give it away? With those millions you could buy a lot
> of routers and cables.

LOL. There's quite some competition in the network department, and it became an economy of scale (or of serving YouTube) long ago.

> 7. Now, back to optimization. Let's say you were able to optimize the
> program: you would identify the major CPU burners and optimize them out.
> That still does not solve the problem, because the PHP program is only a
> small part of the entire issue. The fact that the data flows in a wasteful
> way is the cause of the waste, not the program itself. Even if the program
> were much more efficient at moving around data that is not needed, that
> data is still not needed.

We can have a new kind of Wikipedia: one where we serve blank pages and people imagine the content in them. We've done that with moderate success quite often.

> So if you have 10 people collaborating on a topic, only the result of that
> work is checked into the central server; the decentralized communication
> happens between fewer parties and reduces the resources used.

Except that you still need a tracker to handle all that and resolve conflicts, and there are still no good methods of resolving conflicts among a small number of untrusted entities.

> see also: http://strategy.wikimedia.org/wiki/Proposal:A_MediaWiki_Parser_in_C

How much would that save?

Domas
Re: [Foundation-l] Wikimedia and Environment
Let me sum this up. The basic optimization is this: you don't need to transfer every revision of an article to all users at all times. The central server could just say: this is the last revision that has been released by the editors responsible for it; there are 100 edits in process and you can get involved by going to this page here (hosted on a server someplace else). There is no need to transfer those 100 edits to all the users on the web, and they are not interesting to everyone.

On Sun, Dec 13, 2009 at 12:10 PM, Domas Mituzas midom.li...@gmail.com wrote:
> > 4. The other question is whether it makes sense to have such a centralized
> > client-server architecture at all. We have been talking about using a
> > distributed VCS for MediaWiki.
>
> Lunatics without any idea of the stuff being done inside the engine talk
> about distribution. Let them!

I hope you are serious here. Let's take a look at what the engine does: it allows editing of text, it renders the text, and it serves the text. The wiki from Ward Cunningham is a Perl script of the most basic form. There is not much magic involved. Of course you need search tools, version histories and such, and there are places for optimizing all of those processes. It is not lunacy; it is a fact that such work can be done, and is done without a central server in many places. Just look, for example, at how people edit code in an open source software project using git. It is distributed, and it works. There are already wikis based on git available. There are other peer-to-peer networks, such as Tor or Freenet, that would be possible to use. You could split up the editing of Wikipedia articles into a network of git servers across the globe, and the rendering and distribution of the resulting data would be the job of the WMF.

Now, the issue of resolving conflicts is pretty simple in the case of git: everyone has a copy and can do what they want with it. If you like the version from someone else, you pull it. In terms of Wikipedia having only one viewpoint - the NPOV reflected by the current revision at any one point in time - that version would be the one pushed from its editors' repositories. It is imaginable that you would have one senior editor for each topic who has their own repository of pages and pulls in versions from many people.

> > 7. Now, back to optimization. Let's say you were able to optimize the
> > program: you would identify the major CPU burners and optimize them out.
> > That still does not solve the problem, because the PHP program is only a
> > small part of the entire issue. The fact that the data flows in a wasteful
> > way is the cause of the waste, not the program itself. Even if the program
> > were much more efficient at moving around data that is not needed, that
> > data is still not needed.
>
> We can have a new kind of Wikipedia: one where we serve blank pages and
> people imagine the content in them. We've done that with moderate success
> quite often.

Please, let's be serious here! I am talking about the fact that not all people need all the centralised services at all times.

> > So if you have 10 people collaborating on a topic, only the result of that
> > work is checked into the central server; the decentralized communication
> > happens between fewer parties and reduces the resources used.
>
> Except that you still need a tracker to handle all that and resolve
> conflicts, and there are still no good methods of resolving conflicts among
> a small number of untrusted entities.

A tracker to manage which server is used for which group of editors can be pretty efficient. Essentially it is a form of DNS: a tracker need only show you the current repositories that are registered for a certain topic (see the sketch below).

Resolving conflicts is important, but you only need so many people for that. The entire community does not get involved in all the conflicts; there are only a certain number of people deeply involved in any one section of Wikipedia at any given time. Imagine that you had, let's say, 1000 conference rooms available for discussion and working together, spread around the world, and the results of those rooms were fed back into Wikipedia. These rooms, or servers, would be for processing the edits and conflicts of any given set of pages. My idea is that you don't need a huge server to resolve conflicts. Many pages don't have many conflicts; there are certain areas which need constant arbitration, of course. You could even split the groups up into different viewpoints, where the arbitration team only deals with the output of two teams (pro and contra). Even if you look at the number of editors on a highly contested page, they are not unlimited. In retrospect you would be able to identify which groups of editors are collaborating (enhancing each other) and which are conflicting (overwriting each other). If you split them into different rooms when they should be collaborating and reduce the conflicts, then you will win a lot. Even in Germany,
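A toy sketch of the tracker-as-DNS idea above; the repository URLs are made up, and this ignores the hard parts (trust, conflicts, replication) that Domas raises elsewhere in the thread.

  # The tracker only answers "which repositories currently host this topic?";
  # editors then talk to those repositories directly, peer to peer.
  tracker = {}   # topic -> list of repository endpoints (hypothetical URLs)

  def register(topic, repo_url):
      tracker.setdefault(topic, []).append(repo_url)

  def lookup(topic):
      return tracker.get(topic, [])

  register("Climate change", "git://editors-eu.example.org/climate.git")
  register("Climate change", "git://editors-us.example.org/climate.git")
  print(lookup("Climate change"))   # both group servers registered for that topic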
Re: [Foundation-l] Wikimedia and Environment
2009/12/12, Geoffrey Plourde geo.p...@yahoo.com:
> With regards to Florida, if the servers are in an office building, one way
> to decrease costs might be to reconfigure the environmental systems to use
> the energy from the servers to heat/cool the building. Wikimedia would then
> be able to recoup part of the utility bills from surrounding tenants.

I am not sure the laws of thermodynamics (1) would allow using that heat to cool a building. You would need a cold source like a river to convert heat back into electricity. But it might be more cost-efficient to have the water from the river circulate directly into the building, so that your extra heat still remains unused. This is why I think it is more difficult to find solutions in a hot country like Florida than in a cold country (as long as you don't question the very existence of heated homes in cold countries, leaving aside the possibility of moving people and their homes from cold to warm countries).

(1) http://en.wikipedia.org/wiki/Laws_of_thermodynamics#Second_law
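The limit being pointed at here is the Carnot efficiency: with illustrative temperatures - server exhaust around 310 K and river water around 295 K, assumed numbers rather than measurements of any Wikimedia facility - only a few percent of the waste heat could ever be converted back into work:

  \eta_{\max} = 1 - \frac{T_c}{T_h} \approx 1 - \frac{295\,\mathrm{K}}{310\,\mathrm{K}} \approx 5\%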
Re: [Foundation-l] Wikimedia and Environment
2009/12/13, Andre Engels andreeng...@gmail.com:
> I don't think that's a practical solution. It's not because they need to be
> cooled that computers use so much energy - rather the opposite: they use
> much energy, and because energy cannot be created or destroyed, that energy
> has to go out some way - and that way is heat.

In cold countries, energy can have two lives: a first life making calculations in a computer, or transforming matter (ore into metal, trees into books), and a second life heating homes.

But the best is to use no energy at all: see the OLPC project in Afghanistan (a computer with pedals, like the sewing machines of our great-great-great-grandmothers) (1).

(1) http://www.olpcnews.com/countries/afghanistan/updates_from_olpc_afghanistan_1.html
Re: [Foundation-l] Wikimedia and Environment
2009/12/13 Teofilo teofilow...@gmail.com:
> But the best is to use no energy at all: see the OLPC project in Afghanistan
> (a computer with pedals, like the sewing machines of our
> great-great-great-grandmothers) (1).
>
> (1) http://www.olpcnews.com/countries/afghanistan/updates_from_olpc_afghanistan_1.html

That's the answer! Distributed serving by each volunteer's pedal power!

- d.
Re: [Foundation-l] Wikimedia and Environment
Dude, I need that strong stuff you're having.

> Let me sum this up. The basic optimization is this: you don't need to
> transfer every revision of an article to all users at all times.

There's not much difference between transferring every revision and transferring just some 'good' revisions.

> The central server could just say: this is the last revision that has been
> released by the editors responsible for it; there are 100 edits in process
> and you can get involved by going to this page here (hosted on a server
> someplace else).

Editing is a minuscule part of our workload.

> There is no need to transfer those 100 edits to all the users on the web,
> and they are not interesting to everyone.

Well, we may not transfer them in the case of flagged revisions; we do transfer them in the case of a pure wiki. Point is, someone has to transfer.

> Let's take a look at what the engine does: it allows editing of text,

That includes conflict resolution, cross-indexing, history tracking, abuse filtering, full text indexing, etc.

> it renders the text,

That means building the output out of many individual assets (templates, anyone?), embedding media, transforming based on user options, etc.

> and it serves the text.

And not only text - it serves complex aggregate views like 'last related changes', 'watchlist', 'contributions by new users', etc.

> The wiki from Ward Cunningham is a Perl script of the most basic form.

That is probably one of the reasons why we're not using the wiki from Ward Cunningham anymore, and have something else, called MediaWiki.

> There is not much magic involved.

Not of much use at a multi-million-article wiki with hundreds of millions of revisions.

> Of course you need search tools, version histories and such, and there are
> places for optimizing all of those processes.

And we've done that with MediaWiki ;-)

> It is not lunacy; it is a fact that such work can be done, and is done
> without a central server in many places.

Name me a single website with a distributed-over-the-internet backend.

> Just look, for example, at how people edit code in an open source software
> project using git. It is distributed, and it works.

Git is limited and expensive for way too many of our operations. Also, you have to have a whole copy of the repository; Git doesn't have on-demand remote pulls, nor any caching layer attached to it. I appreciate your wish to clone Wikipedia. It works if you want expensive accesses, of course. We're talking about serving a website here, not the case which is very nicely depicted at: http://xkcd.com/303/

> There are already wikis based on git available.

Has anyone tried putting Wikipedia content on them and simulating our workload? :) I understand that Git's semantics are usable for Wikipedia's basic revision storage, but its data would still have to be replicated to other types of storage that allow various cross-indexing and cross-reporting. How well does Git handle parallelism internally? How can it be parallelized over multiple machines? etc. ;-) It lacks engineering. Basic stuff is nice, but it isn't what we need.

> There are other peer-to-peer networks, such as Tor or Freenet, that would be
> possible to use.

How? These are just transports.

> You could split up the editing of Wikipedia articles into a network of git
> servers across the globe, and the rendering and distribution of the
> resulting data would be the job of the WMF.

And how would that save any money? By adding much more complexity to most of our processes, while leaving the major cost item untouched?

> Now, the issue of resolving conflicts is pretty simple in the case of git:
> everyone has a copy and can do what they want with it. If you like the
> version from someone else, you pull it.

Whose revision does Wikimedia merge?

> In terms of Wikipedia having only one viewpoint - the NPOV reflected by the
> current revision at any one point in time - that version would be the one
> pushed from its editors' repositories. It is imaginable that you would have
> one senior editor for each topic who has their own repository of pages and
> pulls in versions from many people.

Go to Citizendium, k, thx.

> Please, let's be serious here! I am talking about the fact that not all
> people need all the centralised services at all times.

You have an absolute misunderstanding of what our technology platform is doing. You're wasting your time, you're wasting my time, you're wasting the time of everyone who has to read your or my emails.

> A tracker to manage which server is used for which group of editors can be
> pretty efficient. Essentially it is a form of DNS: a tracker need only show
> you the current repositories that are registered for a certain topic.

Seriously, I need that stuff you're on. Have you ever been involved in building anything remotely similar?

> The entire community does not get involved in all the conflicts; there are
> only a certain number of people deeply involved in any one section of
> Wikipedia at any given
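To put the "complex aggregate views" point above in concrete terms, a schematic sketch (the schema and data are invented for illustration and are not MediaWiki's actual schema): a watchlist is a cross-report over revisions and per-user subscriptions, which is exactly the kind of query a bare revision store does not index for.

  import sqlite3

  # Invented, simplified schema: a cross-indexed revision store is what makes
  # aggregate views such as a watchlist cheap to serve.
  db = sqlite3.connect(":memory:")
  db.executescript("""
  CREATE TABLE revision (page TEXT, author TEXT, ts INTEGER);
  CREATE TABLE watchlist (user TEXT, page TEXT);
  CREATE INDEX rev_page_ts ON revision (page, ts);
  """)
  db.executemany("INSERT INTO revision VALUES (?,?,?)", [
      ("Copenhagen", "alice", 100), ("Copenhagen", "bob", 200),
      ("Tea", "carol", 150),
  ])
  db.executemany("INSERT INTO watchlist VALUES (?,?)", [
      ("dave", "Copenhagen"), ("dave", "Tea"),
  ])

  # Latest change to every page that 'dave' watches.
  rows = db.execute("""
      SELECT w.page, MAX(r.ts), r.author
      FROM watchlist w JOIN revision r ON r.page = w.page
      WHERE w.user = ?
      GROUP BY w.page
  """, ("dave",)).fetchall()
  print(rows)   # e.g. [('Copenhagen', 200, 'bob'), ('Tea', 150, 'carol')]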
Re: [Foundation-l] Wikimedia and Environment
Hi!

> In cold countries, energy can have two lives: a first life making
> calculations in a computer, or transforming matter (ore into metal, trees
> into books), and a second life heating homes.

One needs to build out quite static-energy-output datacenters (e.g. deploy 10 MW at once, and don't grow) for that. Not our business.

> But the best is to use no energy at all: see the OLPC project in Afghanistan
> (a computer with pedals, like the sewing machines of our
> great-great-great-grandmothers) (1).

Do you realize that in terms of carbon footprint that is much, much less efficient? Look at the title of the thread.

Domas
Re: [Foundation-l] Wikimedia and Environment
On Sun, Dec 13, 2009 at 1:22 PM, David Gerard dger...@gmail.com wrote:
> 2009/12/13 Teofilo teofilow...@gmail.com:
> > But the best is to use no energy at all: see the OLPC project in
> > Afghanistan (a computer with pedals, like the sewing machines of our
> > great-great-great-grandmothers) (1).
> >
> > (1) http://www.olpcnews.com/countries/afghanistan/updates_from_olpc_afghanistan_1.html
>
> That's the answer! Distributed serving by each volunteer's pedal power!

And you automatically become an admin after 5MW!

Magnus
Re: [Foundation-l] Wikimedia and Environment
Teofilo wrote:
> You have probably heard about CO2 and the conference being held these days
> in Copenhagen (1). You have probably heard about the goal of carbon
> neutrality at the Wikimania conference in Gdansk in July 2010 (2). You may
> want to discuss the basic and perhaps naive wishes I have written down on
> the strategy wiki about paper consumption (3).

Paper production has a net negative impact on atmospheric CO2 concentration if the wood comes from a sustainably managed forest or plantation. As long as people keep their PediaPress books for a long time, or dispose of them in a way that does not produce methane, then I don't see a problem.

> Do we have an idea of the energy consumption related to online access to a
> Wikipedia article? Some people say that a few minutes' search on a search
> engine costs as much energy as boiling water for a cup of tea: is that
> story true in the case of Wikipedia (4)?

No, it is not true, which makes what I'm about to suggest somewhat more affordable.

Given the lack of political will to make deep cuts to greenhouse gas emissions, and the pitiful excuses politicians make for inaction; given the present nature of the debate, where special interests fund campaigns aimed at stalling any progress by appealing to the ignorance of the public; given the nature of the Foundation, an organisation which raises its funds and conducts most of its activities in the richest and most polluting country in the world: I think there is an argument for voluntary reduction of emissions by the Foundation.

I don't mean by buying tree-planting or efficiency offsets, of which I am deeply skeptical. I think the best way for Wikimedia to take action on climate change would be by buying renewable energy certificates (RECs). Buying RECs from new wind and solar electricity generators is a robust way to reduce CO2 emissions, with minimal danger of double-counting, forward-selling, outright fraud, etc., problems which plague the offset industry.

If Domas's figure of 100 kW is correct, then buying a matching number of RECs would be a small portion of our hosting budget. If funding is nevertheless a problem, then we could have a restricted donation drive, and thereby get a clear mandate from our reader community. Our colocation facilities would not need to do anything, such as changing their electricity provider. We would, however, need monitoring of our total electricity usage, so that we would know how many RECs to buy.

I'm not appealing to the PR benefits here, or to the way this action would promote the climate change cause in general. I'm just saying that, as an organisation composed of rational, moral people, Wikimedia has as much responsibility to act as does any other organisation or individual. Ultimately, the US will need to reduce its per-capita emissions by around 90% by 2050 to have any hope of avoiding catastrophe (see e.g. [1]). Nature doesn't have exemptions or loopholes; we can't continue emitting by moving economic activity from corporations to charities.

[1] http://www.garnautreview.org.au/chp9.htm#tab9_3, and see chapter 4.3 for the impacts of the 550 case.

-- Tim Starling
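As a back-of-the-envelope check (Domas's 100 kW figure is taken from this thread; the REC price below is an assumption for illustration only, not a quote from any supplier):

  power_kw = 100                       # Domas's figure, from this thread
  hours_per_year = 8760
  mwh_per_year = power_kw * hours_per_year / 1000    # 876 MWh of electricity per year

  assumed_price_usd_per_mwh = 5        # hypothetical REC price, for illustration only
  print(mwh_per_year, mwh_per_year * assumed_price_usd_per_mwh)
  # -> 876.0 MWh/year, i.e. roughly USD 4,400/year at the assumed price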