Re: [Foundation-l] Wikimedia and Environment

2009-12-13 Thread Nikola Smolenski
On Saturday 12 December 2009 17:41:44 jamesmikedup...@googlemail.com wrote:
 On Sat, Dec 12, 2009 at 5:32 PM, Teofilo teofilow...@gmail.com wrote:
  Do we have an idea of the energy consumption related to the online
  access to a Wikipedia article ? Some people say that a few minutes
  long search on a search engine costs as much energy as boiling water
  for a cup of tea : is that story true in the case of Wikipedia (4) ?

 my 2 cents : this php is cooking more cups of tea than an optimized
 program written in c.

But think of all the coffee developers would have to cook while coding and 
optimizing in C!


Re: [Foundation-l] Wikimedia and Environment

2009-12-13 Thread Andre Engels
On Sat, Dec 12, 2009 at 5:32 PM, Teofilo teofilow...@gmail.com wrote:

 How about moving the servers (5) from Florida to a cold country
 (Alaska, Canada, Finland, Russia) so that they can be used to heat
 offices or homes ? It might not be unrealistic as one may read such
 things as the solution was to provide nearby homes with our waste
 heat (6).

I don't think that's a practical solution. It's not because they need
to be cooled that computers cost so much energy - rather the opposite:
they use a lot of energy, and because energy cannot be created or
destroyed, that energy has to go out some way - and that way is heat.
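
To put a rough number on it (both figures below are assumptions: the
100 kW total server draw that comes up later in this thread, and an
average heating demand per home):

    # Essentially all electricity drawn by the servers leaves them again as heat.
    server_power_kw = 100.0       # assumed total draw of the server cluster
    avg_home_heating_kw = 1.5     # assumed average heating demand of one home
                                  # in a cold climate (~13 MWh per year)
    heat_output_kw = server_power_kw                 # conservation of energy
    homes_heated = heat_output_kw / avg_home_heating_kw
    print(f"{heat_output_kw:.0f} kW of waste heat ~ {homes_heated:.0f} homes' worth of heating")

The heat is there whether or not anybody uses it - reusing it is a
separate question from how much energy the servers draw.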

-- 
André Engels, andreeng...@gmail.com


Re: [Foundation-l] Wikimedia and Environment

2009-12-13 Thread jamesmikedup...@googlemail.com
On Sun, Dec 13, 2009 at 10:30 AM, Nikola Smolenski smole...@eunet.rs wrote:
 On Saturday 12 December 2009 17:41:44 jamesmikedup...@googlemail.com wrote:
 On Sat, Dec 12, 2009 at 5:32 PM, Teofilo teofilow...@gmail.com wrote:
  Do we have an idea of the energy consumption related to the online
  access to a Wikipedia article ? Some people say that a few minutes
  long search on a search engine costs as much energy as boiling water
  for a cup of tea : is that story true in the case of Wikipedia (4) ?

 my 2 cents : this php is cooking more cups of tea than an optimized
 program written in c.

 But think of all the coffee developers would have to cook while coding and
 optimizing in C!

But that is a one-off expense. That is why we programmers can earn a
living: we can work on many projects. We also drink coffee while
playing UrbanTerror.

1. PHP is very hard to optimize.
2. MediaWiki has a pretty nonstandard wikitext syntax. The best parser
that I have seen is the Python implementation of the wikibook parser.
But given that each plugin can change the syntax as it pleases, it
will only get more complex.
3. Even Python is easier to optimize than PHP.
4. The other question is: does it make sense to have such a
centralized client-server architecture? We have been talking about
using a distributed VCS for MediaWiki.
5. Even if MediaWiki were fully distributed, it would still cost CPU;
the cost would just be spread out. Each edit that has to be copied
causes work to be done - in a distributed system, even more work in
total.
6. I have also been wondering who the beneficiary of all these
millions spent on bandwidth is - where does that money go, anyway?
What about building a Wikipedia network and having the people who want
to access it pay, instead of having us pay to give it away? With those
millions you can buy a lot of routers and cables.
7. Now, back to the optimization. Let's say you were able to optimize
the program: we would identify the major CPU burners and optimize them
away. That still does not solve the problem, because I would think the
PHP program is only a small part of the entire issue. The fact that
the data flows in a wasteful way is the cause of the waste, not the
program itself. Even if the program were much more efficient at moving
around data that is not needed, that data is still not needed.

In an optimal world, this would eventually lead to updates not being
distributed at all. Not all changes have to be centralized. Let's say
there is one editor who pulls the changes from others and makes a
public version. That would mean that only they would need to have all
the data for that one topic. I think you could optimize Wikipedia
along the lines of data travelling only to the people who need it
(editors versus viewers): you would first build a way to route edits
to special interest groups, creating smaller virtual subnetworks of
the editors' CPUs working together in a direct peer-to-peer network.

So if you have 10 people collaborating on a topic, only the results of
that work would be checked into the central server. The decentralized
communication would be between fewer parties and would reduce the
resources used.
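
A toy calculation of the kind of saving I mean (every number below is
invented; it is only meant to show the shape of the argument):

    # Toy model of the edit traffic for one article (all numbers invented).
    working_edits = 100     # intermediate edits made while the group works
    group_size = 10         # editors collaborating on the topic
    rev_kb = 50             # assumed size of one transferred revision

    # Today: every intermediate edit is stored on and served by the central server.
    central_now_kb = working_edits * rev_kb

    # Group model: intermediate edits stay on the group's own server(s);
    # only the one merged result is pushed to the central server.
    central_group_kb = 1 * rev_kb
    inside_group_kb = working_edits * (group_size - 1) * rev_kb

    print("central edit traffic today:       ", central_now_kb, "kB")
    print("central edit traffic, group model:", central_group_kb, "kB")
    print("traffic kept inside the group:    ", inside_group_kb, "kB")

As point 5 above says, the total can go up; the point is that the load
on the central servers goes down.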

See also:
http://strategy.wikimedia.org/wiki/Proposal:A_MediaWiki_Parser_in_C


mike


Re: [Foundation-l] Wikimedia and Environment

2009-12-13 Thread Domas Mituzas
Hi!!!

 1. Php is very hard to optimize.

No, PHP is much easier to optimize (read: performance-oriented refactoring).

 3. Even python is easier to optimize than php.

Python's main design idea is readability. What is readable is easier to
refactor too, right? :)

 4. The other questions are, does it make sense to have such a
 centralized client server architecture? We have been talking about
 using a distributed vcs for mediawiki.

Lunatics without any idea of stuff being done inside the engine talk about 
distribution. Let them!

 5. Well, now even if the mediawiki is fully distributed, it will cost
 CPU, but that will be distributed. Each edit that has to be copied
 will cause work to be done. In a distributed system even more work in
 total.

Indeed, distribution raises costs. 

 6. Now, I have been wondering anyway who is the benefactor of all
 these millions spend on bandwidth, where do they go to anyway?  What
 about making a wikipedia network and have the people who want to
 access it pay instead of having us pay to give it away? With these
 millions you can buy a lot of routers and cables.

LOL. There's quite some competition in the network department, and it
became an economy of scale (or of serving YouTube) long ago.

 7. Now, back to the optimization. Lets say you were able to optimize
 the program. We would identify the major cpu burners and optimize them
 out. That does not solve the problem. Because I would think that the
 php program is only a small part of the entire issue. The fact that
 the data is flowing in a certain wasteful way is the cause of the
 waste, not the program itself. Even if it would be much more efficient
 and moving around data that is not needed, the data is not needed.

We can have a new kind of Wikipedia: one where we serve blank pages,
and people imagine the content. We've done that with moderate success
quite often.

 So if you have 10 people collaborating on a topic, only the results of
 that work will be checked into the central server. the decentralized
 communication would be between fewer parties and reduce the resources
 used.

Except that you still need a tracker to handle all that and resolve
conflicts, and there are still no good methods of resolving conflicts
among a small number of untrusted entities.

 see also :
 http://strategy.wikimedia.org/wiki/Proposal:A_MediaWiki_Parser_in_C

How much would that save? 

Domas

Re: [Foundation-l] Wikimedia and Environment

2009-12-13 Thread jamesmikedup...@googlemail.com
Let me sum this up. The basic optimization is this:
You don't need to transfer every revision of a new article to all
users at all times.
The central server could just say: this is the last revision that has
been released by the editors responsible for it; there are 100 edits
in process, and you can get involved by going to this page here
(hosted on a server someplace else). There is no need to transfer
those 100 edits to all the users on the web, and they are not
interesting to everyone.
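
Purely as an illustration, the central server's answer could look
something like this (this is not an existing MediaWiki API; every
field name and URL below is made up):

    # Hypothetical answer a central server could give for one article,
    # instead of shipping all in-progress revisions to everybody.
    article_status = {
        "title": "Example article",
        "released_revision": 123456,   # last revision signed off by its editors
        "released_url": "https://central.example.org/wiki/Example_article",
        "pending_edits": 100,          # work in progress, not mirrored centrally
        "editing_host": "https://topic-server.example.org/Example_article",
    }

    def page_for(wants_to_edit: bool) -> str:
        # Readers get the released copy; would-be editors are pointed elsewhere.
        if wants_to_edit:
            return article_status["editing_host"]
        return article_status["released_url"]

    print(page_for(False))
    print(page_for(True))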


On Sun, Dec 13, 2009 at 12:10 PM, Domas Mituzas midom.li...@gmail.com wrote:
 4. The other questions are, does it make sense to have such a
 centralized client server architecture? We have been talking about
 using a distributed vcs for mediawiki.

 Lunatics without any idea of stuff being done inside the engine talk about 
 distribution. Let them!

I hope you are serious here.
Let's take a look at what the engine does: it allows editing of text.
It renders the text. It serves the text. The wiki from Ward Cunningham
is a Perl script of the most basic form. There is not much magic
involved. Of course you need search tools, version histories and such.
There are places for optimizing all of those processes.

It is not lunacy; it is a fact that such work can be done, and is done
without a central server in many places.

Just look, for example, at how people edit code in an open source
software project using Git. It is distributed, and it works.

There are already wikis based on Git available.
There are other peer-to-peer networks, such as Tor or Freenet, that
would be possible to use.

You could split up the editing of Wikipedia articles across a network
of Git servers around the globe, while the rendering and distribution
of the resulting data would remain the job of the WMF.

Now, the issue of resolving conflicts is pretty simple in the case of
Git: everyone has a copy and can do what they want with it. If you
like the version from someone else, you pull it.

In terms of Wikipedia having only one viewpoint - the NPOV that is
reflected by the current revision at any one point in time - that
version would be the one pushed from its editors' repositories. It is
imaginable that you would have one senior editor for each topic, with
their own repository of pages, who pulls in versions from many
people.
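
Here is a very small sketch of that pull model in plain Python,
standing in for Git - none of this is real Git or MediaWiki code, and
all the names are invented:

    # Each editor keeps their own full copy of a page; nobody writes into
    # anyone else's copy. Publishing means the senior editor pulling the
    # version they trust and the central server pulling only that result.
    class Repo:
        def __init__(self, owner):
            self.owner = owner
            self.pages = {}                      # title -> text

        def commit(self, title, text):
            self.pages[title] = text

        def pull(self, other, title):
            # Take someone else's version because you like it.
            self.pages[title] = other.pages[title]

    alice, bob = Repo("alice"), Repo("bob")
    senior, central = Repo("senior"), Repo("central")

    alice.commit("Energy", "Draft with an intro section.")
    bob.commit("Energy", "Draft with an intro section, plus consumption figures.")

    senior.pull(bob, "Energy")        # the senior editor prefers bob's version
    central.pull(senior, "Energy")    # only the chosen result reaches the centre

    print(central.pages["Energy"])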

 7. Now, back to the optimization. Lets say you were able to optimize
 the program. We would identify the major cpu burners and optimize them
 out. That does not solve the problem. Because I would think that the
 php program is only a small part of the entire issue. The fact that
 the data is flowing in a certain wasteful way is the cause of the
 waste, not the program itself. Even if it would be much more efficient
 and moving around data that is not needed, the data is not needed.

 We can have new kind of Wikipedia. The one where we serve blank pages, and 
 people imagine content in it. We\ve done that with moderate success quite 
 often.

Please, let's be serious here!
I am talking about the fact that not all people need all the
centralised services at all times.


 So if you have 10 people collaborating on a topic, only the results of
 that work will be checked into the central server. the decentralized
 communication would be between fewer parties and reduce the resources
 used.

 Except that you still need tracker to handle all that, and resolve conflicts, 
 as still, there're  no good methods of resolving conflicts with small number 
 of untrusted entities.

A tracker to manage which server is used by which group of editors
can be pretty efficient. Essentially it is a form of DNS: the tracker
need only show you the current repositories that are registered for a
certain topic.
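
As a sketch, such a tracker could be little more than a lookup table
(the topics and hostnames below are invented):

    # A tracker in its most minimal form: topic -> repositories currently
    # registered for it. Comparable to DNS resolving a name to addresses.
    TRACKER = {
        "climate":  ["git://topic-a.example.org/climate.git",
                     "git://topic-b.example.org/climate.git"],
        "medicine": ["git://topic-c.example.org/medicine.git"],
    }

    def repositories_for(topic):
        return TRACKER.get(topic, [])

    print(repositories_for("climate"))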

Resolving conflicts is important, but you only need so many people for that.

The entire community does not get involved in all the conflicts. There
are only a certain number of people who are deeply involved in any
one section of Wikipedia at any given time.

Imagine that you had, let's say, 1000 conference rooms for discussion
and collaboration spread around the world, and the results from those
rooms were fed back into Wikipedia. These rooms, or servers, would
process the edits and conflicts for any given set of pages.

My idea is that you don't need a huge server to resolve conflicts.
Many pages don't have many conflicts; certain areas need constant
arbitration, of course. You could even split the groups up by
viewpoint, so that the arbitration team only deals with the output of
two teams (pro and contra).

Even if you look at a highly contested page, the number of editors is
not unlimited.

In retrospect you would be able to identify which groups of editors
are collaborating (enhancing each other's work) and which are
conflicting (overwriting each other). If you split them into different
rooms where they can collaborate, and reduce the conflicts, then you
will win a lot.
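
Roughly how such a retrospective split could be computed - a toy
heuristic over a toy history, not an analysis of real dumps:

    # Toy heuristic: an edit marked as a revert counts as a conflict between
    # the reverting editor and the previous one; any other consecutive pair
    # of editors is counted as building on each other.
    from collections import Counter

    history = [                      # (editor, reverts_previous_edit)
        ("anna", False), ("ben", False), ("carl", True),
        ("anna", False), ("carl", True), ("ben", False),
    ]

    collaborating, conflicting = Counter(), Counter()
    for (prev_editor, _), (editor, is_revert) in zip(history, history[1:]):
        pair = tuple(sorted((prev_editor, editor)))
        (conflicting if is_revert else collaborating)[pair] += 1

    print("collaborating pairs:", collaborating.most_common())
    print("conflicting pairs:  ", conflicting.most_common())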

Even in Germany, 

Re: [Foundation-l] Wikimedia and Environment

2009-12-13 Thread Teofilo
2009/12/12, Geoffrey Plourde geo.p...@yahoo.com:
 With regards to Florida, if the servers are in an office building, one way to 
 decrease costs might be to reconfigure the environmental systems to use the 
 energy from the servers to heat/cool the building. Wikimedia would then be 
 able to recoup part of the utility bills from surrounding tenants.

I am not sure the laws of thermodynamics (1) would allow that heat to
be used to cool a building. You would need a cold source, like a
river, to convert heat back into electricity. But it might be more
cost-efficient to have the water from the river circulate directly
through the building, in which case your extra heat would still remain
unused.
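
A rough Carnot-limit calculation illustrates the problem (the
temperatures are assumptions: server exhaust air around 40 °C, river
water around 10 °C):

    # Upper bound (Carnot) on converting low-grade server heat back into
    # electricity, using an assumed warm and an assumed cold reservoir.
    T_hot = 40 + 273.15       # K, assumed server exhaust temperature
    T_cold = 10 + 273.15      # K, assumed river temperature
    carnot_efficiency = 1 - T_cold / T_hot
    print(f"Carnot limit: {carnot_efficiency:.1%}")   # roughly 10 %; real machines get far less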

This is why I think it is more difficult to find solutions in a hot
place like Florida than in a cold country (as long as you don't
question the very existence of heated homes in cold countries, and
leaving aside the possibility of moving people and their homes from
cold to warm countries).

(1) http://en.wikipedia.org/wiki/Laws_of_thermodynamics#Second_law


Re: [Foundation-l] Wikimedia and Environment

2009-12-13 Thread Teofilo
2009/12/13, Andre Engels andreeng...@gmail.com:
 I don't think that's a practical solution. It's not because they need
 to be cooled that computers cost so much energy - rather the opposite:
 they use much energy, and because energy cannot be created or
 destroyed, this energy has to go out some way - and that way is heat.

In cold countries, energy can have two lives: a first life making
calculations in a computer or transforming matter (ore into metal,
trees into books), and a second life heating homes.

But the best option is to use no energy at all: see the OLPC project
in Afghanistan (a computer with pedals, like the sewing machines of
our great-great-great-grandmothers) (1).

(1) 
http://www.olpcnews.com/countries/afghanistan/updates_from_olpc_afghanistan_1.html


Re: [Foundation-l] Wikimedia and Environment

2009-12-13 Thread David Gerard
2009/12/13 Teofilo teofilow...@gmail.com:

 But the best is to use no energy at all : see the OLPC project in
 Afghanistan (A computer with pedals, like the sewing machines of our
 great-great-great-grand-mothers) (1)
 (1) 
 http://www.olpcnews.com/countries/afghanistan/updates_from_olpc_afghanistan_1.html


That's the answer! Distributed serving by each volunteer's pedal power!


- d.


Re: [Foundation-l] Wikimedia and Environment

2009-12-13 Thread Domas Mituzas
Dude, I need that strong stuff you're having. 

 Let me sum this up, The basic optimization is this :
 You don't need to transfer that new article in every revision to all
 users at all times.

There's not much difference between transferring every revision and just some 
'good' revisions. 

 The central server could just say  : this is the last revision that
 has been released by the editors responsible for it, there are 100
 edits in process and you can get involved by going to this page here
 (hosted on a server someplace else).

Editing is a minuscule part of our workload.

 There is no need to transfer
 those 100 edits to all the users on the web and they are not
 interesting to everyone.

Well, we may not transfer them in the case of flagged revisions, and
we can transfer them in the case of a pure wiki. The point is, someone
has to transfer them.

 Lets take a look at what the engine does, it allows editing of text.

That includes conflict resolution, cross-indexing, history tracking, abuse 
filtering, full text indexing, etc. 

 It renders the text.

It means building the output out of many individual assets (templates,
anyone?), embedding media, transforming based on user options, etc.
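
Even a toy expander - nothing like the real renderer, and with
invented template names - already has to recurse through nested
templates:

    import re

    # Toy wikitext template expansion: nowhere near what MediaWiki actually
    # does, but it shows how output is assembled from many separate assets.
    TEMPLATES = {                      # invented example templates
        "servers": "300 machines in {{location}}",
        "location": "Florida",
    }

    def expand(text, depth=0):
        if depth > 10:                 # guard against template loops
            return text
        def repl(match):
            name = match.group(1).strip()
            return expand(TEMPLATES.get(name, match.group(0)), depth + 1)
        return re.sub(r"\{\{([^{}]+)\}\}", repl, text)

    print(expand("Wikipedia runs on {{servers}}."))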

 It serves the text.

And not only text - it serves complex aggregate views like 'last related 
changes', 'watchlist', 'contributions by new users', etc. 

 The wiki from ward cunningham
 is a perl script of the most basic form.

That is probably one of the reasons why we're not using the wiki from
Ward Cunningham anymore and have something else, called MediaWiki.

 There is not much magic
 involved.

Not much use at a multi-million-article wiki with hundreds of millions
of revisions.

 Of course you need search tools, version histories and such.
 There are places for optimizing all of those processes.

And we've done that with MediaWiki ;-) 

 It is not lunacy, it is a fact that such work can be done, and is done
 without a central server in many places.

Name me a single website with a backend distributed over the Internet.

 Just look at for example how people edit code in an open source
 software project using git. It is distributed, and it works.

Git is limited and expensive for way too many of our operations. Also,
you have to have a whole copy of the Git repository; it doesn't have
on-demand remote pulls or any caching layer attached to that.
I appreciate your desire to clone Wikipedia.

It works if you want expensive accesses, of course. We're talking about serving 
a website here, not a case which is very nicely depicted at: 
http://xkcd.com/303/

 There are already wikis based on git available.

Has anyone tried putting Wikipedia content on them and simulating our
workload? :)
I understand that Git's semantics are usable for Wikipedia's basic
revision storage, but its data would still have to be replicated to
other types of storage that allow various kinds of cross-indexing and
cross-reporting.

How well does Git handle parallelism internally? How can it be parallelized 
over multiple machines? etc ;-) It lacks engineering. Basic stuff is nice, but 
it isn't what we need. 

 There are other peer to peer networks such as TOR or freenet that
 would be possible to use.

How? These are just transports. 

 If you were to split up the editing of wikipedia articles into a
 network of git servers across the globe and the rendering and
 distribution of the resulting data would be the job of the WMF.

And how would that save any money? By adding much more complexity to
most of the processes, while leaving the major cost item untouched?

 Now the issue of resolving conflicts is pretty simple in the issue of
 git, everyone has a copy and can do what they want with it. If you
 like the version from someone else, you pull it.

Whose revision does Wikimedia merge?

 In terms of wikipedia as having only one viewpoint, the NPOV that is
 reflected by the current revision at any one point in time, that
 version would be one pushed from its editors repositories. It is
 imaginable that you would have one senior editor for each topic who
 has their own repository of of pages who pull in versions from many
 people.

Go to Citizendium, k, thx. 

 Please lets be serious here!
 I am talking about the fact that not all people need all the
 centralised services at all times.

You have an absolute misunderstanding of what our technology platform
is doing. You're wasting your time, you're wasting my time, and you're
wasting the time of everyone who has to read your or my emails.

 A tracker to manage what server is used for what group of editors can
 be pretty efficient. Essentially it is a form of DNS. A tracker need
 only show you the current repositories that are registered for a
 certain topic.

Seriously, I need that stuff you're on. Have you ever been involved in
building anything remotely similar?

 The entire community does not get involved in all the conflicts. There
 are only a certain number of people that are deeply involved in any
 one section of the wikipedia at any given 

Re: [Foundation-l] Wikimedia and Environment

2009-12-13 Thread Domas Mituzas
Hi!

 In cold countries, energy can have two lives : a first life making
 calculations in a computer, or transforming matter (ore into metal,
 trees into books), and a second life heating homes.

One needs to build out fairly static-energy-output datacenters (e.g.
deploy 10 MW at once and don't grow) for that. Not our business.

 But the best is to use no energy at all : see the OLPC project in
 Afghanistan (A computer with pedals, like the sewing machines of our
 great-great-great-grand-mothers) (1)

Do you realize that in terms of carbon footprint that is much much less 
efficient? Look at the title of the thread. 

Domas

Re: [Foundation-l] Wikimedia and Environment

2009-12-13 Thread Magnus Manske
On Sun, Dec 13, 2009 at 1:22 PM, David Gerard dger...@gmail.com wrote:
 2009/12/13 Teofilo teofilow...@gmail.com:

 But the best is to use no energy at all : see the OLPC project in
 Afghanistan (A computer with pedals, like the sewing machines of our
 great-great-great-grand-mothers) (1)
 (1) 
 http://www.olpcnews.com/countries/afghanistan/updates_from_olpc_afghanistan_1.html


 That's the answer! Distributed serving by each volunteer's pedal power!

And you automatically become an admin after 5 MWh!

Magnus


Re: [Foundation-l] Wikimedia and Environment

2009-12-13 Thread Tim Starling
Teofilo wrote:
 You have probably heard about CO2 and the conference being held these
 days in Copenhagen (1).
 
 You have probably heard about the goal of carbon neutrality at the
 Wikimania conference in Gdansk in July 2010 (2).
 
 You may want to discuss the basic and perhaps naive wishes I have
 written down on the strategy wiki about paper consumption (3).

Paper production has a net negative impact on atmospheric CO2
concentration if the wood comes from a sustainably managed forest or
plantation. As long as people keep their PediaPress books for a long
time, or dispose of them in a way that does not produce methane, I
don't see a problem.

 Do we have an idea of the energy consumption related to the online
 access to a Wikipedia article ? Some people say that a few minutes
 long search on a search engine costs as much energy as boiling water
 for a cup of tea : is that story true in the case of Wikipedia (4) ?

No, it is not true, which makes what I'm about to suggest somewhat
more affordable.

Given the lack of political will to make deep cuts to greenhouse gas
emissions, and the pitiful excuses politicians make for inaction;
given the present nature of the debate, where special interests fund
campaigns aimed at stalling any progress by appealing to the ignorance
of the public; given the nature of the Foundation, an organisation
which raises its funds and conducts most of its activities in the
richest and most polluting country in the world: I think there is an
argument for voluntary reduction of emissions by the Foundation.

I don't mean by buying tree-planting or efficiency offsets, of which I
am deeply skeptical. I think the best way for Wikimedia to take action
on climate change would be by buying renewable energy certificates
(RECs). Buying RECs from new wind and solar electricity generators is
a robust way to reduce CO2 emissions, with minimal danger of
double-counting, forward-selling, outright fraud, etc., problems which
plague the offset industry.

If Domas's figure of 100 kW is correct, then buying a matching number
of RECs would be a small portion of our hosting budget. If funding is
nevertheless a problem, then we could have a restricted donation
drive, and thereby get a clear mandate from our reader community.
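
To put a rough number on that (the REC price below is only an assumed
figure; actual prices vary widely by market and year):

    # Rough annual cost of matching an assumed 100 kW load with RECs.
    load_kw = 100.0
    hours_per_year = 365 * 24                           # 8760
    mwh_per_year = load_kw * hours_per_year / 1000.0    # 876 MWh
    assumed_rec_price_usd_per_mwh = 20.0                # assumption, not a quoted market price
    annual_cost_usd = mwh_per_year * assumed_rec_price_usd_per_mwh
    print(f"{mwh_per_year:.0f} MWh/year, about ${annual_cost_usd:,.0f}/year at the assumed price")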

Our colocation facilities would not need to do anything, such as
changing their electricity provider. We would, however, need
monitoring of our total electricity usage, so that we would know how
many RECs to buy.

I'm not appealing to the PR benefits here, or to the way this action
would promote the climate change cause in general. I'm just saying
that as an organisation composed of rational, moral people, Wikimedia
has as much responsibility to act as does any other organisation or
individual.

Ultimately, the US will need to reduce its per-capita emissions by
around 90% by 2050 to have any hope of avoiding catastrophe (see e.g.
[1]). Nature doesn't have exemptions or loopholes; we can't continue
emitting by moving economic activity from corporations to charities.


[1] http://www.garnautreview.org.au/chp9.htm#tab9_3, and see chapter
4.3 for the impacts of the 550 case.

-- Tim Starling

