[Commons-l] Commons search function vs. Google

2011-10-11 Thread Andreas Kolbe
We are wondering on Meta[1] what criteria the Commons search function uses to 
establish the order of search results displayed.

To give some examples, searching for pearl necklace in Commons shows a woman 
with sperm on her neck as the first image result:

http://commons.wikimedia.org/w/index.php?title=Special%3ASearch&search=pearl+necklace&fulltext=Search


The same image is way down in a Google search (with safe search off) for pearl 
necklace on Commons:

http://www.google.co.uk/search?q=cucumber+site:commons.wikimedia.orgum=1hl=ensa=Ntbm=ischbav=on.2,or.r_gc.r_pw.,cf.osbbiw=bih=774uss=1#um=1hl=ensafe=offtbm=ischsa=1q=pearl+necklace+site:commons.wikimedia.orgoq=pearl+necklace+site:commons.wikimedia.orgaq=faqi=aql=gs_sm=egs_upl=113279l114967l0l115854l14l11l0l0l0l8l261l2003l0.8.3l11l0bav=on.2,or.r_gc.r_pw.,cf.osbfp=49f703222a617ecbiw=bih=774


Searching for electric toothbrushes in Commons shows a woman masturbating 
with a toothbrush as the second image result:


http://commons.wikimedia.org/w/index.php?title=Special%3ASearch&search=electric+toothbrushes&fulltext=Search


The same image turns up in Google as well (with safe search switched off), 
though not as one of the first results:

http://www.google.co.uk/search?q=cucumber+site:commons.wikimedia.orgum=1hl=ensa=Ntbm=ischbav=on.2,or.r_gc.r_pw.,cf.osbbiw=bih=774uss=1#um=1hl=ensafe=offtbm=ischsa=1q=electric+toothbrushes+site:commons.wikimedia.orgpbx=1oq=electric+toothbrushes+site:commons.wikimedia.orgaq=faqi=aql=gs_sm=egs_upl=341351l344565l0l345961l21l19l0l0l0l13l255l3528l0.11.8l19l0bav=on.2,or.r_gc.r_pw.,cf.osbfp=49f703222a617ecbiw=bih=774


Searching for cucumber in Commons shows a woman with a cucumber up her vagina 
on the first page of search results:

http://commons.wikimedia.org/w/index.php?title=Special%3ASearch&search=cucumber&fulltext=Search

Doing a Google search for cucumber on Commons (with safe search off) does not 
bring this image up among the first hundred or so results:

http://www.google.co.uk/search?q=cucumber+site:commons.wikimedia.orgum=1hl=ensa=Ntbm=ischbav=on.2,or.r_gc.r_pw.,cf.osbbiw=bih=774uss=1


Why is our listing so different from the one in Google, and why are sexual 
images so much higher up in our listing of search results?


Andreas


[1] http://meta.wikimedia.org/wiki/Controversial_content/Brainstorming
___
Commons-l mailing list
Commons-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/commons-l


Re: [Commons-l] Commons search function vs. Google

2011-10-11 Thread WereSpielChequers
 --

 Message: 5
 Date: Tue, 11 Oct 2011 16:22:37 +0100 (BST)
 From: Andreas Kolbe jayen...@yahoo.com
 Subject: [Commons-l] Commons search function vs. Google
 To: Wikimedia Commons Discussion List commons-l@lists.wikimedia.org
 Message-ID:
1318346557.48784.yahoomail...@web29620.mail.ird.yahoo.com
 Content-Type: text/plain; charset=iso-8859-1

 Why is our listing so different from the one in Google, and why are sexual
 images so much higher up in our listing of search results?




I don't know how Google does it, but I'd bet that our search prioritises by
word order in the description. So a description that starts "Pearl Necklace"
comes before "A white pearl necklace". If you amend the description then I
suspect the search results will change.
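
As a rough illustration of that guess (not the actual Commons or
Lucene-search ranking code), here is a toy Python sketch in which a
description scores higher the earlier the query phrase appears in it. The
names position_score and rank are made up for this example.

```python
def position_score(query, description):
    """Toy score: 1.0 if the phrase starts the description, lower the later it appears."""
    q = query.lower().split()
    words = description.lower().split()
    n = len(q)
    # Slide over the description looking for the query words as a contiguous phrase.
    for i in range(len(words) - n + 1):
        if words[i:i + n] == q:
            return 1.0 / (1 + i)
    return 0.0  # phrase not found


def rank(query, descriptions):
    """Sort descriptions so that earlier phrase matches come first."""
    return sorted(descriptions, key=lambda d: position_score(query, d), reverse=True)


print(rank("pearl necklace", ["A white pearl necklace", "Pearl Necklace"]))
```

Under this assumed scoring, the description that begins with the phrase
sorts ahead of the one where it appears after two other words.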

WereSpielChequers


Re: [Commons-l] Commons search function vs. Google

2011-10-11 Thread Andrew Gray
On 11 October 2011 16:53, WereSpielChequers werespielchequ...@gmail.com wrote:

 I don't know how Google does it, but I'd bet that our search prioritises by
 word order in the description. So a description that starts "Pearl Necklace"
 comes before "A white pearl necklace". If you amend the description then I
 suspect the search results will change.

There are some notes on the internals of Lucene-search here:

http://www.mediawiki.org/wiki/User:Rainman/search_internals

"Article content" presumably corresponds to the image description in
our context. I don't know quite what the "rank" metric would mean in
the Commons context. Presumably only links from local pages on
Commons count?

It may be that more controversial images provoke more meta-discussion,
with more links to them as a result (from talkpages, deletion
discussions, etc) and so are more likely to appear popular to the
search system, but that's just a guess.
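
To make that guess concrete, here is a minimal Python sketch of a text
score boosted by the number of incoming links. The formula and the name
boosted_score are assumptions for illustration only; they are not taken
from the Lucene-search notes linked above.

```python
import math


def boosted_score(text_score, incoming_links):
    """Hypothetical ranking: text relevance scaled by a log of the incoming link count."""
    # log1p keeps the boost at zero for unlinked pages and grows slowly after that.
    return text_score * (1.0 + math.log1p(incoming_links))


# A weaker text match with 40 links (talk pages, deletion discussions, etc.)
# versus a stronger text match that nothing links to.
print(boosted_score(0.5, 40) > boosted_score(1.0, 0))
```

With a boost of this shape, a heavily discussed image can outrank a better
text match that has no links pointing at it.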

-- 
- Andrew Gray
  andrew.g...@dunelm.org.uk



Re: [Commons-l] Commons search function vs. Google

2011-10-11 Thread Maarten Dammers

Hi Andreas,

On 11-10-2011 17:22, Andreas Kolbe wrote:


 Why is our listing so different from the one in Google, and why are
 sexual images so much higher up in our listing of search results?

My assumption is that popularity (either incoming links or number of
clicks) might be taken into account. See
http://stats.grok.se/commons.m/top to see what people like to click on
at Commons, and cross-reference that with the images that show up high in
the search results.
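
The cross-referencing could be sketched roughly as follows in Python,
assuming the two lists of titles have already been collected by hand. The
function name popular_hits and the file titles are placeholders, not real
Commons data.

```python
def popular_hits(search_results, top_viewed):
    """Return (position, title) for each search result that is also on the top-viewed list."""
    popular = set(top_viewed)
    return [
        (pos, title)
        for pos, title in enumerate(search_results, start=1)
        if title in popular
    ]


# Placeholder titles standing in for a search result page and a top-viewed list.
results = ["File:A.jpg", "File:B.jpg", "File:C.jpg"]
top = ["File:C.jpg", "File:Z.jpg", "File:A.jpg"]
print(popular_hits(results, top))
```

If most of the highly placed results also appear on the top-viewed list,
that would be consistent with popularity feeding into the ranking, though
it would not prove it.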


Maarten


Re: [Commons-l] Commons search function vs. Google

2011-10-11 Thread Andreas Kolbe
Maarten,

That sounds like the most plausible answer to me to date. We know that sexual 
images are among the most popular in Commons.

Some similar searches:


Underwater:

http://commons.wikimedia.org/w/index.php?title=Special%3ASearch&search=underwater&fulltext=Search


(The bondage image is not among the first 50 in Google with safe search off).

Jumping ball:

http://commons.wikimedia.org/w/index.php?title=Special%3ASearch&search=Jumping+ball&fulltext=Search


(That image is first in Google as well, even with strict safe search enabled.)

This is something the personal image filter would (in part) address. We could 
also have a look at our search algorithm.

Andreas







From: Maarten Dammers maar...@mdammers.nl
To: commons-l@lists.wikimedia.org
Sent: Tuesday, 11 October 2011, 21:04
Subject: Re: [Commons-l] Commons search function vs. Google


