[Commons-l] Commons search function vs. Google
We are wondering on Meta[1] what criteria the Commons search function uses to establish the order of search results displayed.

To give some examples, searching for "pearl necklace" in Commons shows a woman with sperm on her neck as the first image result:

http://commons.wikimedia.org/w/index.php?title=Special%3ASearch&search=pearl+necklace&fulltext=Search

The same image is way down in a Google search (with safe search off) for "pearl necklace" on Commons:

http://www.google.co.uk/search?q=cucumber+site:commons.wikimedia.orgum=1hl=ensa=Ntbm=ischbav=on.2,or.r_gc.r_pw.,cf.osbbiw=bih=774uss=1#um=1hl=ensafe=offtbm=ischsa=1q=pearl+necklace+site:commons.wikimedia.orgoq=pearl+necklace+site:commons.wikimedia.orgaq=faqi=aql=gs_sm=egs_upl=113279l114967l0l115854l14l11l0l0l0l8l261l2003l0.8.3l11l0bav=on.2,or.r_gc.r_pw.,cf.osbfp=49f703222a617ecbiw=bih=774

Searching for "electric toothbrushes" in Commons shows a woman masturbating with a toothbrush as the second image result:

http://commons.wikimedia.org/w/index.php?title=Special%3ASearch&search=electric+toothbrushes&fulltext=Search

The same image turns up in Google as well (with safe search switched off), though not as one of the first results:

http://www.google.co.uk/search?q=cucumber+site:commons.wikimedia.orgum=1hl=ensa=Ntbm=ischbav=on.2,or.r_gc.r_pw.,cf.osbbiw=bih=774uss=1#um=1hl=ensafe=offtbm=ischsa=1q=electric+toothbrushes+site:commons.wikimedia.orgpbx=1oq=electric+toothbrushes+site:commons.wikimedia.orgaq=faqi=aql=gs_sm=egs_upl=341351l344565l0l345961l21l19l0l0l0l13l255l3528l0.11.8l19l0bav=on.2,or.r_gc.r_pw.,cf.osbfp=49f703222a617ecbiw=bih=774

Searching for "cucumber" in Commons shows a woman with a cucumber up her vagina on the first page of search results:

http://commons.wikimedia.org/w/index.php?title=Special%3ASearch&search=cucumber&fulltext=Search

Doing a Google search for "cucumber" on Commons (with safe search off) does not bring this image up among the first hundred or so results:

http://www.google.co.uk/search?q=cucumber+site:commons.wikimedia.orgum=1hl=ensa=Ntbm=ischbav=on.2,or.r_gc.r_pw.,cf.osbbiw=bih=774uss=1

Why is our listing so different from the one in Google, and why are sexual images so much higher up in our listing of search results?

Andreas

[1] http://meta.wikimedia.org/wiki/Controversial_content/Brainstorming

___
Commons-l mailing list
Commons-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/commons-l
Re: [Commons-l] Commons search function vs. Google
On Tue, 11 Oct 2011 16:22:37 +0100 (BST), Andreas Kolbe jayen...@yahoo.com wrote:
> Why is our listing so different from the one in Google, and why are sexual images so much higher up in our listing of search results?

I don't know how Google does it, but I'd bet that our search prioritises by word order in the description. So a description that starts "Pearl Necklace" comes before "A white pearl necklace". If you amend the description, then I suspect the search results will change.

WereSpielChequers
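The word-order guess above can be illustrated with a toy scorer. This is only a sketch of the hypothesis, not the actual Commons ranking: the function name `position_score` and the 1/(1+position) weighting are assumptions made up for the example.

```python
def position_score(query: str, description: str) -> float:
    """Toy relevance score: higher when the query phrase starts earlier.

    Hypothetical illustration of the word-order guess; not the real
    Commons search algorithm.
    """
    words = description.lower().split()
    terms = query.lower().split()
    n = len(terms)
    for i in range(len(words) - n + 1):
        if words[i:i + n] == terms:
            # An earlier match position yields a larger score.
            return 1.0 / (1 + i)
    return 0.0  # phrase not found in the description


descriptions = ["A white pearl necklace", "Pearl Necklace"]
ranked = sorted(
    descriptions,
    key=lambda d: position_score("pearl necklace", d),
    reverse=True,
)
print(ranked)
```

Under this toy weighting, the description that starts with the search phrase sorts first, which is the behaviour the guess predicts.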
Re: [Commons-l] Commons search function vs. Google
On 11 October 2011 16:53, WereSpielChequers werespielchequ...@gmail.com wrote:
> I don't know how Google does it, but I'd bet that our search prioritises by word order in the description. So a description that starts "Pearl Necklace" comes before "A white pearl necklace". If you amend the description, then I suspect the search results will change.

There are some notes on the internals of Lucene-search here:

http://www.mediawiki.org/wiki/User:Rainman/search_internals

"Article content" presumably is the same as the image description in our context.

I don't know quite what the "rank" metric would mean in the Commons context. Presumably, only links from local pages on Commons count? It may be that more controversial images provoke more meta-discussion, with more links to them as a result (from talk pages, deletion discussions, etc.), and so are more likely to appear "popular" to the search system, but that's just a guess.

--
- Andrew Gray
  andrew.g...@dunelm.org.uk
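To make the link-based guess concrete, here is a minimal sketch in which a text-match score is boosted by the number of incoming links. The formula, the function name `combined_score`, and the file names and numbers are all invented for illustration; nothing here is taken from the Lucene-search notes or from real Commons data.

```python
import math


def combined_score(text_score: float, incoming_links: int) -> float:
    """Toy ranking: text relevance boosted by log-scaled incoming links.

    Assumed formula for illustration only; not the real Lucene-search metric.
    """
    return text_score * (1.0 + math.log1p(incoming_links))


# (title, text match score, incoming links) with hypothetical values
pages = [
    ("File:Plain_example.jpg", 1.0, 2),
    ("File:Much_discussed_example.jpg", 0.8, 40),
]
ranked = sorted(pages, key=lambda p: combined_score(p[1], p[2]), reverse=True)
print([name for name, _, _ in ranked])
```

With a boost of this kind, a page that matches the query slightly less well can still outrank a better match if discussion pages link to it often enough.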
Re: [Commons-l] Commons search function vs. Google
Hi Andreas,

On 11-10-2011 17:22, Andreas Kolbe wrote:
> Why is our listing so different from the one in Google, and why are sexual images so much higher up in our listing of search results?

My assumption is that the popularity (either incoming links or number of clicks) might be taken into account. See http://stats.grok.se/commons.m/top to see what people like to click on in Commons, and cross-reference that with the images that show up high in the search results.

Maarten
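The cross-referencing suggested above could be done by hand, or roughly as in the following sketch. It assumes the two lists of page titles have already been collected somehow; the helper name `overlap_with_popular` and the sample titles are hypothetical placeholders, not real data.

```python
def overlap_with_popular(search_results, popular_pages, top_n=20):
    """Return (rank, title) for top search results that are also popular.

    Hypothetical helper: both inputs are plain lists of page titles.
    """
    popular = set(popular_pages)
    return [
        (rank, title)
        for rank, title in enumerate(search_results[:top_n], start=1)
        if title in popular
    ]


# Placeholder titles standing in for a search result list and a top-viewed list
search_results = ["File:A.jpg", "File:B.jpg", "File:C.jpg"]
popular_pages = ["File:C.jpg", "File:A.jpg", "File:Z.jpg"]
hits = overlap_with_popular(search_results, popular_pages)
print(hits)
```

A large overlap near the top of the search results would be consistent with popularity feeding into the ranking, though it would not prove it.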
Re: [Commons-l] Commons search function vs. Google
Maarten,

That sounds like the most plausible answer to me to date. We know that sexual images are among the most popular in Commons.

Some similar searches:

Underwater:
http://commons.wikimedia.org/w/index.php?title=Special%3ASearch&search=underwater&fulltext=Search
(The bondage image is not among the first 50 in Google with safe search off.)

Jumping ball:
http://commons.wikimedia.org/w/index.php?title=Special%3ASearch&search=Jumping+ball&fulltext=Search
(That image is first in Google as well, even with strict safe search enabled.)

This is something the personal image filter would (in part) address. We could also have a look at our search algorithm.

Andreas

From: Maarten Dammers maar...@mdammers.nl
To: commons-l@lists.wikimedia.org
Sent: Tuesday, 11 October 2011, 21:04
Subject: Re: [Commons-l] Commons search function vs. Google

> My assumption is that the popularity (either incoming links or number of clicks) might be taken into account.