Hi Jesse,

> On 26 Nov 2015, at 8:47 AM, Jesse de Vos <jd...@beeldengeluid.nl> wrote:
> 
> Hi everyone, 
> 
> We’re trying to get a clearer picture of what material we have on Wikimedia 
> Commons so that our next batch upload doesn’t duplicate with material that is 
> already on Commons. The category: Media from Open Beelden contains all the 
> files and we would like to have all the metadata on Commons for that category 
> (specifically the 'source' URL) to match against  our new content upload.
> 
> Does anyone know how to gather this using the Commons API? Basically, a call 
> to the API with the “File:title” field that would return a JSON object with 
> all the metadata is exactly what we need. Help would be much appreciated!
> 
> Best,
> 
> Jesse

You *can* get to that data via DBpedia — I came up with a quick SPARQL query 
(https://gist.github.com/gaurav/c9704c9b714e1e927140), which you can run at 
http://commons.dbpedia.org/sparql — here’s what the output looks like: 
http://commons.dbpedia.org/sparql?default-graph-uri=http%3A%2F%2Fcommons.dbpedia.org&query=SELECT+DISTINCT+%3Fpage+%3Fsource+WHERE+{+%3Fpage+%3Chttp%3A%2F%2Fpurl.org%2Fdc%2Fterms%2Fsubject%3E+%3Chttp%3A%2F%2Fcommons.dbpedia.org%2Fresource%2FCategory%3AMedia_from_Open_Beelden%3E+.+%3Fpage+dcterms%3Asource+%3Fsource+}&format=text%2Fhtml&timeout=0&debug=on
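
In case it is useful to run this from a script rather than a browser, here is a minimal Python sketch of the same query. The function names are my own and purely illustrative. I have written out the dcterms prefix explicitly instead of relying on the endpoint's predefined prefixes, and I am assuming the endpoint will return standard SPARQL JSON results when asked for application/sparql-results+json.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

ENDPOINT = "http://commons.dbpedia.org/sparql"
GRAPH = "http://commons.dbpedia.org"


def build_query(category):
    """Return SPARQL text listing pages in a category with their source."""
    return (
        "PREFIX dcterms: <http://purl.org/dc/terms/>\n"
        "SELECT DISTINCT ?page ?source WHERE {\n"
        "  ?page dcterms:subject "
        f"<http://commons.dbpedia.org/resource/Category:{category}> .\n"
        "  ?page dcterms:source ?source\n"
        "}"
    )


def build_url(category):
    """Build the GET URL for the endpoint (hypothetical helper)."""
    params = {
        "default-graph-uri": GRAPH,
        "query": build_query(category),
        # Assumption: the endpoint honours this standard results format.
        "format": "application/sparql-results+json",
    }
    return ENDPOINT + "?" + urlencode(params)


def parse_results(payload):
    """Turn a SPARQL JSON results document into (page, source) pairs."""
    return [
        (row["page"]["value"], row["source"]["value"])
        for row in payload["results"]["bindings"]
    ]


def fetch_sources(category):
    """Run the query over the network and return (page, source) pairs."""
    with urlopen(build_url(category)) as resp:
        return parse_results(json.load(resp))

# Usage (needs network access):
#   for page, source in fetch_sources("Media_from_Open_Beelden"):
#       print(page, source)
```

The resulting pairs can then be compared against the source URLs in your new upload batch.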

Unfortunately, this is based on a dump of the Commons as of January 10, 2015, 
so it might be out of date for your purposes! If you’d like more frequent 
updates, I’d ask on the DBpedia mailing lists — I helped write the Commons 
extractors, but I don’t really know anything about their infrastructure. You 
can also run the DBpedia Extraction Framework on a local dump of the entire 
Commons or on a subset of pages (by using 
https://commons.wikimedia.org/wiki/Special:Export to export all the pages from 
the category of interest, say), but I’d definitely check with the DBpedia 
developers first to see if they have something in the works that might be 
helpful for you!
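
Since you asked about the Commons API specifically: if the DBpedia dump turns out to be too old, something along the following lines might work against the live MediaWiki API. Treat it as an untested sketch, not a recipe. It uses the categorymembers generator with prop=imageinfo and iiprop=extmetadata. My understanding is that the 'source' field of the file description ends up in the "Credit" entry of extmetadata, usually as HTML, so you would probably still need to pull the URL out of it. The function names and the User-Agent string are made up for illustration.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API = "https://commons.wikimedia.org/w/api.php"


def build_params(category, cont=None):
    """Query parameters for one batch of files in a category."""
    params = {
        "action": "query",
        "format": "json",
        "generator": "categorymembers",
        "gcmtitle": "Category:" + category,
        "gcmtype": "file",
        "gcmlimit": "50",
        "prop": "imageinfo",
        "iiprop": "url|extmetadata",
    }
    # The API's "continue" object is merged back in to get the next batch.
    params.update(cont or {})
    return params


def extract_sources(payload):
    """Map file title -> "Credit" value (None if missing) for one response."""
    out = {}
    for page in payload.get("query", {}).get("pages", {}).values():
        info = (page.get("imageinfo") or [{}])[0]
        # Assumption: "Credit" carries the description page's source field.
        credit = info.get("extmetadata", {}).get("Credit", {}).get("value")
        out[page["title"]] = credit
    return out


def fetch_all(category):
    """Follow continuation until the whole category has been read."""
    sources, cont = {}, None
    while True:
        url = API + "?" + urlencode(build_params(category, cont))
        # Hypothetical User-Agent; replace with your own contact details.
        req = Request(url, headers={"User-Agent": "open-beelden-dedup-sketch/0.1"})
        with urlopen(req) as resp:
            payload = json.load(resp)
        for title, credit in extract_sources(payload).items():
            # Do not overwrite a value already found with an empty one.
            if credit is not None or title not in sources:
                sources[title] = credit
        cont = payload.get("continue")
        if not cont:
            return sources

# Usage (needs network access):
#   sources = fetch_all("Media_from_Open_Beelden")
```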

Hope this helps!

cheers,
Gaurav
_______________________________________________
Glamtools mailing list
Glamtools@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/glamtools
