Ottomata added a comment.
Not sure if this is relevant, but this seemed the best place to note it. I just
came across:
https://github.com/yahoo/TensorFlowOnSpark/wiki/GetStarted_YARN
It seems relatively easy to package up (e.g. on a notebook host), ship to
HDFS, and then include it in a
Fuzheado added a comment.
FYI, some developments in the area of using image classification in the Wikiverse:
We now have a Wikidata Distributed Game - Depicts that uses image classification ML to generate candidates. I did this as a project with The Met Museum and Microsoft.
Miriam added a comment.
@Gilles thanks for this! Images and graphics have very different underlying image statistics, so it is fairly easy for a classifier to tell them apart. So it should be feasible.
If we can collect some training data, by finding one or more categories in Commons with
Isaac added a comment.
If we go down the path of trying to identify which images are photographs, we should look into work by a former colleague of mine on detecting visualizations on Commons (in some ways, the inverse task): http://brenthecht.com/publications/www18_vizbywiki.pdf
He (Allen