Hi Tom,
On 10/11/2012 05:24 PM, Tom Morris wrote:
On Thu, Oct 11, 2012 at 11:03 AM, Pablo N. Mendes
<pablomen...@gmail.com> wrote:
Good point, Tom.
That article is in Category:Hindi_films, not
Category:Hindi_songs and it's a Film, not a song, so it's not
going to meet the requirements of your query.
But maybe the class hierarchy comes to the rescue (Work is a
supertype of Song and Film)?
The main point is that the extracted triples are semantic nonsense
because they conflate multiple subjects under a single URI.
You've got
<http://dbpedia.org/resource/Jaane_Tu..._Ya_Jaane_Na> <http://dbpedia.org/property/runtime>
"9300.0"^^<http://dbpedia.org/datatype/second> .
<http://dbpedia.org/resource/Jaane_Tu..._Ya_Jaane_Na> <http://dbpedia.org/property/length>
"276.0"^^<http://dbpedia.org/datatype/second> .
<http://dbpedia.org/resource/Jaane_Tu..._Ya_Jaane_Na> <http://dbpedia.org/property/length>
"1908.0"^^<http://dbpedia.org/datatype/second> .
<http://dbpedia.org/resource/Jaane_Tu..._Ya_Jaane_Na> <http://dbpedia.org/property/length>
"221.0"^^<http://dbpedia.org/datatype/second> .
Which is the length of what? They all refer to the same subject.
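
A small illustration of the ambiguity, as a sketch only: the triples above are held as plain Python tuples (no RDF library assumed), and the helper name values_by_subject_and_predicate is made up for this example. Grouping by subject and predicate shows that nothing in the data says which length belongs to which thing.

```python
from collections import defaultdict

FILM = "http://dbpedia.org/resource/Jaane_Tu..._Ya_Jaane_Na"
PROP = "http://dbpedia.org/property/"

# (subject, predicate, value in seconds), copied from the triples above
triples = [
    (FILM, PROP + "runtime", 9300.0),
    (FILM, PROP + "length", 276.0),
    (FILM, PROP + "length", 1908.0),
    (FILM, PROP + "length", 221.0),
]


def values_by_subject_and_predicate(triples):
    """Collect all object values that share one (subject, predicate) pair."""
    grouped = defaultdict(list)
    for subject, predicate, value in triples:
        grouped[(subject, predicate)].append(value)
    return dict(grouped)


grouped = values_by_subject_and_predicate(triples)
lengths = grouped[(FILM, PROP + "length")]

# Three different lengths hang off the one film URI; the data alone
# cannot tell us which one is a song, an album, or something else.
print(lengths)  # [276.0, 1908.0, 221.0]
```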
Similarly "Pappu Can't Dance" isn't a song. It's an (alternate?) title
for the film according to the RDF. A human knows it's a song because
of the
<http://dbpedia.org/resource/Jaane_Tu..._Ya_Jaane_Na>
<http://dbpedia.org/property/title> "Pappu Can't Dance"@en .
<http://dbpedia.org/resource/Jaane_Tu..._Ya_Jaane_Na>
<http://dbpedia.org/ontology/Work/runtime>
"155.0"^^<http://dbpedia.org/datatype/minute> .
To make what Venkatesh wants work, you'd need to teach the
extractor to figure out what the "main" subject of a page was and then
have it mint new subject URIs for all related concepts represented on
the page which are different (and don't have their own Wikipedia page)
such as soundtrack album, songs on a soundtrack album, etc. Then
you'd also need to teach it that the physical proximity of the track
listing and the soundtrack infobox implies that they refer to the same
subject. Finally, you'd have to make this robust in the face of
different editing & structuring styles by different Wikipedians.
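
A rough sketch of the idea described above, not how the DBpedia extractor actually works. Everything here is hypothetical: the block kinds ("main", "soundtrack", "track"), the short predicate names, the "__" URI scheme and both function names are assumptions made for illustration. It assumes the page has already been split into blocks in page order, which is the hard part and is not shown.

```python
import re

FILM = "http://dbpedia.org/resource/Jaane_Tu..._Ya_Jaane_Na"


def mint_uri(main_uri, label):
    """Derive a subject URI for a sub-entity of the page (made-up scheme)."""
    slug = re.sub(r"[^A-Za-z0-9]+", "_", label).strip("_")
    return f"{main_uri}__{slug}"


def split_subjects(main_uri, blocks):
    """Turn (kind, label, properties) blocks into triples with distinct subjects.

    Blocks are expected in page order so the proximity rule can apply.
    """
    triples = []
    album_uri = None
    for kind, label, props in blocks:
        if kind == "main":
            subject = main_uri
        elif kind == "soundtrack":
            subject = album_uri = mint_uri(main_uri, label)
        elif kind == "track":
            subject = mint_uri(main_uri, label)
            # Proximity rule: a track listed after a soundtrack infobox
            # is assumed to belong to that soundtrack album.
            if album_uri is not None:
                triples.append((subject, "album", album_uri))
        else:
            continue  # unknown block kinds are skipped
        for predicate, value in props.items():
            triples.append((subject, predicate, value))
    return triples


# Only values that appear in the triples quoted earlier are used here.
blocks = [
    ("main", "", {"runtime": 9300.0}),
    ("soundtrack", "soundtrack", {}),
    ("track", "Pappu Can't Dance", {"title": "Pappu Can't Dance"}),
]

result = split_subjects(FILM, blocks)
for triple in result:
    print(triple)
```

With subjects split this way, the title "Pappu Can't Dance" lands on its own minted URI instead of on the film. Making the block detection robust across editing styles is the part this sketch leaves out.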
That's a good idea. I agree that taking those subtopics into
consideration as well would be a great achievement for DBpedia.
I'd love to see the extractor get this smart, but I'm not holding my
breath.
Tom
--
Kind Regards
Mohamed Morsey
Department of Computer Science
University of Leipzig
_______________________________________________
Dbpedia-discussion mailing list
Dbpedia-discussion@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion