Hi Tom,

On 10/11/2012 05:24 PM, Tom Morris wrote:
On Thu, Oct 11, 2012 at 11:03 AM, Pablo N. Mendes <pablomen...@gmail.com> wrote:


    Good point, Tom.

        That article is in Category:Hindi_films, not
        Category:Hindi_songs and it's a Film, not a song, so it's not
        going to meet the requirements of your query.

    But maybe the class hierarchy comes to the rescue (Work is a
    supertype of Song and Film)?


The main point is that the extracted triples are semantic nonsense because they conflate multiple subjects under a single URI.

You've got

<http://dbpedia.org/resource/Jaane_Tu..._Ya_Jaane_Na>     <http://dbpedia.org/property/runtime> 
    "9300.0"^^<http://dbpedia.org/datatype/second> .
<http://dbpedia.org/resource/Jaane_Tu..._Ya_Jaane_Na>     <http://dbpedia.org/property/length>  
    "276.0"^^<http://dbpedia.org/datatype/second> .
<http://dbpedia.org/resource/Jaane_Tu..._Ya_Jaane_Na>     <http://dbpedia.org/property/length>  
    "1908.0"^^<http://dbpedia.org/datatype/second> .
<http://dbpedia.org/resource/Jaane_Tu..._Ya_Jaane_Na>     <http://dbpedia.org/property/length>  
    "221.0"^^<http://dbpedia.org/datatype/second> .
Which is the length of what?  They all refer to the same subject.
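
To make the ambiguity concrete, here is a minimal Python sketch (not part of the DBpedia extraction framework) that groups the triples quoted above by subject and predicate and reports any pair carrying more than one value. The tuple layout and the helper name ambiguous_values are assumptions made for illustration only.

```python
from collections import defaultdict

FILM = "http://dbpedia.org/resource/Jaane_Tu..._Ya_Jaane_Na"
PROP = "http://dbpedia.org/property/"

# (subject, predicate, value in seconds), copied from the triples quoted above
triples = [
    (FILM, PROP + "runtime", 9300.0),
    (FILM, PROP + "length", 276.0),
    (FILM, PROP + "length", 1908.0),
    (FILM, PROP + "length", 221.0),
]

def ambiguous_values(triples):
    """Return {(subject, predicate): [values]} for pairs with more than one value."""
    grouped = defaultdict(list)
    for subject, predicate, value in triples:
        grouped[(subject, predicate)].append(value)
    return {key: values for key, values in grouped.items() if len(values) > 1}

conflicts = ambiguous_values(triples)
for (subject, predicate), values in conflicts.items():
    # Three different lengths hang off the one film URI
    print(predicate, values)
```

Nothing in the data says which length belongs to which track or album, which is exactly the problem.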
Similarly "Pappu Can't Dance" isn't a song. It's an (alternate?) title for the film according to the RDF. A human knows it's a song because of the
<http://dbpedia.org/resource/Jaane_Tu..._Ya_Jaane_Na>     
<http://dbpedia.org/property/title>       "Pappu Can't Dance"@en .
<http://dbpedia.org/resource/Jaane_Tu..._Ya_Jaane_Na>     
<http://dbpedia.org/ontology/Work/runtime>        
"155.0"^^<http://dbpedia.org/datatype/minute> .
To make what Venkatesh wants work, you'd need to teach the extractor to figure out what the "main" subject of a page is and then have it mint new subject URIs for all related concepts represented on the page which are different (and don't have their own Wikipedia page), such as the soundtrack album, the songs on the soundtrack album, etc. Then you'd also need to teach it that the physical proximity of the track listing and the soundtrack infobox implies that they refer to the same subject. Finally, you'd have to make this robust in the face of different editing & structuring styles by different Wikipedians.
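
As a rough illustration of the URI-minting step only, the Python sketch below attaches a song title to its own subject instead of the film's. The "__song__" naming scheme and the helpers mint_uri and song_triples are hypothetical and are not how the DBpedia extractor works today. Detecting the main subject and linking the track listing to the soundtrack infobox are not attempted here.

```python
from urllib.parse import quote

MAIN = "http://dbpedia.org/resource/Jaane_Tu..._Ya_Jaane_Na"
TITLE = "http://dbpedia.org/property/title"

def mint_uri(main_uri, kind, label):
    """Build a hypothetical sub-resource URI of the form <main>__<kind>__<label>."""
    # Percent-encode everything except letters, digits and "_.-~"
    slug = quote(label.replace(" ", "_"), safe="_")
    return f"{main_uri}__{kind}__{slug}"

def song_triples(main_uri, song_titles):
    """Attach each song title to its own minted subject rather than to the film."""
    result = []
    for title in song_titles:
        subject = mint_uri(main_uri, "song", title)
        result.append((subject, TITLE, title))
    return result

triples = song_triples(MAIN, ["Pappu Can't Dance"])
for s, p, o in triples:
    print(f'<{s}> <{p}> "{o}"@en .')
```

With a separate subject per song, a title like "Pappu Can't Dance" no longer reads as an alternate title of the film.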

That's a good idea. I agree with you: if those subtopics were also taken into consideration, that would be a great achievement for DBpedia.


I'd love to see the extractor get this smart, but I'm not holding my breath.

Tom



--
Kind Regards
Mohamed Morsey
Department of Computer Science
University of Leipzig

_______________________________________________
Dbpedia-discussion mailing list
Dbpedia-discussion@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion
